Building Custom Graphics Cards for Cloud Gaming

26 November 2025
Eleanor Brash

The global cloud gaming market is forecast to exceed $20Bn by 2030, with Asia Pacific representing 45% of the opportunity, according to Grand View Research. However, incumbent GPU solutions were designed for data centre compute, not for the unique economics of cloud gaming, where profitability depends on maximising concurrent users per GPU while maintaining a premium user experience.

For companies developing cloud gaming hardware, choosing a GPU IP can mean the difference between market success and costly delays. Imagination’s E-Series offers a proven, scalable architecture that helps teams build differentiated products faster, with lower risk and better ROI.

Any new hardware needs to offer the right range of performance, features and price to enter a competitive market and establish itself there. Essential attributes include the ability to scale and sustain performance, the ability to support multiple users in a manner appropriate to the service, the ability to support the target applications – and the ability to do all of this in a small area (to limit silicon costs) and at low power (to lower operating costs).

Imagination is the company behind the renowned PowerVR GPU architecture. While Imagination’s IP originated in the mobile, consumer and automotive space, it is increasingly popular among companies building custom graphics cards for personal and cloud computing; products with Imagination GPUs inside include, for example, Innosilicon’s Fantasy graphics cards.

Earlier this year Imagination introduced its E-Series core: a power-efficient GPU IP solution that comes with 32 TOPS INT8 of on-GPU AI acceleration and 16 virtual environments per core. This blog outlines how designers of graphics cards for the cloud gaming market could differentiate themselves with E-Series.

Sustained FPS For Cloud Gaming

Imagination’s GPUs for the desktop market scale up to 72GPixel/s per core, and this performance can be increased with our multi-core technology (covered later in this blog). Performance is sustained by architectural fundamentals which maximise GPU utilisation and minimise thermal throttling:

Efficient SIMT execution: executing the same instruction across multiple threads improves throughput, reduces control overhead and, because threads share instruction fetch and decode stages, improves resource utilisation. Imagination’s latest GPUs feature 128 parallel threads per shading cluster.
Advanced scheduling: fine-grained scheduling keeps the shader cores busy and avoids stalls, maintaining high utilisation even under varying game workloads.
Focus on local memory: storing intermediate results in fast local memory delivers better performance for complex effects by avoiding costly round trips to external DRAM.
Tile-based deferred rendering: the process of breaking frames into small tiles and processing them on-chip minimises external memory bandwidth usage, while reducing overdraw delivers higher efficiency for scenes with lots of geometry and effects. For more information on the suitability of tile-based deferred rendering architectures to the desktop and data centre market, read this blog.
Advanced compression technologies: reducing the volume of data moving around the system massively lowers power consumption and supports higher sustained frame rates.
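To make the tiling idea above more concrete, here is a minimal sketch of the binning step: each triangle is assigned to the screen tiles its bounding box touches, so that each tile can later be processed on its own. The 32-pixel tile size, the `bin_triangles` function and the bounding-box test are illustrative assumptions, not a description of how Imagination's hardware performs tiling.

```python
from collections import defaultdict

TILE = 32  # tile edge in pixels; an illustrative value, not an E-Series parameter


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it."""
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the frame, then convert to tile coordinates.
        x0 = max(0, min(xs)) // tile
        x1 = min(width - 1, max(xs)) // tile
        y0 = max(0, min(ys)) // tile
        y1 = min(height - 1, max(ys)) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


# Two hypothetical triangles with integer pixel coordinates on a 64x64 frame.
triangles = [
    [(2, 2), (20, 4), (10, 25)],     # fits inside tile (0, 0)
    [(30, 30), (40, 30), (35, 40)],  # straddles four tiles
]
bins = bin_triangles(triangles, width=64, height=64)
for tile_xy in sorted(bins):
    print(tile_xy, bins[tile_xy])
```

Once the geometry is binned, a tile only needs the triangles in its own list, which is what allows the working set to stay in on-chip memory.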

These features work together to deliver consistent, jitter-free experiences to cloud gaming customers. They are complemented by the new, efficient E-Series Burst Processors solution which, in addition to reducing power consumption, helps to keep the graphics pipelines fed.

Burst Processors changes the way the ALUs handle tasks. Rather than changing tasks every cycle, operations are submitted to the ALU in indivisible units, which prevents tasks from being interrupted and reduces the number of unproductive reads / writes to register storage. The solution also allows intermediary data to be stored inside the ALUs, further limiting the volume of reads / writes to register storage. By reducing the overall demand on register storage, the GPU can actually increase its utilisation and deliver higher performance for longer periods – ideal for sustained performance for cloud games.
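The effect on register traffic can be illustrated with a deliberately simple counting model. It assumes a chain of dependent two-operand operations and compares a design where every operation goes through register storage with one where intermediate results stay inside the ALU until the chain completes. Both functions and their cost formulas are assumptions made for illustration only; they are not measurements or a description of the actual Burst Processors implementation.

```python
def register_accesses_per_op(chain_length):
    """Every operation reads two operands from and writes one result to register storage."""
    return 3 * chain_length


def register_accesses_burst(chain_length):
    """Intermediates stay in the ALU: the first operation reads two operands,
    each later operation reads one new operand, and only the final result is written."""
    return 2 + (chain_length - 1) + 1


for n in (1, 4, 8):
    print(n, register_accesses_per_op(n), register_accesses_burst(n))
```

In this toy model the saving grows with the length of the chain, which matches the intuition in the text that uninterrupted units of work reduce demand on register storage.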

Extra Performance for Compute-Heavy Effects

Games, whether they’re AAA or casual, increasingly include compute-intensive details. Popular effects like blur and depth of field require a GPU to perform complex data sampling and mathematical operations. To handle these types of effects efficiently, GPUs initially evolved to feature general purpose compute shaders, but E-Series takes this idea one step further.

E-Series' on-GPU AI acceleration runs lower precision operations (like FP16 or INT8) 4x faster than D-Series equivalents. At 1GHz, a single core E-Series GPU can deliver up to:

2 FP32 TFLOPS for traditional shader workloads
16 FP16 TFLOPS for AI-accelerated rendering
32 TOPS INT8 for AI workloads and rendering
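As a quick sanity check on the figures above, the peak rates can be converted into operations per clock cycle at the stated 1GHz, and compared against the FP32 rate. The variable names are ours; the numbers come straight from the list.

```python
CLOCK_HZ = 10**9  # 1GHz, as stated in the text

# Peak operations per second for a single core, taken from the list above.
peak_ops_per_second = {
    "FP32": 2 * 10**12,
    "FP16": 16 * 10**12,
    "INT8": 32 * 10**12,
}

# Operations completed per clock cycle at peak.
per_clock = {name: peak // CLOCK_HZ for name, peak in peak_ops_per_second.items()}

# Throughput of each precision relative to FP32.
relative_to_fp32 = {name: ops // per_clock["FP32"] for name, ops in per_clock.items()}

for name in peak_ops_per_second:
    print(f"{name}: {per_clock[name]} ops/clock, {relative_to_fp32[name]}x FP32")
```

These are peak figures; sustained throughput in a real game depends on the workload and on memory behaviour.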

Neural Core E-Series AI

E-Series on-GPU AI acceleration is located inside the unified shading clusters (USC) to maximise performance and lower bandwidth consumption.

At its simplest, E-Series GPUs layer matmul acceleration within the Arithmetic Logic Units (ALUs). Keeping this acceleration inside the unified shading clusters (USCs) – rather than adding it as a different unit far away from the shading clusters – reduces data movement, which in turn improves performance while lowering bandwidth and power consumption. It also means that developers can easily tap into its extra performance via popular GPU APIs and industry-standard extensions.

The result of this is that graphics cards based on E-Series GPUs can efficiently leverage AI to accelerate parts of the rendering pipeline. Cloud gaming companies can leverage popular solutions, like super resolution, to produce high resolution frames faster and more efficiently, which in turn means that a single GPU can host more gamers.
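To see why super resolution frees up GPU capacity, it helps to count pixels. The sketch below compares the number of pixels shaded at a lower internal resolution with the number in the final frame. The resolutions are common examples chosen for illustration, and the model ignores the cost of the upscaling pass itself, so the ratios are an upper bound on the shading saving rather than a prediction of how many extra users a GPU can host.

```python
def pixels(width, height):
    """Number of pixels in a frame."""
    return width * height


def shading_reduction(render, output):
    """How many times fewer pixels are shaded when rendering at `render`
    and upscaling to `output` (both are (width, height) tuples)."""
    return pixels(*output) / pixels(*render)


# Hypothetical example: a 4K output frame produced from lower internal resolutions.
output = (3840, 2160)
for render in [(1920, 1080), (2560, 1440)]:
    print(render, "->", output, f"{shading_reduction(render, output):.2f}x fewer shaded pixels")
```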

Seamless Scaling

Our GPUs can be clocked higher for greater performance; also – and importantly for the designers of graphics cards for cloud gaming – they can be built up into a multi-core solution.

Imagination’s novel approach to multi-core instantiates a flexible number of GPU cores without a direct dependency on a connection to a central unit. This is different to the traditional approach to GPU scalability, which is limited by the fact that all shader cores need to be connected to this single centralised block with centralised memory data path, job manager and geometry tiling engines. This traditional approach hits issues with congestion and layout flexibility.

Our multi-core scaling process is de-centralised and loosely coupled. It provides chip designers with layout and design flexibility, while maximising bandwidth efficiency. Careful design means that graphics workloads scale successfully across different cores with minimal bottlenecks. For example, E-Series tiling accelerators on different cores can handle non-dependent geometry workloads at the same time, allowing the GPU to render complex AAA games quickly.

For more information on how Imagination GPUs scale efficiently, read this blog.

For cloud gaming graphics cards, this firstly means that E-Series GPUs can be scaled to reach the performance levels that service providers expect. Imagination’s multi-core solution also offers an added benefit: as each core is its own GPU, the cores can be dynamically reconfigured to work together (for maximum performance for a single user) or separately (for maximum multi-tenancy flexibility).

A multi-core grid operating together in primary-secondary mode for maximum single user performance.

A multi-core grid operating in primary-primary mode for maximum flexibility.
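The two modes can be pictured as two ways of grouping the same set of cores. The sketch below is a conceptual model only: `CoreGroup`, `primary_secondary` and `primary_primary` are hypothetical names, and the real reconfiguration is handled by the GPU itself rather than by host code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreGroup:
    primary: int        # core that leads the group
    secondaries: tuple  # cores that follow the primary (may be empty)


def primary_secondary(core_ids):
    """All cores cooperate on one workload: the first core leads, the rest follow."""
    first, *rest = core_ids
    return [CoreGroup(primary=first, secondaries=tuple(rest))]


def primary_primary(core_ids):
    """Every core acts as an independent GPU with its own workload."""
    return [CoreGroup(primary=core, secondaries=()) for core in core_ids]


cores = [0, 1, 2, 3]  # a hypothetical four-core grid
print(primary_secondary(cores))  # one group spanning all four cores
print(primary_primary(cores))    # four single-core groups
```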

Maximise Revenue with Flexible Multi-tenancy

Cloud service providers offer different pricing tiers to maximise their revenue generation. The result of this is that different customers need access to different levels of graphics performance, and a cloud gaming GPU needs a huge amount of workload allocation flexibility to support this.

To this end, our IP features an intelligent firmware processor that handles GPU events directly. It manages interactions with other GPU cores and third-party processors, prioritises rendering tasks and handles errors and debug. This is different to other GPU IP vendors, who rely on the CPU and driver stack for workload allocation and don’t have the same level of flexibility.

This smart firmware processor means that E-Series GPUs can support a variety of cloud gaming scenarios, from hosting multiple containers on a single GPU core to merging cores together and dynamically co-ordinating workloads across those cores for the best possible single-user experience.

Virtualisation is another valuable multi-tasking feature for a cloud environment. It can be used to guarantee security and privacy to premium users, or to support a remote desktop use case. For the service provider, hosting multiple virtual machines (VMs) on one GPU can also help isolate faults and prevent a crash on one VM affecting other gamers.

E-Series GPUs offer an advanced hardware-based virtualisation solution (HyperLane) that runs up to sixteen different operating systems per core with full memory isolation, freedom from interference, quality of service and workload prioritisation. Because HyperLane is hardware-based, it delivers higher performance and lower software complexity than other, software-based virtualisation solutions.
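As an illustration of how a service provider might map pricing tiers onto this capability, the sketch below places gaming sessions onto cores using a first-fit rule. Only the limit of sixteen virtual machines per core comes from the text. The tier names, the share of a core each tier receives (expressed in sixteenths) and the `place_sessions` function are hypothetical, and this is host-side planning logic, not a HyperLane API.

```python
MAX_VMS_PER_CORE = 16  # from the text: up to sixteen operating systems per core
UNITS_PER_CORE = 16    # model each core's capacity as sixteen equal units

# Hypothetical tiers: how many sixteenths of a core each session is promised.
TIER_UNITS = {"basic": 1, "standard": 2, "premium": 8}


def place_sessions(sessions, core_count):
    """First-fit placement of (name, tier) sessions onto cores.

    Returns the per-core allocations and the names that could not be placed."""
    cores = [{"used": 0, "vms": []} for _ in range(core_count)]
    unplaced = []
    for name, tier in sessions:
        need = TIER_UNITS[tier]
        for core in cores:
            fits = core["used"] + need <= UNITS_PER_CORE
            if fits and len(core["vms"]) < MAX_VMS_PER_CORE:
                core["used"] += need
                core["vms"].append(name)
                break
        else:
            # No core had room for this session.
            unplaced.append(name)
    return cores, unplaced


sessions = [("a", "premium"), ("b", "premium"), ("c", "standard"), ("d", "basic")]
cores, unplaced = place_sessions(sessions, core_count=2)
for index, core in enumerate(cores):
    print(index, core["vms"], f"{core['used']}/{UNITS_PER_CORE} units")
print("unplaced:", unplaced)
```

A production scheduler would also account for quality-of-service priorities and live migration, which this sketch leaves out.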

Find out more about Imagination's unique virtualisation technology in this white paper.

ROI-Focused Design

Imagination’s products scale in performance up to cloud levels, but our GPU architecture retains its well-known area and power efficiency even at the higher performance levels. Our focus on efficiency, and doing more with less, makes a big difference in the cloud market. An area-conscious design lowers the cost of developing custom silicon and improves the competitiveness of new solutions. A power-conscious product helps cloud service providers keep their energy bills under control.

E-Series has taken Imagination’s efficiency to another level. It offers over 3x the compute performance density of D-Series and up to 35% better power efficiency through the combination of Imagination’s Neural Cores and Burst Processors.

Established Software Ecosystem

Cloud gaming platforms can count on Imagination GPUs to deliver the software support needed to power today’s most demanding titles. With casual, cloud-driven Android games dominating the market, Imagination’s deep roots in the Android ecosystem — built on years of leadership in mobile devices — make it a trusted choice. Our GPUs fully support the Khronos Group’s leading mobile graphics APIs, Vulkan™ and OpenGL® ES, ensuring smooth, scalable gaming experiences from device to cloud.

When it comes to PC games, our latest generations have added hardware-based support for DirectX®, which E-Series extends to cover DirectX 12, along with reference drivers for high-performance PC gaming. OpenGL, a popular API for older games, is also supported up to version 4.6 via Zink, the Mesa driver that implements OpenGL on top of Vulkan.

Conclusion

As cloud gaming continues to grow, especially in markets like China, hardware designers need GPU IP that delivers high performance, scalability, and efficiency without compromise. Imagination’s E-Series GPU architecture offers a compelling solution—combining compute-rich graphics rendering, AI acceleration, multi-tenancy, and power-aware design.

Whether you're building for mass-market casual gaming or premium AAA experiences, E-Series provides the flexibility and capability to differentiate your product and accelerate your time-to-market. With proven deployment across industries and support for leading APIs, E-Series is the smart choice for next-generation cloud gaming hardware.

To find out more about E-Series, read this preview paper or reach out to the Imagination team to organise an evaluation.