Imagination’s PowerVR GPU architecture is synonymous with efficiency. Our IP built its reputation in mobile, consumer and other embedded devices, where battery life and silicon area play a significant role in SoC design decisions.
In the desktop market, however, the requirements placed on GPU IP for a new graphics card are very different.
Recent generations of Imagination GPUs have increased the number of processing units inside the main GPU cores, and these are complemented by our highly capable multi-core scaling technology. Used together, they allow customers in the desktop space to achieve mainstream desktop performance levels with an Imagination GPU.
One question we still get asked is: can a tile-based deferred rendering (TBDR) architecture really operate in the desktop space where “immediate-mode renderers” (IMRs) have ruled for so long?
Well... yes, it can, because the two architectural styles aren’t actually that different. Let’s explore how.
In a classic immediate-mode renderer, each object submitted to the renderer is transformed, rasterised and coloured immediately, before the next object is processed. This is where the name “immediate mode rendering” comes from.
Of course, in a 3D scene, objects further from the camera can be hidden (entirely or partially) by objects in the foreground. Because the depth test only runs after the “texture and shade” step, fragments that have already been processed may later be “overdrawn” by triangles / fragments that are closer to the camera. This means the shaders do unnecessary work, and it causes a lot of unnecessary (and power-hungry) data movement. All colour and depth data is stored in system memory, and the excessive read / modify / write operations needed for colour blending and depth buffer updates create a substantial memory bandwidth overhead, or require very large L2 and/or L3 caches.
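The cost of late depth testing can be illustrated with a small software model. This is a minimal sketch under simplifying assumptions, not a model of any real GPU. Each fragment is a hypothetical `Fragment` record, and shading is represented by simply copying a colour value. Caches are ignored, so every depth read and every depth/colour write is counted as a trip to system memory. Three opaque layers covering the same 4x4 pixels are submitted back to front.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    depth: float  # smaller values are closer to the camera
    colour: int   # placeholder for the result of texturing and shading


def render_late_z(fragments, width, height):
    """Immediate-mode style: shade every fragment, then depth test it.

    Returns (framebuffer, shaded_count, memory_ops). memory_ops counts depth
    reads plus depth/colour writes, all assumed to go to system memory.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    colour = [[0] * width for _ in range(height)]
    shaded = 0
    memory_ops = 0
    for f in fragments:      # fragments arrive in object submission order
        shaded += 1          # texture and shade runs before the depth test
        memory_ops += 1      # read the stored depth value
        if f.depth < depth[f.y][f.x]:
            depth[f.y][f.x] = f.depth
            colour[f.y][f.x] = f.colour
            memory_ops += 2  # write the new depth and colour values


    return colour, shaded, memory_ops


def layer(depth, colour, size=4):
    """A hypothetical opaque object covering a size x size pixel block."""
    return [Fragment(x, y, depth, colour) for y in range(size) for x in range(size)]


# Far, middle and near layers submitted back to front.
frags = layer(0.9, 1) + layer(0.5, 2) + layer(0.1, 3)
fb, shaded, memory_ops = render_late_z(frags, 4, 4)
print(shaded, memory_ops)  # 48 144
```

Every one of the 48 fragments is shaded, but only the 16 from the nearest layer survive. The other 32 shading operations, and the memory traffic that goes with them, are the overdraw described above.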
This level of waste or cost was acceptable on devices where silicon and power had fewer limits, but it was not suitable for more constrained environments like smartphones. This is where Imagination’s tile-based deferred rendering approach comes in.
The tiling part of tile-based deferred rendering takes place early in the pipeline, specifically within the geometry phase. Here the vertex data is processed, and the scene is split into small screen regions called tiles. These tiles enable the use of on-chip buffering, rather than expensive data trips to and from system memory. Tiling also helps to improve workload distribution, because each tile is independent and can be processed in parallel across different cores or shader units. In this way performance can scale linearly, unlike with classic immediate-mode renderers, which operate on a per-triangle basis. The final benefit is that the data for each tile is small enough for processing to stay on chip, with only a single write out per tile.
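The geometry-phase binning and the per-tile on-chip processing described above can be sketched in a few lines. This is an illustrative model under stated assumptions, not Imagination’s implementation. Primitives are simplified to on-screen axis-aligned rectangles, and the 4x4-pixel `TILE` size is hypothetical, chosen only to keep the example small. Python lists stand in for the on-chip tile buffers and for the framebuffer in system memory.

```python
from collections import defaultdict

TILE = 4  # hypothetical tile size in pixels, chosen only for this example


def bin_primitives(prims):
    """Geometry phase: record every tile that each primitive touches.

    Each primitive is (x0, y0, x1, y1, depth, colour) covering the half-open
    pixel rectangle [x0, x1) x [y0, y1); assumed to lie fully on screen.
    """
    bins = defaultdict(list)
    for prim in prims:
        x0, y0, x1, y1 = prim[:4]
        for ty in range(y0 // TILE, (y1 - 1) // TILE + 1):
            for tx in range(x0 // TILE, (x1 - 1) // TILE + 1):
                bins[(tx, ty)].append(prim)
    return bins


def render_tiled(prims, width, height):
    """Fragment phase: resolve each tile in small local buffers, then write it once."""
    assert width % TILE == 0 and height % TILE == 0
    framebuffer = [[0] * width for _ in range(height)]  # stands in for system memory
    external_writes = 0
    for (tx, ty), tile_prims in bin_primitives(prims).items():
        # Tiles are independent, so this loop could be spread across cores.
        depth = [[float("inf")] * TILE for _ in range(TILE)]  # "on-chip" buffers
        colour = [[0] * TILE for _ in range(TILE)]
        ox, oy = tx * TILE, ty * TILE
        for x0, y0, x1, y1, d, c in tile_prims:
            for y in range(max(y0, oy), min(y1, oy + TILE)):
                for x in range(max(x0, ox), min(x1, ox + TILE)):
                    if d < depth[y - oy][x - ox]:
                        depth[y - oy][x - ox] = d
                        colour[y - oy][x - ox] = c
        # Single write-out of the finished tile; in this sketch depth never leaves it.
        for ly in range(TILE):
            for lx in range(TILE):
                framebuffer[oy + ly][ox + lx] = colour[ly][lx]
                external_writes += 1
    return framebuffer, external_writes


# Three overlapping full-screen layers on an 8x8 screen (2x2 tiles of 4x4 pixels).
layers = [(0, 0, 8, 8, d, c) for d, c in ((0.9, 1), (0.5, 2), (0.1, 3))]
fb, writes = render_tiled(layers, 8, 8)
print(writes)  # 64: one write per pixel, however many layers overlap
```

However many layers overlap, each pixel reaches the framebuffer exactly once, when its tile is finished. Because no tile reads another tile’s buffers, the outer loop is the natural unit for distributing work across cores or shader units.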
Imagination has three key tiling technologies that reduce memory bandwidth for the lowest possible power consumption:
1. Perfect Tiling: Imagination GPUs bin triangles into tiles with perfect accuracy, ensuring work is only done where needed. Most other vendors use bounding boxes, which can more than double the workload due to over-fetched data – and GPUs with hierarchical tiling can be even worse.
2. Perfect Culling: We have several patents on early culling, covering, for example, small-object and depth-based culling as well as traditional culling cases such as off-screen and back-facing triangles.
3. Geometry Compression: Our GPUs are the only ones with hardware-based geometry compression. This compresses vertex data (positions, normals, texture coordinates, etc.) before it is stored or transmitted, reducing the size of the vertex buffers, which in turn lowers memory bandwidth. The GPU can compress the data on the fly during vertex processing, which allows efficient use of internal caches and reduces external memory accesses.
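The gap between accurate tiling and bounding-box binning, along with early culling of back-facing triangles, can be illustrated using standard geometric tests. This sketch uses a separating-axis style triangle/tile overlap test and a signed-area back-face test. These are textbook techniques chosen for illustration, not a description of Imagination’s patented hardware. The 16-pixel `TILE` size is hypothetical. Counter-clockwise winding (positive signed area) is assumed to be front-facing, although real APIs let the application choose the front-face winding.

```python
import math

TILE = 16  # hypothetical tile size in pixels


def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def bbox_tiles(tri, tiles_x, tiles_y):
    """Bounding-box binning: every tile touched by the triangle's bounding box."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    tx0 = max(math.floor(min(xs) / TILE), 0)
    tx1 = min(math.floor(max(xs) / TILE), tiles_x - 1)
    ty0 = max(math.floor(min(ys) / TILE), 0)
    ty1 = min(math.floor(max(ys) / TILE), tiles_y - 1)
    # Empty ranges here mean the triangle is entirely off-screen.
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def overlaps_tile(tri, tx, ty):
    """Separating-axis test: reject the tile only if all four of its corners
    lie strictly outside one edge of the counter-clockwise triangle."""
    x0, y0 = tx * TILE, ty * TILE
    corners = [(x0, y0), (x0 + TILE, y0), (x0, y0 + TILE), (x0 + TILE, y0 + TILE)]
    a, b, c = tri
    for p, q in ((a, b), (b, c), (c, a)):
        if all(signed_area2(p, q, corner) < 0 for corner in corners):
            return False
    return True


def bin_triangle(tri, tiles_x, tiles_y, exact=True):
    """Cull, then bin a triangle either exactly or by its bounding box."""
    if signed_area2(*tri) <= 0:  # back-facing or zero-area: cull before binning
        return []
    tiles = bbox_tiles(tri, tiles_x, tiles_y)
    return [t for t in tiles if overlaps_tile(tri, *t)] if exact else tiles


sliver = ((0, 0), (64, 60), (60, 64))  # thin diagonal triangle, 64x64 screen
print(len(bin_triangle(sliver, 4, 4, exact=False)))  # 16
print(len(bin_triangle(sliver, 4, 4)))               # 10
print(bin_triangle((sliver[0], sliver[2], sliver[1]), 4, 4))  # []
```

For the thin diagonal triangle, the bounding box touches all 16 tiles of the 4x4-tile screen, but only 10 tiles actually contain part of the triangle. Bounding-box binning would therefore schedule work for 6 tiles that produce no pixels. Reversing the vertex order makes the same triangle back-facing, so it is culled before it reaches any tile.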
Each of these helps to ensure that, even in a desktop-class machine, the GPU remains energy-efficient and quiet while delivering the performance required for gaming and productivity tasks.
Tiling works with the main desktop APIs (OpenGL and DirectX) and game engines. The front end of the tile-based deferred rendering pipeline, before the tiling stage, is no different from that of classic immediate-mode renderers. Even beyond that point, modern immediate-mode renderers have developed their own tiling solutions, with NVIDIA GPUs shipping with tiled caching and AMD’s GPUs offering a “Draw Stream Binning Rasterizer”.
The key difference between Imagination GPUs and AMD/NVIDIA solutions is that these immediate-mode renderers keep their form of “tiling” in on-chip cache memory rather than in system memory. However, this is not a barrier for our desktop customers. Imagination GPUs can be configured so that tiling and geometry data is stored in on-chip memory (SRAM), which offers lower latency and reduced external DDR bandwidth. Our design doesn’t work this way by default because the area cost is not acceptable to our partners in embedded, cost-sensitive market segments.
In essence, tile-based renderers and immediate-mode renderers have converged, with immediate-mode renderers adopting tiling mechanisms of their own to become more power- and processing-efficient. The supposed software compatibility challenge therefore does not exist. It is a historical claim that is no longer valid and is misleading.
Classic Imagination GPUs for the embedded market focus on area efficiency, because embedded designs normally have a limited silicon budget for the GPU and for the larger on-chip caches that geometry tiling would require. The desktop market is different. Massive caches are common there, and AMD, for example, offers up to 128 MB of Infinity Cache.
Customers using Imagination GPU IP in the desktop market are able to make the following adjustments to align with the desktop space:
As mentioned earlier, immediate-mode renderers transform, rasterise and colour scene objects without first determining what is and isn’t visible on the screen.
In addition to tiling, Imagination GPUs use a deferred rendering approach. This introduces a depth test early in the fragment phase, which identifies and removes hidden (overdrawn) fragments. Only after this step does the pipeline apply textures and shading. Rendering only what is needed in this way reduces the computational workload and lowers bandwidth and power consumption.
Deferred rendering is completely invisible to software and fully compliant with modern APIs. Nothing is made impossible by using a deferred rendering solution, since it only affects internal GPU operations.
This is because, in its simplest form, deferred rendering is just out-of-order depth testing. Early-Z, used by both NVIDIA and AMD, is another example of this. Other vendors use similar techniques, such as Forward Pixel Kill or Fragment Pre-Pass, so out-of-order depth testing is very common and is not a compatibility issue with desktop APIs.
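The two-step order described above, where visibility is resolved first and texturing and shading follow, can be sketched for a single tile. This minimal model assumes opaque geometry with no blending, `discard` or shader depth writes, which are cases a real GPU must handle separately. Fragments are hypothetical `(x, y, depth, primitive)` tuples, and the `shade` callback stands in for texturing and shading.

```python
def render_deferred(fragments, width, height, shade):
    """Deferred sketch for one tile (opaque geometry only).

    Pass 1 resolves visibility with depth tests alone; pass 2 runs the
    expensive shade() once per pixel that ends up visible.
    """
    nearest = {}  # (x, y) -> (depth, primitive) of the closest fragment so far
    for x, y, depth, prim in fragments:  # pass 1: no shading happens here
        if (x, y) not in nearest or depth < nearest[(x, y)][0]:
            nearest[(x, y)] = (depth, prim)
    framebuffer = [[None] * width for _ in range(height)]
    invocations = 0
    for (x, y), (_, prim) in nearest.items():  # pass 2: shade survivors only
        framebuffer[y][x] = shade(prim, x, y)
        invocations += 1
    return framebuffer, invocations


def layer(depth, prim, size=4):
    """A hypothetical opaque object covering a size x size pixel block."""
    return [(x, y, depth, prim) for y in range(size) for x in range(size)]


def shade(prim, x, y):
    return prim  # stand-in for texturing and shading


back_to_front = layer(0.9, "far") + layer(0.5, "mid") + layer(0.1, "near")
fb, invocations = render_deferred(back_to_front, 4, 4, shade)
fb2, invocations2 = render_deferred(list(reversed(back_to_front)), 4, 4, shade)
print(invocations, invocations2)  # 16 16
print(fb[0][0], fb2[0][0])        # near near
```

Three overlapping 4x4 layers produce 48 fragments, but `shade` runs only 16 times, once per visible pixel. The result is the same whichever order the layers arrive in.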
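The relationship between Early-Z and deferred rendering can be sketched by counting shaded fragments. This is a simplified model assuming opaque geometry, with fragments as hypothetical `(x, y, depth)` tuples. Both functions move the depth test ahead of shading, so for opaque geometry neither changes the final image. The counts also show one practical difference in this model. Early-Z tests each fragment against the depth buffer as it stands at that moment, so its savings depend on nearer objects arriving first. Resolving a whole tile before shading does not depend on submission order.

```python
def early_z_shaded(fragments):
    """Early-Z sketch: test each fragment against the current depth buffer
    before shading; only fragments that pass are shaded."""
    depth = {}
    shaded = 0
    for x, y, d in fragments:
        if d < depth.get((x, y), float("inf")):
            depth[(x, y)] = d
            shaded += 1
    return shaded


def deferred_shaded(fragments):
    """Deferred sketch: resolve visibility for the whole tile first, so
    exactly one fragment per covered pixel is shaded."""
    nearest = {}
    for x, y, d in fragments:
        nearest[(x, y)] = min(d, nearest.get((x, y), float("inf")))
    return len(nearest)


def layer(d, size=4):
    """A hypothetical opaque object covering a size x size pixel block."""
    return [(x, y, d) for y in range(size) for x in range(size)]


back_to_front = layer(0.9) + layer(0.5) + layer(0.1)
front_to_back = layer(0.1) + layer(0.5) + layer(0.9)
print(early_z_shaded(back_to_front), early_z_shaded(front_to_back))    # 48 16
print(deferred_shaded(back_to_front), deferred_shaded(front_to_back))  # 16 16
```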
As we’ve seen in this article, the main differences between immediate-mode and tile-based deferred rendering GPUs are the timing of the visibility test, the storage location of the colour/depth data and the requirements placed on the L2 cache. In each case, tile-based deferred rendering GPUs were originally designed to prioritise system efficiency and reduce data movement throughout the chip.
However, the differences between the two rendering styles are not as extreme as many would expect. Modern immediate-mode renderers have adopted techniques like tiling and early depth tests to improve workload distribution and overall processing efficiency. Furthermore, Imagination’s GPU IP is flexible enough that customers in the desktop market can adjust it to align more closely with their requirements.
These architectural similarities make a high-performance, tile-based deferred rendering GPU a compelling choice for modern desktop systems. Whether for gaming, content creation or AI-enhanced applications, Imagination GPUs offer a future-ready alternative to traditional immediate-mode renderers.
For more information on the range of Imagination GPUs available for desktop, visit the Imagination website.