Does Tile-Based Deferred Rendering have a place in Desktop?

18 November 2025
Kristof Beets

Imagination’s PowerVR GPU architecture is synonymous with efficiency. Our IP built up its name in mobile, consumer and other embedded devices where battery preservation or silicon area play a significant role in SoC design decisions.

In the desktop market, however, the requirements for GPU IP in a new graphics card are very different:

High performance: to compete successfully, mainstream cards need to reach 20 TFLOPS and 300 GPixel/s; premium PC gaming cards need to reach even higher levels of performance.
Advanced features: AI-enhanced features, like super resolution, are becoming commonplace and GPUs are a key driver behind the GenAI revolution.
Software support: the GPU IP needs to run Windows games (via hardware-based DirectX support).
Power efficiency: even in desktop, efficiency is important; consumers want silent machines with minimal cooling requirements.

Recent generations of Imagination GPUs have increased the number of processing units inside the main GPU cores, and these are complemented by our highly capable multi-core scaling technology. Used together, it is possible for customers in the desktop space to achieve mainstream desktop performance levels with an Imagination GPU.

One question we still get asked is: can a tile-based deferred rendering (TBDR) architecture really operate in the desktop space where “immediate-mode renderers” (IMRs) have ruled for so long?

Well... yes, it can. In fact, the two architectural styles aren’t that different. Let’s explore how.

Back to basics: a simplified data flow for conventional 3D rendering

Diagram: a simplified data flow for conventional 3D rendering

In conventional 3D rendering, each object submitted to the renderer is transformed, rasterised and coloured immediately, before the next object is processed. This is the foundation of the name “immediate mode rendering”.

Of course, in a 3D scene, objects further from the camera can be hidden (entirely or partially) by objects in the foreground. Because the depth test runs only after the “texture and shade” step, fragments that have already been processed may later be “overdrawn” by triangles or fragments that are closer to the camera. This equates to unnecessary work being done by the shaders, and a lot of unnecessary (and power-hungry) data movement. All colour and depth data is stored in system memory, and the excessive read/modify/write operations needed for colour blending and depth buffer updates cause a substantial memory bandwidth overhead, or require very large L2 and/or L3 caches.
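The cost of overdraw can be illustrated with a toy model. The sketch below is only an illustration, not a description of any real GPU: it assumes a one-dimensional "scanline" of pixels, opaque spans given as hypothetical (x_start, x_end, depth) tuples, and a depth test that runs after shading, as in the simplified data flow above.

```python
import math

def late_depth_test_cost(draws, width):
    """Count shader invocations and depth/colour buffer updates when the
    depth test runs only after the "texture and shade" step.

    draws: list of (x_start, x_end, depth) spans; smaller depth is nearer.
    """
    depth_buffer = [math.inf] * width
    shaded = 0   # fragments that went through texture and shade
    updates = 0  # read/modify/write updates of the colour and depth buffers
    for x_start, x_end, depth in draws:
        for x in range(x_start, x_end):
            shaded += 1  # shading happens before visibility is known
            if depth < depth_buffer[x]:
                depth_buffer[x] = depth
                updates += 1
    return shaded, updates

# Three opaque layers covering the same 8 pixels.
back_to_front = [(0, 8, 3.0), (0, 8, 2.0), (0, 8, 1.0)]
front_to_back = list(reversed(back_to_front))

print(late_depth_test_cost(back_to_front, 8))  # (24, 24)
print(late_depth_test_cost(front_to_back, 8))  # (24, 8)
```

Only 8 pixels are visible in the final image, yet 24 fragments are shaded in both submission orders; in the back-to-front case every one of them also updates the buffers.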

This level of waste or cost was acceptable on devices where silicon and power had fewer limits, but it was not suitable for more constrained environments like smartphones. This is where Imagination’s tile-based deferred rendering approach comes in.

Section 1: Tile-Based Rendering on Desktop

Understanding tiling with Imagination

The tiling part of tile-based deferred rendering takes place early in the pipeline, specifically within the geometry phase. Here the vertex data is processed, and the scene is split into small regions called tiles. These tiles enable the use of on-chip buffering, rather than expensive data trips to and from system memory. Tiling also helps to improve workload distribution, as each tile is independent and can be processed in parallel across different cores or shader units. Performance can scale linearly in this way, unlike with classic immediate-mode renderers that operate on a per-triangle basis. The final benefit here is that the data for each tile is so small that processing can stay on chip, with only a single write-out per tile.
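To give a feel for why a tile fits on chip, here is a small back-of-the-envelope calculation. The numbers are assumptions chosen for illustration only: a 32 x 32 pixel tile, a 1920 x 1080 render target, and 8 bytes per pixel (4 bytes of colour plus 4 bytes of depth). The article does not state the actual tile size or buffer formats.

```python
import math

def tile_grid(width, height, tile_size):
    """Number of tile columns and rows needed to cover the render target."""
    return math.ceil(width / tile_size), math.ceil(height / tile_size)

def buffer_bytes(width, height, bytes_per_pixel):
    """Storage needed for colour plus depth over a width x height region."""
    return width * height * bytes_per_pixel

TILE_SIZE = 32        # assumed tile edge in pixels (illustrative)
BYTES_PER_PIXEL = 8   # assumed: 4 bytes colour + 4 bytes depth

columns, rows = tile_grid(1920, 1080, TILE_SIZE)
per_tile = buffer_bytes(TILE_SIZE, TILE_SIZE, BYTES_PER_PIXEL)
full_frame = buffer_bytes(1920, 1080, BYTES_PER_PIXEL)

print(columns, rows, columns * rows)  # 60 34 2040
print(per_tile)                       # 8192 bytes (8 KiB) per tile
print(full_frame)                     # 16588800 bytes (about 15.8 MiB)
```

Under these assumptions a single tile needs 8 KiB of working storage, against roughly 15.8 MiB for the whole frame, and the 2,040 tiles are independent units of work.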

Tiling Optimisations Just for Imagination GPUs

Imagination has three key tiling technologies that reduce memory bandwidth for the lowest possible power consumption:

1. Perfect Tiling

Imagination GPUs bin triangles into tiles with perfect accuracy, ensuring work is only done where needed. Most other vendors use bounding boxes, which can more than double the workload due to over-fetched data, and GPUs with hierarchical tiling can be even worse.
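The difference between bounding-box binning and exact binning can be sketched as follows. This is a generic illustration and not Imagination's hardware algorithm: the exact test here is an ordinary separating-axis test between a triangle and an axis-aligned tile, and the 32-pixel tile size and the example triangle are assumptions.

```python
def bbox_tiles(tri, tile):
    """Tiles touched by the triangle's axis-aligned bounding box."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return {(tx, ty)
            for tx in range(min(xs) // tile, max(xs) // tile + 1)
            for ty in range(min(ys) // tile, max(ys) // tile + 1)}

def triangle_overlaps_rect(tri, rect):
    """Separating-axis test between a triangle and an axis-aligned rectangle."""
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    axes = [(1, 0), (0, 1)]  # rectangle edge normals
    for i in range(3):       # triangle edge normals
        (ax, ay), (bx, by) = tri[i], tri[(i + 1) % 3]
        axes.append((ay - by, bx - ax))
    for nx, ny in axes:
        t = [nx * x + ny * y for x, y in tri]
        r = [nx * x + ny * y for x, y in corners]
        if max(t) < min(r) or max(r) < min(t):
            return False  # found a separating axis: no overlap
    return True

def exact_tiles(tri, tile):
    """Tiles the triangle actually overlaps (a subset of bbox_tiles)."""
    return {(tx, ty) for tx, ty in bbox_tiles(tri, tile)
            if triangle_overlaps_rect(
                tri, (tx * tile, ty * tile, (tx + 1) * tile, (ty + 1) * tile))}

# A right-angled triangle with integer pixel coordinates (illustrative).
triangle = [(1, 1), (120, 1), (1, 120)]
print(len(bbox_tiles(triangle, 32)))   # 16
print(len(exact_tiles(triangle, 32)))  # 10
```

For this triangle the bounding box touches 16 tiles while only 10 actually contain part of the triangle; the gap grows for long, thin, diagonal triangles.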

2. Perfect Culling

We have several patents around early culling. These cover, for example, small-object and depth-based culling, as well as traditional culling of offscreen and back-facing triangles.

3. Geometry Compression

Our GPUs are the only ones with hardware-based geometry compression. This compresses vertex data (positions, normals, texture coordinates, etc.) before it is stored or transmitted. This reduces the size of the vertex buffers, which in turn lowers memory bandwidth. The GPU can compress the data on-the-fly during vertex processing, which allows for the efficient use of internal caches and reduces external memory accesses.
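The article does not describe the compression scheme itself, so the sketch below does not attempt to reproduce it. It only shows, with a generic software technique (16-bit fixed-point quantisation of positions, which is lossy), how a more compact vertex encoding shrinks the buffer. The function names and the value range are hypothetical.

```python
import struct

def quantise_positions(positions, lo, hi):
    """Pack (x, y, z) floats in [lo, hi] as three unsigned 16-bit integers each.

    Generic fixed-point quantisation, used purely as an illustration.
    """
    scale = 65535 / (hi - lo)
    packed = bytearray()
    for position in positions:
        packed += struct.pack("<3H", *(round((v - lo) * scale) for v in position))
    return bytes(packed)

def dequantise_positions(data, lo, hi):
    """Recover approximate floats from the packed 16-bit representation."""
    step = (hi - lo) / 65535
    return [tuple(lo + q * step for q in struct.unpack_from("<3H", data, offset))
            for offset in range(0, len(data), 6)]

positions = [(0.0, 0.5, 1.0), (-1.0, 0.25, -0.75)]
raw_size = len(positions) * 3 * 4  # three 32-bit floats per vertex
packed = quantise_positions(positions, -1.0, 1.0)
restored = dequantise_positions(packed, -1.0, 1.0)

print(raw_size, len(packed))  # 24 12
```

In this toy case the position data halves in size, at the cost of a small rounding error per coordinate.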

Each of these helps to make sure that, even on a desktop-class machine, the GPU remains energy-efficient and quiet while delivering the required performance for gaming and productivity tasks.

So tiling is efficient, but how compatible is it with desktop software?

Tiling is supported by the main desktop APIs (OpenGL and DirectX) and game engines. The front end of the tile-based deferred rendering pipeline, before the tiling stage, is no different from that of classic immediate-mode renderers. Beyond that point, even modern immediate-mode renderers have developed their own tiling solutions, with NVIDIA GPUs shipping with tiled caching and AMD’s GPUs offering a “Draw Stream Binning Rasterizer”.

The key difference between Imagination GPUs and the AMD/NVIDIA solutions is that immediate-mode renderers use on-chip cache memory, rather than system memory, for their form of “tiling”. However, this is not a barrier for our desktop customers: Imagination GPUs can be configured so that tiling and geometry data is stored in on-chip memory (SRAM), which offers lower latency and reduced external DDR bandwidth. Our design doesn’t work this way by default because it comes at an area cost that is not acceptable to our partners in the embedded, cost-sensitive market segments.

In essence, tile-based renderers and immediate-mode renderers have converged, with immediate-mode renderers becoming more power- and processing-efficient by adopting tiling mechanisms too. The software compatibility challenge is therefore non-existent, and the claim that one exists is historical, invalid and misleading.

Optimising Imagination GPUs for desktop

Classic Imagination GPUs for the embedded market focus on area efficiency, as there is normally a limited silicon budget for the GPU and a limited budget for the larger on-chip caches required for geometry tiling. This is different from the desktop market, where massive caches are common; AMD, for example, offers up to 128 MB of Infinity Cache.

Customers using Imagination GPU IP in the desktop market are able to make the following adjustments to align with the desktop space:

Allowing parameter / tile buffers to be mapped to any memory region (not just the system memory).
Limiting buffers to a specific constrained size.
Enabling “Smart Parameter Management” (SPM), which allows our hardware to flush partial tile renders, freeing up on-chip parameter memory at the cost of some hidden surface removal efficiency (for example, the flushed workload may later be hidden by another object).
Spilling to system memory, if desired.
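The trade-off behind partial flushes can be sketched with a toy model. This is not how SPM is implemented in hardware; it assumes a one-dimensional tile, opaque spans given as hypothetical (x_start, x_end, depth) tuples, and a parameter buffer whose capacity is counted in draws. Whenever the buffer fills, the buffered geometry is resolved and shaded, so anything shaded in an early flush may be hidden by geometry that arrives later.

```python
import math

def shaded_with_partial_flushes(draws, width, capacity):
    """Count shaded pixels when the parameter buffer holds `capacity` draws."""
    depth = [math.inf] * width
    pending = []
    shaded = 0

    def flush():
        nonlocal shaded
        # Hidden surface removal over the buffered draws only.
        nearest = list(depth)
        for x_start, x_end, d in pending:
            for x in range(x_start, x_end):
                nearest[x] = min(nearest[x], d)
        # Shade each pixel whose visible surface changed in this flush.
        for x in range(width):
            if nearest[x] < depth[x]:
                depth[x] = nearest[x]
                shaded += 1
        pending.clear()

    for draw in draws:
        pending.append(draw)
        if len(pending) == capacity:
            flush()  # buffer full: partial render
    flush()          # end of frame
    return shaded

# Four opaque layers over the same 8 pixels, submitted back to front.
layers = [(0, 8, 4.0), (0, 8, 3.0), (0, 8, 2.0), (0, 8, 1.0)]
print(shaded_with_partial_flushes(layers, 8, 8))  # 8
print(shaded_with_partial_flushes(layers, 8, 2))  # 16
print(shaded_with_partial_flushes(layers, 8, 1))  # 32
```

With enough buffer space each visible pixel is shaded once; as the buffer shrinks, more flushed work ends up hidden, which is the efficiency cost described in the list above.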

Section 2: Deferred Rendering on Desktop

Understanding deferred rendering with Imagination

Diagram: Section 2, Deferred Rendering on Desktop

As mentioned earlier, immediate mode renderers transform, rasterise and colour scene objects without first understanding what is and isn’t visible on the screen.

In addition to tiling, Imagination GPUs deploy a deferred rendering approach. This introduces a depth test early in the fragment phase which checks for and removes overdrawn triangles. Only after this takes place does the pipeline apply textures and shading. This technique of only rendering what is needed reduces computational workload and lowers bandwidth and power consumption.

How does it work?

Each tile is fetched, and the transformed geometry is rasterised, which only requires positional data.
The Hidden Surface Removal (HSR) stage uses an on-chip buffer to determine visible fragments.
The Fragment Processing stage is responsible for fetching attribute and texture data.
Pixel processing runs the pixel shader code to apply shading techniques such as per-pixel lighting; all blending is done using on-chip tile memory, avoiding off-chip read/modify/write.
The final 3D frame is rendered tile-by-tile by writing out the on-chip buffer to memory.
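The steps above can be sketched for a single tile as follows. This is a simplified model under stated assumptions, not the hardware pipeline: the tile is one-dimensional, all geometry is opaque (blending is left out), draws are hypothetical (x_start, x_end, depth) spans, and `shade` is a stand-in for the pixel shader.

```python
import math

def render_tile_deferred(draws, tile_pixels, shade):
    """Resolve visibility for a whole tile first, then shade each pixel once.

    Returns (colour buffer to write out, number of shader invocations).
    """
    # Rasterise and run hidden surface removal using positions only.
    depth = [math.inf] * tile_pixels
    visible_draw = [None] * tile_pixels
    for draw_id, (x_start, x_end, d) in enumerate(draws):
        for x in range(x_start, x_end):
            if d < depth[x]:
                depth[x] = d
                visible_draw[x] = draw_id

    # Fetch attributes and shade only the surviving fragments.
    colour = [None] * tile_pixels
    shaded = 0
    for x, draw_id in enumerate(visible_draw):
        if draw_id is not None:
            colour[x] = shade(draw_id, x)
            shaded += 1

    # The caller writes `colour` out to memory once per tile.
    return colour, shaded

def flat_shade(draw_id, x):
    """Placeholder shader: returns the id of the draw that owns the pixel."""
    return draw_id

back_to_front = [(0, 8, 3.0), (0, 8, 2.0), (0, 8, 1.0)]
colour, shaded = render_tile_deferred(back_to_front, 8, flat_shade)
print(shaded)     # 8
print(colour[0])  # 2 (the nearest draw wins)
```

Whatever the submission order, each of the 8 visible pixels is shaded exactly once in this model.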

How compatible is deferred rendering with desktop software?

Deferred rendering is completely invisible to software and fully compliant with modern APIs. Nothing is made impossible by using a deferred rendering solution; its only influence is on internal GPU operations.

This is because, in its simplest form, deferred rendering is just out-of-order depth calculation. Early-Z, used by both NVIDIA and AMD, is another example of this. Other vendors use similar solutions such as Forward Pixel Kill or Fragment Pre-Pass. Out-of-order depth testing is therefore very common and is not a compatibility issue with desktop APIs.
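As a rough illustration of an early depth test, the toy model below moves the test in front of shading. It is a sketch under simplifying assumptions (a one-dimensional scanline, opaque spans given as hypothetical (x_start, x_end, depth) tuples) and does not model any vendor's implementation.

```python
import math

def early_depth_test_shaded(draws, width):
    """Count shader invocations when depth is tested before shading."""
    depth_buffer = [math.inf] * width
    shaded = 0
    for x_start, x_end, depth in draws:
        for x in range(x_start, x_end):
            if depth < depth_buffer[x]:  # test first
                depth_buffer[x] = depth
                shaded += 1              # shade only fragments that pass
    return shaded

back_to_front = [(0, 8, 3.0), (0, 8, 2.0), (0, 8, 1.0)]
front_to_back = list(reversed(back_to_front))

print(early_depth_test_shaded(front_to_back, 8))  # 8
print(early_depth_test_shaded(back_to_front, 8))  # 24
```

The final image is the same in both cases; only the amount of internal work changes, and in this model the saving depends on submission order.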

Final Thoughts: Efficiency Meets Performance

As we’ve seen in this article, the main differences between immediate-mode and tile-based deferred rendering GPUs are the timing of the visibility test, the storage location of the colour/depth data and the requirements placed on the L2 cache. In each case, tile-based deferred rendering GPUs were originally designed to prioritise system efficiency and reduce data movement throughout the chip.

However, the differences between the two rendering styles are not as extreme as many would expect. Modern immediate-mode renderers have adopted techniques like tiling and early depth tests to improve workload distribution and overall processing efficiency. Furthermore, Imagination’s GPU IP is flexible enough that customers in the desktop market can adjust it to align more closely with their requirements.

These architectural similarities make a high-performance, tile-based deferred rendering GPU a compelling choice for modern desktop systems. Whether for gaming, content creation or AI-enhanced applications, Imagination GPUs offer a future-ready alternative to traditional immediate-mode renderers.

For more information on the range of Imagination GPUs available for desktop, visit the Imagination website.