Pipelined Data Masters in D-Series GPUs

16 August 2024
Eleanor Brash

Imagination’s PowerVR GPUs are driven by a firmware processor responsible for the high-level scheduling and prioritisation of workloads. It does this in combination with fixed-function units: the data masters. To allow concurrent processing of different types of jobs, PowerVR GPUs have one data master per job type, including geometry, 3D, compute and 2D (or data movement).

These data masters are responsible for the low-level running of these jobs, including setup work. Previous generations had single-tasking data masters: a data master executed one specific job at a time, and changing to the next job required the firmware processor to set that job up.

This approach meant that most of the setup work occurred when changing from one render to the next, which often left the data master idle while the firmware processor set up the next job and reprogrammed registers. This setup work may have required data access and other complex synchronisation tasks which, due to latency, could result in thousands of cycles of firmware work during which no work was scheduled for the specific data master. The result was often idle time or even power-gating of the GPU core, and thus lost performance and reduced scaling efficiency.

As PowerVR GPUs became ever faster, with more powerful SPUs, higher SPU counts and multicore, the rendering and processing performance of the GPU went up significantly. This meant that the time to process compute kernels and/or graphical renders fell (as we have a larger, faster GPU core), but we still had a single firmware processor, which meant the setup time stayed the same.

As an example, comparing AXT-16-512 to DXT-72-2304, our processing is theoretically 4.5x faster, but the firmware processing time stayed the same, so, without a change, it represented a larger proportion of the total time. This is illustrated below:

[Figure: screenshot illustrating the comparison above]
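The arithmetic behind this comparison can be sketched in a few lines of Python. The time values below are hypothetical, chosen only to make the proportions easy to read; they are not measurements of any PowerVR core. The only figure taken from the text above is the theoretical 4.5x processing ratio (72 / 16). The sketch assumes fully serialised setup, as in the previous generations described earlier.

```python
def firmware_share(gpu_time, firmware_time):
    """Fraction of the total per-job time spent on firmware setup,
    assuming setup and GPU processing run strictly one after the other."""
    return firmware_time / (gpu_time + firmware_time)


# Hypothetical per-job times in arbitrary units (assumptions, not measurements).
BASE_GPU_TIME = 90.0   # processing time on the smaller core
FIRMWARE_TIME = 10.0   # setup time, unchanged between the two cores

# Theoretical processing ratio from the text: 72 / 16 = 4.5.
PROCESSING_RATIO = 72 / 16

old_share = firmware_share(BASE_GPU_TIME, FIRMWARE_TIME)
new_share = firmware_share(BASE_GPU_TIME / PROCESSING_RATIO, FIRMWARE_TIME)

# End-to-end speedup once the fixed setup time is included.
effective_speedup = (BASE_GPU_TIME + FIRMWARE_TIME) / (
    BASE_GPU_TIME / PROCESSING_RATIO + FIRMWARE_TIME
)

print(f"firmware share on the smaller core: {old_share:.0%}")
print(f"firmware share on the larger core:  {new_share:.0%}")
print(f"effective speedup: {effective_speedup:.2f}x (theoretical {PROCESSING_RATIO}x)")
```

With these illustrative numbers the firmware share grows from a tenth to a third of the total time, and the end-to-end speedup falls well short of the theoretical 4.5x. This is the same effect described by Amdahl's law: a fixed serial portion limits the benefit of speeding up the rest.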

IMG D-Series GPUs address this setup-time issue by introducing pipelining to the data masters. Pipelining means that the firmware can set up (pipeline) the next job while a previous job is still processing within the GPU. Effectively, the firmware work is now overlapped with GPU work instead of running serialised between jobs. This approach enables 5% higher performance for the same level of core, as we avoid idle cycles and improve utilisation of the GPU processing hardware, which means a better return on investment.

The concept is illustrated below:

[Figure: screenshot illustrating the pipelining concept]
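The difference between serialised and overlapped setup can also be sketched as a small timeline model in Python. This is a deliberately simplified illustration, not a description of the actual firmware or hardware: it assumes a single data master, that the firmware may prepare at most one job ahead, and that it starts preparing the next job as soon as the current one begins running. The function names and job durations are hypothetical, and the sketch makes no attempt to reproduce the 5% figure quoted above.

```python
def serialised_total(jobs):
    """Total time when each job's setup runs while the data master is idle.

    jobs is a list of (setup_time, run_time) pairs.
    """
    return sum(setup + run for setup, run in jobs)


def pipelined_total(jobs):
    """Total time when setup of the next job overlaps the current job's run.

    Assumption: the firmware begins setting up job i+1 at the moment
    job i starts running, so it is at most one job ahead.
    """
    prev_run_start = 0.0
    prev_run_end = 0.0
    for setup, run in jobs:
        setup_done = prev_run_start + setup
        # The job starts once its setup is done and the data master is free.
        run_start = max(setup_done, prev_run_end)
        prev_run_start = run_start
        prev_run_end = run_start + run
    return prev_run_end


# Hypothetical workload: three jobs, each with 2 units of setup and 10 of GPU work.
JOBS = [(2.0, 10.0)] * 3
BUSY_TIME = sum(run for _, run in JOBS)

for name, total in (("serialised", serialised_total(JOBS)),
                    ("pipelined", pipelined_total(JOBS))):
    print(f"{name}: total {total}, data master idle {total - BUSY_TIME}")
```

In this model only the first job's setup remains exposed in the pipelined case; every later setup is hidden behind the previous job's GPU work, provided setup is shorter than the job it overlaps.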

The idle time in previous generations could also be seen using our PVRTune profiling tool:

[Figure: PVRTune screenshot]

Further details on changes made to the PowerVR architecture in IMG DXT can be found in the white paper Ray Tracing for the Masses.
