Reducing memory bandwidth with PVRIC

Written by Robin Britton | Jul 2, 2018

Thanks to increases in screen resolution and ever more complex render pipelines, games and other apps are becoming increasingly bandwidth-intensive, requiring larger amounts of data to be copied to and from memory. Devices are now commonly expected to drive 2K screen resolutions without a hitch; high-end games can have huge geometric complexity, and rendering pipelines typically involve multiple intermediate render targets before a final image is presented on screen. Even casual games might apply full-screen post-processing effects that require framebuffer data to be sent back and forth to off-chip memory. Memory accesses are power-hungry operations, and more bandwidth consumption means more power consumption, which is especially problematic in the embedded space where power budgets are tight.

PowerVR GPUs address this problem with PVR3C Triple Compression, a ‘triple-threat solution’ if you will, consisting of texture compression (including PVRTC and ASTC), geometry compression (PVRGC), and the subject of this article – image compression (PVRIC).

PVR3C compression technologies for PowerVR GPUs

PowerVR Image Compression (PVRIC)

As previously mentioned, one of the largest strains on memory bandwidth in modern real-time graphics applications is the set of intermediate render targets that can be required to produce a high-quality final image. An obvious example is the creation of cube maps, typically used for reflections. This involves rendering a scene in six directions from a fixed point, the result of which forms a cube map texture, which is then sampled in a subsequent pass to approximate reflections on objects in the scene. Other examples include resolution scaling, rendering mini-maps or other scene viewpoints, rendering planar reflection maps, not to mention endless screen-space and post-processing effects such as separable blurs, SSAO, depth of field, tone-mapping, and so on.

The six faces of a cube map used to approximate reflections on a car. The sky can be seen reflected here in the car window.
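
To make the pattern concrete, here's a minimal OpenGL ES 3.0 sketch of rendering a scene into the six faces of a cube map via a framebuffer object. The face size and the render_scene_from() helper are illustrative assumptions rather than code from any particular app; the point is simply that each face is an intermediate render target that gets written to, and later read back from, memory.

```c
/* Minimal sketch: render the scene six times into the faces of a cube map,
 * which a later pass can sample to approximate reflections. Each face is an
 * intermediate render target, written out to memory and read back later.
 * render_scene_from() and CUBE_SIZE are hypothetical placeholders. */
#include <GLES3/gl3.h>

#define CUBE_SIZE 512

extern void render_scene_from(const float eye[3], GLenum face); /* app-specific */

void render_reflection_cubemap(GLuint *out_tex, const float centre[3])
{
    GLuint tex, fbo, depth;

    /* Allocate the six RGBA8 faces of the cube map. */
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_CUBE_MAP, tex);
    for (int face = 0; face < 6; ++face)
        glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, 0, GL_RGBA8,
                     CUBE_SIZE, CUBE_SIZE, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    /* A shared depth buffer for the per-face passes. */
    glGenRenderbuffers(1, &depth);
    glBindRenderbuffer(GL_RENDERBUFFER, depth);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT16, CUBE_SIZE, CUBE_SIZE);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, depth);

    /* Render the scene once per face, each time targeting a different face. */
    for (int face = 0; face < 6; ++face) {
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, tex, 0);
        glViewport(0, 0, CUBE_SIZE, CUBE_SIZE);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        render_scene_from(centre, GL_TEXTURE_CUBE_MAP_POSITIVE_X + face);
    }

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    *out_tex = tex;
}
```

Every one of those face writes, and the later reads when the cube map is sampled, is exactly the kind of traffic PVRIC compresses and decompresses transparently, with no changes needed in application code.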

 

PVRIC mitigates the cost of this onslaught on memory bandwidth with a highly efficient, lossless compression scheme that typically results in a 50% reduction in image size (depending on a number of factors). The aforementioned render targets are compressed before being written out of the GPU, and then decompressed when read back in from memory. Because the compression is lossless, a pixel-perfect copy of the original image is reconstructed from the decompressed data, meaning no reduction in image quality.
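
As a rough, back-of-the-envelope illustration of what a roughly 2:1 reduction means in practice (the resolution, pass count and frame rate below are assumptions chosen for the example, not measurements):

```c
/* Back-of-the-envelope bandwidth estimate. The resolution, bytes per pixel,
 * number of render-target passes and frame rate are illustrative assumptions,
 * not measured values from the apps discussed in this article. */
#include <stdio.h>

int main(void)
{
    const double width = 2560, height = 1440;   /* a "2K"-class display        */
    const double bytes_per_pixel = 4;           /* RGBA8                       */
    const double passes = 3;                    /* a few full-screen RT writes/reads */
    const double fps = 60;

    double uncompressed = width * height * bytes_per_pixel * passes * fps;
    double with_pvric   = uncompressed * 0.5;   /* ~50% typical, lossless      */

    printf("uncompressed image traffic: %.0f MB/s\n", uncompressed / 1e6);
    printf("with PVRIC (~2:1):          %.0f MB/s\n", with_pvric   / 1e6);
    return 0;
}
```

Under these assumptions the image traffic drops from roughly 2.7 GB/s to about 1.3 GB/s, and every byte saved is a memory access that never has to happen.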

This scheme covers interactions with the GPU, but as a further benefit it’s also possible for PVRIC to be integrated into the SoC-level display pipeline, allowing for the final rendered image to be compressed before being written to memory, and then decompressed in the display controller. This gives even greater overall bandwidth savings.

In many mobile games and apps, textures will be compressed using well-known formats such as some flavour of ETC, or our own PVRTC, but developers may choose to leave some textures uncompressed (for example, font or UI textures, which need to remain crisp and artifact-free when scaled). An additional benefit of the PVRIC scheme is that it enables the same lossless compression to be applied to any uncompressed textures as they're uploaded to the GPU (provided the texture uses one of the many supported formats). Depending on the app, this can lead to even more significant bandwidth reductions.
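
For developers, nothing changes at the API level. A plain, uncompressed texture upload like the sketch below (the atlas name and parameters are just for illustration) is a candidate for this lossless compression on hardware that supports it, provided the format is one the scheme handles:

```c
/* Sketch: uploading an uncompressed RGBA8 UI/font atlas with OpenGL ES.
 * No API changes are needed for PVRIC -- on hardware that supports it, the
 * driver can hold the texture in its lossless compressed form transparently
 * (subject to the format being supported on the target GPU). */
#include <GLES3/gl3.h>

GLuint upload_ui_atlas(const void *pixels, int width, int height)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}
```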

App Bandwidth Analysis

We ran API traces of a number of well-known apps on a Synaptics BG5CT board, featuring our PowerVR Series8XE GE8310 GPU, which includes the latest iteration of PVRIC technology – PVRIC3. We're looking at the performance benefits of the compression scheme, specifically bandwidth reduction. The side-by-side video below shows some of these app traces running, with bandwidth graphs overlaid:

The table below shows the results gathered from these traces. Note that on this particular device, PVRIC is not integrated into the SoC's display pipeline; if it were, we would see even greater savings from compressing the final framebuffer image, as described previously. However, because we're running Android, framebuffer compression is still applied to the final renders from these workloads as they are written to memory and read back by the GPU for Android's SurfaceFlinger. The extra SoC display-pipeline benefit would then come from compressing SurfaceFlinger's final composition, if the integration were present.


Now, what do these results mean exactly? The numbers can be a bit misleading. We can see that there was an overall bandwidth reduction in all cases, but some savings are much greater than others. As discussed, PVRIC covers things like renders-to-texture and uncompressed textures, but these account for only a portion of the total bandwidth. Geometry, shader uniforms and so on also contribute to the total bandwidth cost but are unaffected by PVRIC, so the share of the total that PVRIC can actually reduce is smaller than the headline figure suggests. So while the system-wide figures are interesting at a high level, we want to isolate the bandwidth numbers affected by PVRIC. Using our internal profiling tools enables us to do this more accurately (although we can't eliminate everything – compressed textures, for example).

Here we are isolating just the bandwidth used during the renderer tasks being run on the GPU, and ignoring the geometry handling of the tiler tasks. To make this easier, overlapping of these tasks has been disabled on the test platform. We’re also able to identify and ignore tasks from other processes. Now we need to analyse why some apps performed the way they did. This requires a bit more knowledge about how the apps work at the graphics API level – enter PVRTrace.

Minecraft

Examining Minecraft with PVRTrace, we found that the overall bandwidth saving was low (2.42%), but isolating the image/texture bandwidth showed a bigger saving (17.76%). Minecraft uses fully uncompressed textures, so PVRIC comes into play, although there are very few of them compared to the amount of geometry in a typical scene (~134K triangles here). Not to mention that many of the textures are very small. So PVRIC is doing what it can, but with Minecraft, there isn’t much to work with. Remember, with display pipeline integration we would see an additional benefit on the final framebuffer being sent to the display controller.

Angry Birds 2

With Angry Birds 2, the ratio of geometry (16K triangles) to texture content in the bandwidth budget is much more balanced, so we see the benefits of PVRIC more clearly, with a system-wide saving of 43%, and an isolated saving of 56%. The game uses a mixture of compressed and uncompressed textures of various formats, so a reasonable win for PVRIC there. Additionally, the entire scene is rendered to an intermediate render target, before a final blit to the default framebuffer (a technique used by numerous games, for resolution scaling or applying post-processes). These two elements combined are a great showcase for PVRIC.
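
For reference, the render-to-intermediate-target-then-blit pattern looks roughly like the OpenGL ES 3.0 sketch below; the framebuffer setup and the draw_scene() call are hypothetical placeholders, not code from Angry Birds 2 itself.

```c
/* Sketch of rendering the scene into an intermediate target and then blitting
 * it to the default framebuffer (OpenGL ES 3.0). scene_fbo is assumed to be a
 * framebuffer object with a scene_w x scene_h colour attachment, possibly
 * smaller than the screen for resolution scaling. */
#include <GLES3/gl3.h>

extern void draw_scene(void); /* app-specific placeholder */

void render_frame(GLuint scene_fbo, int scene_w, int scene_h,
                  int screen_w, int screen_h)
{
    /* 1. Render the whole scene into the intermediate target. */
    glBindFramebuffer(GL_FRAMEBUFFER, scene_fbo);
    glViewport(0, 0, scene_w, scene_h);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    draw_scene();

    /* 2. Blit (and upscale if needed) into the default framebuffer. The write
     *    in step 1 and the read here both go through memory, so both benefit
     *    from PVRIC's lossless compression. */
    glBindFramebuffer(GL_READ_FRAMEBUFFER, scene_fbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
    glBlitFramebuffer(0, 0, scene_w, scene_h,
                      0, 0, screen_w, screen_h,
                      GL_COLOR_BUFFER_BIT, GL_LINEAR);
}
```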

Real Racing 3

Real Racing 3 is another high-geometry workload (160K triangles in our test frame), although textures/surfaces represent a reasonable chunk of the considerable bandwidth budget as well. Texture compression (PVRTC on PowerVR) is used extensively by this app, which is great, but there are still a number of uncompressed textures for PVRIC to sink its teeth into. There’s also the matter of a set of six 512×512 render-to-textures for a cube map, and another full-screen render to texture before the final pass of overlaying some UI elements. The final percentage may not seem significant, but this is actually a great showcase of the combined benefits of PVRIC and PVRTC working well together – PVRTC is used for the bulk of the textures, and PVRIC is doing its part with what’s left.

As seen with these workloads, whether or not image data makes up the bulk of an app's bandwidth budget, applying PVRIC has a significant impact, greatly reducing image bandwidth.

Power Analysis

To see more clearly how PVRIC and reduced bandwidth impact power consumption, we modified the platform to allow for power analysis and hooked up a data acquisition device to gather power-usage data. The graphs below show our findings.

Over the same five-second clip of Angry Birds 2, we consistently saw lower power usage by the device’s memory when PVRIC was enabled. Taking the delta between the two data sets and smoothing it a little, we see up to an 18% reduction in power drawn by memory. This is a significant saving, which on battery-powered devices such as mobile phones could extend the time before another charge is needed. Of course, the precise effect on power and battery life depends on many factors and will vary per device.

Conclusion

To recap, PVRIC is a key component of the PVR3C compression strategy, capable of compression ratios up to – and sometimes exceeding – 2:1 for applicable bandwidth. This can have a material effect on the power drawn by device memory. With our texture and geometry compression schemes covering the other bases, it's clear that PVR3C is a great all-round bandwidth-efficiency strategy, leading to better overall system efficiency and enabling us to provide a fully comprehensive low-power solution.