It can be tough trying to figure out the best way to optimise your applications during development. Luckily, we’re here to help – this is the first of an ongoing series of blog posts highlighting useful tips and tricks from our brand-new documentation website. Today, we’re looking at effectively sorting objects and geometry on PowerVR hardware from our revised and updated PowerVR Performance Recommendations. This document is packed with useful information that will help you squeeze the most out of graphics applications running on PowerVR hardware.
As a rule, for opaque objects on PowerVR hardware, it is best not to sort draw calls by the depth of the object in the scene. This is because in deferred rendering systems, like PowerVR’s TBDR (tile-based deferred rendering) architecture, all the draw calls which affect a particular frame buffer are captured up front, before the render begins. The GPU therefore has a complete picture of all the geometry in the scene and can work out which fragments are hidden and can be discarded. This is different from immediate mode rendering (IMR), where each draw call is processed as it is submitted, so fragments can be shaded and textured even if something closer to the camera later covers them.
PowerVR has dedicated hardware for Hidden Surface Removal (HSR), which tests whether fragments are occluded based on their depth so that only the visible ones go on to be shaded. This cuts down the amount of overdraw in a scene, helping to improve your application’s performance. It’s clear, then, that there’s little point in sorting opaque geometry by depth, because the PowerVR GPU does this work for you. Doing it yourself just wastes valuable resources!
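To make the overdraw argument concrete, here is a deliberately simplified Python model. It is an illustration only, not a description of how the PowerVR hardware is actually built, and the `Fragment` type and both counting functions are hypothetical. It counts how many fragments get shaded by an immediate-mode renderer with an early depth test versus a deferred renderer that resolves visibility before shading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A fragment produced by rasterising an opaque draw (hypothetical model)."""
    draw_id: int
    pixel: tuple
    depth: float  # smaller values are closer to the camera


def imr_shaded_count(fragments):
    """Immediate-mode model with an early depth test: fragments are processed
    in submission order and shaded whenever they pass the test at that moment."""
    depth_buffer = {}
    shaded = 0
    for f in fragments:
        if f.depth < depth_buffer.get(f.pixel, float("inf")):
            depth_buffer[f.pixel] = f.depth
            shaded += 1  # the work is done now, even if a closer fragment arrives later
    return shaded


def deferred_shaded_count(fragments):
    """Deferred model: visibility for the whole frame is resolved first, then
    only the nearest fragment at each pixel is shaded."""
    nearest = {}
    for f in fragments:
        if f.depth < nearest.get(f.pixel, float("inf")):
            nearest[f.pixel] = f.depth
    return len(nearest)


# Three opaque draws covering the same 2x2 pixel block at different depths.
pixels = [(x, y) for x in range(2) for y in range(2)]
back_to_front = [Fragment(d, p, z) for d, z in enumerate([0.9, 0.5, 0.1]) for p in pixels]
front_to_back = list(reversed(back_to_front))

print(imr_shaded_count(back_to_front), imr_shaded_count(front_to_back))            # 12 4
print(deferred_shaded_count(back_to_front), deferred_shaded_count(front_to_back))  # 4 4
```

In this model the immediate-mode count depends heavily on submission order, which is why depth sorting helps on IMR GPUs, while the deferred count is the same whichever order the draws arrive in.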
HSR also removes the need for techniques like a depth pre-pass, where an additional pass is used to fill a depth buffer with the depth value of the nearest opaque object at each pixel. On the next pass, these depth values are used to determine which fragments are visible and to reject the occluded ones. That is exactly what HSR does anyway, so on PowerVR a depth pre-pass only adds extra clock cycles and memory bandwidth. To reiterate: let HSR do its job!
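A simplified Python model makes the cost difference visible. The functions below are hypothetical and only count work: a fragment is a `(pixel, depth)` pair, and the pre-pass version assumes the second pass uses an equal depth test against the depth buffer written by the first pass.

```python
def depth_prepass_cost(fragments):
    """Hypothetical model of a two-pass depth pre-pass.
    fragments: list of (pixel, depth) pairs in submission order.
    Returns (fragments_processed, fragments_shaded)."""
    # Pass 1: depth only. Every fragment is rasterised and depth-tested,
    # but no colour shading is done.
    depth_buffer = {}
    for pixel, depth in fragments:
        depth_buffer[pixel] = min(depth, depth_buffer.get(pixel, float("inf")))
    # Pass 2: all geometry is submitted again; with an equal depth test only
    # fragments matching the stored nearest depth are shaded.
    shaded = sum(1 for pixel, depth in fragments if depth == depth_buffer[pixel])
    processed = 2 * len(fragments)  # every fragment goes through both passes
    return processed, shaded


def hsr_cost(fragments):
    """Deferred model: one pass over the geometry, visibility resolved
    before shading, then one shaded fragment per covered pixel."""
    nearest = {}
    for pixel, depth in fragments:
        nearest[pixel] = min(depth, nearest.get(pixel, float("inf")))
    return len(fragments), len(nearest)


# Three layers of opaque geometry over a 4-pixel row.
frags = [((x, 0), z) for z in (0.9, 0.5, 0.1) for x in range(4)]
print(depth_prepass_cost(frags))  # (24, 4)
print(hsr_cost(frags))            # (12, 4)
```

Both approaches shade the same number of fragments, but the pre-pass processes all the geometry twice to get there.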
PowerVR Hidden Surface Removal means you don’t have to do a depth pre-pass, but sorting is still necessary for alpha-blended objects.
The story for sorting objects and geometry on PowerVR is slightly different for objects with transparency. Fragments which are covered by partially transparent objects can still affect the final rendered image, because the colours of the foreground and background fragments need to be blended to produce the final fragment colour. This means these fragments can’t simply be discarded by the HSR hardware, eliminating one of the key benefits of deferred rendering and likely resulting in some amount of overdraw. This is why our general advice is to keep the number of alpha-blended objects in a scene to a minimum.
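The blending point can be shown with the standard "over" operator, the per-channel result of `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)`: result = src × alpha + dst × (1 − alpha). The small Python helper below is illustrative; it shows that the background colour feeds into the result, and that blending the same two layers in a different order gives a different colour, which is why blended objects still need sorting.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Standard 'over' blend per channel: src * alpha + dst * (1 - alpha)."""
    return tuple(s * src_alpha + d * (1.0 - src_alpha) for s, d in zip(src_rgb, dst_rgb))


black = (0.0, 0.0, 0.0)
red = (1.0, 0.0, 0.0)
green = (0.0, 1.0, 0.0)
blue = (0.0, 0.0, 1.0)

# The same half-transparent green surface over two different backgrounds:
print(blend_over(green, 0.5, red))   # (0.5, 0.5, 0.0)
print(blend_over(green, 0.5, blue))  # (0.0, 0.5, 0.5)

# Two half-transparent layers over black, blended in opposite orders:
red_then_blue = blend_over(blue, 0.5, blend_over(red, 0.5, black))
blue_then_red = blend_over(red, 0.5, blend_over(blue, 0.5, black))
print(red_then_blue)  # (0.25, 0.0, 0.5)
print(blue_then_red)  # (0.5, 0.0, 0.25)
```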
If a scene must contain opaque, alpha-tested, and alpha-blended objects, then the application should sort and render these objects in the following order:
1. Opaque
2. Alpha-tested
3. Alpha-blended (sorted back to front)
This ensures all objects will appear as expected.
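One way to express this ordering is a single sort key. The Python sketch below is a minimal illustration under stated assumptions: the `BlendMode` and `DrawCall` types and the `view_depth` and `shader_id` fields are hypothetical, the category order assumed is opaque, then alpha-tested, then alpha-blended, and only the alpha-blended draws are sorted by depth (back to front). Opaque and alpha-tested draws are grouped by shader instead of depth, leaving visibility to HSR.

```python
from dataclasses import dataclass
from enum import IntEnum


class BlendMode(IntEnum):
    # Enum values give the submission order: opaque, alpha-tested, alpha-blended.
    OPAQUE = 0
    ALPHA_TESTED = 1
    ALPHA_BLENDED = 2


@dataclass
class DrawCall:
    name: str
    mode: BlendMode
    view_depth: float  # distance from the camera; larger means further away
    shader_id: int


def sort_key(draw):
    if draw.mode is BlendMode.ALPHA_BLENDED:
        # Blending needs back-to-front order, so further objects come first.
        return (draw.mode, -draw.view_depth, 0)
    # Opaque and alpha-tested draws are not depth-sorted; group by shader instead.
    return (draw.mode, 0.0, draw.shader_id)


def order_draws(draws):
    return sorted(draws, key=sort_key)


scene = [
    DrawCall("glass_near", BlendMode.ALPHA_BLENDED, 2.0, 7),
    DrawCall("wall", BlendMode.OPAQUE, 10.0, 1),
    DrawCall("fence", BlendMode.ALPHA_TESTED, 5.0, 3),
    DrawCall("glass_far", BlendMode.ALPHA_BLENDED, 8.0, 7),
    DrawCall("floor", BlendMode.OPAQUE, 1.0, 2),
]
print([d.name for d in order_draws(scene)])
# ['wall', 'floor', 'fence', 'glass_far', 'glass_near']
```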
While sorting objects and geometry by depth usually isn’t a good idea with PowerVR hardware, it is really important to sort objects with render state in mind. This means grouping draw calls that share resources such as materials or shaders. Doing so reduces CPU workload by minimising the number of state changes the driver has to make during rendering.
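The effect of state sorting can be estimated by counting how often the bound state changes between consecutive draws. In this Python sketch each draw is a hypothetical `(shader, material)` pair and "binds" simply counts each time the value differs from the previous draw; real drivers track many more kinds of state than this.

```python
def count_binds(draws):
    """Return (shader_binds, material_binds) for a list of (shader, material) draws.
    A bind is counted whenever the value differs from the previous draw's."""
    shader_binds = material_binds = 0
    prev_shader = prev_state = None
    for shader, material in draws:
        if shader != prev_shader:
            shader_binds += 1
            prev_shader = shader
        if (shader, material) != prev_state:
            material_binds += 1
            prev_state = (shader, material)
    return shader_binds, material_binds


def sort_by_state(draws):
    # Shader first (usually the more expensive change), then material.
    return sorted(draws, key=lambda draw: (draw[0], draw[1]))


frame = [("lit", "brick"), ("unlit", "sky"), ("lit", "wood"), ("lit", "brick"), ("unlit", "sky")]
print(count_binds(frame))                 # (4, 5)
print(count_binds(sort_by_state(frame)))  # (2, 3)
```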
Finally, when using OpenGL ES™ you can get significant performance benefits by sorting draw calls by frame buffer object (FBO). An FBO can have several attachments, in the form of textures, which can be rendered into. We recommend submitting all the draw calls for a particular FBO before moving on to the next one. Switching back and forth between FBOs forces a partially rendered FBO to be stored to system memory and later loaded back again; rendering the FBOs one after another eliminates this extra bandwidth.
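A minimal Python sketch of grouping draws by FBO is shown below. The `(fbo, draw)` tuples and both functions are hypothetical. The grouping is stable (draws for the same FBO keep their relative order) and FBOs are ordered by first use. It assumes there are no cross-FBO dependencies, for example no draw samples another FBO's attachments before that FBO's draws are complete; where such dependencies exist, reordering would change the rendered result.

```python
def group_by_fbo(draws):
    """Stable grouping of (fbo, draw) tuples by FBO, in order of each FBO's first use.
    Assumes no cross-FBO dependencies between the reordered draws."""
    first_use = {}
    for index, (fbo, _) in enumerate(draws):
        first_use.setdefault(fbo, index)
    # sorted() is stable, so draws within one FBO keep their submission order.
    return sorted(draws, key=lambda draw: first_use[draw[0]])


def count_fbo_switches(draws):
    """Count how many times the target FBO changes (including the first bind)."""
    switches = 0
    current = None
    for fbo, _ in draws:
        if fbo != current:
            switches += 1
            current = fbo
    return switches


frame = [("reflection", "a"), ("minimap", "b"), ("reflection", "c"), ("minimap", "d"), ("reflection", "e")]
grouped = group_by_fbo(frame)
print([name for _, name in grouped])  # ['a', 'c', 'e', 'b', 'd']
print(count_fbo_switches(frame), count_fbo_switches(grouped))  # 5 2
```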
Sometimes sorting the render order isn’t quite enough. When an application renders a large number of vertices, as in automotive displays (millions of triangles!), performance can really suffer if vertex data is stored sub-optimally. You need to ensure the vertex buffers store the data in a cache-efficient way. This basically means reordering the vertices to improve spatial locality, so the GPU doesn’t have to jump around memory quite so much when fetching them.
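One common way to improve spatial locality is to reorder the vertex buffer so vertices appear in the order the index buffer first references them, then remap the indices to match. The Python sketch below shows that technique only; it is an assumption for illustration, not a description of what PVRGeoPOD does internally, and the function name is hypothetical. Dedicated tools typically also reorder the triangles themselves for the post-transform vertex cache.

```python
def reorder_vertices_for_fetch(vertices, indices):
    """Reorder a vertex buffer into first-use order and remap the index buffer.
    Triangles submitted close together then read neighbouring vertex memory."""
    remap = {}  # old vertex index -> new vertex index
    new_vertices = []
    for old in indices:
        if old not in remap:
            remap[old] = len(new_vertices)
            new_vertices.append(vertices[old])
    # Note: vertices never referenced by the index buffer are dropped.
    new_indices = [remap[old] for old in indices]
    return new_vertices, new_indices


vertices = ["v0", "v1", "v2", "v3", "v4"]
indices = [4, 2, 0, 2, 4, 3]  # two triangles
new_vertices, new_indices = reorder_vertices_for_fetch(vertices, indices)
print(new_vertices)  # ['v4', 'v2', 'v0', 'v3']
print(new_indices)   # [0, 1, 2, 1, 0, 3]
```

The geometry is unchanged: looking up each new index in the new buffer yields the same vertex sequence as the original index buffer did.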
Luckily for you, sorting vertex data is really simple using PVRGeoPOD. This tool can be set to automatically sort mesh data when exporting a model to the POD file format. If you want to work with individual meshes, the PVRGeoPOD GUI is simple and easy to use, and it can be integrated with popular 3D graphics software such as Autodesk 3ds Max and Blender. Or, if you want to churn through many meshes, there is also a command-line application which can be used for batch processing.
For more PowerVR performance recommendations and other information, please visit our website at docs.imgtec.com. It is regularly updated with new documents and features including Getting Started tutorials for Vulkan and OpenGL ES, optimisations and recommendations for PowerVR hardware, and guides to new graphics techniques.
Please do feel free to leave feedback through our forum.