Vulkan: High efficiency on mobile

05 November 2015
Imagination Technologies

Welcome to the second article in our series of blogs on the Vulkan API! This time I’ll be describing why Vulkan is so important for mobile and embedded systems, with a particular focus on efficiency.

One of Vulkan’s top aims was to be a much more efficient API. OpenGL ES required a lot of effort on the part of the driver to get things working, which meant that the CPU could easily become the bottleneck in a graphics application.

GPUs waiting on the CPU?!

One of the most jarringly obvious effects of high CPU overhead in a graphics API is an app being bottlenecked by the CPU. It’s fairly well known that various platforms have a maximum draw call throughput*, and it’s not very high, particularly on mobile. We’re talking about programming a GPU, and yet instead of things like texture fillrate or triangles per second, one of the most common graphics bottlenecks is caused by an unrelated processor!

The first 12 seconds of our Gnome Horde demo show just how bad this can be – we’re pushing the same number of draw calls to Vulkan and OpenGL ES, with each object as an individual draw call (Vulkan is on the left, OpenGL ES on the right):

OpenGL ES is severely limited by the number of draw calls it can generate as it zooms out, and it really shows – it drops to about 6fps at the maximum distance!

Vulkan to the rescue

With Vulkan’s increased efficiency, an application can push out a lot more draws per frame – you’re more likely to be bottlenecked by some portion of the GPU, rather than the CPU. Well-written applications will batch more work per draw, avoiding the worst effects of this – but it’s a delicate balancing act, and not every app is well-written!
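To put some rough arithmetic behind that balancing act, here is a back-of-the-envelope model in Python. The object count and per-draw CPU costs are made-up numbers chosen purely for illustration; they are not measurements from the Gnome Horde demo or from any real driver, and the function names are my own.

```python
def cpu_bound_fps(draws_per_frame: int, cpu_us_per_draw: float) -> float:
    """Upper bound on frame rate if the CPU did nothing but issue draw calls."""
    frame_time_us = draws_per_frame * cpu_us_per_draw
    return 1_000_000 / frame_time_us


def draws_after_batching(objects: int, objects_per_draw: int) -> int:
    """Number of draw calls needed when several objects share one draw."""
    return -(-objects // objects_per_draw)  # ceiling division


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
OBJECTS = 10_000
HIGH_OVERHEAD_US = 10.0  # assumed CPU cost per draw with a high-overhead API
LOW_OVERHEAD_US = 1.0    # assumed CPU cost per draw with a low-overhead API

unbatched_high = cpu_bound_fps(OBJECTS, HIGH_OVERHEAD_US)
unbatched_low = cpu_bound_fps(OBJECTS, LOW_OVERHEAD_US)
batched_high = cpu_bound_fps(draws_after_batching(OBJECTS, 50), HIGH_OVERHEAD_US)

print(f"one draw per object, high overhead: {unbatched_high:.0f} fps")
print(f"one draw per object, low overhead:  {unbatched_low:.0f} fps")
print(f"50 objects per draw, high overhead: {batched_high:.0f} fps")
```

Either lowering the cost of each draw or issuing fewer draws lifts the CPU-side ceiling, which is why batching and a leaner API attack the same problem from two directions.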

That’s a performance problem, though, and this post is supposed to be about efficiency! It does illustrate an important point, however: even when an application doesn’t hit this bottleneck, each draw call is still doing unnecessary work – it certainly isn’t free. The Gnome Horde demo is just an attempt to visualize this in an obvious way, but the problem is pervasive in all OpenGL ES applications.

Before I continue, it’s worth pointing out that everything on a modern operating system (OS) uses graphics APIs in some way. Largely gone are the days of software rendered UIs – most of what you see on a screen has been drawn by a GPU via an API like OpenGL ES or DirectX. This in turn means that even the core OS is affected by these issues.

Thermal budgets on a System-on-Chip

On a desktop PC, the CPUs or GPUs are several meters from you in a sealed metal box – if one of those processors gets too hot, a fan will whir up, dissipating heat to the environment. If a mobile processor gets too hot it burns you – there’s typically only a thin casing between the die and anyone holding the device. It’s thus really important to keep things cool!

What causes a processor to generate heat is doing work. Every processor operation generates some amount of heat, and the more operations you do per second, the hotter it gets. Reducing this workload means less heat being generated, so the typical way to keep a chip cool on mobile is to limit the amount of work that can be done in a given period of time. If more can be done with less processing effort – all the better.

It’s also important to note here that mobile devices typically use a System-on-Chip (SoC), meaning all of the processors sit on the same die. The implication is that heat generated by one processor is indistinguishable from heat generated by any other – if the die gets too hot, the whole chip may have to go idle to compensate.

Vulkan requires less CPU work to instruct the GPU than previous APIs, which effectively means that the CPU can run cooler than it otherwise would have. Even the idle home screen of most operating systems is drawn by the GPU – so whenever your phone’s screen is on, a home screen that uses Vulkan to draw its widgets can avoid significant heat generation.

One nice side effect of this on a SoC platform is that running the CPU cooler actually allows the GPU to be pushed that much harder, which can result in improved GPU performance over what is achievable with more CPU-intensive APIs.

Power and battery life

Related to, but separate from, the thermal budget is battery life. Just as each processor operation generates heat, it also requires electrical current to run. Mobile devices are limited by two things related to power: how much power can be supplied to elements of the chip and, possibly more importantly, battery capacity.

A SoC only has so much space to lay power lines, and as more processors get crammed into ever-shrinking dies, space is definitely at a premium.

The battery is often one of the largest components of any mobile device – second only to the screen and casing. Battery life is a critical element of selling any mobile device, and key to a good application experience. Nobody would play Angry Birds if it ran for 5 seconds and then the battery ran out!

Gnome Horde CPU usage: Vulkan (left) vs OpenGL ES (right)

Vulkan’s increased efficiency can reduce the current drawn by mobile devices. As it becomes more widely adopted, and more applications and OS components shift to using Vulkan exclusively, this effect may become quite apparent, and the battery life of devices should trend upwards as a result.

How Vulkan achieves better efficiency

Vulkan is not a magic bullet. It’s entirely possible to port an application to Vulkan and have it be slower than using an earlier API! Vulkan simply has the potential for much better efficiency – but it requires applications to really consider their usage of the API, and understand what they’re doing. Porting something that’s been using OpenGL ES for years can be difficult, because the typical usage pattern for this API is different. A modern OpenGL ES driver hides a lot of the costs that would otherwise be incurred (e.g. object patching/caching).

However, if you’re willing to spend the time learning your way around a new API, and to really think about what you’re doing, there’s a lot of benefit to be had. Yet again I refer you to the Gnome Horde!

Validation and errors

The first, relatively simple, thing that Vulkan does to remove overhead is to perform almost no runtime error checking. If an application does something wrong, the driver will not catch it – the result may be anything, up to and including program termination.

This might seem harsh – but nobody really checks errors outside of development and debugging anyway. When you’re actually using a well-behaved application, validating every call is pure overhead that benefits nobody. Instead, Vulkan plans to make use of tooling and layers to catch application errors before they ever make it to a consumer device.
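The idea of moving checks out of the driver and into an optional layer can be sketched with a toy model. This is plain Python, not the Vulkan layer mechanism; the class names, the `draw` method and the "negative vertex count" rule are all invented for illustration.

```python
class Device:
    """Stand-in for a driver entry point that trusts its caller completely."""

    def __init__(self):
        self.draws = 0

    def draw(self, vertex_count: int) -> None:
        # No argument checking here: every call costs as little as possible.
        self.draws += 1


class ValidationLayer:
    """Optional wrapper with the same interface that checks calls first."""

    def __init__(self, next_layer: Device):
        self.next_layer = next_layer
        self.errors = []

    def draw(self, vertex_count: int) -> None:
        if vertex_count < 0:  # an invalid call in this toy model
            self.errors.append(f"draw: invalid vertex_count {vertex_count}")
        # Report the problem, then forward the call unchanged.
        self.next_layer.draw(vertex_count)


def create_device(enable_validation: bool):
    """Development builds opt in to the layer; shipping builds skip it."""
    device = Device()
    return ValidationLayer(device) if enable_validation else device


debug_device = create_device(enable_validation=True)
debug_device.draw(-3)       # caught and reported during development
release_device = create_device(enable_validation=False)
release_device.draw(36)     # goes straight to the "driver", no checks
print(debug_device.errors)
```

Because the layer sits in front of the device rather than inside it, the shipping path never pays for checks it does not need.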

Hazard tracking and synchronization

OpenGL ES performs a lot of implicit tracking of resource usage and synchronization for an application, with concessions for explicit management only made in more recent versions via fence sync objects, queries and memory barriers. Even with these explicit operations – a lot still goes on in the background, and it’s really very expensive for a driver to manage it. As OpenGL ES has to manage this generically for all applications, it’s often overly conservative (costing more) or requires complex heuristics to determine a better path.

Vulkan leaves it entirely up to the application developer to decide how things are executed, in what order they execute, and how resources are managed. Applications tend to know better how they plan to use resources and synchronize work, which gives the opportunity for a much lower overhead than a general-purpose driver can achieve.

Pipeline State Objects

Changing any piece of state can be a costly operation, made worse by the fact that many pieces of state may require modifying shader code. For instance, it’s no secret that since PowerVR GPUs support programmable blending, we have to patch blend state into the fragment shader. Other GPU vendors face similar problems with other bits of what is otherwise apparently fixed-function state. Even state that doesn’t affect the shader directly may need to be compiled or translated in other ways.

Vulkan recognizes that an application should be able to provide most of this information well in advance of draw time, baking it into Pipeline State Objects (PSOs) far outside of the main render loop. PSOs are responsible for possibly the most dramatic saving of CPU work during draw command generation, as all the validation, compilation and translation of this API state to GPU code is handled when the PSO is created rather than at draw time.
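A toy model shows why baking state up front pays off. This is a Python sketch under my own assumptions, not Vulkan code: `ToyDevice`, `PipelineDesc` and `render_frame` are hypothetical names, and "compilation" is reduced to incrementing a counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineDesc:
    """All the state that would otherwise be set piecemeal before a draw."""
    shader: str
    blend_enabled: bool
    depth_test: bool


class ToyDevice:
    def __init__(self):
        self.compile_count = 0

    def create_pipeline(self, desc: PipelineDesc):
        # The expensive part (validation, compilation, translation)
        # is modelled as happening here, once per pipeline.
        self.compile_count += 1
        return ("pipeline", desc)  # opaque handle


def render_frame(pipelines, draws):
    """Draw time only binds ready-made handles; nothing is compiled."""
    commands = []
    for pipeline_name, mesh in draws:
        commands.append(("bind", pipelines[pipeline_name]))
        commands.append(("draw", mesh))
    return commands


device = ToyDevice()
# Baked once, outside the render loop.
pipelines = {
    "opaque": device.create_pipeline(PipelineDesc("lit", False, True)),
    "glass": device.create_pipeline(PipelineDesc("lit", True, True)),
}

draws = [("opaque", "gnome"), ("glass", "window"), ("opaque", "rock")]
for _ in range(100):
    frame_commands = render_frame(pipelines, draws)

print(device.compile_count)
```

However many frames are rendered, the compile counter stays at the number of pipelines created before the loop started.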

Command Buffer reuse

Vulkan’s command submission model requires that draw calls are first generated into command buffers, and then submitted. This model is in contrast to something like OpenGL ES where a draw command is effectively executed immediately.

Command Buffers are dedicated objects in Vulkan, and are not discarded once they are submitted – allowing them to be re-used multiple times. Command generation remains the largest expense in Vulkan, so the best way to reduce this cost is in fact to avoid doing it altogether, by re-using already created command buffers.

Resources referenced by a command buffer are allowed to be modified when not in use by the GPU, allowing a large amount of dynamism in the scene. As a fairly basic example, our Library demo uses just two command buffers for the entire Library scene, which are never regenerated. Scene to scene, the camera transformation and fadeout values are simply updated via data stored in Uniform Buffers that are referenced by the command buffers. The only reason to have two command buffers is so that one can be rendered by the GPU, whilst the other’s Uniform Buffers are being updated by the CPU.
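The pattern described above can be sketched roughly as follows. This is a toy Python model rather than the Library demo's actual code: the class names, the `camera_angle` field and the `submit` function are assumptions made for illustration, and the synchronization a real application needs to know when the GPU has finished with a buffer is left out entirely.

```python
class UniformBuffer:
    """Per-frame data that the CPU may rewrite between submissions."""

    def __init__(self):
        self.camera_angle = 0.0


class CommandBuffer:
    """Recorded once; refers to its uniform buffer instead of copying it."""

    def __init__(self, uniforms: UniformBuffer, stats: dict):
        self.uniforms = uniforms
        self.commands = [("bind_uniforms", uniforms), ("draw_scene",)]
        stats["recordings"] += 1  # count how often commands are generated


def submit(command_buffer: CommandBuffer) -> float:
    """Stand-in for GPU execution: reads the uniforms at execution time."""
    return command_buffer.uniforms.camera_angle


stats = {"recordings": 0}
# Two command buffers, each with its own uniform buffer, recorded up front.
buffers = [CommandBuffer(UniformBuffer(), stats) for _ in range(2)]

rendered = []
for frame in range(6):
    # Alternate buffers so one can be in flight while the other is updated.
    command_buffer = buffers[frame % 2]
    command_buffer.uniforms.camera_angle = frame * 15.0  # data-only update
    rendered.append(submit(command_buffer))

print(stats["recordings"], rendered)
```

Six frames are rendered with six different camera angles, yet commands are only ever generated twice.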

As applications become more and more data driven, and more work can be specified on the GPU, almost all command buffers in an application will be potentially reusable a great number of times.

Multithreading

Multiple cores running at a lower frequency or with a smaller workload will generally run cooler and consume less power than doing all your work on a single core and maxing it out. By enabling much better multithreading, Vulkan lets applications spread their workload wider and take advantage of this – again resulting in a cooler chip that consumes less power. I’ll talk more about how multithreading works in my next post, “Vulkan: Scaling to multiple threads”, but it’s worth mentioning here as it does have an impact on efficiency.
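The reasoning behind this can be sketched with the standard approximation for dynamic power in CMOS logic, which is proportional to capacitance × voltage² × frequency. The Python model below additionally assumes that supply voltage scales linearly with frequency and that the workload splits perfectly across cores. Both are idealizations of mine: real chips have a minimum voltage, static leakage and imperfect scaling, so treat the numbers as a direction of travel rather than a prediction.

```python
def relative_power(cores: int, freq_fraction: float) -> float:
    """Dynamic power relative to one core at full frequency.

    Per-core power is taken as proportional to V^2 * f, with V assumed to
    scale linearly with f, so each core costs freq_fraction ** 3.
    """
    return cores * freq_fraction ** 3


def relative_throughput(cores: int, freq_fraction: float) -> float:
    """Work done per second, assuming a perfectly parallel workload."""
    return cores * freq_fraction


# Same total throughput, spread over more cores at lower clocks.
for cores, freq in [(1, 1.0), (2, 0.5), (4, 0.25)]:
    print(
        f"{cores} core(s) at {freq:.2f}x clock: "
        f"throughput {relative_throughput(cores, freq):.2f}, "
        f"power {relative_power(cores, freq):.4f}"
    )
```

Under these assumptions, two cores at half the clock do the same work as one core at full clock for a quarter of the dynamic power.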

GPU efficiency

This post only really talks about CPU efficiency, but Vulkan does in fact provide some GPU efficiency wins as well. The final post in this series, “Vulkan – Architecture positive: how the Vulkan API works for PowerVR GPUs”, will go into detail about various aspects of the API that give us much better performance and efficiency than previous generation APIs.

Conclusions

Vulkan is a highly efficient API, and should enable applications to draw far less power and generate less heat than with previous graphics APIs. Almost every application on a mobile device uses graphics in some way, even just to draw the UI on the screen, including the operating system itself in many cases – so this is potentially a huge win for mobile devices.

Since this does have the potential for such a broad impact, consumers can expect battery life to go up, or applications to do more with the increased headroom they now have. In the long term, this might help shift the balance of power sufficiently to subtly influence hardware design, including making things like IoT devices and wearables even more practical.

The only downside is that on those cold winter nights, your phone will no longer be quite such an effective hand warmer!


*The GPU also has a limit on the “number of draw calls”, which is really a limit on the number of state changes – but with OpenGL ES you’ll typically hit the CPU limit well before you reach this limit on the GPU.