Trying out the new Vulkan graphics API on PowerVR GPUs

03 March 2015
Imagination Technologies

Vulkan™ is a next-generation, high-performance graphics and compute API developed by the Khronos Group. Previously known as glNext, Vulkan has been designed to address some of the shortcomings of the original OpenGL® API, which was introduced 22 years ago.

Here is a summary of Vulkan extracted from the official press release:

Ground-up redesign of the API: enables high-efficiency access to graphics and compute on modern GPUs
Explicit: the application has direct, predictable control over the operation of the GPU.

Introducing one of the first demos using the Vulkan API

Imagination is a promoting member of the Khronos Group and has been working on developing a proof-of-concept driver for Vulkan for our PowerVR Rogue GPUs. Our PowerVR demo team has also spent the last two months porting one of our new OpenGL ES 3.0 demos to the new API and today we are able to show you a snapshot of our work.

The Library demo was originally created using the OpenGL ES 3.0 API and we worked on porting it to the Vulkan API at the same time as the API was being designed. We needed to remove some of the effects compared to the OpenGL ES 3.0 version because of time constraints but the demo still maintains a lot of features implemented in the original app. Here is a summary of what you can see in the video below:

High-quality, physically-based shading
HDR (High dynamic range) rendering
20 unique 2K PVRTC textures
2 GiB of texture data compressed to 266 MiB using Imagination’s PVRTC texture compression standard
4 x MSAA (Multi-sample anti-aliasing)
16 x Anisotropic texture filtering
Physically-correct material parameters
Low CPU usage, very efficient GPU usage
Correct specular reflections on reflective materials
More than 250,000 triangles
Post processing effects: saturation, exposure and tone mapping

Please note that this is an alpha driver and performance is not representative of the final product.

Less CPU work

The new Vulkan interface is designed to be as close to the architecture of modern GPUs as possible. This means that both the code size and the amount of work done in user and kernel space by the Vulkan driver are very small, so the driver should be more efficient than its OpenGL ES counterpart.

For example, there are no glUniform*() equivalent entry points in Vulkan; instead, writing to GPU memory is the only way to pass data to shaders.

When you call glUniform*(), the OpenGL ES driver typically needs to allocate a driver-managed buffer and copy the data into it; managing that buffer incurs CPU overhead. In Vulkan, you simply map the memory and write to that location directly.
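As a rough illustration of that idea, here is a minimal sketch in plain C rather than real Vulkan. The names MappedBuffer, map_buffer(), unmap_buffer(), upload_uniforms() and SceneUniforms are hypothetical, and an ordinary array stands in for GPU-visible memory. It only shows the shape of the approach: the application writes shader constants straight into mapped memory instead of handing them to the driver to copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-scene constants that a shader would read. */
typedef struct {
    float exposure;
    float saturation;
} SceneUniforms;

/* Stand-in for a GPU-visible allocation (an assumption for this sketch). */
typedef struct {
    unsigned char storage[256];
    int mapped;
} MappedBuffer;

/* Simulates mapping: returns a CPU pointer into the buffer's storage. */
static void *map_buffer(MappedBuffer *buf)
{
    assert(!buf->mapped);
    buf->mapped = 1;
    return buf->storage;
}

/* Simulates unmapping: the CPU pointer must not be used afterwards. */
static void unmap_buffer(MappedBuffer *buf)
{
    assert(buf->mapped);
    buf->mapped = 0;
}

/* The application writes the data itself; no driver-side staging copy. */
static void upload_uniforms(MappedBuffer *buf, const SceneUniforms *u)
{
    void *dst = map_buffer(buf);
    memcpy(dst, u, sizeof *u);
    unmap_buffer(buf);
}
```

In this sketch the only CPU cost is the memcpy() itself; there is no hidden allocation for the application to guess at.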

Here is a chart showing the difference in CPU usage between Vulkan and OpenGL ES 3.0 for our Library demo.

CPU usage: Vulkan vs OpenGL ES

Leaner, more explicit driver

Because the API is designed around the hardware, the number of instructions in the front-end portion of the driver is significantly reduced compared to OpenGL ES. This reduction in complexity enables developers to issue more draw calls, while hardware vendors can achieve better stability and quicker driver bring-up.

Even though the driver used here is an alpha (i.e. pre-release) version, we expect Vulkan drivers to become very stable eventually because there is less code to go wrong.

In Vulkan, high-level management of the GPU (e.g. resource lifetimes) must be performed by the application. The driver is almost completely hands-off and does what the application tells it to. Whilst this results in greater complexity in the application, that cost should be offset by no longer needing to work around the driver (e.g. shader pre-warming in OpenGL ES).

If your application is using an engine to do the rendering, the engine will probably already be managing this anyway, and Vulkan can provide an almost free speedup.

Vulkan's design resembles that of modern command buffer-based APIs, so this work should be easier if the application or framework has already been ported to that type of programming interface.

More consistent performance

People might say that the main advantage to this API is that less CPU-relevant work needs to be done when submitting a draw command – and this is true.

However, the main benefit I see is that the API will make programming 3D graphics much more predictable. For example, when you call glBlendFunc() in OpenGL ES, different things can happen depending on the underlying graphics architecture running that code.

Some GPUs could delay setting up the blending until the first time the bound shader is used; others might not. This makes achieving consistent performance across different GPU vendors very difficult.

Vulkan makes solving this problem easier because the entry points to the API are designed to allow the driver to do work in consistent places.

When you fill in a struct describing some state using Vulkan, you know that there is no driver work going on; the code is all application code. The API is designed to fit all GPU vendors’ architectures as well as it can, so there are fewer opportunities for unknown performance hiccups.

The glBlendFunc() problem goes away because the blend function is specified in a struct during pipeline setup. The driver work happens early, when the function to create the pipeline is called, instead of at some point during rendering, where it could cause a stutter.
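The pattern can be sketched in plain C as follows. This is not the Vulkan API: BlendStateDesc, Pipeline, create_pipeline() and draw_with() are hypothetical names, and the "hardware encoding" is invented purely for illustration. The point is that filling in the struct is ordinary application code, the expensive translation happens exactly once at creation time, and drawing only consumes the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical blend factors, loosely modelled on glBlendFunc() arguments. */
typedef enum {
    BLEND_ZERO,
    BLEND_ONE,
    BLEND_SRC_ALPHA,
    BLEND_ONE_MINUS_SRC_ALPHA
} BlendFactor;

/* Plain application-side description: filling it in does no driver work. */
typedef struct {
    BlendFactor src;
    BlendFactor dst;
} BlendStateDesc;

/* Stand-in for the driver's baked pipeline object. */
typedef struct {
    unsigned hw_blend_word; /* invented hardware encoding for this sketch */
    int baked;
} Pipeline;

static int g_bake_count = 0; /* counts the expensive "driver" step */

/* All translation to hardware state happens here, once, up front. */
static Pipeline create_pipeline(const BlendStateDesc *desc)
{
    Pipeline p;
    p.hw_blend_word = ((unsigned)desc->src << 4) | (unsigned)desc->dst;
    p.baked = 1;
    g_bake_count++;
    return p;
}

/* Drawing only reads the pre-baked state; nothing is set up lazily. */
static unsigned draw_with(const Pipeline *p)
{
    assert(p->baked);
    return p->hw_blend_word;
}
```

However many times draw_with() is called, g_bake_count stays at one per pipeline, which is the predictability the text describes.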

In fact, much of the Vulkan API is aimed at letting you specify everything up-front where possible. For example, you can record a list of render commands and state-setting commands into a command buffer and replay it every frame with just one call. The driver has more opportunities to optimise this use case because it knows it can do more work when creating the command buffer, rather than when executing it.
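A toy model of record-once, replay-many might look like this in plain C. CommandBuffer, record() and submit() are hypothetical names for this sketch, not Vulkan entry points, and "executing" a draw simply adds up triangle counts.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CMD_BIND_PIPELINE, CMD_BIND_TEXTURE, CMD_DRAW } CmdType;

typedef struct {
    CmdType type;
    int arg; /* pipeline id, texture id, or triangle count */
} Command;

#define MAX_COMMANDS 64

typedef struct {
    Command cmds[MAX_COMMANDS];
    size_t count;
} CommandBuffer;

/* Recording appends to the list; nothing is executed yet. */
static void record(CommandBuffer *cb, CmdType type, int arg)
{
    assert(cb->count < MAX_COMMANDS);
    cb->cmds[cb->count].type = type;
    cb->cmds[cb->count].arg = arg;
    cb->count++;
}

/* One call replays the whole list; returns the triangles "drawn". */
static long submit(const CommandBuffer *cb)
{
    long triangles = 0;
    for (size_t i = 0; i < cb->count; i++) {
        if (cb->cmds[i].type == CMD_DRAW)
            triangles += cb->cmds[i].arg;
    }
    return triangles;
}
```

The application would call record() during setup and then submit() once per frame; the buffer itself is never rebuilt unless the scene changes.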

Another consequence of the explicit nature of Vulkan is that there is no resource renaming (or ghosting) behind the application’s back – multi-buffering needs to be performed explicitly. Multi-buffering is the process whereby a graphics driver may have a number of frames being processed at the same time.

The data attached to those frames (e.g. uniform data and attached textures) needs to be kept around until the frame it is attached to has finished; this will need to be performed by the application. On the plus side, the data that you know will not be modified between frames (e.g. brightness or contrast) can be specified as const for possible optimisations.
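The bookkeeping this implies can be sketched as a small ring of per-frame slots. The code below is a plain C simulation under stated assumptions: FrameRing, begin_frame(), submit_frame() and gpu_finished() are hypothetical names, the depth of three frames is arbitrary, and a simple flag stands in for whatever completion signal a real application would wait on.

```c
#include <assert.h>
#include <stddef.h>

#define FRAMES_IN_FLIGHT 3 /* assumed depth; applications choose their own */

typedef struct {
    float brightness; /* per-frame uniform data */
    int gpu_busy;     /* 1 while the simulated GPU still reads this slot */
} FrameSlot;

typedef struct {
    FrameSlot slots[FRAMES_IN_FLIGHT];
    unsigned frame_index;
} FrameRing;

/* Returns the slot for the next frame, or NULL if it is still in use. */
static FrameSlot *begin_frame(FrameRing *ring)
{
    FrameSlot *slot = &ring->slots[ring->frame_index % FRAMES_IN_FLIGHT];
    if (slot->gpu_busy)
        return NULL; /* a real application would wait for completion here */
    return slot;
}

/* Writes the frame's data and hands the slot over to the "GPU". */
static void submit_frame(FrameRing *ring, FrameSlot *slot, float brightness)
{
    slot->brightness = brightness;
    slot->gpu_busy = 1;
    ring->frame_index++;
}

/* Simulates the GPU signalling that it has finished with a slot. */
static void gpu_finished(FrameRing *ring, unsigned slot_index)
{
    assert(slot_index < FRAMES_IN_FLIGHT);
    ring->slots[slot_index].gpu_busy = 0;
}
```

Because the application owns the slots, data for a frame is never overwritten while that frame is still in flight, and nothing is duplicated behind its back.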

PowerVR GPUs are first-class citizens

A key feature added to Vulkan is the render pass, which redefines how well an application can control our hardware, and reduces the amount of work we have to do implicitly without the application necessarily knowing about it.

A render pass consists of framebuffer state (other than actual render target addresses), and how render targets should be loaded in and out of the GPU at the start and end of each render. This structure is the key object that allows tiled architectures like PowerVR to run at extremely high efficiency.

In OpenGL ES, several things can cause implicit flushes of tile buffers to main memory during rendering, a bandwidth-heavy operation that is usually unnecessary. Our OpenGL ES drivers spend a lot of effort trying to work out what the application is doing in order to avoid these flushes, and to avoid having to flush all render targets to main memory. In Vulkan, the only time such a flush can happen is between render passes, which makes it obvious to both the application and the driver. More importantly, the render pass tells the GPU exactly what the application wants to do with each render target.
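To show why declaring this intent matters on a tiled architecture, here is a small plain C model. The enum and struct names (LoadOp, StoreOp, AttachmentDesc) and the function render_pass_traffic() are hypothetical and are not taken from the Vulkan headers; the model simply counts the bytes moved between tile memory and main memory for the load and store behaviour each render target declares.

```c
#include <assert.h>
#include <stddef.h>

/* What to do with a render target's contents at the start of a pass. */
typedef enum { LOAD_OP_LOAD, LOAD_OP_CLEAR, LOAD_OP_DONT_CARE } LoadOp;

/* What to do with a render target's contents at the end of a pass. */
typedef enum { STORE_OP_STORE, STORE_OP_DONT_CARE } StoreOp;

typedef struct {
    size_t size_bytes;
    LoadOp load_op;
    StoreOp store_op;
} AttachmentDesc;

/* Main-memory traffic implied by the declared operations for one pass. */
static size_t render_pass_traffic(const AttachmentDesc *atts, size_t count)
{
    size_t bytes = 0;
    for (size_t i = 0; i < count; i++) {
        if (atts[i].load_op == LOAD_OP_LOAD)
            bytes += atts[i].size_bytes; /* read into tile memory */
        if (atts[i].store_op == STORE_OP_STORE)
            bytes += atts[i].size_bytes; /* written back at the end */
    }
    return bytes;
}
```

In this model, a colour target that is cleared and stored plus a depth target that is cleared and then discarded costs only one write of the colour target, whereas loading and storing both costs two transfers per target.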

Render commands can be created in parallel

Command buffers can be created on a different thread to the thread they are submitted on. This means rendering commands could be created on all cores of a CPU.

There is no extra work or locking required to do this – a feature that was not previously possible with OpenGL ES. This may be of use to games which need to recreate their render commands a lot (e.g. Minecraft).

More intuitive design

Vulkan gives you the advantage of knowing exactly the state that you are setting. Take for example the glActiveTexture() function in OpenGL ES: it is not obvious whether this function will change the state globally for all shaders or maybe change the state just for the current shader program.

In Vulkan, this is explicitly defined: you know that when you bind your resources, it is changing the state for the bound command buffer because that is the first parameter to the function.

A consistent idiom in Vulkan is to have the first parameter to all entry points be the representation of the state that you are going to change with the function call. For example:

vkCmdBindDescriptorSet(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS,
textureDescriptorSet[0], 0);

vkQueueSubmit(graphicsQueue, 1, &cmdBuffer, 0, 0, fence);

vkMapMemory(staticUniformBufferMemory, 0, (void **)&data);

// ...

vkUnmapMemory(staticUniformBufferMemory);

Explicit memory management

When you call glTexStorage2D() in OpenGL, the driver has to allocate memory for a two-dimensional or one-dimensional array texture. The function and the memory allocation process represent a black box.

In Vulkan however, the memory allocation is done by the application. This means that the application knows more about what type of memory it is using and more importantly how much memory it is using, which should be useful for applications that are memory-bound. This is in contrast to receiving an “out of memory” error in OpenGL ES and needing to reduce resource usage by an unknown value.

Explicit memory management in Vulkan allows applications to use custom allocation strategies. For example, an application can allocate all of its memory up-front and avoid any allocations during rendering.
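One such strategy is a linear (bump) allocator over a single block reserved at start-up. The sketch below is plain C with malloc() standing in for a device memory allocation; Arena, arena_init(), arena_alloc() and arena_destroy() are hypothetical names, and alignment is applied to offsets within the block, which is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base; /* one block reserved up front */
    size_t capacity;
    size_t used;
} Arena;

/* Reserves the whole pool once; returns 1 on success, 0 on failure. */
static int arena_init(Arena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Sub-allocates from the pool; align must be a power of two.
 * Returns NULL when the pool is exhausted, leaving it unchanged. */
static void *arena_alloc(Arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t offset = (a->used + align - 1) & ~(align - 1);
    if (offset > a->capacity || size > a->capacity - offset)
        return NULL;
    a->used = offset + size;
    return a->base + offset;
}

/* Releases the pool in one go, e.g. at level unload or shutdown. */
static void arena_destroy(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

With this approach the application always knows exactly how much memory it has committed (the used field), and running out is an ordinary NULL return that it can handle at load time rather than an error in the middle of a frame.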

Extra details

Imagination is working to give you more information on the Vulkan API as it becomes more mature and will release example source code in the near future.

Editor’s Note

* PowerVR Rogue GPUs are based on published Khronos specifications, and are expected to pass the Khronos Conformance Testing Process. Previous generation PowerVR GPU cores have already achieved OpenGL conformance. Current conformance status can be found at www.khronos.org/conformance.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

OpenGL is a registered trademark and the OpenGL ES logo is a trademark of Silicon Graphics Inc. used by permission by Khronos.