Watch: An Introduction to PVRTune (13 mins)
When you start analysing performance on real hardware, the difficulty is rarely a lack of data. The problem is understanding where time is actually being spent. Frame rate alone doesn’t tell you whether you are limited by CPU submission, GPU execution, or how work moves through the pipeline. PVRTune is designed to answer that question directly, by exposing how an application behaves on PowerVR hardware in real time.
Download PVRTune from the Imagination Developer Portal
PVRTune is Imagination’s profiling tool for graphics and compute workloads on PowerVR GPUs. It captures hardware counters and timing data from a running system and presents them in a form that maps closely to how the GPU executes work. The aim is not just to collect metrics, but to make it clear which stage of the system is responsible for the observed performance.
Within the wider PowerVR toolchain, PVRTune has a specific role. Other tools help with development, debugging, or API inspection. PVRTune comes in once correctness is no longer the issue and the question becomes one of efficiency: whether the application is making good use of the GPU, and where it is not. This hardware-level view lets you move beyond symptoms and get to root causes.
The tool itself is made up of three components. On the device, PVRPerfServer collects performance data, including GPU hardware counters and optional debug information from the driver or APIs. On the host, the PVRTune GUI manages connections, captures sessions, and provides the analysis interface. There is also PVRTuneScope, which can be integrated into an application to add markers or custom events, making it easier to correlate application-level behaviour with GPU execution.
This separation keeps the target lightweight while providing a flexible environment for analysis. You can capture data live from a running application, inspect it interactively, and save it for later comparison. That ability to move between live and offline analysis is particularly useful when tracking regressions or validating optimisation work over time.
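Saved captures make before-and-after comparisons possible. As an illustration only, the Python sketch below assumes each capture's counters have been exported to a small CSV summary with `counter` and `mean` columns. That format and the counter names are hypothetical, not PVRTune's documented export schema. The sketch flags counters whose mean rose by more than a tolerance, which only makes sense for cost-style metrics where lower is better (frame time, bandwidth).

```python
import csv
import io

# Assumed (hypothetical) export format: one row per counter, columns "counter,mean".


def load_summary(csv_text: str) -> dict[str, float]:
    """Parse a counter summary into {counter name: mean value}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["counter"]: float(row["mean"]) for row in reader}


def find_regressions(baseline: dict[str, float],
                     candidate: dict[str, float],
                     tolerance: float = 0.05) -> dict[str, float]:
    """Return counters whose mean grew by more than `tolerance` (a fraction).

    Assumes every counter is a cost where a higher value is worse.
    Counters missing from the candidate, or with a zero baseline, are skipped.
    """
    regressions = {}
    for name, base in baseline.items():
        if name not in candidate or base == 0:
            continue
        change = (candidate[name] - base) / base
        if change > tolerance:
            regressions[name] = change
    return regressions


if __name__ == "__main__":
    before = load_summary("counter,mean\nframe_time_ms,16.0\nmemory_read_mb_per_frame,120.0\n")
    after = load_summary("counter,mean\nframe_time_ms,18.0\nmemory_read_mb_per_frame,121.0\n")
    print(find_regressions(before, after))
```

With these sample values only `frame_time_ms` is reported, because its 12.5% increase exceeds the 5% tolerance while the bandwidth change does not.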
In practice, setup is minimal. The GUI can deploy and start PVRPerfServer automatically, and the host can connect to the target over USB, SSH, or the network, depending on the platform. In most cases, you can go from running an application to capturing performance data in a single step, which makes profiling part of the normal development loop rather than a separate activity.
Once a capture is running, the workflow centres on the timeline view. This shows GPU activity over time, organised by major pipeline stages, with performance counters and system metrics overlaid as needed. Rather than switching between different tools or views, you can correlate frame behaviour, GPU workload, and system state in one place.
Activity is grouped into timelines that represent different types of work: geometry processing, fragment processing, compute, and data movement. Each task appears as a block on these timelines, allowing you to follow how work flows through the GPU and to identify long-running or stalled operations. Because everything is time-aligned, you can move directly from a high-level symptom, such as a frame time spike, to the specific tasks and counters responsible for it.
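The step from a frame-time spike to the tasks behind it is essentially an interval-overlap query over time-aligned timelines. The Python sketch below illustrates that idea with a hypothetical `Task` record (timeline label, start and end in microseconds); it is not a PVRTune API, just a way to reason about what the timeline view shows. Tasks are ranked by how much of the spike window they occupy.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A hypothetical timeline block: which timeline it ran on, and when."""
    timeline: str   # e.g. "geometry", "fragment", "compute", "transfer"
    start_us: int
    end_us: int


def overlap_us(task: Task, window_start: int, window_end: int) -> int:
    """Microseconds of `task` that fall inside [window_start, window_end)."""
    return max(0, min(task.end_us, window_end) - max(task.start_us, window_start))


def tasks_in_window(tasks: list[Task], window_start: int, window_end: int) -> list[Task]:
    """Tasks overlapping the window, largest overlap first."""
    hits = [(overlap_us(t, window_start, window_end), t) for t in tasks]
    hits = [(o, t) for o, t in hits if o > 0]
    # Sort on the overlap only, so Task objects are never compared directly.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in hits]


if __name__ == "__main__":
    tasks = [
        Task("geometry", 15_000, 18_000),
        Task("fragment", 18_000, 39_000),
        Task("compute", 41_000, 42_000),
    ]
    # Suppose a slow frame spans 16 ms to 40 ms on the capture's time axis.
    for t in tasks_in_window(tasks, 16_000, 40_000):
        print(t.timeline, overlap_us(t, 16_000, 40_000))
```

Here the fragment task dominates the slow frame, which is the kind of conclusion the aligned timelines let you read off visually.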
The timelines are supported by a comprehensive set of hardware counters covering utilisation, processing load, and memory bandwidth. These allow performance issues to be measured rather than inferred, and provide the context needed to interpret what you see in the timeline.
This combination becomes particularly useful when identifying bottlenecks. In a CPU-bound case, GPU timelines often show gaps where the hardware is idle while CPU load remains high: the GPU is waiting for work rather than being fully utilised. In a vertex-bound workload, activity is concentrated in the geometry stage, with downstream stages underutilised because they are waiting on vertex processing. In fragment-bound scenarios, the renderer (fragment processing) stage runs continuously with little idle time, typically due to high pixel cost or complex shaders. These patterns allow you to categorise performance issues quickly and focus on the part of the pipeline that actually needs attention.
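The three patterns above can be written down as a rough decision rule. The Python sketch below is only a heuristic under stated assumptions: the per-frame utilisation fractions in `FrameStats` are hypothetical summaries you would derive yourself from the timelines and counters, and the thresholds are arbitrary illustrative values, not figures from PVRTune or Imagination. Real captures often need several counters examined together before drawing a conclusion.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Hypothetical per-frame utilisation fractions, each in [0, 1]."""
    cpu_load: float         # how busy the submitting CPU thread(s) were
    gpu_idle: float         # fraction of the frame with no GPU task running
    geometry_active: float  # fraction of the frame the geometry timeline was busy
    fragment_active: float  # fraction of the frame the renderer/fragment timeline was busy


def classify(s: FrameStats, busy: float = 0.85, idle: float = 0.25,
             starved: float = 0.5) -> str:
    """Map utilisation to the bottleneck patterns described in the text.

    Thresholds are illustrative assumptions, not vendor guidance.
    """
    # CPU-bound: gaps on the GPU timelines while the CPU stays busy.
    if s.gpu_idle >= idle and s.cpu_load >= busy:
        return "cpu-bound"
    # Fragment-bound: the renderer stage runs almost continuously.
    if s.fragment_active >= busy:
        return "fragment-bound"
    # Vertex-bound: geometry is saturated while fragment work is starved.
    if s.geometry_active >= busy and s.fragment_active < starved:
        return "vertex-bound"
    return "unclassified"


if __name__ == "__main__":
    print(classify(FrameStats(cpu_load=0.95, gpu_idle=0.40,
                              geometry_active=0.30, fragment_active=0.40)))
```

The CPU check comes first because an idle GPU makes the stage-level utilisation figures misleading: low geometry or fragment activity in that case reflects missing work, not cheap work.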
For more complex cases, the same approach scales by combining multiple counters and timeline views. Because PVRTune exposes a wide range of metrics, you can build up a detailed picture of how workloads interact with the GPU, rather than relying on a single indicator. This is where the tool is particularly effective: it reduces the need for guesswork and replaces it with a clear, time-correlated view of the system.
PVRTune is available in two configurations. The standard version provides the core profiling features and essential counters needed for most optimisation tasks. PVRTune Complete (available to licensees) extends this with deeper hardware visibility, including additional counters, shader-level analysis, and more detailed resource tracking, allowing more advanced investigations where required.
In day-to-day use, the benefit of PVRTune is straightforward. It shortens the cycle between identifying a performance issue and understanding its cause. Instead of working from assumptions, you can see how work is scheduled, where time is being spent, and how effectively the GPU is being used. That leads to more targeted optimisation work, and avoids spending time on changes that do not address the real bottleneck.