- 03 March 2026
- Ed Plowman
For decades, semiconductor progress followed a familiar playbook: shrink the node, pack in more logic, raise the clock, and performance would follow. That model held remarkably well, arguably for longer than it should have.
As the industry moves below 2nm, GPU design is running into a hard physical reality. The limiting factor is no longer how much logic we can fit on a die. It’s how much heat we can safely remove. Below 2nm, power, not area, becomes the defining constraint, and for edge‑class GPUs, that shift fundamentally changes how future architectures must be designed. This is a necessary structural transition that will shape GPU IP, SoC integration, and architectural trade‑offs for the next decade.
Sub‑2nm and the Wall of Thermal Density
At advanced nodes, the industry’s challenge is no longer transistor count; it is thermal density. As feature sizes approach the ~10nm gate length regime required for sub‑2nm processes, device physics becomes unforgiving.
Nanosheet GAAFETs, now being introduced by major foundries, improve electrostatic control and help manage leakage at these dimensions. But they do not eliminate the core issue: packing more active devices into smaller areas concentrates heat. Simulations and early research consistently show rising hotspot severity as logic density increases, particularly in compute‑heavy blocks such as GPU ALUs.
The consequence is clear. Aggressively shrinking compute blocks to chase area efficiency drives up local heat density, forcing supply voltage down to maintain thermal stability. Once voltage drops, frequency follows, and with it traditional performance scaling stalls.
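The interaction between shrink, heat density, and voltage can be sketched with the standard dynamic-power relation, P ≈ α·C·V²·f. The numbers below are illustrative assumptions chosen for the sketch, not foundry data:

```python
# Back-of-envelope: how an area shrink raises power density, and how far
# voltage must fall to compensate. All figures are illustrative.

def dynamic_power(c_eff, v, f, activity=0.2):
    """Dynamic switching power: P = alpha * C * V^2 * f."""
    return activity * c_eff * v**2 * f

# Baseline compute block (hypothetical figures)
area_mm2 = 4.0          # block area (mm^2)
c_eff = 2e-9            # effective switched capacitance (F)
v, f = 0.75, 2.0e9      # supply (V), clock (Hz)

p0 = dynamic_power(c_eff, v, f)
density0 = p0 / area_mm2

# Shrink the block to 60% of its area; assume switched capacitance only
# falls to 80% (wires scale worse than devices), same V and f.
area_shrunk = area_mm2 * 0.6
p1 = dynamic_power(c_eff * 0.8, v, f)
density1 = p1 / area_shrunk

print(f"baseline density: {density0:.3f} W/mm^2")
print(f"shrunk density:   {density1:.3f} W/mm^2")

# To restore the original power density at the same frequency, voltage
# must drop by the square root of the density ratio (P scales with V^2).
v_required = v * (density0 / density1) ** 0.5
print(f"voltage needed to restore density: {v_required:.3f} V")
```

Even with these mild assumptions, power density rises after the shrink and the only knob left at fixed frequency is supply voltage, which is exactly the chain of events described above.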
Voltage, Frequency, and the End of Easy Scaling
One of the most significant implications of sub‑2nm design is the collapse of the historical voltage–frequency curve. As devices push toward the angstrom era, maintaining reliability and gate integrity requires operating at lower voltages.
Foundry roadmaps increasingly reflect this reality. Future nodes are optimised for power reduction rather than raw frequency uplift, signalling a shift in how performance gains will be extracted. In practical terms, this means:
- Clock speeds will no longer scale linearly with node transitions
- Performance gains must come from parallelism and efficiency, not frequency
- Power becomes the limiting factor long before area does
Area, in effect, begins to solve itself. Transistors continue to shrink, but pushing for maximum performance through higher clocks becomes thermally unsustainable.
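The trade-off in the list above can be made concrete with the same simple dynamic-power model, together with the rough approximation that achievable frequency scales linearly with supply voltage. All constants here are illustrative assumptions, not measured silicon data:

```python
# Two ways to double throughput, compared under P = alpha*C*V^2*f with
# the rough approximation f proportional to V. Illustrative numbers only.

ALPHA_C = 4e-10  # activity factor * effective capacitance (F), hypothetical

def power(v, f, n_units=1):
    """Total dynamic power of n identical compute units."""
    return n_units * ALPHA_C * v**2 * f

v0, f0 = 0.70, 1.5e9          # baseline operating point
p_base = power(v0, f0)

# Option A: double frequency on one unit. With f ~ V, this needs ~2x the
# voltage, so power grows ~8x (2x from f, 4x from V^2).
p_freq = power(2 * v0, 2 * f0)

# Option B: two units at the baseline point. Same doubled throughput,
# but power only doubles.
p_par = power(v0, f0, n_units=2)

print(f"baseline:        {p_base:.3f} W")
print(f"2x frequency:    {p_freq:.3f} W  ({p_freq / p_base:.0f}x)")
print(f"2x parallelism:  {p_par:.3f} W  ({p_par / p_base:.0f}x)")
```

The cubic penalty for chasing frequency, versus the linear cost of adding parallel units, is the arithmetic behind the shift from frequency scaling to parallelism.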
Why Power Dominates in Edge GPUs
For edge devices (automotive, industrial, embedded AI) the implications are even more pronounced. Unlike data center GPUs, edge SoCs operate under tight thermal envelopes, often without active cooling, and must meet stringent reliability requirements across wide temperature ranges.
In this environment, performance per watt is the primary metric that matters. Power is the first‑order constraint; area is secondary, provided the silicon footprint remains commercially viable.
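A fanless thermal envelope translates directly into a ceiling on the operating point. Under the same P = α·C·V²·f model with f roughly proportional to V, power grows with V³, so a fixed budget pins down the highest sustainable voltage and frequency. The constants below are hypothetical, chosen only to illustrate the calculation:

```python
# Given a passive-cooling power budget, find the highest sustainable
# operating point under P = alpha*C*V^2*f with f roughly proportional
# to V. All constants are illustrative assumptions, not product data.

ALPHA_C = 4e-10      # activity factor * effective capacitance (F)
K_F_PER_V = 2.2e9    # Hz per volt: f = K * V (rough linear model)
BUDGET_W = 0.5       # assumed fanless edge budget for this block

# With f = K*V, power becomes alpha*C*K*V^3, so:
#   V_max = (budget / (alpha*C*K))^(1/3)
v_max = (BUDGET_W / (ALPHA_C * K_F_PER_V)) ** (1 / 3)
f_max = K_F_PER_V * v_max

print(f"sustainable V: {v_max:.3f} V")
print(f"sustainable f: {f_max / 1e9:.2f} GHz")
```

The point of the sketch is that the budget, not the transistor, sets the ceiling: improving performance within it means lowering α·C per operation or adding parallelism, not raising f.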
This aligns with broader semiconductor research. Across nanosheet and post‑FinFET studies, the consistent conclusion is that future performance scaling depends on energy‑efficient switching and thermal distribution, not maximal logic density. As the industry looks beyond 2nm toward angstrom‑class nodes, architectural success will be defined by how intelligently power and heat are managed.
What This Means for GPU Architecture
The shift from area‑driven to power‑driven design has direct architectural consequences. Future GPUs must prioritise parallelism over peak frequency, predictable thermal behaviour over dense compute packing, and architectural efficiency over brute‑force scaling. The focus will be on achieving sustainable performance within the realities of advanced silicon. It is why edge GPUs cannot simply inherit architectural assumptions from datacentre accelerators: the constraints are different, and so must be the design philosophy.
What About Chiplets?
These constraints naturally raise questions about chiplet‑based designs. If thermal limits prevent building a one‑size‑fits‑all GPU below 2nm, does disaggregation become inevitable?
In some cases, yes. Chiplets allow different IP blocks to be built at different nodes (e.g. compute at more advanced processes, control logic at more mature ones), balancing power, cost, and thermal behaviour. For platforms targeting the post‑2030 timeframe, this approach is increasingly attractive.
However, chiplets are not a universal solution. For edge SoCs, the value depends on whether the added complexity delivers net gains in system‑level power efficiency, latency, and cost. The key point is that thermal management, not density, is the driver, whether within a monolithic die or across chiplets.
Power Is the New Frontier
The next decade of edge GPU design will not be defined by how small transistors can be made, but by how effectively power and heat are managed at the architectural level. As the industry moves deeper into the sub‑2nm era, area becomes a secondary consideration. Power is the real frontier.
This is why Imagination’s GPU architecture has long focused on performance per watt, predictable thermals, and scalable parallelism: principles that become not just advantageous but essential as silicon scaling enters its next phase. To learn more about Imagination’s power‑efficient GPU IP and how it addresses the realities of advanced nodes, get in touch to book a meeting with the team.