Source: Market News

(Source: 数据GO)

A Time Scaling Theory for Multi-Layer Electronic Systems

Tingbo He Huawei

Abstract

For six decades, Moore's geometric scaling drove progress in semiconductors. That industry compact no longer holds: returns from pure dimensional shrinking have flattened, leading-edge design budgets exceed one billion dollars per chip, and cost per transistor at the most advanced nodes is no longer falling. This perspective argues for a successor scaling principle, τ scaling, that adopts time itself, rather than transistor area, as the primary metric of progress, applying a single characteristic time constant τ as the unifying optimization target across twelve orders of magnitude, from a switching transistor to a data-center workload. Two production-scale demonstrations are presented. On a mobile SoC, LogicFolding (a methodology that partitions digital, analog, and memory circuits across vertically stacked active tiers) delivers a 55% step-wise increase in transistor density and a 41% power-efficiency gain at a fixed device node. On AI systems, a co-designed stack comprising the memory-semantic Unified Bus fabric, near-packaged Hi-ONE optical I/O, and edge-to-surface 3D Folding projects more than 100× growth in hardware integration by 2035. The deeper claim is methodological: τ scaling is the first scaling principle since Dennard to establish a shared optimization target across the entire computing stack.

Lead

Since the mid-1960s, the semiconductor industry has measured progress in nanometers. Every eighteen months, transistors shrank, frequencies rose, and the cost per logic gate fell. Moore's Law functioned as both an empirical observation and an industry compact upon which the entire computing stack was built. That industry compact no longer holds. Beyond the 7 nm node, geometric scaling no longer delivers its historical dividends. Lithography tooling is approaching the physical limits of patterning, EUV depreciation dominates wafer cost, and the per-transistor price curve has flattened and, in some cases, reversed. For organizations whose access to the most advanced lithography is constrained, the constraint became binding earlier and bears down more severely.

The central question for the industry has therefore changed. It is no longer "how much further can the transistor shrink?" It is "what should be scaled, and against what objective?"

The case for τ scaling is developed below as both a scientific methodology and an industrial roadmap, drawing on lessons from 381 chips brought to volume production between May 2020 and May 2026.

1. The End of the Geometric Era

For most of its history, the semiconductor industry has had one job: make the transistor smaller. Gordon Moore's 1965 observation, that transistor density doubles approximately every two years, was complemented a decade later by Robert Dennard's scaling theory, which established that proportional shrinking of voltage and dimensions could maintain a constant electric field. Together, geometric scaling and Dennard scaling delivered exponential improvements in performance per watt and performance per dollar for nearly five decades.

This arrangement unraveled in two stages. Around 2005, Dennard scaling broke first: voltage ceased to scale proportionally with feature size, and the dark-silicon era began. Geometric scaling persisted longer, sustained by FinFET and subsequently gate-all-around (GAA) device architectures. Beyond 7 nm, however, returns from pure dimensional scaling have flattened. The reasons are now well documented: velocity saturation reduces the dependence of intrinsic delay on channel length from quadratic to linear; the parasitic resistance and capacitance of local interconnects increasingly dominate the standard-cell delay budget; and mask costs, EUV depreciation, and design-rule complexity have driven leading-edge chip design budgets past one billion dollars per chip at the 2 nm node.

The economic consequences are equally inescapable. Cost per transistor has flattened at advanced nodes and, at the leading edge, is now rising. The industry compact that sustained the last fifty years (more transistors at lower cost every generation) no longer holds.

For Huawei Semiconductor, this transition arrived with an additional constraint: restricted access to the most advanced lithography tooling. Assuming that another node would resolve the problem was no longer tenable. Six years ago, the geometric roadmap plateaued, forcing a more fundamental question, one that, in retrospect, the entire industry will eventually have to confront.

2. Time, Not Space: The Real Currency of Moore's Era

Reduced to its essential effect on the end user, Moore's Law was never fundamentally about geometry. Smaller transistors improved system performance because they switched faster. Denser interconnects improved performance because signals traversed shorter distances. Higher integration improved performance because data crossed fewer boundaries. What each generation delivered, in essence, was a reduction in time: picosecond to nanosecond at the device, nanosecond to microsecond at the chip, microsecond to second at the system. Spatial scaling served merely as the instrument for compressing time.

Once this is recognized, an obvious reframing presents itself. Time itself should be adopted as the primary metric. A characteristic time constant τ can be defined at every layer of the stack (transistor, circuit, chip, and system) and its reduction treated as the unifying optimization target. Geometric scaling then becomes one technique among many for reducing τ, rather than the only one.

This principle is called τ scaling, and is proposed here as the successor to geometric Moore scaling as the guiding principle of semiconductor evolution. Formally, τ is treated as a layered construct that decomposes as

$$\tau = f(\tau_{\text{transistor}},\ \tau_{\text{circuit}},\ \tau_{\text{chip}},\ \tau_{\text{system}})$$

where $\tau_{\text{transistor}}$, $\tau_{\text{circuit}}$, $\tau_{\text{chip}}$, and $\tau_{\text{system}}$ represent the time constants at the transistor, circuit, chip, and system layers, respectively. Each layer's τ is composed from the layers beneath it together with the organizational and communication overheads introduced at that layer. The working space of τ spans approximately twelve orders of magnitude in time (picoseconds to seconds) and a comparable range in space (nanometers to kilometers). At each layer, distinct mechanisms are available for reducing τ:

· Transistor: intrinsic switching delay, addressed through mobility enhancement, strain engineering, high-k/metal gate, and GAA architectures, and, increasingly, through reduction of the parasitic R and C of local interconnects, whose delay contribution now exceeds the intrinsic transit time by several factors.

· Circuit: RC propagation delay along signal paths, addressed through lower-resistivity conductors, low-k dielectrics, and, most consequentially, through reduction of wire length via vertical integration.

· Chip: compute and memory-access latency, addressed through architectural choices, pipeline depth, memory hierarchy, and on-chip fabrics.

· System: end-to-end message and synchronization time, addressed through interconnect topology, protocol stack, and fabric design.

A useful generational rule emerges from this layered formulation:

where the scaling factor α is application-specific rather than universal. Production experience to date indicates α ≈ 1.3× per year for power-constrained mobile devices, ≈ 1.5× per year for safety-critical autonomous systems, and up to 10× per year for AI workloads, where throughput translates directly into economic value.
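To make the per-year factors concrete, the following is a minimal sketch of how they compound. It assumes the generational rule has the simple form τ(n) = τ(0) / αⁿ, which is an assumption made here for illustration rather than a form stated above; the function name `tau_after` and the five-year horizon are likewise illustrative.

```python
def tau_after(tau0: float, alpha: float, years: int) -> float:
    """Characteristic time after `years`, ASSUMING tau shrinks by a factor alpha each year."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return tau0 / alpha ** years


# Per-year factors quoted in the text; the 5-year horizon is an arbitrary choice.
for label, alpha in [("mobile", 1.3), ("autonomous", 1.5), ("AI (upper bound)", 10.0)]:
    speedup = 1.0 / tau_after(1.0, alpha, 5)
    print(f"{label}: {speedup:,.2f}x cumulative reduction in tau over 5 years")
```

Under this assumed form, the gap between application classes widens quickly, because the factors compound geometrically rather than add.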

What renders τ a useful primary metric, rather than a relabeling of existing ones, is that it is the same metric across the entire stack. Frequency, latency, bandwidth, and throughput are all governed by τ at their respective layers. A process technologist, a circuit designer, and a system architect can debate the same quantity in identical units. τ is the language that enables end-to-end stack co-optimization, and the era of independent optimization at each layer, with timing emerging as a residual, has concluded.

3. LogicFolding: A Mobile-SoC Proof Point

The first production-scale test of τ scaling was conducted in mobile. A smartphone SoC is the unusual case in which one chip constitutes the entire system. Multi-socket parallelism is not available; no thousand-node fabric can mask a slow link. All performance delivered to the user originates from a single die, under a few-watt power envelope, against thermal limits set by

After 2020, when access to leading-edge nodes was restricted, the operative question became: with the node fixed, how can generation-over-generation improvements continue to be delivered on a single die?

The answer that emerged is called LogicFolding.

Definition. LogicFolding is a design methodology that partitions digital, analog, and memory circuits across vertically stacked active tiers to jointly optimize performance, power, and area following the time scaling principle.

Digital circuits divide into combinational logic (the Boolean network between registers) and sequential logic (the flip-flops that hold state). The performance ceiling of a digital system is set by the critical-path delay between adjacent flip-flop stages, which in turn is dominated by interconnect RC and gate count along that path. Conventional optimization places gates in a plane and routes wires through a metal stack above; the longer the wire, the greater the parasitic RC, and the slower the critical path.

LogicFolding abandons the planar assumption. Critical-path gates are distributed across two (and eventually more) vertically stacked active tiers, connected through ultra-fine-pitch hybrid bonding. From the circuit designer's perspective, the two tiers behave as a single continuous fabric, with cells distributed across the wafer boundary as if it were an additional metal layer. Signal wires become substantially shorter, parasitic RC decreases sharply, clock skew tightens, and the chip operates at a higher clock frequency at the same device node.
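The link between shorter wires and lower RC can be sketched with the standard distributed-RC estimate, in which the 50% delay of an unbuffered wire grows with the square of its length (approximately 0.38·r·c·L²). The per-micron resistance and capacitance below are hypothetical values chosen only for illustration, and the 1 mm route is likewise an assumption; the 30% shortening mirrors the wire-length reduction reported for a representative core.

```python
def wire_delay_s(r_per_um: float, c_per_um: float, length_um: float) -> float:
    """Approximate 50% delay of an unbuffered distributed RC line: 0.38 * r * c * L^2."""
    return 0.38 * r_per_um * c_per_um * length_um ** 2


# Hypothetical wire parameters (not from the article): 2 ohm/um and 0.2 fF/um.
R_PER_UM = 2.0
C_PER_UM = 0.2e-15

planar = wire_delay_s(R_PER_UM, C_PER_UM, 1000.0)  # illustrative 1 mm planar route
folded = wire_delay_s(R_PER_UM, C_PER_UM, 700.0)   # same net, 30% shorter after folding

print(f"planar: {planar * 1e12:.1f} ps")
print(f"folded: {folded * 1e12:.1f} ps")
print(f"delay ratio: {folded / planar:.2f}")
```

Because the dependence is quadratic, a 30% shorter unbuffered wire carries roughly half the RC delay. Real long routes are usually broken up with repeaters, which makes delay closer to linear in length, so the quadratic figure is best read as an upper bound on the benefit for a single wire segment.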

To help LogicFolding deliver these gains, it is advantageous to keep the gear ratio between hybrid-bonding pitch and top-metal pitch comparatively low: roughly below 3 in practice, with lower ratios generally better. With today's top-metal pitch around 720 nm, this translates into a hybrid-bonding pitch below 2 μm, and ideally to a gear ratio of approximately 1, at which the bird-cage routing overhead at the bonding interface effectively vanishes. Achieving this pitch, together with the required overlay accuracy (<0.5 μm), TSV scaling (CD and KOZ sub-1.5 μm, pitch sub-6 μm), and yield (~100% with smart redundancy), required a multi-year process-development effort across the supplier and partner ecosystem.
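The gear-ratio arithmetic can be checked directly. The sketch below takes the 720 nm top-metal pitch and the ratio limit of 3 from the paragraph above, and the 1.5 μm bonding pitch from the Kirin 2026 figures reported in this article; the function names are illustrative.

```python
TOP_METAL_PITCH_NM = 720.0  # top-metal pitch quoted in the text


def gear_ratio(bond_pitch_nm: float, top_metal_pitch_nm: float = TOP_METAL_PITCH_NM) -> float:
    """Hybrid-bonding pitch divided by top-metal pitch."""
    return bond_pitch_nm / top_metal_pitch_nm


def max_bond_pitch_nm(max_ratio: float, top_metal_pitch_nm: float = TOP_METAL_PITCH_NM) -> float:
    """Largest bonding pitch that still satisfies a given gear-ratio limit."""
    return max_ratio * top_metal_pitch_nm


print(f"pitch limit at ratio 3: {max_bond_pitch_nm(3.0) / 1000:.2f} um")
print(f"gear ratio at 1.5 um pitch: {gear_ratio(1500.0):.2f}")
print(f"pitch needed for ratio 1: {max_bond_pitch_nm(1.0):.0f} nm")
```

A ratio limit of 3 corresponds to a bonding pitch of about 2.16 μm, consistent with the "below 2 μm" target, while a ratio of 1 would require the bonding pitch to match the top-metal pitch itself.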

The results, measured on Kirin 2026, are concrete:

· Transistor density rose step-wise from 155 to 238 MTr/mm² in a single generation (transistor density is calculated using the formula ; the area utilization of the Kirin SoC design is 68%), a magnitude of improvement that previously required three years of geometric scaling.

· SoC performance-core power efficiency improved by 41% and maximum clock frequency rose by nearly 13%.

· A high-speed global Network-on-Chip data path constructed across both upper and lower tiers reduced the data-path footprint by 55%, with improved power-delivery stability.

· A post-silicon clock-skew adjustment scheme contributed over 5% SoC performance independently.

· On SRAM, where access speed, energy per bit, and area depend strongly on bit-line and word-line length, LogicFolding shortened critical paths, reduced energy per bit, and increased operating frequency by over 40%.

· On a representative processing core, the double-layer folding architecture reduced clock-buffer count by more than 50%, clock skew by 25%, and wire length by approximately 30%.

These gains were achieved at a fixed device node, obtained not through a new lithography step but through a topological reorganization of the spatial distribution of logic in three dimensions.

The LogicFolding implementation shipping in Kirin 2026 is deliberately conservative. The hybrid-bonding pitch reached 1.5 μm; TSV landing advanced only one step below the top metal; folding was applied selectively along key critical paths rather than across the entire design. Even so, the CPU performance-core frequency returns to 3.1 GHz this year.

Over the next decade, LogicFolding is expected to evolve from local critical-path folding to full-scale, multi-layer folding (three, four, and more active tiers per package), enabled by lower-temperature hybrid bonding (relaxing the thermal budget across tiers) and by TSV landing migrating from the top metal down to M6, which liberates over 30% of high-level routing resources. From 2026 to 2035, transistor density is projected to rise toward 400 MTr/mm² and beyond. Simultaneously, LogicFolding enables Kirin to substantially step up CPU core frequency, and paves the way toward 4 GHz and beyond (Table 1). The roadmap is feasible and, in cost terms, economically viable.

Table 1. Trend of the operating frequency of the Kirin CPU performance core.

Year   SoC              Architecture    Frequency (GHz)   State
2023   Kirin 9000s      Planar          2.6               Mass production
2024   Kirin 9020       Planar          2.65              Mass production
2025   Kirin 9030 Pro   Planar          2.75              Mass production
2026   Kirin 2026       LogicFolding    3.1               Silicon
2027   Kirin 2027       LogicFolding    3.39              Silicon
2028   Kirin 2028       LogicFolding    3.71              Pre-silicon
2029   Kirin 2029       LogicFolding    4                 Pre-silicon
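
The year-over-year steps implied by Table 1 can be computed directly from its frequency column. The sketch below uses only the values in the table; the helper name `yoy_gains` is illustrative.

```python
# Performance-core frequency (GHz) by year, copied from Table 1.
freq_ghz = {2023: 2.6, 2024: 2.65, 2025: 2.75, 2026: 3.1, 2027: 3.39, 2028: 3.71, 2029: 4.0}


def yoy_gains(series: dict[int, float]) -> dict[int, float]:
    """Fractional gain of each year over the previous year in the series."""
    years = sorted(series)
    return {cur: series[cur] / series[prev] - 1.0 for prev, cur in zip(years, years[1:])}


gains = yoy_gains(freq_ghz)
for year, gain in gains.items():
    print(f"{year}: {gain:+.1%}")
```

The planar years show gains of a few percent, the 2026 step to LogicFolding is close to 13%, and the later roadmap entries imply annual gains in the high single digits.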

Sidebar A—LogicFolding at a Glance

· Hybrid-bonding pitch: sub-2 μm (1.5 μm in Kirin 2026; target gear ratio ≈ 1)

· Overlay accuracy: under 0.5 μm

· TSV CD/KOZ: sub-1.5 μm; pitch sub-6 μm; failure rate <100 ppm; repair rate 99.9%

· Yield: ~100% with smart redundancy

· Transistor density: 155 → 238 MTr/mm² in a single step

· Power-efficiency / frequency gain (SoC P-core): +41% / +13%

· SRAM operating frequency: more than +40%

· Clock-buffer count / clock skew / wire length on a representative core: -50% / -25% / -30%

4. From Picoseconds to Microseconds: τ Scaling in the AI Data Center

A natural question is whether a principle developed in the milliwatt smartphone regime survives translation to the gigawatt regime of AI training and inference. AI workloads occupy the opposite end of the τ spectrum: not a single chip but hundreds or thousands of chips behaving as one machine, with aggregate compute increasing by approximately six orders of magnitude over the past decade. The answer is affirmative, provided τ is treated as a system-level objective and applied across the whole chain, rather than within a single accelerator.

Two facts shape the AI side of the τ argument. First, AI systems continue to grow: from one chip, to dozens, to hundreds, and increasingly to tens of thousands. Second, the energy budget and the materials budget of modern AI systems are dominated by data, not by compute. Over 80% of energy in a large AI cluster is consumed by data movement; over 70% of system cost is allocated to data storage. The implication is direct: reducing the time data spends in transit (between chips, between racks, and within the package) is at least as important as reducing the time compute spends computing.

τ scaling is instantiated at AI scale through three coordinated layers: a system fabric (Unified Bus), a near-packaged optical engine (Hi-ONE), and a topological reorganization of the package itself (3D Folding).

4.1 Unified Bus: A τ-First System Fabric

Traditional multi-node, multi-accelerator architectures move data across multiple stacked protocols: PCIe to the host, NVLink or proprietary fabrics within the chassis, Ethernet or InfiniBand between chassis, and software-stack remote-memory access on top. Each layer entails a protocol conversion, additional serialization, an extra DMA buffer, and a further handshake. Every conversion adds latency, reduces reliability, and incurs additional cost.

Unified Bus (UB) replaces this stack with a single protocol that operates within and across the chassis: a fully peer-to-peer fabric that exposes memory semantics natively across the whole system. Data movement is reduced to conversion-free, peer-to-peer transmission at the memory-semantic layer, with hardware-managed coherence in place of software-stack message passing.

The measured benefit is approximately two orders of magnitude: end-to-end remote-access latency falls from the tens of microseconds typical of TCP/IP-class stacks to approximately 100 ns, a ~500× reduction in system τ along the dominant communication axis. At the rack scale, this brings the system asymptotically close to a single, fabric-coherent machine, designated internally as a System-as-One-Chip.
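The relationship between the quoted latencies and the quoted reduction factor is simple arithmetic. The sketch below takes the ~100 ns target from the text and sweeps a few baseline values within "tens of microseconds"; the specific baselines of 10, 50, and 90 μs are illustrative choices, not measured figures.

```python
def reduction_factor(before_s: float, after_s: float) -> float:
    """How many times shorter the latency becomes."""
    return before_s / after_s


UB_LATENCY_S = 100e-9  # ~100 ns, as quoted in the text

# Illustrative baselines spanning "tens of microseconds".
for baseline_us in (10, 50, 90):
    factor = reduction_factor(baseline_us * 1e-6, UB_LATENCY_S)
    print(f"{baseline_us} us -> 100 ns: about {factor:.0f}x")
```

A ~500× reduction corresponds to a baseline of roughly 50 μs, which sits in the middle of the range the text describes.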

4.2 Hi-ONE—Optical I/O at the Package

Once communication latency is reduced, the next bottleneck shifts. Increasing the density of chips within a single rack pushes power density and reliability past their limits, and pushes electrical SerDes past theirs. At 400 Gb/s per AI chip, copper cabling remains well understood and reliable. At multi-Tb/s per chip, copper becomes physically impractical: SerDes reach contracts, cabling becomes prohibitively bulky, panel installation becomes infeasible, and thermal and power-delivery margins are exhausted.

The approach developed at Huawei Semiconductor is the High-density Optical-interconnect-Node Engine, Hi-ONE: a near-packaged optical engine that delivers 8 Tb/s per module, matching the UB bandwidth of an AI chip on a single optical link. It reduces the required SerDes reach from ~100 cm to ~5 cm, eliminates bulky cabling, and extends reach from under a meter to 100 meters, rendering high-density interconnect for distributed, gigawatt-scale data centers physically realizable.

The design philosophy underlying Hi-ONE is itself a τ-scaling argument. In place of a heavy DSP for high signal fidelity, Hi-ONE adopts a linear approach (an analog equalization-enhanced driver and trans-impedance amplifier) and permits the UB protocol to tolerate a deliberately relaxed bit-error rate. This trade between protocol layer and physical layer reduces power, cost, and integration complexity, and epitomizes the cross-layer trade-off that a τ-first methodology rewards.

4.3 The N²-vs-N Dilemma, and Why 3D Folding Is Inevitable

The deepest reason AI accelerators will not stop at 2.5D fan-out is geometric, and merits explicit statement because it determines the post-2030 roadmap.

In a conventional 2.5D AI chip, the logic die occupies the center of the package, HBM stacks and SerDes line its edges, and voltage regulators surround the package. Every memory signal, every interconnect signal, and every ampere of supply current must traverse the die's edge to reach the compute resources within. If the die has side length N, then:

· compute capacity scales as N² (area),

· but memory bandwidth, interconnect, and power delivery, all carried by the 2.5D fan-out along the edge, scale only as N (perimeter).

The widening divergence between these quadratic and linear curves constitutes the fan-out dilemma, and it accounts for the stalling of 2.5D scaling independent of how aggressive the underlying logic node becomes. No transistor-level improvement closes a topological deficit.

3D Folding resolves this dilemma by relocating the edge-bound resources onto surfaces. Power delivery (via backside power and integrated voltage regulators), high-speed memory (via hybrid bonding to logic), and optical I/O (via near-packaged Hi-ONE) all migrate from perimeter to vertical surface and, once located on a surface, they scale as N², matching the quadratic pace of compute. The package is no longer a logic die surrounded by a perimeter belt of memory and SerDes; it becomes a vertically integrated stack in which memory, fabric, power, and logic all scale together.
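The scaling argument can be made concrete with a small sketch for an idealized square die of side N in arbitrary units. Edge-fed resources are taken as proportional to the perimeter 4N and surface-fed resources as proportional to the area N²; the unit proportionality constants and function names are simplifying assumptions for illustration.

```python
def edge_supply_per_compute(n: float) -> float:
    """Perimeter-bound supply (4N) per unit of area-bound compute (N^2) for a square die."""
    return (4.0 * n) / (n * n)


def surface_supply_per_compute(n: float) -> float:
    """Surface-bound supply (N^2) per unit of area-bound compute (N^2): constant."""
    return (n * n) / (n * n)


for n in (1, 2, 4, 8):
    print(
        f"N={n}: edge-fed supply per compute = {edge_supply_per_compute(n):.2f}, "
        f"surface-fed = {surface_supply_per_compute(n):.2f}"
    )
```

With edge feeding, every doubling of the die side halves the bandwidth and power available per unit of compute; with surface feeding, the ratio stays fixed regardless of die size.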

The roadmap places this evolution on an explicit timeline. Through approximately 2030, AI accelerators (the Ascend SuperPoD line: Ascend 910C in 2025, Ascend 950 in 2026, and the 990 to follow) rely on a combination of mature techniques: chiplets, 2.5D fan-out, and 3D stacking via micro-bump and standard-pitch hybrid bonding. Around 2030, Ascend 990 will introduce LogicFolding into the AI accelerator class, and from that point 3D Folding becomes the principal carrier of α through 2035. Along this path, hardware integration is projected to increase by more than 100× by 2035, with τ reduction distributed across every layer of the stack rather than concentrated at the device level.

Sidebar B—τ at AI System Scale

· UB remote-access latency: ~10s of μs → ~100 ns (≈500× τ reduction)

· Hi-ONE per-module bandwidth: 8 Tb/s (matches per-chip UB bandwidth)

· Hi-ONE SerDes reach: ~100 cm → ~5 cm; panel-to-panel reach: <1 m → 100 m

· Fan-out dilemma: compute ∝ N², perimeter-bound BW / I/O / power ∝ N

· 3D Folding: relocates BW, optical I/O, and power delivery from edges onto surfaces, restoring N² parity

· 2026 → 2035 projected hardware-integration growth: >100×

5. Logic and Memory: From Decoupling to Re-Fusion

One implication of τ scaling warrants separate discussion, because its consequences are industrial as well as technical.

In the 8086 era, the industry deliberately decoupled processors and memory through standardized memory buses. That decoupling permitted two industries to scale independently: processor performance advanced rapidly along the Moore curve, while memory vendors developed a vast, separate market alongside it.

The AI era is reversing this decoupling. The continuing expansion of compute density is pushing memory bandwidth, latency, power, and packaging to their limits. HBM, hybrid bonding, and 3D-stacked SRAM are symptoms of a single underlying fact: for modern AI workloads, data movement is as critical as computation itself, and logic and memory are once again being driven into tight physical integration. As they fuse, the balance of influence in the supply chain is shifting toward memory and packaging vendors.

The technological direction is unambiguous, but the economic resolution is not yet settled. Enduring success in the AI hardware era will accrue to those who can fuse logic and memory technologically and establish an economic partnership that allows both industries to share the benefits of that fusion over the long term. This is not merely a research problem; it is a structural problem for the industry to address over the next decade. By rendering the cross-layer cost of every separation visible, τ scaling ensures that the problem cannot be deferred.

6. Open Challenges

It would be misleading to present τ scaling as a completed system. Several substantive problems remain open, and are identified here both to highlight ongoing work and to invite collaboration.

Toolchains and methodologies. Today's EDA was developed for an era in which area, timing, and power were optimized along three separate axes, with system τ emerging as a residual. Full-scale LogicFolding requires the toolchain to treat multiple stacked dies as a single continuous design entity: partitioning logic at cell granularity rather than block granularity, placing across the full volume under a unified cost function, and performing timing closure across inter-die paths where vertical-interconnect parasitics, KOZ exclusions, and inter-wafer process variation interact in ways that traditional 2D-trained tools do not address adequately. Preliminary internal tools have been developed that produce useful results, and methodology details will be published in the coming months. A τ-native toolchain (open, multi-physics, and 3D-native) is the single most important enabling investment for the next decade.

Inter-wafer process variation. LogicFolding bonds wafers from potentially distinct lots, and in some cases distinct nodes. Inter-wafer variation in Vth, drive current, and interconnect RC is materially greater than within-wafer variation, and falls most heavily on clock distribution and hold-time margins. Smart redundancy, adaptive compensation, and τ-aware signoff flows are necessary components of the response.

Vertical-interconnect overhead. Every hybrid bond and every TSV incurs a finite resistance and capacitance penalty, and TSV KOZ displaces standard cells. LogicFolding must therefore be justified layer by layer through the simple inequality

$$\tau_{\text{benefit}}(\text{effective silicon area} + \text{wire-length reduction}) > \tau_{\text{penalty}}(\text{vertical-interconnect RC})$$

This threshold has been crossed for mobile critical paths and for memory; the threshold is workload-specific, and the boundary will move as bonding pitch shrinks.

Energy. τ is a time law, not a joule law. A super-node operating 10× faster but with 10× greater power consumption violates no scaling principle, yet exceeds grid capacity. τ scaling therefore requires an energy companion: memory-semantic fabrics that eliminate stack overhead, near-/co-packaged optics that reduce picojoules per bit by orders of magnitude, backside power delivery, compute-in/near-memory, and the disciplined practice of trading τ headroom back for power (DVFS at data-center scale, the same mechanism that enabled smartphone battery longevity). Importantly, τ headroom itself provides energy headroom when allocated in that direction.
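Trading timing headroom for power can be sketched with the standard first-order model of CMOS dynamic power, P ≈ C·V²·f. The sketch assumes, as a simplification, that supply voltage can be lowered in proportion to frequency across the operating range; real voltage-frequency curves are flatter near the minimum voltage, and leakage is ignored. Function names are illustrative.

```python
def dynamic_power(c_eff: float, v: float, f: float) -> float:
    """First-order CMOS dynamic power: C * V^2 * f (arbitrary consistent units)."""
    return c_eff * v * v * f


def power_ratio_when_slowed(freq_scale: float) -> float:
    """Dynamic power relative to nominal when running at `freq_scale` of nominal frequency.

    ASSUMPTION: voltage scales in proportion to frequency, so power scales as freq_scale^3.
    """
    if not 0.0 < freq_scale <= 1.0:
        raise ValueError("freq_scale must be in (0, 1]")
    v_scale = freq_scale  # simplifying assumption, see docstring
    return dynamic_power(1.0, v_scale, freq_scale) / dynamic_power(1.0, 1.0, 1.0)


for scale in (1.0, 0.9, 0.8):
    print(f"run at {scale:.0%} of nominal speed -> "
          f"{power_ratio_when_slowed(scale):.0%} of nominal dynamic power")
```

Under these assumptions, giving back a modest fraction of speed recovers a disproportionately large fraction of power, which is why headroom in τ can be spent as headroom in energy.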

Benchmarks. The industry's current performance benchmarks (Linpack, MLPerf, SPEC) were designed for an era in which a single scalar per workload sufficed. A τ-scaling industry requires τ-profile benchmarks: vectors that expose the dominant τ at each layer of a system together with the headroom remaining at that layer. The dominant-τ layer is, by definition, the next investment.
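One possible shape for such a profile is sketched below: a list of per-layer entries, each holding the time attributed to that layer and an estimated achievable floor, from which the dominant layer and its headroom follow. The data structure, field names, and all numbers are hypothetical and serve only to illustrate the idea; no existing benchmark suite is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerTau:
    layer: str
    tau_us: float    # time attributed to this layer for one workload step (hypothetical)
    floor_us: float  # estimated achievable lower bound at this layer (hypothetical)

    @property
    def headroom(self) -> float:
        """How many times the layer's tau could still shrink before reaching its floor."""
        return self.tau_us / self.floor_us


def dominant_layer(profile: list[LayerTau]) -> LayerTau:
    """The layer contributing the largest tau: the candidate for the next investment."""
    return max(profile, key=lambda entry: entry.tau_us)


# Entirely made-up profile for illustration.
profile = [
    LayerTau("transistor", tau_us=120.0, floor_us=100.0),
    LayerTau("circuit", tau_us=80.0, floor_us=60.0),
    LayerTau("chip", tau_us=300.0, floor_us=150.0),
    LayerTau("system", tau_us=900.0, floor_us=90.0),
]

top = dominant_layer(profile)
print(f"dominant layer: {top.layer}, tau = {top.tau_us} us, headroom = {top.headroom:.1f}x")
```

Reporting the whole vector rather than a single score makes it visible when the largest share of time, and the largest remaining headroom, sits outside the layer a team happens to own.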

7. Six Years In, Ten Years Out

Between May 2020 and May 2026, Huawei Semiconductor designed and brought to volume production 381 chips serving mobile, AI, automotive, industrial, and infrastructure markets. Across that portfolio, the τ scaling thesis has held up:

· At the device and circuit layers, transistor density is on a path from 155 toward 400+ MTr/mm² by 2031.

· At the chip layer, LogicFolding has demonstrated, on a leading-edge mobile SoC, that critical-path frequency, power efficiency, and density can continue to advance at a fixed device node.

· At the system layer, Unified Bus and Hi-ONE have demonstrated that hundreds of microseconds of communication τ can be compressed to hundreds of nanoseconds, and that a multi-rack AI cluster can behave as a single coherent machine.

· Looking forward, CPU performance-core frequency is expected to move toward 4 GHz and beyond by 2029, Kirin SoC efficiency is projected to more than double in three to five years under typical use, and AI hardware integration is expected to grow more than 100× by 2035.

The deeper claim, beyond any individual product, is methodological. τ scaling is the first scaling principle since Dennard to give the entire stack a shared optimization target. It signals to process technologists, circuit designers, architects, system engineers, and software teams that these communities are now optimizing the same quantity in identical units, and that improvements at any single layer must propagate to the system τ to count. It also indicates to industry strategists and capital allocators that the next dollar should follow τ, not nodes: that competitive performance no longer requires perpetual residence on the leading edge of lithography, and that packaging, memory bandwidth, and fabric design now command the strategic weight that the leading-edge logic node alone previously held.

For a generation of engineers educated to treat "Moore's Law" as synonymous with "progress," this is a difficult transition. The geometric era has, in fact, concluded; denial of that fact is not a viable strategy. The era of acceleration through miniaturization is giving way to an era of acceleration through τ optimization across the multi-layered electronic system, and the companies, research groups, and ecosystems that adopt τ as the primary objective in the next six to ten years will determine the shape of computing in the decade thereafter.

The next ten years of work are scoped. Many open questions remain, and no single organization can address them alone: the toolchain, the standards, the benchmarks, the device physics, and the economic models all require contributions from beyond any one company. This perspective is therefore intended as both a report from the field and an invitation.

The roadmap ahead is demanding, but the direction is unambiguous.

Author

Tingbo He leads Huawei's semiconductor business. The team she directs has designed and brought to volume production 381 chips between 2020 and 2026 across mobile, AI, automotive, and infrastructure markets, and is the source of the τ scaling methodology and the LogicFolding, Unified Bus, and Hi-ONE technologies described in this article.

Acknowledgments

This perspective draws on six years of work by thousands of engineers across Huawei Semiconductor and its ecosystem of foundry, equipment, EDA, and system partners. The author thanks the customers whose patience made this work possible.

Further Reading

1. G. E. Moore, "Cramming more components onto integrated circuits," Electronics, vol. 38, no. 8, pp. 114-117, Apr. 1965 (reprinted in Proc. IEEE, vol. 86, no. 1, Jan. 1998).

2. R. H. Dennard et al., "Design of ion-implanted MOSFETs with very small physical dimensions," IEEE J. Solid-State Circuits, vol. 9, no. 5, pp. 256-268, 1974.

3. J. L. Hennessy and D. A. Patterson, "A new golden age for computer architecture," Commun. ACM, vol. 62, no. 2, pp. 48-60, Feb. 2019.

4. M. Horowitz, "Computing's energy problem (and what we can do about it)," ISSCC Dig. Tech. Papers, pp. 10-14, Feb. 2014.

5. International Roadmap for Devices and Systems (IRDS), Interconnect and More-than-Moore chapters, 2023/2024 update.

6. P. Batude et al., "3D sequential integration: a key enabling technology for heterogeneous co-integration of new functions with CMOS," IEEE J. Electron Devices Soc., vol. 3, no. 3, pp. 205-216, 2015.