Zen 2

Zen 2 µarch
Arch Type: CPU
Designer: AMD
Manufacturer: GlobalFoundries, TSMC
Introduction: July 2019
Process: GloFo 14LPP, TSMC N7, GloFo 12LP
Core Configs: 4, 6, 8, 12, 16, 24, 32, 64
Type: Superscalar
OoOE: Yes
Speculative: Yes
Reg Renaming: Yes
Stages: 19
Decode: 4-way
ISA: x86-64
Extensions: MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA3, F16C, BMI, BMI2, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVE, SHA, UMIP, CLZERO
Core Names: Renoir (APU/Mobile), Matisse (Desktop), Castle Peak (HEDT), Rome (Server)

Zen 2 is AMD's successor to Zen+, and is a 7 nm process microarchitecture for mainstream mobile, desktops, workstations, and servers. Zen 2 was replaced by Zen 3.

For performance desktop and mobile computing, Zen 2 is branded as Athlon, Ryzen 3, Ryzen 5, Ryzen 7, Ryzen 9, and Ryzen Threadripper processors. For servers, Zen 2 is branded as EPYC.

History[edit]

Zen 2 succeeded Zen+ in 2019. In February 2017, AMD CEO Lisa Su announced a roadmap that included Zen 2 and, later, Zen 3. At AMD's Investor's Day in May 2017, Senior Vice President Jim Anderson confirmed that Zen 2 would be built on a 7 nm process. Initial details of Zen 2 and Rome were unveiled during AMD's Next Horizon event on November 6, 2018.

Codenames[edit]

Core C/T Target
Rome Up to 64/128 High-end server multiprocessors
Castle Peak Up to 64/128 Workstation & enthusiasts market processors
Matisse Up to 16/32 Mainstream to high-end desktops & enthusiasts market processors
Renoir Up to 8/16 Mainstream APUs with Vega GPUs

Brands[edit]

1 Only available on G, GE, H, HS, HX and U SKUs.
2 ECC support is unavailable on AMD APUs.

Process technology[edit]

Zen 2 comprises multiple different components:

  • Core complex dies (CCDs) fabricated on TSMC's N7 process
  • A client I/O die (cIOD) fabricated on GlobalFoundries' 12LP process
  • A server I/O die (sIOD) fabricated on GlobalFoundries' 14LPP process

Compiler support[edit]

Compiler Arch-Specific Arch-Favorable
GCC -march=znver2 -mtune=znver2
LLVM -march=znver2 -mtune=znver2
  • Note: Initial support in GCC 9 and LLVM 9.

Architecture[edit]

Zen 2 inherits most of the design from Zen+ but improves the instruction stream bandwidth and floating-point throughput performance.


Key changes from Zen+[edit]

  • 7 nm process (from 12 nm)
  • Core
    • Higher IPC (AMD self-reported up to 15% higher IPC)
    • Front-end
      • Improved branch prediction unit
      • Improved µOP cache tags
      • Improved µOP cache
        • Larger µOP cache (4096 entries, up from 2048)
      • Increased dispatch bandwidth
    • Back-end
      • FPU
        • 2x wider datapath (256-bit, up from 128-bit)
        • 2x wider EUs (256-bit FMAs, up from 128-bit FMAs)
        • 2x wider LSU (2x256-bit L/S, up from 128-bit)
        • Improved DP vector mult. latency (3 cycles, down from 4)
      • Integer
        • Increased number of registers (180, up from 168)
        • Additional AGU (3, up from 2)
        • Larger scheduler (4x16 ALU + 1x28 AGU, up from 4x14 ALU + 2x14 AGU)
        • Larger Reorder Buffer (224, up from 192)
    • Memory subsystem
      • 0.5x L1 instruction cache (32 KiB, down from 64 KiB)
        • 8-way associativity (from 4-way)
      • 1.33x larger L2 DTLB (2048-entry, up from 1536)
      • 48 entry store queue (was 44)
  • CCX
    • 2x L3 cache slice (16 MiB, up from 8 MiB)
    • Increased L3 latency (~40 cycles, up from ~35 cycles)
  • Security
    • In-silicon Spectre enhancements
    • Increased number of keys/VMs supported
  • I/O
    • PCIe 4.0 (from 3.0)
    • Infinity Fabric 2
      • 2.3x transfer rate per link (25 GT/s, up from ~10.6 GT/s)
    • Decoupling of MemClk from FClk, allowing 2:1 ratio in addition to 1:1
    • DDR4-3200 support, up from DDR4-2933


New instructions[edit]

Zen 2 introduced a number of new x86 instructions:

  • CLWB - Write back modified cache line and may retain the line in the cache hierarchy
  • WBNOINVD - Write back and do not flush internal caches; initiate the same for external caches
  • MCOMMIT - Commit stores to memory
  • RDPID - Read Processor ID
  • RDPRU - Read Processor Register

Furthermore, Zen 2 adds the User-Mode Instruction Prevention (UMIP) extension.

Block Diagram[edit]

Individual Core[edit]

[Block diagram: Zen 2 core]

Memory Hierarchy[edit]

  • Cache
    • L0 Op Cache:
      • 4,096 Ops, 8-way set associative
      • Parity protected
    • L1I Cache:
      • 32 KiB, 8-way set associative
        • 64 sets, 64 B line size
        • Shared by the two threads, per core
      • Parity protected
    • L1D Cache:
      • 32 KiB, 8-way set associative
      • Write-back policy
      • 4-5 cycles latency for Int
      • 7-8 cycles latency for FP
      • ECC
    • L2 Cache:
      • 512 KiB, 8-way set associative
        • 1,024 sets, 64 B line size
      • Write-back policy
      • Inclusive of L1
      • ≥ 12 cycles latency
      • ECC
    • L3 Cache:
      • Matisse, Castle Peak, Rome: 16 MiB/CCX, shared across all cores
      • Renoir: 4 MiB/CCX, shared across all cores
      • 16-way set associative
        • 16,384 sets, 64 B line size
      • Write-back policy, Victim cache
      • 39 cycles average latency
      • ECC
      • QoS Monitoring and Enforcement
  • System DRAM
    • Rome:
      • 8 channels per socket, up to 16 DIMMs, max. 4 TiB
      • Up to PC4-25600L (DDR4-3200 RDIMM/LRDIMM), ECC supported
    • Castle Peak, sTRX4:
      • 4 channels, up to 8 DIMMs, max. 256 GiB
      • Up to PC4-25600U (DDR4-3200 UDIMM), ECC supported
    • Matisse:
      • 2 channels, up to 4 DIMMs, max. 128 GiB
      • Up to PC4-25600U (DDR4-3200 UDIMM), ECC supported

Translation Lookaside Buffers

  • ITLB
    • 64 entry L1 TLB, fully associative, all page sizes
    • 512 entry L2 TLB, 8-way set associative
      • 4-Kbyte and 2-Mbyte pages
    • Parity protected
  • DTLB
    • 64 entry L1 TLB, fully associative, all page sizes
    • 2,048 entry L2 TLB, 16-way set associative
      • 4-Kbyte and 2-Mbyte pages, PDEs to speed up table walks
    • Parity protected

Core[edit]

Zen 2 largely builds on Zen. Most of the fine details have not been revealed by AMD yet.


Front End[edit]

In order to feed the backend, which has been widened to support 256-bit operation, the front-end throughput was improved. AMD reported that the branch prediction unit has been reworked. This includes improvements to the prefetcher and various undisclosed optimizations to the instruction cache. The µOP cache was also tweaked including changes to the µOP cache tags and the µOP cache itself which has been enlarged to improve the instruction stream throughput.

Branch Prediction Unit[edit]

The branch prediction unit guides instruction fetching and attempts to predict branches and their target to avoid pipeline stalls or the pursuit of incorrect execution paths. The Zen 2 BPU almost doubles the branch target buffer capacity, doubles the size of the indirect target array, and introduces a TAGE predictor. According to AMD it exhibits a 30% lower misprediction rate than its perceptron counterpart in the Zen/Zen+ microarchitecture.

Once per cycle the next address logic determines if branch instructions have been identified in the current 64-byte instruction fetch block, and if so, consults several branch prediction facilities about the most likely target and calculates a new fetch block address. If no branches are expected it calculates the address of the next sequential block. Branches are evaluated much later in the integer execution unit which provides the actual branch outcome to redirect instruction fetching and refine the predictions. The dispatch unit can also cause redirects to handle mispredictions and exceptions.

Zen 2 has a three-level branch target buffer (BTB) which records the location of branch instructions and their target. Each entry can hold up to two branches if they reside in the same 64-byte cache line and the first is a conditional branch, reducing prediction latency and power consumption. Additional branches in the same cache line occupy another entry and increase latency accordingly. The L0 BTB holds 8 forward and 8 backward taken branches, up from 4 and 4 in the Zen/Zen+ microarchitecture, no calls or returns, and predicts with zero bubbles. The L1 BTB has 512 entries (256 in Zen) and creates one bubble if its prediction differs from the L0 BTB. The L2 BTB has 7168 entries (4096 in Zen) and creates four bubbles if its prediction differs from the L1 BTB.

A bubble is a pipeline stage performing no work because its input was delayed. Bubbles propagate to later stages and add up as different pipelines stall for unrelated reasons. The various decoupling queues in this design intend to hide bubbles or reduce their impact and allow earlier pipelines to run ahead.

A 32-entry return address stack (RAS) predicts return addresses from a near call. Far control transfers (far call or jump, SYSCALL, IRET, etc.) are not subject to branch prediction. 31 entries are usable in single-threaded mode, 15 per thread in dual-threaded mode. A CALL instruction pushes the address of the following instruction onto the stack; a RET instruction pops it. The RAS can recover from most mispredictions and is flushed otherwise. It includes an optimization for calls to the next address, an IA-32 idiom used to obtain a copy of the instruction pointer for position-independent code.
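
To make the push/pop behavior concrete, below is a minimal sketch of a return address stack in C. The depth matches the 32 entries mentioned above, but the wrap-around overflow policy and the lack of SMT partitioning or misprediction recovery are simplifications for illustration, not a description of AMD's implementation.

```c
#include <stdint.h>

#define RAS_DEPTH 32   /* power of two, so masking works as a cheap modulo */

typedef struct {
    uint64_t entry[RAS_DEPTH];
    int top;                       /* number of pushes so far */
} ras_t;

/* CALL: push the address of the instruction following the call. */
static void ras_push(ras_t *ras, uint64_t return_addr) {
    ras->entry[ras->top & (RAS_DEPTH - 1)] = return_addr;
    ras->top++;                    /* on overflow the oldest entry is overwritten */
}

/* RET: pop the predicted return address (0 means "no prediction"). */
static uint64_t ras_predict_return(ras_t *ras) {
    if (ras->top == 0)
        return 0;
    ras->top--;
    return ras->entry[ras->top & (RAS_DEPTH - 1)];
}
```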

The 1024-entry indirect target array (ITA), up from 512 entries in Zen, predicts indirect branches, for instance calls through a function pointer. Branches which always go to the same target are predicted using the static target from the BTB entry. If a branch has multiple targets, the predictor chooses among them using global history at L2 BTB correction latency.

The conditional branch direction predictor predicts if a near branch will be taken or not. Never-taken branches are not entered into the BTB and are thereby implicitly predicted not-taken. A taken branch is initially predicted always-taken, and dynamically predicted if its behavior changes. The Zen/Zen+ microarchitecture uses a hashed perceptron predictor for this purpose, which is supplemented in Zen 2 by a TAGE predictor. AMD did not disclose details; the following explanations describe predictors of this type in general.

When a branch takes place, it is stored in the branch target buffer so that subsequent encounters can be resolved more easily. Modern microprocessors such as Zen take this further by storing not only the outcome of the last branch but of the last few branches in a global history register (GHR) in order to extract correlations between branches (e.g., if an earlier branch is taken, the next branch is also likely to be taken).

[Figure: global history register (Zen/Zen 2)]

Perceptrons are among the simplest forms of machine learning and lend themselves to easier hardware implementations than most other machine learning algorithms. They also tend to be more accurate than predictors such as gshare, albeit with a more complex implementation. When the processor encounters a conditional branch, its address is used to fetch a perceptron from a table of perceptrons. A perceptron for our purposes is nothing more than a vector of weights. Those weights represent the correlation between the outcome of a historic branch and the branch being predicted. For example, consider the following three patterns: “TTN”, “NTN”, and “NNN”. If all three patterns resulted in the next branch not being taken, then perhaps there is no correlation between the first two branches and the branch being predicted, so they are assigned very little weight. The results of prior branches are fetched from the global history register, and the individual bits from that register are used as inputs. The output value is the dot product of the weights and the history of prior branches. A negative output, in this case, might mean ‘not taken’ while all other values might be predicted as ‘taken’. It is worth pointing out that inputs beyond branch histories can also be used to infer correlations, though it is unknown if any real-world implementation makes use of that idea. The implementation on Zen is likely much more complex, sampling different kinds of histories, but the way it works remains the same.

[Figure: hashed perceptron predictor (Zen/Zen 2)]
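
The following is a minimal sketch of a hashed perceptron direction predictor in C, in the spirit of the description above. The table size, 8-bit history, hash function, and training rule (no threshold, no weight saturation) are illustrative assumptions, not Zen's actual configuration.

```c
#include <stdint.h>

#define HIST_LEN   8        /* bits of global history used as inputs */
#define TABLE_SIZE 1024     /* number of perceptrons */

static int8_t  weights[TABLE_SIZE][HIST_LEN + 1];  /* +1 for the bias weight */
static uint8_t ghr;                                /* global history register */

/* Predict the branch at 'pc': taken if the dot product of the weights
 * with the history bits is non-negative. */
static int predict(uint64_t pc) {
    int8_t *w = weights[(pc ^ ghr) % TABLE_SIZE];  /* hash PC with history */
    int sum = w[0];                                /* bias term */
    for (int i = 0; i < HIST_LEN; i++)
        sum += ((ghr >> i) & 1) ? w[i + 1] : -w[i + 1];
    return sum >= 0;
}

/* Train on the actual outcome: strengthen weights whose history bit
 * agreed with the outcome, weaken the others, then update the history. */
static void update(uint64_t pc, int taken) {
    int8_t *w = weights[(pc ^ ghr) % TABLE_SIZE];
    w[0] += taken ? 1 : -1;
    for (int i = 0; i < HIST_LEN; i++) {
        int bit = (ghr >> i) & 1;
        w[i + 1] += (bit == taken) ? 1 : -1;       /* weight saturation omitted */
    }
    ghr = (uint8_t)((ghr << 1) | (taken & 1));
}
```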

Given Zen's pipeline length and width, a bad prediction could result in over 100 slots getting flushed, which directly translates to a loss of performance. Zen 2 keeps the hashed perceptron predictor but adds a new second-level TAGE predictor. This predictor was first proposed in 2006 by André Seznec as an improvement on Michaud's PPM-like predictor, and it has won all four of the last championship branch prediction (CBP) contests (2006-2016). TAGE relies on the idea that different branches in the program require different history lengths. For some branches, very short histories work best; for example, a 1-bit predictor may suffice: if a certain branch was taken before, it will be taken again. A different branch might depend on prior branches, hence requiring a much longer multi-bit history to adequately predict whether it will be taken. The TAgged GEometric history length (TAGE) predictor consists of multiple global history tables that are indexed with global history registers of varying lengths in order to cover all of those cases. The history lengths used form a geometric series, hence the name.

[Figure: TAGE predictor (Zen/Zen 2)]

The idea with the TAGE predictor is to figure out how much branch history works best for each branch, prioritizing the longest matching history over shorter ones.

[Figure: history register lengths (Zen/Zen 2)]

This multi-predictor scheme is similar to the layering of the branch target buffers. The first-level predictor, the perceptron, is used for quick lookups (e.g., single-cycle resolution). The second-level TAGE predictor requires several cycles to complete and is therefore layered on top of the simple predictor. In other words, the L2 predictor is slower but more accurate and is used to double-check the result of the faster, less accurate predictor. If the L2 prediction differs from the L1 prediction, a minor flush occurs: the TAGE predictor overrides the perceptron predictor, and instruction fetch resumes with the L2 prediction, as it is assumed to be more accurate.
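
Below is a highly simplified sketch in C of the TAGE lookup idea: a base predictor plus several tagged tables indexed with geometrically increasing history lengths, where the longest matching history wins. Table sizes, hash functions, the useful-counter bookkeeping, and the allocation/update policy are omitted or invented purely for illustration.

```c
#include <stdint.h>

#define NUM_TABLES 4
#define ENTRIES    1024

typedef struct { uint16_t tag; int8_t ctr; } tage_entry_t;

static int8_t       base[ENTRIES];                   /* simple bimodal fallback */
static tage_entry_t tables[NUM_TABLES][ENTRIES];
static const int    hist_len[NUM_TABLES] = {5, 10, 20, 40};  /* geometric series */
static uint64_t     ghr;                             /* updated elsewhere as branches retire */

/* Keep only the youngest 'len' history bits (a real TAGE folds long
 * histories into the index and tag instead of truncating them). */
static uint64_t fold(uint64_t hist, int len) {
    return (len >= 64) ? hist : (hist & ((1ULL << len) - 1));
}

static int tage_predict(uint64_t pc) {
    int prediction = base[pc % ENTRIES] >= 0;        /* default prediction */
    for (int t = 0; t < NUM_TABLES; t++) {           /* later tables use longer histories */
        uint64_t h   = fold(ghr, hist_len[t]);
        uint32_t idx = (uint32_t)((pc ^ h) % ENTRIES);
        uint16_t tag = (uint16_t)((pc >> 2) ^ (h * 0x9E37u));
        if (tables[t][idx].tag == tag)               /* tag hit: longest history wins */
            prediction = tables[t][idx].ctr >= 0;
    }
    return prediction;
}
```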

With the Zen family the translation of virtual to physical fetch addresses moved into the branch unit, this allows instruction fetching to begin earlier. The translation is assisted by a two-level translation lookaside buffer (TLB), unchanged in size from Zen to Zen 2. The fully-associative L1 instruction TLB contains 64 entries and holds 4-Kbyte, 2-Mbyte, or 1-Gbyte page table entries. The 512 entries of the 8-way set-associative L2 instruction TLB can hold 4-Kbyte and 2-Mbyte page table entries. 1-Gbyte pages are smashed into 2-Mbyte entries in the L2 ITLB. A hardware page table walker handles L2 ITLB misses.

Address translation, instruction cache, and op cache lookups start in parallel. Micro-tags predict the way where the instruction block may be found; these hits are qualified by a cache tag lookup in a following pipeline stage. On an op cache hit the op cache receives a fetch address and enters macro-ops into the micro-op queue. If the machine was not in OC mode already, the op cache stalls until the instruction decoding pipeline is empty. Otherwise the address, along with information on whether and where the fetch block resides in the instruction cache, is entered into a prediction queue decoupling branch prediction from the instruction fetch unit.

Instruction Fetch and Decode Unit[edit]

The instruction fetch unit reads 32 bytes per cycle from a 32 KiB, 8-way set associative, parity protected L1 instruction cache (IC), which replaced the 64 KiB, 4-way set associative instruction cache in the Zen/Zen+ microarchitecture. The cache line size is 64 bytes. On a miss, cache lines are fetched from the L2 cache which has a bandwidth of 32 bytes per cycle. The instruction cache generates fill requests for the cache line which includes the miss address and up to thirteen additional 64-byte blocks. They are prefetched from addresses generated by the branch prediction unit and prepared in the prediction queue.

A 20-entry instruction byte queue (IBQ) decouples the fetch and decode units. Each entry holds 16 instruction bytes aligned on a 16-byte boundary. 10 entries are available to each thread in SMT mode. The IBQ, like apparently all data structures maintained by the core except the L1 and L2 data caches, is parity protected. A parity error causes a machine check exception; the core recovers by reloading the data from memory. The caches are ECC protected to correct single (and double?) bit errors and disable malfunctioning ways.

An align stage scans two 16-byte windows per cycle to determine the boundaries of up to four x86 instructions. The length of x86 instructions is variable and ranges from 1 to 15 bytes. Only the first slot can pick instructions longer than 8 bytes. There is no penalty for decoding instructions with many prefixes in AMD Family 16h and later processors.

In another pipeline stage or stages the instruction decoder converts up to four x86 instructions per cycle to macro-ops. According to AMD the instruction decoder can send up to four instructions per cycle to the op cache and micro-op queue. This has to encompass at least four, and various sources suggest no more than four, macro-ops.

A macro-op is a fixed length, uniform representation of (usually) one x86 instruction, comprising an ALU and/or a memory operation. The latter can be a load, a store, or a load and store to the same address. A micro-op in AMD parlance is one of these primitive operations as well as the subset of a macro-op relevant to this operation. AMD refers to instructions decoded into one macro-op as "FastPath Single" type. More complex x86 instructions, the "VectorPath" type, are expanded into a fixed or variable number of macro-ops by the microcode sequencer, and at this stage probably represented by a macro-op containing a microcode ROM entry address. In the Zen/Zen+ microarchitecture AVX-256 instructions which perform the same operation on the 128-bit upper and lower half of a YMM register are decoded into a pair of macro-ops, the "FastPath Double" type. Zen 2 decodes these instructions into a single macro-op. There are, however, other instructions which generate two macro-ops. If these are "FastPath Double" decoded or microcoded is unclear. Branch fusion, discussed below, combines two x86 instructions into a single macro-op.

The op cache (OC) is a 512-entry, up from 256-entry in Zen, 8-way set associative, parity protected cache of previously decoded instructions. Each entry contains up to 8 sequential instructions ending in the same 64-byte aligned memory region, resulting in a maximum capacity of 4096 instructions, or rather, macro-ops. The op cache can send up to 8 macro-ops per cycle to the micro-op queue in lieu of using the traditional instruction fetch and decode pipeline stages. A transition from IC to OC mode is only possible at a branch target, and the machine generally remains in OC mode until an op cache miss occurs on a fetch address. More details can be gleaned from "Operation Cache", patent WO 2018/106736 A1.

Bypassing the clock-gated fetch and decode units, and providing up to twice as many instructions per cycle, the op cache improves the decoding pipeline latency and bandwidth and reduces power consumption. AMD stated that doubling its size at the expense of halving the instruction cache in Zen 2 results in a better trade-off.

AMD Family 15h and later processors support branch fusion, combining a CMP or TEST instruction with an immediately following conditional branch into a single macro-op. Instructions with a RIP-relative address or both a displacement and an immediate operand are not fused. Reasons may be an inability to handle two RIP-relative operands in one operation and limited space in a macro-op. According to Agner Fog the Zen microarchitecture can process two fused branches per cycle if the branches are not taken, one per two cycles if taken. AMD diagrams refer to the output of the Zen 2 op cache as fused instructions and the aforementioned patent confirms that the op cache can contain branch-fused instructions. Whether they are fused when entering the cache, or whether the instruction decoder sends fused macro-ops to the op cache as well as to the micro-op queue, is unclear.

A micro-op queue of undocumented depth, supposedly 72 entries in the Zen/Zen+ microarchitecture, decouples the decoding and dispatch units. Microcoded instructions are sent to the microcode sequencer which expands them into a predetermined sequence of macro-ops stored in the microcode ROM, temporarily inhibiting the output of macro-ops from the micro-op queue. A patch RAM supplements the microcode ROM and can hold additional sequences. The microcode sequencer supports branching within the microcode and includes match registers to intercept a limited number of microcode entry addresses and redirect execution to the patch RAM.

The stack engine unites a sideband stack optimizer (SSO) and a stack tracker. The former removes dependencies on the RSP register in a chain of PUSH, POP, CALL, and RET instructions. An SSO has been present in AMD processors since the K10 microarchitecture. The stack tracker predicts dependencies between pairs of PUSH and POP instructions. The memfile similarly predicts dependencies between stores and loads accessing the same data in memory, e.g. local variables. Both functions use memory renaming to facilitate store-to-load forwarding bypassing the load-store unit.

Dependencies on the RSP arise from the side effect of decrementing and incrementing the stack pointer. A stack operation can not proceed until the previous one updated the register. The SSO lifts these adjustments into the front end, calculating an offset which falls and rises with every PUSH and POP, and turns these instructions into stores and loads with RSP + offset addressing. The stack tracker records PUSH and POP instructions and their offset in a table. The memfile records stores and their destination, given by base, index, and displacement since linear or physical addresses are still unknown. They remain on file until the instruction retires. A temporary register is assigned to each store. When the store is later executed, the data is copied into this register (possibly by mapping it to the physical register backing the source register?) as well as being sent to the store queue. Loads are compared to recorded stores. A load predicted to match a previous store is modified to speculatively read the data from the store's temporary register. This is resolved in the integer or FP rename unit, potentially as a zero latency register-register move. The load is also sent to the LS unit to verify the prediction, and if incorrect, replay the instructions depending on the load with the correct data. It should be noted that several loads and stores can be dispatched in one cycle and this optimization is applied to all of them.

The dispatch unit distributes macro-ops to the out-of-order integer and floating point execution units. It can dispatch up to six macro-ops per cycle.

Execution Engine[edit]

AMD stated that both the dispatch bandwidth and the retire bandwidth have been increased.

Integer Execution Unit[edit]

The integer execution (EX) unit consists of a dedicated rename unit, five schedulers, a 180-entry physical register file (PRF), four ALU and three AGU pipelines, and a 224-entry retire queue shared with the floating point unit. The depth of the four ALU scheduler queues increased from 14 to 16 entries in Zen 2. The two AGU schedulers with a 14-entry queue of the Zen/Zen+ microarchitecture were replaced by one unified scheduler with a 28-entry queue. A third AGU pipeline only for store operations was added as well. The size of the PRF increased from 168 to 180 entries, the capacity of the retire queue from 192 to 224 entries.

The retire queue and the integer and floating point rename units form the retire control unit (RCU) tracking instructions, registers, and dependencies in the out-of-order execution units. Macro-ops stay in the retire queue until completion or until an exception occurs. When all macro-ops constituting an instruction completed successfully the instruction becomes eligible for retirement. Instructions are retired in program order. The retire queue can track up to 224 macro-ops in flight, 112 per thread in SMT mode, and retire up to eight macro-ops per cycle.

The rename unit receives up to six macro-ops per cycle from the dispatch unit. It maps general purpose architectural registers and temporary registers used by microcoded instructions to physical registers and allocates physical registers to receive ALU results. The PRF has 180 entries. Up to 38 registers per thread are mapped to architectural or temporary registers, the rest are available for out-of-order renames.

Zen 2, like the Zen/Zen+ microarchitecture, supports move elimination, performing register to register moves with zero latency in the rename unit while consuming no scheduling or execution resources. This is implemented by mapping the destination register to the same physical register as the source register and freeing the physical register previously backing the destination register. Given a chain of move instructions registers can be renamed several times in one cycle. Moves of partial registers such as AL, AH, or AX are not eliminated; they require a register merge operation in an ALU.

Earlier AMD microarchitectures, apparently including Zen/Zen+, recognize zeroing idioms such as XOR-ing a register with itself to eliminate the dependency on the source register. Zen 2 likely inherits this optimization but AMD did not disclose details.

The ALU pipelines carry out integer arithmetic and logical operations and evaluate branches. Each ALU pipeline is headed by a scheduler with a 16-entry micro-op queue. The scheduler tracks the availability of operands and issues up to one micro-op per cycle which is ready for execution, oldest first, to the ALU. Together all schedulers in the core can issue up to 11 micro-ops per cycle, this is not sustainable however due to the available dispatch and retire bandwidth. The PRF or the bypass network supply up to two operands to each pipeline. The bypass network enables back-to-back execution of dependent instructions by forwarding results from the ALUs to the previous pipeline stage. Data from load operations is superforwarded through the bypass network, obviating a write and read of the PRF.

The integer pipelines are asymmetric. All ALUs are capable of all integer operations except multiplies, divides, and CRC which are dedicated to one ALU each. The fully pipelined 64 bit × 64 bit multiply unit has a latency of three cycles. With two destination registers the latency becomes four cycles and throughput one per two cycles. The radix-4 integer divider can compute two result bits per cycle.

The address generation pipelines compute a linear memory address from the operands of a load or store micro-op. That can be a segment, base, and index register, an index scale factor, and a displacement. Address generation is optimized for simple address modes with a zeroed segment register. If two or three additions are required the latency increases by one cycle. Three-operand LEA instructions are also sent to an AGU and have two cycle latency, the result is inserted back into the ALU2 or ALU3 pipeline. Load and store micro-ops stay in the 28-entry address generation queue (AGQ) until they can be issued. The scheduler tracks the availability of operands and free entries in the load or store queue, and issues up to three ready micro-ops per cycle to the address generation units (AGUs). Two AGUs can generate addresses for load operations and send them to the load queue. All three AGUs can generate addresses for store operations and send them to the store queue. Some address checks, e.g. for segment boundary violations, are performed on the side. Store data is supplied by an integer or floating point ALU.

AMD did not disclose the rationale for adding a third store AGU. Three-way address generation may be necessary to realize the potential of the load-store unit to perform two 256-bit loads and one store per cycle. Remarkably Intel made the same improvements, doubling the L1 cache bandwidth and adding an AGU, in the Haswell microarchitecture.

Floating Point Unit[edit]

Like the Zen/Zen+ microarchitecture, the Zen 2 floating point unit utilizes a coprocessor architectural model comprising a dedicated rename unit, a single 4-issue, out-of-order scheduler, a 160-entry physical register file (PRF), and four execution pipelines. The in-order retire queue is shared with the integer unit. The FPU handles x87, MMX, SSE, and AVX instructions. FP loads and stores co-opt the EX unit for address calculations and the LS unit for memory accesses.

In the Zen/Zen+ microarchitecture the floating point physical registers, execution units, and data paths are 128 bits wide. For efficiency AVX-256 instructions which perform the same operation on the 128-bit upper and lower half of a YMM register are decoded into two macro-ops which pass through the FPU individually as execution resources become available and retire together. Accordingly the peak throughput is four SSE/AVX-128 instructions or two AVX-256 instructions per cycle.

Zen 2 doubles the width of the physical registers, execution units, and data paths to 256 bits. The L1 data cache bandwidth was doubled to match. The number of micro-ops issued by the FP scheduler remains four, implying most AVX-256 instructions decode to a single macro-op which conserves queue entries and reduces pressure on RCU and scheduling resources. AMD did not disclose how the FPU was restructured. Die shots suggest two execution blocks splitting the PRF and FP ALUs, one operating on the lower 128 bits of a YMM register, executing x87, MMX, SSE, and AVX instructions, the other on the upper 128 bits for AVX-256 instructions. This improvement doubles the peak throughput of AVX-256 instructions to four per cycle, or in other words, up to 32 FLOPs/cycle in single precision or up to 16 FLOPs/cycle in double precision. Another improvement reduces the latency of double-precision vector multiplications from 4 to 3 cycles, equal to the latency of single-precision multiplications. The latency of fused multiply-add (FMA) instructions remains 5 cycles.
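
To relate these figures: with two 256-bit FMA pipes and a 5-cycle FMA latency, a loop needs on the order of ten independent accumulators in flight to keep both pipes busy. The sketch below, in C with AVX2/FMA intrinsics (assuming compilation with -mavx2 -mfma), unrolls a dot product across eight independent chains; note that this particular kernel is ultimately limited by the two 256-bit loads per cycle rather than by the FMA pipes, so approaching the full 32 single-precision FLOPs per cycle requires kernels with more register reuse, such as blocked matrix multiplication. The function name and unroll factor are illustrative.

```c
#include <immintrin.h>

/* Dot product with eight independent FMA accumulator chains so that
 * consecutive FMAs do not wait on each other's 5-cycle latency. */
static float dot_product(const float *a, const float *b, long n) {
    __m256 acc[8];
    for (int j = 0; j < 8; j++)
        acc[j] = _mm256_setzero_ps();

    for (long i = 0; i + 64 <= n; i += 64) {            /* 64 floats per iteration */
        for (int j = 0; j < 8; j++) {
            __m256 x = _mm256_loadu_ps(a + i + 8 * j);
            __m256 y = _mm256_loadu_ps(b + i + 8 * j);
            acc[j] = _mm256_fmadd_ps(x, y, acc[j]);     /* 8 mul + 8 add per instruction */
        }
    }

    /* Reduce the eight partial sums; the scalar tail (n % 64) is omitted. */
    __m256 s = acc[0];
    for (int j = 1; j < 8; j++)
        s = _mm256_add_ps(s, acc[j]);
    float tmp[8], total = 0.0f;
    _mm256_storeu_ps(tmp, s);
    for (int j = 0; j < 8; j++)
        total += tmp[j];
    return total;
}
```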

The rename unit receives up to four macro-ops per cycle from dispatch. It maps x87, MMX, SSE, and AVX architectural registers and temporary registers used by microcoded instructions to physical registers. The floating point control/status register is renamed as well. MMX and x87 registers occupy the lowest 64 or 80 bits of a PR. The Zen/Zen+ rename unit allocates two 128-bit PRs for each YMM register. Only one PR is needed for SSE and AVX-128 instructions, the upper half of the destination YMM register in another PR remains unchanged or is zeroed, respectively, which consumes no execution resources. (SSE instructions behave this way for compatibility with legacy software which is unaware of, and does not preserve, the upper half of YMM registers through library calls or interrupts.) Zen 2 allocates a single 256-bit PR and tracks in the register allocation table (RAT) if the upper half of the YMM register was zeroed. This necessitates register merging when SSE and AVX instructions are mixed and the upper half of the YMM register contains non-zero data. To avoid this the AVX ISA exposes an SSE mode where the FPU maintains the upper half of YMM registers separately. Zen 2 handles transitions between the SSE and AVX mode by microcode which takes approximately 100 cycles in either direction. Zeroing the upper half of all YMM registers with the VZEROUPPER or VZEROALL instruction before executing SSE instructions prevents the transition.
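
A minimal sketch of sidestepping this roughly 100-cycle transition with intrinsics is shown below; the function names are hypothetical, and compilers targeting AVX normally insert VZEROUPPER automatically around calls into code that may use legacy SSE.

```c
#include <immintrin.h>

/* Scale an array with 256-bit AVX, then hand control to code that may
 * contain legacy SSE instructions without triggering the microcoded
 * SSE/AVX mode transition. */
void scale_then_call_legacy(float *dst, const float *src, float factor,
                            long n, void (*legacy_sse_fn)(void)) {
    __m256 vf = _mm256_set1_ps(factor);
    for (long i = 0; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i, _mm256_mul_ps(_mm256_loadu_ps(src + i), vf));

    _mm256_zeroupper();   /* VZEROUPPER: upper YMM halves are now known to be zero */
    legacy_sse_fn();      /* legacy SSE code runs without a transition penalty */
}
```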

Zen 2 inherits the move elimination and XMM register merge optimizations from its predecessors. Register to register moves are performed by renaming the destination register and do not occupy any scheduling or execution resources. XMM register merging occurs when SSE instructions such as SQRTSS leave the upper 64 or 96 bits of the destination register unchanged, causing a dependency on previous instructions writing to this register. AMD family 15h and later processors can eliminate this dependency in a sequence of scalar FP instructions by recording in the RAT if those bits were zeroed. By setting a Z-bit in the RAT the rename unit also tracks if an architectural register was completely zeroed. All-zero registers are not mapped, the zero data bits are injected at the bypass network which conserves power and PRF entries, and allows for more instructions in flight. Earlier AMD microarchitectures, apparently including Zen/Zen+, recognize zeroing idioms such as XORPS combining a register with itself and eliminate the dependency on the source register. AMD did not disclose if or how this is implemented in Zen 2. Family 16h processors recognize zeroing idioms in the instruction decode unit and set the Z-bit on the destination register in the floating point rename unit, completing the operation without consuming scheduling or execution resources.

As in the Zen/Zen+ microarchitecture the 64-entry non-scheduling queue decouples dispatch and the FP scheduler. This allows dispatch to send operations to the integer side, in particular to expedite floating point loads and store address calculations, while the FP scheduler, whose capacity cannot be arbitrarily increased for complexity reasons, is busy with higher latency FP operations. The 36-entry out-of-order scheduler issues up to four micro-ops per cycle to the execution pipelines. A 160-entry physical register file holds the speculative and committed contents of architectural and temporary registers. The PRF has 8 read ports and 4 write ports for ALU results, each 256 bits wide in Zen 2, and two additional write ports supporting up to two 256-bit load operations per cycle, up from two 128-bit loads in Zen. The load convert (LDCVT) logic converts data to the internal register file format. The FPU is capable of superforwarding load data to dependent instructions through the bypass network, obviating a write and read of the PRF. The bypass network enables back-to-back execution of dependent instructions by forwarding results from the FP ALUs to the previous pipeline stage.

The floating point pipelines are asymmetric, each supporting a different set of operations, and the ALUs are grouped in domains, to conserve die space and reduce signal path lengths which permits higher clock frequencies. The number of execution resources available reflects the density of different instruction types in x86 code. Each pipe receives two operands from the PRF or bypass network. Pipes 0 and 1 support FMA instructions which take three operands; the third operand is obtained by borrowing a PRF read port from pipe 3, stalling this pipe for one cycle, unless the operand is available on the bypass network. All pipes can perform logical operations. Other operations are supported by one, two, or three pipes. The execution domains distinguish vector integer operations, floating point operations, and store operations. Instructions consuming data produced in another domain incur a one cycle penalty. Only pipe 2 executes floating point store-data micro-ops, sending data to the store queue at a rate of up to one 256-bit store per cycle, up from one 128-bit store in Zen. It also takes care of transfers between integer and FP registers on dedicated data busses.

As in Zen/Zen+, the Zen 2 FPU handles denormal floating-point values natively; this can still incur a small penalty in some instances (MUL/DIV/SQRT).

Load-Store Unit[edit]

The load-store unit handles memory reads and writes. The width of data paths and buffers doubled from 128 bits in the Zen/Zen+ microarchitecture to 256 bits.

The LS unit contains a 44-entry load queue (LDQ) which receives load operations from dispatch through either of the two load AGUs in the EX unit and the linear address of the load computed there. A load op stays in the LDQ until the load completes or a fault occurs. Adding the AGQ depth of 28 entries, dispatch can issue up to 72 load operations at a time. A 48-entry store queue (STQ), up from 44 entries in Zen, receives store operations from dispatch, a linear address computed by any of the three AGUs, and store data from the integer or floating point execution units. A store op likewise stays in the STQ until the store is committed or a fault occurs. Loads and stores are speculative due to branch prediction.

Three largely independent pipelines can execute up to two 256-bit load operations and one 256-bit store per cycle. The load pipes translate linear to physical addresses in parallel with L1 data cache accesses. The LS unit can perform loads and stores out of order. It supports loads bypassing older loads and loads bypassing older non-conflicting stores, observing architectural load and store ordering rules. Store-to-load forwarding is supported when an older store containing all of the load's bytes, with no particular alignment since the Piledriver microarchitecture, is in the STQ and store data has been produced. Memory dependence prediction, to speculatively reorder loads and stores before the physical address has been determined, was introduced by AMD in the Bulldozer microarchitecture.

A two-level translation lookaside buffer (TLB) assists load and store address translation. The fully-associative L1 data TLB contains 64 entries and holds 4-Kbyte, 2-Mbyte, and 1-Gbyte page table entries. The L2 data TLB is a unified 12-way set-associative cache with 2048 entries, up from 1536 entries in Zen, holding 4-Kbyte and 2-Mbyte page table entries, as well as page directory entries (PDEs) to speed up DTLB and ITLB table walks. 1-Gbyte pages are smashed into 2-Mbyte entries but installed as 1-Gbyte entries when reloaded into the L1 TLB.

Two hardware page table walkers handle L2 TLB misses, presumably one serving the DTLB, another the ITLB. In addition to the PDE entries, the table walkers include a 64-entry page directory cache which holds page-map-level-4 and page-directory-pointer entries speeding up DTLB and ITLB walks. Like the Zen/Zen+ microarchitecture, Zen 2 supports page table entry (PTE) coalescing. When the table walker loads a PTE, which occupies 8 bytes in the x86-64 architecture, from memory it also examines the other PTEs in the same 64-byte cache line. If a 16-Kbyte aligned block of four consecutive 4-Kbyte pages is also consecutive and 16-Kbyte aligned in physical address space, with identical page attributes, the four pages are stored into a single TLB entry, greatly improving the efficiency of this cache. This is only done when the processor is operating in long mode.
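
The coalescing condition can be written out compactly. The sketch below assumes the standard x86-64 4-Kbyte PTE layout, with the physical frame number in bits 51:12 and attribute bits elsewhere; exactly which attribute bits the hardware compares has not been disclosed, so the masks are illustrative.

```c
#include <stdint.h>

#define PTE_ADDR_MASK  0x000FFFFFFFFFF000ULL   /* physical frame number, bits 51:12 */
#define PTE_ATTR_MASK  (~PTE_ADDR_MASK)        /* everything else treated as "attributes" */

/* 'pte' points at 4 consecutive PTEs whose first virtual page is 16-KiB aligned. */
static int coalescible(const uint64_t pte[4]) {
    uint64_t base = pte[0] & PTE_ADDR_MASK;
    if (base & 0x3000)                          /* physical block must be 16-KiB aligned */
        return 0;
    for (int i = 1; i < 4; i++) {
        if ((pte[i] & PTE_ADDR_MASK) != base + (uint64_t)i * 4096)
            return 0;                           /* frames must be physically consecutive */
        if ((pte[i] & PTE_ATTR_MASK) != (pte[0] & PTE_ATTR_MASK))
            return 0;                           /* attributes must match */
    }
    return 1;                                   /* four pages can share one TLB entry */
}
```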

The LS unit relies on a 32 KiB, 8-way set associative, write-back, ECC-protected L1 data cache (DC). It supports two loads, if they access different DC banks, and one store per cycle, each up to 256 bits wide. The line width is 64 bytes; cache stores, however, are aligned to a 32-byte boundary. Loads spanning a 64-byte boundary and stores spanning a 32-byte boundary incur a penalty of one cycle. In the Zen/Zen+ microarchitecture 256-bit vectors are loaded and stored as two 128-bit halves; the load and store boundaries are 32 and 16 bytes respectively. Zen 2 can load and store 256-bit vectors in a single operation, but stores must now be 32-byte aligned to avoid the penalty. As in Zen, aligned and unaligned load and store instructions (for example MOVUPS/MOVAPS) provide identical performance.
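
A short illustration with AVX intrinsics of why 32-byte alignment matters for 256-bit stores on Zen 2 follows; the helper names are made up, and aligned_alloc is simply the C11 allocator rather than anything AMD-specific.

```c
#include <immintrin.h>
#include <stdlib.h>

/* Unaligned (VMOVUPS-style) accesses perform the same as aligned ones
 * as long as a load stays within a 64-byte line and a store stays
 * within a 32-byte block; only accesses straddling those boundaries
 * pay the one-cycle penalty. */
void scale_buffer(float *dst, const float *src, float factor, long n) {
    __m256 vf = _mm256_set1_ps(factor);
    for (long i = 0; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(src + i);
        _mm256_storeu_ps(dst + i, _mm256_mul_ps(v, vf));
    }
}

/* Allocating on a 32-byte boundary keeps every 256-bit store within a
 * single 32-byte block, avoiding the straddling penalty altogether. */
float *alloc_avx_buffer(size_t n_floats) {
    size_t bytes = ((n_floats * sizeof(float) + 31) / 32) * 32;  /* C11 aligned_alloc
                                                                    needs a multiple of the alignment */
    return aligned_alloc(32, bytes);
}
```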

The DC load-to-use latency is 4 cycles to the integer unit, 7 cycles to the FPU. The AGUs and LS unit are optimized for simple address generation modes: Base + displacement, base + index, and displacement-only. More complex modes and/or a non-zero segment base increase the latency by one cycle.

The L1 DC tags contain a linear-address-based microtag which allows loads to predict the way holding the requested data before the physical address has been determined, reducing power consumption and bank conflicts. A hardware prefetcher brings data into the L1 DC to reduce misses. The LS unit can track up to 22 in-flight cache misses, these are recorded in the miss address buffer (MAB).

The LS unit supports memory type range register (MTRR) and the page attribute table (PAT) extensions. Write-combining, if enabled for a memory range, merges multiple stores targeting locations within the address range of a write buffer to reduce memory bus utilization. Write-combining is also used for non-temporal store instructions such as MOVNTI. The LS unit can gather writes from 8 different 64-byte cache lines.
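
A minimal sketch of a non-temporal fill that leans on these write-combining buffers is shown below; it uses the standard _mm_stream_si32 intrinsic, which compiles to MOVNTI, followed by an SFENCE. The function itself is illustrative.

```c
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Fill a buffer with non-temporal stores: MOVNTI bypasses the cache and
 * lets the write-combining buffers merge the stores into full 64-byte
 * line writes; SFENCE then makes the buffered data globally visible. */
void fill_buffer_nt(int32_t *dst, int32_t value, size_t count) {
    for (size_t i = 0; i < count; i++)
        _mm_stream_si32((int *)&dst[i], value);   /* non-temporal store, avoids cache pollution */
    _mm_sfence();                                 /* drain the write-combining buffers */
}
```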

Each core benefits from a private 512 KiB, 8-way set associative, write-back, ECC-protected L2 cache. The line width is 64 bytes. The data path between the L1 data or instruction cache and the L2 cache is 32 bytes wide. The L2 cache has a variable load-to-use latency of no less than 12 cycles. Like the L1 cache it also has a hardware prefetcher.

Core Complex[edit]

Zen 2 organizes CPU cores in a core complex (CCX). A CCX comprises four cores sharing a 16 MiB, 16-way set associative, write-back, ECC protected, L3 cache. The L3 capacity doubled over the Zen/Zen+ microarchitecture. The cache is divided into four slices of 4 MiB capacity. Each core can access all slices with the same average load-to-use latency of 39 cycles, compared to 35 cycles in the previous generation. The Zen CCX is a flexible design allowing AMD to omit cores or cache slices in APUs and embedded processors. All Zen 2-based processors introduced as of late 2019 have the same CCX configuration, only the number of usable cores and L3 slices varies by processor model.

The width of an L3 cache line is 64 bytes. The data path between the L3 and L2 caches is 32 bytes wide. AMD did not disclose the size of miss buffers. Processors based on the Zen/Zen+ microarchitecture support 50 outstanding misses per core from L2 to L3, 96 from L3 to memory.

Each CPU core is supported by a private L2 cache. The L3 cache is a victim cache filled from L2 victims of all four cores and exclusive of L2, unless the data in the L3 cache is likely being accessed by multiple cores or is requested by an instruction fetch (non-inclusive hierarchy).

The L3 cache maintains shadow tags for all cache lines of each L2 cache in the CCX. This simplifies coupled fill/victim transactions between the L2 and L3 cache, and allows the L3 cache to act as a probe filter for requests between the L2 caches in the CCX, external probes and, taking advantage of its knowledge that a cache line shared by two or more L2 caches is exclusive to this CCX, probe traffic to the rest of the system. If a core misses in its L2 cache and the L3 cache, and the shadow tags indicate a hit in another L2 cache, a cache-to-cache transfer within the CCX is initiated. CCXs are not directly connected, even if they reside on the same die. Requests leaving the CCX pass through the scalable data fabric on the I/O die.

Zen 2 introduces the AMD64 Technology Platform Quality of Service Extensions which aim for compatibility with Intel Resource Director Technology, specifically CMT, MBM, CAT, and CDP. An AMD-specific PQE-BW extension supports read bandwidth enforcement equivalent to Intel's MBA. The L3 cache is not a last level cache shared by all cores in a package as on Intel CPUs, so each CCX corresponds to one QoS domain. L2 QoS monitoring and enforcement is not supported.

Resources are tracked using a Resource Monitoring ID (RMID) which abstracts an application, thread, VM, or cgroup. For each logical processor only one RMID is active at a given time. CMT (Cache Monitoring Technology) allows an OS, hypervisor, or VMM to measure the amount of L3 cache occupied by a thread. MBM (Memory Bandwidth Monitoring) counts read requests to the rest of the system. CAT (Cache Allocation Technology) divides the L3 cache into a number of logical segments, possibly corresponding to ways, and allows system software to restrict a thread to an arbitrary, possibly empty, set of segments. It should be noted that only a single copy of the data is still stored in the L3 cache even if the data is accessed by threads with mutually exclusive sets. CDP (Code and Data Prioritization) extends CAT by differentiating between sets for code and data accesses. MBA (Memory Bandwidth Allocation) and PQE-BW (Platform Quality of Service Enforcement for Memory Bandwidth) limit the memory bandwidth a thread can consume within its QoS domain. Benefits of QoS include the ability to protect time critical processes from cache intensive background tasks, to reduce contention by scheduling threads according to their resource usage, and to mitigate interference from noisy neighbors in multitenant virtual machines.

Rome, Castle Peak, and Matisse are multi-die designs combining an I/O die tailored for their market and between one and eight identical core complex dies (CCDs), each containing two independent core complexes, a system management unit (SMU), and a global memory interconnect version 2 (GMI2) interface.

The GMI2 interface extends the scalable data fabric from the I/O die to the CCDs, presumably a bi-directional 32-lane IFOP link comparable to the die-to-die links in first and second generation EPYC and Threadripper processors. According to AMD the die-to-die bandwidth increased from 16 B read + 16 B write to 32 B read + 16 B write per fclk.

Inferring its function from earlier Family 17h processors the SMU is a microcontroller which captures temperatures, voltage and current levels, adjusts CPU core frequencies and voltages, and applies local limits in a fast local closed loop and a global loop with a master SMU on the I/O die. The SMUs communicate through the scalable control fabric, presumably including a dedicated single lane IFOP SerDes on each CCD.

Rome[edit]

Rome is the codename for AMD's server chip based on the Zen 2 core. Like the prior generation (Naples), Rome utilizes a chiplet multi-chip package design. Each chip comprises nine dies - one centralized I/O die and eight compute dies. The compute dies are fabricated on TSMC's 7 nm process in order to take advantage of its lower power and higher density. The I/O die, on the other hand, makes use of GlobalFoundries' mature 14 nm process.

The centralized I/O die incorporates eight Infinity Fabric links, 128 PCIe Gen 4 lanes, and eight DDR4 memory channels. The full capabilities of the I/O have not been disclosed yet. Attached to the I/O die are eight compute dies - each with eight Zen 2 cores - for a total of 64 cores and 128 threads per chip.

Die[edit]

Zen 2 CPU core[edit]

  • TSMC 7-nanometer process
  • 13 metal layers[1]
  • 475,000,000 transistors incl. 512 KiB L2 cache and one 4 MiB L3 cache slice[1]
  • Core size incl. L2 cache and one L3 cache slice: 7.83 mm²[1]
  • Core size incl. L2 cache: 3.64 mm² (estimated)

Core Complex Die[edit]

  • TSMC 7-nanometer process
  • 13 metal layers[1]
  • 3,800,000,000 transistors[2]
  • Die size: 74 mm²[2][3]
  • CCX size: 31.3 mm²[4][5]
  • 2 × 16 MiB L3 cache: 2 × 16.8 mm² (estimated)
[Die shot: AMD Zen 2 core complex die (CCD)]

Client I/O Die[edit]

Server I/O Die[edit]

Renoir Die[edit]

[Die shot: Renoir]

All Zen 2 Chips[edit]

Source: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2

Zen 2

2019 AMD 7-nanometre processor microarchitecture

Zen 2 is a computer processor microarchitecture by AMD. It is the successor of AMD's Zen and Zen+ microarchitectures, and is fabricated on the 7 nanometer MOSFET node from TSMC. The microarchitecture powers the third generation of Ryzen processors, known as Ryzen 3000 for the mainstream desktop chips (codename "Matisse"), Ryzen 4000U/H (codename "Renoir") and Ryzen 5000U (codename "Lucienne") for mobile applications, as Threadripper 3000 for high-end desktop systems,[4][5] and as Ryzen 4000G for accelerated processing units (APUs). The Ryzen 3000 series CPUs were released on 7 July 2019,[6][7] while the Zen 2-based Epyc server CPUs (codename "Rome") were released on 7 August 2019.[8] An additional chip, the Ryzen 9 3950X, was released in November 2019.[6]

At CES 2019, AMD showed a Ryzen third-generation engineering sample that contained one chiplet with eight cores and 16 threads.[4] AMD CEO Lisa Su also said to expect more than eight cores in the final lineup.[9] At Computex 2019, AMD revealed that the Zen 2 "Matisse" processors would feature up to 12 cores, and a few weeks later a 16 core processor was also revealed at E3 2019, being the aforementioned Ryzen 9 3950X.[10][11]

Zen 2 includes hardware mitigations to the Spectre security vulnerability.[12] Zen 2-based EPYC server CPUs use a design in which multiple CPU dies (up to eight in total) manufactured on a 7 nm process ("chiplets") are combined with a 14 nm I/O die on each multi-chip module (MCM) package. Using this, up to 64 physical cores and 128 total compute threads (with simultaneous multithreading) are supported per socket. This architecture is nearly identical to the layout of the "pro-consumer" flagship processor Threadripper 3990X.[13] Zen 2 delivers about 15% more instructions per clock than Zen and Zen+,[14][15] the 14- and 12-nm microarchitectures utilized on first and second generation Ryzen respectively.

Both the PlayStation 5 and the Xbox Series X and Series S use chips based on the Zen 2 microarchitecture, with proprietary tweaks and different configurations in each system's implementation than AMD sells in its own commercially available APUs.[16][17]

Design[edit]

Two delidded Zen 2 processors designed with the multi-chip module approach. The CPU on the left (top on mobile) (used for mainstream Ryzen CPUs) uses a smaller, less capable I/O die and up to two CCDs (only one is used on this particular example), while the one on the right (bottom, used for high-end desktop, HEDT, Ryzen Threadripper and server Epyc CPUs) uses a larger, more capable I/O die and up to eight CCDs.

Zen 2 is a significant departure from the physical design paradigm of AMD's previous Zen architectures, Zen and Zen+. Zen 2 moves to a multi-chip module design where the I/O components of the CPU are laid out on their own, separate die, which is also called a chiplet in this context. This separation has benefits in scalability and manufacturability. As physical interfaces don't scale very well with shrinks in process technology, their separation into a different die allows these components to be manufactured using a larger, more mature process node than the CPU dies. The CPU dies (referred to by AMD as core complex dies or CCDs), now more compact due to the move of I/O components onto another die, can be manufactured using a smaller process with fewer manufacturing defects than a larger die would exhibit (since the chance of a die having a defect increases with device (die) size) while also allowing for more dies per wafer. In addition, the central I/O die can service multiple chiplets, making it easier to construct processors with a large number of cores.[13][18][19]

Simplified illustration of the Zen 2 microarchitecture

On the left (top on mobile): Die shot of a Zen 2 Core Complex Die. On the right (bottom): Die shot of a Zen 2 EPYC I/O die.

With Zen 2, each CPU chiplet houses 8 CPU cores, arranged in 2 core complexes (CCXs), each of 4 CPU cores. These chiplets are manufactured using TSMC's 7 nanometer MOSFET node and are about 74 to 80 mm² in size.[18] The chiplet has about 3.8 billion transistors, while the 12 nm I/O die (IOD) is ~125 mm² and has 2.09 billion transistors.[20] The amount of L3 cache has been doubled to 32 MB, with each CCX in the chiplet now having access to 16 MB of L3 compared to the 8 MB of Zen and Zen+.[21] AVX2 performance is greatly improved by an increase in execution unit width from 128-bit to 256-bit.[22] There are multiple variants of the I/O die: one manufactured on GlobalFoundries' 14 nanometer process, and another manufactured using the same company's 12 nanometer process. The 14 nanometer dies have more features and are used for the EPYC Rome processors, whereas the 12 nm versions are used for consumer processors.[18] Both processes have similar feature sizes, so their transistor density is also similar.[23]

AMD's Zen 2 architecture can deliver higher performance at a lower power consumption than Intel's Cascade Lake architecture, with an example being the AMD Ryzen Threadripper 3970X running with a TDP of 140 W in ECO mode delivering higher performance than the Intel Core i9-10980XE running with a TDP of 165 W.[24]

New features[edit]

Feature tables[edit]

CPUs[edit]

CPU features table

APUs[edit]

APU features table

Products[edit]

On 26 May 2019, AMD announced six Zen 2-based desktop Ryzen processors (codenamed "Matisse"). These included 6-core and 8-core variants in the Ryzen 5 and Ryzen 7 product lines, as well as a new Ryzen 9 line that includes the company's first 12-core and 16-core mainstream desktop processors. [29]

The Matisse I/O die is also used as the X570 chipset.

AMD's second generation of Epyc processors, codenamed "Rome", feature up to 64 cores, and were launched on 7 August 2019.[8]

Desktop CPUs[edit]

Model Release date
and price
FabChipletsCores
(threads)
Core config[i]Clock rate (GHz) CacheSocketPCIe lanes
(User accessible+Chipset link)[ii]
Memory
support
TDP
Base Boost L1L2L3
Entry-level
Ryzen 3 3100[30]April 21, 2020
$99
TSMC
7FF
1 × CCD
1 × I/O
4 (8) 2 × 2 3.6 3.9 32 KB inst.
32 KB data
per core
512 KB
per core
16 MB
8 MB per CCX
AM424 (20+4) DDR4-3200
dual-channel
65 W
Ryzen 3 3300X[31]April 21, 2020
$120
1 × 4 3.8 4.3 16 MB
Mainstream
Ryzen 5 3500 November 15, 2019
OEM (West)
Japan ¥16000[32]
TSMC
7FF
1 × CCD
1 × I/O
6 (6) 2 × 3 3.6 4.1 32 KB inst.
32 KB data
per core
512 KB
per core
16 MB
8 MB per CCX
AM424 (20+4) DDR4-3200
dual-channel
65 W
Ryzen 5 3500X[33]October 8, 2019
China ¥1099
32 MB
16 MB per CCX
Ryzen 5 3600[34]July 7, 2019
US $199
6 (12) 3.6 4.2
Ryzen 5 Pro 3600[35]September 30, 2019
OEM
Ryzen 5 3600X[36]July 7, 2019
US $249
3.8 4.4 95 W
Ryzen 5 3600XT[37]July 7, 2020
US $249
4.5
Performance
Ryzen 7 Pro 3700[38]September 30, 2019
OEM
TSMC
7FF
1 × CCD
1 × I/O
8 (16) 2 × 4 3.6 4.4 32 KB inst.
32 KB data
per core
512 KB
per core
32 MB
16 MB per CCX
AM424 (20+4) DDR4-3200
dual-channel
65 W[iii]
Ryzen 7 3700X[40]July 7, 2019
US $329
Ryzen 7 3800X[41]July 7, 2019
US $399
3.9 4.5 105 W
Ryzen 7 3800XT[42]July 7, 2020
US $399
4.7
Enthusiast
Ryzen 9 3900[43]October 8, 2019
OEM
TSMC
7FF
2 × CCD
1 × I/O
12 (24) 4 × 3 3.1 4.3 32 KB inst.
32 KB data
per core
512 KB
per core
64 MB
16 MB per CCX
AM424 (20+4) DDR4-3200
dual-channel
65 W
Ryzen 9 Pro 3900[44]September 30, 2019
OEM
Ryzen 9 3900X[45]July 7, 2019
US $499
3.8 4.6 105 W[iv]
Ryzen 9 3900XT[46]July 7, 2020
US $499
4.7
Ryzen 9 3950X[47]November 25, 2019
US $749
16 (32) 4 × 4 3.5
High-End Desktop (HEDT)
Ryzen Threadripper 3960X[48]November 25, 2019
US $1399
TSMC
7FF
4 × CCD
1 × I/O
24 (48) 8 × 3 3.8 4.5 32 KB inst.
32 KB data
per core
512 KB
per core
128 MB
16 MB per CCX
sTRX464 (56+8) DDR4-3200
quad-channel
280 W[v]
Ryzen Threadripper 3970X[50]November 25, 2019
US $1999
32 (64) 8 × 4 3.7 4.5
Ryzen Threadripper 3990X[51]February 7, 2020
US $3990
8 × CCD
1 × I/O
64 (128) 16 × 4 2.9 4.3 256 MB
16 MB per CCX
Workstation
Ryzen Threadripper Pro 3945WX[52]July 14, 2020
OEM
TSMC
7FF
2 × CCD
1 × I/O
12 (24) 4 × 3 4.0 4.3 32 KB inst.
32 KB data
per core
512 KB
per core
64 MB
16 MB per CCX
sWRX8 128 (120+8) DDR4-3200
octa-channel
280 W
Ryzen Threadripper Pro 3955WX[53]July 14, 2020
OEM
16 (32) 4 × 4 3.9
Ryzen Threadripper Pro 3975WX[54]July 14, 2020
OEM
4 × CCD
1 × I/O
32 (64) 8 × 4 3.5 4.2 128 MB
16 MB per CCX
Ryzen Threadripper Pro 3995WX[55]July 14, 2020
OEM
8 × CCD
1 × I/O
64 (128) 16 × 4 2.7 4.2 256 MB
16 MB per CCX
  1. ^Core Complexes (CCXs) × cores per CCX
  2. ^The chipset itself provides additional user-accessible PCIe lanes and integrated PCIe devices, see AM4 chipsets.
  3. ^Ryzen 7 3700X may consume over 90 W under load.[39]
  4. ^Ryzen 9 3900X and Ryzen 9 3950X may consume over 145 W under load.[39]
  5. ^Ryzen Threadripper 3990X may consume over 490 W under load.[49]

Desktop APUs[edit]

Mobile processors[edit]

Renoir (4000 series)[edit]

Common to all models: SoC fabricated on TSMC 7FF, 9,800 million transistors, 156 mm² die; L1 cache: 32 KB instruction + 32 KB data per core; L2 cache: 512 KB per core; GPU: integrated AMD Radeon Graphics; socket FP6; 16 PCIe lanes (8+4+4); memory support: DDR4-3200 / LPDDR4-4266, dual-channel.

Model | Release date | Cores (threads) | Core config[i] | CPU clock (GHz, base / boost) | L3 cache | GPU config[ii] | GPU clock | Processing power (GFLOPS)[iii] | TDP

Ryzen 3 4300U[57][58] | March 16, 2020 | 4 (4) | 1 × 4 | 2.7 / 3.7 | 4 MB | 320:20:8 (5 CU) | 1400 MHz | 896 | 10–25 W
Ryzen 3 PRO 4450U[59] | May 7, 2020 | 4 (8) | 1 × 4 | 2.5 / 3.7 | 4 MB | 320:20:8 (5 CU) | 1400 MHz | 896 | 10–25 W
Ryzen 5 4500U[60][61] | March 16, 2020 | 6 (6) | 2 × 3 | 2.3 / 4.0 | 8 MB (4 MB per CCX) | 384:24:8 (6 CU) | 1500 MHz | 1152 | 10–25 W
Ryzen 5 4600U[62] | March 16, 2020 | 6 (12) | 2 × 3 | 2.1 / 4.0 | 8 MB (4 MB per CCX) | 384:24:8 (6 CU) | 1500 MHz | 1152 | 10–25 W
Ryzen 5 PRO 4650U[63] | May 7, 2020 | 6 (12) | 2 × 3 | 2.1 / 4.0 | 8 MB (4 MB per CCX) | 384:24:8 (6 CU) | 1500 MHz | 1152 | 10–25 W
Ryzen 5 4680U[64] | April 13, 2021 | 6 (12) | 2 × 3 | 2.1 / 4.0 | 8 MB (4 MB per CCX) | 448:28:8 (7 CU) | 1500 MHz | 1344 | 10–25 W
Ryzen 5 4600HS[65] | March 16, 2020 | 6 (12) | 2 × 3 | 3.0 / 4.0 | 8 MB (4 MB per CCX) | 384:24:8 (6 CU) | 1500 MHz | 1152 | 35 W
Ryzen 5 4600H[66][67] | March 16, 2020 | 6 (12) | 2 × 3 | 3.0 / 4.0 | 8 MB (4 MB per CCX) | 384:24:8 (6 CU) | 1500 MHz | 1152 | 35–54 W
Ryzen 7 4700U[68] | March 16, 2020 | 8 (8) | 2 × 4 | 2.0 / 4.1 | 8 MB (4 MB per CCX) | 448:28:8 (7 CU) | 1600 MHz | 1433.6 | 10–25 W
Ryzen 7 PRO 4750U[69] | May 7, 2020 | 8 (16) | 2 × 4 | 1.7 / 4.1 | 8 MB (4 MB per CCX) | 448:28:8 (7 CU) | 1600 MHz | 1433.6 | 10–25 W
Ryzen 7 4800U[70] | March 16, 2020 | 8 (16) | 2 × 4 | 1.8 / 4.2 | 8 MB (4 MB per CCX) | 512:32:8 (8 CU) | 1750 MHz | 1792 | 10–25 W
Ryzen 7 4980U[71] | April 13, 2021 | 8 (16) | 2 × 4 | 2.0 / 4.4 | 8 MB (4 MB per CCX) | 512:32:8 (8 CU) | 1950 MHz | 1996.8 | 10–25 W
Ryzen 7 4800HS[72] | March 16, 2020 | 8 (16) | 2 × 4 | 2.9 / 4.2 | 8 MB (4 MB per CCX) | 448:28:8 (7 CU) | 1600 MHz | 1433.6 | 35 W
Ryzen 7 4800H[73][74] | March 16, 2020 | 8 (16) | 2 × 4 | 2.9 / 4.2 | 8 MB (4 MB per CCX) | 448:28:8 (7 CU) | 1600 MHz | 1433.6 | 35–54 W
Ryzen 9 4900HS[75] | March 16, 2020 | 8 (16) | 2 × 4 | 3.0 / 4.3 | 8 MB (4 MB per CCX) | 512:32:8 (8 CU) | 1750 MHz | 1792 | 35 W
Ryzen 9 4900H[76] | March 16, 2020 | 8 (16) | 2 × 4 | 3.3 / 4.4 | 8 MB (4 MB per CCX) | 512:32:8 (8 CU) | 1750 MHz | 1792 | 35–54 W
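
The "Processing power (GFLOPS)" column above follows directly from the shader count and GPU clock: each Vega compute unit contains 64 shaders, and each shader can retire one FP32 fused multiply-add (two FLOPs) per clock. A quick sanity check in Python, using values taken from the table (illustrative only, not an official AMD formula):

```python
# Reproduce the "Processing power (GFLOPS)" column of the Renoir table.
# Peak FP32 throughput = shaders x 2 FLOP per clock (one FMA) x clock (GHz).
parts = {
    "Ryzen 3 4300U": (320, 1.400),   # 5 CU x 64 shaders, 1400 MHz
    "Ryzen 5 4500U": (384, 1.500),   # 6 CU, 1500 MHz
    "Ryzen 7 4700U": (448, 1.600),   # 7 CU, 1600 MHz
    "Ryzen 7 4800U": (512, 1.750),   # 8 CU, 1750 MHz
}

for name, (shaders, clock_ghz) in parts.items():
    gflops = shaders * 2 * clock_ghz
    print(f"{name}: {gflops:.1f} GFLOPS")   # 896.0, 1152.0, 1433.6, 1792.0
```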

Lucienne (5000 series)[edit]

Embedded processors[edit]

Server processors[edit]

Common features of these CPUs:

  • Codenamed "Rome"
  • The number of PCI-E lanes: 128
  • Release date: August 7, 2019 except EPYC 7H12 which was released on September 18, 2019
  • Memory support: eight-channel DDR4-3200
Common to all models: 7 nm CCDs; L1 cache: 32 KB instruction + 32 KB data per core; L2 cache: 512 KB per core; socket SP3.

Model | Price | Chiplets | Cores (threads) | Core config[i] | Clock (GHz, base / max boost) | L3 cache | Socket configuration | TDP

EPYC 7232P | US $450 | 2 × CCD + 1 × I/O | 8 (16) | 4 × 2 | 3.1 / 3.2 | 32 MB (8 MB per CCX) | 1P | 120 W
EPYC 7302P | US $825 | 4 × CCD + 1 × I/O | 16 (32) | 8 × 2 | 3.0 / 3.3 | 128 MB (16 MB per CCX) | 1P | 155 W
EPYC 7402P | US $1250 | 4 × CCD + 1 × I/O | 24 (48) | 8 × 3 | 2.8 / 3.35 | 128 MB (16 MB per CCX) | 1P | 180 W
EPYC 7502P | US $2300 | 4 × CCD + 1 × I/O | 32 (64) | 8 × 4 | 2.5 / 3.35 | 128 MB (16 MB per CCX) | 1P | 180 W
EPYC 7702P | US $4425 | 8 × CCD + 1 × I/O | 64 (128) | 16 × 4 | 2.0 / 3.35 | 256 MB (16 MB per CCX) | 1P | 200 W
EPYC 7252 | US $475 | 2 × CCD + 1 × I/O | 8 (16) | 4 × 2 | 3.1 / 3.2 | 64 MB (16 MB per CCX) | 2P | 120 W
EPYC 7262 | US $575 | 4 × CCD + 1 × I/O | 8 (16) | 8 × 1 | 3.2 / 3.4 | 128 MB (16 MB per CCX) | 2P | 155 W
EPYC 7272 | US $625 | 2 × CCD + 1 × I/O | 12 (24) | 4 × 3 | 2.9 / 3.2 | 64 MB (16 MB per CCX) | 2P | 120 W
EPYC 7282 | US $650 | 2 × CCD + 1 × I/O | 16 (32) | 4 × 4 | 2.8 / 3.2 | 64 MB (16 MB per CCX) | 2P | 120 W
EPYC 7302 | US $978 | 4 × CCD + 1 × I/O | 16 (32) | 8 × 2 | 3.0 / 3.3 | 128 MB (16 MB per CCX) | 2P | 155 W
EPYC 7352 | US $1350 | 4 × CCD + 1 × I/O | 24 (48) | 8 × 3 | 2.3 / 3.2 | 128 MB (16 MB per CCX) | 2P | 155 W
EPYC 7402 | US $1783 | 4 × CCD + 1 × I/O | 24 (48) | 8 × 3 | 2.8 / 3.35 | 128 MB (16 MB per CCX) | 2P | 180 W
EPYC 7452 | US $2025 | 4 × CCD + 1 × I/O | 32 (64) | 8 × 4 | 2.35 / 3.35 | 128 MB (16 MB per CCX) | 2P | 155 W
EPYC 7502 | US $2600 | 4 × CCD + 1 × I/O | 32 (64) | 8 × 4 | 2.5 / 3.35 | 128 MB (16 MB per CCX) | 2P | 180 W
EPYC 7532 | US $3350 | 8 × CCD + 1 × I/O | 32 (64) | 16 × 2 | 2.4 / 3.3 | 256 MB (16 MB per CCX) | 2P | 200 W
EPYC 7542 | US $3400 | 4 × CCD + 1 × I/O | 32 (64) | 8 × 4 | 2.9 / 3.4 | 128 MB (16 MB per CCX) | 2P | 225 W
EPYC 7552 | US $4025 | 6 × CCD + 1 × I/O | 48 (96) | 12 × 4 | 2.2 / 3.3 | 192 MB (16 MB per CCX) | 2P | 200 W
EPYC 7642 | US $4775 | 8 × CCD + 1 × I/O | 48 (96) | 16 × 3 | 2.3 / 3.3 | 256 MB (16 MB per CCX) | 2P | 225 W
EPYC 7662 | US $6150 | 8 × CCD + 1 × I/O | 64 (128) | 16 × 4 | 2.0 / 3.3 | 256 MB (16 MB per CCX) | 2P | 225 W
EPYC 7702 | US $6450 | 8 × CCD + 1 × I/O | 64 (128) | 16 × 4 | 2.0 / 3.35 | 256 MB (16 MB per CCX) | 2P | 200 W
EPYC 7742 | US $6950 | 8 × CCD + 1 × I/O | 64 (128) | 16 × 4 | 2.25 / 3.4 | 256 MB (16 MB per CCX) | 2P | 225 W
EPYC 7H12 | (no list price) | 8 × CCD + 1 × I/O | 64 (128) | 16 × 4 | 2.6 / 3.3 | 256 MB (16 MB per CCX) | 2P | 280 W
EPYC 7F32 | US $2100 | 4 × CCD + 1 × I/O | 8 (16) | 8 × 1 | 3.7 / 3.9 | 128 MB (16 MB per CCX) | 1P/2P | 180 W
EPYC 7F52 | US $3100 | 8 × CCD + 1 × I/O | 16 (32) | 16 × 1 | 3.5 / 3.9 | 256 MB (16 MB per CCX) | 1P/2P | 240 W
EPYC 7F72 | US $2450 | 6 × CCD + 1 × I/O | 24 (48) | 12 × 2 | 3.2 / 3.7 | 192 MB (16 MB per CCX) | 1P/2P | 240 W
  1. ^Core Complexes (CCX) × cores per CCX

Video game consoles[edit]

Gallery[edit]

  • Infrared die shot of the I/O Die

  • Zen 2 Core Complex Die (CCD)

  • AMD EPYC 7702 server processor.

  • A delidded AMD 7702 featuring 8 CCDs, with remains of the solder thermal interface material (TIM) on the chiplets.

See also[edit]

References[edit]

  1. ^"AMD Unleashes Ultimate PC Gaming Platform with Worldwide Availability of AMD Radeon RX 5700 Series Graphics Cards and AMD Ryzen 3000 Series Desktop Processors" (Press release). Santa Clara, California: Advanced Micro Devices, Inc. 7 July 2019. Retrieved 7 November 2020.
  2. ^Larabel, Michael (16 May 2017). "AMD Talks Up Vega Frontier Edition, Epyc, Zen 2, ThreadRipper". Phoronix. Retrieved 16 May 2017.
  3. ^ abCutress, Ian (20 June 2017). "AMD EPYC Launch Event Live Blog". AnandTech. Retrieved 21 June 2017.
  4. ^ abCutress, Ian (9 January 2019). "AMD Ryzen third Gen 'Matisse' Coming Mid 2019: Eight Core Zen 2 with PCIe 4.0 on Desktop". AnandTech. Retrieved 15 January 2019.
  5. ^online, heise. "AMD Ryzen 3000: 12-Kernprozessoren für den Mainstream". c't Magazin.
  6. ^ abLeather, Antony. "AMD Ryzen 9 3900X and Ryzen 7 3700X Review: Old Ryzen Owners Look Away Now". Forbes. Retrieved 19 September 2019.
  7. ^"AMD Ryzen 3000 CPUs launching July 7 with up to 12 cores". PCGamesN. Retrieved 28 May 2019.
  8. ^ ab"2nd Gen AMD EPYC Processors Set New Standard for the Modern Datacenter with Record-Breaking Performance and Significant TCO Savings". AMD. 7 August 2019. Retrieved 8 August 2019.
  9. ^Hachman, Mark (9 January 2019). "AMD's CEO Lisa Su confirms ray tracing GPU development, hints at more 3rd-gen Ryzen cores". Retrieved 15 January 2019.
  10. ^Cutress, Ian (26 May 2019). "AMD Ryzen 3000 Announced: Five CPUs, 12 Cores for $499, Up to 4.6 GHz, PCIe 4.0, Coming 7/7". Retrieved 3 July 2019.
  11. ^Thomas, Bill (10 June 2019). "AMD announces the Ryzen 9 3950X, a 16-core mainstream processor". Retrieved 3 July 2019.
  12. ^Alcorn, Paul (31 January 2018). "AMD Predicts Double-Digit Revenue Growth In 2018, Ramps Up GPU Production". Tom's Hardware. Retrieved 31 January 2018.
  13. ^ abShilov, Anton (6 November 2018). "AMD Unveils 'Chiplet' Design Approach: 7nm Zen 2 Cores Meet 14 nm I/O Die".
  14. ^Cutress, Ian. "AMD Zen 2 Microarchitecture Analysis: Ryzen 3000 and EPYC Rome". www.anandtech.com.
  15. ^Walton, Steven (16 November 2020). "AMD Ryzen 5000 IPC Performance Tested". TechSpot. Retrieved 18 April 2021.
  16. ^Warren, Tom (24 February 2020). "Microsoft reveals more Xbox Series X specs, confirms 12 teraflops GPU". The Verge. Retrieved 24 February 2020.
  17. ^Leadbetter, Richard (18 March 2020). "Inside PlayStation 5: the specs and the tech that deliver Sony's next-gen vision". Eurogamer. Retrieved 18 March 2020.
  18. ^ abcCutress, Ian (10 June 2019). "AMD Zen 2 Microarchitecture Analysis: Ryzen 3000 and EPYC Rome". AnandTech. p. 1. Retrieved 17 June 2019.
  19. ^De Gelas, Johan (7 August 2019). "AMD Rome Second Generation EPYC Review: 2x 64-core Benchmarked". AnandTech. Retrieved 29 September 2019.
  20. ^Alcorn, Paul (21 November 2019). "AMD Ryzen 9 3900X and Ryzen 7 3700X Review: Zen 2 and 7nm Unleashed". Tom's Hardware.
  21. ^Cutress, Ian (10 June 2019). "AMD Zen 2 Microarchitecture Analysis: Ryzen 3000 and EPYC Rome". AnandTech. Retrieved 17 June 2019.
  22. ^Cutress, Ian (10 June 2019). "AMD Zen 2 Microarchitecture Analysis: Ryzen 3000 and EPYC Rome". AnandTech. Retrieved 17 June 2019.
  23. ^Schor, David (22 July 2018). "VLSI 2018: GlobalFoundries 12nm Leading-Performance, 12LP".
  24. ^Mujtaba, Hassan (24 December 2019). "AMD Ryzen Threadripper 3970X Is An Absolutely Efficient Monster CPU".
  25. ^"AMD Zen 2 CPUs Come With A Few New Instructions - At Least WBNOINVD, CLWB, RDPID - Phoronix". www.phoronix.com.
  26. ^"GNU Binutils Adds Bits For AMD Zen 2's RDPRU + MCOMMIT Instructions - Phoronix". www.phoronix.com.
  27. ^btarunr (12 June 2019). "AMD Zen 2 has Hardware Mitigation for Spectre V4". TechPowerUp. Retrieved 18 October 2019.
  28. ^Agner, Fog. "Surprising new feature in AMD Ryzen 3000". Agner's CPU blog.
  29. ^Cutress, Ian (26 May 2019). "AMD Ryzen 3000 Announced: Five CPUs, 12 Cores for $499, Up to 4.6 GHz, PCIe 4.0, Coming 7/7". AnandTech. Retrieved 17 June 2019.
  30. ^"AMD Ryzen 3 3100 Desktop Processor". AMD.
  31. ^"AMD Ryzen 3 3300X Desktop Processor". AMD.
  32. ^"AMD Launches Ryzen 5 3500 in Japan with 6 Cores/6 Threads for 16K Yen". hardwaretimes.com. 17 February 2020.
  33. ^Cutress, Ian (8 October 2019). "AMD Brings Ryzen 9 3900 and Ryzen 5 3500X To Life". AnandTech.com.
  34. ^"AMD Ryzen 5 3600 Desktop Processor". AMD.
  35. ^"AMD Ryzen 5 PRO 3600 Processor". AMD.
  36. ^"AMD Ryzen 5 3600X Processor". AMD.
  37. ^"AMD Ryzen 5 3600XT". AMD.
  38. ^"AMD Ryzen 7 PRO 3700 Processor". AMD.
  39. ^ ab"Tom's Hardware Ryzen 9 3950X review". Tom's Hardware. Retrieved 12 May 2020.
  40. ^"AMD Ryzen 7 3700X". AMD.
  41. ^"AMD Ryzen 7 3800X". AMD.
  42. ^"AMD Ryzen 7 3800XT". AMD.
  43. ^"AMD Ryzen 9 3900 specifications". CPU World.
  44. ^"AMD Ryzen 9 PRO 3900 Processor". AMD.
  45. ^"AMD Ryzen 9 3900X Processor". AMD.
  46. ^"AMD Ryzen 9 3900XT Processor". AMD.
  47. ^"AMD Ryzen 9 3950X Processor". AMD.
  48. ^"AMD Ryzen Threadripper 3960X Processor". AMD.
  49. ^"Kitguru AMD Ryzen Threadripper 3990X CPU Review". KitGuru. Retrieved 12 May 2020.
  50. ^"AMD Ryzen Threadripper 3970X Processor". AMD.
  51. ^"AMD Ryzen Threadripper 3990X Processor". AMD.
  52. ^"AMD Ryzen Threadripper PRO 3945WX". AMD.
  53. ^"AMD Ryzen Threadripper PRO 3955WX". AMD.
  54. ^"AMD Ryzen Threadripper PRO 3975WX". AMD.
  55. ^"AMD Ryzen Threadripper PRO 3995WX". AMD.
  56. ^ abcdefghijkl"AMD Ryzen 4000 Series Desktop Processors with AMD Radeon Graphics Set to Deliver Breakthrough Performance for Commercial and Consumer Desktop PCs".
  57. ^"AMD Ryzen 3 4300U". AMD.
  58. ^"AMD Ryzen 3 4300U Specs". TechPowerUp. Retrieved 17 September 2021.
  59. ^"AMD Ryzen 3 PRO 4450U". AMD.
  60. ^"AMD Ryzen 5 4500U". AMD.
  61. ^"AMD Ryzen 5 4500U Specs". TechPowerUp. Retrieved 17 September 2021.
  62. ^"AMD Ryzen 5 4600U". AMD.
  63. ^"AMD Ryzen 5 PRO 4650U". AMD.
  64. ^CoveMiner. "Surface Laptop 4 processors technical overview - Surface". docs.microsoft.com. Retrieved 14 April 2021.
  65. ^"AMD Ryzen 5 4600HS". AMD.
  66. ^"AMD Ryzen 5 4600H". AMD.
  67. ^"AMD Ryzen 5 4600H Specs". TechPowerUp. Retrieved 17 September 2021.
  68. ^"AMD Ryzen 7 4700U". AMD.
  69. ^"AMD Ryzen 7 PRO 4750U". AMD.
  70. ^"AMD Ryzen 7 4800U". AMD.
  71. ^CoveMiner. "Surface Laptop 4 processors technical overview - Surface". docs.microsoft.com. Retrieved 14 April 2021.
  72. ^"AMD Ryzen 7 4800HS". AMD.
  73. ^"AMD Ryzen 7 4800H". AMD.
  74. ^"AMD Ryzen 7 4800H Specs". TechPowerUp. Retrieved 17 September 2021.
  75. ^"AMD Ryzen 9 4900HS". AMD.
  76. ^"AMD Ryzen 9 4900H". AMD.
  77. ^"AMD Ryzen™ 3 5300U". AMD.
  78. ^"AMD Ryzen™ 5 5500U". AMD.
  79. ^"AMD Ryzen 5 5500U Specs". TechPowerUp. Retrieved 17 September 2021.
  80. ^"AMD Ryzen™ 7 5700U". AMD.
  81. ^ abcd"Embedded Processor Specifications". AMD.
  82. ^ abcd"Product Brief: AMD Ryzen™ Embedded V2000 Processor Family"(PDF).
  83. ^"AMD Unveils AMD Ryzen™ Embedded V2000 Processors with Enhanced Performance and Power Efficiency". AMD.
Source: https://en.wikipedia.org/wiki/Zen_2

AMD May Be Preparing New Zen 2 CPUs. But Why?

AMD's Zen 3-based Ryzen 5000 (Vermeer) processors are among the best CPUs currently on the market. However, it would seem that the chipmaker has an excess of leftover Zen 2 dies as a new USB-IF listing (via Komachi_Ensaka) has exposed three unreleased Zen 2 processors.

The submission mentions the Athlon Gold 4100GE, Ryzen 5 4500 and Ryzen 3 4100 processors with the A1 revision. We don't know for certain that the trio of AMD chips is wielding Zen 2 cores, but since AMD is using the Ryzen 5000 branding for Zen 3 products, it's unlikely that the chipmaker would ship Zen 3 cores under a 4000-series moniker.

Given the model names, the unannounced AMD processors could be a refresh of their Ryzen 3000 counterparts. There's also the possibility that the processors are special SKUs for OEMs, and we know how AMD likes producing custom-tailored chips for its partners.

Starting with the Athlon Gold 4100GE, the processor could be a follow-up for the Athlon Gold 3150GE, which is an OEM APU. AMD's Athlon Gold SKUs feature integrated Vega graphics solutions so the Athlon Gold 4100GE shouldn't be an exception. While we don't know the core count or clock speeds for the APU, the GE denomination tells us that the Athlon Gold 4100GE is restricted to a 35W TDP (thermal design power).

On the other hand, it's reasonable to assume that the Ryzen 5 4500 and Ryzen 3 4100 are the direct successors to the Ryzen 5 3500 and Ryzen 3 3100, respectively. For reference, the Ryzen 5 3500 is a hexa-core chip, while the Ryzen 3 3100 is a quad-core part. Both feature Zen 2 cores, adhere to a 65W TDP and lack integrated graphics. We suspect that the Ryzen 5 4500 and Ryzen 3 4100 will inherit the majority of their predecessors' traits, but plausibly sport higher clock speeds.

It's unknown when AMD submitted the entry to the USB-IF, but it's more than enough evidence that the chipmaker has been preparing the three processors. Perhaps, the chipmaker will launch them silently soon, but only time will tell.

Source: https://www.tomshardware.com/news/amd-may-be-preparing-new-zen-3-cpus-but-why
AMD 3rd Gen Ryzen architecture deep dive
Just prior to hosting its Next Horizon E3 Livestream event, AMD treated press and industry analysts to additional deep-dive technical details regarding its next-generation Zen 2 CPU and Navi GPU microarchitectures. The actual product details disclosed during the E3 Livestream, which included 16-core Ryzen 9 3950X and Radeon RX 5700 XT specifications and pricing, are covered in our news stories here (Ryzen 9 3950X) and here (Radeon RX 5700/5700 XT).
ryzen 3000 family

In this piece, however, we're going to dig a little deeper and cover the new aspects of AMD's Zen 2 architecture, and what it means for the company's Ryzen 3000 series family of processors. All of the Navi and RDNA-related juicy details as they pertain to the upcoming Radeon RX 5700 series of graphics cards will be made available in a separate article. Between all of the articles, we'll cover the vast majority of products and technologies AMD will be unleashing on the enthusiast community in early July.

As most of you probably know, Zen 2 is the microarchitecture at the foundation of the forthcoming AMD Ryzen 3000 series of processors. Zen 2 is the next evolution of the Zen microarchitecture that debuted with the original Ryzen processors back in early 2017. Zen was further refined and optimized for the current family of 2nd Gen Ryzen processors based on Zen+, but Zen 2 is the true, next-gen microarchitecture AMD will be leveraging in its newest line-up of Ryzen CPUs.

zen2 overview
zen2 integer execution

AMD has made a number of enhancements with Zen 2 in an effort to improve everything from IPC (instructions per clock) and single-thread performance, to multi-thread scaling, latency, and efficiency/power. The company has made claims that IPC has been improved 15% generation over generation (Zen vs. Zen 2), thanks to better branch prediction, higher integer throughput, and reduced effective latency to memory. These gains are over and above the frequency and power benefits inherent to the processor's more advanced 7 nm manufacturing process.
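
As a rough illustration of how the quoted IPC gain stacks with clock speed (the clocks below are the boost specifications of the Ryzen 7 2700X and Ryzen 7 3800X, used purely as an example and not figures from AMD's presentation):

```python
# Illustrative only: how an IPC uplift compounds with a clock-speed bump.
ipc_gain = 1.15            # ~15% IPC uplift claimed for Zen 2
zen_plus_boost_ghz = 4.3   # Ryzen 7 2700X (Zen+) max boost
zen2_boost_ghz = 4.5       # Ryzen 7 3800X (Zen 2) max boost

speedup = ipc_gain * (zen2_boost_ghz / zen_plus_boost_ghz)
print(f"Estimated single-thread speedup: {speedup:.2f}x")   # ~1.20x
```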

zen2 fetch

Although Zen 2 builds upon the successes of Zen, many changes have been made to the CPU cores. The updated cores have a new TAGE (Tagged Geometry) branch predictor, in addition to improved instruction pre-fetching and a re-optimized L1 cache structure with double the micro-op cache capacity. The new TAGE branch predictor is able to make selections with better accuracy and granularity and is able to manage longer histories for workloads where that is important. The L1 instruction cache has actually been halved to 32K, but it is now 8-way associative. The L2 cache remains 512K per core, and is 8-way associative as well. The Zen 2 architecture features more L1 and L2 BTB (Branch Target Buffer) entries and a larger 1K indirect target array as well.

zen2 load store

AMD has also increased load/store bandwidth (2 loads and 1 store per cycle, with a 48-entry store queue, up from 44), Zen 2 has a larger rename space with 180 registers (up from 168), and another Address Generation Unit (AGU) has been added, bringing the total number of AGUs to 3. Zen 2 can better utilize available CPU resources for increased SMT (simultaneous multi-threading) performance, and it offers a wider, 6 micro-op dispatch as well.

zen2 fpu

AMD has also significantly beefed up Zen 2's floating-point capabilities. Zen 2 doubles FP performance and load/store bandwidth (from 128-bit to 256-bit), features 2 × 256-bit FMACs (built as 4 pipes: 2 FADD and 2 FMUL), and offers single-op support for AVX-256 instructions. The architecture has also been optimized to reduce contention in integer execution.
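
To put the wider FPU in concrete terms, here is a back-of-the-envelope estimate of per-core peak FP32 throughput, assuming both 256-bit FMA pipes are kept fully busy (an idealized figure, with the 3800X boost clock used only as an example):

```python
# Idealized peak FP32 throughput of one Zen 2 core.
fp32_lanes_per_pipe = 256 // 32    # 8 lanes per 256-bit pipe
fma_pipes = 2                      # 2 x 256-bit FMAC units
flops_per_fma = 2                  # multiply + add
clock_ghz = 4.5                    # e.g. Ryzen 7 3800X boost clock

flops_per_cycle = fp32_lanes_per_pipe * fma_pipes * flops_per_fma   # 32
print(f"{flops_per_cycle} FP32 FLOP/cycle, "
      f"~{flops_per_cycle * clock_ghz:.0f} GFLOPS per core")
# Zen/Zen+ cores, with 128-bit units, peak at half as many FLOPs per cycle.
```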

All of this means, when it comes to serious heavy-lifting in multithreaded and math-intensive workloads, Ryzen 3000 processors will offer significant gains over AMD's previous generation chips, in addition to their previously noted straight-up IPC lift. However, to coin a phrase, but wait there's more...
Source: https://hothardware.com/reviews/amd-zen-2-architecture-explained


AMD Zen 2 specs, price and release date: all about AMD's newest processor tech

Over the last couple years, AMD has been releasing some of the best processors (CPUs) on the market, and it doesn’t look like it plans to slow down any time soon. Back at CES 2019, AMD announced its Zen 2 architecture, cutting the manufacturing process down to 7 nanometers (nm), and offering greater performance and efficiency. 

Then, at Computex 2019, AMD pulled the veil off of its Ryzen 3rd Generation processors. These chips took advantage of the smaller Zen 2 manufacturing process, bringing a 12-core, 24-thread processor to the mainstream market at less than half the cost of Intel’s 12-core HEDT chip. 

And, if that wasn't enough, Microsoft took the stage at its E3 2019 keynote, announcing that the system-on-a-chip powering the next Xbox, Project Scarlett, is using Zen 2 cores and AMD Navi graphics.

Zen 2 is indeed on a roll. That’s without mentioning the AMD Ryzen Threadripper 3rd Generation that includes the AMD Ryzen Threadripper 3990X that’s coming out very soon. And, even with the next generation Ryzen 4000 that AMD revealed at CES 2020 looming on the horizon, these chips are likely to stay relevant well into the near future.

There is so much more to Zen 2, so we have to dive in and explore everything that this 7nm CPU architecture can do. Be sure to keep this page bookmarked as we’ll keep the article updated with all the latest information.

Cut to the chase

  • What is it? AMD's 7nm CPU architecture
  • When is it out? Out since July 7, 2019
  • How much is it? Starting at $199 (about £160, AU$290)

AMD Zen 2 release date

The AMD Ryzen 3rd Generation processors hit the streets on July 7. These chips are the first consumer-ready processors based on the 7nm Zen 2 architecture, and are also the most affordable. And they’ve finally been followed up with Ryzen 4000 chips for laptops.

We also know that AMD Ryzen Threadripper 3rd Generation’s first two processors, the Threadripper 3960X and the Threadripper 3970X, came out on November 25, 2019, despite rumors that they may be delayed until 2020. Meanwhile, the Threadripper 3990X, with its ridiculous 64 cores, has been slated for release on February 7.

The final Zen 2 product will probably be in the next-generation consoles. We now know that alongside a Navi GPU, a bespoke 8-core AMD Zen 2 chipset will be inside the PS5. However, the PS5 won’t be out until late 2020. Similarly, the next Xbox will be touting a custom-designed AMD processor that’s based on Zen 2, along with an AMD Navi GPU. That console also won’t be out until late 2020. We’ll probably see both the next gen consoles release around November 2020.

AMD Zen 2 price

Now that most of the AMD Ryzen 3000 chips and a couple of Threadripper 3rd-generation processors are out, we have information on their respective prices. Below are how much each Ryzen 3rd-generation chip costs:

  • AMD Ryzen 9 3950X: $749 (about £570, AU$1,070)
  • AMD Ryzen 9 3900X: $499 (about £390, AU$720)
  • AMD Ryzen 7 3800X: $399 (about £310, AU$580)
  • AMD Ryzen 7 3700X: $329 (about £260, AU$480)
  • AMD Ryzen 5 3600X: $249 (about £200, AU$360)
  • AMD Ryzen 5 3600: $199 (about £160, AU$290)

The AMD Threadripper 3rd-generation chips that are either out now or have been revealed will set you back as follows:

  • AMD Ryzen Threadripper 3960X: $1,399 (about £1,070, AU$2,000)
  • AMD Ryzen Threadripper 3970X: $1,999 (about £1,525, AU$2,860)
  • AMD Ryzen Threadripper 3990X: $3,990 (about £3,050, AU$5,715)

It will be interesting to see, however, if the massive boost to technology will see the next generation consoles get a price bump. With all the lofty technology Microsoft and Sony are promising, we wouldn’t be surprised if these consoles are more expensive than previous generations.

AMD Zen 2 specs and performance

With the move to 7nm, the biggest improvements are to power efficiency. AMD Ryzen 3rd Generation processors see power requirements greatly decrease, which should result in lower temperatures, better overclocking and, of course, lower power bills.

For instance, the AMD Ryzen 7 3700X only has a 65W TDP, which is extremely low for an 8-core, 16-thread processor. It’s also capable of delivering raw performance that would take other processors much more power to equal.

As for core counts, the chiplets containing the physical cores have shrunk for Zen 2, which means that each processor can fit more cores. This hasn’t been implemented on most of the lineup, as the Ryzen 7 processors still have 8-cores. 

However, the AMD Ryzen 9 3900X boasts 12 cores and 24 threads while the Ryzen 9 3950X, which is already breaking world overclocking records thanks to this die shrink, boasts 16 cores and 32 threads.

Beyond core counts, Zen 2 allows for better performance overall. Not only do clock speeds see an improvement – up to 4.6GHz on the Ryzen 9 3900X and up to 4.7GHz on the Ryzen 9 3950X out of the box – but also a massive boost to IPC (instructions per clock) performance. AMD engineers have apparently squeezed an extra 15% IPC out of Zen 2 cores.

As far as the HEDT processors, they already boast up to a whopping 64 cores and 128 threads, a major boost from the previous generation’s up to 32 cores and 64 threads. Well, that is, when the Threadripper 3990X comes out in February. The Threadripper 3960X starts lower, however, with 24 cores and 48 threads, while the Threadripper 3970X equals Ryzen 2000’s 32 cores and 64 threads.

We’ll keep this page updated, especially as soon as we get more information regarding the next Ryzen 3000 and Threadripper 3rd-generation releases, so keep this page bookmarked.

Images Credit: TechRadar


Source: https://www.techradar.com/news/amd-zen-2

News Posts matching #Zen 2


by btarunr

AMD disrupted a decade of $350 quad-core from Intel with its path-breaking Ryzen processor and the "Zen" microarchitecture, which enters 5th year in the market (5 years since tapeout). AMD went into the Ryzen processor launch as a company that had been written off in the CPU space by PC enthusiasts, and "Zen" was at best expected to give AMD another round of processors to sell around $250. Boy was everyone wrong. The Ryzen 7 1800X eight-core processor brought HEDT-levels of performance to the mainstream desktop form-factor, and its HEDT counterpart, the Threadripper, dominated Intel's Core X series ever since.

Intel's first response to the 1800X was a 50% increase in CPU core counts calculating that AMD would only see marginal IPC increases going forward, and the superior IPC of "Skylake" cores, along with a 6-core/12-thread setup in the Core i7-8700K would see things through. This is roughly when Intel faced severe supply shortages that spiraled prices out of control, giving AMD space to come out with the Ryzen 7 2700X with a 4% IPC increase, and improved multi-threaded performance, but more importantly, predictable pricing at around $330. Months later, Intel refreshed its lineup with the 9th Gen, and finally attained parity with AMD in core counts, with the Core i9-9900K.

by btarunr

Cybersecurity researchers Saidgani Musaev and Christof Fetzer with the Dresden Technology University discovered a novel method of forcing illegal data-flow between microarchitectural elements on AMD processors based on the "Zen+" and "Zen 2" microarchitectures, titled "Transient Execution of Non-canonical Accesses." The method was discovered in October 2020, but the researchers followed responsible-disclosure norms, giving AMD time to address the vulnerability and develop a mitigation. The vulnerability is chronicled under CVE-2020-12965 and AMD Security Bulletin ID "AMD-SB-1010."

The one-line summary of this vulnerability from AMD reads: "When combined with specific software sequences, AMD CPUs may transiently execute non-canonical loads and store using only the lower 48 address bits, potentially resulting in data leakage." The researchers studied this vulnerability on three processors, namely the EPYC 7262 based on "Zen 2," and Ryzen 7 2700X and Ryzen Threadripper 2990WX, based on "Zen+." They mention that all Intel processors that are vulnerable to MDS attacks "inherently have the same flaw." AMD is the subject of the paper as AMD "Zen+" (and later) processors are immune to MDS as demonstrated on Intel processors. AMD developed a mitigation for the vulnerability, which includes ways of patching vulnerable software.
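
For context on what "non-canonical" means here: with 48-bit virtual addressing on x86-64, bits 63 through 48 of a valid (canonical) address must all be copies of bit 47. If a load transiently honors only the lower 48 address bits, a non-canonical pointer can alias a legitimate canonical address, which is the kind of illegal data flow the researchers describe. A small illustration of that aliasing (not an exploit), assuming 48-bit virtual addresses:

```python
# Illustration of 48-bit truncation and x86-64 canonical addressing.
LOW48 = (1 << 48) - 1

def is_canonical(addr: int) -> bool:
    # Bits 63..47 must be all zeros or all ones (17 bits including bit 47).
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1

canonical = 0x00007FFFDEADB000        # a valid user-space address
non_canonical = 0x42427FFFDEADB000    # garbage in the upper 16 bits

print(is_canonical(canonical), is_canonical(non_canonical))   # True False
# Truncated to 48 bits, the two addresses become indistinguishable:
print((canonical & LOW48) == (non_canonical & LOW48))          # True
```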

Find the security research paper here (PDF), and the AMD security bulletin here. AMD's mitigation blueprint can be accessed here.

by Raevenlord

Valve's Steam Deck announcement took the gaming world by storm last week, as the announcement of a Valve-designed portable gaming console packing an AMD Zen 2 CPU with RDNA2 cores set collective imaginations on fire. However, as is the case for any recent gaming hardware launches, expect the Steam Deck to be hard to come by - demand for a mainstream portable, Switch-like console that promises to enable AAA-gaming on the go is apparently sky-high, despite the fact that some portable devices exploring the same concept have been available for a while now, such as the AYA Neo (which even packs two extra Zen 2 cores) and the Intel-based One XPlayer.

As is the case for any recent hardware launch that garners enough mainstream attention (looking at you, current-gen GPUs and consoles), a lopsided demand-supply ratio is a playground for unscrupulous types looking to make a profit at the expense of other people's impatience. And it sure is happening already - eBay listings for "pre-order confirmed" Steam Deck variants are already being set at €4,324 (roughly $4,989) - though we'd say they're tentatively set at that ludicrous pricing. It seems that the current median asking price sits around the $900 mark for the 512 GB SSD-equipped variant. Tentative or not, this just goes to show that the new normal is for launched products to be actively targeted by scalpers - more now than ever before.

by btarunr

Valve today announced its first big splash into the console market with Steam Deck, a device out to eat the Nintendo Switch's lunch. The announcement comes as yet another feather in AMD's cap for its semi-custom SoC business, which benefits from AMD being the only company with both an x86-64 CPU license and cutting-edge graphics hardware IP. Built on the 7 nm node at TSMC, the semi-custom chip at the heart of the Steam Deck is designed for extended gameplay on battery, and is a monolithic piece of silicon that combines CPU, GPU, and core logic.

The yet-unnamed semi-custom chip features a 4-core/8-thread CPU based on the "Zen 2" microarchitecture, with a nominal clock speed of 2.40 GHz, and up to 3.50 GHz boost. The CPU component offers an FP32 throughput of 448 GFLOP/s. The GPU is based on AMD's latest RDNA2 graphics architecture—the same one powering the Xbox Series X, PlayStation 5, and Radeon RX 6900 XT—and is comprised of 8 RDNA2 compute units (512 stream processors). The GPU operates at an engine clock speed of 1.10 GHz to 1.60 GHz, with peak compute power of 1.6 TFLOP/s. The silicon uses a unified memory interface, and a cutting-edge LPDDR5 memory controller.
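
The quoted throughput figures can be reproduced with simple arithmetic (illustrative only): each Zen 2 core peaks at 32 FP32 FLOPs per cycle via its two 256-bit FMA pipes, and each RDNA2 shader retires one FMA (two FLOPs) per clock.

```python
# Reproducing the quoted peak-throughput figures for the Steam Deck SoC.
cpu_cores, cpu_boost_ghz = 4, 3.5
cpu_gflops = cpu_cores * 32 * cpu_boost_ghz        # 32 FP32 FLOP/cycle/core
print(f"CPU peak: {cpu_gflops:.0f} GFLOPS")        # 448 GFLOPS

gpu_shaders, gpu_boost_ghz = 8 * 64, 1.6           # 8 CUs x 64 shaders
gpu_tflops = gpu_shaders * 2 * gpu_boost_ghz / 1000
print(f"GPU peak: ~{gpu_tflops:.2f} TFLOPS")       # ~1.64 TFLOPS
```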

by btarunr

Indian PC components retailer PrimeABGB started listing pre-built desktops based on the AMD 4700S Desktop Kit, a PC motherboard based on harvested PlayStation 5 SoCs with their iGPUs disabled. These are semi-custom SoCs originally bound for Sony, which didn't make the cut, as their iGPUs were found defective.

It appears that the desktop, which PrimeABGB is selling for the equivalent of $600, is integrated in-house by the retailer, and the other parts that make up the build are of a quality comparable to the ones large OEMs cram in their $600 desktops. These include a SilverStone Sugo 13 Mesh case, an Antec Atom 450 W PSU, a 120 GB SATA 6 Gbps SSD, and a GeForce GT 710 handling graphics on par with basic iGPU solutions. What you're getting, though, is an 8-core/8-thread "Zen 2" CPU that's highly capable for productivity tasks, and hardwired 16 GB memory.

by btarunr

Back in May, pictures surfaced of a curious-looking Micro-ATX motherboard featuring a so-called "AMD 4700S" SoC. At the heart of these boards were an SoC not unlike the one that powers the Xbox Series X, except that the integrated GPU is completely disabled, with no onboard display outputs. The board is very likely a means for AMD to harvest Xbox Series X/S SoCs with broken iGPUs. It now turns out that the board is an official AMD product, named "AMD 4700S 8-core Processor Desktop Kit."

The board provides an 8-core/8-thread CPU based on the "Zen 2" microarchitecture, no iGPU, but a PCI-Express x16 slot that's electrically PCI-Express 2.0 x4, a handful of USB 2.0 and USB 3.0 ports, two SATA 6 Gbps ports, onboard 1 GbE LAN and 6-channel HD audio. The SoC comes with its own unspecified amount of onboard memory in the form of hardwired DDR4 memory chips surrounding it; there are no additional memory slots. The Xbox Series X SoC features a 256-bit wide memory bus, so it will be interesting to see if AMD has maximized it. AMD didn't reveal pricing or availability information, although the way this is marketed, the board will very likely be available in the retail channel.


by Raevenlord

AMD has revealed the Infinity Cache size for the upcoming Navi 23 GPU, as well as its absence in the next-generation Van Gogh APU, which features Zen 2 cores and an RDNA GPU. The reveal comes via a new patch done by AMD to the AMKFD, a Linux kernel HSA driver for AMD APUs. The patch file doesn't list Infinity Cache per se, but does clarify the last-level cache for AMD's GPUs - L3, which is essentially the same.

The patch reveals the L3 size for Sienna Cichlid (Navi 21), Navy Flounder (Navi 22), and Dimgrey Cavefish (Navi 23). Navi 21 features 128 × 1024 KB (128 MB) of Infinity Cache, the just-released Navi 22 has 96 MB, as we know, and according to the file, Navi 23 is bound to feature 32 MB of it. Considering that Van Gogh lacks an Infinity Cache, it would seem that it's making use of previous-gen Navi graphics, and won't leverage RDNA2, of which the Infinity Cache is a big part. It remains to be seen if Van Gogh will materialize in an APU product lineup or if it's a specific part for a customer. It also remains to be seen which RX product Navi 23 will power - an RX 6600 series part, or a 6500 series one.

by AleksandarK

AMD is slowly preparing to launch its next-generation client-oriented accelerated processing unit (APU), which is AMD's way of denoting a CPU+GPU combination. The future design is codenamed after Van Gogh, showing AMD's continuous use of historic names for their products. The APU is believed to be a design similar to the one found in the SoC of the latest PlayStation 5 and Xbox Series X/S consoles. That means that there are Zen 2 cores present along with the latest RDNA 2 graphics, side by side in the same processor. Today, one of AMD's engineers posted a boot log of the quad-core Van Gogh APU engineering sample, showing some very interesting information.

The boot log contains information about the memory type used in the APU. In the logs, we see a part that says "[drm] RAM width 256bits DDR5", which means that the APU has an interface for the DDR5 memory and it is 256-bit wide, which represents a quad-channel memory configuration. Such a wide memory bus is typically used for applications that need lots of bandwidth. Given that Van Gogh uses RDNA 2 graphics, the company needs a sufficient memory bandwidth to keep the GPU from starving for data. While we don't have much more information about it, we can expect to hear greater details soon.
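
For a rough sense of what a 256-bit DDR5 interface means for bandwidth: peak theoretical bandwidth is the bus width in bytes multiplied by the transfer rate. The boot log does not state the memory speed, so the figure below assumes DDR5-4800 purely for illustration:

```python
# Peak theoretical bandwidth of a 256-bit memory interface.
bus_width_bits = 256
transfer_rate_mts = 4800   # DDR5-4800; assumed for illustration only

bandwidth_gb_s = (bus_width_bits / 8) * transfer_rate_mts / 1000
print(f"~{bandwidth_gb_s:.1f} GB/s")   # ~153.6 GB/s
```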

by btarunr

AMD, in its "Where Gaming Begins Episode 3" online event, announced that it is introducing Smart Access Memory (resizable base address register) support to Ryzen 3000 series "Matisse" processors, based on the "Zen 2" microarchitecture. These exclude the Ryzen 3 3200G and Ryzen 5 3400G. The PCI-SIG-standardized feature was, until now, restricted to the Ryzen 5000 series on the AMD platform, although it is widely available on the Intel platform. Resizable BAR enables the CPU to see the graphics card's entire dedicated memory as one addressable block, rather than through 256-megabyte apertures. For game engines that are able to take advantage of the feature, this could translate to a performance boost of up to 16 percent. Be on the lookout for BIOS updates from your motherboard manufacturer.
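
One way to verify that the larger BAR is actually exposed after such a BIOS update is to inspect the GPU's PCI memory regions. On Linux they can be read from sysfs; the sketch below is a minimal example that assumes a hypothetical GPU at PCI address 0000:0a:00.0 (adjust to your system) and prints each region's size. With resizable BAR active, one region should span the card's entire VRAM rather than the traditional 256 MiB aperture.

```python
# Print the size of each PCI memory region of a GPU (hypothetical address).
# /sys/bus/pci/devices/<addr>/resource has one "start end flags" line per
# region; the size of a populated region is end - start + 1.
from pathlib import Path

gpu = Path("/sys/bus/pci/devices/0000:0a:00.0/resource")   # adjust address

for i, line in enumerate(gpu.read_text().splitlines()):
    start, end, _flags = (int(field, 16) for field in line.split())
    if end:   # skip unused regions
        print(f"region {i}: {(end - start + 1) / 2**20:,.0f} MiB")
# With Smart Access Memory enabled, expect one region the size of the whole
# VRAM (e.g. 8192 MiB) instead of a 256 MiB aperture.
```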

by AleksandarK

When AMD and Sony collaborated on making the next-generation console chip, AMD internally codenamed it Flute, while Sony codenamed it Oberon or Ariel. This PlayStation 5 SoC die has today been pictured thanks to Fritzchens Fritz, and we get a closer look at the die internals. Featuring eight of AMD's Zen 2 cores that can reach frequencies of up to 3.5 GHz, the CPU is paired with a 36 CU GPU based on RDNA 2 technology. The GPU is capable of running at speeds of up to 2.23 GHz. The SoC accommodates all of that hardware, along with the I/O to connect it all.

When tearing down the console, the heatsink and the SoC are connected by liquid metal, which is used to achieve the best possible heat transfer between two surfaces. Surrounding the die there is a small amount of material used to prevent liquid metal (a conductive material) from possibly spilling and shorting some components. Using a special short wave infrared light (SWIR) microscope, we can take a look at what is happening under the hood without destroying the chip. And really, there are a few distinct areas that are highlighted by the Twitter user @Locuza. As you can see, the die has special sectors with the CPU complex and a GPU matrix with plenty of workgroups and additional components for raytracing.

by AleksandarK

AMD is always in development mode: just when it launches a new product, the company is already gearing up for the next generation of devices. Just a few months ago, back in November, AMD launched its Zen 3 core, and today we get to hear about the next steps that the company is taking to stay competitive and grow its product portfolio. From the AnandTech interview with Dr. Lisa Su and The Street interview with Rick Bergman, the EVP of AMD's Computing and Graphics Business Group, we have gathered information about AMD's plans for Zen 4 core development and RDNA 3 performance targets.

Starting with Zen 4, AMD plans to migrate to the AM5 platform, bringing the new DDR5 and USB 4.0 protocols. The current aim of Zen 4 is to be extremely competitive against competing products and to bring many IPC improvements. Just like Zen 3 used many small advances in cache structures, branch prediction, and pipelines, Zen 4 is aiming to achieve a similar thing with its debut. The mature state of the x86 architecture offers little room for improvement in any single area; however, when advancements are made in many places, they add up quite well, as we saw with the 19% IPC improvement of Zen 3 over the previous-generation Zen 2 core. As the new core will use TSMC's advanced 5 nm process, there is a possibility of even more cores inside the CCX/CCD complexes. We are expecting to see Zen 4 sometime close to the end of 2021.

by Uskompuf

VideoCardz has recently received a render of the upcoming AMD Ryzen 5000 Cezanne APU which is expected to be unveiled next week. The Zen 3 Cezanne APUs support up to 8 cores and 16 threads just like Zen 2 Renoir APUs. The Cezanne APU should support up to 8 graphics cores and 20 PCIe lanes, it is currently unknown whether these lanes will be PCIe 3.0 or PCIe 4.0. The Cezanne die appears to be ~10% larger than Renoir which comes from the larger Zen 3 core design and a larger L3 cache of 16 MB. The new Ryzen 5000H Cezanne series processors are expected to be announced by AMD next week and will power upcoming low and high power laptops.

by AleksandarK

The launch of AMD's next-generation mobile processors is just around the corner, with an expected debut at the virtual CES event in the beginning of 2021. The Cezanne lineup, as it is called, is based on AMD's latest Zen 3 core, which brings many IPC improvements along with better frequency scaling thanks to the refined architecture design. Today, we get to see just how much the new Cezanne generation brings to the table thanks to a GeekBench 5 submission. In the test system, a Ryzen 5 5600H mobile processor was used, found inside a Xiaomi Mi Notebook and paired with 16 GB of RAM.

As a reminder, the AMD Ryzen 5 5600H is a six-core, twelve-thread processor. So how does the performance look? In the single-core test, the Zen 3 core scored 1372 points, while the multi-threaded result was 5713 points. Compared to the equivalent last-generation Zen 2 based "Renoir" part, the Ryzen 5 4600H, the new design is about 37% faster in single-threaded and about 14% faster in multi-threaded workloads. We are waiting for the announcement to see the complete AMD Cezanne lineup and the designs it will bring.

by Raevenlord

An investigative, generation-upon-generation review from golem.de paints an extremely impressive picture for AMD's efforts in iterating upon their original Zen architecture. While the first generation Zen achieved a sorely needed inflection point in the red team's efforts against arch-rival Intel and its stranglehold in the high-performance CPU market, AMD couldn't lose its vision on generational performance improvements on pain of being steamrolled (pun intended) by the blue giant's sheer scale and engineering prowess. However, perhaps this is one of those showcases of "small is nimble", and we're now watching Intel slowly changing its posture, crushed under its own weight, so as to offer actual competition to AMD's latest iteration of the Zen microarchitecture.

The golem.de review compares AMD's Zen, Zen+, Zen 2 and Zen 3 architectures, represented by the Ryzen 7 1800X, Ryzen 7 2700X, Ryzen 7 3800X and Ryzen 7 5800X CPUs. Through it, we see a generational performance increase that mostly exceeds the 20% performance points across every iteration of Zen when it comes to both gaming and general computing workloads. This generational improvement hits its (nowadays) most expressive result in that AMD's Ryzen 7 5800X manages to deliver 89% higher general computing, and 84% higher gaming performance than the company's Zen-based Ryzen 7 1800X. And this, of course, ignoring performance/watt improvements that turn the blue giant green with envy.

by btarunr

AMD's Ryzen Threadripper Pro line of HEDT/workstation processors was a nothingburger for the DIY PC crowd, as it was launched exclusively through Lenovo for its ThinkStation P620 line of workstations. These processors are a step up from the retail Threadripper 3000 series, as they feature the full 8-channel DDR4 memory interface and 128 PCI-Express Gen 4 lanes of the "Zen 2" based "Rome" MCM. The retail Threadripper 3000 chips only feature a quad-channel (4-channel) memory interface.

GIGABYTE has developed a custom server motherboard based on the AMD WRX80 chipset that drives the Lenovo ThinkStation P620. The new WRX80 SU8 motherboard by GIGABYTE features a single sWRX8 CPU socket, supporting Threadripper Pro processors up to the Threadripper Pro 3995WX. It features seven PCI-Express 4.0 x16 slots, three 64 Gbps U.2 ports, two M.2-22110 slots, and eight DDR4 DIMM slots, each with its own dedicated memory channel. GIGABYTE also used the lavish PCIe budget of this platform to give the board dual 10 GbE interfaces. The board also comes with an ASPEED IPMI remote management chip. GIGABYTE is a server vendor, and this board's unveiling could hint at the likelihood of AMD opening up availability of the Threadripper Pro to other OEM vendors, ending Lenovo's exclusivity.

by btarunr

CD Projekt RED released updated PC system requirements lists for "Cyberpunk 2077," which will hopefully release before the year 2077. There are a total of seven user experience grades, split into conventional raster 3D graphics, and with raytracing enabled. The bare minimum calls for at least a GeForce GTX 780 or Radeon RX 480; 8 GB of RAM, Core i3 "Sandy Bridge" or AMD FX "Bulldozer," and 64-bit Windows 7. The 1080 High grade needs at least a Core i7 "Haswell" or Ryzen 3 "Raven Ridge" processor, 12 GB of RAM, GTX 1060 6 GB or GTX 1660 Super or RX 590 graphics. The 1440p Ultra grade needs the same CPUs as 1080p High, but with steeper GPU requirements of at least an RTX 2060 or RX 5700 XT.

The highest sans-RT grade, 4K UHD Ultra, needs either the fastest i7-4790 "Haswell" or Ryzen 5 "Zen 2" processor, RTX 2080 Super or RTX 3070, or Radeon RX 6800 graphics. Things get interesting with the three lists for raytraced experience. 1080p Medium raytraced needs at least an RTX 2060; 1440p High raytraced needs an RTX 3070, and 4K UHD Ultra raytraced needs at least a Core i7 "Skylake" or Ryzen 5 "Zen 2" chip, and RTX 3080 graphics. All three raytraced presets need 16 GB of RAM. Storage requirements across the board are 70 GB, and CDPR recommends the use of an SSD. What's interesting here is that neither the RX 6800 nor RX 6800 XT make it to the raytraced list (despite the RX 6800 finding mention in the non-raytraced lists). PC Gamer reports that Cyberpunk 2077 will not enable raytracing on Radeon RX 6800 series at launch. CDPR, however, confirmed that it is working with AMD to optimize the game for RDNA2, and should enable raytracing "soon."

Press Release by btarunr

AMD today launched a new product in its high-performance Embedded processor family, the AMD Ryzen Embedded V2000 Series processor. Built on the innovative 7 nm process technology, "Zen 2" cores and high-performance AMD Radeon graphics, the AMD Ryzen Embedded V2000 Series provides a new class of performance with 7 nm technology, incredible power efficiency and continues to deliver enterprise-class security features for embedded customers.

The AMD Embedded Ryzen V2000 family is designed for embedded applications such as Thin Client, MiniPC and Edge systems. Equipped with up to eight CPU cores and seven GPU compute units, a single AMD Ryzen Embedded V2000 Series processor provides 2x the multi-threaded performance-per-watt, up to 30 percent better single-thread CPU performance and up to 40 percent better graphics performance over the previous generation. For customers and applications that need high-performance display capabilities, the Ryzen Embedded V2000 series can power up to four independent displays in 4K resolution.

by btarunr

There's been much chatter in the social media about a new piece of AMD APU silicon, codenamed "Lucienne." It's being rumored that "Lucienne" is a refresh of the current-generation "Renoir" silicon, and is an APU with eight "Zen 2" CPU cores and eight "Vega" NGCUs. One of the first SKUs based on the die is the Ryzen 7 5700U, which surfaced on the AoTS benchmark database.

The 5700U is possibly a 15 W ultra-portable processor, and according to the AoTS benchmark screenshot, it comes with an 8-core/16-thread CPU (the 4700U is 8-core/8-thread). The addition of SMT helps the 5700U shore up much of its performance lead over the 4700U. It also turns out that the Ryzen 5000 will see two APU dies driving AMD's product-stack, with "Lucienne" powering the Ryzen 5 5500U and Ryzen 7 5700U; while the newer "Cezanne" die, which introduces "Zen 3" CPU cores, powers the Ryzen 5 5600U and the Ryzen 7 5800U.

by AleksandarK

Currently, Intel's best silicon manufacturing process available to desktop users is its 14 nm node, specifically the 14 nm+++ variant, which features several enhancements so it can achieve higher frequencies and faster gate switching. Compare that to AMD's best, a Ryzen 3000 series processor based on the Zen 2 architecture and built on TSMC's 7 nm node, and you would think AMD is at a clear advantage there. Well, it only sort of is. German hardware overclocker and hacker der8auer decided to see how one production-level silicon compares to another, and he put it to the test. He used Intel's Core i9-10900K processor and compared it to AMD's Ryzen 9 3950X under a scanning electron microscope (SEM).

First, der8auer took both chips and detached them from their packages; then he proceeded to grind them as much as possible so SEM could do its job of imaging the chips sans the substrate and protective barrier. This was followed by securing the chips to a sample holder using an electrically conductive adhesive to improve penetration of the high energy electrons from the SEM electron gun. To get as fair a comparison as possible, he used the L2 cache component of both processors as they are usually the best representatives of a node. This happens because the logic portion of the chip differs according to architecture; hence, level two cache is used to get a fair comparison - it's design is much more standardized.

by btarunr

ClockTuner for Ryzen (CTR) by Yuri "1usmus" Bubliy is an evolution of the DRAM Calculator for Ryzen utility. The utility goes beyond the functionality of the DRAM Calculator - which finds the most precise memory settings for Ryzen processors - and does your homework for Ryzen CPU overclocking. Optimized for processors based on the "Zen 2" microarchitecture, CTR has been designed for both Socket AM4 and sTRX4 (Threadripper) processors, and Linus Tech Tips in its announcement video of CTR demonstrated the tool's prowess in squeezing out a neat 10% performance gain for their Threadripper 3960X processor. Besides CPU and memory settings, the tool performs stability testing and benchmarking. 1usmus expects to release CTR 1.0 in September 2020.

by btarunr

A May 2020 report put together with info from multiple sources pointed towards AMD's client-segment product roadmap going as far into the future as 2022. The roadmap was partial, with a few missing bits. VideoCardz attempted to reconstruct the roadmap based on new information from one of the primary sources of the May leak, @MeibuW. According to the roadmap, 2020 will see AMD debut its 4th Gen Ryzen "Vermeer" desktop processors featuring "Zen 3" CPU cores, built on TSMC N7e or N7P silicon fabrication process, and offering PCIe Gen 4. The "Renoir" APU silicon combining up to 8 "Zen 2" CPU cores with a 512-SP "Vega" iGPU debuted on the mobile platform, and recently launched on the desktop platform as an OEM-exclusive. It remains to be seen if AMD launches this in the DIY retail channel.

2021 is when three new codenames from AMD get some air-time. "Warhol" is codename for the 5th Gen Ryzen part that succeeds "Vermeer." Interestingly, it too is shown as a combination of "Zen 3" CPU cores, PCIe Gen 4, and 7 nm. Perhaps AMD could innovate in areas such as DRAM (switch to PC DDR5), and maybe increase core counts. DDR5 could herald a new socket, after 4 years of AM4. The second silicon bound for 2021 is "Van Gogh," an APU that combines "Zen 2" CPU cores with an RDNA2 iGPU. Interestingly, "Cezanne," bound for the same year, has the opposite CPU+iGPU combination - a newer gen "Zen 3" CPU component, and an older gen "Vega" iGPU. The two chips could target different markets, looking at their I/O, with "Van Gogh" supporting LPDDR5 memory.

by btarunr

Microsoft in its Hot Chips 32 presentation detailed the SoC at the heart of the upcoming Xbox Series X entertainment system. The chip mostly uses AMD IP blocks, and is built on TSMC N7e (enhanced 7 nm) process. It is a 360.4 mm² die with a transistor count of 15.3 billion. Microsoft spoke about the nuts and bolts of the SoC, including its largest component - the GPU based on AMD's new RDNA2 graphics architecture. The GPU takes up much of the chip's die area, and has a raw SIMD throughput of 12 TFLOP/s. It meets DirectX 12 Ultimate logo requirements, supporting hardware-accelerated ray-tracing.

The RDNA2 GPU powering the Xbox Series X SoC features 52 compute units spread across 26 RDNA2 dual compute units. The silicon itself physically features two additional dual CUs (taking the total physical CU count to 56), but these are disabled (possibly to provide harvesting headroom). We've detailed the first-generation RDNA architecture in the "architecture" pages of our first AMD Radeon RX 5000-series "Navi" graphics card reviews, which explain much of the SIMD-level innovation from AMD that helps it drive a massive SIMD IPC gain over the previous-generation GCN architecture. This hierarchy is largely carried over to RDNA2, but with the addition of a few SIMD-level components.

by AleksandarK

Last year, in November of 2019, a startup company called NUVIA Inc. broke out of stealth mode and revealed itself to the public. Focused on "re-imagining silicon", the company is led by some of the brightest minds in the semiconductor industry. Gerard Williams III, the CEO of the company, previously served as chief CPU architect at Apple and spent over 10 years at Arm before that, while Manu Gulati and John Bruno serve as senior vice presidents of silicon and system engineering, respectively. Together they form a company full of well-known industry names, and there are more, which you can check out on this page.

NUVIA Inc. promises to deliver only the best performance and to "re-imagine silicon", as it says. Today, we got some bold claims from the company regarding the performance of its upcoming Phoenix SoC. Using Geekbench 5, the company has provided some simulated results of how the Phoenix SoC will perform. Because it uses the Arm ISA, the SoC can run at very low power and still achieve good performance. NUVIA has run some simulations and expects its Phoenix SoC to be 40-50% faster in single-threaded performance than Zen 2/Sunny Cove at just a third of the power - 33 percent of the power, to be precise. In the graph below, NUVIA has placed its SoC only in the 5 W range; however, the company said that it has left the upper curve to be disclosed at a later date, meaning that the SoC will likely also compete in high-performance markets and at higher power targets. While these claims are to be taken with a grain of salt, it is now a waiting game to see how NUVIA realizes its plans.
NUVIA Inc. logo. NUVIA Phoenix SoC performance projection.

by btarunr

Intel will launch its 11th Generation Core "Tiger Lake" mobile processors on September 2. The company sent out invites to a "virtual event" to be held on that date, which will be webcast to the public. On that day, several major notebook manufacturers are expected to unveil their next-generation devices based on the new processors. "Tiger Lake" is an important product launch for Intel as it marks the commercial debut of its ambitious Xe graphics architecture as the chip's Gen12 integrated graphics solution. In related news, Intel's chief architect for Xe, Raja Koduri, is expected to lead a webcast on August 13, where he will provide an update on his team's work.

The processors also debut the "Willow Cove" CPU cores that offer increased IPC over current "Sunny Cove" and "Skylake" cores, which will play a big role in closing the performance gap against the 8-core "Zen 2" processors by AMD based on the "Renoir" silicon. "Tiger Lake" is also expected to be one of the final front-line mobile processors by Intel to feature only one kind of CPU cores, as the company is expected to go big on Hybrid core technology with its future microarchitectures.
Source: https://www.techpowerup.com/news-tags/Zen%202

