Apple M2 Pro vs AMD 3950X
A few days ago, I got my hands on a brand new MacBook Pro with an M2 Pro CPU/GPU. As everyone raves about the performance of the new ARM-based Apple Silicon chips, I was very curious to test it. So I benchmarked it against my Windows PC built around an AMD Ryzen 3950X, which was top notch three years ago.
It is almost impossible to compare the performance of the ARM and AMD chips in isolation from their surrounding environment: the OS running on the machines, the libraries provided to software developers and the compilers are all different. So, I will present the mandatory Geekbench, then the results of running applications that I know well and use almost every day. After all, the most important metric is how well applications run on a machine.
This article will begin with a quick reminder of both architectures, but if you are here for the numbers, you can jump straight to them.
The contenders, a quick overview
AMD Ryzen 3950X
The Ryzen 3950X consists of 16 “Zen2” cores that are grouped in units of four around 16MiB of L3 cache. Each of those groups is called a CCX, and two CCXs are built into a die to form a CCD, or chiplet. The 3950X consists of two chiplets that are linked together by a fast interconnect, and connected to the rest of the system (and the RAM) via a separate I/O die that provides two memory controllers. My machine is equipped with 64GiB of DDR4-3200, for a maximum theoretical bandwidth of 51.2 GB/s.
High level overview of the 3950X architecture
The chiplets and IO dies are grouped into the same package
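The theoretical peak bandwidth of dual-channel DDR4-3200 mentioned above can be derived from the transfer rate and the bus width. Here is a minimal sketch, assuming two 64-bit channels (the usual layout on this kind of desktop platform); the function names are mine and purely illustrative.

```python
# Peak theoretical bandwidth of a DDR memory configuration.
# Assumptions (not measured): two channels, each 64 bits (8 bytes) wide.

def peak_bandwidth_gb_s(mega_transfers_per_s: int,
                        channels: int = 2,
                        bus_width_bits: int = 64) -> float:
    """Return the peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_second = mega_transfers_per_s * 1_000_000 * bytes_per_transfer * channels
    return bytes_per_second / 1e9


def gb_to_gib(gigabytes: float) -> float:
    """Convert decimal gigabytes (10^9 bytes) to binary gibibytes (2^30 bytes)."""
    return gigabytes * 1e9 / 2**30


if __name__ == "__main__":
    ddr4_3200 = peak_bandwidth_gb_s(3200)
    print(f"DDR4-3200, dual channel: {ddr4_3200:.1f} GB/s "
          f"({gb_to_gib(ddr4_3200):.1f} GiB/s)")
```

Note that the figure is expressed in decimal gigabytes per second; the same quantity is a bit smaller when expressed in GiB/s. It is also an upper bound that real workloads never quite reach.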
This chiplet approach is very clever. Small chiplets are easier to produce than a big monolithic design, and AMD can address different markets by embedding a varying number of them. For instance, the EPYC chips offered for servers can support up to 8 chiplets for a total of 64 cores!
But there is a caveat. Inside a chiplet, 4 cores can quickly exchange data through their shared L3 cache. Talking to the 4 other cores of the same chiplet is a bit slower, and talking to the 8 cores in the other chiplet is slower still.
There is no integrated GPU in the 3950X, so all the GPU computation takes place on an external GPU: an NVIDIA GeForce RTX 2070 Super in my case. It has a fixed amount (8GiB) of high speed dedicated RAM, but communications with the CPU occur with a higher latency than what is possible with integrated GPUs.
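The three tiers of core-to-core communication described above can be sketched as a small toy model. The core numbering (0 to 15, filled CCX by CCX) is an assumption made for illustration; the numbering actually exposed by an operating system may differ, and the model says nothing about real latencies.

```python
# Toy model of the 3950X topology: 2 chiplets x 2 groups (CCX) x 4 cores.
# Assumption: cores are numbered 0-15, group by group. This is illustrative only.

CORES_PER_CCX = 4
CCX_PER_CHIPLET = 2
CORES_PER_CHIPLET = CORES_PER_CCX * CCX_PER_CHIPLET


def locate(core: int) -> tuple[int, int]:
    """Return (chiplet index, global CCX index) for a core number."""
    return core // CORES_PER_CHIPLET, core // CORES_PER_CCX


def communication_tier(core_a: int, core_b: int) -> str:
    """Classify how far apart two cores are, from fastest to slowest tier."""
    chiplet_a, ccx_a = locate(core_a)
    chiplet_b, ccx_b = locate(core_b)
    if ccx_a == ccx_b:
        return "same CCX (shared L3, fastest)"
    if chiplet_a == chiplet_b:
        return "same chiplet, other CCX (slower)"
    return "other chiplet (slowest)"


if __name__ == "__main__":
    for a, b in [(0, 3), (0, 5), (0, 12)]:
        print(f"core {a} <-> core {b}: {communication_tier(a, b)}")
```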
The Ryzen 3950X is connected to a discrete GPU card via PCI Express
Apple M2 Pro
The Apple M2 Pro is way more integrated than the Ryzen 3950X. All the CPU cores, the “I/O” and the GPU reside on the same die, forming a gigantic piece of silicon totaling 40 billion transistors! Furthermore, the RAM, shared by both the CPU and the GPU, is not far away: it resides in the same package!
Unlike the Ryzen, where all cores are equal, the M2 sports two kinds of cores: efficiency (named Blizzard) and performance (named Avalanche).
The Avalanche cores share a single 32 MiB L2 cache, allowing for very fast inter-core communications, while the Blizzard cores are built around 4 MiB of cache. The Blizzard cores are also far slower, but consume only a third of the power of the Performance Cores.
Indeed, this is not a chip targeting workstations but laptops, for which energy consumption is an important metric. The Efficiency Cores are key to the very good battery endurance of modern Mac laptops. There will be more about that later.
The M2 Pro tested here is the entry-level tier, with 6 Performance Cores and 4 Efficiency Cores. There is a more expensive M2 Pro with 8 Performance Cores available.
A System Level Cache (SLC) is shared between all the CPU cores and the GPU. Its performance is in the same league as the L3 cache of a Ryzen chiplet.
High level overview of the Apple M2 Pro architecture. 2 Avalanche cores are deactivated in my unit.
Four memory controllers link to the RAM with a total bandwidth of 200 GB/s! The GPU has roughly half the bandwidth that the NVIDIA 2070 Super gets from its private memory, but the CPU cores have about four times more than the Ryzen!
The RAM lies in the same package as the CPU and GPU cores
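To put the bandwidth comparison above into numbers, here is a small sketch. The 200 and 51.2 figures are the ones quoted in this article; the 448 GB/s for the RTX 2070 Super comes from NVIDIA's published specification and is an addition of mine, not something I measured.

```python
# Peak memory bandwidth in GB/s. All values are theoretical maxima.
# The RTX 2070 Super figure is taken from NVIDIA's spec sheet (assumption:
# a stock card, not an overclocked model).
BANDWIDTH_GB_S = {
    "Ryzen 3950X (dual-channel DDR4-3200)": 51.2,
    "Apple M2 Pro (unified memory)": 200.0,
    "RTX 2070 Super (GDDR6)": 448.0,
}


def ratio(faster: str, slower: str) -> float:
    """How many times more bandwidth `faster` has compared to `slower`."""
    return BANDWIDTH_GB_S[faster] / BANDWIDTH_GB_S[slower]


if __name__ == "__main__":
    m2 = "Apple M2 Pro (unified memory)"
    ryzen = "Ryzen 3950X (dual-channel DDR4-3200)"
    rtx = "RTX 2070 Super (GDDR6)"
    print(f"M2 Pro vs Ryzen CPU cores: x{ratio(m2, ryzen):.1f}")
    print(f"RTX 2070 Super vs M2 Pro GPU: x{ratio(rtx, m2):.1f}")
```

Keep in mind that on the M2 Pro the 200 GB/s are shared between the CPU cores and the GPU, so neither side gets the whole of it when both are busy.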
The RAM resides in the same package. The advantage is that the latency is very low. The disadvantage is that it is impossible to extend. In fact the M2 Pro is a very difficult architecture for Apple to expand. The M2 Max has the same number of CPU cores, albeit with bigger caches, a bigger GPU and more memory bandwidth. If a hypothetical M2 Ultra were to be designed as the M1 Ultra was, it would be two M2 Max glued together.
The M2 architecture is more integrated and provides more resources per core than the Ryzen, but is more expensive and more difficult to scale up.
Incidentally, the M2’s CPU cores are ARM based. I won’t spend time discussing whether this is the key to the good performance of the M2 CPU cores because, as the former lead engineer of Apple Silicon stated, it has no technical significance. What matters is the perfect integration between Performance and Efficiency Cores, as a system.
Geekbench is a “synthetic benchmark” that tests single core as well as multicore performance.
The performance of a single Avalanche core is impressive. A single Zen2 core is no slouch, but cannot compete. In a multi-core scenario, the 3950X takes the lead because it has more cores (16 vs 6 + 4).
Geekbench can also test general computing scenarios running on the GPU. Nowadays more and more software delegates processing to the GPU, which is perfectly suited to handling massively parallel tasks.
OpenCL is the open standard for general computing on a GPU. Here the NVIDIA 2070 Super crushes the GPU of the M2 Pro, which is understandable as it is a discrete card with more processing units and more memory bandwidth. Keep in mind, though, that it is limited to its private pool of memory whereas the GPU of the M2 Pro has access to the whole RAM of the computer.
About Geekbench 6
Geekbench 6 was released a few days ago, but I have chosen to compare using Geekbench 5. Indeed, its author, John Poole, stated that the multi-core score was modified to better reflect how many cores work together on a tightly shared task. This is a modification that greatly favors the Apple Silicon chip, as all its cores are tightly coupled and share a large pool of low latency memory. The Geekbench 6 score is therefore less consistent with what I observed in my day to day applications. Maybe in the future applications will be written for this kind of workload, but I think it is not the case yet.
Enough with synthetic benchmarks. Here are some common applications that I use day to day on which I could run comparable tests on both machines.
I ran a test “action” on a 50-megapixel 16-bit image. It consists of applying various filters, scaling and duplication to the image, and relies mainly on single core performance.
Time in seconds to complete the action
As expected, the M2 Pro is faster than the Ryzen 3950X.
Lightroom is an integrated toolbox for digital photographers. It can also process batches of photos, which can be time consuming… and very useful for a benchmark. It makes good use of many cores and can optionally offload some of the processing to the GPU.
All the photos processed were 36-megapixel RAW shots from a Nikon D810. Some of them were processed using complex masks relying on “AI” and some were HDR pictures made by combining multiple shots at different exposures.
The first test is the processing of a small batch of photos, representative of what a photographer could export from a laptop on the road. During this test the Apple M2 Pro may not have been forced to throttle down. Indeed, it runs inside a laptop and has to slow down when the cooling cannot dissipate the heat it produces under prolonged heavy usage.
Time in seconds to complete the batch
- CPU only, the result of the M2 Pro is quite good. The Ryzen is faster because of its 16 cores of course.
- With the help of its GPU, the M2 is more than twice as fast! It crushes the AMD, although there may be a problem on the PC here, as it is slower when helped by its powerful GPU!!
Time in seconds to complete the batch
This second batch is larger, meaning that the M2 Pro was certainly throttled down.
- CPU only, the gap is wider in favor of the AMD Ryzen, because it is more efficiently cooled by a gigantic heatsink and fan.
- The M2 Pro gains less with its GPU, as it is another source of heat. Anyway, it is still sufficient to beat the AMD chip by a significant margin.
Maybe there is a problem with my PC configuration because its GPU is way faster than the M2’s. Or maybe Adobe optimized Lightroom with integrated GPUs that share memory with the CPU in mind.
Anyway, the M2 Pro is faster than my AMD Ryzen 3950X under Lightroom! I would not have believed it beforehand!
But there is no magic at work here. The large batch ate 10% of the MacBook Pro’s battery in ten minutes! The Apple Silicon chip is indeed powerful, but it is as power hungry as any other processor when running flat out.
A good way to bench CPUs is to compile some code. Finding a code base that is not too dependent on the platform, so that almost the same thing is compiled on a Windows x86 machine and on a macOS ARM one, is not that easy. I settled on LLVM.
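The battery observation above can be turned into a rough power estimate. This is a back-of-the-envelope sketch, not a measurement: the battery capacities are Apple's nominal figures for the 14-inch (70 Wh) and 16-inch (100 Wh) MacBook Pro and are assumptions here, and the result covers the whole machine (screen, SSD, RAM), not just the chip.

```python
# Rough estimate of the average power drawn during the large Lightroom batch.
# Assumptions: nominal battery capacities from Apple's spec sheets and a
# battery gauge that is linear, which is only approximately true.

def average_power_watts(capacity_wh: float, percent_used: float, minutes: float) -> float:
    """Average power draw given the share of the battery consumed over a duration."""
    energy_wh = capacity_wh * percent_used / 100
    return energy_wh * 60 / minutes


def projected_runtime_minutes(percent_used: float, minutes: float) -> float:
    """Time a full charge would last if the same load were sustained."""
    return minutes * 100 / percent_used


if __name__ == "__main__":
    for label, capacity_wh in [("14-inch (70 Wh)", 70.0), ("16-inch (100 Wh)", 100.0)]:
        watts = average_power_watts(capacity_wh, percent_used=10, minutes=10)
        print(f"{label}: about {watts:.0f} W on average")
    print(f"Full charge at that pace: about {projected_runtime_minutes(10, 10):.0f} minutes")
```

Whatever the exact capacity, losing 10% in ten minutes means a full charge would last well under two hours at that pace, very far from the advertised battery life under light usage.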
Both machines were equipped with speedy NVMe SSDs and were not IO bound. The compilation was done with clang 14 and ninja. The PC ran clang on Ubuntu running in WSL2, from the internal native filesystem.
Time in seconds
Compiling code is well suited to multiprocessors. It is no surprise that the AMD 3950X is faster than the Apple M2 Pro here.
The Apple M2 Pro deserves its excellent reputation. It can compete with and sometimes, with the help of its GPU, beat the AMD 3950X, which was one of the most powerful CPUs one could buy only three years ago! It’s incredible that a laptop can pack so much power.
Apple, having the deepest pockets in the industry, can afford the best manufacturing process and used it to deeply integrate CPU cores, GPU and memory into a single package. The M2 family is certainly expensive to build, maybe hard to scale up, but perfectly suited to where it matters the most for Apple: laptops.
Apple controls all its products from top to bottom. It builds computers around its own chips and provides the OS and libraries that run on top of them. Inside its “walled garden”, the software can exploit the hardware in ways that are difficult to achieve on more open platforms. The price to pay is of course that the platform is even more closed than Windows.
Addendum: a word on battery life
Apple Silicon MacBooks are renowned for their CPU power, which, as my tests show, is justified. But they are also able to sustain many hours of light usage. How is that possible?
The answer is Blizzard. It is the name of the cores dedicated to energy efficiency. They are slower than the fast Avalanche cores, but also require a lot less power. The M2 Pro embeds four of them. But they would be almost useless if they were not sensibly used by the scheduler of macOS.
The scheduler offers five levels of task prioritization: background, utility, userInitiated, and userInteractive. The fifth one, auto, is chosen by the scheduler itself if no level has been manually specified by the developer. Background and utility tasks always run on the Blizzard cores, leaving the Avalanche cores idling if there is no intensive or user-interactive task to complete!
Low priority tasks run exclusively on “Efficiency Cores”
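The dispatch rule described above can be illustrated with a toy model. This is not the macOS API nor the real scheduler: the names below are hypothetical, and the rule that higher-priority tasks may use either kind of core is an assumption of mine. It only shows why the Performance Cores can stay idle as long as every runnable task has a low priority.

```python
# Toy model of the rule described in the text: background and utility work
# is confined to the Efficiency Cores. Illustrative only, not macOS code.
from enum import Enum


class QoS(Enum):
    BACKGROUND = 1
    UTILITY = 2
    USER_INITIATED = 3
    USER_INTERACTIVE = 4


EFFICIENCY_ONLY = {QoS.BACKGROUND, QoS.UTILITY}


def eligible_cores(qos: QoS) -> tuple[str, ...]:
    """Kinds of cores a task of this priority may run on in this toy model."""
    if qos in EFFICIENCY_ONLY:
        return ("efficiency",)
    # Assumption: higher-priority tasks may be placed on either kind of core.
    return ("performance", "efficiency")


def performance_cores_can_idle(tasks: dict[str, QoS]) -> bool:
    """True when no runnable task is allowed on the Performance Cores."""
    return all("performance" not in eligible_cores(qos) for qos in tasks.values())


if __name__ == "__main__":
    light_usage = {"mail sync": QoS.BACKGROUND, "photo indexing": QoS.UTILITY}
    print("Light usage, P-cores idle:", performance_cores_can_idle(light_usage))
    busy = dict(light_usage, **{"export batch": QoS.USER_INITIATED})
    print("During an export, P-cores idle:", performance_cores_can_idle(busy))
```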
This is the case most of the time when doing casual “light computing”, and that explains the great battery life of the Mx laptops. It also provides another advantage: the Performance Cores are immediately available to handle user interactions with the machine, which makes the system feel very responsive.
Finally, as I measured when testing Lightroom, the big Avalanche cores and the GPU can swiftly eat a great chunk of the battery when a heavy computation is required.
So, there is no magic in the longevity of the MacBook laptops, just clever engineering, which is of course helped by the fact that Apple controls the entirety of its ecosystem and that there are not many different hardware configurations to support.