Because I think x64 systems' main advantage over ARM64 ones is memory subsystem performance. ARM systems are descended from mobile designs, where you get narrow LPDDR and small caches. Intel systems are descended from desktop designs, where you get wide DDR and large caches.
One advantage of high main-memory bandwidth, and of HBM in particular, is that the penalty for cache misses is much lower. If you can pull 1024 bits out of HBM in one read rather than 64-128 bits out of (LP)DDR, it matters a lot less if you get a cache miss.
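To put rough numbers on the transfer side of that argument, here's a back-of-envelope sketch (Python, illustrative only - it counts bus beats per cache line and ignores access latency, which is usually the bigger part of a miss penalty, as well as real channel architectures):

```python
# Beats needed to move one 64-byte cache line across interfaces of
# different widths. Transfer time only; access latency and channel
# splitting are ignored, so treat this purely as a sketch.
CACHE_LINE_BITS = 64 * 8  # 512 bits

def beats(interface_bits: int) -> int:
    # Ceiling division: a partial beat still costs a whole beat.
    return -(-CACHE_LINE_BITS // interface_bits)

for name, width in [("64-bit DDR channel", 64),
                    ("128-bit LPDDR interface", 128),
                    ("1024-bit HBM stack", 1024)]:
    print(f"{name:24s}: {beats(width)} beat(s) per cache line")
```

So the HBM interface moves a whole line in one beat where a single DDR channel needs eight.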
Actually, it turns out that Fujitsu makes an ARM chip with HBM - the A64FX:
https://www.networkworld.com/article/3535812/can-fujitsu-beat-nvidia-in-the-hpc-race.html
https://archive.vn/v9wxl
Here's where the link goes:
https://web.archive.org/web/2020041....gov/srt/conferences/Scala/2019/keynote_2.pdf
Unfortunately, all the benchmarks are HPC ones, which are probably very memory intensive. They're also comparing it against an Intel® Xeon® Platinum 8168 Processor (archive), which is a Many Integrated Cores design - lots of in-order cores. So it's pretty different from a desktop or server Intel chip. Edit: it's a Skylake chip, just with a lot of cores.
There's an argument that you need some 'big' (i.e. out-of-order) cores for desktop stuff, though it's worth pointing out that the A12Z is already a big.LITTLE design.
Intel had some very interesting results for Larrabee, which was also a MIC design - you can have a lot more in-order cores than out-of-order ones. Larrabee used a core that was derived from the P54C but had x64 and 512-bit vector instructions (the precursor to AVX-512) added. The later Xeon Phi chips used Atom-derived cores, once again with AVX-512. Anyway, Intel had a paper where they were running Windows on some of the cores and using the rest as a GPU, and they got very impressive scaling of DirectX performance with the number of cores. Larrabee and the Xeon are both based on DDR memory, though the Xeon Platinum 8168 has 6 channels of it.
The A64FX is supposed to be a chip where you can run either CPU or GPU tasks on the cores, and you've got vast amounts of memory bandwidth compared to a traditional DDR design, even one with a wide DDR interface.
And, like I said, HBM3 is supposed to be cheaper than current HBM. HBM or Wide IO are lower power than DDR too, because you can drive low-voltage signals a very short distance over a very carefully tuned transmission line. One stack of HBM on the same module as the SoC, or even Wide IO memory mounted package-on-package on top of the SoC, are both going to use less power and deliver more bandwidth than 6-8 channels of DDR.
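As a rough sanity check on that claim, here's the same sort of back-of-envelope arithmetic (the transfer rates are round illustrative figures, not vendor specs):

```python
# Peak-bandwidth sketch: one HBM2 stack vs. six DDR4 channels.
# Transfer rates are round illustrative figures, not product specs.

def peak_gbs(width_bits: int, mts: float) -> float:
    # width (bits) x transfers/s / 8 bits per byte, in GB/s
    return width_bits * mts * 1e6 / 8 / 1e9

hbm2_stack = peak_gbs(1024, 2000)    # 1024-bit stack at ~2.0 GT/s per pin
ddr4_6ch   = 6 * peak_gbs(64, 2666)  # six 64-bit DDR4-2666 channels

print(f"One HBM2 stack: {hbm2_stack:5.0f} GB/s")  # ~256 GB/s
print(f"6ch DDR4-2666 : {ddr4_6ch:5.0f} GB/s")    # ~128 GB/s
# The A64FX carries four HBM2 stacks, which is how it gets to ~1 TB/s.
```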
Incidentally, if you look at Apple Silicon on Wikipedia you see this:
https://en.wikipedia.org/wiki/Apple_Silicon#A_series_list (archive)
[attachment: table of A-series chips and their memory interfaces]
So one of the things they did for the A12Z for the iPad Pro and Developer Transition Kit was to double the memory interface width from 64-bit to 128-bit. Obviously Wide IO or HBM would be pushing this further.
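Putting assumed numbers on that doubling (LPDDR4X-4266-class rates - Apple doesn't publish the exact clocks, so these are estimates):

```python
# Doubling the interface width doubles peak bandwidth at the same
# transfer rate. LPDDR4X-4266 is an assumption, not a published spec.

def peak_gbs(width_bits: int, mts: float) -> float:
    return width_bits * mts * 1e6 / 8 / 1e9

print(f"A12  (64-bit LPDDR4X) : {peak_gbs(64, 4266):4.1f} GB/s")   # ~34 GB/s
print(f"A12Z (128-bit LPDDR4X): {peak_gbs(128, 4266):4.1f} GB/s")  # ~68 GB/s
```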
And if you look at the patent and job vacancy I linked to earlier, you can sort of infer that Apple are at least looking into this.
If I were them, I'd go for Wide IO for the ARM MacBook Air and HBM for the Mac Pro/MacBook Pro. Then again, Apple being Apple, they could probably just chuck an A12Z in a MacBook Air and sell it for more $ than the Intel version on the strength of the battery life. Or do an A13Z with 256-bit quad-channel LPDDR4X.
In fact, DDR5 is supposed to deliver 2x the bandwidth of DDR4. So they could double the bus width and use DDR5 rather than DDR4 and get 4x the bandwidth. I still reckon that Wide IO for the Air and HBM for the Pro is the way to go, though. There's going to be a significant advantage in both bandwidth and power consumption.
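For what it's worth, the 4x arithmetic with round figures (the DDR5 rate here is an assumption - simply double DDR4's - not a released spec):

```python
# 2x width * 2x per-pin rate = 4x bandwidth. DDR4-3200 as a baseline
# and DDR5 at twice that rate are assumed round figures.

def peak_gbs(width_bits: int, mts: float) -> float:
    return width_bits * mts * 1e6 / 8 / 1e9

ddr4_128bit = peak_gbs(128, 3200)  # 128-bit DDR4-3200-class interface
ddr5_256bit = peak_gbs(256, 6400)  # 256-bit bus at 2x the transfer rate

print(f"128-bit DDR4: {ddr4_128bit:5.0f} GB/s")  # ~51 GB/s
print(f"256-bit DDR5: {ddr5_256bit:5.0f} GB/s")  # ~205 GB/s, i.e. 4x
```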