SC23 As the SC23 conference in Denver, USA, kicks off in earnest, Intel is spilling the tea on the two-phase Dawn supercomputer it is building for the UK with Dell and the University of Cambridge.
The chipmaker touted the system earlier this month during the UK’s AI Summit, claiming it will be “the UK’s fastest AI supercomputer.”
Emphasis on AI, we think, because at 19 petaFLOPS of benchmarked FP64 performance, Dawn in its first phase only just about matches today’s fastest publicly known UK supercomputer, Scotland’s Archer2, which is currently ranked 39th in the world in the Top500. Archer2 tops out at 26 petaFLOPS as a theoretical maximum, or 20 petaFLOPS in benchmarks.
So, Dawn right now isn’t the fastest in Britain at FP64. If you lower its precision to something like FP8 for AI work, then yes, its performance will in theory be higher, and it might therefore be the fastest AI machine in the country (assuming Archer2 couldn’t pull off the same feat if its operators so desired). See below for more on that.
And it’s still not clear whether Intel means the first or second phase of Dawn will be the “fastest” in the UK at AI. The second phase is set to be ten times as fast as the first.
In a press briefing ahead of SC23, Intel execs said at least the first-phase system would feature 512 4th-gen Xeon Scalable processors and 1,024 Datacenter GPU Max accelerators spread across 256 liquid-cooled Dell PowerEdge XE9640 systems.
Each node is equipped with 1TB of DDR5 memory and 512GB of high bandwidth memory. We’ve also learned each node will make use of four of Nvidia’s InfiniBand HDR200 interconnects.
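For those who like to check the sums, here is a rough Python sketch of what those phase-one totals imply per node. The variable names are ours, and the per-GPU memory figure assumes all of a node's high bandwidth memory sits on its GPUs, which Intel's briefing did not spell out.

```python
# Phase-one figures as quoted from Intel's press briefing.
CPUS = 512
GPUS = 1024
NODES = 256
DDR5_PER_NODE_TB = 1
HBM_PER_NODE_GB = 512

cpus_per_node = CPUS // NODES
gpus_per_node = GPUS // NODES

# Assumption (ours): every gigabyte of HBM in a node belongs to a GPU.
hbm_per_gpu_gb = HBM_PER_NODE_GB // gpus_per_node

# System-wide memory totals, using 1 TB = 1024 GB.
total_ddr5_tb = DDR5_PER_NODE_TB * NODES
total_hbm_tb = HBM_PER_NODE_GB * NODES / 1024

print(f"{cpus_per_node} CPUs and {gpus_per_node} GPUs per node")
print(f"{hbm_per_gpu_gb} GB of HBM per GPU (if all HBM is on the GPUs)")
print(f"{total_ddr5_tb} TB DDR5 and {total_hbm_tb:.0f} TB HBM across phase one")
```

That works out to two Xeons and four GPUs in each XE9640 chassis.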
While neither Intel nor Dell has shared the details of what the second phase of the project will look like, it’s supposed to, as we said, increase the system’s capacity tenfold.
As it stands, the first phase of the system is rated for a peak output of 53 petaFLOPS of double-precision performance. However, in its first Linpack run, Dawn managed less than half that. At 19 petaFLOPS of real-world FP64 performance, the system comes in at 41st place in the global Top500.
Intel’s peak performance claims would seem to indicate the chipmaker has managed to work out the kinks in its Ponte Vecchio GPUs, which on paper are good for about 52 teraFLOPS at FP64.
As our sibling publication The Next Platform pointed out earlier, the Ponte Vecchio parts delivered to Argonne National Lab for integration into the US-based system were only capable of delivering 31.5 teraFLOPS of FP64 performance, about 61 percent of what the datasheet claims.
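The figures above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch using only the numbers quoted in this article; the variable names are ours.

```python
GPUS = 1024               # phase-one GPU count
DATASHEET_TFLOPS = 52.0   # FP64 per GPU, per Intel's datasheet
ARGONNE_TFLOPS = 31.5     # FP64 per GPU reported for the Argonne parts
LINPACK_PFLOPS = 19.0     # Dawn's first benchmarked Linpack result

# Theoretical peak: GPU count times per-GPU rate, converted from tera to peta.
peak_pflops = GPUS * DATASHEET_TFLOPS / 1000

# Share of the theoretical peak that the Linpack run delivered.
linpack_efficiency = LINPACK_PFLOPS / peak_pflops

# Share of the datasheet figure the Argonne parts delivered.
argonne_fraction = ARGONNE_TFLOPS / DATASHEET_TFLOPS

print(f"Peak: {peak_pflops:.1f} petaFLOPS")
print(f"Linpack efficiency: {linpack_efficiency:.0%}")
print(f"Argonne parts vs datasheet: {argonne_fraction:.0%}")
```

Note the peak here counts only the GPUs' contribution, which is evidently how the 53 petaFLOPS rating is reached.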
We’ve asked Intel for clarification on the GPU Max 1550’s performance; we’ll let you know if we hear anything back.
Given the planned tenfold expansion, if and when Dawn’s second phase is complete, its peak theoretical performance should be closer to 532 petaFLOPS at FP64. That would be a massive step up from the UK’s Archer2.
If Intel and Dell can improve the efficiency of the system, Dawn’s second phase should rank among the top 10 fastest supercomputers officially recorded, with performance within spitting distance of the Fugaku system, which is rated for 537 petaFLOPS of peak FP64 performance.
That said, actual performance in the Linpack benchmark usually comes in a fair bit lower. While Fugaku is rated for 537 petaFLOPS of peak performance, in the real world it’s closer to 442 petaFLOPS.
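To see how much that efficiency gap matters, here is a hypothetical projection in Python. It assumes phase two is exactly ten times phase one's GPU-only peak (1,024 GPUs at 52 teraFLOPS each), and it brackets the Linpack result between Dawn's current efficiency and Fugaku's. Neither bound is a prediction from Intel or Dell; they are our illustrative scenarios.

```python
PHASE_ONE_PEAK_PFLOPS = 1024 * 52.0 / 1000  # GPU-only peak, datasheet figure
PHASE_ONE_LINPACK_PFLOPS = 19.0
SCALE_FACTOR = 10                            # "tenfold" phase-two expansion
FUGAKU_PEAK_PFLOPS = 537.0
FUGAKU_LINPACK_PFLOPS = 442.0

phase_two_peak = PHASE_ONE_PEAK_PFLOPS * SCALE_FACTOR

# Linpack efficiency = benchmarked result divided by theoretical peak.
dawn_efficiency = PHASE_ONE_LINPACK_PFLOPS / PHASE_ONE_PEAK_PFLOPS
fugaku_efficiency = FUGAKU_LINPACK_PFLOPS / FUGAKU_PEAK_PFLOPS

# Scenario A (assumption): phase two keeps phase one's efficiency.
linpack_if_unchanged = phase_two_peak * dawn_efficiency
# Scenario B (assumption): phase two matches Fugaku's efficiency.
linpack_if_fugaku_like = phase_two_peak * fugaku_efficiency

print(f"Phase-two peak: {phase_two_peak:.0f} petaFLOPS")
print(f"Fugaku efficiency: {fugaku_efficiency:.0%}")
print(f"Linpack at today's efficiency: {linpack_if_unchanged:.0f} petaFLOPS")
print(f"Linpack at Fugaku-like efficiency: {linpack_if_fugaku_like:.0f} petaFLOPS")
```

The spread between the two scenarios is why the efficiency caveat is doing so much work.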
Further analysis
Intel’s claim is that Dawn is the UK’s “fastest AI supercomputer,” and this is where things get a little interesting. These GPU Max 1550s are good for 832 teraFLOPS of Brain Float 16 (BF16) math, according to Intel’s datasheet. In its first phase, that puts its AI performance at 852 petaFLOPS. Unless the claim is based on the chip’s integer performance, in which case we’re looking at 1.7 exaOPS of INT8. Fully built, the system will land somewhere between 8.5 exaFLOPS of BF16 and 17 exaOPS of INT8.
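The AI figures follow the same multiplication. In this sketch the per-GPU INT8 rate is our assumption: we take it as double the BF16 figure, which is what the 1.7 exaOPS total implies. The function name is ours too.

```python
GPUS_PHASE_ONE = 1024
SCALE_FACTOR = 10                 # planned phase-two expansion
BF16_TFLOPS_PER_GPU = 832         # per Intel's datasheet, as quoted above
# Assumption: INT8 throughput is twice the BF16 rate per GPU.
INT8_TOPS_PER_GPU = 2 * BF16_TFLOPS_PER_GPU


def system_exa(per_gpu_tera: float, gpus: int) -> float:
    """Aggregate a per-GPU rate in tera-ops into a system total in exa-ops."""
    return per_gpu_tera * gpus / 1_000_000


bf16_phase_one = system_exa(BF16_TFLOPS_PER_GPU, GPUS_PHASE_ONE)
int8_phase_one = system_exa(INT8_TOPS_PER_GPU, GPUS_PHASE_ONE)
bf16_full = system_exa(BF16_TFLOPS_PER_GPU, GPUS_PHASE_ONE * SCALE_FACTOR)
int8_full = system_exa(INT8_TOPS_PER_GPU, GPUS_PHASE_ONE * SCALE_FACTOR)

print(f"Phase one: {bf16_phase_one * 1000:.0f} petaFLOPS BF16, "
      f"{int8_phase_one:.1f} exaOPS INT8")
print(f"Fully built: {bf16_full:.1f} exaFLOPS BF16, {int8_full:.0f} exaOPS INT8")
```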
Nvidia has made similar claims about the AI performance of the Isambard-AI supercomputer being deployed in collaboration with the University of Bristol, which will comprise 5,448 Nvidia GH200 Grace-Hopper Superchips. These parts support nearly four petaFLOPS of sparse FP8 performance, putting its peak AI performance at about 21 exaFLOPS.
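The Isambard-AI estimate can be sketched the same way. Since the per-chip figure is only "nearly four" petaFLOPS, the snippet below rounds it up to 4.0, so treat the result as a ceiling rather than a datasheet-exact number. The function name is ours.

```python
SUPERCHIPS = 5448
# Assumption: "nearly four" petaFLOPS of sparse FP8 rounded up to 4.0,
# so the total below slightly overstates the peak.
FP8_SPARSE_PFLOPS_PER_CHIP = 4.0


def peak_exaflops(chips: int, pflops_per_chip: float) -> float:
    """System peak in exaFLOPS from a chip count and per-chip petaFLOPS."""
    return chips * pflops_per_chip / 1000


isambard_peak_exaflops = peak_exaflops(SUPERCHIPS, FP8_SPARSE_PFLOPS_PER_CHIP)
print(f"Isambard-AI sparse FP8 ceiling: {isambard_peak_exaflops:.1f} exaFLOPS")
```

Bear in mind this is a sparse FP8 figure, so it is not directly comparable to a BF16 or INT8 total for another machine.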
Compare pure BF16 performance and the completed Dawn system should come out ahead. But if your workload can leverage FP8, then the Isambard-AI machine is the one to beat.
Of course, all of these estimates assume that Dawn will eventually swell to 10,000-plus GPUs and that Intel is actually getting 52 teraFLOPS of FP64 performance from the accelerators. ®