
While micro-architecture specific optimizations are relatively commonplace throughout the Linux x86_64 kernel, with various performance tricks for different Intel and AMD CPU families, the ARM64 Linux kernel maintainers are against introducing new micro-architecture specific optimizations for new ARM processors.
Ampere Computing sent out a set of four patches providing an optimization for their new AmpereOne server processors. Ampere Computing found that these new high core count ARM server processors can benefit from aggressive prefetches when using the 4K page size. The reported benefit with HugeTLB or Tmpfs during sequential read performance tests was “up to 1.3 ~ 1.4x.”
“Test result:
In hugetlb or tmpfs, we can get big sequential read performance improvement up to 1.3x ~ 1.4x.”
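For readers unfamiliar with the technique, the following is a minimal user-space sketch of what "aggressive prefetching" in a copy loop can look like. It is not Ampere's patch and not kernel code: the function name `copy_with_prefetch` and the constants `CACHE_LINE` and `PREFETCH_AHEAD` are hypothetical, and the 64-byte line size and prefetch distance are assumptions rather than measured AmpereOne values. It relies on the GCC/Clang `__builtin_prefetch` builtin.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_LINE     64  /* assumed cache line size in bytes */
#define PREFETCH_AHEAD 8   /* hypothetical tuning knob: lines to fetch ahead */

/* Copy n bytes from src to dst, one cache line at a time, while hinting
 * the CPU to start loading source data several lines ahead of the copy. */
static void copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off = 0;

    while (n - off >= CACHE_LINE) {
        /* Only issue the hint while the target address is still inside
         * the source buffer. rw = 0 means "prefetch for read";
         * locality = 0 means the data need not stay in cache afterwards. */
        if (n - off > (size_t)PREFETCH_AHEAD * CACHE_LINE)
            __builtin_prefetch(s + off + PREFETCH_AHEAD * CACHE_LINE, 0, 0);

        memcpy(d + off, s + off, CACHE_LINE);
        off += CACHE_LINE;
    }

    /* Copy whatever is left over (fewer than CACHE_LINE bytes). */
    memcpy(d + off, s + off, n - off);
}
```

The best prefetch distance depends on the memory latency and hardware prefetcher of a given core, so a value tuned for one CPU may do nothing, or cause harm, on another. That is the kind of per-CPU tuning the maintainers object to below.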
While these gains are exciting for improving AmpereOne Linux performance, it's looking like that work won't be upstreamed into the mainline Linux kernel.
Prominent ARM Linux kernel developer Will Deacon commented on the performance-enhancing patches specific to AmpereOne CPUs:
“We tend to shy away from micro-architecture specific optimisations in the arm64 kernel as they’re pretty unmaintainable, hard to test properly, generally lead to bloat and add additional obstacles to updating our library routines.
Admittedly, we have something for Thunder-X1 in copy_page() (disguised as ARM64_HAS_NO_HW_PREFETCH) but, frankly, that machine needed all the help it could get and given where it is today I suspect we could drop that code without any material consequences.
So I'd really prefer not to merge this; modern CPUs should do better at copying data. It's copy_to_user(), not rocket science.”
ARM’s Mark Rutland chimed in to agree with Deacon’s statement and also endorsed removing the Thunder-X1 targeted optimization. Kernel developer Marc Zyngier also agreed and has already been working on a patch to drop that Thunder-X1 specific code.
So in the interest of code maintainability and avoiding over-complicating the ARM64 Linux kernel code, they aren't after CPU/micro-architecture specific optimizations. We’ll see if this leads to any ARM Linux focused distributions carrying such patches themselves, or to any AmpereOne-optimized Linux distributions moving forward. That seems plausible given Ampere’s focus on high performance and power efficient ARM Linux servers, and the company likely not wanting to leave any optimization untapped given its aim of competing with AMD EPYC and Intel Xeon servers.