
Along with the release of x86-simd-sort 3.0 for fast AVX-512 sorting, Friday also brought the release of oneDNN 3.3, the deep neural network library that is part of oneAPI and is focused on helping developers build deep learning applications.
Intel oneDNN continues to support CPU-based execution not only on x86_64 but also on AArch64, POWER, and RISC-V, and it supports AMD and NVIDIA GPU execution in addition to Intel graphics. The oneDNN library is heavily tuned to take advantage of Intel hardware, and oneDNN 3.3 brings more Advanced Matrix Extensions (AMX) tuning and other changes to benefit the latest-generation Xeon Scalable “Sapphire Rapids” processors. oneDNN 3.3 also rolls out more early optimization work for the next-generation Granite Rapids and Sierra Forest processors coming in 2024.
The oneDNN 3.3 performance optimization work includes:
Intel Architecture Processors:
Improved performance for 4th generation Intel Xeon Scalable processors (formerly Sapphire Rapids).
Improved int8 convolution performance with zero points on processors with Intel AMX instruction set support.
Improved performance for future Intel Xeon Scalable processors (code-named Sierra Forest and Granite Rapids). This functionality is disabled by default and can be enabled via CPU dispatcher control.
Improved fp32 and int8 convolution performance for cases with small numbers of input channels on processors with Intel AVX-512 and/or Intel AMX instruction set support.
Improved s32 binary primitive performance.
Improved fp16, fp32, and int8 convolution performance for processors with Intel AVX2 instruction support.
Improved performance of subgraphs with convolution, matmul, avgpool, maxpool, and softmax operations followed by unary or binary operations with Graph API.
Improved performance of convolution for depthwise cases with Graph API.
[experimental] Improved performance of LLAMA2 MLP block with Graph Compiler.
Intel Graphics Products:
Improved performance for the Intel Data Center GPU Max Series (formerly Ponte Vecchio).
Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and the Intel Data Center GPU Flex Series (formerly Arctic Sound-M).
Reduced RNN primitive initialization time on Intel GPUs.
AArch64-based Processors:
Improved fp32 to bf16 reorder performance.
Improved max pooling performance with Arm Compute Library (ACL).
Improved dilated convolution performance for depthwise cases with ACL.
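The Sierra Forest and Granite Rapids code paths listed above are opt-in through CPU dispatcher control. The following minimal Python sketch shows one way to opt in. It assumes oneDNN's ONEDNN_MAX_CPU_ISA environment variable and the preview ISA names AVX512_CORE_AMX_FP16 (Granite Rapids) and AVX2_VNNI_2 (Sierra Forest) from oneDNN's CPU dispatcher control documentation, so check those names against the docs for your oneDNN version. The helper run_with_isa and its platform keys are hypothetical. The child process is a stand-in that only echoes the variable, not a real oneDNN application.

```python
import os
import subprocess
import sys

# Assumed preview ISA ceilings from oneDNN's CPU dispatcher control docs.
# These code paths are disabled by default in oneDNN 3.3.
PREVIEW_ISA = {
    "granite_rapids": "AVX512_CORE_AMX_FP16",
    "sierra_forest": "AVX2_VNNI_2",
}


def run_with_isa(cmd, platform, verbose=False):
    """Hypothetical helper: run cmd with ONEDNN_MAX_CPU_ISA set for platform.

    The variable must already be in the environment when oneDNN picks its
    ISA, so the simplest approach is to set it when the process is launched.
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["ONEDNN_MAX_CPU_ISA"] = PREVIEW_ISA[platform]
    if verbose:
        # Enables oneDNN's verbose log of primitive execution.
        env["ONEDNN_VERBOSE"] = "1"
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for a oneDNN-based application: a child Python process that
    # prints the ISA ceiling it was given.
    probe = [sys.executable, "-c", "import os; print(os.environ['ONEDNN_MAX_CPU_ISA'])"]
    result = run_with_isa(probe, "granite_rapids")
    print(result.stdout.strip())  # prints AVX512_CORE_AMX_FP16
```

Launching a separate process keeps the setting scoped to a single run. For applications that link oneDNN directly, the C++ API also provides dnnl::set_max_cpu_isa for the same purpose, and it has to be called before other oneDNN calls.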
The oneDNN 3.3 release also adds group normalization primitive support, extended verbose mode output, new examples for the oneDNN Graph API, and other changes.
Downloads and more details on the oneDNN 3.3 release are available via GitHub.
