RNG Performance Improvements on ARM on .NET 7 (v0.4.0)

The upcoming 0.4.0 release brings SIMD acceleration to ARM processors through .NET 7's new Vector128 and Vector256 intrinsics APIs. Instead of reimplementing ChaCha hardware acceleration for each architecture, the same code is now automatically accelerated on any platform for which the runtime provides an implementation. As a result, any ARM processor supporting Advanced SIMD (Neon) achieves nearly 4x the throughput on .NET 7 compared to .NET 6 when generating random data with ChaCha. Other RNGs also benefit from general .NET 7 improvements: in the Fill benchmark, Mersenne Twister is 20% faster and PCG32 is 22% faster.
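To illustrate how a single code path can be accelerated on several architectures, here is a minimal sketch of a ChaCha quarter round written only with the portable Vector128 operations available in .NET 7 and later. It is not the library's actual implementation: the helper names `RotateLeft` and `QuarterRound` are hypothetical, and processing four independent lanes per vector is an assumption made for the example. The input values are the quarter-round test vector from RFC 8439, section 2.1.1, broadcast to every lane.

```csharp
using System;
using System.Runtime.Intrinsics;

// Test input from RFC 8439, section 2.1.1, copied into all four 32-bit lanes.
var a = Vector128.Create(0x11111111u);
var b = Vector128.Create(0x01020304u);
var c = Vector128.Create(0x9b8d6f43u);
var d = Vector128.Create(0x01234567u);

QuarterRound(ref a, ref b, ref c, ref d);

// True when the runtime maps Vector128 operations to SIMD instructions
// (for example Neon on Arm64); otherwise a software fallback is used.
Console.WriteLine($"Hardware accelerated: {Vector128.IsHardwareAccelerated}");
Console.WriteLine($"{a.GetElement(0):x8} {b.GetElement(0):x8} {c.GetElement(0):x8} {d.GetElement(0):x8}");
// prints "ea2a92f4 cb1cf8ce 4581472e 5881c4bb"

// Hypothetical helper: rotates each 32-bit lane left by `count` bits.
static Vector128<uint> RotateLeft(Vector128<uint> v, int count) =>
    Vector128.ShiftLeft(v, count) | Vector128.ShiftRightLogical(v, 32 - count);

// Hypothetical helper: one ChaCha quarter round applied to four lanes at once.
static void QuarterRound(ref Vector128<uint> a, ref Vector128<uint> b,
                         ref Vector128<uint> c, ref Vector128<uint> d)
{
    a += b; d = RotateLeft(d ^ a, 16);
    c += d; b = RotateLeft(b ^ c, 12);
    a += b; d = RotateLeft(d ^ a, 8);
    c += d; b = RotateLeft(b ^ c, 7);
}
```

Because the sketch contains no architecture-specific intrinsics, the same source compiles unchanged for x64 and Arm64, and the runtime decides whether each operation becomes a SIMD instruction.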

System Information

These benchmarks were performed on a somewhat noisy Raspberry Pi 4 Model B 8GB.

BenchmarkDotNet=v0.13.2, OS=ubuntu 20.04
Unknown processor
.NET SDK=7.0.202
  [Host]        : .NET 7.0.4 (7.0.423.11508), Arm64 RyuJIT AdvSIMD
  .NET 6.0      : .NET 6.0.15 (6.0.1523.11507), Arm64 RyuJIT AdvSIMD
  .NET 7.0      : .NET 7.0.4 (7.0.423.11508), Arm64 RyuJIT AdvSIMD

Raw Data

Fill

| Rng | .NET 6.0 | .NET 7.0 | Ratio |
|---|---|---|---|
| ChaCha8 | 42.49 MB/s | 169.64 MB/s | 3.99 |
| ChaCha12 | 29.16 MB/s | 116.02 MB/s | 3.98 |
| ChaCha20 | 18.13 MB/s | 71.18 MB/s | 3.93 |
| Mt1993764 | 146.89 MB/s | 176.61 MB/s | 1.20 |
| Pcg32 | 231.23 MB/s | 282.61 MB/s | 1.22 |
| XorShift | 226.70 MB/s | 269.47 MB/s | 1.19 |
| SystemCryptoRng | 17.98 MB/s | 17.93 MB/s | 1.00 |
| CryptoServiceProvider | 17.99 MB/s | 17.94 MB/s | 1.00 |
| SystemRandom | 42.50 MB/s | 45.83 MB/s | 1.08 |

UInt64

| Rng | .NET 6.0 | .NET 7.0 | Ratio |
|---|---|---|---|
| ChaCha8 | 39.39 MB/s | 134.18 MB/s | 3.41 |
| ChaCha12 | 28.34 MB/s | 98.35 MB/s | 3.47 |
| ChaCha20 | 17.46 MB/s | 64.13 MB/s | 3.67 |
| Mt1993764 | 153.85 MB/s | 189.76 MB/s | 1.23 |
| Pcg32 | 326.13 MB/s | 380.58 MB/s | 1.17 |
| XorShift | 308.59 MB/s | 356.78 MB/s | 1.16 |
| SystemCryptoRng | 13.07 MB/s | 13.06 MB/s | 1.00 |
| CryptoServiceProvider | 13.03 MB/s | 13.06 MB/s | 1.00 |
| SystemRandom | 50.19 MB/s | 52.57 MB/s | 1.05 |

UInt32

| Rng | .NET 6.0 | .NET 7.0 | Ratio |
|---|---|---|---|
| ChaCha8 | 38.68 MB/s | 121.69 MB/s | 3.15 |
| ChaCha12 | 27.77 MB/s | 91.41 MB/s | 3.29 |
| ChaCha20 | 17.20 MB/s | 61.05 MB/s | 3.55 |
| Mt1993764 | 71.66 MB/s | 92.76 MB/s | 1.29 |
| Pcg32 | 438.92 MB/s | 456.60 MB/s | 1.04 |
| XorShift | 439.06 MB/s | 439.06 MB/s | 1.00 |
| SystemCryptoRng | 12.82 MB/s | 12.82 MB/s | 1.00 |
| CryptoServiceProvider | 12.81 MB/s | 12.73 MB/s | 0.99 |
| SystemRandom | 42.77 MB/s | 53.37 MB/s | 1.25 |