The upcoming 0.4.0 release brings SIMD acceleration to ARM processors using .NET 7's new cross-platform Vector128 and Vector256 intrinsics APIs. Instead of reimplementing ChaCha hardware acceleration for each architecture, the same code is now accelerated automatically on any platform for which the runtime provides an implementation. As a result, any ARM processor supporting Advanced SIMD (Neon) gets nearly four times the throughput when filling buffers with random data from ChaCha on .NET 7 compared to .NET 6. Other RNGs also benefit from general .NET 7 improvements: in the same fill benchmark, Mersenne Twister is 20% faster and PCG 32 is 22% faster.
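To illustrate what "the same code on every architecture" can look like, here is a minimal sketch of a ChaCha quarter round written against the platform-neutral Vector128 API, next to a scalar version of the same logic. It is not the library's actual implementation: the names ChaChaSketch, QuarterRound, QuarterRoundScalar and RotateLeft are hypothetical, and processing four independent lanes per vector is an assumption made for illustration.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

// Hypothetical illustration only; not taken from the library's source.
public static class ChaChaSketch
{
    // Rotate every 32-bit lane left by a constant number of bits.
    // Vector128.ShiftLeft / ShiftRightLogical are the cross-platform
    // helpers added in .NET 7; no architecture-specific class is used.
    private static Vector128<uint> RotateLeft(Vector128<uint> v, int bits) =>
        Vector128.ShiftLeft(v, bits) | Vector128.ShiftRightLogical(v, 32 - bits);

    // One ChaCha quarter round applied to four independent lanes at once.
    public static void QuarterRound(
        ref Vector128<uint> a, ref Vector128<uint> b,
        ref Vector128<uint> c, ref Vector128<uint> d)
    {
        a += b; d = RotateLeft(d ^ a, 16);
        c += d; b = RotateLeft(b ^ c, 12);
        a += b; d = RotateLeft(d ^ a, 8);
        c += d; b = RotateLeft(b ^ c, 7);
    }

    // Scalar version of the same quarter round, for comparison.
    public static void QuarterRoundScalar(
        ref uint a, ref uint b, ref uint c, ref uint d)
    {
        a += b; d = BitOperations.RotateLeft(d ^ a, 16);
        c += d; b = BitOperations.RotateLeft(b ^ c, 12);
        a += b; d = BitOperations.RotateLeft(d ^ a, 8);
        c += d; b = BitOperations.RotateLeft(b ^ c, 7);
    }
}
```

Because the vector version uses only Vector128 operators and helpers, the JIT is free to map it to whatever 128-bit SIMD instructions the target offers. Vector128.IsHardwareAccelerated can be checked at run time to see whether such a mapping is available on the current machine.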
These benchmarks were performed on a somewhat noisy Raspberry Pi 4 Model B 8GB.
BenchmarkDotNet=v0.13.2, OS=ubuntu 20.04
Unknown processor
.NET SDK=7.0.202
[Host] : .NET 7.0.4 (7.0.423.11508), Arm64 RyuJIT AdvSIMD
.NET 6.0 : .NET 6.0.15 (6.0.1523.11507), Arm64 RyuJIT AdvSIMD
.NET 7.0 : .NET 7.0.4 (7.0.423.11508), Arm64 RyuJIT AdvSIMD
Raw Data

RngFill | .NET 6.0 | .NET 7.0 | Ratio |
---|---|---|---|
ChaCha8 | 42.49 MB/s | 169.64 MB/s | 3.99 |
ChaCha12 | 29.16 MB/s | 116.02 MB/s | 3.98 |
ChaCha20 | 18.13 MB/s | 71.18 MB/s | 3.93 |
Mt1993764 | 146.89 MB/s | 176.61 MB/s | 1.20 |
Pcg32 | 231.23 MB/s | 282.61 MB/s | 1.22 |
XorShift | 226.70 MB/s | 269.47 MB/s | 1.19 |
SystemCryptoRng | 17.98 MB/s | 17.93 MB/s | 1.00 |
CryptoServiceProvider | 17.99 MB/s | 17.94 MB/s | 1.00 |
SystemRandom | 42.50 MB/s | 45.83 MB/s | 1.08 |

RngUInt64 | .NET 6.0 | .NET 7.0 | Ratio |
---|---|---|---|
ChaCha8 | 39.39 MB/s | 134.18 MB/s | 3.41 |
ChaCha12 | 28.34 MB/s | 98.35 MB/s | 3.47 |
ChaCha20 | 17.46 MB/s | 64.13 MB/s | 3.67 |
Mt1993764 | 153.85 MB/s | 189.76 MB/s | 1.23 |
Pcg32 | 326.13 MB/s | 380.58 MB/s | 1.17 |
XorShift | 308.59 MB/s | 356.78 MB/s | 1.16 |
SystemCryptoRng | 13.07 MB/s | 13.06 MB/s | 1.00 |
CryptoServiceProvider | 13.03 MB/s | 13.06 MB/s | 1.00 |
SystemRandom | 50.19 MB/s | 52.57 MB/s | 1.05 |

RngUInt32 | .NET 6.0 | .NET 7.0 | Ratio |
---|---|---|---|
ChaCha8 | 38.68 MB/s | 121.69 MB/s | 3.15 |
ChaCha12 | 27.77 MB/s | 91.41 MB/s | 3.29 |
ChaCha20 | 17.20 MB/s | 61.05 MB/s | 3.55 |
Mt1993764 | 71.66 MB/s | 92.76 MB/s | 1.29 |
Pcg32 | 438.92 MB/s | 456.60 MB/s | 1.04 |
XorShift | 439.06 MB/s | 439.06 MB/s | 1.00 |
SystemCryptoRng | 12.82 MB/s | 12.82 MB/s | 1.00 |
CryptoServiceProvider | 12.81 MB/s | 12.73 MB/s | 0.99 |
SystemRandom | 42.77 MB/s | 53.37 MB/s | 1.25 |