Benchmarking Rust Compiler Settings with Criterion

Controlling Criterion with scripts and environment variables

Timing a crab race. All other figures are from the author.

This article explains, first, how to benchmark using the popular criterion crate. It then gives additional information showing how to benchmark across compiler settings. Although each combination of compiler settings requires re-compilation and a separate run, we can still tabulate and analyze the results. The article is a companion to the article Nine Rules for SIMD Acceleration of Your Rust Code in Towards Data Science.

We'll apply this technique to the range-set-blaze crate. Our goal is to measure the performance effects of various SIMD (Single Instruction, Multiple Data) settings. We also want to compare performance across different CPUs. This approach is also useful for understanding the benefit of different optimization levels.

In the context of range-set-blaze, we evaluate:

  • 3 SIMD extension levels: sse2 (128 bit), avx2 (256 bit), avx512f (512 bit)
  • 10 element types: i8, u8, i16, u16, i32, u32, i64, u64, isize, usize
  • 5 lane numbers: 4, 8, 16, 32, 64
  • 2 CPUs: AMD 7950X with avx512f, Intel i5-8250U with avx2
  • 5 algorithms: Regular, Splat0, Splat1, Splat2, Rotate
  • 4 input lengths: 1024; 10,240; 102,400; 1,024,000

Of these, we externally adjust the first four variables (SIMD extension level, element type, lane number, CPU). We control the final two variables (algorithm and input length) with loops inside regular Rust benchmark code.
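As a rough check on the size of this experiment, the sketch below multiplies out the counts listed above. It assumes (the article does not say this explicitly) that every SIMD level is run for every element type and lane number, and that the Intel machine skips avx512f because it lacks that extension. The constant and function names are made up for illustration.

```rust
// Counts taken from the list of variables above.
const ELEMENT_TYPES: usize = 10;
const LANE_NUMBERS: usize = 5;
const ALGORITHMS: usize = 5;
const INPUT_LENGTHS: usize = 4;

/// Number of separate compile-and-run cycles on a CPU that supports
/// `simd_levels` of the three SIMD extension levels.
fn external_runs(simd_levels: usize) -> usize {
    simd_levels * ELEMENT_TYPES * LANE_NUMBERS
}

/// Number of measurements produced inside each run by the Rust loops.
fn measurements_per_run() -> usize {
    ALGORITHMS * INPUT_LENGTHS
}

fn main() {
    let amd = external_runs(3); // sse2, avx2, avx512f
    let intel = external_runs(2); // assumption: sse2 and avx2 only
    println!("AMD runs: {amd}, Intel runs: {intel}");
    println!("measurements per run: {}", measurements_per_run());
    println!("total measurements: {}", (amd + intel) * measurements_per_run());
}
```

Under those assumptions the total comes to 5000 measurements, which is consistent with the 5000-line data file mentioned later in the article.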

Getting Started with Criterion

To add benchmarking to your project, add this dev dependency and create a subfolder:

cargo add criterion --dev --features html_reports
mkdir benches

In Cargo.toml add:

[[bench]]
name = "bench"
harness = false

Create a benchmark file under benches/ (with the settings above, Cargo looks for benches/bench.rs). Here is a sample:

#![feature(portable_simd)] // as_simd needs nightly Rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use is_consecutive1::*;

// create a string from the SIMD extension used
// (branch values reconstructed from the bit widths listed above)
const SIMD_SUFFIX: &str = if cfg!(target_feature = "avx512f") {
    "avx512f,512"
} else if cfg!(target_feature = "avx2") {
    "avx2,256"
} else if cfg!(target_feature = "sse2") {
    "sse2,128"
} else {
    "error"
};
type Integer = i32;
const LANES: usize = 64;
// compare against this
pub fn is_consecutive_regular(chunk: &[Integer; LANES]) -> bool {
    for i in 1..LANES {
        if chunk[i - 1].checked_add(1) != Some(chunk[i]) {
            return false;
        }
    }
    true
}

// define a benchmark called "simple"
fn simple(c: &mut Criterion) {
    let mut group = c.benchmark_group("simple");
    // generate about 1 million aligned elements
    let parameter: Integer = 1_024_000;
    let v = (100..parameter + 100).collect::<Vec<_>>();
    let (prefix, _simd_chunks, reminder) = v.as_simd::<LANES>(); // keep aligned part
    let v = &v[prefix.len()..v.len() - reminder.len()]; // keep aligned part
    group.bench_function(format!("regular,{}", SIMD_SUFFIX), |b| {
        b.iter(|| {
            let count: usize = v
                .chunks_exact(LANES)
                .map(|chunk| is_consecutive_regular(chunk.try_into().unwrap()) as usize)
                .sum();
            black_box(count)
        });
    });
    // ... (Splat1 benchmark over the aligned SIMD chunks elided)
    group.finish();
}

criterion_group!(benches, simple);
criterion_main!(benches);

If you wish to run this instance, the code is on GitHub.
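To make concrete what the Regular baseline computes, here is a small stable-Rust sketch of the same check, with LANES reduced to 4 so the inputs fit on one line. The reduced lane count and the sample arrays are illustrative choices, not values from the benchmark.

```rust
type Integer = i32;
const LANES: usize = 4; // assumption: shrunk from 64 to keep the example readable

/// Returns true when each element is exactly one more than its predecessor.
/// `checked_add` returns None on overflow, so a chunk cannot "wrap around".
pub fn is_consecutive_regular(chunk: &[Integer; LANES]) -> bool {
    for i in 1..LANES {
        if chunk[i - 1].checked_add(1) != Some(chunk[i]) {
            return false;
        }
    }
    true
}

fn main() {
    // consecutive run
    println!("{}", is_consecutive_regular(&[100, 101, 102, 103]));
    // gap between 101 and 103
    println!("{}", is_consecutive_regular(&[100, 101, 103, 104]));
    // overflow at i32::MAX is rejected rather than wrapping to i32::MIN
    println!("{}", is_consecutive_regular(&[i32::MAX - 1, i32::MAX, i32::MIN, i32::MIN + 1]));
}
```

The third call shows why the baseline uses checked_add instead of plain addition: a chunk that crosses the maximum value is not treated as consecutive.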

Run the benchmark with the command cargo bench. A report will appear in target/criterion/simple/report/index.html. It includes plots like this one, showing Splat1 running many times faster than Regular.

Thinking Outside the Criterion Box

We have a problem. We want to benchmark sse2 vs. avx2 vs. avx512f, which (generally) requires multiple compilations and criterion runs.
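The separate compilations are needed because cfg!(target_feature = ...) is resolved when the code is compiled, not when it runs. The sketch below illustrates this with a function (simd_level is a hypothetical name) whose result is fixed at compile time; building the same file with different -C target-feature flags changes what it returns.

```rust
/// Name of the widest of the three SIMD extensions this binary was compiled for.
/// Each `cfg!` expands to a literal `true` or `false` during compilation.
fn simd_level() -> &'static str {
    if cfg!(target_feature = "avx512f") {
        "avx512f"
    } else if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "sse2") {
        "sse2"
    } else {
        "none of the three"
    }
}

fn main() {
    // The answer depends on the flags used to build this binary,
    // not on the CPU it happens to run on.
    println!("compiled for: {}", simd_level());
}
```

For example, compiling with rustc -C target-feature=+avx2 on an x86-64 target would be expected to produce a binary that reports avx2, even if it is later run on a CPU that also supports avx512f.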

Here's our approach:

  • Use a Bash script to set environment variables and call benchmarking.
    For example,
#!/bin/bash
SIMD_INTEGER_VALUES=("i64" "i32" "i16" "i8" "isize" "u64" "u32" "u16" "u8" "usize")
SIMD_LANES_VALUES=(64 32 16 8 4)
RUSTFLAGS_VALUES=("-C target-feature=+avx512f" "-C target-feature=+avx2" "")

# one cargo bench run per combination of lanes, element type, and compiler flags
for simdLanes in "${SIMD_LANES_VALUES[@]}"; do
    for simdInteger in "${SIMD_INTEGER_VALUES[@]}"; do
        for rustFlags in "${RUSTFLAGS_VALUES[@]}"; do
            echo "Running with SIMD_INTEGER=$simdInteger, SIMD_LANES=$simdLanes, RUSTFLAGS=$rustFlags"
            SIMD_LANES=$simdLanes SIMD_INTEGER=$simdInteger RUSTFLAGS="$rustFlags" cargo bench
        done
    done
done

Aside: You can easily use Bash on Windows if you have Git and/or VS Code.
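If Bash is not convenient, the same sweep could be driven from a small Rust program using std::process::Command. This is a sketch of an alternative, not the article's method; the function names combinations and run_all are made up here.

```rust
use std::process::Command;

const SIMD_INTEGER_VALUES: [&str; 10] =
    ["i64", "i32", "i16", "i8", "isize", "u64", "u32", "u16", "u8", "usize"];
const SIMD_LANES_VALUES: [usize; 5] = [64, 32, 16, 8, 4];
const RUSTFLAGS_VALUES: [&str; 3] =
    ["-C target-feature=+avx512f", "-C target-feature=+avx2", ""];

/// Every (lanes, integer, rustflags) combination, in the same loop order as the script.
fn combinations() -> Vec<(usize, &'static str, &'static str)> {
    let mut all = Vec::new();
    for lanes in SIMD_LANES_VALUES {
        for integer in SIMD_INTEGER_VALUES {
            for flags in RUSTFLAGS_VALUES {
                all.push((lanes, integer, flags));
            }
        }
    }
    all
}

/// Runs `cargo bench` once per combination, stopping at the first failure.
fn run_all() -> std::io::Result<()> {
    for (lanes, integer, flags) in combinations() {
        println!("Running with SIMD_INTEGER={integer}, SIMD_LANES={lanes}, RUSTFLAGS={flags}");
        let status = Command::new("cargo")
            .arg("bench")
            .env("SIMD_LANES", lanes.to_string())
            .env("SIMD_INTEGER", integer)
            .env("RUSTFLAGS", flags)
            .status()?;
        if !status.success() {
            return Err(std::io::Error::other(format!("cargo bench failed: {status}")));
        }
    }
    Ok(())
}

fn main() {
    // Only report the plan here; call run_all() from inside a benchmark project to execute it.
    println!("{} runs planned", combinations().len());
    let _ = run_all; // referenced so the sketch compiles without dead-code warnings
}
```

Keeping the list of combinations in a separate function makes it easy to inspect or filter the plan (for example, dropping avx512f on a CPU that lacks it) before any benchmarks are started.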

  • Use a Cargo build script (build.rs by default) to turn these environment variables into Rust configurations:
use std::env;

fn main() {
    // forward each environment variable, if set, to the compiler as a quoted cfg value
    if let Ok(simd_lanes) = env::var("SIMD_LANES") {
        println!("cargo:rustc-cfg=simd_lanes=\"{}\"", simd_lanes);
    }
    if let Ok(simd_integer) = env::var("SIMD_INTEGER") {
        println!("cargo:rustc-cfg=simd_integer=\"{}\"", simd_integer);
    }
}
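The quoting in these directives is easy to get wrong, so it can help to isolate it in a helper. The sketch below is one possible way to structure such a build script; cfg_directive and directive_from_env are hypothetical helper names. It also prints cargo:rerun-if-env-changed, a standard Cargo directive that asks Cargo to re-run the script when the variable changes. Whether the article's own script does this is not shown here.

```rust
use std::env;

/// Formats one `cargo:rustc-cfg` directive, e.g. `cargo:rustc-cfg=simd_lanes="64"`.
/// The value is wrapped in double quotes so `cfg!(simd_lanes = "64")` can match it.
fn cfg_directive(key: &str, value: &str) -> String {
    format!("cargo:rustc-cfg={key}=\"{value}\"")
}

/// Returns the directive for `env_name` if that environment variable is set.
fn directive_from_env(env_name: &str, key: &str) -> Option<String> {
    env::var(env_name).ok().map(|value| cfg_directive(key, &value))
}

fn main() {
    for (env_name, key) in [("SIMD_LANES", "simd_lanes"), ("SIMD_INTEGER", "simd_integer")] {
        // Ask Cargo to re-run this script whenever the variable changes.
        println!("cargo:rerun-if-env-changed={env_name}");
        if let Some(line) = directive_from_env(env_name, key) {
            println!("{line}");
        }
    }
}
```

When a variable is unset, no cfg is emitted for it, so the benchmark code needs a sensible default for that case.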

  • In the benchmark file, turn these configurations into Rust constants and types:
// (branch values reconstructed from the bit widths listed earlier)
const SIMD_SUFFIX: &str = if cfg!(target_feature = "avx512f") {
    "avx512f,512"
} else if cfg!(target_feature = "avx2") {
    "avx2,256"
} else if cfg!(target_feature = "sse2") {
    "sse2,128"
} else {
    "error"
};

#[cfg(simd_integer = "i8")]
type Integer = i8;
#[cfg(simd_integer = "i16")]
type Integer = i16;
#[cfg(simd_integer = "i32")]
type Integer = i32;
#[cfg(simd_integer = "i64")]
type Integer = i64;
#[cfg(simd_integer = "isize")]
type Integer = isize;
#[cfg(simd_integer = "u8")]
type Integer = u8;
#[cfg(simd_integer = "u16")]
type Integer = u16;
#[cfg(simd_integer = "u32")]
type Integer = u32;
#[cfg(simd_integer = "u64")]
type Integer = u64;
#[cfg(simd_integer = "usize")]
type Integer = usize;
// default when SIMD_INTEGER is unset or unrecognized
#[cfg(not(any(
    simd_integer = "i8",
    simd_integer = "i16",
    simd_integer = "i32",
    simd_integer = "i64",
    simd_integer = "isize",
    simd_integer = "u8",
    simd_integer = "u16",
    simd_integer = "u32",
    simd_integer = "u64",
    simd_integer = "usize"
)))]
type Integer = i32;
// default to 64 lanes when SIMD_LANES is unset or unrecognized
const LANES: usize = if cfg!(simd_lanes = "2") {
    2
} else if cfg!(simd_lanes = "4") {
    4
} else if cfg!(simd_lanes = "8") {
    8
} else if cfg!(simd_lanes = "16") {
    16
} else if cfg!(simd_lanes = "32") {
    32
} else {
    64
};
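  • In the benchmark file, create a benchmark id that records the combination of variables you're testing, separated by commas. This can either be a string or a criterion BenchmarkId. I created a BenchmarkId with this call: create_benchmark_id::<Integer>("regular", LANES, *parameter) to this function:
// needs: use std::{any::type_name, mem, simd::SimdElement}; use criterion::BenchmarkId;
fn create_benchmark_id<T>(name: &str, lanes: usize, parameter: usize) -> BenchmarkId
where
    T: SimdElement,
{
    // Only the bit-width term survived here; the other comma-separated fields are assumed.
    let bits = mem::size_of::<T>() * 8;
    let id = format!("{},{},{},{},{}", name, SIMD_SUFFIX, type_name::<T>(), bits, lanes);
    BenchmarkId::new(id, parameter)
}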
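The point of the comma-separated id is that each variable lands in its own column once the results are exported as CSV. The std-only sketch below shows the idea without criterion. The field order and the helper names id_string and id_fields are assumptions made for illustration.

```rust
use std::any::type_name;
use std::mem;

/// Builds a comma-separated id from the algorithm name, SIMD suffix,
/// element type, bits per element, and lane count.
/// The field order is an assumption made for this sketch.
fn id_string<T>(name: &str, simd_suffix: &str, lanes: usize) -> String {
    let bits = mem::size_of::<T>() * 8;
    format!("{},{},{},{},{}", name, simd_suffix, type_name::<T>(), bits, lanes)
}

/// Splits an id back into its fields, which is what a CSV reader will do.
fn id_fields(id: &str) -> Vec<&str> {
    id.split(',').collect()
}

fn main() {
    // "avx2,256" already contains a comma, so it contributes two columns.
    let id = id_string::<i32>("regular", "avx2,256", 64);
    println!("{id}");
    println!("{:?}", id_fields(&id));
}
```

Because the SIMD suffix itself contains a comma, the extension name and its bit width become separate columns, which is convenient for pivoting later.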

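  • After the runs finish, gather the results into a CSV file with the cargo-criterion-means command.

Install:

cargo install cargo-criterion-means

Run:

cargo criterion-means > results.csv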

Output Instance:

# ...


A CSV file is suitable for analysis via spreadsheet pivot tables or data frame tools such as Polars.

For example, here is the top of my 5000-line Excel data file:

Columns A to J came from the benchmark. Columns K to N are calculated by Excel.

Here is a pivot table (and chart) based on the data. It shows the effect of varying the number of SIMD lanes on throughput. The chart averages across element type and input length. The chart suggests that for the best algorithms, either 32 or 64 lanes is best.
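The same kind of pivot can be computed in code. The sketch below averages a throughput column per lane count using only the standard library. It assumes a simplified two-column input ("lanes,throughput"), whereas the real file has more columns, and the sample numbers are made up purely to exercise the function. The function name mean_throughput_by_lanes is hypothetical.

```rust
use std::collections::BTreeMap;

/// Averages a throughput column per lane count.
/// Each data line is assumed to look like "lanes,throughput".
fn mean_throughput_by_lanes(csv: &str) -> BTreeMap<usize, f64> {
    let mut sums: BTreeMap<usize, (f64, usize)> = BTreeMap::new();
    // skip(1) drops the header row
    for line in csv.lines().skip(1) {
        let mut fields = line.split(',');
        let (Some(lanes), Some(throughput)) = (fields.next(), fields.next()) else {
            continue; // ignore blank or short lines
        };
        let (Ok(lanes), Ok(throughput)) =
            (lanes.trim().parse::<usize>(), throughput.trim().parse::<f64>())
        else {
            continue; // ignore rows that do not parse
        };
        let entry = sums.entry(lanes).or_insert((0.0, 0));
        entry.0 += throughput;
        entry.1 += 1;
    }
    sums.into_iter()
        .map(|(lanes, (sum, n))| (lanes, sum / n as f64))
        .collect()
}

fn main() {
    // Made-up numbers, not benchmark results.
    let csv = "lanes,throughput\n4,1.0\n4,3.0\n64,10.0\n64,14.0\n";
    for (lanes, mean) in mean_throughput_by_lanes(csv) {
        println!("{lanes} lanes: mean throughput {mean}");
    }
}
```

A BTreeMap keeps the lane counts sorted, so the output rows come out in the same order as the categories on a chart axis.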

With this analysis, we can now choose our algorithm and decide how we want to set the LANES parameter.


Thank you for joining me for this journey into Criterion benchmarking.

If you've not used Criterion before, I hope this encourages you to try it. If you've used Criterion but couldn't get it to measure everything you cared about, I hope this gives you a path forward. Embracing Criterion in this expanded way can unlock deeper insights into the performance characteristics of your Rust projects.

Please follow Carl on Medium. I write on scientific programming in Rust and Python, machine learning, and statistics. I tend to write about one article per month.

Benchmarking Rust Compiler Settings with Criterion was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
