Parallel Dwarfs

The Parallel Dwarfs project is a suite of 13 kernels (as Visual Studio projects in C++/C#/F#) parallelized using technologies such as MPI, OpenMP, TPL, and MPI.NET. It also has a driver to run them, collect traces, and visualize the results using Vampir, Jumpshot, Xperf, and Excel.

Welcome to the Parallel Dwarfs Project!

If you are interested in learning about parallel programming using MPI, OpenMP, PFx, etc., feel free to download and run these kernels on your machine (or cluster!). If you are interested in contributing, please see below.

These "Dwarfs" (kernels) were written by students at St. Petersburg University in Russia under Prof. Vladimir Safonov (see People).

The original idea behind the Dwarfs comes from Phil Colella's paper and was later expanded on by Dave Patterson. For information about the Dwarfs themselves, see UCB Dwarfs. Also see this presentation by Patterson on the philosophy behind the Dwarfs.

What they are

The Dwarfs are kernels that represent a wide spectrum of computations across various application domains. This project is an attempt to parallelize these 13 algorithms using the parallelization technologies available on Windows. The download is a VS2010 solution with the kernels (dwarfs) in C++ and C#, an F# input data generator, and a driver. You can use the driver to select one or more dwarfs, the input sizes for the runs, and the tracing and visualization options. The dwarfs can be run on a laptop, a multi-core workstation, or a cluster.

Considering the number of kernels, languages, input sizes, etc., you can run approximately 350 combinations of the dwarfs!


Here are some sample screenshots of the dwarfs & the reports.

Running the Dwarfs

After downloading & setting up the project, you can run the dwarfs as follows:

PS> exec-dwarfs -names structuredgrid,unstructuredgrid -platform unmanaged -parallel serial,mpi,omp -size small -mpitrace -plotvampir -plotexcel

The above invocation will do the following:
  • Select the two "Grid" kernels (out of 13)
  • Select the unmanaged (native) versions only (i.e. the C++ versions) - options: managed, unmanaged
  • Run the Serial, MPI and OpenMP versions - options: serial, mpi, omp, tpl, hybrid (e.g. MPI with OpenMP)
  • Feed the kernels the pregenerated small input data (i.e. short runs) - options: small, medium and large
  • Collect MPI message trace information - performs automatic clock sync & clock drift correction if run on a cluster
  • Plot the results using Vampir (an MPI message viewer) for inspection & analysis - can also choose Jumpshot
  • Create an Excel chart of the runtime results - pretty charts...
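
To make the "serial" and "omp" variants of a grid kernel more concrete, here is a minimal C++ sketch. It is not taken from the Parallel Dwarfs sources: it assumes that a Jacobi-style four-point stencil is a reasonable stand-in for a structured grid kernel, and the names Grid, stencil, jacobi_sweep_serial and jacobi_sweep_omp are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration; not from the Parallel Dwarfs sources.
using Grid = std::vector<double>;  // row-major storage, rows x cols

// Average of the four direct neighbours of the interior cell (r, c).
inline double stencil(const Grid& in, std::size_t r, std::size_t c, std::size_t cols) {
    return 0.25 * (in[(r - 1) * cols + c] + in[(r + 1) * cols + c] +
                   in[r * cols + c - 1] + in[r * cols + c + 1]);
}

// Serial sweep: boundary cells are copied, interior cells are relaxed.
// `in` and `out` must be distinct objects.
void jacobi_sweep_serial(const Grid& in, Grid& out, std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols && out.size() == rows * cols);
    out = in;  // keeps the boundary values
    for (std::size_t r = 1; r + 1 < rows; ++r)
        for (std::size_t c = 1; c + 1 < cols; ++c)
            out[r * cols + c] = stencil(in, r, c, cols);
}

// Same sweep with the row loop divided among OpenMP threads. Each thread
// writes a disjoint set of rows and only reads `in`, so no locking is needed.
// If the compiler is run without OpenMP support, the pragma is ignored and
// the loop simply runs serially.
void jacobi_sweep_omp(const Grid& in, Grid& out, std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols && out.size() == rows * cols);
    out = in;
    const long last_row = static_cast<long>(rows) - 1;
    // A signed loop index keeps this compatible with older OpenMP versions.
    #pragma omp parallel for
    for (long r = 1; r < last_row; ++r) {
        const std::size_t row = static_cast<std::size_t>(r);
        for (std::size_t c = 1; c + 1 < cols; ++c)
            out[row * cols + c] = stencil(in, row, c, cols);
    }
}
```

Both functions should produce identical output for the same input, which makes the serial version a convenient correctness reference when tuning the parallel one.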

Usage scenarios

The Dwarfs project can be used in various scenarios such as:
  • Educational: learn what parallel technologies are available & how they can be used
  • Start with existing, functional templates & Visual Studio projects (spend time on kernels & not on scaffolding...)
  • Compare & contrast different parallel technologies side by side
  • Explore the performance characteristics of various parallel models
  • Dig deep into how programs behave using Xperf & Vampir (CPU, I/O, network, etc.)
  • Use the driver/framework to add your own kernels or help improve the existing ones!
  • Explore performance behavior on workstations vs clusters
  • Explore combining different parallel models (e.g. MPI + OpenMP)
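
As a rough illustration of a side-by-side comparison, the sketch below times several variants of the same kernel and reports the best run of each. It is an assumption-laden stand-in, not the project's driver: the Timing struct and the time_variants function are hypothetical names, and the real driver collects and plots results in its own way.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not part of the Parallel Dwarfs driver.
struct Timing {
    std::string name;  // label of the variant, e.g. "serial" or "omp"
    double seconds;    // best (smallest) wall-clock time over all repeats
};

using Variant = std::pair<std::string, std::function<void()>>;

// Runs every variant `repeats` times and records the fastest run of each.
// Taking the minimum reduces noise from other activity on the machine.
std::vector<Timing> time_variants(const std::vector<Variant>& variants, int repeats) {
    assert(repeats > 0);
    std::vector<Timing> results;
    for (const auto& variant : variants) {
        double best = -1.0;
        for (int i = 0; i < repeats; ++i) {
            const auto start = std::chrono::steady_clock::now();
            variant.second();  // run the kernel variant once
            const std::chrono::duration<double> elapsed =
                std::chrono::steady_clock::now() - start;
            if (best < 0.0 || elapsed.count() < best) best = elapsed.count();
        }
        results.push_back({variant.first, best});
    }
    return results;
}
```

Each variant is passed as a label plus a callable, so a serial and a parallel version of the same kernel can be wrapped in lambdas and timed on identical input.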


Please keep the following in mind when using the Parallel Dwarfs:
  • This is a work in progress!
  • A small number of kernel combinations are missing
  • Some of the kernels are tuned and some are not. As such, some performance anomalies exist (especially when using small input sizes)
  • Not all Dwarfs lend themselves well to parallelization :)


As mentioned before, this started as a student project and is now available on CodePlex, mainly because we believe it can become a great educational and research vehicle. We hope that you will download it, report any bugs you find, and actively participate in improving the kernels. Some examples of how you can help:
  • Tune the dwarfs
  • Add missing combinations
  • Add new languages: Python, F#, Fortran, Java, ...
  • Add new parallel models: PPL, Intel's TBB, ClusterSOA, ...
  • Tune the input generator (some runs are too short on fast machines/large clusters)
  • Port to Mono


Please see the Prerequisites and Setup information page.

Project conventions

Please see the Project Conventions page.

For a tutorial on programming an HPC cluster, please see:

You can run the Parallel Dwarfs on your workstation or on a cluster. You can set up a cluster with as few as two machines or with several hundred. The driver (dwarfbench) can run the kernels locally or on a Windows HPC cluster and schedule at the core, socket, or node level.

Frequently asked questions


Last edited Apr 16, 2010 at 7:40 PM by RobertPalmer, version 55