Project Conventions

There are a couple of project conventions that you should know about. A tool called exec-dwarfs will run your dwarfs, but the binaries have to be in the right place, expose the right command-line interface, and keep their associated input and output in the right locations. Here is how to do it:

Binary name

The name of the dwarf should be: Dwarf.<Kernel>.<Platform>.<Parallelism>.exe.

Historically we have called C# kernels "managed" and F# kernels "fsharp", even though F# is a managed language. The platform choice "unmanaged" has been used to mean C++.

For example, if you implement an NBody kernel in C++, you would name your binaries Dwarf.NBody.Unmanaged.<Parallelism>.exe. The parallelism options are OMP, TPL, PPL, MPI, Threads, Hybrid, and Serial. Depending on the platform, one parallelism option sometimes means more than one kernel (MPI, for example).

In principle you can put anything in for <Kernel>, <Platform>, and <Parallelism> and exec-dwarfs will find it, but direct comparisons and meaningful filtering depend on following the convention above.
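The naming convention can be checked mechanically. The following is a minimal sketch, not the logic exec-dwarfs itself uses; the function names are hypothetical, and it assumes that none of the three fields contains a dot.

```python
import re

# Dwarf.<Kernel>.<Platform>.<Parallelism>.exe
# Assumption: the kernel, platform, and parallelism fields contain no dots.
_NAME_RE = re.compile(r"^Dwarf\.([^.]+)\.([^.]+)\.([^.]+)\.exe$", re.IGNORECASE)


def binary_name(kernel, platform, parallelism):
    """Build a binary name that follows the convention."""
    return f"Dwarf.{kernel}.{platform}.{parallelism}.exe"


def parse_binary_name(name):
    """Return (kernel, platform, parallelism), or None if the name does not match."""
    match = _NAME_RE.match(name)
    return match.groups() if match else None
```

For instance, binary_name("NBody", "Unmanaged", "OMP") gives Dwarf.NBody.Unmanaged.OMP.exe, and parse_binary_name splits that name back into its three fields.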

Source placement and organization

The source should be placed in $(DwarfDir)\Src\Dwarf.<Kernel>\

Each implementation should be placed in its own subdirectory named Dwarf.<Platform>.<Parallelism>.

Each project should have a main program file. For the unmanaged kernels we have used Dwarf.<Platform>.<Parallelism>.cpp. For the C# and F# kernels we use Program.cs or Program.fs. These are fairly superficial files. The primary implementation should be held in a file called Solver.<extension>. The optional Configurator.h and Configurator.cpp files contain the interface to CliTools.dll that the native components need.
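As an illustration of the layout described above, here is a small sketch that computes where an implementation lives and which files it is expected to contain. The function name and the C:\Dwarfs value used for $(DwarfDir) are hypothetical, and the mapping from platform name to Solver file extension is an assumption based on the languages named earlier.

```python
import ntpath  # Windows path rules, usable on any operating system

# Assumption: the Solver file extension follows the platform's language.
_EXTENSIONS = {"unmanaged": "cpp", "managed": "cs", "fsharp": "fs"}


def source_layout(dwarf_dir, kernel, platform, parallelism):
    """Return the implementation directory and its expected file names."""
    key = platform.lower()
    if key not in _EXTENSIONS:
        raise ValueError(f"unknown platform: {platform}")
    ext = _EXTENSIONS[key]
    impl = f"Dwarf.{platform}.{parallelism}"
    # Unmanaged kernels name the main file after the project;
    # C# and F# kernels use Program.cs or Program.fs.
    main = f"{impl}.cpp" if key == "unmanaged" else f"Program.{ext}"
    return {
        "directory": ntpath.join(dwarf_dir, "Src", f"Dwarf.{kernel}", impl),
        "main": main,
        "solver": f"Solver.{ext}",
    }
```

With a hypothetical $(DwarfDir) of C:\Dwarfs, source_layout(r"C:\Dwarfs", "NBody", "Unmanaged", "Serial") places the project in C:\Dwarfs\Src\Dwarf.NBody\Dwarf.Unmanaged.Serial with Dwarf.Unmanaged.Serial.cpp as its main file.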

Project organization

Each project, including managed projects, should use CliTools.dll for its command-line interface.

Each project should be part of the Dwarfs solution. The solution is organized by kernel and then into three sections: Components, Managed, and Unmanaged. The optional Components section contains code that is shared between projects and typically has the interface code for CliTools.dll.

Binary placement

The CliTools.dll should be copied into the binary directory as part of the setup script or as a post build step. You will need to add the appropriate line in $(DwarfDir)\Bin\Scripts\InputFileGenerator.bat to copy CliTools.dll to the appropriate location.

The project settings of your new projects should specify the output directory as ..\..\..\BIN\Dwarf.<Kernel>\. This will place the binaries appropriately when the project is built.
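To see why three levels of ".." are needed, the relative output directory can be resolved against a project directory. This is only a worked example; the function name and the C:\Dwarfs root are hypothetical.

```python
import ntpath  # Windows path rules, usable on any operating system


def resolve_output_dir(project_dir, kernel):
    """Resolve ..\\..\\..\\BIN\\Dwarf.<Kernel>\\ relative to a project directory."""
    relative = ntpath.join("..", "..", "..", "BIN", f"Dwarf.{kernel}")
    # normpath collapses the ".." components and drops any trailing separator.
    return ntpath.normpath(ntpath.join(project_dir, relative))
```

Starting from C:\Dwarfs\Src\Dwarf.NBody\Dwarf.Unmanaged.Serial, the three ".." steps climb past the implementation directory, the kernel directory, and Src, so the binaries land in C:\Dwarfs\BIN\Dwarf.NBody.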

Input file generation

Each dwarf kernel has an input file generator associated with it. Each kernel reads input from a file before starting the computation. The custom input file generators create mostly random input files in the appropriate file format. A new input file generator will be needed in order to run new kernels. The input file generator project for each kernel is in $(DwarfDir)\Src\InputFileGeneratorLibrary. The generator in $(DwarfDir)\Src\InputFileGenerator creates input files for each type of kernel. The scripts $(DwarfDir)\Bin\Scripts\InputFileGeneratorCreateInput.bat and $(DwarfDir)\Bin\Scripts\ParallelInputFileGeneratorCreateInput.bat are used as part of setup to create the input for each kernel.

The InputFileGenerator.exe binary takes three arguments:
-name <Kernel>
-file <output file location>
-size <n>

The size parameter is the most difficult to get right, because a good input set often needs more specification than just size. For this reason we allow InputFileGenerator.exe to take custom inputs for each kernel, but it must have "reasonable" defaults in case only the name, file, and size parameters are given on the command line.
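A setup script could assemble the three documented arguments like this. This is a minimal sketch: the helper name is hypothetical, and the exact value expected for -name (shown here as a bare kernel name) is an assumption.

```python
def generator_command(kernel, output_file, size, exe="InputFileGenerator.exe"):
    """Build the argument list for the three documented parameters.

    Kernel-specific custom inputs are left out, so the generator is
    expected to fall back to its defaults for everything else.
    """
    return [exe, "-name", kernel, "-file", output_file, "-size", str(size)]
```

The resulting list can be passed to subprocess.run on a machine where InputFileGenerator.exe is on the path.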

Both the InputFileGeneratorCreateInput.bat and ParallelInputFileGeneratorCreateInput.bat files should be updated to reflect the addition of a kernel.

Input file placement

The input files should be placed in $(DwarfDir)\Bin\Dwarf.<Kernel>\Input\.

To take advantage of filtering in exec-dwarfs, the input files should be named consistently with the other input files. The intent was for the small, medium, and large input files to result in computations that take roughly the same amount of time for each kernel using the same parallelism. For example, smallcontent.txt for the C# branch and bound dwarf running serially should take roughly the same amount of time as smallcontent.txt for the Dynamic Programming dwarf running the C# kernel serially. Of course this is a hard problem and machines are always changing, so a more approximate guideline of 1-2 seconds for small, 8-15 seconds for medium, and 1-2 minutes for large has been used. For general studies, much larger and more carefully characterized input files are probably desirable.
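When tuning a new input file, the timing guideline can be written down as a simple check. This is only a sketch of the guideline as stated above, with hypothetical names; since the guideline is approximate, treat a failed check as a hint and not an error.

```python
# Approximate target run times in seconds, taken from the guideline above.
GUIDELINE_SECONDS = {
    "small": (1, 2),
    "medium": (8, 15),
    "large": (60, 120),  # 1-2 minutes
}


def within_guideline(size_class, elapsed_seconds):
    """Return True if a measured run time falls inside the guideline band."""
    low, high = GUIDELINE_SECONDS[size_class]
    return low <= elapsed_seconds <= high
```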

Output file placement

The kernels will place output files in $(DwarfDir)\Bin\Dwarf.<Kernel>\Output\ and in $(DwarfDir)\Bin\Dwarf.<Kernel>\Traces\. Computational output is placed in the Output directory and is dependent upon the dwarf. Timing information is placed in the Traces directory and should adhere to the following format:

#Dwarfname:Dense Linear Algebra, unmanaged serial kernel.
#Time: 04/16/10 10:38:12
#Dimensions: 2
#Matrix Size: 229
#Result time (sec): 0.01570927
#eof

Most important are the #Dwarfname: and #Result time (sec): lines, as these are parsed by exec-dwarfs and used in plotting timing in Excel. The #eof line must also be present.
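To check that a kernel writes traces in this format, a small validator can help. The following is a sketch of one way to read the format and is not the parser that exec-dwarfs uses; the function name is hypothetical. It requires the two important fields and the #eof line, and keeps every other field as plain text.

```python
def parse_trace(text):
    """Parse trace text into a dict; raise ValueError if it is malformed."""
    fields = {}
    saw_eof = False
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # ignore anything that is not a trace line
        if line == "#eof":
            saw_eof = True
            break
        # Split on the first colon only, so time stamps keep their colons.
        key, sep, value = line[1:].partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if not saw_eof:
        raise ValueError("trace is missing the #eof line")
    for required in ("Dwarfname", "Result time (sec)"):
        if required not in fields:
            raise ValueError(f"trace is missing the #{required}: line")
    fields["Result time (sec)"] = float(fields["Result time (sec)"])
    return fields


SAMPLE = """#Dwarfname:Dense Linear Algebra, unmanaged serial kernel.
#Time: 04/16/10 10:38:12
#Dimensions: 2
#Matrix Size: 229
#Result time (sec): 0.01570927
#eof
"""
```

Calling parse_trace(SAMPLE) returns the dwarf name and the result time as a float, along with the remaining fields as strings.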

Some useful scripts

There are a number of useful scripts that are placed in $(DwarfDir)\Bin\Scripts and $(DwarfDir)\Bin\Scripts\Lib. These include scripts to compare output of dwarfs for consistency and to run dwarfs locally and on a Windows HPC 2008 cluster.

Use code analysis

All of the projects have the goal of being FxCop clean and /Analyze clean with all rules enabled. We also turn the compiler warnings up all the way. You will find that we are still working toward this goal, but it is better to start with it in mind.

Last edited Apr 16, 2010 at 7:35 PM by RobertPalmer, version 3
