Installation of the Polyhedron Benchmark Suite

Simply extract the contents of the distributed .zip file in a convenient location. For example

unzip pb11.zip

creates a subdirectory called pb11. The pb11/win and pb11/lin directories contain the Windows and Linux versions, each with a subdirectory called source for the benchmark source code, parameter files etc.

In this document, we use the Windows convention for directory names, with “\” rather than “/” as the separator. Apart from this, and some Windows-specific details in the examples, the following applies to both the Windows and Linux versions of the Polyhedron Benchmarks.

Running the Polyhedron Benchmark Harness

To run the Polyhedron Benchmark Harness, open a command shell, and move to the directory containing the source code (e.g. pb11\win\source). Then type

..\pbharness parfile1 parfile2 ..

where the command line arguments are the names of files containing a description of the tests to be performed, including the compile command, the names of the benchmark files, accuracy requirements etc. The default extension for these “parameter files” is .par, and that extension is assumed if none is specified. The “.” character may only be used in the extension of parameter file names.

By default, the name of the first parameter file (excluding the extension) is taken as the name for this test run; the harness produces various output files by combining the test name with different extensions. For example,

..\pbharness int_90027_win_p4 standard

reads instructions from int_90027_win_p4.par and standard.par (they are, in effect, concatenated), and writes output files called int_90027_win_p4.cmp, int_90027_win_p4.run and int_90027_win_p4.sum.

You can specify the test name explicitly, by placing it in brackets after the first argument. For example,

..\pbharness archived(absoft_100_lin_p4) tidy

reads from archived.par and tidy.par, but the test name is absoft_100_lin_p4, and the output files are called absoft_100_lin_p4.cmp, absoft_100_lin_p4.run and absoft_100_lin_p4.sum. This facility is useful when retrieving executables from archive files, and also for assigning temporary names in disposable test runs.
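
The naming rules above can be illustrated with a short Python sketch. It is not part of the harness; the function name parse_harness_args is hypothetical, and the bracket parsing is an assumption based only on the examples shown here.

```python
import re

def parse_harness_args(args):
    """Return (test_name, parameter_files, output_files) for a harness command line.

    Illustrative only: mirrors the documented naming rules, not the harness source.
    """
    if not args:
        raise ValueError("at least one parameter file is required")
    # The first argument may carry an explicit test name in brackets: parfile(testname)
    match = re.fullmatch(r"([^()]+)\(([^()]+)\)", args[0])
    if match:
        first, test_name = match.group(1), match.group(2)
    else:
        first = args[0]
        # Default test name: first parameter file name without its extension
        test_name = first.rsplit(".", 1)[0] if "." in first else first
    names = [first] + list(args[1:])
    # ".par" is assumed when no extension is specified
    par_files = [n if "." in n else n + ".par" for n in names]
    outputs = [test_name + ext for ext in (".cmp", ".run", ".sum")]
    return test_name, par_files, outputs
```

For instance, parse_harness_args(["archived(absoft_100_lin_p4)", "tidy"]) yields the test name absoft_100_lin_p4 and the parameter files archived.par and tidy.par, matching the example above.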

It is recommended that test names be chosen carefully and consistently, to characterize the test as fully as possible (e.g. int_90027_win_p4 represents a test for version 9.0.027 of the Windows Intel compiler tuned for a Pentium 4 computer). However, if the test name exceeds 16 characters, it will be truncated in some contexts.

Format of the .par Files

The parameter files are ASCII files, created using a standard text editor. Blank lines, and lines beginning with a “#” are ignored.

Line 1:	The command to be used to create the executable file. The name of the benchmark is represented by “%n”. For example:
	lf95 %n.for -inline -o1 -sse2 -nstchk -tp4 -ntrace -zfm
	creates the executable for benchmark %n. Note that the command could invoke the compiler directly, or using a batch file. It could also extract a pre-existing executable from an archive file. For example:
	unzip archives\%r.zip %n.exe
	extracts the executable file for the current benchmark (%n) from the archive file for the current test. The name of the current test is represented by “%r”. The executable should be created in the current directory, with the name %n.exe (Windows) or just %n (Linux). Note that the harness deletes any pre-existing version of the executable before creating the new one.
Line 2:	A space-delimited list of benchmark names, e.g.
	AC AERMOD DODUC TFFT2 FATIGUE2 NF TEST_FPU2
	Individual benchmarks may be temporarily disabled by inserting a “-” before the name, e.g.
	AC -AERMOD DODUC TFFT2 FATIGUE2 -NF TEST_FPU2
	runs all of these benchmarks except AERMOD and NF.
Line 3:	Contains timing and target accuracy parameters. For example:
	10000 0.1 10 100
	specifies that the harness will spend up to 10000 seconds running the benchmark repeatedly, with a view to reducing the estimated timing error to 0.1%. Even if the required accuracy is achieved earlier, the harness will repeat the benchmark at least 10 times (or until 10000 seconds are up). The harness will not repeat the run more than 100 times, even if the accuracy target is not achieved.
Line 4 and on:	Contain a series of commands that will be executed after the last benchmark is complete, and before finishing execution. Typically these commands would create an archive file containing the executables and results files, and delete any temporary files, leaving the benchmark directory ready for further use. The “%r” placeholder may be used to represent the current test name. For example:
	move archives\%r-2.zip archives\%r-3.zip
	move archives\%r-1.zip archives\%r-2.zip
	move archives\%r.zip archives\%r-1.zip
	zip archives\%r.zip %r.* .txt .exe
	@tidy.par
	The last line specifies that further input should be taken from the file “tidy.par”. Note that this is a “one-way” transfer: any lines after the redirection are not processed.
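
As a rough illustration of the layout described above, the following Python sketch splits the text of a parameter file (or several concatenated ones) into its parts. It is not the harness's own parser: the names parse_par and expand, the dictionary keys, and the choice to recognize “@” redirection only among the post-run commands are all assumptions made for this example.

```python
def parse_par(text):
    """Split parameter-file text into compile command, benchmarks, timing and post-run commands."""
    # Blank lines and lines beginning with "#" are ignored.
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    if len(lines) < 3:
        raise ValueError("expected at least three non-comment lines")
    names = lines[1].split()
    max_seconds, target_pct, min_runs, max_runs = lines[2].split()
    post_commands, redirect = [], None
    for ln in lines[3:]:
        if ln.startswith("@"):
            # One-way transfer: nothing after the redirection is processed.
            redirect = ln[1:]
            break
        post_commands.append(ln)
    return {
        "compile_command": lines[0],
        # A leading "-" temporarily disables a benchmark.
        "benchmarks": [n for n in names if not n.startswith("-")],
        "disabled": [n[1:] for n in names if n.startswith("-")],
        "timing": {
            "max_seconds": float(max_seconds),
            "target_error_percent": float(target_pct),
            "min_runs": int(min_runs),
            "max_runs": int(max_runs),
        },
        "post_commands": post_commands,
        "redirect": redirect,
    }

def expand(command, benchmark, test_name):
    """Substitute the %n (benchmark) and %r (test name) placeholders."""
    return command.replace("%n", benchmark).replace("%r", test_name)
```

A command template such as the unzip example would then be expanded once per enabled benchmark, with expand(params["compile_command"], name, test_name).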

Notes on Timing

The Polyhedron Benchmark Harness reports “wall-clock” or “elapsed” times, rather than relying on internal timers. This approach has advantages and disadvantages:

Internal clocks may exclude significant amounts of time spent on system tasks, such as paging, which are essential to the run. An end user does not care about this distinction; in the real world it’s the “wall-clock” that matters.
Wall clock time can be measured without modifying the program.
With “wall-clock” timing, unrelated activities, such as screen-savers, cron jobs and network servicing may interfere. This may be realistic, but it makes accurate and consistent benchmarking difficult. Better results are obtained if as many as possible of these extraneous activities are disabled.

When a benchmark is run repeatedly, it is often found that the second and subsequent runs differ in timing from the first. This may be particularly noticeable for “Just-in-Time” compilers, which compile code on the fly, when it first runs. To counter this effect, the Polyhedron Benchmark Harness eliminates the first run from the statistics after the 5th run is complete (so the 5th estimate is the average of runs 2 to 5).
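
The first-run rule can be expressed in a few lines of Python. This is a sketch of the rule as described above, not the harness code; the function name mean_run_time is hypothetical.

```python
def mean_run_time(times):
    """Average run time, discarding run 1 once five runs are complete."""
    if not times:
        raise ValueError("no timings recorded")
    # Before the 5th run, every run counts; from the 5th run on, run 1 is dropped.
    used = times[1:] if len(times) >= 5 else times
    return sum(used) / len(used)
```

With timings of 10.0 seconds for a slow first run followed by runs of 2.0 seconds, the estimate after four runs still includes the first run, while the estimate after five runs is the average of runs 2 to 5 only.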

If the error estimate for a particular benchmark takes an unusually long time (more than 30 repeats) to converge to an acceptable value, the benchmark harness displays a simple histogram of frequency against run-time (using lines of “*”s). This can sometimes be useful in diagnosing the reason for slow convergence (e.g. a single rogue datum, or timings alternating between 2 values).
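
To show how such a histogram exposes timings that alternate between two values, here is a minimal Python sketch. The bin count, the line layout and the function name run_time_histogram are assumptions for illustration; the harness's actual display may differ.

```python
def run_time_histogram(times, bins=10):
    """Return text lines: one bin per line, one "*" per run falling in that bin."""
    if not times:
        raise ValueError("no timings recorded")
    lo, hi = min(times), max(times)
    span = (hi - lo) or 1.0  # avoid a zero-width range when all timings are equal
    counts = [0] * bins
    for t in times:
        # The maximum value is clamped into the last bin.
        index = min(int((t - lo) / span * bins), bins - 1)
        counts[index] += 1
    lines = []
    for i, count in enumerate(counts):
        left_edge = lo + span * i / bins
        lines.append(f"{left_edge:10.3f} | {'*' * count}")
    return lines

if __name__ == "__main__":
    # Timings alternating between two values give two separated peaks.
    print("\n".join(run_time_histogram([1.0, 2.0] * 5, bins=4)))
```

A single rogue datum would instead appear as one isolated “*” far from the main peak.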

Included Parameter Files

The installation directory contains parameter files which may be used with many current compilers. In general, we use 2 concatenated parameter files. The first is compiler specific, and the second contains benchmark names and timing parameters etc., which are common to all compilers. The “archived.par” parameter file is used for cases where the executable files are extracted from a pre-existing archive. “makearchive.par” is used to create archives. These .par files assume the availability of the free zip and unzip utilities for Windows and Linux from www.info-zip.org.

Output Files

Each time the benchmark program runs to completion, it produces 3 new output files, and updates 3 summary files. If the test name is “test1”, these are:

test1.cmp	Contains the output from the compiler for each benchmark.
test1.run	Contains the concatenated outputs from the first run of each benchmark. This file should be examined to check that the benchmarks ran correctly. Note that the outputs from different compilers will not, in general, be identical. In particular, the treatment of floating point data varies, and some formatted outputs may differ. The Polyhedron Benchmark Validator (see Appendix 1) is a simple program which helps to determine the correctness of the output files.
test1.sum	Contains a summary of the run, including the compile command, benchmark names, timing parameters, compile and execution times for each benchmark, together with the executable size, the number of times each benchmark was run, and the estimated measurement error.
runtimes.txt	is updated with a single line summary of the execution times (in seconds) for each benchmark.
exesizes.txt	is updated with a single line summary of the executable file sizes (in kb) for each benchmark.
bldtimes.txt	is updated with a single line summary of the build time (in seconds) for each benchmark.

Appendix 1 The Polyhedron Benchmark Validator

The Polyhedron Benchmark Validator (PBValid) extracts specified strings and numbers from an output file, and checks them against target values, with a specified tolerance where appropriate. The format of the output from a bug-free program may vary from one compiler to another for at least 2 good reasons:

Floating point numbers are in general approximations to the exact values that we would like to have. Compilers may, quite properly, perform calculations in different ways, which would be mathematically equivalent if floating point numbers were precise. However because they are not precise, the errors accumulate in different ways, and different compilers may produce different answers. In a well designed program, the differences are small. However in validating an output file, we must take account of the possibility of these differences.
The Fortran standard does not define the precise way some formatted outputs will be rendered. For example, the spacing of items in a list directed output may vary from one compiler to another, and there is even scope for variations in formatted numeric output.

PBValid is a command line program which requires that 2 filenames be specified on the command line:

The name of the output file to be validated.
The name of a file containing validation instructions.

All output from PBValid is sent to standard output. The validation instructions resemble editing commands (e.g. find a string, move down 10 lines, move to next space etc.), and they are used to extract strings which are to be checked. Spaces and new lines are ignored in the validation instructions. The available commands are:

N22	Advance the cursor by the specified number of lines (22 in this case). The cursor is left in column 1 on the new line.
P16	Move the cursor backwards by the specified number of lines (16 in this case). The cursor is left in column 1 on the new line.
+6	Advance the cursor by the specified number of columns (but do not go beyond column 1024)
-10	Move the cursor back by the specified number of columns (but do not go beyond column 1)
F/xyz/	Find the specified string (xyz in this case). The string is delimited by the ‘/’ character in this case, but any character may be used, provided that it does not occur in the string. The search is case-sensitive. The cursor is left at the character immediately after the string.
W	Move the cursor forward to the next “whitespace” character on the current line (but do not go beyond column 1024)
B	Move the cursor forward to the next “blackspace” (i.e. anything except a space) character on the current line (but do not go beyond column 1024)
#	Move the cursor forward to the next numeric character (including “+”, “-” and “.”) on the current line (but do not go beyond column 1024)
<	Identify the current character as the first in a string which is to be validated.
>/123.45/0.02/	The string to be tested finishes at the character before the current cursor position. The string should contain the specified floating point value (123.45 in this case), but with a permitted variation as specified (0.02 in this case). The strings are delimited by the ‘/’ character in this case, but any character may be used, provided that it does not occur in either string.
=/AA14/	The string to be tested finishes at the character before the current cursor position. The string should contain the specified string (“AA14” in this case). The string is delimited by the ‘/’ character in this case, but any character may be used, provided that it does not occur in the string.
?	The test-string finishes at the character before the current cursor position. “?” causes the string to be copied to standard output.
R/name/	Causes a report to be generated summarizing the number of passed and failed tests since the last “R” command. The name is displayed in the summary, and is used to identify particular sets of tests.
X	Terminates the validation. A summary is written to standard output.
!	Ignore the remainder of this line (comment).
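
To make the cursor model concrete, the Python sketch below imitates what a command sequence such as F/Total energy =/ # < W >/123.45/0.02/ might do. It is not PBValid itself. The function name check_number and the label “Total energy =” are hypothetical, the search is restarted on each line rather than continuing from a cursor, and the permitted variation is assumed to be an absolute difference, which the description above does not confirm.

```python
def check_number(text, label, target, tolerance):
    """Find label, extract the number that follows it, and compare it with target."""
    for line in text.splitlines():
        pos = line.find(label)  # "F": case-sensitive search
        if pos < 0:
            continue
        cursor = pos + len(label)  # cursor is left just after the found string
        # "#": advance to the next numeric character (digits, "+", "-", ".")
        while cursor < len(line) and line[cursor] not in "0123456789+-.":
            cursor += 1
        start = cursor  # "<": mark the first character of the test string
        # "W": advance to the next whitespace character on this line
        while cursor < len(line) and not line[cursor].isspace():
            cursor += 1
        candidate = line[start:cursor]  # ">": test string ends before the cursor
        try:
            value = float(candidate)
        except ValueError:
            return False  # nothing numeric followed the label
        # Assumption: the permitted variation is an absolute tolerance.
        return abs(value - target) <= tolerance
    return False  # label not found anywhere
```

Because the comparison is made with a tolerance, small compiler-dependent differences in the last digits or in the spacing of the output do not cause a failure.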

The benchmark installation includes validation files for the Fortran 90 benchmark suite. Note that the validator does not provide a definitive answer to the question of whether a given output is “correct”. An incorrect output may pass, and a correct output may fail. It is intended as a tool for identifying outputs that need closer examination, rather than a thorough validation.