Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules

                 Reinhold P. Weicker
                 Siemens AG, E STE 35
                 Postfach 3240
                 D-8520 Erlangen
                 Germany (West)


The Dhrystone benchmark program [1] has become a popular benchmark for
CPU/compiler performance measurement, in particular in the area of
minicomputers, workstations, PCs and microprocessors. It apparently
satisfies a need for an easy-to-use integer benchmark; it gives a first
performance indication which is more meaningful than MIPS numbers,
which, in their literal meaning (million instructions per second),
cannot be used across different instruction sets (e.g. RISC vs. CISC).
With the increasing use of the benchmark, it seems necessary to
reconsider the benchmark and to check whether it can still fulfill this
function. Version 2 of Dhrystone is the result of such a
re-evaluation; it has been made for two reasons:

o Dhrystone has been published in Ada [1], and versions in Ada, Pascal
  and C have been distributed by Reinhold Weicker via floppy disk.
  However, the version used most often for benchmarking has been the
  one made by Rick Richardson through a separate translation from the
  Ada version into the C programming language; this has been the
  version distributed via the UNIX network Usenet [2].

  There is an obvious need for a common C version of Dhrystone, since C
  is at present the most popular system programming language for the
  class of systems (microcomputers, minicomputers, workstations) where
  Dhrystone is used most. There should be, as far as possible, only
  one C version of Dhrystone, so that results can be compared without
  restrictions. In the past, the C versions distributed by Rick
  Richardson (Version 1.1) and by Reinhold Weicker had small (though
  not significant) differences.

  Together with the new C version, the Ada and Pascal versions have
  been updated as well.

o As far as is possible without changes to the Dhrystone statistics,
  optimizing compilers should be prevented from removing significant
  statements. It has turned out in the past that optimizing compilers
  suppressed code generation for too many statements (by "dead code
  removal" or "dead variable elimination"). This has led to the
  danger that benchmarking results obtained by a naive application of
  Dhrystone, without inspection of the code that was generated, could
  become meaningless.

The overall policy for version 2 has been that the distribution of
statements, operand types and operand locality described in [1] should
remain unchanged as much as possible. (Very few changes were
necessary; their impact should be negligible.) Also, the order of
statements should remain unchanged. Although I am aware of some
critical remarks on the benchmark (I agree with several of them) and
know some suggestions for improvement, I didn't want to change the
benchmark into something different from what has become known as
"Dhrystone"; the confusion generated by such a change would probably
outweigh the benefits. If I were to write a new benchmark program, I
wouldn't give it the name "Dhrystone", since this name denotes the
program published in [1]. However, I do recognize the need for a
larger number of representative programs that can be used as
benchmarks; users should always be encouraged to use more than just
one benchmark.

The new versions (version 2.1 for C, Pascal and Ada) will be
distributed as widely as possible. (Version 2.1 differs from version
2.0, distributed via the UNIX network Usenet in March 1988, only in a
few corrections for minor deficiencies found by users of version 2.0.)
Readers who want to use the benchmark for their own measurements can
obtain a copy in machine-readable form on floppy disk (MS-DOS or XENIX
format) from the author.


In general, version 2 follows, in the parts that are significant for
performance measurement (i.e. within the measurement loop), the
published (Ada) version and the C versions previously distributed.
Where the versions distributed by Rick Richardson [2] and Reinhold
Weicker have been different, it follows the version distributed by
Reinhold Weicker. (However, the differences have been so small that
their impact on execution time has in all likelihood been negligible.)
The initialization and UNIX instrumentation part, which had been
omitted in [1], mostly follows the ideas of Rick Richardson [2].
However, changes in the initialization part and in the printing of
the result have no impact on performance measurement, since they are
outside the measurement loop. As a concession to older compilers,
names have been made unique within the first 8 characters for the C
version.

The original publication of Dhrystone did not contain any statements
for time measurement, since they are necessarily system-dependent.
However, it turned out that it is not enough just to enclose the main
procedure of Dhrystone in a loop and to measure the execution time. If
the variables that are computed are not used somehow, there is the
danger that the compiler considers them "dead variables" and
suppresses code generation for some of the statements. Therefore, in
version 2 all variables of "main" are printed at the end of the
program. This also permits a plausibility check for correct
execution of the benchmark.
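A minimal sketch of the idea, assuming a hypothetical loop body that stands in for the real Dhrystone statements (the function `run_and_report` and its arithmetic are illustrative only): the final values are printed after the loop, each next to the value it should have, so that the computed variables have a use and a reader can check plausibility. Printing guards only against dead-variable elimination; a compiler may still fold or hoist these constant computations, which is one more reason to inspect the generated code.

```c
#include <stdio.h>

/* Hypothetical stand-in for the measurement loop; the body is NOT the
   Dhrystone statement mix, only an illustration of the printing idea. */
int run_and_report(long number_of_runs)
{
    int int_1_loc = 0;
    int int_2_loc = 0;
    int int_3_loc = 0;

    for (long run_index = 1; run_index <= number_of_runs; ++run_index) {
        int_1_loc = 2;
        int_2_loc = 3;
        int_3_loc = 5 * int_1_loc - int_2_loc;
    }

    /* Using the final values gives the assignments above a "use", so they
       are not dead, and lets the reader compare against expected values. */
    printf("Int_1_Loc: %d (should be: 2)\n", int_1_loc);
    printf("Int_2_Loc: %d (should be: 3)\n", int_2_loc);
    printf("Int_3_Loc: %d (should be: 7)\n", int_3_loc);
    return int_3_loc;
}
```

With zero runs the loop body never executes, so the printed values differ from the expected ones; that mismatch is exactly what such a plausibility printout is meant to expose.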

At several places in the benchmark, code has been added, but only in
branches that are not executed. The intention is that optimizing
compilers should be prevented from moving code out of the measurement
loop, or from removing code altogether. Statements that are executed
have been changed in very few places only. In these cases, only the
role of some operands has been changed, and it was made sure that the
numbers defining the "Dhrystone distribution" (distribution of
statements, operand types and locality) still hold as much as possible.
Except for sophisticated optimizing compilers, execution times for
version 2.1 should be the same as for previous versions.
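The pattern can be sketched as follows; all names are hypothetical. `never_matches` plays the role of a called function such as Func_1: at run time it never returns the compared value, but when it is compiled in a separate unit the compiler cannot prove that. The compiler must therefore keep the assignment to `int_2_loc` inside the loop (a previous iteration might have changed it) and cannot treat `int_glob` as independent of `run_index`. In a single file, as here, an aggressive optimizer may still see through the trick.

```c
/* Hypothetical global, standing in for a Dhrystone global such as Int_Glob. */
int int_glob = 0;

/* Stand-in for a function in another compilation unit: it only ever returns
   'A' or 'B', never 'C', but a separately compiled caller cannot know this. */
char never_matches(char ch)
{
    return (ch == 'A') ? 'B' : 'A';
}

int guarded_loop(int number_of_runs)
{
    int int_2_loc = 0;

    for (int run_index = 1; run_index <= number_of_runs; ++run_index) {
        int_2_loc = 3;                      /* executed in every iteration */
        if (never_matches('A') == 'C') {    /* branch is never taken */
            int_2_loc = run_index;          /* blocks value propagation for int_2_loc */
            int_glob = run_index;           /* int_glob may now depend on run_index */
        }
    }
    return int_2_loc;
}
```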

Because of the self-imposed limitation that the order and distribution
of the executed statements should not be changed, there are still cases
where optimizing compilers may not generate code for some statements.
To a certain degree, this is unavoidable for small synthetic
benchmarks. Users of the benchmark are advised to check the code
listings to see whether code is generated for all statements of
Dhrystone.

Contrary to the suggestion in the published paper and its realization
in the versions previously distributed, no attempt has been made to
subtract the time for the measurement loop overhead. (This calculation
has proven difficult to implement correctly, and its omission makes
the program simpler.) However, since the loop check is now part of the
benchmark, this does have an impact, though a very minor one, on the
distribution statistics, which have been updated for this version.
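Because the loop overhead is not subtracted, a result can be derived directly from the measured time for the whole loop. A minimal sketch, assuming the time is taken with the standard C `clock()` function; the names `dhry_result`, `compute_result` and `time_loop`, and the choice to report both runs per second and microseconds per run, are illustrative. A real measurement also needs a run count large enough relative to the clock resolution, and `seconds` must be nonzero.

```c
#include <time.h>

/* Hypothetical result record: both figures come from the total loop time,
   with no separate estimate of loop overhead subtracted. */
struct dhry_result {
    double runs_per_second;        /* "Dhrystones per second" */
    double microseconds_per_run;
};

struct dhry_result compute_result(long number_of_runs, double seconds)
{
    struct dhry_result r;
    r.runs_per_second = (double)number_of_runs / seconds;
    r.microseconds_per_run = seconds * 1.0e6 / (double)number_of_runs;
    return r;
}

/* Times a caller-supplied loop body with clock(); the loop check itself is
   inside the timed region, as in version 2. */
double time_loop(long number_of_runs, void (*body)(long))
{
    clock_t begin = clock();
    for (long run_index = 1; run_index <= number_of_runs; ++run_index)
        body(run_index);
    clock_t end = clock();
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

For example, 1,000,000 runs in 2.0 seconds give 500,000 runs per second, or 2.0 microseconds per run.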


In this section, all changes are described that affect the measurement
loop and that are not just renamings of variables. All remarks refer to
the C version; the other language versions have been updated similarly.

In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:

o In procedure "main", three statements have been added in the
  non-executed "then" part of the statement
    if (Enum_Loc == Func_1 (Ch_Index, 'C'))
  They are
    strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
    Int_2_Loc = Run_Index;
    Int_Glob = Run_Index;
  The string assignment prevents movement of the preceding assignment
  to Str_2_Loc (5th statement of "main") out of the measurement loop.
  (This probably will not happen for the C version, but it did happen
  with another language and compiler.) The assignment to Int_2_Loc
  prevents value propagation for Int_2_Loc, and the assignment to
  Int_Glob makes the value of Int_Glob possibly dependent on the
  value of Run_Index.

o In the three arithmetic computations at the end of the measurement
  loop in "main", the role of some variables has been exchanged, to
  prevent the division from just cancelling out the multiplication as
  it did in [1]. A very smart compiler might have recognized this and
  suppressed code generation for the division.

o For Proc_2, no code has been changed, but the values of the actual
  parameter have changed due to changes in "main".

o In Proc_4, the second assignment has been changed from
    Bool_Loc = Bool_Loc | Bool_Glob;
  to
    Bool_Glob = Bool_Loc | Bool_Glob;
  It now assigns a value to a global variable instead of a local
  variable (Bool_Loc); otherwise Bool_Loc would be a "dead variable"
  that is not used afterwards.

o In Func_1, the statement
    Ch_1_Glob = Ch_1_Loc;
  was added in the non-executed "else" part of the "if" statement, to
  prevent the suppression of code generation for the assignment to
  Ch_1_Loc.

o In Func_2, the second character comparison statement has been changed
  to
    if (Ch_Loc == 'R')
  ('R' instead of 'X') because a comparison with 'X' is implied in the
  preceding "if" statement.

  Also in Func_2, the statement
    Int_Glob = Int_Loc;
  has been added in the non-executed part of the last "if" statement,
  in order to prevent Int_Loc from becoming a dead variable.

o In Func_3, a non-executed "else" part has been added to the "if"
  statement. While the program would not be incorrect without this
  "else" part, it is considered bad programming practice if a function
  can be left without a return value.

  To compensate for this change, the (non-executed) "else" part in the
  "if" statement of Proc_3 was removed.
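A sketch of the Proc_4 change in isolation. It shows only the two assignments quoted above; the `Boolean` typedef and the globals `Ch_1_Glob` and `Bool_Glob` are stand-ins assumed for this sketch, and the real procedure contains further statements. In the "before" form the second store goes to a local that is never read again, so a compiler may delete it as a dead store; in the "after" form it goes to a global that other code can observe, so the store has to be kept.

```c
typedef int Boolean;            /* assumed: Boolean represented as int */

Boolean Bool_Glob = 0;          /* stand-in globals for this sketch */
char    Ch_1_Glob = 'A';

/* Before (version 1 form): Bool_Loc is dead after the second assignment. */
void proc_4_before(void)
{
    Boolean Bool_Loc;

    Bool_Loc = Ch_1_Glob == 'A';
    Bool_Loc = Bool_Loc | Bool_Glob;   /* result never used: may be removed */
}

/* After (version 2 form): the result is stored in a global, so it is kept. */
void proc_4_after(void)
{
    Boolean Bool_Loc;

    Bool_Loc = Ch_1_Glob == 'A';
    Bool_Glob = Bool_Loc | Bool_Glob;
}
```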

The distribution statistics have been changed only by the addition of
the measurement loop iteration (1 additional statement, 4 additional
local integer operands) and by the change in Proc_4 (one operand
changed from local to global). The distribution statistics in the
comment headers have been updated accordingly.


The string operations (string assignment and string comparison) have
not been changed, to keep the program consistent with the original
version.

There has been some concern that the string operations are
over-represented in the program, and that execution time is dominated
by these operations. This was true in particular when optimizing
compilers removed too much code in the main part of the program; this
should have been mitigated in version 2.

It should be noted that this is a language-dependent issue: Dhrystone
was first published in Ada, and with Ada or Pascal semantics, the time
spent in the string operations is, at least in all implementations
known to me, considerably smaller. In Ada and Pascal, assignment and
comparison of strings are operators defined in the language, and the
upper bounds of the strings occurring in Dhrystone are part of the type
information known at compilation time. The compilers can therefore
generate efficient inline code. In C, string assignment and comparison
are not part of the language, so the string operations must be
expressed in terms of the C library functions "strcpy" and "strcmp".
(ANSI C allows an implementation to use inline code for these
functions.) In addition to the overhead caused by additional function
calls, these functions are defined for null-terminated strings where
the length of the strings is not known at compilation time; the
function has to check every byte for the termination condition (the
null byte).
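To make the contrast concrete, here is a sketch of what the length information buys. It assumes a fixed-size string type in the spirit of the Ada and Pascal declarations; the type name `Str_30_Fixed`, its 31-byte size, and the helper names are assumptions for this sketch. When the size is a compile-time constant, `memcpy` and `memcmp` can work on the whole object without looking for a terminator. This is not how the C version of Dhrystone is written; it uses `strcpy` and `strcmp` as described above.

```c
#include <string.h>

/* Assumed fixed-size string type: 30 characters plus a terminating null byte. */
typedef char Str_30_Fixed[31];

/* Null-terminated style, as in the C version: the length is discovered at
   run time by scanning for the null byte. */
void assign_c_style(char *dst, const char *src)
{
    strcpy(dst, src);
}

/* Fixed-length style: the size is a compile-time constant, so no byte has to
   be tested for the terminator and a compiler can expand the call inline.
   All 31 bytes take part, so both operands must be fully initialized. */
void assign_fixed(Str_30_Fixed dst, const Str_30_Fixed src)
{
    memcpy(dst, src, sizeof(Str_30_Fixed));
}

int equal_fixed(const Str_30_Fixed a, const Str_30_Fixed b)
{
    return memcmp(a, b, sizeof(Str_30_Fixed)) == 0;
}
```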

Obviously, a C library which includes efficiently coded "strcpy" and
"strcmp" functions helps to obtain good Dhrystone results. However, I
don't think that this is unfair, since string functions do occur quite
frequently in real programs (editors, command interpreters, etc.). If
the string functions are implemented efficiently, this helps real
programs as well as benchmark programs.

I admit that the string comparison in Dhrystone terminates later (after
scanning 20 characters) than most string comparisons in real programs.
For consistency with the original benchmark, I didn't change the
program despite this weakness.
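The figure of 20 characters can be checked with a small counting comparison. The sketch assumes that the two compared strings follow the pattern of the "DHRYSTONE PROGRAM, 3'RD STRING" literal quoted in "main", differing only in the ordinal; `count_compared` is a hypothetical helper, not part of Dhrystone or of the C library.

```c
/* Hypothetical helper: compares two null-terminated strings the way a simple
   byte-by-byte strcmp would, and returns how many character positions had to
   be examined before the result was known. The strcmp-style sign of the
   comparison is stored in *result. */
int count_compared(const char *a, const char *b, int *result)
{
    int examined = 0;

    while (*a != '\0' && *a == *b) {   /* equal so far and not yet at the end */
        ++a;
        ++b;
        ++examined;
    }
    *result = (unsigned char)*a - (unsigned char)*b;
    return examined + 1;               /* include the deciding position */
}
```

For "DHRYSTONE PROGRAM, 1'ST STRING" against "DHRYSTONE PROGRAM, 2'ND STRING", the first 19 characters ("DHRYSTONE PROGRAM, ") are equal and the 20th decides, so the helper returns 20.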


When Dhrystone is used, the following "ground rules" apply:

o Separate compilation (Ada and C versions)

  As mentioned in [1], Dhrystone was written to reflect actual
  programming practice in systems programming. The division into
  several compilation units (5 in the Ada version, 2 in the C version)
  is intended, as is the distribution of inter-module and intra-module
  subprogram calls. Although on many systems there will be no
  difference in execution time compared to a Dhrystone version where
  all compilation units are merged into one file, the rule is that
  separate compilation should be used. The intention is that real
  programming practice, where programs consist of several
  independently compiled units, should be reflected. This also implies
  that the compiler, while compiling one unit, has no information about
  the use of variables, register allocation etc. occurring in other
  compilation units. Although in real life compilation units will
  probably be larger, the intention is that these effects of separate
  compilation are modeled in Dhrystone.

  A few language systems have post-linkage optimization available
  (e.g., final register allocation is performed after linkage). This
  is a borderline case: post-linkage optimization involves additional
  program preparation time (although not as much as compilation in one
  unit), which may prevent its general use in practical programming. I
  think that, since it defeats the intentions given above, it should
  not be used for Dhrystone.

  Unfortunately, ISO/ANSI Pascal does not contain language features for
  separate compilation. Although most commercial Pascal compilers
  provide separate compilation in some way, we cannot use it for
  Dhrystone, since such a version would not be portable. Therefore, no
  attempt has been made to provide a Pascal version with several
  compilation units.

o No procedure merging

  Although Dhrystone contains some very short procedures where
  execution would benefit from procedure merging (inlining, macro
  expansion of procedures), procedure merging is not to be used. The
  reason is that the percentage of procedure and function calls is part
  of the "Dhrystone distribution" of statements contained in [1]. This
  restriction does not hold for the string functions of the C version,
  since ANSI C allows an implementation to use inline code for these
  functions.

o Other optimizations are allowed, but they should be indicated

  It is often hard to draw an exact line between "normal code
  generation" and "optimization" in compilers: some compilers perform
  operations by default that are invoked in other compilers only when
  optimization is explicitly requested. Also, we cannot prevent people
  from trying to achieve benchmark results that look as good as
  possible. Therefore, optimizations performed by compilers, other
  than those listed above, are not forbidden when Dhrystone execution
  times are measured. Dhrystone is not intended to be non-optimizable,
  but it is intended to be about as optimizable as normal programs. For
  example, there are several places in Dhrystone where performance
  benefits from optimizations like common subexpression elimination,
  value propagation etc., but normal programs usually also benefit from
  these optimizations. Therefore, no effort was made to artificially
  prevent such optimizations. However, measurement reports should
  indicate which compiler optimization levels have been used, and
  reporting results with different levels of compiler optimization for
  the same hardware is encouraged.

o Default results are those without "register" declarations (C version)

  When Dhrystone results are quoted without additional qualification,
  they should be understood as results obtained without use of the
  "register" attribute. Good compilers should be able to make good use
  of registers even without explicit register declarations ([3],
  p. 193).
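One way to build both variants from a single source is to hide the storage class behind a macro. A minimal sketch; the macro name `REG_SKETCH`, the switch `USE_REGISTER` and the function `sum_to` are hypothetical names chosen for illustration. Compiled normally, the code yields the default variant without "register"; defining `USE_REGISTER` (for example with the usual `-D` compiler option) yields the qualified variant.

```c
/* Hypothetical switch: by default REG_SKETCH expands to nothing, giving the
   default build without "register"; defining USE_REGISTER turns the
   declarations below into register declarations. */
#ifdef USE_REGISTER
#define REG_SKETCH register
#else
#define REG_SKETCH
#endif

/* Illustrative function only; this loop is not part of Dhrystone. */
int sum_to(int n)
{
    REG_SKETCH int count;
    REG_SKETCH int sum = 0;

    for (count = 1; count <= n; ++count)
        sum += count;
    return sum;
}
```

Whichever variant is measured, the report should say which one it is; unqualified numbers would then refer to the build without `USE_REGISTER`.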

Of course, for experimental purposes, post-linkage optimization,
procedure merging and/or compilation in one unit can be done to
determine their effects. However, Dhrystone numbers obtained under
these conditions should be explicitly marked as such; "normal"
Dhrystone results should be understood as results obtained following
the ground rules listed above.

In any case, for serious performance evaluation, users are advised to
ask for code listings and to check them carefully. In this way, when
results for different systems are compared, the reader can get a
feeling for how much of the performance difference is due to compiler
optimization and how much is due to hardware speed.


The C version 2.1 of Dhrystone has been developed in cooperation with
Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the
"Version 1.1" distributed previously by him over the UNIX network
Usenet. Through his activity with Usenet, Rick Richardson has made a
very valuable contribution to the dissemination of the benchmark. I
also thank Chaim Benedelac (National Semiconductor), David Ditzel
(SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael
Saavedra-Barrera (UC at Berkeley) for their help with comments on
earlier versions of the benchmark.


[1]
   Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming
   Benchmark.
   Communications of the ACM 27, 10 (Oct. 1984), 1013-1030.

[2]
   Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text).
   Informal distribution via "Usenet"; last version known to me:
   Sept. 21, 1987.

[3]
   Brian W. Kernighan and Dennis M. Ritchie: The C Programming
   Language.
   Prentice-Hall, Englewood Cliffs (NJ) 1978.