Introduction:

The Flexible Filesystem Benchmark (FFSB) is a filesystem performance
measurement tool.  It is a multi-threaded application (using
pthreads), written entirely in C with cross-platform portability in
mind.  It differs from other filesystem benchmarks in that the user
may supply a profile to create custom workloads, while most other
filesystem benchmarks use a fixed set of workloads.

As of version 5.1, it supports seven different basic operations,
multiple groups of threads with different operation mixtures,
operation across multiple filesystems, and filesystem aging prior to
benchmarking.


Differences from version 4.0 and older:

Versions 5.0 and above are almost a total rewrite, and many things
have changed.  In version 5.0 and above, FFSB moved to a
time-regulated run instead of doing a set number of operations and
timing the whole thing.  This is primarily to better deal with
multiple threadgroups, which would otherwise not be synchronized at
termination time.
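
The idea of a time-regulated run can be sketched as follows.  This is
only an illustration in Python, not FFSB's C implementation; the names
worker and timed_run are hypothetical.  Every thread loops until one
shared deadline, so all threads stop together no matter how fast their
individual operations are.

```python
import threading
import time

def worker(deadline, do_op, counts, index):
    # Keep issuing operations until the shared deadline has passed.
    ops = 0
    while time.monotonic() < deadline:
        do_op()
        ops += 1
    counts[index] = ops

def timed_run(duration_s, thread_ops):
    """Run one thread per callable in thread_ops for duration_s seconds."""
    start = time.monotonic()
    deadline = start + duration_s
    counts = [0] * len(thread_ops)
    threads = [
        threading.Thread(target=worker, args=(deadline, op, counts, i))
        for i, op in enumerate(thread_ops)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Per-thread operation counts and the wall-clock time actually used.
    return counts, time.monotonic() - start
```

With a fixed operation count instead, a thread doing cheap operations
would finish long before one doing expensive operations, and the mix
of load would change toward the end of the run.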

Additionally, the FFSB configuration file format changed in version
5.0, although old-style configuration files are still supported when
a run time is passed on the command line.  In this mode, version 5.0
and above ignores the iterations parameter and simply uses the time
specified on the command line.

Behaviorally, most of the old operations are the same: sequential
reads and sequential writes work as they did before.  One change in
version 5.0 is that the skip-read behavior (reading, then seeking
forward a fixed amount, then reading again) has been removed; we now
support fully randomized reads and writes at random offsets within
the file.

Version 4.0 didn't support overwrites (only appends), so we interpret
writes in old config files as append operations.

On Linux, CPU utilization information will only be accurate for
systems using NPTL.  Older LinuxThreads systems will probably only
see zeros for CPU utilization because LinuxThreads is not
POSIX-compliant.  Version 4.0 and older could be recompiled to work
on LinuxThreads, but in 5.0 and later we no longer support this.

We no longer support the "outputfile" on the command line.  Simply
use tee or similar to capture the output.  FFSB unbuffers standard
out for this purpose, and errors are sent to standard error.

Global options:

There are eight valid global options placed at the beginning of the
profile.  Three of them are required: num_filesystems (number of
filesystems), num_threadgroups (number of threadgroups), and time
(running time of the benchmark).  The other five options are:

directio   - each call to open will be made using O_DIRECT
alignio    - aligns all block operations for random reads and writes
             on 4k boundaries
bufferedio - currently ignored; it is intended to use libc fread and
             fwrite instead of just the unix read and write calls
verbose    - currently ignored
callout    - calls an external command and waits for its termination
             before FFSB begins the benchmark phase.  This is useful
             for synchronizing distributed clients, starting
             profilers, etc.

They must be specified in the above order (num_filesystems,
num_threadgroups, time, directio, alignio, bufferedio, verbose,
callout).


Filesystems:

Filesystems are specified to FFSB in the form of a directory.  FFSB
assumes that the filesystem is mounted at this directory and will not
do any verification of this fact beyond ensuring it can read/write to
the location.  So be careful to ensure something with enough space to
handle the dataset is in fact mounted at the specified location.

In the filesystem clause of the profile, one may set the starting
number of files and directories as well as a minimum and maximum
filesize for the filesystem.  One may also specify the blocksize
used for creating the files separately in the filesystem clause.

Also, if a filesystem is to be aged, a special threadgroup clause may
be embedded in a filesystem clause to specify the operation mixture
and number of threads used to age the filesystem.  This threadgroup is
run until filesystem utilization reaches the specified amount.

Inheritance: if you are using multiple filesystems, all attributes
except the location are inherited from the previous filesystem.
This is done to make it easier to add groups of similar filesystems.
In this case, only the location is required in the filesystem clause.
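
The inheritance rule can be illustrated with a small Python sketch.
It is not FFSB's own code, and resolve_filesystems is a hypothetical
name.  The text does not say whether a later clause may override
individual inherited attributes; the sketch allows it, which is an
assumption.

```python
def resolve_filesystems(clauses):
    """Fill in missing attributes from the previous filesystem clause.

    clauses is a list of dicts, one per filesystem clause, in order.
    """
    resolved = []
    for index, clause in enumerate(clauses):
        if "location" not in clause:
            raise ValueError(f"filesystem{index} has no location")
        if index == 0:
            # filesystem0 has nothing to inherit from.
            merged = dict(clause)
        else:
            # Copy everything but the location from the previous
            # filesystem, then apply what this clause specifies
            # (overriding is an assumption, see above).
            merged = {k: v for k, v in resolved[-1].items() if k != "location"}
            merged.update(clause)
        resolved.append(merged)
    return resolved
```

For example, a first clause with all five required attributes followed
by a clause containing only location=/mnt/b yields a second filesystem
with the same file counts and sizes, located at /mnt/b.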

As of version 5.1, filesystem re-use is supported if a given
filesystem hasn't been modified beyond its original specifications
(number of files and directories is correct, and file sizes are within
specifications).  This can be a huge time saver if one wishes to do
multiple runs on the same data set without altering it during a run,
because the fileset doesn't need to be recreated before each run.

To do this, specify "reuse=1" in the filesystem clause, and FFSB will
verify the fileset first, and if it checks out it will use it.
Otherwise, it will remove everything and re-create the fileset for
that filesystem.
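
A check of this kind might look like the Python sketch below.  It is
an assumption about what "checks out" means, based only on the
criteria listed above; FFSB's real verification and its on-disk
directory layout are not described here, and fileset_is_reusable is a
hypothetical name.

```python
import os

def fileset_is_reusable(location, num_files, num_dirs,
                        min_filesize, max_filesize):
    """Return True if the tree under location matches the specification."""
    files = 0
    dirs = 0
    for root, dirnames, filenames in os.walk(location):
        dirs += len(dirnames)
        for name in filenames:
            size = os.path.getsize(os.path.join(root, name))
            # Any file outside the size range disqualifies the fileset.
            if not (min_filesize <= size <= max_filesize):
                return False
            files += 1
    return files == num_files and dirs == num_dirs
```

If the function returns False, the caller would fall back to removing
and re-creating the fileset, as described above.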

Threadgroups:

An arbitrary number of threadgroups with differing numbers of threads
and operation mixes can be specified.  The operations are specified
using a weighting for each operation; if an operation isn't specified,
its weighting is assumed to be zero (not used).
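
One way to picture the weightings is as relative probabilities.  The
Python sketch below assumes that an operation is chosen with
probability proportional to its weight; the text does not spell out
the exact selection scheme, so treat this as an illustration only
(pick_operation is a hypothetical name).

```python
import random

def pick_operation(weights, rng=random):
    """Choose an operation name with probability proportional to its weight."""
    # Operations with a weight of zero are never chosen.
    active = {op: w for op, w in weights.items() if w > 0}
    if not active:
        raise ValueError("at least one operation needs a weight greater than 0")
    ops = list(active)
    return rng.choices(ops, weights=[active[op] for op in ops], k=1)[0]
```

With read_weight=4 and append_weight=1, roughly four out of five
operations issued by each thread would be reads under this
assumption.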

"Think-time" for a threadgroup may also be specified in milliseconds
using the "op_delay" parameter, where every thread will wait for the
specified amount between each operation.

Operations:

All operations begin by randomly selecting a filesystem from the list
of filesystems specified in the profile.  The distribution aims to be
uniform across all filesystems.


The seven operations are:

reads   - read() calls with an overall amount and a blocksize;
          operates on existing files.  Care must be taken to ensure
          that the read amount is smaller than the size of any
          possible file.

          If read_random is specified (the option name used in the
          threadgroup requirements table), each individual block is
          read starting from a random point within the file, and this
          continues until the entire amount specified has been read.
          The offset of each random block is random to the byte
          level unless the "alignio" global parameter is on, in which
          case the reads are 4096-byte aligned.  Turning alignio on
          is generally recommended.

readall - very similar to reads above, except it doesn't take an
          amount; it simply reads the entire file sequentially using
          the read_blocksize.  This is useful for situations where
          different filesystems have differently sized files and
          sequential read patterns across all filesystems are desired.

writes  - write() calls with an overall amount and a blocksize.  This
          is an overwrite operation and will not enlarge an existing
          file; again, one must be careful not to specify a write
          amount that is larger than the smallest possible file in
          the data set.

          If write_random is specified (the option name used in the
          threadgroup requirements table), each individual block is
          written starting from a random point within the file, and
          this continues until the entire amount specified has been
          written out.  The offset of each random block is random to
          the byte level unless the "alignio" global parameter is on,
          in which case the writes are 4096-byte aligned.  Turning
          alignio on is generally recommended.

          If the fsync_file parameter for the threadgroup is non-zero,
          then after all of the write calls are finished, fsync() is
          called on the file descriptor before the file is closed.


creates - creates a file using an open() call and chooses its size
          randomly within the constraints (min_filesize and
          max_filesize) for the selected filesystem.  Writes are done
          using the same blocksize as is specified for the write
          operation.

deletes - calls unlink() on a filename and removes it from the
          internal data structures.  One must be careful to ensure
          there are enough files to delete at all times, or else the
          benchmark will terminate.

appends - calls write() using the append flag with an overall amount
          and a blocksize to be appended onto a randomly chosen file.

metas   - this is actually a mix of several different directory
          operations.  Each "meta" operation consists of two
          directory creates, one directory remove, and one directory
          rename.  These operations are all carried out separately
          from the other six operations.
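
The random read and write offsets described above can be sketched in
Python as follows.  This is not FFSB's code: the function names are
hypothetical, and the sketch assumes that a block must fit entirely
inside the file, which the text implies but does not state outright.

```python
import os
import random

ALIGNMENT = 4096  # the alignio boundary named in the text

def random_block_offset(file_size, blocksize, alignio, rng=random):
    """Pick a start offset so that one whole block fits inside the file."""
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    last_start = file_size - blocksize
    if last_start < 0:
        raise ValueError("blocksize is larger than the file")
    if alignio:
        # Choose among the 4096-byte boundaries at or below last_start.
        return rng.randrange(last_start // ALIGNMENT + 1) * ALIGNMENT
    # Without alignio, any byte offset up to last_start is allowed.
    return rng.randrange(last_start + 1)

def random_read(path, amount, blocksize, alignio, rng=random):
    """Read amount bytes in blocksize pieces from random offsets."""
    size = os.path.getsize(path)
    total = 0
    with open(path, "rb", buffering=0) as f:
        while total < amount:
            f.seek(random_block_offset(size, blocksize, alignio, rng))
            total += len(f.read(blocksize))
    return total
```

A random write would follow the same pattern with write() in place of
read(), optionally followed by os.fsync() on the descriptor before
the file is closed.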

Operation accounting:

Each operation which uses a blocksize (reads, writes, creates, and
appends) counts each read/write of a blocksize as an operation,
whereas deletes and metas are considered single operations.
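
As a worked example of this accounting, the Python sketch below
computes how many operations one call contributes.  Rounding a
partial final block up to a full operation is an assumption, and
readall is left out because the text above does not mention it;
operations_counted is a hypothetical name.

```python
def operations_counted(op, size=0, blocksize=0):
    """Return how many operations one call of op adds to the totals."""
    if op in ("reads", "writes", "creates", "appends"):
        if blocksize <= 0:
            raise ValueError("blocksize must be positive")
        # One operation per block; a partial last block counts as one
        # (ceiling division, an assumption).
        return -(-size // blocksize)
    if op in ("deletes", "metas"):
        return 1
    raise ValueError(f"no accounting rule known for {op!r}")
```

So a write with write_size=40960 and write_blocksize=4096 is counted
as 10 operations, while a delete is counted as 1.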

Running the benchmark:

There are three phases to running the benchmark: aging, fileset
creation, and the benchmark phase.

The create phase is carried out across all filesystems simultaneously
with one dedicated thread per filesystem.

After the create phase, sync() is called to ensure all dirty data gets
written out before the benchmark phase begins, and sync() is again
called at the end of the benchmark phase.  The time in sync() at the
end of the benchmark phase is counted as part of the benchmark phase.

Caveats/Holes/Bugs:

Aging, and aging across multiple filesystems simultaneously, hasn't
been tested very much.

If *any* i/o operation or system call/libc call fails, the benchmark
will terminate immediately.

The parser doesn't handle malformed or incorrect profiles very well
(or at all).

The parser doesn't check to make sure all of the appropriate options
have been specified.  For example, if writes are specified in a
threadgroup but write_blocksize isn't specified, the parser won't
catch it, but the benchmark run will fail later on.


Configuration Files (new style):

The new-style configuration allows arbitrary newlines between lines,
and comments using '#' at the start of a line.  It also allows tabs
and other whitespace before and after configuration parameters.

The new-style configuration file is broken up into three main parts:

global parameters, filesystems, and threadgroups

The sections must be in the above order.

Global parameters:

Global parameters are described above; the first three are always
required.  Example:

----------

num_filesystems=1
num_threadgroups=1
time=30			# time is in seconds

directio=0		# don't use direct io
alignio=1		# align random IOs to 4k
bufferedio=0		# this does nothing right now
verbose=0		# this does nothing right now

			# calls an external command and waits;
			# everything until the newline is taken
			# so you can have arbitrary parameters
callout=synchronize.sh myhostname

----------

All of these must appear in this order, though you can leave out the
optional ones.
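
Since the order matters, a script that generates profiles may want
to emit the global section from a fixed list.  The following Python
sketch does that; render_globals is a hypothetical helper and not
part of FFSB.

```python
GLOBAL_ORDER = [
    "num_filesystems", "num_threadgroups", "time",
    "directio", "alignio", "bufferedio", "verbose", "callout",
]
REQUIRED = GLOBAL_ORDER[:3]

def render_globals(options):
    """Return the global section of a profile with options in the required order."""
    missing = [name for name in REQUIRED if name not in options]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    unknown = [name for name in options if name not in GLOBAL_ORDER]
    if unknown:
        raise ValueError("unknown global options: " + ", ".join(unknown))
    # Optional options that were not given are simply left out.
    lines = [f"{name}={options[name]}" for name in GLOBAL_ORDER if name in options]
    return "\n".join(lines) + "\n"
```

The caller can pass the options in any order; the output always
follows the order listed in the "Global options" section.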

Filesystems:

Filesystems describe different logical sets of files residing in
different directories.  There is no strict requirement that they
actually be on different filesystems, only that the directory
specified already exists.

Filesystems are specified by a clause with a filesystem number like
this:

[filesystem0]
	location=/mnt/testing/
	num_files=10
	num_dirs=1
	max_filesize=4096
	min_filesize=4096
[end0]


The clause must always begin with [filesystemX] and end with [endX]
where X is the number of that filesystem.

You should start with X = 0, and increment by one for each following
filesystem.  If they are out of order, things will likely break.
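
The clause structure can be illustrated with a small reader written
in Python.  It is a sketch only and is unrelated to FFSB's own
parser: parse_filesystems is a hypothetical name, values are kept as
strings, text after '#' is dropped, and a filesystem clause that
embeds an aging threadgroup is rejected rather than handled.

```python
import re

CLAUSE_START = re.compile(r"^\[filesystem(\d+)\]$")

def parse_filesystems(text):
    """Return one dict of key/value strings per [filesystemX] clause."""
    filesystems = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        match = CLAUSE_START.match(line)
        if match:
            if current is not None:
                raise ValueError("filesystem clause opened inside another")
            if int(match.group(1)) != len(filesystems):
                # Clauses must be numbered 0, 1, 2, ... in order.
                raise ValueError("filesystem clauses are out of order")
            current = {}
            continue
        if current is None:
            continue  # global options and threadgroups are skipped
        if line == f"[end{len(filesystems)}]":
            filesystems.append(current)
            current = None
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"unsupported line in filesystem clause: {line}")
        current[key.strip()] = value.strip()
    if current is not None:
        raise ValueError("filesystem clause is missing its [endX] line")
    return filesystems
```

Applied to the [filesystem0] example above, it returns a list with a
single dict holding location, num_files, num_dirs, max_filesize and
min_filesize.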

The required information for each filesystem is: location, num_files,
num_dirs, max_filesize, and min_filesize.  Beyond those, the following
four options are supported:

reuse=1 # check the filesystem to see if it is reusable

	# filesystem aging, three components required
	# takes agefs=1 to turn it on
	# then a valid threadgroup specification
	# then a desired utilization percentage

agefs=1 # age the filesystem according to the following threadgroup
	[threadgroup0]
		num_threads=10
		write_size=40960
		write_blocksize=4096
		create_weight=10
		append_weight=10
		delete_weight=1
	[end0]
desired_util=0.20	# In this case, age until the fs is 20% full

create_blocksize=4096   # specify the blocksize to write()
			# for creating the fileset, defaults to 4096

age_blocksize=4096      # specify the blocksize to write() for aging


Also, to allow lazy people to use lots of filesystems, we support
filesystem inheritance, which simply copies all options but the
location from the previous filesystem clause if nothing is specified.
Obviously, this doesn't work for filesystem0. (May not work for aging
either?)

Full-blown filesystem clause example:

----------

[filesystem0]

	# required parts

	location=/home/sonny/tmp
	num_files=100
	num_dirs=100
	max_filesize=65536
	min_filesize=4096

	# aging part
	agefs=0
	[threadgroup0]
		num_threads=10
		write_size=40960
		write_blocksize=4096
		create_weight=10
		append_weight=10
		delete_weight=1
	[end0]
	desired_util=0.02		# age until 2% full

	# other optional commands

	create_blocksize=1024		# use a small create blocksize
	age_blocksize=1024		# and a small aging blocksize
	reuse=0				# don't reuse it
[end0]

----------

Threadgroups:

Threadgroups are very similar to filesystems in that any number of
them can be specified in clauses, and they must be in order starting
with threadgroup0.

Example:

----------

[threadgroup0]
	num_threads=32
	read_weight=4
	append_weight=1

	write_size=4096
	write_blocksize=4096

	read_size=4096
	read_blocksize=4096
[end0]

----------

In a threadgroup clause, num_threads is required and must be at least
1.  Then, at least one operation must be given a weight greater than 0
for the threadgroup to be valid.  Operations can be given a weighting
of 0, in which case they are ignored.

Certain operations will also require other commands; for example, if
read_weight is greater than zero, then one must also include a
read_size and a read_blocksize.  Here's the table of requirements and
options:


Operation        Requirements                          Options
---------        ------------                          -------
read_weight      read_size, read_blocksize             read_random
readall_weight   read_blocksize                        none
write_weight     write_size, write_blocksize           write_random, fsync_file
create_weight    write_blocksize or create_blocksize   none
append_weight    write_size, write_blocksize           none
delete_weight    none                                  none
meta_weight      none                                  none
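
Because the parser does not check these requirements itself, a
profile can be checked before a run with a small script.  The Python
sketch below encodes the table above; check_threadgroup is a
hypothetical helper, not an FFSB feature.  It only looks inside the
threadgroup's own settings, so a create_blocksize given in a
filesystem clause is not seen.

```python
# Each entry lists groups of alternatives; one name per group must be present.
REQUIREMENTS = {
    "read_weight": [("read_size",), ("read_blocksize",)],
    "readall_weight": [("read_blocksize",)],
    "write_weight": [("write_size",), ("write_blocksize",)],
    "create_weight": [("write_blocksize", "create_blocksize")],
    "append_weight": [("write_size",), ("write_blocksize",)],
    "delete_weight": [],
    "meta_weight": [],
}

def check_threadgroup(tg):
    """Return a list of problems found in one threadgroup's settings."""
    problems = []
    if int(tg.get("num_threads", 0)) < 1:
        problems.append("num_threads must be at least 1")
    active = [op for op in REQUIREMENTS if int(tg.get(op, 0)) > 0]
    if not active:
        problems.append("at least one operation needs a weight greater than 0")
    for op in active:
        for alternatives in REQUIREMENTS[op]:
            if not any(name in tg for name in alternatives):
                problems.append(f"{op} requires {' or '.join(alternatives)}")
    return problems
```

An empty list means the threadgroup satisfies the table; otherwise
each entry names the operation and the parameter it is missing.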


Other threadgroup options:

op_delay=10  # specify a wait between operations in milliseconds

bindfs=3     # This allows you to restrict a threadgroup's operation
             # to a specific filesystem number.  Currently only
             # binding to one specific filesystem is supported