1## Command Line Interface Tutorials
2
3* [Command Line Inputs](#command-line-inputs)
4* [Start `tfprof`](#start-tfprof)
5* [Examples](#examples)
6  * [Profile Python Time](#profile-python-time)
7  * [Profile Graph Time](#profile-graph-time)
8  * [Profile Checkpoint Value](#profile-checkpoint-value)
9  * [Profile Model Parameter](#profile-model-parameter)
10  * [Profile Device Placement](#profile-device-placement)
11  * [Define Customized Operation Type](#define-customized-operation-type)
12  * [Non-interactive Mode](#non-interactive-mode)
13
14
15### Command Line Inputs
16
17tfprof command line tool uses the following input:
18
19<b>--profile_path:</b> A ProfileProto binary proto file.
20See QuickStart on generating the file.
21
22<b>THE OLD WAY BELOW IS DEPRECATED:</b>
23
24<b>--graph_path:</b> GraphDef proto file (optional in eager execution).
25Used to build in-memory
26data structure of the model. For example, graph.pbtxt written by tf.Supervisor
27can be passed to --graph_path. You can also easily get GraphDef using
28tf.get_default_graph().as_graph_def(add_shapes=True) or other API.
29
30<b>--run_meta_path:</b> RunMetadata proto file (optional).
31Used to get the memory consumption and execution time of
32each op of the model.
33
34The following code snippet writes a RunMetadata file:
35
36```python
37run_options = config_pb2.RunOptions(trace_level=config_pb2.RunOptions.FULL_TRACE)
38run_metadata = config_pb2.RunMetadata()
39_ = self._sess.run(..., options=run_options, run_metadata=run_metadata)
40with tf.gfile.Open(os.path.join(output_dir, "run_meta"), "w") as f:
41  f.write(run_metadata.SerializeToString())
42```
43
44<b>--op_log_path:</b>
45tensorflow.tfprof.OpLogProto (optional). A proto used to provide extra operation
46information. 1) float operations. 2) code traces. 3) define customized operation
47type for `-account_type_regexes` option.
48
49The following code snippet writes a OpLogProto file.
50
51```python
52tf.profiler.write_op_log(graph, log_dir, op_log=None)
53```
54
55<b>--checkpoint_path:</b> TensorFlow checkpoint (optional).
56It defines _checkpoint_variable op type. It also provides checkpointed tensors' values.
57Note: this feature is not well maintained now.
58
59
60### Start `tfprof`
61
62#### Build `tfprof`
63
64```shell
65# Build the tool.
66bazel build --config opt tensorflow/core/profiler:profiler
67
68# Help information, including detail 'option' instructions.
69bazel-bin/tensorflow/core/profiler/profiler help
70```
71
72#### Start `tfprof` Interactive Mode
73```shell
74# The following commands will start tfprof interactive mode.
75#
76# Recommended:
77#
78# The file contains the binary string of ProfileProto.
79# It contains all needed information in one file.
80bazel-bin/tensorflow/core/profiler/profiler \
81    --profile_path=profile_xxx
82#
83# Alternatively, user can pass separate files.
84#
85# --graph_path contains the model architecture and tensor shapes.
86# --run_meta_path contains the memory and time information.
87# --op_log_path contains float operation and code traces.
88# --checkpoint_path contains the model checkpoint data.
89#
90# Only includes model architecture, parameters and shapes.
91bazel-bin/tensorflow/core/profiler/profiler \
92    --graph_path=graph.pbtxt
93
94# For profiling eager execution, user can only specify run_meta_path
95# and profile execution info of each operation.
96bazel-bin/tensorflow/core/profiler/profiler \
97    --run_meta_path=run_meta
98#
99# Additionally profile ops memory and timing.
100bazel-bin/tensorflow/core/profiler/profiler \
101    --graph_path=graph.pbtxt \
102    --run_meta_path=run_meta \
103#
104# tfprof_log is used to define customized op types, float ops and code traces.
105# Use tfprof_logger.write_op_log() to create tfprof_log.
106bazel-bin/tensorflow/core/profiler/profiler \
107    --graph_path=graph.pbtxt \
108    --run_meta_path=run_meta \
109    --op_log_path=tfprof_log \
110#
111# Additionally profile checkpoint statistics and values.
112# Use '-account_type_regexes _checkpoint_variables' to select
113# checkpoint tensors.
114bazel-bin/tensorflow/core/profiler/profiler \
115    --graph_path=graph.pbtxt \
116    --run_meta_path=run_meta \
117    --op_log_path=tfprof_log \
118    --checkpoint_path=model.ckpt
119```
120
121#### Start `tfprof` Non-interactive Mode.
122
123```python
124# Runs tfprof in one-shot.
125bazel-bin/tensorflow/core/profiler/profiler scope \
126    --graph_path=graph.pbtxt \
127    --max_depth=3
128```
129
130#### Press enter to show the default options
131
132Refer to [Options](options.md) for option instructions.
133
134```shell
135tfprof>
136-max_depth                  4
137-min_bytes                  0
138-min_micros                 0
139-min_params                 0
140-min_float_ops              0
141-min_occurrence             0
142-step                       -1
143-order_by                   name
144-account_type_regexes       Variable,VariableV2
145-start_name_regexes         .*
146-trim_name_regexes
147-show_name_regexes          .*
148-hide_name_regexes          IsVariableInitialized_[0-9]+,save\/.*,^zeros[0-9_]*
149-account_displayed_op_only  false
150# supported select fields. Availability depends on --[run_meta|checkpoint|op_log]_path.
151# [bytes|micros|params|float_ops|occurrence|tensor_value|device|op_types]
152-select                     params
153# format: output_type:key=value,key=value...
154# output_types: stdout (default), timeline, file.
155# key=value pairs:
156#   1. timeline: outfile=<filename>
157#   2. file: outfile=<filename>
158#   3. stdout: None.
159# E.g. timeline:outfile=/tmp/timeline.json
160-output
161```
162
163### Examples
164
165#### Profile Python Time
166```shell
167# Requires --graph_path --op_log_path
168tfprof> code -max_depth 1000 -show_name_regexes .*model_analyzer.*py.* -select micros -account_type_regexes .* -order_by micros
169_TFProfRoot (0us/22.44ms)
170  model_analyzer_test.py:149:run_filename_as_m...:none (0us/22.44ms)
171    model_analyzer_test.py:33:_run_code_in_main:none (0us/22.44ms)
172      model_analyzer_test.py:208:<module>:test.main() (0us/22.44ms)
173        model_analyzer_test.py:132:testComplexCodeView:x = lib.BuildFull... (0us/22.44ms)
174          model_analyzer_testlib.py:63:BuildFullModel:return sgd_op.min... (0us/21.83ms)
175          model_analyzer_testlib.py:58:BuildFullModel:cell, array_ops.c... (0us/333us)
176          model_analyzer_testlib.py:54:BuildFullModel:seq.append(array_... (0us/254us)
177            model_analyzer_testlib.py:42:BuildSmallModel:x = nn_ops.conv2d... (0us/134us)
178            model_analyzer_testlib.py:46:BuildSmallModel:initializer=init_... (0us/40us)
179            ...
180          model_analyzer_testlib.py:61:BuildFullModel:loss = nn_ops.l2_... (0us/28us)
181          model_analyzer_testlib.py:60:BuildFullModel:target = array_op... (0us/0us)
182        model_analyzer_test.py:134:testComplexCodeView:sess.run(variable... (0us/0us)
183```
184
185Set ```-output timeline:outfile=<filename>``` to generate timeline instead of stdout.
186<left>
187![CodeTimeline](code_timeline.png)
188</left>
189
190#### Profile Graph Time
191
192```shell
193# I defined an op named ‘cost’ to calculate the loss. I want to know what ops
194# it depends on take a long time to run.
195
196# Requires --graph_path, --run_meta_path.
197tfprof> graph -start_name_regexes cost.* -max_depth 100 -min_micros 10000 -select micros -account_type_regexes .*
198_TFProfRoot (0us/3.61sec)
199  init/init_conv/Conv2D (11.75ms/3.10sec)
200    random_shuffle_queue_DequeueMany (3.09sec/3.09sec)
201  unit_1_0/sub2/conv2/Conv2D (74.14ms/3.19sec)
202  unit_1_3/sub2/conv2/Conv2D (60.75ms/3.34sec)
203  unit_2_4/sub2/conv2/Conv2D (73.58ms/3.54sec)
204  unit_3_3/sub2/conv2/Conv2D (10.26ms/3.60sec)
205```
206
207#### Profile Checkpoint Value
208```shell
209# Requires --graph_path, --checkpoint_path.
210tfprof> scope -show_name_regexes unit_1_0.*gamma -select tensor_value -max_depth 5
211_TFProfRoot ()
212  unit_1_0/shared_activation/init_bn/gamma ()
213[1.80 2.10 2.06 1.91 2.26 1.86 1.81 1.37 1.78 1.85 1.96 1.54 2.04 2.34 2.22 1.99 ],
214  unit_1_0/sub2/bn2/gamma ()
215[1.57 1.83 1.30 1.25 1.59 1.14 1.26 0.82 1.19 1.10 1.48 1.01 0.82 1.23 1.21 1.14 ],
216```
217
218#### Profile Model Parameter
219
220```shell
221# Show the number of parameters of all `tf.trainable_variables()` in the model.
222# Requires --graph_path --op_log_path.
223# store option for future commands.
224tfprof> set -account_type_regexes _trainable_variables
225tfprof> scope -max_depth 4 -select params
226_TFProfRoot (--/464.15k params)
227  init/init_conv/DW (3x3x3x16, 432/432 params)
228  pool_logit/DW (64x10, 640/640 params)
229  pool_logit/biases (10, 10/10 params)
230  unit_last/final_bn/beta (64, 64/64 params)
231  unit_last/final_bn/gamma (64, 64/64 params)
232```
233
234Where does `_trainable_variables` come from? It is customized operation type
235defined through the OpLogProto file.
236Users can [Define Customized Operation Type](#define-customized-operation-type)
237
238<b>Following example shows importance of defining customized operation type.</b>
239In this example, extra `Variables` are created by TensorFlow
240implicitly and “/Momentum” is appended to their names. They shouldn't be
241included in you “model capacity” calculation.
242
243```shell
244tfprof> scope -account_type_regexes VariableV2 -max_depth 4 -select params
245_TFProfRoot (--/930.58k params)
246  global_step (1/1 params)
247  init/init_conv/DW (3x3x3x16, 432/864 params)
248  pool_logit/DW (64x10, 640/1.28k params)
249    pool_logit/DW/Momentum (64x10, 640/640 params)
250  pool_logit/biases (10, 10/20 params)
251    pool_logit/biases/Momentum (10, 10/10 params)
252  unit_last/final_bn/beta (64, 64/128 params)
253  unit_last/final_bn/gamma (64, 64/128 params)
254  unit_last/final_bn/moving_mean (64, 64/64 params)
255  unit_last/final_bn/moving_variance (64, 64/64 params)
256```
257
258#### Profile Device Placement
259
260In this tutorial, a model is split
261on several gpus at workers and several parameter servers.
262
263In tfprof, 'device' is an op_type. For example, if op1 and op2 are placed on
264gpu:0. They share an operation type.
265
266```shell
267bazel-bin/tensorflow/core/profiler/profiler \
268  --graph_path=/tmp/graph.pbtxt  \
269  --run_meta_path=/tmp/run_meta
270
271# Looks like ps task 1 is holding twice more parameters than task 0.
272tfprof> scope -select device,params -account_type_regexes .*ps.*task:0.* -max_depth 1
273_TFProfRoot (--/25.81m params)
274tfprof> scope -select device,params -account_type_regexes .*ps.*task:1.* -max_depth 1
275_TFProfRoot (--/58.84m params)
276```
277
278#### Define Customized Operation Type
279
280First, in Python code, create an `OpLogProto` proto and add op type
281information to it:
282
283```python
284op_log = tfprof_log_pb2.OpLogProto()
285entry = op_log.log_entries.add()
286entry.name = 'pool_logit/DW'
287entry.types.append('pool_logit')
288entry = op_log.log_entries.add()
289entry.name = 'pool_logit/biases'
290entry.types.append('pool_logit')
291```
292
293Second, call write_op_log to write the OpLogProto proto.
294
295```python
296tf.profiler.write_op_log(
297    sess.graph, /tmp/my_op_log_dir, op_log)
298
299# Get run-time shape information in order to fill shapes and get flops.
300tf.profiler.write_op_log(
301    sess.graph, /tmp/my_op_log_dir, op_log, run_meta)
302```
303
304Third, when starting the tfprof tool, specify
305"--op_log_path=/tmp/my_op_log_dir/op_log"
306
307```shell
308tfprof> scope -account_type_regexes pool_logit -max_depth 4 -select params
309_TFProfRoot (--/650 params)
310  pool_logit/DW (64x10, 640/640 params)
311  pool_logit/biases (10, 10/10 params)
312```
313
314Note that `tf.profiler.write_op_log(...)` automatically
315assigns all `Variables` inside `tf.trainable_variables()` a customized
316operation type: `_trainable_variables`.
317
318
319#### Non-interactive Mode
32012) Run tfprof in one-shot mode and dump result to file.
321
322```shell
323# By default output to stdout. Use -output option to change output types.
324tfprof scope --graph_path=graph.pbtxt  \
325             --max_depth=3 \
326             --output="file:outfile=/tmp/dump"
327Reading Files...
328Parsing GraphDef...
329Preparing Views...
330
331cat /tmp/dump
332_TFProfRoot (--/930.58k params)
333  global_step (0/0 params)
334  pool_logit/DW (64x10, 640/1.28k params)
335  pool_logit/biases (10, 10/20 params)
336```
337