1## Command Line Interface Tutorials 2 3* [Command Line Inputs](#command-line-inputs) 4* [Start `tfprof`](#start-tfprof) 5* [Examples](#examples) 6 * [Profile Python Time](#profile-python-time) 7 * [Profile Graph Time](#profile-graph-time) 8 * [Profile Checkpoint Value](#profile-checkpoint-value) 9 * [Profile Model Parameter](#profile-model-parameter) 10 * [Profile Device Placement](#profile-device-placement) 11 * [Define Customized Operation Type](#define-customized-operation-type) 12 * [Non-interactive Mode](#non-interactive-mode) 13 14 15### Command Line Inputs 16 17tfprof command line tool uses the following input: 18 19<b>--profile_path:</b> A ProfileProto binary proto file. 20See QuickStart on generating the file. 21 22<b>THE OLD WAY BELOW IS DEPRECATED:</b> 23 24<b>--graph_path:</b> GraphDef proto file (optional in eager execution). 25Used to build in-memory 26data structure of the model. For example, graph.pbtxt written by tf.Supervisor 27can be passed to --graph_path. You can also easily get GraphDef using 28tf.get_default_graph().as_graph_def(add_shapes=True) or other API. 29 30<b>--run_meta_path:</b> RunMetadata proto file (optional). 31Used to get the memory consumption and execution time of 32each op of the model. 33 34The following code snippet writes a RunMetadata file: 35 36```python 37run_options = config_pb2.RunOptions(trace_level=config_pb2.RunOptions.FULL_TRACE) 38run_metadata = config_pb2.RunMetadata() 39_ = self._sess.run(..., options=run_options, run_metadata=run_metadata) 40with tf.gfile.Open(os.path.join(output_dir, "run_meta"), "w") as f: 41 f.write(run_metadata.SerializeToString()) 42``` 43 44<b>--op_log_path:</b> 45tensorflow.tfprof.OpLogProto (optional). A proto used to provide extra operation 46information. 1) float operations. 2) code traces. 3) define customized operation 47type for `-account_type_regexes` option. 48 49The following code snippet writes a OpLogProto file. 50 51```python 52tf.profiler.write_op_log(graph, log_dir, op_log=None) 53``` 54 55<b>--checkpoint_path:</b> TensorFlow checkpoint (optional). 56It defines _checkpoint_variable op type. It also provides checkpointed tensors' values. 57Note: this feature is not well maintained now. 58 59 60### Start `tfprof` 61 62#### Build `tfprof` 63 64```shell 65# Build the tool. 66bazel build --config opt tensorflow/core/profiler:profiler 67 68# Help information, including detail 'option' instructions. 69bazel-bin/tensorflow/core/profiler/profiler help 70``` 71 72#### Start `tfprof` Interactive Mode 73```shell 74# The following commands will start tfprof interactive mode. 75# 76# Recommended: 77# 78# The file contains the binary string of ProfileProto. 79# It contains all needed information in one file. 80bazel-bin/tensorflow/core/profiler/profiler \ 81 --profile_path=profile_xxx 82# 83# Alternatively, user can pass separate files. 84# 85# --graph_path contains the model architecture and tensor shapes. 86# --run_meta_path contains the memory and time information. 87# --op_log_path contains float operation and code traces. 88# --checkpoint_path contains the model checkpoint data. 89# 90# Only includes model architecture, parameters and shapes. 91bazel-bin/tensorflow/core/profiler/profiler \ 92 --graph_path=graph.pbtxt 93 94# For profiling eager execution, user can only specify run_meta_path 95# and profile execution info of each operation. 96bazel-bin/tensorflow/core/profiler/profiler \ 97 --run_meta_path=run_meta 98# 99# Additionally profile ops memory and timing. 100bazel-bin/tensorflow/core/profiler/profiler \ 101 --graph_path=graph.pbtxt \ 102 --run_meta_path=run_meta \ 103# 104# tfprof_log is used to define customized op types, float ops and code traces. 105# Use tfprof_logger.write_op_log() to create tfprof_log. 106bazel-bin/tensorflow/core/profiler/profiler \ 107 --graph_path=graph.pbtxt \ 108 --run_meta_path=run_meta \ 109 --op_log_path=tfprof_log \ 110# 111# Additionally profile checkpoint statistics and values. 112# Use '-account_type_regexes _checkpoint_variables' to select 113# checkpoint tensors. 114bazel-bin/tensorflow/core/profiler/profiler \ 115 --graph_path=graph.pbtxt \ 116 --run_meta_path=run_meta \ 117 --op_log_path=tfprof_log \ 118 --checkpoint_path=model.ckpt 119``` 120 121#### Start `tfprof` Non-interactive Mode. 122 123```python 124# Runs tfprof in one-shot. 125bazel-bin/tensorflow/core/profiler/profiler scope \ 126 --graph_path=graph.pbtxt \ 127 --max_depth=3 128``` 129 130#### Press enter to show the default options 131 132Refer to [Options](options.md) for option instructions. 133 134```shell 135tfprof> 136-max_depth 4 137-min_bytes 0 138-min_micros 0 139-min_params 0 140-min_float_ops 0 141-min_occurrence 0 142-step -1 143-order_by name 144-account_type_regexes Variable,VariableV2 145-start_name_regexes .* 146-trim_name_regexes 147-show_name_regexes .* 148-hide_name_regexes IsVariableInitialized_[0-9]+,save\/.*,^zeros[0-9_]* 149-account_displayed_op_only false 150# supported select fields. Availability depends on --[run_meta|checkpoint|op_log]_path. 151# [bytes|micros|params|float_ops|occurrence|tensor_value|device|op_types] 152-select params 153# format: output_type:key=value,key=value... 154# output_types: stdout (default), timeline, file. 155# key=value pairs: 156# 1. timeline: outfile=<filename> 157# 2. file: outfile=<filename> 158# 3. stdout: None. 159# E.g. timeline:outfile=/tmp/timeline.json 160-output 161``` 162 163### Examples 164 165#### Profile Python Time 166```shell 167# Requires --graph_path --op_log_path 168tfprof> code -max_depth 1000 -show_name_regexes .*model_analyzer.*py.* -select micros -account_type_regexes .* -order_by micros 169_TFProfRoot (0us/22.44ms) 170 model_analyzer_test.py:149:run_filename_as_m...:none (0us/22.44ms) 171 model_analyzer_test.py:33:_run_code_in_main:none (0us/22.44ms) 172 model_analyzer_test.py:208:<module>:test.main() (0us/22.44ms) 173 model_analyzer_test.py:132:testComplexCodeView:x = lib.BuildFull... (0us/22.44ms) 174 model_analyzer_testlib.py:63:BuildFullModel:return sgd_op.min... (0us/21.83ms) 175 model_analyzer_testlib.py:58:BuildFullModel:cell, array_ops.c... (0us/333us) 176 model_analyzer_testlib.py:54:BuildFullModel:seq.append(array_... (0us/254us) 177 model_analyzer_testlib.py:42:BuildSmallModel:x = nn_ops.conv2d... (0us/134us) 178 model_analyzer_testlib.py:46:BuildSmallModel:initializer=init_... (0us/40us) 179 ... 180 model_analyzer_testlib.py:61:BuildFullModel:loss = nn_ops.l2_... (0us/28us) 181 model_analyzer_testlib.py:60:BuildFullModel:target = array_op... (0us/0us) 182 model_analyzer_test.py:134:testComplexCodeView:sess.run(variable... (0us/0us) 183``` 184 185Set ```-output timeline:outfile=<filename>``` to generate timeline instead of stdout. 186<left> 187![CodeTimeline](code_timeline.png) 188</left> 189 190#### Profile Graph Time 191 192```shell 193# I defined an op named ‘cost’ to calculate the loss. I want to know what ops 194# it depends on take a long time to run. 195 196# Requires --graph_path, --run_meta_path. 197tfprof> graph -start_name_regexes cost.* -max_depth 100 -min_micros 10000 -select micros -account_type_regexes .* 198_TFProfRoot (0us/3.61sec) 199 init/init_conv/Conv2D (11.75ms/3.10sec) 200 random_shuffle_queue_DequeueMany (3.09sec/3.09sec) 201 unit_1_0/sub2/conv2/Conv2D (74.14ms/3.19sec) 202 unit_1_3/sub2/conv2/Conv2D (60.75ms/3.34sec) 203 unit_2_4/sub2/conv2/Conv2D (73.58ms/3.54sec) 204 unit_3_3/sub2/conv2/Conv2D (10.26ms/3.60sec) 205``` 206 207#### Profile Checkpoint Value 208```shell 209# Requires --graph_path, --checkpoint_path. 210tfprof> scope -show_name_regexes unit_1_0.*gamma -select tensor_value -max_depth 5 211_TFProfRoot () 212 unit_1_0/shared_activation/init_bn/gamma () 213[1.80 2.10 2.06 1.91 2.26 1.86 1.81 1.37 1.78 1.85 1.96 1.54 2.04 2.34 2.22 1.99 ], 214 unit_1_0/sub2/bn2/gamma () 215[1.57 1.83 1.30 1.25 1.59 1.14 1.26 0.82 1.19 1.10 1.48 1.01 0.82 1.23 1.21 1.14 ], 216``` 217 218#### Profile Model Parameter 219 220```shell 221# Show the number of parameters of all `tf.trainable_variables()` in the model. 222# Requires --graph_path --op_log_path. 223# store option for future commands. 224tfprof> set -account_type_regexes _trainable_variables 225tfprof> scope -max_depth 4 -select params 226_TFProfRoot (--/464.15k params) 227 init/init_conv/DW (3x3x3x16, 432/432 params) 228 pool_logit/DW (64x10, 640/640 params) 229 pool_logit/biases (10, 10/10 params) 230 unit_last/final_bn/beta (64, 64/64 params) 231 unit_last/final_bn/gamma (64, 64/64 params) 232``` 233 234Where does `_trainable_variables` come from? It is customized operation type 235defined through the OpLogProto file. 236Users can [Define Customized Operation Type](#define-customized-operation-type) 237 238<b>Following example shows importance of defining customized operation type.</b> 239In this example, extra `Variables` are created by TensorFlow 240implicitly and “/Momentum” is appended to their names. They shouldn't be 241included in you “model capacity” calculation. 242 243```shell 244tfprof> scope -account_type_regexes VariableV2 -max_depth 4 -select params 245_TFProfRoot (--/930.58k params) 246 global_step (1/1 params) 247 init/init_conv/DW (3x3x3x16, 432/864 params) 248 pool_logit/DW (64x10, 640/1.28k params) 249 pool_logit/DW/Momentum (64x10, 640/640 params) 250 pool_logit/biases (10, 10/20 params) 251 pool_logit/biases/Momentum (10, 10/10 params) 252 unit_last/final_bn/beta (64, 64/128 params) 253 unit_last/final_bn/gamma (64, 64/128 params) 254 unit_last/final_bn/moving_mean (64, 64/64 params) 255 unit_last/final_bn/moving_variance (64, 64/64 params) 256``` 257 258#### Profile Device Placement 259 260In this tutorial, a model is split 261on several gpus at workers and several parameter servers. 262 263In tfprof, 'device' is an op_type. For example, if op1 and op2 are placed on 264gpu:0. They share an operation type. 265 266```shell 267bazel-bin/tensorflow/core/profiler/profiler \ 268 --graph_path=/tmp/graph.pbtxt \ 269 --run_meta_path=/tmp/run_meta 270 271# Looks like ps task 1 is holding twice more parameters than task 0. 272tfprof> scope -select device,params -account_type_regexes .*ps.*task:0.* -max_depth 1 273_TFProfRoot (--/25.81m params) 274tfprof> scope -select device,params -account_type_regexes .*ps.*task:1.* -max_depth 1 275_TFProfRoot (--/58.84m params) 276``` 277 278#### Define Customized Operation Type 279 280First, in Python code, create an `OpLogProto` proto and add op type 281information to it: 282 283```python 284op_log = tfprof_log_pb2.OpLogProto() 285entry = op_log.log_entries.add() 286entry.name = 'pool_logit/DW' 287entry.types.append('pool_logit') 288entry = op_log.log_entries.add() 289entry.name = 'pool_logit/biases' 290entry.types.append('pool_logit') 291``` 292 293Second, call write_op_log to write the OpLogProto proto. 294 295```python 296tf.profiler.write_op_log( 297 sess.graph, /tmp/my_op_log_dir, op_log) 298 299# Get run-time shape information in order to fill shapes and get flops. 300tf.profiler.write_op_log( 301 sess.graph, /tmp/my_op_log_dir, op_log, run_meta) 302``` 303 304Third, when starting the tfprof tool, specify 305"--op_log_path=/tmp/my_op_log_dir/op_log" 306 307```shell 308tfprof> scope -account_type_regexes pool_logit -max_depth 4 -select params 309_TFProfRoot (--/650 params) 310 pool_logit/DW (64x10, 640/640 params) 311 pool_logit/biases (10, 10/10 params) 312``` 313 314Note that `tf.profiler.write_op_log(...)` automatically 315assigns all `Variables` inside `tf.trainable_variables()` a customized 316operation type: `_trainable_variables`. 317 318 319#### Non-interactive Mode 32012) Run tfprof in one-shot mode and dump result to file. 321 322```shell 323# By default output to stdout. Use -output option to change output types. 324tfprof scope --graph_path=graph.pbtxt \ 325 --max_depth=3 \ 326 --output="file:outfile=/tmp/dump" 327Reading Files... 328Parsing GraphDef... 329Preparing Views... 330 331cat /tmp/dump 332_TFProfRoot (--/930.58k params) 333 global_step (0/0 params) 334 pool_logit/DW (64x10, 640/1.28k params) 335 pool_logit/biases (10, 10/20 params) 336``` 337