# Using TensorFlow Securely

This document discusses how to safely deal with untrusted programs (models or model parameters) and with input data. Below, we also provide guidelines on how to report vulnerabilities in TensorFlow.

## TensorFlow models are programs

TensorFlow's runtime system interprets and executes programs. What machine learning practitioners term [**models**](https://developers.google.com/machine-learning/glossary/#model) are expressed as programs that TensorFlow executes. TensorFlow programs are encoded as computation [**graphs**](https://developers.google.com/machine-learning/glossary/#graph). The model's parameters are often stored separately in **checkpoints**.

At runtime, TensorFlow executes the computation graph using the parameters provided. Note that the behavior of the computation graph may change depending on the parameters provided. TensorFlow itself is not a sandbox. When executing the computation graph, TensorFlow may read and write files, send and receive data over the network, and even spawn additional processes. All these tasks are performed with the permissions of the TensorFlow process. This flexibility makes for a powerful machine learning platform, but it has implications for security.

The computation graph may also accept **inputs**. Those inputs are the data you supply to TensorFlow, either to train a model or to run inference with it.

**TensorFlow models are programs, and need to be treated as such from a security perspective.**

## Running untrusted models

As a general rule: **Always** execute untrusted models inside a sandbox (e.g., [nsjail](https://github.com/google/nsjail)).

There are several ways in which a model could become untrusted. Obviously, if an untrusted party supplies TensorFlow kernels, arbitrary code may be executed.
The same is true if the untrusted party provides Python code, such as the Python code that generates TensorFlow graphs.

Even if the untrusted party only supplies the serialized computation graph (in the form of a `GraphDef`, `SavedModel`, or equivalent on-disk format), the set of computation primitives available to TensorFlow is powerful enough that you should assume the TensorFlow process effectively executes arbitrary code. One common mitigation is to allow only a small number of safe Ops. While this is possible in theory, we still recommend you sandbox the execution.

Whether a user-provided checkpoint is safe depends on the computation graph. It is easy to create computation graphs in which malicious checkpoints can trigger unsafe behavior. For example, consider a graph that contains a `tf.cond` that depends on the value of a `tf.Variable`. One branch of the `tf.cond` is harmless, but the other is unsafe. Since the `tf.Variable` is stored in the checkpoint, whoever provides the checkpoint now has the ability to trigger unsafe behavior, even though the graph is not under their control.

In other words, graphs can contain vulnerabilities of their own. To allow users to provide checkpoints to a model you run on their behalf (e.g., in order to compare model quality for a fixed model architecture), you must carefully audit your model, and we recommend you run the TensorFlow process in a sandbox.

## Accepting untrusted inputs

It is possible to write models that are secure in the sense that they can safely process untrusted inputs, assuming there are no bugs. There are two main reasons not to rely on this: first, it is easy to write models which must not be exposed to untrusted inputs, and second, there are bugs in any software system of sufficient complexity. Letting users control inputs could allow them to trigger bugs either in TensorFlow or in its dependent libraries.
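As one layer of defense (not a substitute for sandboxing), untrusted inputs can be validated before they ever reach the graph. The sketch below is illustrative only: `validate_input` and the size and shape limits it enforces are hypothetical examples, not part of TensorFlow, and real limits must match the model actually being served.

```python
# Illustrative pre-validation of an untrusted input blob before it is
# handed to a model. All limits here are hypothetical examples.
MAX_BYTES = 4 * 1024 * 1024            # reject oversized payloads outright
EXPECTED_ELEMENTS = 224 * 224 * 3      # e.g., a uint8 RGB image, one byte each

def validate_input(blob: bytes) -> bool:
    """Return True if the untrusted blob passes basic sanity checks."""
    if len(blob) > MAX_BYTES:          # size-check before any parsing
        return False
    if len(blob) != EXPECTED_ELEMENTS: # shape must match exactly
        return False
    return True
```

Checks like these narrow the attack surface, but they cannot rule out bugs deeper in TensorFlow or in its parsing dependencies; the sandboxing advice above still applies.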
In general, it is good practice to isolate, in a sandbox, the parts of any system that are exposed to untrusted (e.g., user-provided) inputs.

A useful analogy is to treat a TensorFlow graph like a program in an interpreted language such as Python. While it is possible to write secure Python code that can be exposed to user-supplied inputs (by, e.g., carefully quoting and sanitizing input strings, size-checking input blobs, etc.), it is very easy to write Python programs that are insecure. Even secure Python code could be rendered insecure by a bug in the Python interpreter, or by a bug in a Python library it uses (e.g., [this one](https://www.cvedetails.com/cve/CVE-2017-12852/)).

## Running a TensorFlow server

TensorFlow is a platform for distributed computing, and as such there is a TensorFlow server (`tf.train.Server`). **The TensorFlow server is meant for internal communication only. It is not built for use in an untrusted network.**

For performance reasons, the default TensorFlow server does not include any authorization protocol and sends messages unencrypted. It accepts connections from anywhere, and executes the graphs it is sent without performing any checks. Therefore, if you run a `tf.train.Server` in your network, anybody with access to the network can execute what you should consider arbitrary code with the privileges of the process running the `tf.train.Server`.

When running distributed TensorFlow, you must isolate the network in which the cluster lives. Cloud providers offer instructions for setting up isolated networks, which are sometimes branded as "virtual private cloud." Refer to the instructions for [GCP](https://cloud.google.com/compute/docs/networks-and-firewalls) and [AWS](https://aws.amazon.com/vpc/) for details.

Note that `tf.train.Server` is different from the server created by `tensorflow/serving` (the default binary for which is called `ModelServer`).
By default, `ModelServer` also has no built-in mechanism for authentication. Connecting it to an untrusted network allows anyone on that network to run the graphs known to the `ModelServer`. This means that an attacker may run graphs using untrusted inputs as described above, but they would not be able to execute arbitrary graphs. It is possible to safely expose a `ModelServer` directly to an untrusted network, **but only if the graphs it is configured to use have been carefully audited to be safe**.

Similar to best practices for other servers, we recommend running any `ModelServer` with appropriate privileges (i.e., using a separate user with reduced permissions). In the spirit of defense in depth, we recommend authenticating requests to any TensorFlow server connected to an untrusted network, as well as sandboxing the server to minimize the adverse effects of any breach.

## Vulnerabilities in TensorFlow

TensorFlow is a large and complex system. It also depends on a large set of third-party libraries (e.g., `numpy`, `libjpeg-turbo`, PNG parsers, `protobuf`). It is possible that TensorFlow or its dependent libraries contain vulnerabilities that would allow triggering unexpected or dangerous behavior with specially crafted inputs.

### What is a vulnerability?

Given TensorFlow's flexibility, it is possible to specify computation graphs which exhibit unexpected or unwanted behavior. The fact that TensorFlow models can perform arbitrary computations means that they may read and write files, communicate via the network, produce deadlocks and infinite loops, or run out of memory. Such behavior is a vulnerability only when it falls outside the specification of the operations involved.

A `FileWriter` writing a file is not unexpected behavior and therefore is not a vulnerability in TensorFlow.
A `MatMul` allowing arbitrary binary code execution **is** a vulnerability.

This is more subtle from a systems perspective. For example, it is easy to cause a TensorFlow process to try to allocate more memory than is available by specifying a computation graph containing an ill-considered `tf.tile` operation. TensorFlow should exit cleanly in this case (it would raise an exception in Python, or return an error `Status` in C++). However, if the surrounding system does not expect this possibility, such behavior could be used in a denial-of-service attack (or worse). Because TensorFlow behaves correctly, this is not a vulnerability in TensorFlow (although it would be a vulnerability in this hypothetical system).

As a general rule, it is incorrect behavior for TensorFlow to access memory it does not own, or to terminate in an unclean way. Bugs in TensorFlow that lead to such behaviors constitute a vulnerability.

One of the most critical parts of any system is input handling. If malicious input can trigger side effects or incorrect behavior, this is a bug, and likely a vulnerability.

### Reporting vulnerabilities

Please email reports about any security-related issues you find to `security@tensorflow.org`. This mail is delivered to a small security team. Your email will be acknowledged within one business day, and you'll receive a more detailed response within 7 days indicating the next steps in handling your report. For critical problems, you may encrypt your report (see below).

Please use a descriptive subject line for your report email. After the initial reply to your report, the security team will endeavor to keep you informed of the progress being made towards a fix and announcement.

In addition, please include the following information along with your report:

* Your name and affiliation (if any).
* A description of the technical details of the vulnerabilities. It is very important to let us know how we can reproduce your findings.
* An explanation of who can exploit this vulnerability, and what they gain when doing so -- write an attack scenario. This will help us evaluate your report quickly, especially if the issue is complex.
* Whether this vulnerability is public or known to third parties. If it is, please provide details.

If you believe that an existing (public) issue is security-related, please send an email to `security@tensorflow.org`. The email should include the issue ID and a short description of why it should be handled according to this security policy.

Once an issue is reported, TensorFlow uses the following disclosure process:

* When a report is received, we confirm the issue and determine its severity.
* If we know of specific third-party services or software based on TensorFlow that require mitigation before publication, those projects will be notified.
* An advisory is prepared (but not published) which details the problem and steps for mitigation.
* Wherever possible, fixes are prepared for the last minor release of the two latest major releases, as well as the master branch. We will attempt to commit these fixes as soon as possible, and as close together as possible.
* Patch releases are published for all fixed released versions, a notification is sent to discuss@tensorflow.org, and the advisory is published.

Past security advisories are listed below. We credit reporters for identifying security issues, although we keep your name confidential if you request it.

#### Encryption key for `security@tensorflow.org`

If your disclosure is extremely sensitive, you may choose to encrypt your report using the key below. Please only use this for critical security reports.
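For reference, one common way to use such a key is with standard GnuPG tooling. The sketch below is illustrative, the file names (`tf-security.asc`, `report.txt`) are placeholders, and the commands are guarded so they do nothing unless `gpg` and the files are present:

```shell
# Sketch (placeholder file names): save the PGP public key block below into
# tf-security.asc, then import it and encrypt your report before emailing it.
# Guarded so this is a no-op when gpg or the placeholder files are missing.
if command -v gpg >/dev/null 2>&1 && [ -f tf-security.asc ] && [ -f report.txt ]; then
  gpg --import tf-security.asc
  gpg --encrypt --armor --recipient security@tensorflow.org report.txt
  # Writes report.txt.asc, which can be attached to the report email.
fi
```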
```
-----BEGIN PGP PUBLIC KEY BLOCK-----

mQENBFpqdzwBCADTeAHLNEe9Vm77AxhmGP+CdjlY84O6DouOCDSq00zFYdIU/7aI
LjYwhEmDEvLnRCYeFGdIHVtW9YrVktqYE9HXVQC7nULU6U6cvkQbwHCdrjaDaylP
aJUXkNrrxibhx9YYdy465CfusAaZ0aM+T9DpcZg98SmsSml/HAiiY4mbg/yNVdPs
SEp/Ui4zdIBNNs6at2gGZrd4qWhdM0MqGJlehqdeUKRICE/mdedXwsWLM8AfEA0e
OeTVhZ+EtYCypiF4fVl/NsqJ/zhBJpCx/1FBI1Uf/lu2TE4eOS1FgmIqb2j4T+jY
e+4C8kGB405PAC0n50YpOrOs6k7fiQDjYmbNABEBAAG0LVRlbnNvckZsb3cgU2Vj
dXJpdHkgPHNlY3VyaXR5QHRlbnNvcmZsb3cub3JnPokBTgQTAQgAOBYhBEkvXzHm
gOJBnwP4Wxnef3wVoM2yBQJaanc8AhsDBQsJCAcCBhUKCQgLAgQWAgMBAh4BAheA
AAoJEBnef3wVoM2yNlkIAICqetv33MD9W6mPAXH3eon+KJoeHQHYOuwWfYkUF6CC
o+X2dlPqBSqMG3bFuTrrcwjr9w1V8HkNuzzOJvCm1CJVKaxMzPuXhBq5+DeT67+a
T/wK1L2R1bF0gs7Pp40W3np8iAFEh8sgqtxXvLGJLGDZ1Lnfdprg3HciqaVAiTum
HBFwszszZZ1wAnKJs5KVteFN7GSSng3qBcj0E0ql2nPGEqCVh+6RG/TU5C8gEsEf
3DX768M4okmFDKTzLNBm+l08kkBFt+P43rNK8dyC4PXk7yJa93SmS/dlK6DZ16Yw
2FS1StiZSVqygTW59rM5XNwdhKVXy2mf/RtNSr84gSi5AQ0EWmp3PAEIALInfBLR
N6fAUGPFj+K3za3PeD0fWDijlC9f4Ety/icwWPkOBdYVBn0atzI21thPRbfuUxfe
zr76xNNrtRRlbDSAChA1J5T86EflowcQor8dNC6fS+oHFCGeUjfEAm16P6mGTo0p
osdG2XnnTHOOEFbEUeWOwR/zT0QRaGGknoy2pc4doWcJptqJIdTl1K8xyBieik/b
nSoClqQdZJa4XA3H9G+F4NmoZGEguC5GGb2P9NHYAJ3MLHBHywZip8g9oojIwda+
OCLL4UPEZ89cl0EyhXM0nIAmGn3Chdjfu3ebF0SeuToGN8E1goUs3qSE77ZdzIsR
BzZSDFrgmZH+uP0AEQEAAYkBNgQYAQgAIBYhBEkvXzHmgOJBnwP4Wxnef3wVoM2y
BQJaanc8AhsMAAoJEBnef3wVoM2yX4wIALcYZbQhSEzCsTl56UHofze6C3QuFQIH
J4MIKrkTfwiHlCujv7GASGU2Vtis5YEyOoMidUVLlwnebE388MmaJYRm0fhYq6lP
A3vnOCcczy1tbo846bRdv012zdUA+wY+mOITdOoUjAhYulUR0kiA2UdLSfYzbWwy
7Obq96Jb/cPRxk8jKUu2rqC/KDrkFDtAtjdIHh6nbbQhFuaRuWntISZgpIJxd8Bt
Gwi0imUVd9m9wZGuTbDGi6YTNk0GPpX5OMF5hjtM/objzTihSw9UN+65Y/oSQM81
v//Fw6ZeY+HmRDFdirjD7wXtIuER4vqCryIqR6Xe9X8oJXz9L/Jhslc=
=CDME
-----END PGP PUBLIC KEY BLOCK-----
```

### Known Vulnerabilities

For a list of known vulnerabilities and security advisories for TensorFlow,
[click
here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/security/index.md).