# Transport Explainer

@vjpai

## Existing Transports

[gRPC
transports](https://github.com/grpc/grpc/tree/master/src/core/ext/transport)
plug in below the core API (one level below the C++ or other wrapped-language
API). You can write your transport in C or C++; currently (Nov 2017) all the
transports are nominally written in C++, though they are idiomatically C. The
existing transports are:

* [HTTP/2](https://github.com/grpc/grpc/tree/master/src/core/ext/transport/chttp2)
* [Cronet](https://github.com/grpc/grpc/tree/master/src/core/ext/transport/cronet)
* [In-process](https://github.com/grpc/grpc/tree/master/src/core/ext/transport/inproc)

Among these, the in-process transport is likely the easiest to understand,
though arguably also the least similar to a "real" sockets-based transport,
since it is only used within a single process.

## Transport stream ops

In the gRPC core implementation, a fundamental struct is the
`grpc_transport_stream_op_batch`, which represents a collection of stream
operations sent to a transport. (Note that in gRPC, _stream_ and _RPC_ are used
synonymously since all RPCs are actually streams internally.) The ops in a batch
can include:

* send\_initial\_metadata
  - Client: initiate an RPC
  - Server: supply response headers
* recv\_initial\_metadata
  - Client: get response headers
  - Server: accept an RPC
* send\_message (zero or more) : send a data buffer
* recv\_message (zero or more) : receive a data buffer
* send\_trailing\_metadata
  - Client: half-close indicating that no more messages will be coming
  - Server: full-close providing final status for the RPC
* recv\_trailing\_metadata: get final status for the RPC
  - Server extra: This op shouldn't actually be considered complete until the
    server has also sent trailing metadata to provide the other side with final
    status
* cancel\_stream: Attempt to cancel an RPC
* collect\_stats: Get stats

The fundamental responsibility of the transport is to transform between this
internal format and an actual wire format, so the processing of these operations
is largely transport-specific.

One or more of these ops are grouped into a batch. Applications can start all of
a call's ops in a single batch, or they can split them up into multiple
batches. Results of each batch are returned asynchronously via a completion
queue.

Internally, we use callbacks to indicate completion. The surface layer creates a
callback when starting a new batch and sends it down the filter stack along with
the batch. The transport must invoke this callback when the batch is complete,
and then the surface layer returns an event to the application via the
completion queue. Each batch can have up to 3 callbacks:

* recv\_initial\_metadata\_ready (called by the transport when the
  recv\_initial\_metadata op is complete)
* recv\_message\_ready (called by the transport when the recv\_message op is
  complete)
* on\_complete (called by the transport when the entire batch is complete)

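As a rough illustration of the grouping described above, the following sketch
models a batch as a set of op flags plus up to three callbacks. Every name in
it (`ToyBatch`, `CompleteBatch`, `DemoClientBatch`) is hypothetical, and the
real `grpc_transport_stream_op_batch` is considerably richer; the callback
order shown is just one choice made for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a stream op batch. Not the gRPC type.
struct ToyBatch {
  bool send_initial_metadata = false;
  bool recv_initial_metadata = false;
  bool send_message = false;
  bool recv_message = false;
  bool send_trailing_metadata = false;
  bool recv_trailing_metadata = false;

  // Up to three callbacks, mirroring the list above.
  std::function<void()> recv_initial_metadata_ready;
  std::function<void()> recv_message_ready;
  std::function<void()> on_complete;
};

// Pretends every op in the batch finished immediately. In this sketch the
// per-op callbacks run first and on_complete runs last. Returns the names of
// the callbacks that were invoked, in order.
std::vector<std::string> CompleteBatch(const ToyBatch& batch) {
  std::vector<std::string> fired;
  if (batch.recv_initial_metadata && batch.recv_initial_metadata_ready) {
    batch.recv_initial_metadata_ready();
    fired.push_back("recv_initial_metadata_ready");
  }
  if (batch.recv_message && batch.recv_message_ready) {
    batch.recv_message_ready();
    fired.push_back("recv_message_ready");
  }
  if (batch.on_complete) {
    batch.on_complete();
    fired.push_back("on_complete");
  }
  return fired;
}

// Example: a client batch that starts the RPC and asks for response headers.
// Returns how many callbacks were invoked.
int DemoClientBatch() {
  int calls = 0;
  ToyBatch batch;
  batch.send_initial_metadata = true;
  batch.recv_initial_metadata = true;
  batch.recv_initial_metadata_ready = [&calls] { ++calls; };
  batch.on_complete = [&calls] { ++calls; };
  const std::vector<std::string> fired = CompleteBatch(batch);
  assert(fired.size() == 2 && fired.back() == "on_complete");
  return calls;
}
```
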
## Timelines of transport stream op batches

The transport's job is to sequence and interpret various possible interleavings
of the basic stream ops. For example, a sample timeline of batches would be:

1. Client send\_initial\_metadata: Initiate an RPC with a path (method) and authority
1. Server recv\_initial\_metadata: accept an RPC
1. Client send\_message: Supply the input proto for the RPC
1. Server recv\_message: Get the input proto from the RPC
1. Client send\_trailing\_metadata: This is a half-close indicating that the
   client will not be sending any more messages
1. Server recv\_trailing\_metadata: The server sees this from the client and
   knows that it will not get any more messages. This won't complete yet though,
   as described above.
1. Server send\_initial\_metadata, send\_message, send\_trailing\_metadata: A
   batch can contain multiple ops, and this batch provides the RPC response
   headers, response content, and status. Note that sending the trailing
   metadata will also complete the server's receive of trailing metadata.
1. Client recv\_initial\_metadata: The number of ops in a batch on one side has
   no relation to the number of ops in a batch on the other side. In this case,
   the client is just collecting the response headers.
1. Client recv\_message, recv\_trailing\_metadata: Get the data response and
   status


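The deferred completion of the server's recv\_trailing\_metadata in the
timeline above can be sketched as a small state machine. This is a toy model
under stated assumptions: `ToyServerStream` and its methods are hypothetical
names, cancellation is ignored, and completions are recorded in a log instead
of invoking real callbacks.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical server-side view of one stream. Not a gRPC type.
struct ToyServerStream {
  bool recv_trailing_requested = false;  // server started recv_trailing_metadata
  bool client_half_closed = false;       // client's send_trailing_metadata arrived
  bool sent_trailing = false;            // server's send_trailing_metadata finished
  bool recv_trailing_done = false;
  std::vector<std::string> log;          // events in the order they happened

  void StartRecvTrailingMetadata() {
    recv_trailing_requested = true;
    MaybeCompleteRecvTrailing();
  }

  void OnClientHalfClose() {
    client_half_closed = true;
    log.push_back("client half-close seen");
    MaybeCompleteRecvTrailing();
  }

  void SendTrailingMetadata() {
    sent_trailing = true;
    log.push_back("send_trailing_metadata complete");
    MaybeCompleteRecvTrailing();
  }

  // recv_trailing_metadata completes only once the server has also sent its
  // own trailing metadata, as the timeline describes.
  void MaybeCompleteRecvTrailing() {
    if (recv_trailing_done) return;
    if (recv_trailing_requested && client_half_closed && sent_trailing) {
      recv_trailing_done = true;
      log.push_back("recv_trailing_metadata complete");
    }
  }
};

// Replays the server-side steps of the sample timeline and returns the log.
std::vector<std::string> RunServerTimeline() {
  ToyServerStream stream;
  stream.StartRecvTrailingMetadata();
  stream.OnClientHalfClose();
  assert(!stream.recv_trailing_done);  // still waiting on the server's send
  stream.SendTrailingMetadata();
  return stream.log;
}
```
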
There are other possible sample timelines. For example, for client-side
streaming, a "typical" sequence would be:

1. Server: recv\_initial\_metadata
   - At API-level, that would be the server requesting an RPC
1. Server: recv\_trailing\_metadata
   - This is for when the server wants to know the final completion of the RPC
     through an `AsyncNotifyWhenDone` API in C++
1. Client: send\_initial\_metadata, recv\_message, recv\_trailing\_metadata
   - At API-level, that's a client invoking a client-side streaming call. The
     send\_initial\_metadata is the call invocation, the recv\_message collects
     the final response from the server, and the recv\_trailing\_metadata gets
     the `grpc::Status` value that will be returned from the call
1. Client: send\_message / Server: recv\_message
   - Repeat the above step numerous times; these correspond to a client issuing
     `Write` in a loop and a server doing `Read` in a loop until `Read` fails
1. Client: send\_trailing\_metadata / Server: recv\_message that indicates doneness (NULL)
   - These correspond to a client issuing `WritesDone` which causes the server's
     `Read` to fail
1. Server: send\_message, send\_trailing\_metadata
   - These correspond to the server doing `Finish`

The sends on one side will call their own callbacks when complete, and they will
in turn trigger actions that cause the other side's recv operations to
complete. In some transports, a send can sometimes complete before the recv on
the other side (e.g., in HTTP/2 if there is sufficient flow-control buffer space
available).

## Other transport duties

In addition to these basic stream ops, the transport must handle cancellations
of a stream at any time and pass their effects to the other side. For example,
in HTTP/2, this triggers a `RST_STREAM` being sent on the wire. The transport
must also perform operations like pings and statistics collection that are used
to shape transport-level characteristics like flow control (see, for example,
their use in the HTTP/2 transport).

## Putting things together with detail: Sending Metadata

* API layer: `map<string, string>` that is specific to this RPC
* Core surface layer: array of `{slice, slice}` pairs where each slice
  references an underlying string
* [Core transport
  layer](https://github.com/grpc/grpc/tree/master/src/core/lib/transport): list
  of `{slice, slice}` pairs that includes the above plus possibly some general
  metadata (e.g., Method and Authority for initial metadata)
* [Specific transport
  layer](https://github.com/grpc/grpc/tree/master/src/core/ext/transport):
  - Either send it to the other side using transport-specific API (e.g., Cronet)
  - Or have it sent through the [iomgr/endpoint
    layer](https://github.com/grpc/grpc/tree/master/src/core/lib/iomgr) (e.g.,
    HTTP/2)
  - Or just manipulate pointers to get it from one side to the other (e.g.,
    In-process)

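The first three layers above can be sketched as plain data transformations.
This is only an illustration: `std::string` pairs stand in for `{slice, slice}`
pairs, the function names are hypothetical, and the `:path` and `:authority`
keys are assumed here following HTTP/2 pseudo-header naming.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a {slice, slice} pair; real slices reference, rather than own,
// the underlying bytes.
using MetadataPair = std::pair<std::string, std::string>;

// API layer -> core surface layer: flatten the per-RPC map into an array of
// key/value pairs.
std::vector<MetadataPair> ToSurfaceMetadata(
    const std::map<std::string, std::string>& api_metadata) {
  return std::vector<MetadataPair>(api_metadata.begin(), api_metadata.end());
}

// Core transport layer: the user's pairs plus general metadata. For initial
// metadata that is the method and authority (key names are an assumption).
std::vector<MetadataPair> AddGeneralMetadata(
    const std::vector<MetadataPair>& surface_metadata,
    const std::string& method, const std::string& authority) {
  std::vector<MetadataPair> out = {{":path", method},
                                   {":authority", authority}};
  out.insert(out.end(), surface_metadata.begin(), surface_metadata.end());
  return out;
}
```

What happens to the resulting list next is the transport-specific part: it may
be handed to another library, serialized onto an endpoint, or simply passed by
pointer to the other side.
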
## Requirements for any transport

Each transport implements several operations in a vtbl (may change to actual
virtual functions as transport moves to idiomatic C++).

The most important and common one is `perform_stream_op`. This function
processes a single stream op batch on a specific stream that is associated with
a specific transport:

* Gets the 6 ops/cancel passed down from the surface
* Passes metadata from one side to the other as described above
* Transforms messages between slice buffer structure and stream of bytes to pass
  to other side
  - May require insertion of extra bytes (e.g., per-message headers in HTTP/2)
* Reacts to metadata to preserve expected orderings (*)
* Schedules invocation of completion callbacks

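The "insertion of extra bytes" step can be illustrated with the length-prefixed
framing that gRPC uses over HTTP/2: a 1-byte compressed flag followed by a
4-byte big-endian message length. The sketch below is a standalone illustration
on `std::string`; `FrameMessage` and `UnframeMessage` are hypothetical names,
and a real transport works on slice buffers rather than strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Prefixes `payload` with a 5-byte header: compressed flag, then the payload
// length as a big-endian 32-bit integer.
std::string FrameMessage(const std::string& payload, bool compressed = false) {
  const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
  std::string out;
  out.reserve(5 + payload.size());
  out.push_back(compressed ? '\x01' : '\x00');
  out.push_back(static_cast<char>((len >> 24) & 0xFF));
  out.push_back(static_cast<char>((len >> 16) & 0xFF));
  out.push_back(static_cast<char>((len >> 8) & 0xFF));
  out.push_back(static_cast<char>(len & 0xFF));
  out += payload;
  return out;
}

// Inverse of FrameMessage for a buffer that holds exactly one frame.
std::string UnframeMessage(const std::string& frame) {
  assert(frame.size() >= 5);
  std::uint32_t len = 0;
  for (int i = 1; i <= 4; ++i) {
    len = (len << 8) | static_cast<unsigned char>(frame[i]);
  }
  assert(frame.size() == 5 + static_cast<std::size_t>(len));
  return frame.substr(5, len);
}
```
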
There are other functions in the vtbl as well.

* `perform_transport_op`
  - Configure the transport instance for the connectivity state change notifier
    or the server-side accept callback
  - Disconnect transport or set up a goaway for later streams
* `init_stream`
  - Starts a stream from the client-side
  - (*) Server-side of the transport must call `accept_stream_cb` when a new
    stream is available
    * Triggers request-matcher
* `destroy_stream`, `destroy_transport`
  - Free up data related to a stream or transport
* `set_pollset`, `set_pollset_set`, `get_endpoint`
  - Map each specific instance of the transport to FDs being used by iomgr (for
    HTTP/2)
  - Get a pointer to the endpoint structure that actually moves the data
    (wrapper around a socket for HTTP/2)

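For readers less familiar with the vtbl idiom, here is a minimal sketch of a
struct of function pointers that callers dispatch through. The entry names echo
the list above, but the types and signatures (`ToyTransport`, `ToyStream`, a
string standing in for a batch) are invented for illustration and do not match
the real gRPC declarations.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyTransport;  // defined below

// Hypothetical per-stream state: just records the batches it was given.
struct ToyStream {
  std::vector<std::string> batches;
};

// Hypothetical vtable: one function pointer per transport operation.
struct ToyTransportVtable {
  void (*init_stream)(ToyTransport* t, ToyStream* s);
  void (*perform_stream_op)(ToyTransport* t, ToyStream* s,
                            const std::string& batch);
  void (*destroy_stream)(ToyTransport* t, ToyStream* s);
};

struct ToyTransport {
  const ToyTransportVtable* vtable;
  int live_streams;
};

// One concrete "transport" that fills in the table.
void ToyInitStream(ToyTransport* t, ToyStream* s) {
  ++t->live_streams;
  s->batches.clear();
}
void ToyPerformStreamOp(ToyTransport*, ToyStream* s, const std::string& batch) {
  s->batches.push_back(batch);
}
void ToyDestroyStream(ToyTransport* t, ToyStream* s) {
  --t->live_streams;
  s->batches.clear();
}

const ToyTransportVtable kToyVtable = {ToyInitStream, ToyPerformStreamOp,
                                       ToyDestroyStream};

// Callers go through the vtable without knowing which transport is behind it.
// Returns the number of batches the stream recorded before it was destroyed.
int DemoVtableDispatch() {
  ToyTransport t = {&kToyVtable, 0};
  ToyStream s;
  t.vtable->init_stream(&t, &s);
  t.vtable->perform_stream_op(&t, &s, "send_initial_metadata");
  t.vtable->perform_stream_op(&t, &s, "send_message");
  const int recorded = static_cast<int>(s.batches.size());
  t.vtable->destroy_stream(&t, &s);
  assert(t.live_streams == 0);
  return recorded;
}
```
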
## Book-keeping responsibilities of the transport layer

A given transport must keep all of its transports and streams ref-counted. This
is essential to make sure that no struct disappears before it is done being
used.

A transport must also preserve relevant orders for the different categories of
ops on a stream, as described above. A transport must also make sure that all
relevant batch operations have completed before scheduling the `on_complete`
closure for a batch. For example, the server logic expects
recv\_trailing\_metadata not to complete until after the server actually sends
trailing metadata, since it would have already learned that the client is done
by seeing a NULL’ed recv\_message. This is considered part of the transport's
duties in preserving orders.

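One simple way to honor the "all ops done before `on_complete`" rule is a
pending-op counter, sketched below. `ToyBatchTracker` is a hypothetical helper,
not something taken from the gRPC sources, and it is single-threaded; a real
transport would also need whatever synchronization its threading model
requires.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: runs on_complete exactly once, when the last
// outstanding op of a batch finishes.
class ToyBatchTracker {
 public:
  ToyBatchTracker(int num_ops, std::function<void()> on_complete)
      : pending_(num_ops), on_complete_(std::move(on_complete)) {
    assert(num_ops > 0);
  }

  // Called by the transport each time one op in the batch finishes.
  void OpDone() {
    assert(pending_ > 0);
    if (--pending_ == 0) on_complete_();
  }

  int pending() const { return pending_; }

 private:
  int pending_;
  std::function<void()> on_complete_;
};

// Example: a three-op batch. on_complete must not run until the third op is
// done. Returns how many times on_complete ran.
int DemoBatchTracker() {
  int completions = 0;
  ToyBatchTracker tracker(3, [&completions] { ++completions; });
  tracker.OpDone();
  tracker.OpDone();
  assert(completions == 0 && tracker.pending() == 1);
  tracker.OpDone();
  return completions;
}
```
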