# Protocol Buffers in Swift

## Objective

This document describes the user-facing API and internal implementation of
proto2 and proto3 messages in Apple’s Swift programming language.

One of the key goals of protobufs is to provide idiomatic APIs for each
language. In that vein, **interoperability with Objective-C is a non-goal of
this proposal.** Protobuf users who need to pass messages between Objective-C
and Swift code in the same application should use the existing Objective-C proto
library. The goal of the effort described here is to provide an API for protobuf
messages that uses features specific to Swift (optional types, algebraic
enumerated types, value types, and so forth) in a natural way that will delight,
rather than surprise, users of the language.

## Naming

*   By convention, both typical protobuf message names and Swift structs/classes
    are `UpperCamelCase`, so for most messages, the name of a message can be the
    same as the name of its generated type. (However, see the discussion below
    about prefixes under [Packages](#packages).)

*   Enum cases in protobufs typically are `UPPERCASE_WITH_UNDERSCORES`, whereas
    in Swift they are `lowerCamelCase` (as of the Swift 3 API design
    guidelines). We will transform the names to match Swift convention, using
    a whitelist similar to the Objective-C compiler plugin to handle commonly
    used acronyms.

*   Typical fields in proto messages are `lowercase_with_underscores`, while in
    Swift they are `lowerCamelCase`. We will transform the names to match
    Swift convention by removing the underscores and uppercasing the subsequent
    letter.
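The field-name rule in the last bullet can be sketched as a small helper. The
function name `lowerCamelCase(fromProtoName:)` is hypothetical, and the sketch
deliberately ignores the acronym handling mentioned for enum cases; it only
illustrates the underscore removal and uppercasing step.

```swift
/// Hypothetical helper: converts a proto name such as `id_number` into the
/// `lowerCamelCase` form `idNumber`.
func lowerCamelCase(fromProtoName protoName: String) -> String {
    var result = ""
    var uppercaseNext = false
    for character in protoName {
        if character == "_" {
            // Drop the underscore. Leading underscores are simply ignored so
            // that the result still starts with a lowercase letter.
            uppercaseNext = !result.isEmpty
        } else if uppercaseNext {
            result += character.uppercased()
            uppercaseNext = false
        } else {
            result.append(character)
        }
    }
    return result
}

// Enum value names are lowercased first, so the same rule applies to them.
let fieldName = lowerCamelCase(fromProtoName: "id_number")
let caseName = lowerCamelCase(fromProtoName: "CONTENT_TYPE".lowercased())
```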
## Swift reserved words

Swift has a large set of reserved words, some always reserved and some
contextually reserved (that is, they can be used as identifiers in contexts
where they would not be confused). As of Swift 2.2, the set of always-reserved
words is:

```
_, #available, #column, #else, #elseif, #endif, #file, #function, #if, #line,
#selector, as, associatedtype, break, case, catch, class, continue, default,
defer, deinit, do, dynamicType, else, enum, extension, fallthrough, false, for,
func, guard, if, import, in, init, inout, internal, is, let, nil, operator,
private, protocol, public, repeat, rethrows, return, self, Self, static,
struct, subscript, super, switch, throw, throws, true, try, typealias, var,
where, while
```

The set of contextually reserved words is:

```
associativity, convenience, dynamic, didSet, final, get, infix, indirect,
lazy, left, mutating, none, nonmutating, optional, override, postfix,
precedence, prefix, Protocol, required, right, set, Type, unowned, weak,
willSet
```

It is possible to use any reserved word as an identifier by escaping it with
backticks (for example, ``let `class` = 5``). Other name-mangling schemes would
require us to transform the names themselves (for example, by appending an
underscore), which would then require us to ensure that the new name does not
collide with something else in the same namespace.

While the backtick feature may not be familiar to all Swift developers, a
small amount of user education can address this, and it seems like the best
approach. We can unconditionally surround all property names with backticks to
simplify generation.

Some remapping will still be required, though, to avoid collisions between
generated properties and the names of methods and properties defined in the base
protocol/implementation of messages.
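As a minimal sketch of the backtick approach, the following hypothetical
generated struct (the type and field names are invented for illustration)
declares properties whose names are reserved words. Backticks are also accepted
around names that are not reserved, which is what makes unconditional escaping
possible.

```swift
// Hypothetical generated type; the names are illustrative only.
struct Rule {
    var `default`: Int32 = 0     // `default` is always reserved
    var `class`: String = ""     // `class` is always reserved
    var `name`: String = ""      // not reserved; the backticks are harmless
}

var rule = Rule()
rule.`default` = 5
rule.`class` = "admin"
rule.name = "root"               // same property as the one declared as `name`
```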
# Features of Protocol Buffers

This section describes how the features of the protocol buffer syntaxes (proto2
and proto3) map to features in Swift: what the code generated from a proto will
look like, and how it will be implemented in the underlying library.

## Packages

Modules are the main form of namespacing in Swift, but they are not declared
using syntactic constructs like namespaces in C++ or packages in Java. Instead,
they are tied to build targets in Xcode (or, in the future with open-source
Swift, declarations in a Swift Package Manager manifest). They also do not
easily support nesting submodules (Clang module maps support this, but pure
Swift does not yet provide a way to define submodules).

We will generate types with fully-qualified underscore-delimited names. For
example, a message `Baz` in package `foo.bar` would generate a struct named
`Foo_Bar_Baz`. For each fully-qualified proto message, there will be exactly one
unique type symbol emitted in the generated binary.

Users are likely to balk at the ugliness of underscore-delimited names for every
generated type. To improve upon this situation, we will add a new string-valued
file-level option, `swift_package_typealias`, that can be added to `.proto`
files. When present, this will cause `typealias`es to be added to the generated
Swift messages that replace the package name prefix with the provided string.
For example, the following `.proto` file:

```protobuf
option swift_package_typealias = "FBP";
package foo.bar;

message Baz {
  // Message fields
}
```

would generate the following Swift source:

```swift
public struct Foo_Bar_Baz {
  // Message fields and other methods
}

// Declared public so that code importing the module can use the alias.
public typealias FBPBaz = Foo_Bar_Baz
```

Note that this type alias is recorded in the generated `.swiftmodule` so that
code importing the module can refer to it, but it does not cause a new symbol to
be generated in the compiled binary (i.e., we do not risk compiled size bloat by
adding `typealias`es for every type).

Other strategies to handle packages that were considered and rejected can be
found in [Appendix A](#appendix-a-rejected-strategies-to-handle-packages).
## Messages

Proto messages are natural value types and we will generate messages as structs
instead of classes. Users will benefit from Swift’s built-in behavior with
regard to mutability. We will define a `ProtoMessage` protocol that defines the
common methods and properties for all messages (such as serialization) and also
lets users treat messages polymorphically. Any shared method implementations
that do not differ between individual messages can be implemented in a protocol
extension.

The backing storage itself for fields of a message will be managed by a
`ProtoFieldStorage` class that uses an internal dictionary keyed by field number,
and whose values are the value of the field with that number (up-cast to Swift’s
`Any` type). This class will provide type-safe getters and setters so that
generated messages can manipulate this storage, and core serialization logic
will live here as well. Factoring the storage out into a separate type, rather
than inlining the fields as stored properties in the message itself, also lets
us implement copy-on-write efficiently to support passing around large
messages. (In addition, because the messages themselves are value types,
inlining fields is not possible if the fields are submessages of the same type,
or a type that eventually includes a submessage of the same type.)
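The storage design above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the proposed implementation: only the
name `ProtoFieldStorage` comes from the text, while the `Person` message, the
member names, and the field number are invented. Copy-on-write relies on the
standard library function `isKnownUniquelyReferenced(_:)`.

```swift
// Assumed shape of the storage class: a dictionary keyed by field number
// whose values are up-cast to Any.
final class ProtoFieldStorage {
    private var values: [Int: Any] = [:]

    init() {}
    init(copying other: ProtoFieldStorage) { values = other.values }

    func has(_ fieldNumber: Int) -> Bool {
        return values[fieldNumber] != nil
    }

    // Type-safe getter that falls back to the field's default value.
    func get<T>(_ fieldNumber: Int, default defaultValue: T) -> T {
        return (values[fieldNumber] as? T) ?? defaultValue
    }

    // Type-safe setter; passing nil clears the field.
    func set<T>(_ fieldNumber: Int, to newValue: T?) {
        if let newValue = newValue {
            values[fieldNumber] = newValue
        } else {
            values.removeValue(forKey: fieldNumber)
        }
    }
}

// Hypothetical generated message with one string field numbered 1.
struct Person {
    private var storage = ProtoFieldStorage()

    // Copies the storage only if another value still shares it.
    private mutating func uniqueStorage() -> ProtoFieldStorage {
        if !isKnownUniquelyReferenced(&storage) {
            storage = ProtoFieldStorage(copying: storage)
        }
        return storage
    }

    var name: String {
        get { return storage.get(1, default: "") }
        set { uniqueStorage().set(1, to: newValue) }
    }

    var hasName: Bool { return storage.has(1) }
}

var original = Person()
original.name = "Ada"
var copy = original      // shares storage; nothing is copied yet
copy.name = "Grace"      // the first mutation triggers the copy
```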
### Required fields (proto2 only)

Required fields in proto2 messages seem like they could be naturally represented
by non-optional properties in Swift, but this presents some problems/concerns.

Serialization APIs permit partial serialization, which allows required fields to
remain unset. Furthermore, other language APIs still provide `has*` and `clear*`
methods for required fields, and knowing whether a property has a value when the
message is in memory is still useful.

For example, an e-mail draft message may have the “to” address required on the
wire, but when the user constructs it in memory, it doesn’t make sense to force
a value until they provide one. We only want to force a value to be present when
the message is serialized to the wire. Using non-optional properties prevents
this use case, and makes client usage awkward because the user would be forced
to select a sentinel or placeholder value for any required fields at the time
the message was created.

### Default values

In proto2, fields can have a default value specified that may be a value other
than the default value for its corresponding language type (for example, a
default value of 5 instead of 0 for an integer). When reading a field that is
not explicitly set, the user expects to get that value. This makes Swift
optionals (i.e., `Foo?`) unsuitable for fields in general. Unfortunately, we
cannot implement our own “enhanced optional” type without severely complicating
usage (Swift’s use of type inference and its lack of implicit conversions would
require manual unwrapping of every property value).

Instead, we can use **implicitly unwrapped optionals.** For example, a property
generated for a field of type `int32` would have Swift type `Int32!`. These
properties would behave with the following characteristics, which mirror the
nil-resettable properties used elsewhere in Apple’s SDKs (for example,
`UIView.tintColor`):

*   Assigning a non-nil value to a property sets the field to that value.
*   Assigning nil to a property clears the field (its internal representation is
    nilled out).
*   Reading the value of a property returns its value if it is set, or returns
    its default value if it is not set. Reading a property never returns nil.

The final point in the list above implies that the optional cannot be checked to
determine if the field is set to a value other than its default: it will never
be nil. Instead, we must provide `has*` methods for each field to allow the user
to check this. These methods will be public in proto2. In proto3, these methods
will be private (if generated at all), since the user can test the returned
value against the zero value for that type.
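The nil-resettable behavior described in this section can be sketched with a
computed property of implicitly unwrapped optional type. The `Settings` message,
its field, and the `hasRetryCount` spelling are assumptions made for this
example; a real generated message would read and write the shared field storage
rather than a private stored property.

```swift
// Hypothetical generated message with one proto2 field:
//   optional int32 retry_count = 1 [default = 5];
struct Settings {
    private var storedRetryCount: Int32? = nil

    var retryCount: Int32! {
        // Reading never yields nil: an unset field reports its default.
        get { return storedRetryCount ?? 5 }
        // Assigning nil clears the field; any other value sets it.
        set { storedRetryCount = newValue }
    }

    var hasRetryCount: Bool { return storedRetryCount != nil }
}

var settings = Settings()
let initial: Int32 = settings.retryCount   // 5, the declared default
settings.retryCount = 7                    // the field is now set
settings.retryCount = nil                  // the field is cleared again
```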
### Autocreation of nested messages

For convenience, dotting into an unset field representing a nested message will
return an instance of that message with default values. As in the Objective-C
implementation, this does not actually cause the field to be set until the
returned message is mutated. Fortunately, thanks to the way mutability of value
types is implemented in Swift, the language automatically handles the
reassignment-on-mutation for us. A static singleton instance containing default
values can be associated with each message that can be returned when reading, so
copies are only made by the Swift runtime when mutation occurs. For example,
given the following proto2 message:

```protobuf
message Node {
  optional Node child = 1;
  optional string value = 2 [default = "foo"];
}
```

The following Swift code would act as commented, where setting deeply nested
properties causes the copies and mutations to occur as the assignment statement
is unwound:

```swift
var node = Node()

let s = node.child.child.value
// 1. node.child returns the "default Node".
// 2. Reading .child on the result of (1) returns the same default Node.
// 3. Reading .value on the result of (2) returns the default value "foo".

node.child.child.value = "bar"
// 4. Setting .value on the default Node causes a copy to be made and sets
//    the property on that copy. Subsequently, the language updates the
//    value of "node.child.child" to point to that copy.
// 5. Updating "node.child.child" in (4) requires another copy, because
//    "node.child" was also the instance of the default node. The copy is
//    assigned back to "node.child".
// 6. Setting "node.child" in (5) is a simple value reassignment, since
//    "node" is a mutable var.
```

In other words, the generated messages do not internally have to manage parental
relationships to backfill the appropriate properties on mutation. Swift provides
this for free.
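One way the behavior above could be realized is sketched below. It is an
illustration under stated assumptions rather than the proposed implementation:
the nested `Storage` class, `defaultInstance`, `uniqueStorage()`, and `hasChild`
are names invented here, and the properties are plain (not implicitly unwrapped)
to keep the example short.

```swift
struct Node {
    // Reference-typed backing storage, so that a Node can contain a Node.
    private final class Storage {
        var child: Node?
        var value: String?
        init(child: Node? = nil, value: String? = nil) {
            self.child = child
            self.value = value
        }
    }

    // Shared instance returned when an unset `child` is read.
    private static let defaultInstance = Node()

    private var storage = Storage()

    // Copies the storage if it is shared, for example with the default
    // instance, so that mutation never affects other values.
    private mutating func uniqueStorage() -> Storage {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(child: storage.child, value: storage.value)
        }
        return storage
    }

    var child: Node {
        get { return storage.child ?? Node.defaultInstance }
        set { uniqueStorage().child = newValue }
    }

    var value: String {
        get { return storage.value ?? "foo" }
        set { uniqueStorage().value = newValue }
    }

    var hasChild: Bool { return storage.child != nil }
}

var node = Node()
let s = node.child.child.value     // reads fall through to the defaults
node.child.child.value = "bar"     // copies are made as the assignment unwinds
```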
## Scalar value fields

Proto scalar value fields will map to Swift types in the following way:

.proto Type | Swift Type
----------- | -------------------
`double`    | `Double`
`float`     | `Float`
`int32`     | `Int32`
`int64`     | `Int64`
`uint32`    | `UInt32`
`uint64`    | `UInt64`
`sint32`    | `Int32`
`sint64`    | `Int64`
`fixed32`   | `UInt32`
`fixed64`   | `UInt64`
`sfixed32`  | `Int32`
`sfixed64`  | `Int64`
`bool`      | `Bool`
`string`    | `String`
`bytes`     | `Foundation.NSData`

The proto spec defines a number of integral types that map to the same Swift
type; for example, `intXX`, `sintXX`, and `sfixedXX` are all signed integers,
and `uintXX` and `fixedXX` are both unsigned integers. No other language
implementation distinguishes these further, so we do not do so either. The
rationale is that the various types only serve to distinguish how the value is
**encoded on the wire**; once loaded in memory, the user is not concerned about
these variations.

Swift’s lack of implicit conversions among types will make it slightly annoying
to use these types in a context expecting an `Int`, or vice-versa, but since
this is a data-interchange format with explicitly-sized fields, we should not
hide that information from the user. Users will have to explicitly write
`Int(message.myField)`, for example.
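The explicit conversions mentioned above might look like this in client code.
The `Message` type and its field are hypothetical stand-ins for a generated
message; the conversions themselves use standard library initializers.

```swift
// Hypothetical stand-in for a generated message with one int32 field.
struct Message {
    var myField: Int32 = 0
}

var message = Message()
message.myField = 42

// Widening Int32 to Int always succeeds, but it must be written out.
let index: Int = Int(message.myField)

// Narrowing can overflow, so a checked conversion is the safer choice.
let fits = Int32(exactly: index)                     // Optional(42)
let tooBig = Int32(exactly: Int64(Int32.max) + 1)    // nil
```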
## Embedded message fields

Embedded message fields can be represented using an implicitly unwrapped
optional property of the generated message type. Thus, the message

```protobuf
message Foo {
  Bar bar = 1;
}
```

would be represented in Swift as

```swift
public struct Foo: ProtoMessage {
  public var bar: Bar! {
    get { ... }
    set { ... }
  }
}
```

If the user explicitly sets `bar` to nil, or if it was never set when read from
the wire, retrieving the value of `bar` would return a default, statically
allocated instance of `Bar` containing default values for its fields. This
achieves the desired behavior for default values in the same way that scalar
fields are designed, and also allows users to deep-drill into complex object
graphs to get or set fields without checking for nil at each step.

## Enum fields

The design and implementation of enum fields will differ substantially
depending on whether the message being generated is a proto2 or proto3 message.

### proto2 enums

For proto2, we do not need to be concerned about unknown enum values, so we can
use the simple raw-value enum syntax provided by Swift. So the following enum in
proto2:

```protobuf
enum ContentType {
  TEXT = 0;
  IMAGE = 1;
}
```

would become this Swift enum:

```swift
public enum ContentType: Int32, NilLiteralConvertible {
  case text = 0
  case image = 1

  public init(nilLiteral: ()) {
    self = .text
  }
}
```

See below for the discussion about `NilLiteralConvertible`.
### proto3 enums

For proto3, we need to be able to preserve unknown enum values that may come
across the wire so that they can be written back if unmodified. We can
accomplish this in Swift by using a case with an associated value for unknowns.
So the following enum in proto3:

```protobuf
enum ContentType {
  TEXT = 0;
  IMAGE = 1;
}
```

would become this Swift enum:

```swift
public enum ContentType: RawRepresentable, NilLiteralConvertible {
  case text
  case image
  case UNKNOWN_VALUE(Int32)

  public typealias RawValue = Int32

  public init(nilLiteral: ()) {
    self = .text
  }

  public init(rawValue: RawValue) {
    switch rawValue {
      case 0: self = .text
      case 1: self = .image
      default: self = .UNKNOWN_VALUE(rawValue)
    }
  }

  public var rawValue: RawValue {
    switch self {
      case .text: return 0
      case .image: return 1
      case .UNKNOWN_VALUE(let value): return value
    }
  }
}
```

Note that the use of a parameterized case prevents us from inheriting from the
raw `Int32` type; Swift does not allow an enum with a raw type to have cases
with arguments. Instead, we must implement the raw value initializer and
computed property manually. The `UNKNOWN_VALUE` case is explicitly chosen to be
"ugly" so that it stands out and does not conflict with other possible case
names.

Using this approach, proto3 consumers must always have a default case or handle
the `.UNKNOWN_VALUE` case to satisfy case exhaustion in a switch statement; the
Swift compiler considers it an error if switch statements are not exhaustive.
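The following self-contained sketch shows what consuming such an enum could look
like, including an unknown value surviving a round trip through its raw value.
The `describe(_:)` function and the raw value `7` are invented for the example,
and the `NilLiteralConvertible` conformance is omitted to keep it short.

```swift
enum ContentType: RawRepresentable {
    case text
    case image
    case UNKNOWN_VALUE(Int32)

    // A non-failable initializer satisfies RawRepresentable's requirement.
    init(rawValue: Int32) {
        switch rawValue {
        case 0: self = .text
        case 1: self = .image
        default: self = .UNKNOWN_VALUE(rawValue)
        }
    }

    var rawValue: Int32 {
        switch self {
        case .text: return 0
        case .image: return 1
        case .UNKNOWN_VALUE(let value): return value
        }
    }
}

// Hypothetical consumer: the switch must cover the unknown case (or use
// `default`) to be exhaustive.
func describe(_ contentType: ContentType) -> String {
    switch contentType {
    case .text: return "text"
    case .image: return "image"
    case .UNKNOWN_VALUE(let raw): return "unknown (\(raw))"
    }
}

// A value this client does not know about is preserved, not dropped.
let fromNewerSender = ContentType(rawValue: 7)
```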
### NilLiteralConvertible conformance

The `NilLiteralConvertible` conformance is needed to clean up the usage of
enum-typed properties in switch statements. Unlike other field types, enum
properties cannot be implicitly-unwrapped optionals without requiring that uses
in switch statements be explicitly unwrapped. For example, if we consider a
message with the enum above, this usage will fail to compile:

```swift
// Without NilLiteralConvertible conformance on ContentType
public struct SomeMessage: ProtoMessage {
  public var contentType: ContentType! { ... }
}

// ERROR: no case named text or image
switch someMessage.contentType {
  case .text: { ... }
  case .image: { ... }
}
```

Even though our implementation guarantees that `contentType` will never be nil,
if it is an optional type, its cases would be `some` and `none`, not the cases
of the underlying enum type. In order to use it in this context, the user must
write `someMessage.contentType!` in their switch statement.

Making the enum itself `NilLiteralConvertible` permits us to make the property
non-optional, so the user can still set it to nil to clear it (i.e., reset it to
its default value), while eliminating the need to explicitly unwrap it in a
switch statement.

```swift
// With NilLiteralConvertible conformance on ContentType
public struct SomeMessage: ProtoMessage {
  // Note that the property type is no longer optional
  public var contentType: ContentType { ... }
}

// OK: Compiles and runs as expected
switch someMessage.contentType {
  case .text: { ... }
  case .image: { ... }
}

// The enum can be reset to its default value this way
someMessage.contentType = nil
```

One minor oddity with this approach is that nil will be auto-converted to the
default value of the enum in any context, not just field assignment. In other
words, this is valid:

```swift
func foo(contentType: ContentType) { ... }
foo(nil) // Inside foo, contentType == .text
```

That being said, the advantage of being able to simultaneously support
nil-resettability and switch-without-unwrapping outweighs this side effect,
especially if appropriately documented. It is our hope that a new form of
resettable properties will be added to Swift that eliminates this inconsistency.
Some community members have already drafted or sent proposals for review that
would benefit our designs:

*   [SE-0030: Property Behaviors](https://github.com/apple/swift-evolution/blob/master/proposals/0030-property-behavior-decls.md)
*   [Drafted: Resettable Properties](https://github.com/patters/swift-evolution/blob/master/proposals/0000-resettable-properties.md)
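For readers on Swift 3 or later, where `NilLiteralConvertible` is spelled
`ExpressibleByNilLiteral`, the conformance discussed in this section can be
sketched as a complete program. The `SomeMessage` struct here is a simplified
stand-in with a stored property; it is an assumption for the example, not the
generated code.

```swift
enum ContentType: Int32, ExpressibleByNilLiteral {
    case text = 0
    case image = 1

    // nil converts to the enum's default value.
    init(nilLiteral: ()) {
        self = .text
    }
}

// Simplified stand-in; a real message would route through field storage.
struct SomeMessage {
    var contentType: ContentType = nil
}

var someMessage = SomeMessage()
someMessage.contentType = .image

// The property is not optional, so no unwrapping is required here.
switch someMessage.contentType {
case .text: print("text")
case .image: print("image")
}

// Assigning nil resets the property to its default value, .text.
someMessage.contentType = nil
```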
### Enum aliases

The `allow_alias` option in protobuf slightly complicates the use of Swift enums
to represent that type, because raw values of cases in an enum must be unique.
Swift lets us define static variables in an enum that alias actual cases. For
example, the following protobuf enum:

```protobuf
enum Foo {
  option allow_alias = true;
  BAR = 0;
  BAZ = 0;
}
```

will be represented in Swift as:

```swift
public enum Foo: Int32, NilLiteralConvertible {
  case bar = 0
  public static let baz = bar

  // ... etc.
}

// Can still use .baz shorthand to reference the alias in contexts
// where the type is inferred
```

That is, we use the first name as the actual case and use static variables for
the other aliases. One drawback to this approach is that the static aliases
cannot be used as cases in a switch statement (the compiler emits the error
*“Enum case ‘baz’ not found in type ‘Foo’”*). However, in our own code bases,
there are only a few places where enum aliases are not mere renamings of an
older value, but they also don’t appear to be the type of value that one would
expect to switch on (for example, a group of named constants representing
metrics rather than a set of options), so this restriction is not significant.

This strategy also implies that renaming an enum value and adding the old name
as an alias below the new name will be a breaking change in the generated Swift
code, because the first name listed is the one that becomes the actual case.
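A small self-contained sketch of the alias strategy follows. It drops the
`NilLiteralConvertible` conformance for brevity and only demonstrates that the
alias and the real case denote the same value.

```swift
enum Foo: Int32 {
    case bar = 0
    static let baz = Foo.bar   // alias generated for BAZ = 0
}

// The shorthand works wherever the type can be inferred.
let viaAlias: Foo = .baz
let viaCase: Foo = .bar
```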
## Oneof types

The `oneof` feature represents a “variant/union” data type that maps nicely to
Swift enums with associated values (algebraic types). These fields can also be
accessed independently though, and, specifically in the case of proto2, it’s
reasonable to expect access to default values when accessing a field that is not
explicitly set.

Taking all this into account, we can represent a `oneof` in Swift with two sets
of constructs:

*   Properties in the message that correspond to the `oneof` fields.
*   A nested enum named after the `oneof` and which provides the corresponding
    field values as case arguments.

This approach fulfills the needs of proto consumers by providing a
Swift-idiomatic way of simultaneously checking which field is set and accessing
its value, providing individual properties to access the default values
(important for proto2), and safely allowing a field to be moved into a `oneof`
without breaking clients.

Consider the following proto:

```protobuf
message MyMessage {
  oneof record {
    string name = 1 [default = "unnamed"];
    int32 id_number = 2 [default = 0];
  }
}
```

In Swift, we would generate an enum, a property for that enum, and properties
for the fields themselves:

```swift
public struct MyMessage: ProtoMessage {
  public enum Record: NilLiteralConvertible {
    case name(String)
    case idNumber(Int32)
    case NOT_SET

    public init(nilLiteral: ()) { self = .NOT_SET }
  }

  // This is the "Swifty" way of accessing the value
  public var record: Record { ... }

  // Direct access to the underlying fields
  public var name: String! { ... }
  public var idNumber: Int32! { ... }
}
```

This makes both usage patterns possible:

```swift
// Usage 1: Case-based dispatch
switch message.record {
  case .name(let name):
    // Do something with name if it was explicitly set
  case .idNumber(let id):
    // Do something with id_number if it was explicitly set
  case .NOT_SET:
    // Do something if it’s not set
}

// Usage 2: Direct access for default value fallback
// Sets the label text to the name if it was explicitly set, or to
// "unnamed" (the default value for the field) if id_number was set
// instead
let myLabel = UILabel()
myLabel.text = message.name
```

As with proto enums, the generated `oneof` enum conforms to
`NilLiteralConvertible` to avoid switch statement issues. Setting the property
to nil will clear it (i.e., reset it to `NOT_SET`).
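One way the enum and the per-field properties could be kept consistent is to
treat the enum as the single source of truth and derive the field properties
from it. The sketch below assumes that design; it also assumes that assigning
nil to a field property clears the `oneof` only when that field is the one
currently set, which the text above does not specify. The
`NilLiteralConvertible` conformance is left out for brevity.

```swift
struct MyMessage {
    enum Record {
        case name(String)
        case idNumber(Int32)
        case NOT_SET
    }

    // Records which field, if any, is currently set.
    var record: Record = .NOT_SET

    var name: String! {
        get {
            if case .name(let value) = record { return value }
            return "unnamed"   // declared default for the field
        }
        set {
            if let newValue = newValue {
                record = .name(newValue)
            } else if case .name = record {
                record = .NOT_SET
            }
        }
    }

    var idNumber: Int32! {
        get {
            if case .idNumber(let value) = record { return value }
            return 0           // declared default for the field
        }
        set {
            if let newValue = newValue {
                record = .idNumber(newValue)
            } else if case .idNumber = record {
                record = .NOT_SET
            }
        }
    }
}

var message = MyMessage()
message.idNumber = 42   // record is now .idNumber(42); name reads as "unnamed"
```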
## Unknown Fields (proto2 only)

To be written.

## Extensions (proto2 only)

To be written.

## Reflection and Descriptors

We will not include reflection or descriptors in the first version of the Swift
library. The use cases for reflection on mobile are not as strong and the static
data to represent the descriptors would add bloat when we wish to keep the code
size small.

In the future, we will investigate whether they can be included as extensions
which might be able to be excluded from a build and/or automatically dead
stripped by the compiler if they are not used.

## Appendix A: Rejected strategies to handle packages

### Each package is its own Swift module

Each proto package could be declared as its own Swift module, replacing dots
with underscores (e.g., package `foo.bar` becomes module `Foo_Bar`). Then, users
would simply import the modules for whatever proto packages they want to use
and refer to the generated types by their short names.

**This solution is simply not possible, however.** Swift modules cannot
circularly reference each other, but there is no restriction against proto
packages doing so. Circular imports are forbidden (e.g., `foo.proto` importing
`bar.proto` importing `foo.proto`), but nothing prevents package `foo` from
using a type in package `bar` which uses a different type in package `foo`, as
long as there is no import cycle. If these packages were generated as Swift
modules, then `Foo` would contain an `import Bar` statement and `Bar` would
contain an `import Foo` statement, and there is no way to compile this.
### Ad hoc namespacing with structs

We can “fake” namespaces in Swift by declaring empty structs with private
initializers. Since modules are constructed based on compiler arguments, not by
syntactic constructs, and because there is no pure Swift way to define
submodules (even though Clang module maps support this), there is no
source-driven way to group generated code into namespaces aside from this
approach.

Types can be added to those intermediate package structs using Swift extensions.
For example, a message `Baz` in package `foo.bar` could be represented in Swift
as follows:

```swift
public struct Foo {
  private init() {}
}

public extension Foo {
  public struct Bar {
    private init() {}
  }
}

public extension Foo.Bar {
  public struct Baz {
    // Message fields and other methods
  }
}

let baz = Foo.Bar.Baz()
```

Each of these constructs would actually be defined in a separate file; Swift
lets us keep them separate and add multiple structs to a single “namespace”
through extensions.

Unfortunately, these intermediate structs generate symbols of their own
(metatype information in the data segment). This becomes problematic if multiple
build targets contain Swift sources generated from different messages in the
same package. At link time, these symbols would collide, resulting in multiple
definition errors.

This approach also has the disadvantage that there is no automatic “short” way
to refer to the generated messages at the deepest nesting levels; since this use
of structs is a hack around the lack of namespaces, there is no equivalent to
`import` (Java) or `using` (C++) to simplify this. Users would have to declare
type aliases to make this cleaner, or we would have to generate them for users.