# Protocol Buffers in Swift

## Objective

This document describes the user-facing API and internal implementation of
proto2 and proto3 messages in Apple’s Swift programming language.

One of the key goals of protobufs is to provide idiomatic APIs for each
language. In that vein, **interoperability with Objective-C is a non-goal of
this proposal.** Protobuf users who need to pass messages between Objective-C
and Swift code in the same application should use the existing Objective-C proto
library. The goal of the effort described here is to provide an API for protobuf
messages that uses features specific to Swift (optional types, algebraic
enumerated types, value types, and so forth) in a natural way that will delight,
rather than surprise, users of the language.

## Naming

* By convention, both typical protobuf message names and Swift structs/classes
  are `UpperCamelCase`, so for most messages, the name of a message can be the
  same as the name of its generated type. (However, see the discussion below
  about prefixes under [Packages](#packages).)

* Enum cases in protobufs typically are `UPPERCASE_WITH_UNDERSCORES`, whereas
  in Swift they are `lowerCamelCase` (as of the Swift 3 API design
  guidelines). We will transform the names to match Swift convention, using
  a whitelist similar to the Objective-C compiler plugin to handle commonly
  used acronyms.

* Typical fields in proto messages are `lowercase_with_underscores`, while in
  Swift they are `lowerCamelCase`. We will transform the names to match
  Swift convention by removing the underscores and uppercasing the subsequent
  letter.

## Swift reserved words

Swift has a large set of reserved words: some are always reserved and some are
contextually reserved (that is, they can be used as identifiers in contexts
where they would not be confused).
As of Swift 2.2, the set of always-reserved
words is:

```
_, #available, #column, #else, #elseif, #endif, #file, #function, #if, #line,
#selector, as, associatedtype, break, case, catch, class, continue, default,
defer, deinit, do, dynamicType, else, enum, extension, fallthrough, false, for,
func, guard, if, import, in, init, inout, internal, is, let, nil, operator,
private, protocol, public, repeat, rethrows, return, self, Self, static,
struct, subscript, super, switch, throw, throws, true, try, typealias, var,
where, while
```

The set of contextually reserved words is:

```
associativity, convenience, dynamic, didSet, final, get, infix, indirect,
lazy, left, mutating, none, nonmutating, optional, override, postfix,
precedence, prefix, Protocol, required, right, set, Type, unowned, weak,
willSet
```

It is possible to use any reserved word as an identifier by escaping it with
backticks (for example, ``let `class` = 5``). Other name-mangling schemes would
require us to transform the names themselves (for example, by appending an
underscore), which requires us to then ensure that the new name does not collide
with something else in the same namespace.

While the backtick feature may not be widely known by all Swift developers, a
small amount of user education can address this and it seems like the best
approach. We can unconditionally surround all property names with backticks to
simplify generation.

Some remapping will still be required, though, to avoid collisions between
generated properties and the names of methods and properties defined in the base
protocol/implementation of messages.

# Features of Protocol Buffers

This section describes how the features of the protocol buffer syntaxes (proto2
and proto3) map to features in Swift: what the code generated from a proto will
look like, and how it will be implemented in the underlying library.

## Packages

Modules are the main form of namespacing in Swift, but they are not declared
using syntactic constructs like namespaces in C++ or packages in Java. Instead,
they are tied to build targets in Xcode (or, in the future with open-source
Swift, declarations in a Swift Package Manager manifest). They also do not
easily support nesting submodules (Clang module maps support this, but pure
Swift does not yet provide a way to define submodules).

We will generate types with fully-qualified underscore-delimited names. For
example, a message `Baz` in package `foo.bar` would generate a struct named
`Foo_Bar_Baz`. For each fully-qualified proto message, there will be exactly one
unique type symbol emitted in the generated binary.

Users are likely to balk at the ugliness of underscore-delimited names for every
generated type. To improve upon this situation, we will add a new string-valued
file-level option, `swift_package_typealias`, that can be added to `.proto`
files. When present, this will cause `typealias`es to be added to the generated
Swift messages that replace the package name prefix with the provided string.
For example, the following `.proto` file:

```protobuf
option swift_package_typealias = "FBP";
package foo.bar;

message Baz {
  // Message fields
}
```

would generate the following Swift source:

```swift
public struct Foo_Bar_Baz {
  // Message fields and other methods
}

// Declared public so that code importing the module can use the alias.
public typealias FBPBaz = Foo_Bar_Baz
```

It should be noted that this type alias is recorded in the generated
`.swiftmodule` so that code importing the module can refer to it, but it does
not cause a new symbol to be generated in the compiled binary (i.e., we do not
risk compiled size bloat by adding `typealias`es for every type).
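
As a rough illustration of the naming rule above, the following sketch derives
the underscore-delimited type name from a package and a message name, and then
checks that an alias introduces no new type. The helper
`qualifiedTypeName(package:message:)` is hypothetical (this document does not
specify the generator's internals), and the code assumes a current Swift
toolchain rather than Swift 2.2.

```swift
// Hypothetical helper, not part of any real generator API: builds the
// fully-qualified, underscore-delimited Swift type name for a proto message.
func qualifiedTypeName(package: String, message: String) -> String {
    // Uppercase the first letter of each package component ("foo" -> "Foo").
    let components = package.split(separator: ".").map { part -> String in
        String(part.prefix(1)).uppercased() + String(part.dropFirst())
    }
    return (components + [message]).joined(separator: "_")
}

// Stand-ins for the generated declarations shown above.
public struct Foo_Bar_Baz {
    public init() {}
}
public typealias FBPBaz = Foo_Bar_Baz

let generatedName = qualifiedTypeName(package: "foo.bar", message: "Baz")

// The alias and the original name refer to the same type.
let sameType = (FBPBaz.self == Foo_Bar_Baz.self)
```

Because the alias is only another name for `Foo_Bar_Baz`, values can be passed
freely between code that uses either spelling.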

Other strategies to handle packages that were considered and rejected can be
found in [Appendix A](#appendix-a-rejected-strategies-to-handle-packages).

## Messages

Proto messages are natural value types and we will generate messages as structs
instead of classes. Users will benefit from Swift’s built-in behavior with
regard to mutability. We will define a `ProtoMessage` protocol that defines the
common methods and properties for all messages (such as serialization) and also
lets users treat messages polymorphically. Any shared method implementations
that do not differ between individual messages can be implemented in a protocol
extension.

The backing storage itself for fields of a message will be managed by a
`ProtoFieldStorage` type that uses an internal dictionary keyed by field number,
and whose values are the value of the field with that number (up-cast to Swift’s
`Any` type). This class will provide type-safe getters and setters so that
generated messages can manipulate this storage, and core serialization logic
will live here as well. Furthermore, factoring the storage out into a separate
type, rather than inlining the fields as stored properties in the message
itself, lets us implement copy-on-write efficiently to support passing around
large messages. (Additionally, because the messages themselves are value types,
inlining fields is not possible if the fields are submessages of the same type,
or a type that eventually includes a submessage of the same type.)

### Required fields (proto2 only)

Required fields in proto2 messages seem like they could be naturally represented
by non-optional properties in Swift, but this presents some problems/concerns.

Serialization APIs permit partial serialization, which allows required fields to
remain unset.
Furthermore, other language APIs still provide `has*` and `clear*`
methods for required fields, and knowing whether a property has a value when the
message is in memory is still useful.

For example, an e-mail draft message may have the “to” address required on the
wire, but when the user constructs it in memory, it doesn’t make sense to force
a value until they provide one. We only want to force a value to be present when
the message is serialized to the wire. Using non-optional properties prevents
this use case, and makes client usage awkward because the user would be forced
to select a sentinel or placeholder value for any required fields at the time
the message was created.

### Default values

In proto2, fields can have a default value specified that may be a value other
than the default value for its corresponding language type (for example, a
default value of 5 instead of 0 for an integer). When reading a field that is
not explicitly set, the user expects to get that value. This makes Swift
optionals (i.e., `Foo?`) unsuitable for fields in general. Unfortunately, we
cannot implement our own “enhanced optional” type without severely complicating
usage (Swift’s use of type inference and its lack of implicit conversions would
require manual unwrapping of every property value).

Instead, we can use **implicitly unwrapped optionals.** For example, a property
generated for a field of type `int32` would have Swift type `Int32!`. These
properties would behave with the following characteristics, which mirror the
nil-resettable properties used elsewhere in Apple’s SDKs (for example,
`UIView.tintColor`):

* Assigning a non-nil value to a property sets the field to that value.
* Assigning nil to a property clears the field (its internal representation is
  nilled out).
* Reading the value of a property returns its value if it is set, or returns
  its default value if it is not set. Reading a property never returns nil.

The final point in the list above implies that the optional cannot be checked to
determine if the field is set to a value other than its default: it will never
be nil. Instead, we must provide `has*` methods for each field to allow the user
to check this. These methods will be public in proto2. In proto3, these methods
will be private (if generated at all), since the user can test the returned
value against the zero value for that type.

### Autocreation of nested messages

For convenience, dotting into an unset field representing a nested message will
return an instance of that message with default values. As in the Objective-C
implementation, this does not actually cause the field to be set until the
returned message is mutated. Fortunately, thanks to the way mutability of value
types is implemented in Swift, the language automatically handles the
reassignment-on-mutation for us. A static singleton instance containing default
values can be associated with each message that can be returned when reading, so
copies are only made by the Swift runtime when mutation occurs. For example,
given the following proto:

```protobuf
message Node {
  Node child = 1;
  string value = 2 [default = "foo"];
}
```

The following Swift code would act as commented, where setting deeply nested
properties causes the copies and mutations to occur as the assignment statement
is unwound:

```swift
var node = Node()

let s = node.child.child.value
// 1. node.child returns the "default Node".
// 2. Reading .child on the result of (1) returns the same default Node.
// 3. Reading .value on the result of (2) returns the default value "foo".

node.child.child.value = "bar"
// 4. Setting .value on the default Node causes a copy to be made and sets
//    the property on that copy. Subsequently, the language updates the
//    value of "node.child.child" to point to that copy.
// 5. Updating "node.child.child" in (4) requires another copy, because
//    "node.child" was also the instance of the default node. The copy is
//    assigned back to "node.child".
// 6. Setting "node.child" in (5) is a simple value reassignment, since
//    "node" is a mutable var.
```

In other words, the generated messages do not internally have to manage parental
relationships to backfill the appropriate properties on mutation. Swift provides
this for free.

## Scalar value fields

Proto scalar value fields will map to Swift types in the following way:

.proto Type | Swift Type
----------- | -------------------
`double`    | `Double`
`float`     | `Float`
`int32`     | `Int32`
`int64`     | `Int64`
`uint32`    | `UInt32`
`uint64`    | `UInt64`
`sint32`    | `Int32`
`sint64`    | `Int64`
`fixed32`   | `UInt32`
`fixed64`   | `UInt64`
`sfixed32`  | `Int32`
`sfixed64`  | `Int64`
`bool`      | `Bool`
`string`    | `String`
`bytes`     | `Foundation.NSData`

The proto spec defines a number of integral types that map to the same Swift
type; for example, `intXX`, `sintXX`, and `sfixedXX` are all signed integers,
and `uintXX` and `fixedXX` are both unsigned integers. No other language
implementation distinguishes these further, so we do not do so either. The
rationale is that the various types only serve to distinguish how the value is
**encoded on the wire**; once loaded in memory, the user is not concerned about
these variations.
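
To illustrate why the in-memory type can be shared, the sketch below measures
how the same `Int32` value would be encoded under the `int32` and `sint32` wire
rules. The function names are illustrative and not part of the proposed
library; the ZigZag and varint rules are those of the protobuf wire format, and
the code assumes a current Swift toolchain.

```swift
// ZigZag maps signed values to unsigned ones so that numbers with a small
// magnitude stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
func zigZagEncode(_ value: Int32) -> UInt32 {
    return UInt32(bitPattern: (value << 1) ^ (value >> 31))
}

// Number of bytes a base-128 varint needs for the given value
// (7 payload bits per byte).
func varintLength(_ value: UInt64) -> Int {
    var remaining = value
    var length = 1
    while remaining >= 0x80 {
        remaining >>= 7
        length += 1
    }
    return length
}

let value: Int32 = -1

// int32: a negative value is sign-extended to 64 bits before varint encoding.
let bytesAsInt32 = varintLength(UInt64(bitPattern: Int64(value)))

// sint32: the value is ZigZag-encoded first, then written as a varint.
let bytesAsSInt32 = varintLength(UInt64(zigZagEncode(value)))
```

In both cases the application sees the same `Int32`; only the number of bytes
on the wire differs (ten bytes versus one for `-1`).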

Swift’s lack of implicit conversions among types will make it slightly annoying
to use these types in a context expecting an `Int`, or vice-versa, but since
this is a data-interchange format with explicitly-sized fields, we should not
hide that information from the user. Users will have to explicitly write
`Int(message.myField)`, for example.

## Embedded message fields

Embedded message fields can be represented using an optional variable of the
generated message type. Thus, the message

```protobuf
message Foo {
  Bar bar = 1;
}
```

would be represented in Swift as

```swift
public struct Foo: ProtoMessage {
  public var bar: Bar! {
    get { ... }
    set { ... }
  }
}
```

If the user explicitly sets `bar` to nil, or if it was never set when read from
the wire, retrieving the value of `bar` would return a default, statically
allocated instance of `Bar` containing default values for its fields. This
achieves the desired behavior for default values in the same way that scalar
fields are designed, and also allows users to deep-drill into complex object
graphs to get or set fields without checking for nil at each step.

## Enum fields

The design and implementation of enum fields will differ somewhat drastically
depending on whether the message being generated is a proto2 or proto3 message.

### proto2 enums

For proto2, we do not need to be concerned about unknown enum values, so we can
use the simple raw-value enum syntax provided by Swift. So the following enum in
proto2:

```protobuf
enum ContentType {
  TEXT = 0;
  IMAGE = 1;
}
```

would become this Swift enum:

```swift
public enum ContentType: Int32, NilLiteralConvertible {
  case text = 0
  case image = 1

  public init(nilLiteral: ()) {
    self = .text
  }
}
```

See below for the discussion about `NilLiteralConvertible`.

### proto3 enums

For proto3, we need to be able to preserve unknown enum values that may come
across the wire so that they can be written back if unmodified. We can
accomplish this in Swift by using a case with an associated value for unknowns.
So the following enum in proto3:

```protobuf
enum ContentType {
  TEXT = 0;
  IMAGE = 1;
}
```

would become this Swift enum:

```swift
public enum ContentType: RawRepresentable, NilLiteralConvertible {
  case text
  case image
  case UNKNOWN_VALUE(Int32)

  public typealias RawValue = Int32

  public init(nilLiteral: ()) {
    self = .text
  }

  public init(rawValue: RawValue) {
    switch rawValue {
      case 0: self = .text
      case 1: self = .image
      default: self = .UNKNOWN_VALUE(rawValue)
    }
  }

  public var rawValue: RawValue {
    switch self {
      case .text: return 0
      case .image: return 1
      case .UNKNOWN_VALUE(let value): return value
    }
  }
}
```

Note that the use of a parameterized case prevents us from inheriting from the
raw `Int32` type; Swift does not allow an enum with a raw type to have cases
with arguments. Instead, we must implement the raw value initializer and
computed property manually. The `UNKNOWN_VALUE` case is explicitly chosen to be
"ugly" so that it stands out and does not conflict with other possible case
names.
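
The round-trip behavior can be sketched as follows. This is a trimmed-down copy
of the enum above (the nil-literal conformance is omitted for brevity), written
for a current Swift toolchain; the raw value `7` is an arbitrary stand-in for a
value that a newer version of the schema might define.

```swift
enum ContentType: RawRepresentable {
    case text
    case image
    case UNKNOWN_VALUE(Int32)

    typealias RawValue = Int32

    // Non-failable: every raw value maps to some case.
    init(rawValue: Int32) {
        switch rawValue {
        case 0: self = .text
        case 1: self = .image
        default: self = .UNKNOWN_VALUE(rawValue)
        }
    }

    var rawValue: Int32 {
        switch self {
        case .text: return 0
        case .image: return 1
        case .UNKNOWN_VALUE(let value): return value
        }
    }
}

// Simulate decoding a value that this version of the schema does not know.
let decoded = ContentType(rawValue: 7)

// Re-encoding yields the original number, so nothing is lost on write-back.
let reencoded = decoded.rawValue

// Consumers must handle the unknown case (or provide a default).
var label = ""
switch decoded {
case .text: label = "text"
case .image: label = "image"
case .UNKNOWN_VALUE(let raw): label = "unknown(\(raw))"
}
```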

Using this approach, proto3 consumers must always have a default case or handle
the `.UNKNOWN_VALUE` case to satisfy case exhaustion in a switch statement; the
Swift compiler considers it an error if switch statements are not exhaustive.

### NilLiteralConvertible conformance

This conformance is required to clean up the usage of enum-typed properties in
switch statements. Unlike other field types, enum properties cannot be
implicitly-unwrapped optionals without requiring that uses in switch statements
be explicitly unwrapped. For example, if we consider a message with the enum
above, this usage will fail to compile:

```swift
// Without NilLiteralConvertible conformance on ContentType
public struct SomeMessage: ProtoMessage {
  public var contentType: ContentType! { ... }
}

// ERROR: no case named text or image
switch someMessage.contentType {
  case .text: { ... }
  case .image: { ... }
}
```

Even though our implementation guarantees that `contentType` will never be nil,
if it is an optional type, its cases would be `some` and `none`, not the cases
of the underlying enum type. In order to use it in this context, the user must
write `someMessage.contentType!` in their switch statement.

Making the enum itself `NilLiteralConvertible` permits us to make the property
non-optional, so the user can still set it to nil to clear it (i.e., reset it to
its default value), while eliminating the need to explicitly unwrap it in a
switch statement.

```swift
// With NilLiteralConvertible conformance on ContentType
public struct SomeMessage: ProtoMessage {
  // Note that the property type is no longer optional
  public var contentType: ContentType { ... }
}

// OK: Compiles and runs as expected
switch someMessage.contentType {
  case .text: { ... }
  case .image: { ... }
}

// The enum can be reset to its default value this way
someMessage.contentType = nil
```

One minor oddity with this approach is that nil will be auto-converted to the
default value of the enum in any context, not just field assignment. In other
words, this is valid:

```swift
func foo(contentType: ContentType) { ... }
foo(nil) // Inside foo, contentType == .text
```

That being said, the advantage of being able to simultaneously support
nil-resettability and switch-without-unwrapping outweighs this side effect,
especially if appropriately documented. It is our hope that a new form of
resettable properties will be added to Swift that eliminates this inconsistency.
Some community members have already drafted or sent proposals for review that
would benefit our designs:

* [SE-0030: Property Behaviors](https://github.com/apple/swift-evolution/blob/master/proposals/0030-property-behavior-decls.md)
* [Drafted: Resettable Properties](https://github.com/patters/swift-evolution/blob/master/proposals/0000-resettable-properties.md)

### Enum aliases

The `allow_alias` option in protobuf slightly complicates the use of Swift enums
to represent that type, because raw values of cases in an enum must be unique.
Swift lets us define static variables in an enum that alias actual cases. For
example, the following protobuf enum:

```protobuf
enum Foo {
  option allow_alias = true;
  BAR = 0;
  BAZ = 0;
}
```

will be represented in Swift as:

```swift
public enum Foo: Int32, NilLiteralConvertible {
  case bar = 0
  static public let baz = bar

  // ... etc.
}

// Can still use .baz shorthand to reference the alias in contexts
// where the type is inferred
```

That is, we use the first name as the actual case and use static variables for
the other aliases.
One drawback to this approach is that the static aliases
cannot be used as cases in a switch statement (the compiler emits the error
*“Enum case ‘baz’ not found in type ‘Foo’”*). However, in our own code bases,
enum aliases that are not mere renamings of an older value are rare, and those
few do not appear to be the kind of value that one would expect to switch on
(for example, a group of named constants representing metrics rather than a set
of options), so this restriction is not significant.

This strategy also implies that changing the name of an enum value and adding
the old name as an alias below the new name will be a breaking change in the
generated Swift code.

## Oneof types

The `oneof` feature represents a “variant/union” data type that maps nicely to
Swift enums with associated values (algebraic types). These fields can also be
accessed independently though, and, specifically in the case of proto2, it’s
reasonable to expect access to default values when accessing a field that is not
explicitly set.

Taking all this into account, we can represent a `oneof` in Swift with two sets
of constructs:

* Properties in the message that correspond to the `oneof` fields.
* A nested enum named after the `oneof` and which provides the corresponding
  field values as case arguments.

This approach fulfills the needs of proto consumers by providing a
Swift-idiomatic way of simultaneously checking which field is set and accessing
its value, providing individual properties to access the default values
(important for proto2), and safely allowing a field to be moved into a `oneof`
without breaking clients.

Consider the following proto:

```protobuf
message MyMessage {
  oneof record {
    string name = 1 [default = "unnamed"];
    int32 id_number = 2 [default = 0];
  }
}
```

In Swift, we would generate an enum, a property for that enum, and properties
for the fields themselves:

```swift
public struct MyMessage: ProtoMessage {
  public enum Record: NilLiteralConvertible {
    case name(String)
    case idNumber(Int32)
    case NOT_SET

    public init(nilLiteral: ()) { self = .NOT_SET }
  }

  // This is the "Swifty" way of accessing the value
  public var record: Record { ... }

  // Direct access to the underlying fields
  public var name: String! { ... }
  public var idNumber: Int32! { ... }
}
```

This makes both usage patterns possible:

```swift
// Usage 1: Case-based dispatch
switch message.record {
  case .name(let name):
    // Do something with name if it was explicitly set
  case .idNumber(let id):
    // Do something with id_number if it was explicitly set
  case .NOT_SET:
    // Do something if it’s not set
}

// Usage 2: Direct access for default value fallback
// Sets the label text to the name if it was explicitly set, or to
// "unnamed" (the default value for the field) if id_number was set
// instead
let myLabel = UILabel()
myLabel.text = message.name
```

As with proto enums, the generated `oneof` enum conforms to
`NilLiteralConvertible` to avoid switch statement issues. Setting the property
to nil will clear it (i.e., reset it to `NOT_SET`).

## Unknown Fields (proto2 only)

To be written.

## Extensions (proto2 only)

To be written.

## Reflection and Descriptors

We will not include reflection or descriptors in the first version of the Swift
library.
The use cases for reflection on mobile are not as strong, and the static
data needed to represent the descriptors would add bloat when we wish to keep
the code size small.

In the future, we will investigate whether they can be included as extensions
which might be able to be excluded from a build and/or automatically dead
stripped by the compiler if they are not used.

## Appendix A: Rejected strategies to handle packages

### Each package is its own Swift module

Each proto package could be declared as its own Swift module, replacing dots
with underscores (e.g., package `foo.bar` becomes module `Foo_Bar`). Then, users
would simply import modules containing whatever proto packages they want to use
and refer to the generated types by their short names.

**This solution is simply not possible, however.** Swift modules cannot
circularly reference each other, but there is no restriction against proto
packages doing so. Circular imports are forbidden (e.g., `foo.proto` importing
`bar.proto` importing `foo.proto`), but nothing prevents package `foo` from
using a type in package `bar` which uses a different type in package `foo`, as
long as there is no import cycle. If these packages were generated as Swift
modules, then `Foo` would contain an `import Bar` statement and `Bar` would
contain an `import Foo` statement, and there is no way to compile this.

### Ad hoc namespacing with structs

We can “fake” namespaces in Swift by declaring empty structs with private
initializers. Since modules are constructed based on compiler arguments, not by
syntactic constructs, and because there is no pure Swift way to define
submodules (even though Clang module maps support this), there is no
source-driven way to group generated code into namespaces aside from this
approach.

Types can be added to those intermediate package structs using Swift extensions.
For example, a message `Baz` in package `foo.bar` could be represented in Swift
as follows:

```swift
public struct Foo {
  private init() {}
}

public extension Foo {
  public struct Bar {
    private init() {}
  }
}

public extension Foo.Bar {
  public struct Baz {
    // Message fields and other methods
  }
}

let baz = Foo.Bar.Baz()
```

Each of these constructs would actually be defined in a separate file; Swift
lets us keep them separate and add multiple structs to a single “namespace”
through extensions.

Unfortunately, these intermediate structs generate symbols of their own
(metatype information in the data segment). This becomes problematic if multiple
build targets contain Swift sources generated from different messages in the
same package. At link time, these symbols would collide, resulting in
multiple-definition errors.

This approach also has the disadvantage that there is no automatic “short” way
to refer to the generated messages at the deepest nesting levels; since this use
of structs is a hack around the lack of namespaces, there is no equivalent to
`import` (Java) or `using` (C++) to simplify this. Users would have to declare
type aliases to make this cleaner, or we would have to generate them for users.
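
For illustration, the manual workaround mentioned above might look like the
following sketch. The alias name `Baz` and the `value` field are chosen only
for this example, and the namespace structs are repeated so that the snippet is
self-contained.

```swift
// Empty structs acting as namespaces; the private initializers prevent
// anyone from instantiating them.
struct Foo {
    private init() {}
}

extension Foo {
    struct Bar {
        private init() {}
    }
}

extension Foo.Bar {
    struct Baz {
        var value = ""   // placeholder field for the example
    }
}

// Each client (or the generator) would have to declare an alias like this
// by hand to get a short name.
typealias Baz = Foo.Bar.Baz

var baz = Baz()
baz.value = "hello"
```

The alias only shortens the spelling; it does nothing about the symbol
collisions described above, which is why this strategy was rejected.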