# Tutorial

This tutorial introduces the basics of the Document Object Model (DOM) API.

As shown in [Usage at a glance](@ref index), JSON can be parsed into a DOM, which can then be queried and modified easily, and finally converted back to JSON.

[TOC]

# Value & Document {#ValueDocument}

Each JSON value is stored in a type called `Value`. A `Document`, representing the DOM, contains the root `Value` of the DOM tree. All public types and functions of RapidJSON are defined in the `rapidjson` namespace.

# Query Value {#QueryValue}

In this section, we will use excerpts from `example/tutorial/tutorial.cpp`.

Assume we have JSON stored in a C string (`const char* json`):
~~~~~~~~~~js
{
    "hello": "world",
    "t": true ,
    "f": false,
    "n": null,
    "i": 123,
    "pi": 3.1416,
    "a": [1, 2, 3, 4]
}
~~~~~~~~~~

Parse it into a `Document`:
~~~~~~~~~~cpp
#include "rapidjson/document.h"

using namespace rapidjson;

// ...
Document document;
document.Parse(json);
~~~~~~~~~~

The JSON is now parsed into `document` as a *DOM tree*:

![DOM in the tutorial](diagram/tutorial.png)

Since the update to RFC 7159, the root of a conforming JSON document can be any JSON value. In the earlier RFC 4627, only objects or arrays were allowed as root values. In this case, the root is an object.
~~~~~~~~~~cpp
assert(document.IsObject());
~~~~~~~~~~

Let's query whether a `"hello"` member exists in the root object. Since a `Value` can contain different types of value, we may need to verify its type and use a suitable API to obtain the value. In this example, the `"hello"` member is associated with a JSON string.
~~~~~~~~~~cpp
assert(document.HasMember("hello"));
assert(document["hello"].IsString());
printf("hello = %s\n", document["hello"].GetString());
~~~~~~~~~~

~~~~~~~~~~
hello = world
~~~~~~~~~~

JSON true/false values are represented as `bool`.
~~~~~~~~~~cpp
assert(document["t"].IsBool());
printf("t = %s\n", document["t"].GetBool() ? "true" : "false");
~~~~~~~~~~

~~~~~~~~~~
t = true
~~~~~~~~~~

JSON null can be queried with `IsNull()`.
~~~~~~~~~~cpp
printf("n = %s\n", document["n"].IsNull() ? "null" : "?");
~~~~~~~~~~

~~~~~~~~~~
n = null
~~~~~~~~~~

The JSON number type represents all numeric values. However, C++ needs more specific types for manipulation.

~~~~~~~~~~cpp
assert(document["i"].IsNumber());

// In this case, IsUint()/IsInt64()/IsUint64() also return true.
assert(document["i"].IsInt());
printf("i = %d\n", document["i"].GetInt());
// Alternative (int)document["i"]

assert(document["pi"].IsNumber());
assert(document["pi"].IsDouble());
printf("pi = %g\n", document["pi"].GetDouble());
~~~~~~~~~~

~~~~~~~~~~
i = 123
pi = 3.1416
~~~~~~~~~~

A JSON array contains a number of elements.
~~~~~~~~~~cpp
// Using a reference for consecutive access is handy and faster.
const Value& a = document["a"];
assert(a.IsArray());
for (SizeType i = 0; i < a.Size(); i++) // Uses SizeType instead of size_t
    printf("a[%d] = %d\n", i, a[i].GetInt());
~~~~~~~~~~

~~~~~~~~~~
a[0] = 1
a[1] = 2
a[2] = 3
a[3] = 4
~~~~~~~~~~

Note that RapidJSON does not automatically convert values between JSON types. For example, if a value is a string, it is invalid to call `GetInt()`. In debug mode it will fail an assertion. In release mode, the behavior is undefined.

In the following, details about querying individual types are discussed.

## Query Array {#QueryArray}

By default, `SizeType` is a typedef of `unsigned`. On most systems, an array is therefore limited to 2^32-1 elements.

You may access the elements of an array by integer literal, for example, `a[0]`, `a[1]`, `a[2]`.

An array is similar to `std::vector`: instead of using indices, you may also use iterators to access all the elements.
~~~~~~~~~~cpp
for (Value::ConstValueIterator itr = a.Begin(); itr != a.End(); ++itr)
    printf("%d ", itr->GetInt());
~~~~~~~~~~

And other familiar query functions:
* `SizeType Capacity() const`
* `bool Empty() const`

## Query Object {#QueryObject}

Similar to arrays, we can access all object members by iterator:

~~~~~~~~~~cpp
static const char* kTypeNames[] =
    { "Null", "False", "True", "Object", "Array", "String", "Number" };

for (Value::ConstMemberIterator itr = document.MemberBegin();
    itr != document.MemberEnd(); ++itr)
{
    printf("Type of member %s is %s\n",
        itr->name.GetString(), kTypeNames[itr->value.GetType()]);
}
~~~~~~~~~~

~~~~~~~~~~
Type of member hello is String
Type of member t is True
Type of member f is False
Type of member n is Null
Type of member i is Number
Type of member pi is Number
Type of member a is Array
~~~~~~~~~~

Note that when `operator[](const char*)` cannot find the member, it will fail an assertion.

If we are unsure whether a member exists, we need to call `HasMember()` before calling `operator[](const char*)`. However, this incurs two lookups. A better way is to call `FindMember()`, which checks for the member's existence and obtains its value in a single lookup:

~~~~~~~~~~cpp
Value::ConstMemberIterator itr = document.FindMember("hello");
if (itr != document.MemberEnd())
    printf("%s\n", itr->value.GetString());
~~~~~~~~~~

## Querying Number {#QueryNumber}

JSON provides a single numerical type called Number. A Number can be an integer or a real number. RFC 4627 says the range of Number is specified by the parser.

As C++ provides several integer and floating point number types, the DOM tries to handle these with the widest possible range and good performance.

When a Number is parsed, it is stored in the DOM as one of the following types:

Type       | Description
-----------|---------------------------------------
`unsigned` | 32-bit unsigned integer
`int`      | 32-bit signed integer
`uint64_t` | 64-bit unsigned integer
`int64_t`  | 64-bit signed integer
`double`   | 64-bit double precision floating point

When querying a number, you can check whether the number can be obtained as the target type:

Checking          | Obtaining
------------------|---------------------
`bool IsNumber()` | N/A
`bool IsUint()`   | `unsigned GetUint()`
`bool IsInt()`    | `int GetInt()`
`bool IsUint64()` | `uint64_t GetUint64()`
`bool IsInt64()`  | `int64_t GetInt64()`
`bool IsDouble()` | `double GetDouble()`

Note that an integer value may be obtained in various ways without conversion. For example, a value `x` containing 123 will make `x.IsInt() == x.IsUint() == x.IsInt64() == x.IsUint64() == true`. But a value `y` containing -3000000000 will only make `y.IsInt64() == true`.

When obtaining numeric values, `GetDouble()` will convert the internal integer representation to a `double`. Note that `int` and `unsigned` can be safely converted to `double`, but `int64_t` and `uint64_t` may lose precision (since the stored mantissa of `double` is only 52 bits).

## Query String {#QueryString}

In addition to `GetString()`, the `Value` class also contains `GetStringLength()`. Here is why.

According to RFC 4627, JSON strings can contain the Unicode character `U+0000`, which must be escaped as `"\u0000"`. The problem is that C/C++ often uses null-terminated strings, which treat `'\0'` as the terminator symbol.

To conform to RFC 4627, RapidJSON supports strings containing `U+0000`. If you need to handle this, you can use the `GetStringLength()` API to obtain the correct length of the string.

For example, after parsing the following JSON to `Document d`:

~~~~~~~~~~js
{ "s" :  "a\u0000b" }
~~~~~~~~~~
The correct length of the value `"a\u0000b"` is 3. But `strlen()` returns 1.

`GetStringLength()` can also improve performance, as the user would otherwise often need to call `strlen()` before allocating a buffer.

Besides, `std::string` also supports a constructor:

~~~~~~~~~~cpp
string(const char* s, size_t count);
~~~~~~~~~~

which accepts the length of the string as a parameter. This constructor supports storing null characters within the string, and should also provide better performance.

## Comparing values

You can use `==` and `!=` to compare values. Two values are equal if and only if they have the same type and contents. You can also compare values with primitive types. Here is an example.

~~~~~~~~~~cpp
if (document["hello"] == document["n"]) /*...*/;    // Compare values
if (document["hello"] == "world") /*...*/;          // Compare value with literal string
if (document["i"] != 123) /*...*/;                  // Compare with integers
if (document["pi"] != 3.14) /*...*/;                // Compare with double.
~~~~~~~~~~

Arrays and objects compare their elements/members in order. They are equal if and only if their whole subtrees are equal.

Note that currently, if an object contains members with duplicate names, comparing it for equality with any object always yields `false`.

# Create/Modify Values {#CreateModifyValues}

There are several ways to create values. After a DOM tree is created and/or modified, it can be saved as JSON again using `Writer`.

## Change Value Type {#ChangeValueType}
When creating a Value or Document by default constructor, its type is Null.
To change its type, call `SetXXX()` or use the assignment operator, for example:

~~~~~~~~~~cpp
Document d; // Null
d.SetObject();

Value v;    // Null
v.SetInt(10);
v = 10;     // Shortcut, same as above
~~~~~~~~~~

### Overloaded Constructors
There are also overloaded constructors for several types:

~~~~~~~~~~cpp
Value b(true);    // calls Value(bool)
Value i(-123);    // calls Value(int)
Value u(123u);    // calls Value(unsigned)
Value d(1.5);     // calls Value(double)
~~~~~~~~~~

To create an empty object or array, you may use `SetObject()`/`SetArray()` after the default constructor, or use `Value(Type)` in one shot:

~~~~~~~~~~cpp
Value o(kObjectType);
Value a(kArrayType);
~~~~~~~~~~

## Move Semantics {#MoveSemantics}

A very special decision in the design of RapidJSON is that assignment of a value does not copy the source value to the destination value. Instead, the value from the source is moved to the destination. For example,

~~~~~~~~~~cpp
Value a(123);
Value b(456);
b = a;         // a becomes a Null value, b becomes number 123.
~~~~~~~~~~

![Assignment with move semantics.](diagram/move1.png)

Why? What is the advantage of this semantics?

The simple answer is performance. For fixed size JSON types (Number, True, False, Null), copying them is fast and easy. However, for variable size JSON types (String, Array, Object), copying them incurs a lot of overhead, and this overhead often goes unnoticed, especially when we need to create a temporary object, copy it to another variable, and then destruct it.

For example, if normal *copy* semantics were used:

~~~~~~~~~~cpp
Document d;
Value o(kObjectType);
{
    Value contacts(kArrayType);
    // adding elements to contacts array.
    // ...
    o.AddMember("contacts", contacts, d.GetAllocator());  // deep clone contacts (may be with lots of allocations)
    // destruct contacts.
}
~~~~~~~~~~

![Copy semantics makes a lot of copy operations.](diagram/move2.png)

The object `o` needs to allocate a buffer of the same size as `contacts`, make a deep clone of it, and then finally `contacts` is destructed. This incurs a lot of unnecessary allocations/deallocations and memory copying.

There are solutions that avoid actually copying the data, such as reference counting and garbage collection (GC).

To make RapidJSON simple and fast, we chose to use *move* semantics for assignment. It is similar to `std::auto_ptr`, which transfers ownership during assignment. Move is much faster and simpler: it just destructs the original value, `memcpy()`s the source to the destination, and finally sets the source to Null type.

So, with move semantics, the above example becomes:

~~~~~~~~~~cpp
Document d;
Value o(kObjectType);
{
    Value contacts(kArrayType);
    // adding elements to contacts array.
    o.AddMember("contacts", contacts, d.GetAllocator());  // just memcpy() of contacts itself to the value of new member (16 bytes)
    // contacts became Null here. Its destruction is trivial.
}
~~~~~~~~~~

![Move semantics makes no copying.](diagram/move3.png)

This is called the move assignment operator in C++11. As RapidJSON supports C++03, it implements move semantics through the assignment operator and all other modifying functions, such as `AddMember()` and `PushBack()`.

### Move semantics and temporary values {#TemporaryValues}

Sometimes, it is convenient to construct a Value in place, before passing it to one of the "moving" functions, like `PushBack()` or `AddMember()`.
As temporary objects can't be converted to proper Value references, the convenience function `Move()` is available:

~~~~~~~~~~cpp
Value a(kArrayType);
Document::AllocatorType& allocator = document.GetAllocator();
// a.PushBack(Value(42), allocator);       // will not compile
a.PushBack(Value().SetInt(42), allocator); // fluent API
a.PushBack(Value(42).Move(), allocator);   // same as above
~~~~~~~~~~

## Create String {#CreateString}
RapidJSON provides two strategies for storing strings.

1. copy-string: allocates a buffer, and then copies the source data into it.
2. const-string: simply stores a pointer to the string.

Copy-string is always safe because it owns a copy of the data. Const-string can be used for storing string literals, and for in-situ parsing, which is covered in the Document section.

To make memory allocation customizable, RapidJSON requires the user to pass an instance of an allocator whenever an operation may require allocation. This design avoids storing an allocator (or Document) pointer per Value.

Therefore, when we assign a copy-string, we call the overloaded `SetString()` that takes an allocator:

~~~~~~~~~~cpp
Document document;
Value author;
char buffer[10];
int len = sprintf(buffer, "%s %s", "Milo", "Yip"); // dynamically created string.
author.SetString(buffer, len, document.GetAllocator());
memset(buffer, 0, sizeof(buffer));
// author.GetString() still contains "Milo Yip" after buffer is overwritten
~~~~~~~~~~

In this example, we get the allocator from a `Document` instance. This is a common idiom when using RapidJSON. But you may use other allocator instances.

Besides, the above `SetString()` requires the length, so it can handle null characters within a string. There is another `SetString()` overload without the length parameter. It assumes the input is null-terminated and calls a `strlen()`-like function to obtain the length.

Finally, string literals or strings with a safe life-cycle can use the const-string version of `SetString()`, which lacks the allocator parameter. For string literals (or constant character arrays), simply passing the literal as a parameter is safe and efficient:

~~~~~~~~~~cpp
Value s;
s.SetString("rapidjson");    // can contain null character, length derived at compile time
s = "rapidjson";             // shortcut, same as above
~~~~~~~~~~

For a character pointer, RapidJSON requires you to mark it as safe before using it without copying. This can be achieved with the `StringRef` function:

~~~~~~~~~~cpp
const char * cstr = getenv("USER");
size_t cstr_len = ...;                 // in case length is available
Value s;
// s.SetString(cstr);                  // will not compile
s.SetString(StringRef(cstr));          // ok, assume safe lifetime, null-terminated
s = StringRef(cstr);                   // shortcut, same as above
s.SetString(StringRef(cstr,cstr_len)); // faster, can contain null character
s = StringRef(cstr,cstr_len);          // shortcut, same as above
~~~~~~~~~~

## Modify Array {#ModifyArray}
A Value of array type provides APIs similar to those of `std::vector`.

* `Clear()`
* `Reserve(SizeType, Allocator&)`
* `Value& PushBack(Value&, Allocator&)`
* `template <typename T> GenericValue& PushBack(T, Allocator&)`
* `Value& PopBack()`
* `ValueIterator Erase(ConstValueIterator pos)`
* `ValueIterator Erase(ConstValueIterator first, ConstValueIterator last)`

Note that `Reserve(...)` and `PushBack(...)` may allocate memory for the array elements, and therefore require an allocator.

Here is an example of `PushBack()`:

~~~~~~~~~~cpp
Value a(kArrayType);
Document::AllocatorType& allocator = document.GetAllocator();

for (int i = 5; i <= 10; i++)
    a.PushBack(i, allocator);   // allocator is needed for potential realloc().

// Fluent interface
a.PushBack("Lua", allocator).PushBack("Mio", allocator);
~~~~~~~~~~

Unlike the STL, `PushBack()`/`PopBack()` return the array reference itself. This is called a _fluent interface_.

If you want to add a non-constant string or a string without sufficient lifetime (see [Create String](#CreateString)) to the array, you need to create a string Value by using the copy-string API. To avoid the need for an intermediate variable, you can use a [temporary value](#TemporaryValues) in place:

~~~~~~~~~~cpp
// in-place Value parameter
contact.PushBack(Value("copy", document.GetAllocator()).Move(), // copy string
                 document.GetAllocator());

// explicit parameters
Value val("key", document.GetAllocator()); // copy string
contact.PushBack(val, document.GetAllocator());
~~~~~~~~~~

## Modify Object {#ModifyObject}
An object is a collection of key-value pairs (members). Each key must be a string value. To modify an object, either add or remove members. The following APIs are for adding members:

* `Value& AddMember(Value&, Value&, Allocator& allocator)`
* `Value& AddMember(StringRefType, Value&, Allocator&)`
* `template <typename T> Value& AddMember(StringRefType, T value, Allocator&)`

Here is an example.

~~~~~~~~~~cpp
Value contact(kObjectType);
contact.AddMember("name", "Milo", document.GetAllocator());
contact.AddMember("married", true, document.GetAllocator());
~~~~~~~~~~

The name parameter with `StringRefType` is similar to the interface of the `SetString` function for string values. These overloads are used to avoid the need for copying the `name` string, as constant key names are very common in JSON objects.

If you need to create a name from a non-constant string or a string without sufficient lifetime (see [Create String](#CreateString)), you need to create a string Value by using the copy-string API.
To avoid the need for an intermediate variable, you can use a [temporary value](#TemporaryValues) in place:

~~~~~~~~~~cpp
// in-place Value parameter
contact.AddMember(Value("copy", document.GetAllocator()).Move(), // copy string
                  Value().Move(),                                // null value
                  document.GetAllocator());

// explicit parameters
Value key("key", document.GetAllocator()); // copy string name
Value val(42);                             // some value
contact.AddMember(key, val, document.GetAllocator());
~~~~~~~~~~

For removing members, there are several choices:

* `bool RemoveMember(const Ch* name)`: removes a member by searching for its name (linear time complexity).
* `bool RemoveMember(const Value& name)`: same as above but `name` is a Value.
* `MemberIterator RemoveMember(MemberIterator)`: removes a member by iterator (_constant_ time complexity).
* `MemberIterator EraseMember(MemberIterator)`: similar to the above but it preserves the order of members (linear time complexity).
* `MemberIterator EraseMember(MemberIterator first, MemberIterator last)`: removes a range of members, preserves order (linear time complexity).

`MemberIterator RemoveMember(MemberIterator)` uses a "move-last" trick to achieve constant time complexity. Basically the member at the iterator is destructed, and then the last element is moved to that position. So the order of the remaining members is changed.

## Deep Copy Value {#DeepCopyValue}
If we really need to copy a DOM tree, we can use two APIs for deep copy: the constructor with allocator, and `CopyFrom()`.

~~~~~~~~~~cpp
Document d;
Document::AllocatorType& a = d.GetAllocator();
Value v1("foo");
// Value v2(v1); // not allowed

Value v2(v1, a);                      // make a copy
assert(v1.IsString());                // v1 untouched
d.SetArray().PushBack(v1, a).PushBack(v2, a);
assert(v1.IsNull() && v2.IsNull());   // both moved to d

v2.CopyFrom(d, a);                    // copy whole document to v2
assert(d.IsArray() && d.Size() == 2); // d untouched
v1.SetObject().AddMember("array", v2, a);
d.PushBack(v1, a);
~~~~~~~~~~

## Swap Values {#SwapValues}

`Swap()` is also provided.

~~~~~~~~~~cpp
Value a(123);
Value b("Hello");
a.Swap(b);
assert(a.IsString());
assert(b.IsInt());
~~~~~~~~~~

Swapping two DOM trees is fast (constant time), regardless of the complexity of the trees.

# What's next {#WhatsNext}

This tutorial shows the basics of DOM tree query and manipulation. There are several important concepts in RapidJSON:

1. [Streams](doc/stream.md) are channels for reading/writing JSON, which can be an in-memory string, a file stream, etc. Users can also create their own streams.
2. [Encoding](doc/encoding.md) defines which character encoding is used in streams and memory. RapidJSON also provides Unicode conversion/validation internally.
3. [DOM](doc/dom.md)'s basics are already covered in this tutorial. Uncover more advanced features such as *in situ* parsing, other parsing options and advanced usages.
4. [SAX](doc/sax.md) is the foundation of the parsing/generating facility in RapidJSON. Learn how to use `Reader`/`Writer` to implement even faster applications. Also try `PrettyWriter` to format the JSON.
5. [Performance](doc/performance.md) shows some in-house and third-party benchmarks.
6. [Internals](doc/internals.md) describes some internal designs and techniques of RapidJSON.

You may also refer to the [FAQ](doc/faq.md), API documentation, examples and unit tests.