# Tutorial

This tutorial introduces the basics of the Document Object Model (DOM) API.

As shown in [Usage at a glance](@ref index), a JSON text can be parsed into a DOM, the DOM can then be queried and modified easily, and finally it can be converted back to JSON.

[TOC]

# Value & Document {#ValueDocument}

Each JSON value is stored in a type called `Value`. A `Document`, representing the DOM, contains the root `Value` of the DOM tree. All public types and functions of RapidJSON are defined in the `rapidjson` namespace.

# Query Value {#QueryValue}

In this section, we will use excerpts from `example/tutorial/tutorial.cpp`.

Assume we have a JSON text stored in a C string (`const char* json`):
~~~~~~~~~~js
{
    "hello": "world",
    "t": true,
    "f": false,
    "n": null,
    "i": 123,
    "pi": 3.1416,
    "a": [1, 2, 3, 4]
}
~~~~~~~~~~

Parse it into a `Document`:
~~~~~~~~~~cpp
#include "rapidjson/document.h"

using namespace rapidjson;

// ...
Document document;
document.Parse(json);
~~~~~~~~~~

The JSON is now parsed into `document` as a *DOM tree*:

![DOM in the tutorial](diagram/tutorial.png)

Since the update to RFC 7159, the root of a conforming JSON document can be any JSON value. In the earlier RFC 4627, only objects or arrays were allowed as root values. In this case, the root is an object.
~~~~~~~~~~cpp
assert(document.IsObject());
~~~~~~~~~~

Let's query whether a `"hello"` member exists in the root object. Since a `Value` can contain different types of value, we may need to verify its type and use the suitable API to obtain the value. In this example, the `"hello"` member is associated with a JSON string.
~~~~~~~~~~cpp
assert(document.HasMember("hello"));
assert(document["hello"].IsString());
printf("hello = %s\n", document["hello"].GetString());
~~~~~~~~~~

~~~~~~~~~~
hello = world
~~~~~~~~~~

JSON true/false values are represented as `bool`.
~~~~~~~~~~cpp
assert(document["t"].IsBool());
printf("t = %s\n", document["t"].GetBool() ? "true" : "false");
~~~~~~~~~~

~~~~~~~~~~
t = true
~~~~~~~~~~

JSON null can be queried by `IsNull()`.
~~~~~~~~~~cpp
printf("n = %s\n", document["n"].IsNull() ? "null" : "?");
~~~~~~~~~~

~~~~~~~~~~
n = null
~~~~~~~~~~

The JSON Number type represents all numeric values. However, C++ needs more specific types for manipulation.

~~~~~~~~~~cpp
assert(document["i"].IsNumber());

// In this case, IsUint()/IsInt64()/IsUint64() also return true.
assert(document["i"].IsInt());
printf("i = %d\n", document["i"].GetInt());
// Alternative (int)document["i"]

assert(document["pi"].IsNumber());
assert(document["pi"].IsDouble());
printf("pi = %g\n", document["pi"].GetDouble());
~~~~~~~~~~

~~~~~~~~~~
i = 123
pi = 3.1416
~~~~~~~~~~

A JSON array contains a number of elements.
~~~~~~~~~~cpp
// Using a reference for consecutive access is handy and faster.
const Value& a = document["a"];
assert(a.IsArray());
for (SizeType i = 0; i < a.Size(); i++) // Uses SizeType instead of size_t
    printf("a[%u] = %d\n", i, a[i].GetInt()); // SizeType is unsigned, hence %u
~~~~~~~~~~

~~~~~~~~~~
a[0] = 1
a[1] = 2
a[2] = 3
a[3] = 4
~~~~~~~~~~

Note that RapidJSON does not automatically convert values between JSON types. If a value is a string, for example, it is invalid to call `GetInt()`. In debug mode it will fail an assertion. In release mode, the behavior is undefined.

In the following, details about querying individual types are discussed.

## Query Array {#QueryArray}

By default, `SizeType` is a typedef of `unsigned`. On most systems, an array is therefore limited to storing up to 2^32-1 elements.

You may access the elements of an array by integer literal, for example, `a[0]`, `a[1]`, `a[2]`.

An array is similar to `std::vector`: instead of using indices, you may also use iterators to access all the elements.
~~~~~~~~~~cpp
for (Value::ConstValueIterator itr = a.Begin(); itr != a.End(); ++itr)
    printf("%d ", itr->GetInt());
~~~~~~~~~~

And other familiar query functions:
* `SizeType Capacity() const`
* `bool Empty() const`

## Query Object {#QueryObject}

Similar to arrays, we can access all object members by iterator:

~~~~~~~~~~cpp
static const char* kTypeNames[] =
    { "Null", "False", "True", "Object", "Array", "String", "Number" };

for (Value::ConstMemberIterator itr = document.MemberBegin();
    itr != document.MemberEnd(); ++itr)
{
    printf("Type of member %s is %s\n",
        itr->name.GetString(), kTypeNames[itr->value.GetType()]);
}
~~~~~~~~~~

~~~~~~~~~~
Type of member hello is String
Type of member t is True
Type of member f is False
Type of member n is Null
Type of member i is Number
Type of member pi is Number
Type of member a is Array
~~~~~~~~~~

Note that when `operator[](const char*)` cannot find the member, it will fail an assertion.

If we are unsure whether a member exists, we need to call `HasMember()` before calling `operator[](const char*)`. However, this incurs two lookups. A better way is to call `FindMember()`, which can check the existence of a member and obtain its value at once:

~~~~~~~~~~cpp
Value::ConstMemberIterator itr = document.FindMember("hello");
if (itr != document.MemberEnd())
    printf("%s\n", itr->value.GetString());
~~~~~~~~~~

## Query Number {#QueryNumber}

JSON provides a single numerical type called Number. A Number can be an integer or a real number. RFC 4627 says the range of Number is specified by the parser.

As C++ provides several integer and floating point number types, the DOM tries to handle these with the widest possible range and good performance.

When a Number is parsed, it is stored in the DOM as one of the following types:

Type       | Description
-----------|---------------------------------------
`unsigned` | 32-bit unsigned integer
`int`      | 32-bit signed integer
`uint64_t` | 64-bit unsigned integer
`int64_t`  | 64-bit signed integer
`double`   | 64-bit double precision floating point

When querying a number, you can check whether the number can be obtained as the target type:

Checking          | Obtaining
------------------|---------------------
`bool IsNumber()` | N/A
`bool IsUint()`   | `unsigned GetUint()`
`bool IsInt()`    | `int GetInt()`
`bool IsUint64()` | `uint64_t GetUint64()`
`bool IsInt64()`  | `int64_t GetInt64()`
`bool IsDouble()` | `double GetDouble()`

Note that an integer value may be obtained in various ways without conversion. For example, a value `x` containing 123 will make `x.IsInt() == x.IsUint() == x.IsInt64() == x.IsUint64() == true`. But a value `y` containing -3000000000 will only make `y.IsInt64() == true`.

When obtaining the numeric values, `GetDouble()` will convert the internal integer representation to a `double`. Note that `int` and `unsigned` can be safely converted to `double`, but `int64_t` and `uint64_t` may lose precision (since the mantissa of `double` has only 52 bits).
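
The precision note can be checked in plain C++, without any RapidJSON call. The helper below is a hypothetical illustration (its name is an assumption); it tests whether a 64-bit integer survives a conversion to `double` and back.

```cpp
#include <cstdint>

// Hypothetical helper (not part of RapidJSON): returns true when v is
// unchanged after converting it to double and back.
// Assumes |v| is well below 2^63, so the cast back to int64_t is defined.
bool RoundTripsThroughDouble(std::int64_t v) {
    const double d = static_cast<double>(v);
    return static_cast<std::int64_t>(d) == v;
}
```

For instance, 9007199254740992 (2^53) round-trips, whereas 9007199254740993 (2^53 + 1) does not, because it has no exact `double` representation.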

## Query String {#QueryString}

In addition to `GetString()`, the `Value` class also contains `GetStringLength()`. Here is why.

According to RFC 4627, JSON strings can contain the Unicode character `U+0000`, which must be escaped as `"\u0000"`. The problem is that C/C++ often uses null-terminated strings, which treat `'\0'` as the terminator symbol.

To conform to RFC 4627, RapidJSON supports strings containing `U+0000`. If you need to handle this, you can use the `GetStringLength()` API to obtain the correct length of the string.

For example, after parsing the following JSON into `Document d`:

~~~~~~~~~~js
{ "s" :  "a\u0000b" }
~~~~~~~~~~
The correct length of the value `"a\u0000b"` is 3. But `strlen()` returns 1.

`GetStringLength()` can also improve performance, as users may often need to call `strlen()` for allocating a buffer.

Besides, `std::string` also supports a constructor:

~~~~~~~~~~cpp
string(const char* s, size_t count);
~~~~~~~~~~

which accepts the length of the string as a parameter. This constructor supports storing null characters within the string, and should also provide better performance.

## Comparing values

You can use `==` and `!=` to compare values. Two values are equal if and only if they have the same type and contents. You can also compare values with primitive types. Here is an example.

~~~~~~~~~~cpp
if (document["hello"] == document["n"]) /*...*/;    // Compare values
if (document["hello"] == "world") /*...*/;          // Compare value with literal string
if (document["i"] != 123) /*...*/;                  // Compare with integers
if (document["pi"] != 3.14) /*...*/;                // Compare with double.
~~~~~~~~~~

Arrays/objects compare their elements/members in order. They are equal if and only if their whole subtrees are equal.

Note that, currently, if an object contains members with duplicate names, comparing it for equality with any object always yields `false`.

# Create/Modify Values {#CreateModifyValues}

There are several ways to create values. After a DOM tree is created and/or modified, it can be saved as JSON again using `Writer`.

## Change Value Type {#ChangeValueType}
When a Value or Document is created by the default constructor, its type is Null. To change its type, call `SetXXX()` or use the assignment operator, for example:

~~~~~~~~~~cpp
Document d; // Null
d.SetObject();

Value v;    // Null
v.SetInt(10);
v = 10;     // Shortcut, same as above
~~~~~~~~~~

### Overloaded Constructors
There are also overloaded constructors for several types:

~~~~~~~~~~cpp
Value b(true);    // calls Value(bool)
Value i(-123);    // calls Value(int)
Value u(123u);    // calls Value(unsigned)
Value d(1.5);     // calls Value(double)
~~~~~~~~~~

To create an empty object or array, you may use `SetObject()`/`SetArray()` after the default constructor, or use the `Value(Type)` constructor in one shot:

~~~~~~~~~~cpp
Value o(kObjectType);
Value a(kArrayType);
~~~~~~~~~~

## Move Semantics {#MoveSemantics}

A very special decision in the design of RapidJSON is that assignment of a value does not copy the source value to the destination value. Instead, the value from the source is moved to the destination. For example,

~~~~~~~~~~cpp
Value a(123);
Value b(456);
b = a;         // a becomes a Null value, b becomes number 123.
~~~~~~~~~~

![Assignment with move semantics.](diagram/move1.png)

Why? What is the advantage of these semantics?

The simple answer is performance. For fixed-size JSON types (Number, True, False, Null), copying them is fast and easy. However, for variable-size JSON types (String, Array, Object), copying them incurs a lot of overhead, and this overhead often goes unnoticed, especially when we need to create a temporary object, copy it to another variable, and then destruct it.

For example, if normal *copy* semantics were used:

~~~~~~~~~~cpp
Document d;
Value o(kObjectType);
{
    Value contacts(kArrayType);
    // adding elements to contacts array.
    // ...
    o.AddMember("contacts", contacts, d.GetAllocator());  // deep clone contacts (possibly with many allocations)
    // destruct contacts.
}
~~~~~~~~~~

![Copy semantics makes a lot of copy operations.](diagram/move2.png)

The object `o` needs to allocate a buffer of the same size as `contacts`, make a deep clone of it, and then finally `contacts` is destructed. This incurs a lot of unnecessary allocations/deallocations and memory copying.

There are solutions that avoid actually copying the data, such as reference counting and garbage collection (GC).

To make RapidJSON simple and fast, we chose to use *move* semantics for assignment. It is similar to `std::auto_ptr`, which transfers ownership during assignment. Move is much faster and simpler: it just destructs the original value, `memcpy()`s the source to the destination, and finally sets the source to Null type.

So, with move semantics, the above example becomes:

~~~~~~~~~~cpp
Document d;
Value o(kObjectType);
{
    Value contacts(kArrayType);
    // adding elements to contacts array.
    o.AddMember("contacts", contacts, d.GetAllocator());  // just memcpy() of contacts itself to the value of new member (16 bytes)
    // contacts became Null here. Its destruction is trivial.
}
~~~~~~~~~~

![Move semantics makes no copying.](diagram/move3.png)

This is called the move assignment operator in C++11. As RapidJSON supports C++03, it adopts move semantics using the assignment operator and all other modifying functions such as `AddMember()` and `PushBack()`.

### Move semantics and temporary values {#TemporaryValues}

Sometimes, it is convenient to construct a Value in place, before passing it to one of the "moving" functions, like `PushBack()` or `AddMember()`. As temporary objects can't be converted to proper Value references, the convenience function `Move()` is available:

~~~~~~~~~~cpp
Value a(kArrayType);
Document::AllocatorType& allocator = document.GetAllocator();
// a.PushBack(Value(42), allocator);       // will not compile
a.PushBack(Value().SetInt(42), allocator); // fluent API
a.PushBack(Value(42).Move(), allocator);   // same as above
~~~~~~~~~~

## Create String {#CreateString}
RapidJSON provides two strategies for storing strings.

1. copy-string: allocates a buffer, and then copies the source data into it.
2. const-string: simply stores a pointer to the string.

Copy-string is always safe because it owns a copy of the data. Const-string can be used for storing a string literal, and for in-situ parsing, which will be mentioned in the Document section.

To make memory allocation customizable, RapidJSON requires the user to pass an instance of an allocator whenever an operation may require allocation. This design is needed to avoid storing an allocator (or Document) pointer per Value.

Therefore, when we assign a copy-string, we call this overloaded `SetString()` with an allocator:

~~~~~~~~~~cpp
Document document;
Value author;
char buffer[10];
int len = sprintf(buffer, "%s %s", "Milo", "Yip"); // dynamically created string.
author.SetString(buffer, static_cast<SizeType>(len), document.GetAllocator());
memset(buffer, 0, sizeof(buffer));
// author.GetString() still contains "Milo Yip" after buffer is overwritten
~~~~~~~~~~

In this example, we get the allocator from a `Document` instance. This is a common idiom when using RapidJSON. But you may use other instances of allocator.

Besides, the above `SetString()` requires the length. This makes it possible to handle null characters within a string. There is another overloaded `SetString()` without the length parameter. It assumes the input is null-terminated and calls a `strlen()`-like function to obtain the length.

Finally, for a string literal or a string with a safe life cycle, you can use the const-string version of `SetString()`, which lacks the allocator parameter. For string literals (or constant character arrays), simply passing the literal as a parameter is safe and efficient:

~~~~~~~~~~cpp
Value s;
s.SetString("rapidjson");    // can contain null character, length derived at compile time
s = "rapidjson";             // shortcut, same as above
~~~~~~~~~~

For a character pointer, RapidJSON requires marking it as safe before using it without copying. This can be achieved by using the `StringRef` function:

~~~~~~~~~~cpp
const char * cstr = getenv("USER");
size_t cstr_len = ...;                 // in case length is available
Value s;
// s.SetString(cstr);                  // will not compile
s.SetString(StringRef(cstr));          // ok, assume safe lifetime, null-terminated
s = StringRef(cstr);                   // shortcut, same as above
s.SetString(StringRef(cstr,cstr_len)); // faster, can contain null character
s = StringRef(cstr,cstr_len);          // shortcut, same as above
~~~~~~~~~~

## Modify Array {#ModifyArray}
A Value with array type provides APIs similar to those of `std::vector`.

* `Clear()`
* `Reserve(SizeType, Allocator&)`
* `Value& PushBack(Value&, Allocator&)`
* `template <typename T> GenericValue& PushBack(T, Allocator&)`
* `Value& PopBack()`
* `ValueIterator Erase(ConstValueIterator pos)`
* `ValueIterator Erase(ConstValueIterator first, ConstValueIterator last)`

Note that `Reserve(...)` and `PushBack(...)` may allocate memory for the array elements, and therefore require an allocator.

Here is an example of `PushBack()`:

~~~~~~~~~~cpp
Value a(kArrayType);
Document::AllocatorType& allocator = document.GetAllocator();

for (int i = 5; i <= 10; i++)
    a.PushBack(i, allocator);   // allocator is needed for potential realloc().

// Fluent interface
a.PushBack("Lua", allocator).PushBack("Mio", allocator);
~~~~~~~~~~

Unlike the STL, `PushBack()`/`PopBack()` return the array reference itself. This is called a _fluent interface_.

If you want to add a non-constant string or a string without sufficient lifetime (see [Create String](#CreateString)) to the array, you need to create a string Value by using the copy-string API. To avoid the need for an intermediate variable, you can use a [temporary value](#TemporaryValues) in place:

~~~~~~~~~~cpp
// in-place Value parameter
contact.PushBack(Value("copy", document.GetAllocator()).Move(), // copy string
                 document.GetAllocator());

// explicit parameters
Value val("key", document.GetAllocator()); // copy string
contact.PushBack(val, document.GetAllocator());
~~~~~~~~~~

## Modify Object {#ModifyObject}
An object is a collection of key-value pairs (members). Each key must be a string value. To modify an object, either add or remove members. The following APIs are for adding members:

* `Value& AddMember(Value&, Value&, Allocator& allocator)`
* `Value& AddMember(StringRefType, Value&, Allocator&)`
* `template <typename T> Value& AddMember(StringRefType, T value, Allocator&)`

Here is an example.

~~~~~~~~~~cpp
Value contact(kObjectType);
contact.AddMember("name", "Milo", document.GetAllocator());
contact.AddMember("married", true, document.GetAllocator());
~~~~~~~~~~

The name parameter with `StringRefType` is similar to the interface of the `SetString` function for string values. These overloads are used to avoid the need for copying the `name` string, as constant key names are very common in JSON objects.

If you need to create a name from a non-constant string or a string without sufficient lifetime (see [Create String](#CreateString)), you need to create a string Value by using the copy-string API. To avoid the need for an intermediate variable, you can use a [temporary value](#TemporaryValues) in place:

~~~~~~~~~~cpp
// in-place Value parameter
contact.AddMember(Value("copy", document.GetAllocator()).Move(), // copy string
                  Value().Move(),                                // null value
                  document.GetAllocator());

// explicit parameters
Value key("key", document.GetAllocator()); // copy string name
Value val(42);                             // some value
contact.AddMember(key, val, document.GetAllocator());
~~~~~~~~~~

For removing members, there are several choices:

* `bool RemoveMember(const Ch* name)`: removes a member by searching for its name (linear time complexity).
* `bool RemoveMember(const Value& name)`: same as above, but `name` is a Value.
* `MemberIterator RemoveMember(MemberIterator)`: removes a member by iterator (_constant_ time complexity).
* `MemberIterator EraseMember(MemberIterator)`: similar to the above, but it preserves the order of members (linear time complexity).
* `MemberIterator EraseMember(MemberIterator first, MemberIterator last)`: removes a range of members and preserves order (linear time complexity).

`MemberIterator RemoveMember(MemberIterator)` uses a "move-last" trick to achieve constant time complexity. Basically, the member at the iterator is destructed, and then the last element is moved to that position. So the order of the remaining members is changed.
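
The "move-last" idea itself is independent of RapidJSON and can be sketched on a `std::vector`. This is an illustration of the technique only, not RapidJSON's actual implementation; the name `RemoveUnordered` is hypothetical, and `std::move` requires C++11.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustration of the "move-last" trick on std::vector (not RapidJSON code).
// Removes v[index] in constant time by overwriting it with the last element.
// Precondition (assumed): index < v.size().
template <typename T>
void RemoveUnordered(std::vector<T>& v, std::size_t index) {
    if (index + 1 != v.size())
        v[index] = std::move(v.back()); // last element fills the hole
    v.pop_back();                       // drop the now-redundant last slot
}
```

Removing index 1 from `{1, 2, 3, 4}` this way leaves `{1, 4, 3}`: no elements are shifted, but the original order is not preserved, which is the trade-off between `RemoveMember()` and `EraseMember()`.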

## Deep Copy Value {#DeepCopyValue}
If we really need to copy a DOM tree, we can use two APIs for deep copy: the constructor with an allocator, and `CopyFrom()`.

~~~~~~~~~~cpp
Document d;
Document::AllocatorType& a = d.GetAllocator();
Value v1("foo");
// Value v2(v1); // not allowed

Value v2(v1, a);                      // make a copy
assert(v1.IsString());                // v1 untouched
d.SetArray().PushBack(v1, a).PushBack(v2, a);
assert(v1.IsNull() && v2.IsNull());   // both moved to d

v2.CopyFrom(d, a);                    // copy whole document to v2
assert(d.IsArray() && d.Size() == 2); // d untouched
v1.SetObject().AddMember("array", v2, a);
d.PushBack(v1, a);
~~~~~~~~~~

## Swap Values {#SwapValues}

`Swap()` is also provided.

~~~~~~~~~~cpp
Value a(123);
Value b("Hello");
a.Swap(b);
assert(a.IsString());
assert(b.IsInt());
~~~~~~~~~~

Swapping two DOM trees is fast (constant time), regardless of the complexity of the trees.

# What's next {#WhatsNext}

This tutorial shows the basics of DOM tree query and manipulation. There are several important concepts in RapidJSON:

1. [Streams](doc/stream.md) are channels for reading/writing JSON, which can be an in-memory string, a file stream, etc. Users can also create their own streams.
2. [Encoding](doc/encoding.md) defines which character encoding is used in streams and memory. RapidJSON also provides Unicode conversion/validation internally.
3. [DOM](doc/dom.md)'s basics are already covered in this tutorial. Uncover more advanced features such as *in situ* parsing, other parsing options and advanced usages.
4. [SAX](doc/sax.md) is the foundation of the parsing/generating facility in RapidJSON. Learn how to use `Reader`/`Writer` to implement even faster applications. Also try `PrettyWriter` to format the JSON.
5. [Performance](doc/performance.md) shows some in-house and third-party benchmarks.
6. [Internals](doc/internals.md) describes some internal designs and techniques of RapidJSON.

You may also refer to the [FAQ](doc/faq.md), API documentation, examples and unit tests.