/** @mainpage

<h1> TinyXml </h1>

TinyXml is a simple, small, C++ XML parser that can be easily
integrated into other programs.

<h2> What it does. </h2>

In brief, TinyXml parses an XML document, and builds from that a
Document Object Model (DOM) that can be read, modified, and saved.

XML stands for "eXtensible Markup Language." It allows you to create
your own document markups. Where HTML does a very good job of marking
documents for browsers, XML allows you to define any kind of document
markup, for example a document that describes a "to do" list for an
organizer application. XML is a very structured and convenient format.
All those random file formats created to store application data can
be replaced with XML. One parser for everything.

The best place for the complete, correct, and quite frankly hard to
read spec is at <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>. An intro to XML
(that I really like) can be found at
<a href="http://skew.org/xml/tutorial/">http://skew.org/xml/tutorial</a>.

There are different ways to access and interact with XML data.
TinyXml uses a Document Object Model (DOM), meaning the XML data is parsed
into C++ objects that can be browsed and manipulated, and then
written to disk or another output stream. You can also construct an XML document from
scratch with C++ objects and write this to disk or another output
stream.

TinyXml is designed to be easy and fast to learn. It is two headers
and four cpp files. Simply add these to your project and off you go.
There is an example file - xmltest.cpp - to get you started.

TinyXml is released under the ZLib license,
so you can use it in open source or commercial code. The details
of the license are at the top of every source file.

TinyXml attempts to be a flexible parser, but with truly correct and
compliant XML output.
TinyXml should compile on any reasonably C++
compliant system. It does not rely on exceptions or RTTI. It can be
compiled with or without STL support. TinyXml fully supports
the UTF-8 encoding, and the first 64k character entities.


<h2> What it doesn't do. </h2>

It doesn't parse or use DTDs (Document Type Definitions) or XSLs
(eXtensible Stylesheet Language). There are other parsers out there
(check out www.sourceforge.net, search for XML) that are much more fully
featured. But they are also much bigger, take longer to set up in
your project, have a steeper learning curve, and often have a more
restrictive license. If you are working with browsers or have more
complete XML needs, TinyXml is not the parser for you.

The following DTD syntax will not parse at this time in TinyXml:

@verbatim
	<!DOCTYPE Archiv [
	 <!ELEMENT Comment (#PCDATA)>
	]>
@endverbatim

because TinyXml sees this as a !DOCTYPE node with an illegally
embedded !ELEMENT node. This may be addressed in the future.

<h2> Tutorials. </h2>

For the impatient, here is a tutorial to get you going. It is a great way to get started,
but it is worth your time to read this (very short) manual completely.

- @subpage tutorial0

<h2> Code Status. </h2>

TinyXml is mature, tested code. It is very stable. If you find
bugs, please file a bug report on the sourceforge web site
(www.sourceforge.net/projects/tinyxml).
We'll get them straightened out as soon as possible.

There are some areas of improvement; please check sourceforge if you are
interested in working on TinyXml.


<h2> Features </h2>

<h3> Using STL </h3>

TinyXml can be compiled to use or not use STL. When using STL, TinyXml
uses the std::string class, and fully supports std::istream, std::ostream,
operator<<, and operator>>. Many API methods have both 'const char*' and
'const std::string&' forms.
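
As a rough sketch of what "both forms" means in practice, the class below offers
a 'const char*' setter unconditionally and adds the 'const std::string&' form only
when TIXML_USE_STL is defined. SketchNode and its fixed-size buffer are hypothetical
and exist only for this illustration; they are not TinyXml source. Only the
TIXML_USE_STL define and the pairing of the two parameter forms come from this manual.

```cpp
// Assumption for this sketch: the define is set here; in a real project it
// would come from the compiler command line or the top of tinyxml.h.
#define TIXML_USE_STL

#include <cstring>
#ifdef TIXML_USE_STL
#include <string>
#endif

// Hypothetical class for illustration; not part of TinyXml.
class SketchNode
{
public:
    SketchNode() { value_[0] = 0; }

    // The 'const char*' form is always available.
    void SetValue( const char* text )
    {
        std::strncpy( value_, text, sizeof( value_ ) - 1 );
        value_[ sizeof( value_ ) - 1 ] = 0;   // always null terminated
    }
    const char* Value() const { return value_; }

#ifdef TIXML_USE_STL
    // The std::string forms exist only in the STL build.
    void SetValue( const std::string& text ) { SetValue( text.c_str() ); }
    std::string ValueStr() const { return std::string( value_ ); }
#endif

private:
    char value_[64];   // fixed buffer keeps the non-STL path free of <string>
};
```

With the define in place, a caller can pass either a string literal or a std::string
to SetValue(). Removing the define leaves only the 'const char*' form, and no
STL header is included.
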

When STL support is compiled out, no STL files are included whatsoever. All
the string classes are implemented by TinyXml itself. API methods
all use the 'const char*' form for input.

Use the compile time #define:

	TIXML_USE_STL

to compile one version or the other. This can be passed by the compiler,
or set as the first line of "tinyxml.h".

Note: If compiling the test code in Linux, setting the environment
variable TINYXML_USE_STL=YES/NO will control STL compilation. In the
Windows project file, STL and non STL targets are provided. In your project,
it's probably easiest to add the line "#define TIXML_USE_STL" as the first
line of tinyxml.h.

<h3> UTF-8 </h3>

TinyXml supports UTF-8, allowing you to manipulate XML files in any language. TinyXml
also supports "legacy mode" - the encoding used before UTF-8 support and
probably best described as "extended ascii".

Normally, TinyXml will try to detect the correct encoding and use it. However,
by setting the value of TIXML_DEFAULT_ENCODING in the header file, TinyXml
can be forced to always use one encoding.

TinyXml will assume Legacy Mode until one of the following occurs:
<ol>
	<li> If the non-standard but common "UTF-8 lead bytes" (0xef 0xbb 0xbf)
		 begin the file or data stream, TinyXml will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has an encoding="UTF-8", then
		 TinyXml will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has no encoding specified, then
		 TinyXml will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has an encoding="something else", then
		 TinyXml will read it as Legacy Mode. In legacy mode, TinyXml will
		 work as it did before.
It's not clear what that mode does exactly, but
		 old content should keep working.</li>
	<li> Until one of the above criteria is met, TinyXml runs in Legacy Mode.</li>
</ol>

What happens if the encoding is incorrectly set or detected? TinyXml will try
to read and pass through text seen as improperly encoded. You may get some strange
results or mangled characters. You may want to force TinyXml to the correct mode.

<b> You may force TinyXml to Legacy Mode by using LoadFile( TIXML_ENCODING_LEGACY ) or
LoadFile( filename, TIXML_ENCODING_LEGACY ). You may force it to use legacy mode all
the time by setting TIXML_DEFAULT_ENCODING = TIXML_ENCODING_LEGACY. Likewise, you may
force it to TIXML_ENCODING_UTF8 with the same technique.</b>

For English users, using English XML, UTF-8 is the same as low-ASCII. You
don't need to be aware of UTF-8 or change your code in any way. You can think
of UTF-8 as a "superset" of ASCII.

UTF-8 is not a double byte format - but it is a standard encoding of Unicode!
TinyXml does not use or directly support wchar, TCHAR, or Microsoft's _UNICODE at this time.
It is common to see the term "Unicode" improperly refer to UTF-16, a wide byte encoding
of Unicode. This is a source of confusion.

For "high-ascii" languages - everything not English, pretty much - TinyXml can
handle all languages, at the same time, as long as the XML is encoded
in UTF-8. That can be a little tricky: older programs and operating systems
tend to use the "default" or "traditional" code page. Many apps (and almost all
modern ones) can output UTF-8, but older or stubborn (or just broken) ones
still output text in the default code page.

For example, Japanese systems traditionally use SHIFT-JIS encoding.
Text encoded as SHIFT-JIS cannot be read by TinyXml.
A good text editor can import SHIFT-JIS and then save as UTF-8.
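
The detection rules listed earlier in this section can be sketched as a small
standalone function. This is only an illustration of the rules as this manual
states them, not TinyXml's parser code: SketchEncoding and DetectEncodingSketch
are hypothetical names, the encoding value is compared exactly (no case folding),
and anything starting with "<?xml" is treated as the declaration tag.

```cpp
#include <string>

// Hypothetical names for illustration only; not part of the TinyXml API.
enum SketchEncoding { SKETCH_LEGACY, SKETCH_UTF8 };

// Applies the rules from the list above to the start of a file or stream.
SketchEncoding DetectEncodingSketch( const std::string& data )
{
    // Rule 1: the UTF-8 lead bytes 0xef 0xbb 0xbf begin the data.
    if (    data.size() >= 3
         && static_cast<unsigned char>( data[0] ) == 0xEF
         && static_cast<unsigned char>( data[1] ) == 0xBB
         && static_cast<unsigned char>( data[2] ) == 0xBF )
    {
        return SKETCH_UTF8;
    }

    // Rule 5: no (complete) declaration tag seen, so stay in Legacy Mode.
    const std::string::size_type start = data.find( "<?xml" );
    if ( start == std::string::npos )
        return SKETCH_LEGACY;
    const std::string::size_type end = data.find( '>', start );
    if ( end == std::string::npos )
        return SKETCH_LEGACY;
    const std::string decl = data.substr( start, end - start );

    // Rule 3: a declaration with no encoding specified reads as UTF-8.
    const std::string key = "encoding=";
    const std::string::size_type pos = decl.find( key );
    if ( pos == std::string::npos )
        return SKETCH_UTF8;

    // Rules 2 and 4: "UTF-8" selects UTF-8, anything else Legacy Mode.
    const std::string::size_type open = pos + key.size();
    if ( open >= decl.size() )
        return SKETCH_LEGACY;
    const char quote = decl[open];
    if ( quote != '"' && quote != '\'' )
        return SKETCH_LEGACY;
    const std::string::size_type close = decl.find( quote, open + 1 );
    if ( close == std::string::npos )
        return SKETCH_LEGACY;
    const std::string value = decl.substr( open + 1, close - open - 1 );
    return value == "UTF-8" ? SKETCH_UTF8 : SKETCH_LEGACY;
}
```

A SHIFT-JIS file that declares its encoding would fall under rule 4 in this sketch
and be treated as Legacy Mode, which is why converting such files to UTF-8 first
is the safer route.
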

The <a href="http://skew.org/xml/tutorial/">Skew.org link</a> does a great
job covering the encoding issue.

The test file "utf8test.xml" is an XML file containing English, Spanish, Russian,
and Simplified Chinese. (Hopefully they are translated correctly.) The file
"utf8test.gif" is a screen capture of the XML file, rendered in IE. Note that
if you don't have the correct fonts (Simplified Chinese or Russian) on your
system, you won't see output that matches the GIF file even if you can parse
it correctly. Also note that (at least on my Windows machine) console output
is in a Western code page, so that Print() or printf() cannot correctly display
the file. This is not a bug in TinyXml - just an OS issue. No data is lost or
destroyed by TinyXml. The console just doesn't render UTF-8.


<h3> Entities </h3>
TinyXml recognizes the pre-defined "character entities", meaning special
characters. Namely:

@verbatim
	&amp;	&
	&lt;	<
	&gt;	>
	&quot;	"
	&apos;	'
@endverbatim

These are recognized when the XML document is read, and translated to their
UTF-8 equivalents. For instance, text with the XML of:

@verbatim
	Far &amp; Away
@endverbatim

will have the Value() of "Far & Away" when queried from the TiXmlText object,
and will be written back to the XML stream/file as an escaped ampersand. Older versions
of TinyXml "preserved" character entities, but the newer versions will translate
them into characters.

Additionally, any character can be specified by its Unicode code point:
The syntax "&#xA0;" or "&#160;" both refer to the non-breaking space character.


<h3> Streams </h3>
With TIXML_USE_STL on,
TinyXml has been modified to support both C (FILE) and C++ (operator <<,>>)
streams. There are some differences that you may need to be aware of.

C style output:
	- based on FILE*
	- the Print() and SaveFile() methods

	Generates formatted output, with plenty of white space, intended to be as
	human-readable as possible. They are very fast, and tolerant of ill formed
	XML documents. For example, an XML document that contains 2 root elements
	and 2 declarations will still print.

C style input:
	- based on FILE*
	- the Parse() and LoadFile() methods

	A fast, tolerant read. Use whenever you don't need the C++ streams.

C++ style output:
	- based on std::ostream
	- operator<<

	Generates condensed output, intended for network transmission rather than
	readability. Depending on your system's implementation of the ostream class,
	these may be somewhat slower. (Or may not.) Not tolerant of ill formed XML:
	a document should contain exactly one root element. Additional root level
	elements will not be streamed out.

C++ style input:
	- based on std::istream
	- operator>>

	Reads XML from a stream, making it useful for network transmission. The tricky
	part is knowing when the XML document is complete, since there will almost
	certainly be other data in the stream. TinyXml will assume the XML data is
	complete after it reads the root element. Put another way, documents that
	are ill-constructed with more than one root element will not read correctly.
	Also note that operator>> is somewhat slower than Parse, due to both
	implementation of the STL and limitations of TinyXml.

<h3> White space </h3>
The world simply does not agree on whether white space should be kept, or condensed.
For example, pretend the '_' is a space, and look at "Hello____world". HTML, and
at least some XML parsers, will interpret this as "Hello_world". They condense white
space. Some XML parsers do not, and will leave it as "Hello____world". (Remember
to keep pretending the _ is a space.)
Others suggest that __Hello___world__ should become
Hello___world.

It's an issue that hasn't been resolved to my satisfaction. TinyXml supports the
first 2 approaches. Call TiXmlBase::SetCondenseWhiteSpace( bool ) to set the desired behavior.
The default is to condense white space.

If you change the default, you should call TiXmlBase::SetCondenseWhiteSpace( bool )
before making any calls to Parse XML data, and I don't recommend changing it after
it has been set.


<h3> Handles </h3>

When browsing an XML document in a robust way, it is important to check
for null returns from method calls. An error safe implementation can
generate a lot of code like:

@verbatim
TiXmlElement* root = document.FirstChildElement( "Document" );
if ( root )
{
	TiXmlElement* element = root->FirstChildElement( "Element" );
	if ( element )
	{
		TiXmlElement* child = element->FirstChildElement( "Child" );
		if ( child )
		{
			TiXmlElement* child2 = child->NextSiblingElement( "Child" );
			if ( child2 )
			{
				// Finally do something useful.
			}
		}
	}
}
@endverbatim

Handles have been introduced to clean this up. Using the TiXmlHandle class,
the previous code reduces to:

@verbatim
TiXmlHandle docHandle( &document );
TiXmlElement* child2 = docHandle.FirstChild( "Document" ).FirstChild( "Element" ).Child( "Child", 1 ).Element();
if ( child2 )
{
	// do something useful
}
@endverbatim

This is much easier to deal with. See TiXmlHandle for more information.


<h3> Row and Column tracking </h3>
Being able to track nodes and attributes back to their origin location
in source files can be very important for some applications. Additionally,
knowing where parsing errors occurred in the original source can be very
time saving.

TinyXml can track the row and column origin of all nodes and attributes
in a text file.
The TiXmlBase::Row() and TiXmlBase::Column() methods return
the origin of the node in the source text. The tab size used when counting
columns can be configured with TiXmlDocument::SetTabSize().


<h2> Using and Installing </h2>

To Compile and Run xmltest:

A Linux Makefile and a Windows Visual C++ .dsw file are provided.
Simply compile and run. It will write the file demotest.xml to your
disk and generate output on the screen. It also tests walking the
DOM by printing out the number of nodes found using different
techniques.

The Linux makefile is very generic and will
probably run on other systems, but is only tested on Linux. You no
longer need to run 'make depend'. The dependencies have been
hard coded.

<h3>Windows project file for VC6</h3>
<ul>
<li>tinyxml:		tinyxml library, non-STL </li>
<li>tinyxmlSTL:		tinyxml library, STL </li>
<li>tinyXmlTest:	test app, non-STL </li>
<li>tinyXmlTestSTL: test app, STL </li>
</ul>

<h3>Linux Make file</h3>
At the top of the makefile you can set:

PROFILE, DEBUG, and TINYXML_USE_STL. Details (such as they are) are in
the makefile.

In the tinyxml directory, type "make clean" then "make". The executable
file 'xmltest' will be created.



<h3>To Use in an Application:</h3>

Add tinyxml.cpp, tinyxml.h, tinyxmlerror.cpp, tinyxmlparser.cpp, tinystr.cpp, and tinystr.h to your
project or make file. That's it! It should compile on any reasonably
compliant C++ system. You do not need to enable exceptions or
RTTI for TinyXml.


<h2> How TinyXml works. </h2>

An example is probably the best way to go. Take:
@verbatim
	<?xml version="1.0" standalone="no"?>
	<!-- Our to do list data -->
	<ToDo>
		<Item priority="1"> Go to the <bold>Toy store!</bold></Item>
		<Item priority="2"> Do bills</Item>
	</ToDo>
@endverbatim

It's not much of a To Do list, but it will do.
To read this file
(say "demo.xml") you would create a document, and parse it in:
@verbatim
	TiXmlDocument doc( "demo.xml" );
	bool loadOkay = doc.LoadFile();	// false if the file could not be read or parsed
@endverbatim

And it's ready to go. Now let's look at some lines and how they
relate to the DOM.

@verbatim
<?xml version="1.0" standalone="no"?>
@endverbatim

	The first line is a declaration, and gets turned into the
	TiXmlDeclaration class. It will be the first child of the
	document node.

	This is the only directive/special tag parsed by TinyXml.
	Generally directive tags are stored in TiXmlUnknown so the
	commands won't be lost when it is saved back to disk.

@verbatim
<!-- Our to do list data -->
@endverbatim

	A comment. Will become a TiXmlComment object.

@verbatim
<ToDo>
@endverbatim

	The "ToDo" tag defines a TiXmlElement object. This one does not have
	any attributes, but does contain 2 other elements.

@verbatim
<Item priority="1">
@endverbatim

	Creates another TiXmlElement which is a child of the "ToDo" element.
	This element has 1 attribute, with the name "priority" and the value
	"1".

@verbatim
Go to the
@endverbatim

	A TiXmlText. This is a leaf node and cannot contain other nodes.
	It is a child of the "Item" TiXmlElement.

@verbatim
<bold>
@endverbatim

	Another TiXmlElement, this one a child of the "Item" element.

Etc.

Looking at the entire object tree, you end up with:
@verbatim
TiXmlDocument                   "demo.xml"
    TiXmlDeclaration            "version='1.0'" "standalone='no'"
    TiXmlComment                " Our to do list data"
    TiXmlElement                "ToDo"
        TiXmlElement            "Item" Attributes: priority = 1
            TiXmlText           "Go to the "
            TiXmlElement        "bold"
                TiXmlText       "Toy store!"
        TiXmlElement            "Item" Attributes: priority = 2
            TiXmlText           "Do bills"
@endverbatim

<h2> Documentation </h2>

The documentation is built with Doxygen, using the 'dox'
configuration file.

<h2> License </h2>

TinyXml is released under the zlib license:

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.

Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and
must not be misrepresented as being the original software.

3. This notice may not be removed or altered from any source
distribution.

<h2> References </h2>

The World Wide Web Consortium is the definitive standard body for
XML, and their web pages contain huge amounts of information.

The definitive spec: <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>

I also recommend "XML Pocket Reference" by Robert Eckstein and published by
O'Reilly...the book that got the whole thing started.

<h2> Contributors, Contacts, and a Brief History </h2>

Thanks very much to everyone who sends suggestions, bugs, ideas, and
encouragement. It all helps, and makes this project fun. A special thanks
to the contributors on the web pages that keep it lively.

So many people have sent in bugs and ideas, that rather than list them here
we try to give credit due in the "changes.txt" file.

TinyXml was originally written by Lee Thomason. (Often the "I" still
in the documentation.)
Lee reviews changes and releases new versions,
with the help of Yves Berquin and the tinyXml community.

We appreciate your suggestions, and would love to know if you
use TinyXml. Hopefully you will enjoy it and find it useful.
Please post questions, comments, file bugs, or contact us at:

www.sourceforge.net/projects/tinyxml

Lee Thomason,
Yves Berquin
*/