page.title=Parsing XML Data
parent.title=Performing Network Operations
parent.link=index.html

trainingnavtop=true

previous.title=Managing Network Usage
previous.link=managing.html

@jd:body

<div id="tb-wrapper">
<div id="tb">

<h2>This lesson teaches you to</h2>
<ol>
  <li><a href="#choose">Choose a Parser</a></li>
  <li><a href="#analyze">Analyze the Feed</a></li>
  <li><a href="#instantiate">Instantiate the Parser</a></li>
  <li><a href="#read">Read the Feed</a></li>
  <li><a href="#parse">Parse XML</a></li>
  <li><a href="#skip">Skip Tags You Don't Care About</a></li>
  <li><a href="#consume">Consume XML Data</a></li>
</ol>

<h2>You should also read</h2>
<ul>
  <li><a href="{@docRoot}guide/webapps/index.html">Web Apps Overview</a></li>
</ul>

<h2>Try it out</h2>

<div class="download-box">
  <a href="{@docRoot}shareables/training/NetworkUsage.zip"
class="button">Download the sample</a>
 <p class="filename">NetworkUsage.zip</p>
</div>

</div>
</div>

<p>Extensible Markup Language (XML) is a set of rules for encoding documents in
machine-readable form. XML is a popular format for sharing data on the internet.
Websites that frequently update their content, such as news sites or blogs,
often provide an XML feed so that external programs can keep abreast of content
changes. Downloading and parsing XML data is a common task for network-connected
apps. This lesson explains how to parse XML documents and use their data.</p>

<h2 id="choose">Choose a Parser</h2>

<p>We recommend {@link org.xmlpull.v1.XmlPullParser}, which is an efficient and
maintainable way to parse XML on Android. Historically Android has had two
implementations of this interface:</p>

<ul>
  <li><a href="http://kxml.sourceforge.net/"><code>KXmlParser</code></a>,
  via {@link org.xmlpull.v1.XmlPullParserFactory#newPullParser XmlPullParserFactory.newPullParser()}.
  </li>
  <li><code>ExpatPullParser</code>, via
  {@link android.util.Xml#newPullParser Xml.newPullParser()}.
  </li>
</ul>

<p>Either choice is fine. The
example in this section uses <code>ExpatPullParser</code>, via
{@link android.util.Xml#newPullParser Xml.newPullParser()}.</p>

<h2 id="analyze">Analyze the Feed</h2>

<p>The first step in parsing a feed is to decide which fields you're interested in.
The parser extracts data for those fields and ignores the rest.</p>

<p>Here is an excerpt from the feed that's being parsed in the sample app. Each
post to <a href="http://stackoverflow.com">StackOverflow.com</a> appears in the
feed as an <code>entry</code> tag that contains several nested tags:</p>

<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;
&lt;feed xmlns=&quot;http://www.w3.org/2005/Atom&quot; xmlns:creativeCommons=&quot;http://backend.userland.com/creativeCommonsRssModule&quot; ...&quot;&gt;
&lt;title type=&quot;text&quot;&gt;newest questions tagged android - Stack Overflow&lt;/title&gt;
...
    &lt;entry&gt;
    ...
    &lt;/entry&gt;
    &lt;entry&gt;
        &lt;id&gt;http://stackoverflow.com/q/9439999&lt;/id&gt;
        &lt;re:rank scheme="http://stackoverflow.com"&gt;0&lt;/re:rank&gt;
        &lt;title type="text"&gt;Where is my data file?&lt;/title&gt;
        &lt;category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest/tags" term="android"/&gt;
        &lt;category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest/tags" term="file"/&gt;
        &lt;author&gt;
            &lt;name&gt;cliff2310&lt;/name&gt;
            &lt;uri&gt;http://stackoverflow.com/users/1128925&lt;/uri&gt;
        &lt;/author&gt;
        &lt;link rel="alternate" href="http://stackoverflow.com/questions/9439999/where-is-my-data-file" /&gt;
        &lt;published&gt;2012-02-25T00:30:54Z&lt;/published&gt;
        &lt;updated&gt;2012-02-25T00:30:54Z&lt;/updated&gt;
        &lt;summary type="html"&gt;
            &lt;p&gt;I have an Application that requires a data file...&lt;/p&gt;

        &lt;/summary&gt;
    &lt;/entry&gt;
    &lt;entry&gt;
    ...
    &lt;/entry&gt;
...
&lt;/feed&gt;</pre>

<p>The sample app
extracts data for the <code>entry</code> tag and its nested tags
<code>title</code>, <code>link</code>, and <code>summary</code>.</p>


<h2 id="instantiate">Instantiate the Parser</h2>

<p>The next step is to
instantiate a parser and kick off the parsing process. In this snippet, a parser
is initialized to not process namespaces, and to use the provided {@link
java.io.InputStream} as its input. It starts the parsing process with a call to
{@link org.xmlpull.v1.XmlPullParser#nextTag() nextTag()} and invokes the
<code>readFeed()</code> method, which extracts and processes the data the app is
interested in:</p>

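<pre>import android.util.Xml;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;

import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public class StackOverflowXmlParser {
    // We don't use namespaces
    private static final String ns = null;

    // Configures a pull parser for the stream, moves to the root tag, and reads the feed.
    public List&lt;Entry&gt; parse(InputStream in) throws XmlPullParserException, IOException {
        try {
            XmlPullParser parser = Xml.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            // A null encoding asks the parser to detect the encoding itself
            parser.setInput(in, null);
            parser.nextTag();
            return readFeed(parser);
        } finally {
            in.close();
        }
    }
 ...
}</pre>
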
<h2 id="read">Read the Feed</h2>

<p>The <code>readFeed()</code> method does the actual work of processing the
feed. It looks for <code>entry</code> elements as the starting point for recursively
processing the feed, and skips any tag that isn't an {@code entry} tag. Once the whole
feed has been processed, <code>readFeed()</code> returns a {@link
java.util.List} containing the entries (including nested data members) it
extracted from the feed. This {@link java.util.List} is then returned by the
parser.</p>

<pre>
private List&lt;Entry&gt; readFeed(XmlPullParser parser) throws XmlPullParserException, IOException {
    List&lt;Entry&gt; entries = new ArrayList&lt;Entry&gt;();

    parser.require(XmlPullParser.START_TAG, ns, "feed");
    while (parser.next() != XmlPullParser.END_TAG) {
        // Ignore text and other non-tag events between child elements
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        // Starts by looking for the entry tag
        if (name.equals("entry")) {
            entries.add(readEntry(parser));
        } else {
            skip(parser);
        }
    }
    return entries;
}</pre>


<h2 id="parse">Parse XML</h2>


<p>The steps for parsing an XML feed are as follows:</p>
<ol>

  <li>As described in <a href="#analyze">Analyze the Feed</a>, identify the tags you want to include in your app. This
example extracts data for the <code>entry</code> tag and its nested tags
<code>title</code>, <code>link</code>, and <code>summary</code>.</li>

<li>Create the following methods:

<ul>

<li>A "read" method for each tag you're interested in. For example,
<code>readEntry()</code>, <code>readTitle()</code>, and so on. The parser reads
tags from the input stream. When it encounters a tag named <code>entry</code>,
<code>title</code>,
<code>link</code> or <code>summary</code>, it calls the appropriate method
for that tag. Otherwise, it skips the tag.
</li>

<li>Methods to extract data for each different type of tag and to advance the
parser to the next tag. For example:
<ul>

<li>For the <code>title</code> and <code>summary</code> tags, the parser calls
<code>readText()</code>. This method extracts data for these tags by calling
<code>parser.getText()</code>.</li>

<li>For the <code>link</code> tag, the parser extracts data for links by first
determining if the link is the kind
it's interested in. Then it uses <code>parser.getAttributeValue()</code> to
extract the link's value.</li>

<li>For the <code>entry</code> tag, the parser calls <code>readEntry()</code>.
This method parses the entry's nested tags and returns an <code>Entry</code>
object with the data members <code>title</code>, <code>link</code>, and
<code>summary</code>.</li>

</ul>
</li>
<li>A helper <code>skip()</code> method that skips a tag together with everything nested inside it. For more discussion of this topic, see <a href="#skip">Skip Tags You Don't Care About</a>.</li>
</ul>

  </li>
</ol>

<p>This snippet shows how the parser parses entries, titles, links, and summaries.</p>
<pre>public static class Entry {
    public final String title;
    public final String link;
    public final String summary;

    private Entry(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective &quot;read&quot; methods for processing. Otherwise, skips the tag.
private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, ns, "entry");
    String title = null;
    String summary = null;
    String link = null;
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        if (name.equals("title")) {
            title = readTitle(parser);
        } else if (name.equals("summary")) {
            summary = readSummary(parser);
        } else if (name.equals("link")) {
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new Entry(title, summary, link);
}

// Processes title tags in the feed.
private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "title");
    return title;
}

// Processes link tags in the feed. Returns the href of a rel="alternate" link,
// or an empty string for any other kind of link.
private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, ns, "link");
    String relType = parser.getAttributeValue(null, "rel");
    // Constant-first equals() avoids a NullPointerException when rel is absent.
    if ("alternate".equals(relType)) {
        link = parser.getAttributeValue(null, "href");
    }
    // Always advance to the END_TAG of the empty link element, so that the
    // require() call below also passes for links that are not "alternate".
    parser.nextTag();
    parser.require(XmlPullParser.END_TAG, ns, "link");
    return link;
}

// Processes summary tags in the feed.
private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "summary");
    String summary = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "summary");
    return summary;
}

// For the tags title and summary, extracts their text values.
private String readText(XmlPullParser parser) throws IOException, XmlPullParserException {
    String result = "";
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.getText();
        parser.nextTag();
    }
    return result;
}
  ...
}</pre>
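
<p>The Android snippets above need the platform's <code>XmlPullParser</code>, so
they can't be run on a plain desktop JVM. As a rough way to experiment with the
same read-and-skip structure off-device, here is a minimal sketch that swaps in
the JDK's StAX API (<code>javax.xml.stream</code>) instead of
<code>XmlPullParser</code>. The class name <code>StaxFeedSketch</code> and the
sample feed string are assumptions made for illustration and are not part of the
sample app. One deliberate difference: this sketch keeps the last
<code>alternate</code> link it sees instead of overwriting it with an empty
value.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical desktop-JVM analogue of StackOverflowXmlParser, built on StAX.
class StaxFeedSketch {
    static final class Entry {
        final String title;
        final String summary;
        final String link;

        Entry(String title, String summary, String link) {
            this.title = title;
            this.summary = summary;
            this.link = link;
        }
    }

    static List<Entry> parse(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag(); // advance to the root <feed> element
            return readFeed(reader);
        } finally {
            reader.close();
        }
    }

    private static List<Entry> readFeed(XMLStreamReader reader) throws XMLStreamException {
        List<Entry> entries = new ArrayList<>();
        reader.require(XMLStreamConstants.START_ELEMENT, null, "feed");
        while (reader.next() != XMLStreamConstants.END_ELEMENT) {
            if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
                continue; // ignore text and whitespace between tags
            }
            if (reader.getLocalName().equals("entry")) {
                entries.add(readEntry(reader));
            } else {
                skip(reader);
            }
        }
        return entries;
    }

    private static Entry readEntry(XMLStreamReader reader) throws XMLStreamException {
        String title = null;
        String summary = null;
        String link = null;
        while (reader.next() != XMLStreamConstants.END_ELEMENT) {
            if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            switch (reader.getLocalName()) {
                case "title": title = reader.getElementText(); break;
                case "summary": summary = reader.getElementText(); break;
                case "link": {
                    String href = readLink(reader);
                    if (href != null) {
                        link = href;
                    }
                    break;
                }
                default: skip(reader);
            }
        }
        return new Entry(title, summary, link);
    }

    // Returns the href of a rel="alternate" link, or null for other links.
    private static String readLink(XMLStreamReader reader) throws XMLStreamException {
        String rel = reader.getAttributeValue(null, "rel");
        String href = reader.getAttributeValue(null, "href");
        reader.nextTag(); // move to the end of the empty <link/> element
        reader.require(XMLStreamConstants.END_ELEMENT, null, "link");
        return "alternate".equals(rel) ? href : null;
    }

    // Consumes everything up to and including the matching end element.
    private static void skip(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth != 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            } else if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            }
        }
    }
}
```

<p>Calling <code>StaxFeedSketch.parse()</code> with a small feed string returns
one <code>Entry</code> per <code>entry</code> element, with unrelated tags such
as <code>id</code> and <code>author</code> skipped.</p>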

<h2 id="skip">Skip Tags You Don't Care About</h2>

<p>One of the steps in the XML parsing described above is for the parser to skip tags it's not interested in. Here is the parser's <code>skip()</code> method:</p>

<pre>
private void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
    if (parser.getEventType() != XmlPullParser.START_TAG) {
        throw new IllegalStateException();
    }
    int depth = 1;
    while (depth != 0) {
        switch (parser.next()) {
        case XmlPullParser.END_TAG:
            depth--;
            break;
        case XmlPullParser.START_TAG:
            depth++;
            break;
        }
    }
}
</pre>

<p>This is how it works:</p>

<ul>

<li>It throws an exception if the current event isn't a
<code>START_TAG</code>.</li>

<li>It consumes the <code>START_TAG</code>, and all events up to and including
the matching <code>END_TAG</code>.</li>

<li>To make sure that it stops at the correct <code>END_TAG</code> and not at
the first <code>END_TAG</code> it encounters after the original
<code>START_TAG</code>, it keeps track of the nesting depth.</li>

</ul>

<p>Thus if the current element has nested elements, the value of
<code>depth</code> won't be 0 until the parser has consumed all events between
the original <code>START_TAG</code> and its matching <code>END_TAG</code>. For
example, consider how the parser skips the <code>&lt;author&gt;</code> element,
which has 2 nested elements, <code>&lt;name&gt;</code> and
<code>&lt;uri&gt;</code>. The walkthrough counts only tag events;
<code>next()</code> also returns the <code>TEXT</code> events between them,
which pass through the loop without changing <code>depth</code>:</p>

<ul>

<li>The first time through the <code>while</code> loop, the next tag the parser
encounters after <code>&lt;author&gt;</code> is the <code>START_TAG</code> for
<code>&lt;name&gt;</code>. The value for <code>depth</code> is incremented to
2.</li>

<li>The second time through the <code>while</code> loop, the next tag the parser
encounters is the <code>END_TAG</code> <code>&lt;/name&gt;</code>. The value
for <code>depth</code> is decremented to 1.</li>

<li>The third time through the <code>while</code> loop, the next tag the parser
encounters is the <code>START_TAG</code> <code>&lt;uri&gt;</code>. The value
for <code>depth</code> is incremented to 2.</li>

<li>The fourth time through the <code>while</code> loop, the next tag the parser
encounters is the <code>END_TAG</code> <code>&lt;/uri&gt;</code>. The value for
<code>depth</code> is decremented to 1.</li>

<li>The fifth and final time through the <code>while</code> loop, the next
tag the parser encounters is the <code>END_TAG</code>
<code>&lt;/author&gt;</code>. The value for <code>depth</code> is decremented to
0, indicating that the <code>&lt;author&gt;</code> element has been successfully
skipped.</li>

</ul>
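
<p>To check the walkthrough by hand, the depth bookkeeping can be replayed
without a real parser. The sketch below is an illustration only:
<code>SkipDepthTrace</code> and its <code>Event</code> enum are hypothetical
stand-ins for the <code>XmlPullParser</code> event constants, and the event
list is typed in manually to mirror the <code>&lt;author&gt;</code> element,
including the <code>TEXT</code> events inside <code>&lt;name&gt;</code> and
<code>&lt;uri&gt;</code>.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipDepthTrace {
    // Stand-ins for the XmlPullParser event constants used by skip().
    enum Event { START_TAG, END_TAG, TEXT }

    // Replays the events that follow the original START_TAG and records
    // the value of depth after each pass through the loop.
    static List<Integer> trace(List<Event> events) {
        List<Integer> depths = new ArrayList<>();
        int depth = 1; // the original START_TAG has already been seen
        for (Event event : events) {
            if (depth == 0) {
                break; // skip() returns once the matching END_TAG is consumed
            }
            switch (event) {
                case END_TAG:
                    depth--;
                    break;
                case START_TAG:
                    depth++;
                    break;
                default:
                    break; // TEXT leaves depth unchanged
            }
            depths.add(depth);
        }
        return depths;
    }

    public static void main(String[] args) {
        // Events after <author>: <name>, text, </name>, <uri>, text, </uri>, </author>
        List<Event> author = Arrays.asList(
                Event.START_TAG, Event.TEXT, Event.END_TAG,
                Event.START_TAG, Event.TEXT, Event.END_TAG,
                Event.END_TAG);
        System.out.println(trace(author)); // prints [2, 2, 1, 2, 2, 1, 0]
    }
}
```

<p>Dropping the <code>TEXT</code> events from the list gives the five values
described above: 2, 1, 2, 1, 0.</p>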

<h2 id="consume">Consume XML Data</h2>

<p>The example application fetches and parses the XML feed within an {@link
android.os.AsyncTask}. This takes the processing off the main UI thread. When
processing is complete, the app updates the UI in the main activity
(<code>NetworkActivity</code>).</p>
<p>In the excerpt shown below, the <code>loadPage()</code> method does the
following:</p>

<ul>

  <li>Initializes a string variable with the URL for the XML feed.</li>

  <li>If the user's settings and the network connection allow it, invokes
<code>new DownloadXmlTask().execute(url)</code>. This instantiates a new
<code>DownloadXmlTask</code> object (an {@link android.os.AsyncTask} subclass) and
runs its {@link android.os.AsyncTask#execute execute()} method, which downloads
and parses the feed and returns a string result to be displayed in the UI.</li>

</ul>
<pre>
public class NetworkActivity extends Activity {
    public static final String WIFI = "Wi-Fi";
    public static final String ANY = "Any";
    private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest";

    // Whether there is a Wi-Fi connection.
    private static boolean wifiConnected = false;
    // Whether there is a mobile connection.
    private static boolean mobileConnected = false;
    // Whether the display should be refreshed.
    public static boolean refreshDisplay = true;
    public static String sPref = null;

    ...

    // Uses AsyncTask to download the XML feed from stackoverflow.com.
    public void loadPage() {
        // Constant-first equals() is safe even while sPref is still null.
        if (ANY.equals(sPref) &amp;&amp; (wifiConnected || mobileConnected)) {
            new DownloadXmlTask().execute(URL);
        } else if (WIFI.equals(sPref) &amp;&amp; wifiConnected) {
            new DownloadXmlTask().execute(URL);
        } else {
            // show error
        }
    }</pre>

<p>The {@link android.os.AsyncTask} subclass shown below,
<code>DownloadXmlTask</code>, implements the following {@link
android.os.AsyncTask} methods:</p>

<ul>

  <li>{@link android.os.AsyncTask#doInBackground doInBackground()} executes
the method <code>loadXmlFromNetwork()</code>. It passes the feed URL as a
parameter. The method <code>loadXmlFromNetwork()</code> fetches and processes
the feed. When it finishes, it passes back a result string.</li>

  <li>{@link android.os.AsyncTask#onPostExecute onPostExecute()} takes the
returned string and displays it in the UI.</li>

</ul>

<pre>
// Implementation of AsyncTask used to download XML feed from stackoverflow.com.
private class DownloadXmlTask extends AsyncTask&lt;String, Void, String&gt; {
    &#64;Override
    protected String doInBackground(String... urls) {
        try {
            return loadXmlFromNetwork(urls[0]);
        } catch (IOException e) {
            return getResources().getString(R.string.connection_error);
        } catch (XmlPullParserException e) {
            return getResources().getString(R.string.xml_error);
        }
    }

    &#64;Override
    protected void onPostExecute(String result) {
        setContentView(R.layout.main);
        // Displays the HTML string in the UI via a WebView
        WebView myWebView = (WebView) findViewById(R.id.webview);
        myWebView.loadData(result, "text/html", null);
    }
}</pre>
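
<p>{@link android.os.AsyncTask} has been deprecated since API level 30, so the
same "work in the background, display on the UI thread" split is sketched here
with <code>java.util.concurrent</code> as one possible alternative. This is not
how the sample app is written. <code>BackgroundLoadSketch</code>,
<code>loadAsync()</code>, and its parameters are hypothetical names; on Android
the <code>loader</code> would wrap something like
<code>loadXmlFromNetwork()</code>, the <code>ui</code> executor would need to
run on the main thread, and <code>CompletableFuture</code> requires API level 24
or higher.</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Function;

class BackgroundLoadSketch {
    // Runs loader on the background executor (the doInBackground() role), then
    // hands its result to display on the ui executor (the onPostExecute() role).
    static CompletableFuture<Void> loadAsync(
            String url,
            Function<String, String> loader,
            Executor background,
            Executor ui,
            Consumer<String> display) {
        return CompletableFuture
                .supplyAsync(() -> loader.apply(url), background)
                .thenAcceptAsync(display, ui);
    }

    public static void main(String[] args) {
        // A direct executor runs each task on the calling thread, which keeps
        // this demo deterministic. A real app would pass a thread pool and a
        // main-thread executor instead.
        Executor direct = Runnable::run;
        loadAsync(
                "http://example.com/feed",
                url -> "<h3>Loaded " + url + "</h3>",
                direct,
                direct,
                System.out::println).join();
    }
}
```

<p>Passing the executors in as parameters keeps the threading policy out of the
loading code, which also makes the flow easy to exercise in a unit test.</p>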

<p>Below is the method <code>loadXmlFromNetwork()</code> that is invoked from
<code>DownloadXmlTask</code>. It does the following:</p>

<ol>

  <li>Instantiates a <code>StackOverflowXmlParser</code>. It also creates variables for
a {@link java.util.List} of <code>Entry</code> objects (<code>entries</code>), and
<code>title</code>, <code>url</code>, and <code>summary</code>, to hold the
values extracted from the XML feed for those fields.</li>

  <li>Calls <code>downloadUrl()</code>, which fetches the feed and returns it as
an {@link java.io.InputStream}.</li>

  <li>Uses <code>StackOverflowXmlParser</code> to parse the {@link java.io.InputStream}.
<code>StackOverflowXmlParser</code> populates a
{@link java.util.List} of <code>entries</code> with data from the feed.</li>

  <li>Processes the <code>entries</code> {@link java.util.List},
and combines the feed data with HTML markup.</li>

  <li>Returns an HTML string that is displayed in the main activity
UI by the {@link android.os.AsyncTask} method {@link
android.os.AsyncTask#onPostExecute onPostExecute()}.</li>

</ol>

<pre>
// Downloads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException {
    InputStream stream = null;
    // Instantiate the parser
    StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser();
    List&lt;Entry&gt; entries = null;
    String title = null;
    String url = null;
    String summary = null;
    Calendar rightNow = Calendar.getInstance();
    DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa");

    // Checks whether the user set the preference to include summary text
    SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this);
    boolean pref = sharedPrefs.getBoolean("summaryPref", false);

    StringBuilder htmlString = new StringBuilder();
    htmlString.append("&lt;h3&gt;" + getResources().getString(R.string.page_title) + "&lt;/h3&gt;");
    htmlString.append("&lt;em&gt;" + getResources().getString(R.string.updated) + " " +
            formatter.format(rightNow.getTime()) + "&lt;/em&gt;");

    try {
        stream = downloadUrl(urlString);
        entries = stackOverflowXmlParser.parse(stream);
    // Makes sure that the InputStream is closed after the app is
    // finished using it.
    } finally {
        if (stream != null) {
            stream.close();
        }
    }

    // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
    // Each Entry object represents a single post in the XML feed.
    // This section processes the entries list to combine each entry with HTML markup.
    // Each entry is displayed in the UI as a link that optionally includes
    // a text summary.
    for (Entry entry : entries) {
        htmlString.append("&lt;p&gt;&lt;a href='");
        htmlString.append(entry.link);
        htmlString.append("'&gt;" + entry.title + "&lt;/a&gt;&lt;/p&gt;");
        // If the user set the preference to include summary text,
        // adds it to the display.
        if (pref) {
            htmlString.append(entry.summary);
        }
    }
    return htmlString.toString();
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
private InputStream downloadUrl(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(10000 /* milliseconds */);
    conn.setConnectTimeout(15000 /* milliseconds */);
    conn.setRequestMethod("GET");
    conn.setDoInput(true);
    // Starts the query
    conn.connect();
    return conn.getInputStream();
}</pre>
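
<p>The step that combines entries with HTML markup can also be tried on its own,
without a network connection. The sketch below isolates that loop under a few
assumptions: <code>EntryHtmlSketch</code>, <code>render()</code>, and
<code>escapeHtml()</code> are hypothetical names, and escaping the title and
link before embedding them is an addition that the sample app does not perform.
The summary is appended unchanged, as in the sample, because the feed declares
it as HTML.</p>

```java
import java.util.Collections;
import java.util.List;

class EntryHtmlSketch {
    // Minimal stand-in for the parser's Entry class.
    static final class Entry {
        final String title;
        final String summary;
        final String link;

        Entry(String title, String summary, String link) {
            this.title = title;
            this.summary = summary;
            this.link = link;
        }
    }

    // Replaces the characters that are significant in HTML text and attributes.
    static String escapeHtml(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Renders each entry as a link, optionally followed by its summary.
    static String render(List<Entry> entries, boolean includeSummary) {
        StringBuilder html = new StringBuilder();
        for (Entry entry : entries) {
            html.append("<p><a href='").append(escapeHtml(entry.link)).append("'>")
                    .append(escapeHtml(entry.title)).append("</a></p>");
            if (includeSummary && entry.summary != null) {
                html.append(entry.summary);
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        Entry entry = new Entry("Where is my data file?",
                "<p>I have an Application that requires a data file...</p>",
                "http://stackoverflow.com/questions/9439999/where-is-my-data-file");
        System.out.println(render(Collections.singletonList(entry), true));
    }
}
```

<p>Keeping the rendering in a method that takes a plain list makes it possible
to check the generated markup with hand-built entries before wiring it to the
downloaded feed.</p>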