Comment Example
===============

.. contents::

Introduction
------------

This is an example of how to write WSGI middleware with WebOb.  The
specific example adds a simple comment form to HTML web pages: any
HTML page served through the middleware gets a comment form added to
it and shows any existing comments.

Code
----

The finished code for this is available in
`docs/comment-example-code/example.py
<https://github.com/Pylons/webob/blob/master/docs/comment-example-code/example.py>`_
-- you can run that file as a script to try it out.

Instantiating Middleware
------------------------

Middleware of any complexity at all is usually best created as a
class with its configuration as arguments to that class.

Every middleware needs an application (``app``) that it wraps.  This
middleware also needs a location to store the comments; we'll put them
all in a single directory.

.. code-block:: python

    import os

    class Commenter(object):
        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

When you use this middleware, you'll use it like:

.. code-block:: python

    app = ... make the application ...
    app = Commenter(app, storage_dir='./comments')

For our application we'll use a simple static file server that is
included with `Paste <http://pythonpaste.org>`_ (use ``pip install
Paste`` to install this).  The setup is all at the bottom of
``example.py``, and looks like this:

.. code-block:: python

    if __name__ == '__main__':
        import optparse
        parser = optparse.OptionParser(
            usage='%prog --port=PORT BASE_DIRECTORY'
            )
        parser.add_option(
            '-p', '--port',
            default='8080',
            dest='port',
            type='int',
            help='Port to serve on (default 8080)')
        parser.add_option(
            '--comment-data',
            default='./comments',
            dest='comment_data',
            help='Place to put comment data into (default ./comments/)')
        options, args = parser.parse_args()
        if not args:
            parser.error('You must give a BASE_DIRECTORY')
        base_dir = args[0]
        # Serve static files from base_dir, wrapped in the middleware
        from paste.urlparser import StaticURLParser
        app = StaticURLParser(base_dir)
        app = Commenter(app, options.comment_data)
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', options.port, app)
        print('Serving on http://localhost:%s' % options.port)
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print('^C')

In short, it takes some options, creates an application that serves
static files (``StaticURLParser(base_dir)``), wraps it with
``Commenter(app, options.comment_data)``, and then serves that.

The Middleware
--------------

While we've created the class structure for the middleware, it doesn't
actually do anything.  Here's a kind of minimal version of the
middleware (using WebOb):

.. code-block:: python

    import os

    from webob import Request

    class Commenter(object):

        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

        def __call__(self, environ, start_response):
            # Wrap the environ, call the wrapped application, and
            # replay its response unchanged
            req = Request(environ)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

This doesn't modify the response in any way.  You could write it like
this without WebOb:

.. code-block:: python

    class Commenter(object):
        ...
        def __call__(self, environ, start_response):
            return self.app(environ, start_response)

But it won't be as convenient later.  First, let's create a little bit
of infrastructure for our middleware.  We need to save and load
per-URL data (the comments themselves).  We'll keep them in pickles,
where each URL has a pickle named after the URL (but URL-quoted, so
``http://localhost:8080/index.html`` becomes
``http%3A%2F%2Flocalhost%3A8080%2Findex.html``).

.. code-block:: python

    import os
    import urllib
    from cPickle import load, dump

    class Commenter(object):
        ...

        def get_data(self, url):
            filename = self.url_filename(url)
            if not os.path.exists(filename):
                # No comments have been saved for this URL yet
                return []
            else:
                f = open(filename, 'rb')
                data = load(f)
                f.close()
                return data

        def save_data(self, url, data):
            filename = self.url_filename(url)
            f = open(filename, 'wb')
            dump(data, f)
            f.close()

        def url_filename(self, url):
            # URL-quoting with no "safe" characters (so "/" and ":" are
            # escaped too) makes the filename safe
            return os.path.join(self.storage_dir, urllib.quote(url, ''))

You can get the full request URL with ``req.url``, so to get the
comment data with these methods you do ``data =
self.get_data(req.url)``.
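
To make the naming scheme concrete, here's a minimal standalone sketch
of the quoting step.  It assumes Python 3, where the function is
spelled ``urllib.parse.quote`` (``urllib.quote`` is the Python 2
spelling); the free function ``url_filename`` and the ``comments``
directory are only for illustration.

.. code-block:: python

    import os
    from urllib.parse import quote

    def url_filename(storage_dir, url):
        # safe='' means that even "/" and ":" are percent-encoded, so
        # the whole URL collapses into a single path component.
        return os.path.join(storage_dir, quote(url, safe=''))

    name = url_filename('comments', 'http://localhost:8080/index.html')
    print(os.path.basename(name))
    # prints: http%3A%2F%2Flocalhost%3A8080%2Findex.html

Because the slashes are escaped, no URL can point outside the storage
directory or create subdirectories in it.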

Now we'll update the ``__call__`` method to filter *some* responses,
and get the comment data for those.  We don't want to change responses
that weren't successful (anything but ``200``), nor do we want to
filter responses that aren't HTML.  So we get:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                # Pass everything else through untouched
                return resp(environ, start_response)
            data = self.get_data(req.url)
            # ... do stuff with data, update resp ...
            return resp(environ, start_response)

So far we're punting on actually adding the comments to the page.  We
also haven't defined what ``data`` will hold.  Let's say it's a list
of dictionaries, where each dictionary looks like ``{'name': 'John
Doe', 'homepage': 'http://blog.johndoe.com', 'comments': 'Great
site!'}``.
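
As a rough illustration of that shape, the sketch below builds one
such list and round-trips it through ``pickle`` in memory (Python 3,
standard library only).  The ``time`` key is an addition to the
description above: the formatting and processing code later in this
document stores a ``time.gmtime()`` value with each comment, so it is
included here with a fixed timestamp.

.. code-block:: python

    import pickle
    import time

    comments = [
        {
            'name': 'John Doe',
            'homepage': 'http://blog.johndoe.com',
            'comments': 'Great site!',
            # A time.struct_time; it pickles without any extra work
            'time': time.gmtime(0),
        },
    ]

    # Round-trip through pickle, much as save_data/get_data do via a file
    restored = pickle.loads(pickle.dumps(comments))
    print(restored[0]['name'])
    print(time.strftime('%Y-%m-%d', restored[0]['time']))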

We'll also need a simple method to add stuff to the page.  We'll use a
regular expression to find the end of the page and put text in:

.. code-block:: python

    import re

    class Commenter(object):
        ...

        _end_body_re = re.compile(r'</body.*?>', re.I|re.S)

        def add_to_end(self, html, extra_html):
            """
            Adds extra_html to the end of the html page (before </body>)
            """
            match = self._end_body_re.search(html)
            if not match:
                return html + extra_html
            else:
                return html[:match.start()] + extra_html + html[match.start():]
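
To see how the regular expression behaves, here's the same logic as a
standalone function working on plain text strings (Python 3).  The two
sample pages are made up for the demonstration.

.. code-block:: python

    import re

    _end_body_re = re.compile(r'</body.*?>', re.I | re.S)

    def add_to_end(html, extra_html):
        match = _end_body_re.search(html)
        if not match:
            # No closing body tag at all: just append
            return html + extra_html
        return html[:match.start()] + extra_html + html[match.start():]

    # The match is case-insensitive, so </BODY> is found too
    with_body = add_to_end('<html><body><p>Hi</p></BODY></html>', '<hr>')
    without_body = add_to_end('<p>fragment</p>', '<hr>')
    print(with_body)     # <html><body><p>Hi</p><hr></BODY></html>
    print(without_body)  # <p>fragment</p><hr>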

And then we'll use ``add_to_end`` like this:

.. code-block:: python

    # Inside __call__, once we know the response is a 200 HTML page:
    data = self.get_data(req.url)
    body = resp.body
    body = self.add_to_end(body, self.format_comments(data))
    resp.body = body
    return resp(environ, start_response)

We get the body, update it, and put it back in the response.  This
also updates ``Content-Length``.  Then we define:

.. code-block:: python

    import time

    from webob import html_escape

    class Commenter(object):
        ...

        def format_comments(self, comments):
            if not comments:
                return ''
            text = []
            text.append('<hr>')
            text.append('<h2><a name="comment-area"></a>Comments (%s):</h2>' % len(comments))
            for comment in comments:
                text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                    html_escape(comment['homepage']), html_escape(comment['name']),
                    time.strftime('%c', comment['time'])))
                # Susceptible to XSS attacks!:
                text.append(comment['comments'])
            return ''.join(text)

We put in a header (with an anchor we'll use later), and a section for
each comment.  Note that ``html_escape`` works like ``cgi.escape``
and just turns ``&`` into ``&amp;``, etc.

Because we put in the comment text without quoting it, the page is
susceptible to a `Cross-Site Scripting
<http://en.wikipedia.org/wiki/Cross-site_scripting>`_ attack.  Fixing
that is beyond the scope of this tutorial; you could quote it or clean
it with something like `lxml.html.clean
<http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html>`_.
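
As a small sketch of the "quote it" option, the standard library's
``html.escape`` (Python 3) can be applied to the comment text before
it goes into the page.  The helper name ``format_comment_body`` is
hypothetical and not part of the example code; this also means
commenters can't use any markup at all.

.. code-block:: python

    from html import escape

    def format_comment_body(text):
        # Escape &, <, > and quotes so the comment is displayed as
        # text instead of being interpreted as markup.
        return '<p>%s</p>' % escape(text)

    safe = format_comment_body('<script>alert("hi")</script>')
    print(safe)
    # prints: <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>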

Accepting Comments
------------------

All of those pieces *display* comments, but still no one can actually
make comments.  To handle this we'll take a little piece of the URL
space for our own, everything under ``/.comments``, so when someone
POSTs there it will add a comment.

When the request comes in there are two parts to the path:
``SCRIPT_NAME`` and ``PATH_INFO``.  Everything in ``SCRIPT_NAME`` has
already been parsed, and everything in ``PATH_INFO`` has yet to be
parsed.  That means that the URL *without* ``PATH_INFO`` is the path
to the middleware; we can intercept anything else below
``SCRIPT_NAME`` but nothing above it.  The name for the URL without
``PATH_INFO`` is ``req.application_url``.  We have to capture it early
to make sure it doesn't change (since the WSGI application we are
wrapping may update ``SCRIPT_NAME`` and ``PATH_INFO``).

So here's what this all looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.path_info_peek() == '.comments':
                return self.process_comment(req)(environ, start_response)
            # This is the base path of *this* middleware:
            base_url = req.application_url
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                # Not a successful HTML response, we don't want to
                # do anything to it
                return resp(environ, start_response)
            # Make sure the content isn't gzipped:
            resp.decode_content()
            comments = self.get_data(req.url)
            body = resp.body
            body = self.add_to_end(body, self.format_comments(comments))
            body = self.add_to_end(body, self.submit_form(base_url, req))
            resp.body = body
            return resp(environ, start_response)

``base_url`` is the URL where the middleware is located (if you run
the example server, it will be ``http://localhost:PORT``).  We use
``req.path_info_peek()`` to look at the next segment of the URL --
what comes after ``base_url``.  If it is ``.comments`` then we handle
it internally and don't pass the request on.

We also put in a little guard, ``resp.decode_content()``, in case the
application returns a gzipped response.

Then we get the data, add the comments, add the *form* to make new
comments, and return the result.
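
The routing decision at the top of ``__call__`` can be sketched
without WebOb.  The helpers ``peek_segment`` and ``route`` below are
hypothetical names that only mimic the idea of looking at the first
segment of ``PATH_INFO`` without consuming it; they are not WebOb's
implementation.

.. code-block:: python

    def peek_segment(environ):
        # Return the next path segment without modifying environ,
        # or None when there is nothing left to parse.
        path = environ.get('PATH_INFO', '')
        if not path:
            return None
        return path.lstrip('/').split('/', 1)[0]

    def route(environ):
        # Claim everything under /.comments; pass the rest on
        if peek_segment(environ) == '.comments':
            return 'process_comment'
        return 'wrapped application'

    print(route({'PATH_INFO': '/.comments'}))    # process_comment
    print(route({'PATH_INFO': '/index.html'}))   # wrapped application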

submit_form
~~~~~~~~~~~

Here's what the form looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def submit_form(self, base_path, req):
            return '''<h2>Leave a comment:</h2>
            <form action="%s/.comments" method="POST">
             <input type="hidden" name="url" value="%s">
             <table width="100%%">
              <tr><td>Name:</td>
                  <td><input type="text" name="name" style="width: 100%%"></td></tr>
              <tr><td>URL:</td>
                  <td><input type="text" name="homepage" style="width: 100%%"></td></tr>
             </table>
             Comments:<br>
             <textarea name="comments" rows=10 style="width: 100%%"></textarea><br>
             <input type="submit" value="Submit comment">
            </form>
            ''' % (base_path, html_escape(req.url))

Nothing too exciting.  It submits a form with the keys ``url`` (the
URL being commented on), ``name``, ``homepage``, and ``comments``.
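
For a feel of what arrives at ``/.comments``, this sketch encodes
those four keys the way a browser would for a default form POST and
parses them back with the standard library (Python 3).  The field
values are invented for the demonstration.

.. code-block:: python

    from urllib.parse import parse_qs, urlencode

    fields = {
        'url': 'http://localhost:8080/index.html',
        'name': 'John Doe',
        'homepage': 'http://blog.johndoe.com',
        'comments': 'Great site!',
    }
    # The application/x-www-form-urlencoded request body
    body = urlencode(fields)
    # parse_qs maps every key to a *list* of values
    parsed = parse_qs(body)
    print(sorted(parsed))    # ['comments', 'homepage', 'name', 'url']
    print(parsed['url'][0])  # http://localhost:8080/index.html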

process_comment
~~~~~~~~~~~~~~~

If you look at how ``__call__`` invokes this method, we call the
method and then treat the result as a WSGI application:

.. code-block:: python

    return self.process_comment(req)(environ, start_response)

You could write this as:

.. code-block:: python

    response = self.process_comment(req)
    return response(environ, start_response)

A common pattern in WSGI middleware that *doesn't* use WebOb is to
just do:

.. code-block:: python

    return self.process_comment(environ, start_response)

But the WebOb style makes it easier to modify the response if you want
to; modifying a traditional WSGI response/application output requires
changing your logic flow considerably.
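
To give an idea of what that change in logic flow involves, here is a
rough sketch of response-modifying middleware in plain WSGI (Python
3).  ``Footer`` and ``hello_app`` are hypothetical; the sketch buffers
the whole body and ignores the legacy ``write()`` callable that
``start_response`` is supposed to return.

.. code-block:: python

    def hello_app(environ, start_response):
        body = b'<html><body>Hello</body></html>'
        start_response('200 OK', [('Content-Type', 'text/html'),
                                  ('Content-Length', str(len(body)))])
        return [body]

    class Footer(object):
        def __init__(self, app, footer=b'<!-- footer -->'):
            self.app = app
            self.footer = footer

        def __call__(self, environ, start_response):
            captured = {}

            def capture(status, headers, exc_info=None):
                # Hold on to the status and headers instead of sending them
                captured['status'] = status
                captured['headers'] = headers

            app_iter = self.app(environ, capture)
            try:
                body = b''.join(app_iter)
            finally:
                if hasattr(app_iter, 'close'):
                    app_iter.close()
            body += self.footer
            # Content-Length has to be recomputed by hand
            headers = [(k, v) for k, v in captured['headers']
                       if k.lower() != 'content-length']
            headers.append(('Content-Length', str(len(body))))
            start_response(captured['status'], headers)
            return [body]

    sent = {}

    def fake_start_response(status, headers, exc_info=None):
        sent['status'] = status
        sent['headers'] = dict(headers)

    result = Footer(hello_app)({}, fake_start_response)
    print(sent['status'], sent['headers']['Content-Length'])

With WebOb, the capturing, joining and header bookkeeping is what
``req.get_response(self.app)`` and assigning to ``resp.body`` take
care of.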

Here's the actual processing code:

.. code-block:: python

    import time

    from webob import exc
    from webob import Response

    class Commenter(object):
        ...

        def process_comment(self, req):
            try:
                url = req.params['url']
                name = req.params['name']
                homepage = req.params['homepage']
                comments = req.params['comments']
            except KeyError as e:
                resp = exc.HTTPBadRequest('Missing parameter: %s' % e)
                return resp
            # Append the new comment to whatever is stored for this URL
            data = self.get_data(url)
            data.append(dict(
                name=name,
                homepage=homepage,
                comments=comments,
                time=time.gmtime()))
            self.save_data(url, data)
            # Send the browser back to the comments on the original page
            resp = exc.HTTPSeeOther(location=url+'#comment-area')
            return resp

We either give a Bad Request response (if the form submission is
somehow malformed), or a redirect back to the original page.

The classes in ``webob.exc`` (like ``HTTPBadRequest`` and
``HTTPSeeOther``) are Response subclasses that can be used to quickly
create responses for these non-200 cases where the response body
usually doesn't matter much.
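
For comparison, here is roughly what producing those two responses by
hand looks like in plain WSGI (Python 3).  ``see_other`` and
``bad_request`` are hypothetical stand-ins written for this sketch,
not part of WebOb, and the bodies are deliberately minimal.

.. code-block:: python

    def see_other(location):
        def app(environ, start_response):
            body = b'See other'
            start_response('303 See Other', [
                ('Location', location),
                ('Content-Type', 'text/plain'),
                ('Content-Length', str(len(body))),
            ])
            return [body]
        return app

    def bad_request(message):
        def app(environ, start_response):
            body = message.encode('utf-8')
            start_response('400 Bad Request', [
                ('Content-Type', 'text/plain; charset=utf-8'),
                ('Content-Length', str(len(body))),
            ])
            return [body]
        return app

    seen = []

    def record(status, headers, exc_info=None):
        seen.append((status, dict(headers)))

    see_other('http://localhost:8080/index.html#comment-area')({}, record)
    error_body = bad_request('Missing parameter: name')({}, record)
    print(seen[0][0], seen[0][1]['Location'])
    print(seen[1][0], error_body)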

Conclusion
----------

This shows how to make response-modifying middleware, which is
probably the most difficult kind of middleware to write with WSGI --
modifying the request is quite simple in comparison, as you simply
update ``environ``.
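
As a point of comparison, a request-modifying middleware can be as
small as the following sketch (Python 3).  ``ForceScheme`` and
``show_scheme`` are hypothetical names invented for this illustration.

.. code-block:: python

    class ForceScheme(object):
        def __init__(self, app, scheme='https'):
            self.app = app
            self.scheme = scheme

        def __call__(self, environ, start_response):
            # Change the request, then hand it straight to the wrapped app
            environ['wsgi.url_scheme'] = self.scheme
            return self.app(environ, start_response)

    def show_scheme(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [environ['wsgi.url_scheme'].encode('ascii')]

    result = ForceScheme(show_scheme)(
        {'wsgi.url_scheme': 'http'}, lambda status, headers: None)
    print(result)  # [b'https']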