WebOb/docs/comment-example.txt

Comment Example
===============

.. contents::

Introduction
------------

This is an example of how to write WSGI middleware with WebOb.  The
specific example adds a simple comment form to HTML web pages; any
page served through the middleware that is HTML gets a comment form
added to it, and shows any existing comments.

Code
----

The finished code for this is available in
`docs/comment-example-code/example.py
<https://github.com/Pylons/webob/blob/master/docs/comment-example-code/example.py>`_
-- you can run that file as a script to try it out.

Instantiating Middleware
------------------------

Middleware of any complexity at all is usually best created as a
class with its configuration as arguments to that class.

Every middleware needs an application (``app``) that it wraps.  This
middleware also needs a location to store the comments; we'll put them
all in a single directory.

.. code-block:: python

    import os

    class Commenter(object):
        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

When you use this middleware, you'll use it like:

.. code-block:: python

    app = ... make the application ...
    app = Commenter(app, storage_dir='./comments')

For our application we'll use a simple static file server that is
included with `Paste <http://pythonpaste.org>`_ (use ``easy_install
Paste`` to install this).  The setup is all at the bottom of
``example.py``, and looks like this:

.. code-block:: python

    if __name__ == '__main__':
        import optparse
        parser = optparse.OptionParser(
            usage='%prog --port=PORT BASE_DIRECTORY'
            )
        parser.add_option(
            '-p', '--port',
            default='8080',
            dest='port',
            type='int',
            help='Port to serve on (default 8080)')
        parser.add_option(
            '--comment-data',
            default='./comments',
            dest='comment_data',
            help='Place to put comment data into (default ./comments/)')
        options, args = parser.parse_args()
        if not args:
            parser.error('You must give a BASE_DIRECTORY')
        base_dir = args[0]
        from paste.urlparser import StaticURLParser
        app = StaticURLParser(base_dir)
        app = Commenter(app, options.comment_data)
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', options.port, app)
        print 'Serving on http://localhost:%s' % options.port
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print '^C'

I won't explain it here, but basically it takes some options, creates
an application that serves static files
(``StaticURLParser(base_dir)``), wraps it with ``Commenter(app,
options.comment_data)`` then serves that.

The Middleware
--------------

While we've created the class structure for the middleware, it doesn't
actually do anything.  Here's a kind of minimal version of the
middleware (using WebOb):

.. code-block:: python

    from webob import Request

    class Commenter(object):

        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

This doesn't modify the response it any way.  You could write it like
this without WebOb:

.. code-block:: python

    class Commenter(object):
        ...
        def __call__(self, environ, start_response):
            return self.app(environ, start_response)

But it won't be as convenient later.  First, lets create a little bit
of infrastructure for our middleware.  We need to save and load
per-url data (the comments themselves).  We'll keep them in pickles,
where each url has a pickle named after the url (but double-quoted, so
``http://localhost:8080/index.html`` becomes
``http%3A%2F%2Flocalhost%3A8080%2Findex.html``).

.. code-block:: python

    from cPickle import load, dump

    class Commenter(object):
        ...

        def get_data(self, url):
            filename = self.url_filename(url)
            if not os.path.exists(filename):
                return []
            else:
                f = open(filename, 'rb')
                data = load(f)
                f.close()
                return data

        def save_data(self, url, data):
            filename = self.url_filename(url)
            f = open(filename, 'wb')
            dump(data, f)
            f.close()

        def url_filename(self, url):
            # Double-quoting makes the filename safe
            return os.path.join(self.storage_dir, urllib.quote(url, ''))

You can get the full request URL with ``req.url``, so to get the
comment data with these methods you do ``data =
self.get_data(req.url)``.

Now we'll update the ``__call__`` method to filter *some* responses,
and get the comment data for those.  We don't want to change responses
that were error responses (anything but ``200``), nor do we want to
filter responses that aren't HTML.  So we get:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                return resp(environ, start_response)
            data = self.get_data(req.url)
            ... do stuff with data, update resp ...
            return resp(environ, start_response)

So far we're punting on actually adding the comments to the page.  We
also haven't defined what ``data`` will hold.  Let's say it's a list
of dictionaries, where each dictionary looks like ``{'name': 'John
Doe', 'homepage': 'http://blog.johndoe.com', 'comments': 'Great
site!'}``.

We'll also need a simple method to add stuff to the page.  We'll use a
regular expression to find the end of the page and put text in:

.. code-block:: python

    import re

    class Commenter(object):
        ...

        _end_body_re = re.compile(r'</body.*?>', re.I|re.S)

        def add_to_end(self, html, extra_html):
            """
            Adds extra_html to the end of the html page (before </body>)
            """
            match = self._end_body_re.search(html)
            if not match:
                return html + extra_html
            else:
                return html[:match.start()] + extra_html + html[match.start():]

And then we'll use it like:

.. code-block:: python

    data = self.get_data(req.url)
    body = resp.body
    body = self.add_to_end(body, self.format_comments(data))
    resp.body = body
    return resp(environ, start_response)

We get the body, update it, and put it back in the response.  This
also updates ``Content-Length``.  Then we define:

.. code-block:: python

    from webob import html_escape

    class Commenter(object):
        ...

        def format_comments(self, comments):
            if not comments:
                return ''
            text = []
            text.append('<hr>')
            text.append('<h2><a name="comment-area"></a>Comments (%s):</h2>' % len(comments))
            for comment in comments:
                text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                    html_escape(comment['homepage']), html_escape(comment['name']),
                    time.strftime('%c', comment['time'])))
                # Susceptible to XSS attacks!:
                text.append(comment['comments'])
            return ''.join(text)

We put in a header (with an anchor we'll use later), and a section for
each comment.  Note that ``html_escape`` is the same as ``cgi.escape``
and just turns ``&`` into ``&amp;``, etc.

Because we put in some text without quoting it is susceptible to a
`Cross-Site Scripting
<http://en.wikipedia.org/wiki/Cross-site_scripting>`_ attack.  Fixing
that is beyond the scope of this tutorial; you could quote it or clean
it with something like `lxml.html.clean
<http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html>`_.

Accepting Comments
------------------

All of those pieces *display* comments, but still no one can actually
make comments.  To handle this we'll take a little piece of the URL
space for our own, everything under ``/.comments``, so when someone
POSTs there it will add a comment.

When the request comes in there are two parts to the path:
``SCRIPT_NAME`` and ``PATH_INFO``.  Everything in ``SCRIPT_NAME`` has
already been parsed, and everything in ``PATH_INFO`` has yet to be
parsed.  That means that the URL *without* ``PATH_INFO`` is the path
to the middleware; we can intercept anything else below
``SCRIPT_NAME`` but nothing above it.  The name for the URL without
``PATH_INFO`` is ``req.application_url``.  We have to capture it early
to make sure it doesn't change (since the WSGI application we are
wrapping may update ``SCRIPT_NAME`` and ``PATH_INFO``).

So here's what this all looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.path_info_peek() == '.comments':
                return self.process_comment(req)(environ, start_response)
            # This is the base path of *this* middleware:
            base_url = req.application_url
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                # Not an HTML response, we don't want to
                # do anything to it
                return resp(environ, start_response)
            # Make sure the content isn't gzipped:
            resp.decode_content()
            comments = self.get_data(req.url)
            body = resp.body
            body = self.add_to_end(body, self.format_comments(comments))
            body = self.add_to_end(body, self.submit_form(base_url, req))
            resp.body = body
            return resp(environ, start_response)

``base_url`` is the path where the middleware is located (if you run
the example server, it will be ``http://localhost:PORT/``).  We use
``req.path_info_peek()`` to look at the next segment of the URL --
what comes after base_url.  If it is ``.comments`` then we handle it
internally and don't pass the request on.

We also put in a little guard, ``resp.decode_content()`` in case the
application returns a gzipped response.

Then we get the data, add the comments, add the *form* to make new
comments, and return the result.

submit_form
~~~~~~~~~~~

Here's what the form looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def submit_form(self, base_path, req):
            return '''<h2>Leave a comment:</h2>
            <form action="%s/.comments" method="POST">
             <input type="hidden" name="url" value="%s">
             <table width="100%%">
              <tr><td>Name:</td>
                  <td><input type="text" name="name" style="width: 100%%"></td></tr>
              <tr><td>URL:</td>
                  <td><input type="text" name="homepage" style="width: 100%%"></td></tr>
             </table>
             Comments:<br>
             <textarea name="comments" rows=10 style="width: 100%%"></textarea><br>
             <input type="submit" value="Submit comment">
            </form>
            ''' % (base_path, html_escape(req.url))

Nothing too exciting.  It submits a form with the keys ``url`` (the
URL being commented on), ``name``, ``homepage``, and ``comments``.

process_comment
~~~~~~~~~~~~~~~

If you look at the method call, what we do is call the method then
treat the result as a WSGI application:

.. code-block:: python

    return self.process_comment(req)(environ, start_response)

You could write this as:

.. code-block:: python

    response = self.process_comment(req)
    return response(environ, start_response)

A common pattern in WSGI middleware that *doesn't* use WebOb is to
just do:

.. code-block:: python

    return self.process_comment(environ, start_response)

But the WebOb style makes it easier to modify the response if you want
to; modifying a traditional WSGI response/application output requires
changing your logic flow considerably.

Here's the actual processing code:

.. code-block:: python

    from webob import exc
    from webob import Response

    class Commenter(object):
        ...

        def process_comment(self, req):
            try:
                url = req.params['url']
                name = req.params['name']
                homepage = req.params['homepage']
                comments = req.params['comments']
            except KeyError, e:
                resp = exc.HTTPBadRequest('Missing parameter: %s' % e)
                return resp
            data = self.get_data(url)
            data.append(dict(
                name=name,
                homepage=homepage,
                comments=comments,
                time=time.gmtime()))
            self.save_data(url, data)
            resp = exc.HTTPSeeOther(location=url+'#comment-area')
            return resp

We either give a Bad Request response (if the form submission is
somehow malformed), or a redirect back to the original page.

The classes in ``webob.exc`` (like ``HTTPBadRequest`` and
``HTTPSeeOther``) are Response subclasses that can be used to quickly
create responses for these non-200 cases where the response body
usually doesn't matter much.

Conclusion
----------

This shows how to make response modifying middleware, which is
probably the most difficult kind of middleware to write with WSGI --
modifying the request is quite simple in comparison, as you simply
update ``environ``.