Comment Example
===============

.. contents::

Introduction
------------

This is an example of how to write WSGI middleware with WebOb. The
specific example adds a simple comment form to HTML web pages; any
HTML page served through the middleware gets a comment form added to
it and shows any existing comments.

Code
----

The finished code for this is available in
`docs/comment-example-code/example.py
<https://github.com/Pylons/webob/blob/master/docs/comment-example-code/example.py>`_
-- you can run that file as a script to try it out.

Instantiating Middleware
------------------------

Middleware of any complexity at all is usually best created as a
class with its configuration as arguments to that class.

Every middleware needs an application (``app``) that it wraps. This
middleware also needs a location to store the comments; we'll put them
all in a single directory.

.. code-block:: python

    import os

    class Commenter(object):
        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

When you use this middleware, you'll use it like:

.. code-block:: python

    app = ... make the application ...
    app = Commenter(app, storage_dir='./comments')

For our application we'll use a simple static file server that is
included with `Paste <http://pythonpaste.org>`_ (use ``pip install
Paste`` to install this). The setup is all at the bottom of
``example.py``, and looks like this:

.. code-block:: python

    if __name__ == '__main__':
        import optparse
        parser = optparse.OptionParser(
            usage='%prog --port=PORT BASE_DIRECTORY'
            )
        parser.add_option(
            '-p', '--port',
            default='8080',
            dest='port',
            type='int',
            help='Port to serve on (default 8080)')
        parser.add_option(
            '--comment-data',
            default='./comments',
            dest='comment_data',
            help='Place to put comment data into (default ./comments/)')
        options, args = parser.parse_args()
        if not args:
            parser.error('You must give a BASE_DIRECTORY')
        base_dir = args[0]
        from paste.urlparser import StaticURLParser
        app = StaticURLParser(base_dir)
        app = Commenter(app, options.comment_data)
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', options.port, app)
        print('Serving on http://localhost:%s' % options.port)
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print('^C')

I won't explain it in detail here, but basically it takes some
options, creates an application that serves static files
(``StaticURLParser(base_dir)``), wraps it with ``Commenter(app,
options.comment_data)``, and then serves that.

The Middleware
--------------

While we've created the class structure for the middleware, it doesn't
actually do anything. Here's a minimal version of the middleware
(using WebOb):

.. code-block:: python

    import os

    from webob import Request

    class Commenter(object):

        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

This doesn't modify the response in any way. You could write it like
this without WebOb:

.. code-block:: python

    class Commenter(object):
        ...
        def __call__(self, environ, start_response):
            return self.app(environ, start_response)

But it won't be as convenient later. First, let's create a little bit
of infrastructure for our middleware. We need to save and load
per-URL data (the comments themselves). We'll keep them in pickles,
where each URL has a pickle named after the URL (but URL-quoted with
no safe characters, so ``http://localhost:8080/index.html`` becomes
``http%3A%2F%2Flocalhost%3A8080%2Findex.html``).

.. code-block:: python

    import urllib
    from cPickle import load, dump

    class Commenter(object):
        ...

        def get_data(self, url):
            filename = self.url_filename(url)
            if not os.path.exists(filename):
                # No comments have been saved for this URL yet
                return []
            with open(filename, 'rb') as f:
                return load(f)

        def save_data(self, url, data):
            filename = self.url_filename(url)
            with open(filename, 'wb') as f:
                dump(data, f)

        def url_filename(self, url):
            # Quoting every reserved character (safe='') makes the
            # filename safe
            return os.path.join(self.storage_dir, urllib.quote(url, ''))

You can get the full request URL with ``req.url``, so to get the
comment data with these methods you do ``data =
self.get_data(req.url)``.

Now we'll update the ``__call__`` method to filter *some* responses,
and get the comment data for those. We don't want to change responses
that weren't successful (anything but ``200``), nor do we want to
filter responses that aren't HTML. So we get:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                return resp(environ, start_response)
            data = self.get_data(req.url)
            ... do stuff with data, update resp ...
            return resp(environ, start_response)

So far we're punting on actually adding the comments to the page.
We also haven't defined what ``data`` will hold. Let's say it's a list
of dictionaries, where each dictionary looks like ``{'name': 'John
Doe', 'homepage': 'http://blog.johndoe.com', 'comments': 'Great
site!'}``.

We'll also need a simple method to add stuff to the page. We'll use a
regular expression to find the end of the page and put text in:

.. code-block:: python

    import re

    class Commenter(object):
        ...

        _end_body_re = re.compile(r'</body.*?>', re.I|re.S)

        def add_to_end(self, html, extra_html):
            """
            Adds extra_html to the end of the html page (before </body>)
            """
            match = self._end_body_re.search(html)
            if not match:
                return html + extra_html
            else:
                return html[:match.start()] + extra_html + html[match.start():]

And then we'll use it like:

.. code-block:: python

    data = self.get_data(req.url)
    body = resp.body
    body = self.add_to_end(body, self.format_comments(data))
    resp.body = body
    return resp(environ, start_response)

We get the body, update it, and put it back in the response. This
also updates ``Content-Length``. Then we define:

.. code-block:: python

    import time

    from webob import html_escape

    class Commenter(object):
        ...

        def format_comments(self, comments):
            if not comments:
                return ''
            text = []
            text.append('<hr>')
            text.append('<h2><a name="comment-area"></a>Comments (%s):</h2>' % len(comments))
            for comment in comments:
                text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                    html_escape(comment['homepage']), html_escape(comment['name']),
                    time.strftime('%c', comment['time'])))
                # Susceptible to XSS attacks!:
                text.append(comment['comments'])
            return ''.join(text)

We put in a header (with an anchor we'll use later), and a section for
each comment. Note that ``html_escape`` is the same as ``cgi.escape``
and just turns ``&`` into ``&amp;``, etc.

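
To make the effect of escaping concrete, here is a minimal sketch. It
assumes Python 3 and uses the standard library's ``html.escape`` as a
stand-in for ``html_escape`` (an assumption; WebOb's helper may differ
in details). ``render_comment_header`` is a hypothetical helper for
illustration and is not part of ``example.py``.

.. code-block:: python

    import html

    def render_comment_header(name, homepage):
        """Build the <h3> line for one comment, escaping user input."""
        # quote=True also escapes double quotes, which matters for the
        # value placed inside href="..."
        return '<h3><a href="%s">%s</a>:</h3>' % (
            html.escape(homepage, quote=True),
            html.escape(name, quote=True))

    header = render_comment_header(
        'Tom & "Jerry" <tj>', 'http://example.com/?a=1&b=2')
    print(header)

The markup characters in the name and the ``&`` in the URL come out as
entities, so the browser displays them as text instead of interpreting
them.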

Because we put in some text without quoting, it is susceptible to a
`Cross-Site Scripting
<http://en.wikipedia.org/wiki/Cross-site_scripting>`_ attack. Fixing
that is beyond the scope of this tutorial; you could quote it or clean
it with something like `lxml.html.clean
<http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html>`_.

Accepting Comments
------------------

All of those pieces *display* comments, but still no one can actually
make comments. To handle this we'll take a little piece of the URL
space for our own, everything under ``/.comments``, so when someone
POSTs there it will add a comment.

When the request comes in there are two parts to the path:
``SCRIPT_NAME`` and ``PATH_INFO``. Everything in ``SCRIPT_NAME`` has
already been parsed, and everything in ``PATH_INFO`` has yet to be
parsed. That means that the URL *without* ``PATH_INFO`` is the path
to the middleware; we can intercept anything else below
``SCRIPT_NAME`` but nothing above it. The name for the URL without
``PATH_INFO`` is ``req.application_url``. We have to capture it early
to make sure it doesn't change (since the WSGI application we are
wrapping may update ``SCRIPT_NAME`` and ``PATH_INFO``).

So here's what this all looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.path_info_peek() == '.comments':
                return self.process_comment(req)(environ, start_response)
            # This is the base path of *this* middleware:
            base_url = req.application_url
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                # Not an HTML response, we don't want to
                # do anything to it
                return resp(environ, start_response)
            # Make sure the content isn't gzipped:
            resp.decode_content()
            comments = self.get_data(req.url)
            body = resp.body
            body = self.add_to_end(body, self.format_comments(comments))
            body = self.add_to_end(body, self.submit_form(base_url, req))
            resp.body = body
            return resp(environ, start_response)

``base_url`` is the path where the middleware is located (if you run
the example server, it will be ``http://localhost:PORT/``). We use
``req.path_info_peek()`` to look at the next segment of the URL --
what comes after ``base_url``. If it is ``.comments`` then we handle it
internally and don't pass the request on.

We also put in a little guard, ``resp.decode_content()``, in case the
application returns a gzipped response.

Then we get the data, add the comments, add the *form* to make new
comments, and return the result.

submit_form
~~~~~~~~~~~

Here's what the form looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def submit_form(self, base_path, req):
            return '''<h2>Leave a comment:</h2>
            <form action="%s/.comments" method="POST">
            <input type="hidden" name="url" value="%s">
            <table width="100%%">
            <tr><td>Name:</td>
                <td><input type="text" name="name" style="width: 100%%"></td></tr>
            <tr><td>URL:</td>
                <td><input type="text" name="homepage" style="width: 100%%"></td></tr>
            </table>
            Comments:<br>
            <textarea name="comments" rows=10 style="width: 100%%"></textarea><br>
            <input type="submit" value="Submit comment">
            </form>
            ''' % (base_path, html_escape(req.url))

Nothing too exciting. It submits a form with the keys ``url`` (the
URL being commented on), ``name``, ``homepage``, and ``comments``.

process_comment
~~~~~~~~~~~~~~~

If you look at the method call, what we do is call the method then
treat the result as a WSGI application:

.. code-block:: python

    return self.process_comment(req)(environ, start_response)

You could write this as:

.. code-block:: python

    response = self.process_comment(req)
    return response(environ, start_response)

A common pattern in WSGI middleware that *doesn't* use WebOb is to
just do:

.. code-block:: python

    return self.process_comment(environ, start_response)

But the WebOb style makes it easier to modify the response if you want
to; modifying a traditional WSGI response/application output requires
changing your logic flow considerably.

Here's the actual processing code:

.. code-block:: python

    from webob import exc
    from webob import Response

    class Commenter(object):
        ...

        def process_comment(self, req):
            try:
                url = req.params['url']
                name = req.params['name']
                homepage = req.params['homepage']
                comments = req.params['comments']
            except KeyError as e:
                resp = exc.HTTPBadRequest('Missing parameter: %s' % e)
                return resp
            data = self.get_data(url)
            # Requires ``import time`` at the top of the module
            data.append(dict(
                name=name,
                homepage=homepage,
                comments=comments,
                time=time.gmtime()))
            self.save_data(url, data)
            resp = exc.HTTPSeeOther(location=url+'#comment-area')
            return resp

We either give a Bad Request response (if the form submission is
somehow malformed), or a redirect back to the original page.

The classes in ``webob.exc`` (like ``HTTPBadRequest`` and
``HTTPSeeOther``) are Response subclasses that can be used to quickly
create responses for these non-200 cases where the response body
usually doesn't matter much.

Conclusion
----------

This shows how to make response-modifying middleware, which is
probably the most difficult kind of middleware to write with WSGI --
modifying the request is quite simple in comparison, as you simply
update ``environ``.
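
For comparison, a request-modifying middleware might look like the
following sketch. It uses only the standard library and assumes
Python 3; ``AddRequestMarker``, ``show_marker``, ``call_app``, and the
``example.marker`` key are hypothetical names made up for this
illustration and do not appear in ``example.py``.

.. code-block:: python

    from wsgiref.util import setup_testing_defaults

    class AddRequestMarker(object):
        """Hypothetical middleware that only modifies the request."""

        def __init__(self, app, marker):
            self.app = app
            self.marker = marker

        def __call__(self, environ, start_response):
            # Modifying the request just means updating environ
            # before handing control to the wrapped application.
            environ['example.marker'] = self.marker
            return self.app(environ, start_response)

    def show_marker(environ, start_response):
        """Tiny WSGI app that echoes the value the middleware stored."""
        body = environ.get('example.marker', 'missing').encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/plain'),
                                  ('Content-Length', str(len(body)))])
        return [body]

    def call_app(app):
        """Call a WSGI app once with a synthetic environ."""
        environ = {}
        setup_testing_defaults(environ)
        captured = {}

        def start_response(status, headers, exc_info=None):
            captured['status'] = status

        body = b''.join(app(environ, start_response))
        return captured['status'], body

    status, body = call_app(AddRequestMarker(show_marker, 'hello'))

Unlike ``Commenter``, this middleware never inspects or rewrites the
response, so it can return the wrapped application's output directly.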