1# (c) 2005 Ian Bicking and contributors; written for Paste (http://pythonpaste.org)
2# Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
3# (c) 2005 Ian Bicking, Clark C. Evans and contributors
4# This module is part of the Python Paste Project and is released under
5# the MIT License: http://www.opensource.org/licenses/mit-license.php
6# Some of this code was funded by: http://prometheusresearch.com
7"""
8HTTP Message Header Fields (see RFC 4229)
9
10This contains general support for HTTP/1.1 message headers [1]_ in a
11manner that supports WSGI ``environ`` [2]_ and ``response_headers``
12[3]_. Specifically, this module defines a ``HTTPHeader`` class whose
13instances correspond to field-name items.  The actual field-content for
14the message-header is stored in the appropriate WSGI collection (either
15the ``environ`` for requests, or ``response_headers`` for responses).
16
17Each ``HTTPHeader`` instance is a callable (defining ``__call__``)
18that takes one of the following:
19
20  - an ``environ`` dictionary, returning the corresponding header
21    value by according to the WSGI's ``HTTP_`` prefix mechanism, e.g.,
22    ``USER_AGENT(environ)`` returns ``environ.get('HTTP_USER_AGENT')``
23
24  - a ``response_headers`` list, giving a comma-delimited string for
25    each corresponding ``header_value`` tuple entries (see below).
26
27  - a sequence of string ``*args`` that are comma-delimited into
28    a single string value: ``CONTENT_TYPE("text/html","text/plain")``
29    returns ``"text/html, text/plain"``
30
31  - a set of ``**kwargs`` keyword arguments that are used to create
32    a header value, in a manner dependent upon the particular header in
33    question (to make value construction easier and error-free):
34    ``CONTENT_DISPOSITION(max_age=CONTENT_DISPOSITION.ONEWEEK)``
35    returns ``"public, max-age=60480"``
36
37Each ``HTTPHeader`` instance also provides several methods to act on
38a WSGI collection, for removing and setting header values.
39
40  ``delete(collection)``
41
42    This method removes all entries of the corresponding header from
43    the given collection (``environ`` or ``response_headers``), e.g.,
44    ``USER_AGENT.delete(environ)`` deletes the 'HTTP_USER_AGENT' entry
45    from the ``environ``.
46
47  ``update(collection, *args, **kwargs)``
48
49    This method does an in-place replacement of the given header entry,
50    for example: ``CONTENT_LENGTH(response_headers,len(body))``
51
52    The first argument is a valid ``environ`` dictionary or
53    ``response_headers`` list; remaining arguments are passed on to
54    ``__call__(*args, **kwargs)`` for value construction.
55
56  ``apply(collection, **kwargs)``
57
58    This method is similar to update, only that it may affect other
59    headers.  For example, according to recommendations in RFC 2616,
60    certain Cache-Control configurations should also set the
61    ``Expires`` header for HTTP/1.0 clients. By default, ``apply()``
62    is simply ``update()`` but limited to keyword arguments.
63
64This particular approach to managing headers within a WSGI collection
65has several advantages:
66
67  1. Typos in the header name are easily detected since they become a
68     ``NameError`` when executed.  The approach of using header strings
69     directly can be problematic; for example, the following should
70     return ``None`` : ``environ.get("HTTP_ACCEPT_LANGUAGES")``
71
72  2. For specific headers with validation, using ``__call__`` will
73     result in an automatic header value check.  For example, the
74     _ContentDisposition header will reject a value having ``maxage``
75     or ``max_age`` (the appropriate parameter is ``max-age`` ).
76
77  3. When appending/replacing headers, the field-name has the suggested
78     RFC capitalization (e.g. ``Content-Type`` or ``ETag``) for
79     user-agents that incorrectly use case-sensitive matches.
80
81  4. Some headers (such as ``Content-Type``) are 0, that is,
82     only one entry of this type may occur in a given set of
83     ``response_headers``.  This module knows about those cases and
84     enforces this cardinality constraint.
85
86  5. The exact details of WSGI header management are abstracted so
87     the programmer need not worry about operational differences
88     between ``environ`` dictionary or ``response_headers`` list.
89
90  6. Sorting of ``HTTPHeaders`` is done following the RFC suggestion
91     that general-headers come first, followed by request and response
92     headers, and finishing with entity-headers.
93
94  7. Special care is given to exceptional cases such as Set-Cookie
95     which violates the RFC's recommendation about combining header
96     content into a single entry using comma separation.
97
98A particular difficulty with HTTP message headers is a categorization
99of sorts as described in section 4.2:
100
101    Multiple message-header fields with the same field-name MAY be
102    present in a message if and only if the entire field-value for
103    that header field is defined as a comma-separated list [i.e.,
104    #(values)]. It MUST be possible to combine the multiple header
105    fields into one "field-name: field-value" pair, without changing
106    the semantics of the message, by appending each subsequent
107    field-value to the first, each separated by a comma.
108
109This creates three fundamentally different kinds of headers:
110
111  - Those that do not have a #(values) production, and hence are
112    singular and may only occur once in a set of response fields;
113    this case is handled by the ``_SingleValueHeader`` subclass.
114
115  - Those which have the #(values) production and follow the
116    combining rule outlined above; our ``_MultiValueHeader`` case.
117
118  - Those which are multi-valued, but cannot be combined (such as the
119    ``Set-Cookie`` header due to its ``Expires`` parameter); or where
120    combining them into a single header entry would cause common
121    user-agents to fail (``WWW-Authenticate``, ``Warning``) since
122    they fail to handle dates even when properly quoted. This case
123    is handled by ``_MultiEntryHeader``.
124
125Since this project does not have time to provide rigorous support
126and validation for all headers, it does a basic construction of
127headers listed in RFC 2616 (plus a few others) so that they can
128be obtained by simply doing ``from paste.httpheaders import *``;
129the name of the header instance is the "common name" less any
130dashes to give CamelCase style names.
131
132.. [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
133.. [2] http://www.python.org/peps/pep-0333.html#environ-variables
134.. [3] http://www.python.org/peps/pep-0333.html#the-start-response-callable
135
136"""
137import mimetypes
138import six
139from time import time as now
140try:
141    # Python 3
142    from email.utils import formatdate, parsedate_tz, mktime_tz
143    from urllib.request import AbstractDigestAuthHandler, parse_keqv_list, parse_http_list
144except ImportError:
145    # Python 2
146    from rfc822 import formatdate, parsedate_tz, mktime_tz
147    from urllib2 import AbstractDigestAuthHandler, parse_keqv_list, parse_http_list
148
149from .httpexceptions import HTTPBadRequest
150
151__all__ = ['get_header', 'list_headers', 'normalize_headers',
152           'HTTPHeader', 'EnvironVariable' ]
153
154class EnvironVariable(str):
155    """
156    a CGI ``environ`` variable as described by WSGI
157
158    This is a helper object so that standard WSGI ``environ`` variables
159    can be extracted w/o syntax error possibility.
160    """
161    def __call__(self, environ):
162        return environ.get(self,'')
163    def __repr__(self):
164        return '<EnvironVariable %s>' % self
165    def update(self, environ, value):
166        environ[self] = value
167REMOTE_USER    = EnvironVariable("REMOTE_USER")
168REMOTE_SESSION = EnvironVariable("REMOTE_SESSION")
169AUTH_TYPE      = EnvironVariable("AUTH_TYPE")
170REQUEST_METHOD = EnvironVariable("REQUEST_METHOD")
171SCRIPT_NAME    = EnvironVariable("SCRIPT_NAME")
172PATH_INFO      = EnvironVariable("PATH_INFO")
173
174for _name, _obj in six.iteritems(dict(globals())):
175    if isinstance(_obj, EnvironVariable):
176        __all__.append(_name)
177
178_headers = {}
179
180class HTTPHeader(object):
181    """
182    an HTTP header
183
184    HTTPHeader instances represent a particular ``field-name`` of an
185    HTTP message header. They do not hold a field-value, but instead
186    provide operations that work on is corresponding values.  Storage
187    of the actual field values is done with WSGI ``environ`` or
188    ``response_headers`` as appropriate.  Typically, a sub-classes that
189    represent a specific HTTP header, such as _ContentDisposition, are
190    0.  Once constructed the HTTPHeader instances themselves
191    are immutable and stateless.
192
193    For purposes of documentation a "container" refers to either a
194    WSGI ``environ`` dictionary, or a ``response_headers`` list.
195
196    Member variables (and correspondingly constructor arguments).
197
198      ``name``
199
200          the ``field-name`` of the header, in "common form"
201          as presented in RFC 2616; e.g. 'Content-Type'
202
203      ``category``
204
205          one of 'general', 'request', 'response', or 'entity'
206
207      ``version``
208
209          version of HTTP (informational) with which the header should
210          be recognized
211
212      ``sort_order``
213
214          sorting order to be applied before sorting on
215          field-name when ordering headers in a response
216
217    Special Methods:
218
219       ``__call__``
220
221           The primary method of the HTTPHeader instance is to make
222           it a callable, it takes either a collection, a string value,
223           or keyword arguments and attempts to find/construct a valid
224           field-value
225
226       ``__lt__``
227
228           This method is used so that HTTPHeader objects can be
229           sorted in a manner suggested by RFC 2616.
230
231       ``__str__``
232
233           The string-value for instances of this class is
234           the ``field-name``.
235
236    Primary Methods:
237
238       ``delete()``
239
240           remove the all occurrences (if any) of the given
241           header in the collection provided
242
243       ``update()``
244
245           replaces (if they exist) all field-value items
246           in the given collection with the value provided
247
248       ``tuples()``
249
250           returns a set of (field-name, field-value) tuples
251           5 for extending ``response_headers``
252
253    Custom Methods (these may not be implemented):
254
255       ``apply()``
256
257           similar to ``update``, but with two differences; first,
258           only keyword arguments can be used, and second, specific
259           sub-classes may introduce side-effects
260
261       ``parse()``
262
263           converts a string value of the header into a more usable
264           form, such as time in seconds for a date header, etc.
265
266    The collected versions of initialized header instances are immediately
267    registered and accessible through the ``get_header`` function.  Do not
268    inherit from this directly, use one of ``_SingleValueHeader``,
269    ``_MultiValueHeader``, or ``_MultiEntryHeader`` as appropriate.
270    """
271
272    #
273    # Things which can be customized
274    #
275    version = '1.1'
276    category = 'general'
277    reference = ''
278    extensions = {}
279
280    def compose(self, **kwargs):
281        """
282        build header value from keyword arguments
283
284        This method is used to build the corresponding header value when
285        keyword arguments (or no arguments) were provided.  The result
286        should be a sequence of values.  For example, the ``Expires``
287        header takes a keyword argument ``time`` (e.g. time.time()) from
288        which it returns a the corresponding date.
289        """
290        raise NotImplementedError()
291
292    def parse(self, *args, **kwargs):
293        """
294        convert raw header value into more usable form
295
296        This method invokes ``values()`` with the arguments provided,
297        parses the header results, and then returns a header-specific
298        data structure corresponding to the header.  For example, the
299        ``Expires`` header returns seconds (as returned by time.time())
300        """
301        raise NotImplementedError()
302
303    def apply(self, collection, **kwargs):
304        """
305        update the collection /w header value (may have side effects)
306
307        This method is similar to ``update`` only that usage may result
308        in other headers being changed as recommended by the corresponding
309        specification.  The return value is defined by the particular
310        sub-class. For example, the ``_CacheControl.apply()`` sets the
311        ``Expires`` header in addition to its normal behavior.
312        """
313        self.update(collection, **kwargs)
314
315    #
316    # Things which are standardized (mostly)
317    #
318    def __new__(cls, name, category=None, reference=None, version=None):
319        """
320        construct a new ``HTTPHeader`` instance
321
322        We use the ``__new__`` operator to ensure that only one
323        ``HTTPHeader`` instance exists for each field-name, and to
324        register the header so that it can be found/enumerated.
325        """
326        self = get_header(name, raiseError=False)
327        if self:
328            # Allow the registration to happen again, but assert
329            # that everything is identical.
330            assert self.name == name, \
331                "duplicate registration with different capitalization"
332            assert self.category == category, \
333                "duplicate registration with different category"
334            assert cls == self.__class__, \
335                "duplicate registration with different class"
336            return self
337
338        self = object.__new__(cls)
339        self.name = name
340        assert isinstance(self.name, str)
341        self.category = category or self.category
342        self.version  = version or self.version
343        self.reference = reference or self.reference
344        _headers[self.name.lower()] = self
345        self.sort_order = {'general': 1, 'request': 2,
346                           'response': 3, 'entity': 4 }[self.category]
347        self._environ_name = getattr(self, '_environ_name',
348                                'HTTP_'+ self.name.upper().replace("-","_"))
349        self._headers_name = getattr(self, '_headers_name',
350                                 self.name.lower())
351        assert self.version in ('1.1', '1.0', '0.9')
352        return self
353
354    def __str__(self):
355        return self.name
356
357    def __lt__(self, other):
358        """
359        sort header instances as specified by RFC 2616
360
361        Re-define sorting so that general headers are first, followed
362        by request/response headers, and then entity headers.  The
363        list.sort() methods use the less-than operator for this purpose.
364        """
365        if isinstance(other, HTTPHeader):
366            if self.sort_order != other.sort_order:
367                return self.sort_order < other.sort_order
368            return self.name < other.name
369        return False
370
371    def __repr__(self):
372        ref = self.reference and (' (%s)' % self.reference) or ''
373        return '<%s %s%s>' % (self.__class__.__name__, self.name, ref)
374
375    def values(self, *args, **kwargs):
376        """
377        find/construct field-value(s) for the given header
378
379        Resolution is done according to the following arguments:
380
381        - If only keyword arguments are given, then this is equivalent
382          to ``compose(**kwargs)``.
383
384        - If the first (and only) argument is a dict, it is assumed
385          to be a WSGI ``environ`` and the result of the corresponding
386          ``HTTP_`` entry is returned.
387
388        - If the first (and only) argument is a list, it is assumed
389          to be a WSGI ``response_headers`` and the field-value(s)
390          for this header are collected and returned.
391
392        - In all other cases, the arguments are collected, checked that
393          they are string values, possibly verified by the header's
394          logic, and returned.
395
396        At this time it is an error to provide keyword arguments if args
397        is present (this might change).  It is an error to provide both
398        a WSGI object and also string arguments.  If no arguments are
399        provided, then ``compose()`` is called to provide a default
400        value for the header; if there is not default it is an error.
401        """
402        if not args:
403            return self.compose(**kwargs)
404        if list == type(args[0]):
405            assert 1 == len(args)
406            result = []
407            name = self.name.lower()
408            for value in [value for header, value in args[0]
409                         if header.lower() == name]:
410                result.append(value)
411            return result
412        if dict == type(args[0]):
413            assert 1 == len(args) and 'wsgi.version' in args[0]
414            value = args[0].get(self._environ_name)
415            if not value:
416                return ()
417            return (value,)
418        for item in args:
419            assert not type(item) in (dict, list)
420        return args
421
422    def __call__(self, *args, **kwargs):
423        """
424        converts ``values()`` into a string value
425
426        This method converts the results of ``values()`` into a string
427        value for common usage.  By default, it is asserted that only
428        one value exists; if you need to access all values then either
429        call ``values()`` directly, or inherit ``_MultiValueHeader``
430        which overrides this method to return a comma separated list of
431        values as described by section 4.2 of RFC 2616.
432        """
433        values = self.values(*args, **kwargs)
434        assert isinstance(values, (tuple, list))
435        if not values:
436            return ''
437        assert len(values) == 1, "more than one value: %s" % repr(values)
438        return str(values[0]).strip()
439
440    def delete(self, collection):
441        """
442        removes all occurances of the header from the collection provided
443        """
444        if type(collection) == dict:
445            if self._environ_name in collection:
446                del collection[self._environ_name]
447            return self
448        assert list == type(collection)
449        i = 0
450        while i < len(collection):
451            if collection[i][0].lower() == self._headers_name:
452                del collection[i]
453                continue
454            i += 1
455
456    def update(self, collection, *args, **kwargs):
457        """
458        updates the collection with the provided header value
459
460        This method replaces (in-place when possible) all occurrences of
461        the given header with the provided value.  If no value is
462        provided, this is the same as ``remove`` (note that this case
463        can only occur if the target is a collection w/o a corresponding
464        header value). The return value is the new header value (which
465        could be a list for ``_MultiEntryHeader`` instances).
466        """
467        value = self.__call__(*args, **kwargs)
468        if not value:
469            self.delete(collection)
470            return
471        if type(collection) == dict:
472            collection[self._environ_name] = value
473            return
474        assert list == type(collection)
475        i = 0
476        found = False
477        while i < len(collection):
478            if collection[i][0].lower() == self._headers_name:
479                if found:
480                    del collection[i]
481                    continue
482                collection[i] = (self.name, value)
483                found = True
484            i += 1
485        if not found:
486            collection.append((self.name, value))
487
488    def tuples(self, *args, **kwargs):
489        value = self.__call__(*args, **kwargs)
490        if not value:
491            return ()
492        return [(self.name, value)]
493
494class _SingleValueHeader(HTTPHeader):
495    """
496    a ``HTTPHeader`` with exactly a single value
497
498    This is the default behavior of ``HTTPHeader`` where returning a
499    the string-value of headers via ``__call__`` assumes that only
500    a single value exists.
501    """
502    pass
503
504class _MultiValueHeader(HTTPHeader):
505    """
506    a ``HTTPHeader`` with one or more values
507
508    The field-value for these header instances is is allowed to be more
509    than one value; whereby the ``__call__`` method returns a comma
510    separated list as described by section 4.2 of RFC 2616.
511    """
512
513    def __call__(self, *args, **kwargs):
514        results = self.values(*args, **kwargs)
515        if not results:
516            return ''
517        return ", ".join([str(v).strip() for v in results])
518
519    def parse(self, *args, **kwargs):
520        value = self.__call__(*args, **kwargs)
521        values = value.split(',')
522        return [
523            v.strip() for v in values
524            if v.strip()]
525
526class _MultiEntryHeader(HTTPHeader):
527    """
528    a multi-value ``HTTPHeader`` where items cannot be combined with a comma
529
530    This header is multi-valued, but the values should not be combined
531    with a comma since the header is not in compliance with RFC 2616
532    (Set-Cookie due to Expires parameter) or which common user-agents do
533    not behave well when the header values are combined.
534    """
535
536    def update(self, collection, *args, **kwargs):
537        assert list == type(collection), "``environ`` may not be updated"
538        self.delete(collection)
539        collection.extend(self.tuples(*args, **kwargs))
540
541    def tuples(self, *args, **kwargs):
542        values = self.values(*args, **kwargs)
543        if not values:
544            return ()
545        return [(self.name, value.strip()) for value in values]
546
547def get_header(name, raiseError=True):
548    """
549    find the given ``HTTPHeader`` instance
550
551    This function finds the corresponding ``HTTPHeader`` for the
552    ``name`` provided.  So that python-style names can be used,
553    underscores are converted to dashes before the lookup.
554    """
555    retval = _headers.get(str(name).strip().lower().replace("_","-"))
556    if not retval and raiseError:
557        raise AssertionError("'%s' is an unknown header" % name)
558    return retval
559
560def list_headers(general=None, request=None, response=None, entity=None):
561    " list all headers for a given category "
562    if not (general or request or response or entity):
563        general = request = response = entity = True
564    search = []
565    for (bool, strval) in ((general, 'general'), (request, 'request'),
566                           (response, 'response'), (entity, 'entity')):
567        if bool:
568            search.append(strval)
569    return [head for head in _headers.values() if head.category in search]
570
571def normalize_headers(response_headers, strict=True):
572    """
573    sort headers as suggested by  RFC 2616
574
575    This alters the underlying response_headers to use the common
576    name for each header; as well as sorting them with general
577    headers first, followed by request/response headers, then
578    entity headers, and unknown headers last.
579    """
580    category = {}
581    for idx in range(len(response_headers)):
582        (key, val) = response_headers[idx]
583        head = get_header(key, strict)
584        if not head:
585            newhead = '-'.join([x.capitalize() for x in
586                                key.replace("_","-").split("-")])
587            response_headers[idx] = (newhead, val)
588            category[newhead] = 4
589            continue
590        response_headers[idx] = (str(head), val)
591        category[str(head)] = head.sort_order
592    def key_func(item):
593        value = item[0]
594        return (category[value], value)
595    response_headers.sort(key=key_func)
596
597class _DateHeader(_SingleValueHeader):
598    """
599    handle date-based headers
600
601    This extends the ``_SingleValueHeader`` object with specific
602    treatment of time values:
603
604    - It overrides ``compose`` to provide a sole keyword argument
605      ``time`` which is an offset in seconds from the current time.
606
607    - A ``time`` method is provided which parses the given value
608      and returns the current time value.
609    """
610
611    def compose(self, time=None, delta=None):
612        time = time or now()
613        if delta:
614            assert type(delta) == int
615            time += delta
616        return (formatdate(time),)
617
618    def parse(self, *args, **kwargs):
619        """ return the time value (in seconds since 1970) """
620        value = self.__call__(*args, **kwargs)
621        if value:
622            try:
623                return mktime_tz(parsedate_tz(value))
624            except (TypeError, OverflowError):
625                raise HTTPBadRequest((
626                    "Received an ill-formed timestamp for %s: %s\r\n") %
627                    (self.name, value))
628
629#
630# Following are specific HTTP headers. Since these classes are mostly
631# singletons, there is no point in keeping the class around once it has
632# been instantiated, so we use the same name.
633#
634
635class _CacheControl(_MultiValueHeader):
636    """
637    Cache-Control, RFC 2616 14.9  (use ``CACHE_CONTROL``)
638
639    This header can be constructed (using keyword arguments), by
640    first specifying one of the following mechanisms:
641
642      ``public``
643
644          if True, this argument specifies that the
645          response, as a whole, may be cashed.
646
647      ``private``
648
649          if True, this argument specifies that the response, as a
650          whole, may be cashed; this implementation does not support
651          the enumeration of private fields
652
653      ``no_cache``
654
655          if True, this argument specifies that the response, as a
656          whole, may not be cashed; this implementation does not
657          support the enumeration of private fields
658
659    In general, only one of the above three may be True, the other 2
660    must then be False or None.  If all three are None, then the cache
661    is assumed to be ``public``.  Following one of these mechanism
662    specifiers are various modifiers:
663
664      ``no_store``
665
666          indicates if content may be stored on disk;
667          otherwise cache is limited to memory (note:
668          users can still save the data, this applies
669          to intermediate caches)
670
671      ``max_age``
672
673          the maximum duration (in seconds) for which
674          the content should be cached; if ``no-cache``
675          is specified, this defaults to 0 seconds
676
677      ``s_maxage``
678
679          the maximum duration (in seconds) for which the
680          content should be allowed in a shared cache.
681
682      ``no_transform``
683
684          specifies that an intermediate cache should
685          not convert the content from one type to
686          another (e.g. transform a BMP to a PNG).
687
688      ``extensions``
689
690          gives additional cache-control extensions,
691          such as items like, community="UCI" (14.9.6)
692
693    The usage of ``apply()`` on this header has side-effects. As
694    recommended by RFC 2616, if ``max_age`` is provided, then then the
695    ``Expires`` header is also calculated for HTTP/1.0 clients and
696    proxies (this is done at the time ``apply()`` is called).  For
697    ``no-cache`` and for ``private`` cases, we either do not want the
698    response cached or do not want any response accidently returned to
699    other users; so to prevent this case, we set the ``Expires`` header
700    to the time of the request, signifying to HTTP/1.0 transports that
701    the content isn't to be cached.  If you are using SSL, your
702    communication is already "private", so to work with HTTP/1.0
703    browsers over SSL, consider specifying your cache as ``public`` as
704    the distinction between public and private is moot.
705    """
706
707    # common values for max-age; "good enough" approximates
708    ONE_HOUR  = 60*60
709    ONE_DAY   = ONE_HOUR * 24
710    ONE_WEEK  = ONE_DAY * 7
711    ONE_MONTH = ONE_DAY * 30
712    ONE_YEAR  = ONE_WEEK * 52
713
714    def _compose(self, public=None, private=None, no_cache=None,
715                 no_store=False, max_age=None, s_maxage=None,
716                 no_transform=False, **extensions):
717        assert isinstance(max_age, (type(None), int))
718        assert isinstance(s_maxage, (type(None), int))
719        expires = 0
720        result = []
721        if private is True:
722            assert not public and not no_cache and not s_maxage
723            result.append('private')
724        elif no_cache is True:
725            assert not public and not private and not max_age
726            result.append('no-cache')
727        else:
728            assert public is None or public is True
729            assert not private and not no_cache
730            expires = max_age
731            result.append('public')
732        if no_store:
733            result.append('no-store')
734        if no_transform:
735            result.append('no-transform')
736        if max_age is not None:
737            result.append('max-age=%d' % max_age)
738        if s_maxage is not None:
739            result.append('s-maxage=%d' % s_maxage)
740        for (k, v) in six.iteritems(extensions):
741            if k not in self.extensions:
742                raise AssertionError("unexpected extension used: '%s'" % k)
743            result.append('%s="%s"' % (k.replace("_", "-"), v))
744        return (result, expires)
745
746    def compose(self, **kwargs):
747        (result, expires) = self._compose(**kwargs)
748        return result
749
750    def apply(self, collection, **kwargs):
751        """ returns the offset expiration in seconds """
752        (result, expires) = self._compose(**kwargs)
753        if expires is not None:
754            EXPIRES.update(collection, delta=expires)
755        self.update(collection, *result)
756        return expires
757
758_CacheControl('Cache-Control', 'general', 'RFC 2616, 14.9')
759
760class _ContentType(_SingleValueHeader):
761    """
762    Content-Type, RFC 2616 section 14.17
763
764    Unlike other headers, use the CGI variable instead.
765    """
766    version = '1.0'
767    _environ_name = 'CONTENT_TYPE'
768
769    # common mimetype constants
770    UNKNOWN    = 'application/octet-stream'
771    TEXT_PLAIN = 'text/plain'
772    TEXT_HTML  = 'text/html'
773    TEXT_XML   = 'text/xml'
774
775    def compose(self, major=None, minor=None, charset=None):
776        if not major:
777            if minor in ('plain', 'html', 'xml'):
778                major = 'text'
779            else:
780                assert not minor and not charset
781                return (self.UNKNOWN,)
782        if not minor:
783            minor = "*"
784        result = "%s/%s" % (major, minor)
785        if charset:
786            result += "; charset=%s" % charset
787        return (result,)
788
789_ContentType('Content-Type', 'entity', 'RFC 2616, 14.17')
790
791class _ContentLength(_SingleValueHeader):
792    """
793    Content-Length, RFC 2616 section 14.13
794
795    Unlike other headers, use the CGI variable instead.
796    """
797    version = "1.0"
798    _environ_name = 'CONTENT_LENGTH'
799
800_ContentLength('Content-Length', 'entity', 'RFC 2616, 14.13')
801
802class _ContentDisposition(_SingleValueHeader):
803    """
804    Content-Disposition, RFC 2183 (use ``CONTENT_DISPOSITION``)
805
806    This header can be constructed (using keyword arguments),
807    by first specifying one of the following mechanisms:
808
809      ``attachment``
810
811          if True, this specifies that the content should not be
812          shown in the browser and should be handled externally,
813          even if the browser could render the content
814
815      ``inline``
816
817         exclusive with attachment; indicates that the content
818         should be rendered in the browser if possible, but
819         otherwise it should be handled externally
820
821    Only one of the above 2 may be True.  If both are None, then
822    the disposition is assumed to be an ``attachment``. These are
823    distinct fields since support for field enumeration may be
824    added in the future.
825
826      ``filename``
827
828          the filename parameter, if any, to be reported; if
829          this is None, then the current object's filename
830          attribute is used
831
832    The usage of ``apply()`` on this header has side-effects. If
833    filename is provided, and Content-Type is not set or is
834    'application/octet-stream', then the mimetypes.guess is used to
835    upgrade the Content-Type setting.
836    """
837
838    def _compose(self, attachment=None, inline=None, filename=None):
839        result = []
840        if inline is True:
841            assert not attachment
842            result.append('inline')
843        else:
844            assert not inline
845            result.append('attachment')
846        if filename:
847            assert '"' not in filename
848            filename = filename.split("/")[-1]
849            filename = filename.split("\\")[-1]
850            result.append('filename="%s"' % filename)
851        return (("; ".join(result),), filename)
852
853    def compose(self, **kwargs):
854        (result, mimetype) = self._compose(**kwargs)
855        return result
856
857    def apply(self, collection, **kwargs):
858        """ return the new Content-Type side-effect value """
859        (result, filename) = self._compose(**kwargs)
860        mimetype = CONTENT_TYPE(collection)
861        if filename and (not mimetype or CONTENT_TYPE.UNKNOWN == mimetype):
862            mimetype, _ = mimetypes.guess_type(filename)
863            if mimetype and CONTENT_TYPE.UNKNOWN != mimetype:
864                CONTENT_TYPE.update(collection, mimetype)
865        self.update(collection, *result)
866        return mimetype
867
868_ContentDisposition('Content-Disposition', 'entity', 'RFC 2183')
869
870class _IfModifiedSince(_DateHeader):
871    """
872    If-Modified-Since, RFC 2616 section 14.25
873    """
874    version = '1.0'
875
876    def __call__(self, *args, **kwargs):
877        """
878        Split the value on ';' incase the header includes extra attributes. E.g.
879        IE 6 is known to send:
880        If-Modified-Since: Sun, 25 Jun 2006 20:36:35 GMT; length=1506
881        """
882        return _DateHeader.__call__(self, *args, **kwargs).split(';', 1)[0]
883
884    def parse(self, *args, **kwargs):
885        value = _DateHeader.parse(self, *args, **kwargs)
886        if value and value > now():
887            raise HTTPBadRequest((
888              "Please check your system clock.\r\n"
889              "According to this server, the time provided in the\r\n"
890              "%s header is in the future.\r\n") % self.name)
891        return value
892_IfModifiedSince('If-Modified-Since', 'request', 'RFC 2616, 14.25')
893
894class _Range(_MultiValueHeader):
895    """
896    Range, RFC 2616 14.35 (use ``RANGE``)
897
898    According to section 14.16, the response to this message should be a
899    206 Partial Content and that if multiple non-overlapping byte ranges
900    are requested (it is an error to request multiple overlapping
901    ranges) the result should be sent as multipart/byteranges mimetype.
902
903    The server should respond with '416 Requested Range Not Satisfiable'
904    if the requested ranges are out-of-bounds.  The specification also
905    indicates that a syntax error in the Range request should result in
906    the header being ignored rather than a '400 Bad Request'.
907    """
908
909    def parse(self, *args, **kwargs):
910        """
911        Returns a tuple (units, list), where list is a sequence of
912        (begin, end) tuples; and end is None if it was not provided.
913        """
914        value = self.__call__(*args, **kwargs)
915        if not value:
916            return None
917        ranges = []
918        last_end   = -1
919        try:
920            (units, range) = value.split("=", 1)
921            units = units.strip().lower()
922            for item in range.split(","):
923                (begin, end) = item.split("-")
924                if not begin.strip():
925                    begin = 0
926                else:
927                    begin = int(begin)
928                if begin <= last_end:
929                    raise ValueError()
930                if not end.strip():
931                    end = None
932                else:
933                    end = int(end)
934                last_end = end
935                ranges.append((begin, end))
936        except ValueError:
937            # In this case where the Range header is malformed,
938            # section 14.16 says to treat the request as if the
939            # Range header was not present.  How do I log this?
940            return None
941        return (units, ranges)
942_Range('Range', 'request', 'RFC 2616, 14.35')
943
944class _AcceptLanguage(_MultiValueHeader):
945    """
946    Accept-Language, RFC 2616 section 14.4
947    """
948
949    def parse(self, *args, **kwargs):
950        """
951        Return a list of language tags sorted by their "q" values.  For example,
952        "en-us,en;q=0.5" should return ``["en-us", "en"]``.  If there is no
953        ``Accept-Language`` header present, default to ``[]``.
954        """
955        header = self.__call__(*args, **kwargs)
956        if header is None:
957            return []
958        langs = [v for v in header.split(",") if v]
959        qs = []
960        for lang in langs:
961            pieces = lang.split(";")
962            lang, params = pieces[0].strip().lower(), pieces[1:]
963            q = 1
964            for param in params:
965                if '=' not in param:
966                    # Malformed request; probably a bot, we'll ignore
967                    continue
968                lvalue, rvalue = param.split("=")
969                lvalue = lvalue.strip().lower()
970                rvalue = rvalue.strip()
971                if lvalue == "q":
972                    q = float(rvalue)
973            qs.append((lang, q))
974        qs.sort(key=lambda query: query[1], reverse=True)
975        return [lang for (lang, q) in qs]
976_AcceptLanguage('Accept-Language', 'request', 'RFC 2616, 14.4')
977
978class _AcceptRanges(_MultiValueHeader):
979    """
980    Accept-Ranges, RFC 2616 section 14.5
981    """
982    def compose(self, none=None, bytes=None):
983        if bytes:
984            return ('bytes',)
985        return ('none',)
986_AcceptRanges('Accept-Ranges', 'response', 'RFC 2616, 14.5')
987
988class _ContentRange(_SingleValueHeader):
989    """
990    Content-Range, RFC 2616 section 14.6
991    """
992    def compose(self, first_byte=None, last_byte=None, total_length=None):
993        retval = "bytes %d-%d/%d" % (first_byte, last_byte, total_length)
994        assert last_byte == -1 or first_byte <= last_byte
995        assert last_byte  < total_length
996        return (retval,)
997_ContentRange('Content-Range', 'entity', 'RFC 2616, 14.6')
998
999class _Authorization(_SingleValueHeader):
1000    """
1001    Authorization, RFC 2617 (RFC 2616, 14.8)
1002    """
1003    def compose(self, digest=None, basic=None, username=None, password=None,
1004                challenge=None, path=None, method=None):
1005        assert username and password
1006        if basic or not challenge:
1007            assert not digest
1008            userpass = "%s:%s" % (username.strip(), password.strip())
1009            return "Basic %s" % userpass.encode('base64').strip()
1010        assert challenge and not basic
1011        path = path or "/"
1012        (_, realm) = challenge.split('realm="')
1013        (realm, _) = realm.split('"', 1)
1014        auth = AbstractDigestAuthHandler()
1015        auth.add_password(realm, path, username, password)
1016        (token, challenge) = challenge.split(' ', 1)
1017        chal = parse_keqv_list(parse_http_list(challenge))
1018        class FakeRequest(object):
1019            if six.PY3:
1020                @property
1021                def full_url(self):
1022                    return path
1023
1024                selector = full_url
1025
1026                @property
1027                def data(self):
1028                    return None
1029            else:
1030                def get_full_url(self):
1031                    return path
1032
1033                get_selector = get_full_url
1034
1035                def has_data(self):
1036                    return False
1037
1038            def get_method(self):
1039                return method or "GET"
1040
1041        retval = "Digest %s" % auth.get_authorization(FakeRequest(), chal)
1042        return (retval,)
1043_Authorization('Authorization', 'request', 'RFC 2617')
1044
1045#
1046# For now, construct a minimalistic version of the field-names; at a
1047# later date more complicated headers may sprout content constructors.
1048# The items commented out have concrete variants.
1049#
1050for (name,              category, version, style,      comment) in \
1051(("Accept"             ,'request' ,'1.1','multi-value','RFC 2616, 14.1' )
1052,("Accept-Charset"     ,'request' ,'1.1','multi-value','RFC 2616, 14.2' )
1053,("Accept-Encoding"    ,'request' ,'1.1','multi-value','RFC 2616, 14.3' )
1054#,("Accept-Language"    ,'request' ,'1.1','multi-value','RFC 2616, 14.4' )
1055#,("Accept-Ranges"      ,'response','1.1','multi-value','RFC 2616, 14.5' )
1056,("Age"                ,'response','1.1','singular'   ,'RFC 2616, 14.6' )
1057,("Allow"              ,'entity'  ,'1.0','multi-value','RFC 2616, 14.7' )
1058#,("Authorization"      ,'request' ,'1.0','singular'   ,'RFC 2616, 14.8' )
1059#,("Cache-Control"      ,'general' ,'1.1','multi-value','RFC 2616, 14.9' )
1060,("Cookie"             ,'request' ,'1.0','multi-value','RFC 2109/Netscape')
1061,("Connection"         ,'general' ,'1.1','multi-value','RFC 2616, 14.10')
1062,("Content-Encoding"   ,'entity'  ,'1.0','multi-value','RFC 2616, 14.11')
1063#,("Content-Disposition",'entity'  ,'1.1','multi-value','RFC 2616, 15.5' )
1064,("Content-Language"   ,'entity'  ,'1.1','multi-value','RFC 2616, 14.12')
1065#,("Content-Length"     ,'entity'  ,'1.0','singular'   ,'RFC 2616, 14.13')
1066,("Content-Location"   ,'entity'  ,'1.1','singular'   ,'RFC 2616, 14.14')
1067,("Content-MD5"        ,'entity'  ,'1.1','singular'   ,'RFC 2616, 14.15')
1068#,("Content-Range"      ,'entity'  ,'1.1','singular'   ,'RFC 2616, 14.16')
1069#,("Content-Type"       ,'entity'  ,'1.0','singular'   ,'RFC 2616, 14.17')
1070,("Date"               ,'general' ,'1.0','date-header','RFC 2616, 14.18')
1071,("ETag"               ,'response','1.1','singular'   ,'RFC 2616, 14.19')
1072,("Expect"             ,'request' ,'1.1','multi-value','RFC 2616, 14.20')
1073,("Expires"            ,'entity'  ,'1.0','date-header','RFC 2616, 14.21')
1074,("From"               ,'request' ,'1.0','singular'   ,'RFC 2616, 14.22')
1075,("Host"               ,'request' ,'1.1','singular'   ,'RFC 2616, 14.23')
1076,("If-Match"           ,'request' ,'1.1','multi-value','RFC 2616, 14.24')
1077#,("If-Modified-Since"  ,'request' ,'1.0','date-header','RFC 2616, 14.25')
1078,("If-None-Match"      ,'request' ,'1.1','multi-value','RFC 2616, 14.26')
1079,("If-Range"           ,'request' ,'1.1','singular'   ,'RFC 2616, 14.27')
1080,("If-Unmodified-Since",'request' ,'1.1','date-header' ,'RFC 2616, 14.28')
1081,("Last-Modified"      ,'entity'  ,'1.0','date-header','RFC 2616, 14.29')
1082,("Location"           ,'response','1.0','singular'   ,'RFC 2616, 14.30')
1083,("Max-Forwards"       ,'request' ,'1.1','singular'   ,'RFC 2616, 14.31')
1084,("Pragma"             ,'general' ,'1.0','multi-value','RFC 2616, 14.32')
1085,("Proxy-Authenticate" ,'response','1.1','multi-value','RFC 2616, 14.33')
1086,("Proxy-Authorization",'request' ,'1.1','singular'   ,'RFC 2616, 14.34')
1087#,("Range"              ,'request' ,'1.1','multi-value','RFC 2616, 14.35')
1088,("Referer"            ,'request' ,'1.0','singular'   ,'RFC 2616, 14.36')
1089,("Retry-After"        ,'response','1.1','singular'   ,'RFC 2616, 14.37')
1090,("Server"             ,'response','1.0','singular'   ,'RFC 2616, 14.38')
1091,("Set-Cookie"         ,'response','1.0','multi-entry','RFC 2109/Netscape')
1092,("TE"                 ,'request' ,'1.1','multi-value','RFC 2616, 14.39')
1093,("Trailer"            ,'general' ,'1.1','multi-value','RFC 2616, 14.40')
1094,("Transfer-Encoding"  ,'general' ,'1.1','multi-value','RFC 2616, 14.41')
1095,("Upgrade"            ,'general' ,'1.1','multi-value','RFC 2616, 14.42')
1096,("User-Agent"         ,'request' ,'1.0','singular'   ,'RFC 2616, 14.43')
1097,("Vary"               ,'response','1.1','multi-value','RFC 2616, 14.44')
1098,("Via"                ,'general' ,'1.1','multi-value','RFC 2616, 14.45')
1099,("Warning"            ,'general' ,'1.1','multi-entry','RFC 2616, 14.46')
1100,("WWW-Authenticate"   ,'response','1.0','multi-entry','RFC 2616, 14.47')):
1101    klass = {'multi-value': _MultiValueHeader,
1102             'multi-entry': _MultiEntryHeader,
1103             'date-header': _DateHeader,
1104             'singular'   : _SingleValueHeader}[style]
1105    klass(name, category, comment, version).__doc__ = comment
1106    del klass
1107
1108for head in _headers.values():
1109    headname = head.name.replace("-","_").upper()
1110    locals()[headname] = head
1111    __all__.append(headname)
1112
1113__pudge_all__ = __all__[:]
1114for _name, _obj in six.iteritems(dict(globals())):
1115    if isinstance(_obj, type) and issubclass(_obj, HTTPHeader):
1116        __pudge_all__.append(_name)
1117