The Paste HTTP Server Thread Pool
=================================

This document describes how the thread pool in ``paste.httpserver``
works, and how it can adapt to problems.

Note that all of the configuration parameters listed here are prefixed
with ``threadpool_`` when running through a Paste Deploy configuration.
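
To illustrate the naming convention, the hypothetical helper below takes a
dictionary of Paste Deploy-style settings and strips the ``threadpool_``
prefix, so ``threadpool_spawn_if_under`` becomes ``spawn_if_under``.  It
is only a sketch, not how Paste itself parses its configuration.  The
values stay strings because that is how they arrive from an ``.ini`` file.

.. code-block:: python

    PREFIX = "threadpool_"


    def strip_threadpool_prefix(conf: dict) -> dict:
        """Return only the thread pool settings, without their prefix.

        Hypothetical helper for illustration; not part of Paste.
        """
        return {
            key[len(PREFIX):]: value
            for key, value in conf.items()
            if key.startswith(PREFIX)
        }


    settings = {
        "port": "8080",                       # not a thread pool option
        "threadpool_spawn_if_under": "5",
        "threadpool_hung_thread_limit": "30",
    }
    print(strip_threadpool_prefix(settings))
    # {'spawn_if_under': '5', 'hung_thread_limit': '30'}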

Error Cases
-----------

When a WSGI application is called, it's possible that it will block
indefinitely.  There are two basic ways you can manage threads:

* Start a thread for every request, and close it down when the request
  is finished

* Start a pool of threads, and reuse those threads for subsequent
  requests

In both cases things go wrong.  If you start a thread for every
request, you will have an explosion of threads, and with it high
memory use and a loss of performance.  This can culminate in really
high loads and swapping, until the whole site grinds to a halt.

If you are using a pool of threads, all the threads can simply be used
up.  New requests go into a queue to be processed, but since that
queue never moves forward, everyone will just block.  The site
basically freezes, though memory usage doesn't generally get worse.
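
The pool failure mode is easy to reproduce.  The sketch below uses the
standard library's ``concurrent.futures.ThreadPoolExecutor`` as a
stand-in for a request pool (an assumption for brevity; Paste has its own
pool implementation).  Every worker is tied up by a "request" that blocks
on an event, so a quick request submitted afterwards sits in the queue
and does not complete within the timeout.

.. code-block:: python

    import concurrent.futures
    import threading


    def quick_request_starved(workers: int = 2, wait: float = 0.2) -> bool:
        """Return True if a quick task stayed queued behind hung tasks."""
        release = threading.Event()
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            # Tie up every worker with a "request" that blocks until released.
            for _ in range(workers):
                pool.submit(release.wait)
            quick = pool.submit(lambda: "done")
            try:
                quick.result(timeout=wait)
                starved = False
            except concurrent.futures.TimeoutError:
                starved = True  # the queue never moved forward
            release.set()  # un-hang the workers so the pool can shut down
        return starved


    print(quick_request_starved())  # True: the quick request was stuck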

Paste Thread Pool
-----------------

The thread pool in Paste has some options to walk the razor's edge
between the two techniques, and to try to respond usefully in most
cases.

The pool tracks all worker threads.  Threads can be in a few states:

* Idle, waiting for a request ("idle")

* Working on a request

  * For a reasonable amount of time ("busy")

  * For an unreasonably long amount of time ("hung")

* Threads that should die

  * An exception has been injected that should kill the thread, but it
    hasn't taken effect yet ("dying")

  * An exception has been injected, but the thread has persisted for
    an unreasonable amount of time ("zombie")
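
These states depend only on how long a thread has been working and how
long ago an exception was injected into it.  Here is a minimal sketch of
that classification, assuming the ``hung_thread_limit`` and
``dying_thread_limit`` thresholds described below (defaults of 30 and 300
seconds).  The function name and its arguments are hypothetical, not
Paste's internals.

.. code-block:: python

    from typing import Optional

    HUNG_THREAD_LIMIT = 30     # seconds, the documented default
    DYING_THREAD_LIMIT = 300   # seconds, the documented default


    def worker_state(now: float,
                     working_since: Optional[float],
                     killed_at: Optional[float] = None,
                     hung_thread_limit: float = HUNG_THREAD_LIMIT,
                     dying_thread_limit: float = DYING_THREAD_LIMIT) -> str:
        """Classify a worker as idle, busy, hung, dying or zombie.

        Times are in seconds; ``working_since`` is None for an idle worker
        and ``killed_at`` is None unless an exception has been injected.
        Hypothetical helper, not Paste's API.
        """
        if killed_at is not None:
            # Killed threads are judged by how long they outlive the kill.
            if now - killed_at > dying_thread_limit:
                return "zombie"
            return "dying"
        if working_since is None:
            return "idle"
        if now - working_since > hung_thread_limit:
            return "hung"
        return "busy"


    print(worker_state(now=100.0, working_since=90.0))  # busy
    print(worker_state(now=100.0, working_since=60.0))  # hung
    print(worker_state(now=1000.0, working_since=0.0, killed_at=600.0))  # zombie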

When a request comes in and there are no idle worker threads waiting,
the server looks at the workers, which must then all be busy or hung.
If too many are hung, another thread is started: a new thread is added
only if there are fewer than ``spawn_if_under`` busy threads.  So if
you have 10 workers, ``spawn_if_under`` is 5, and there are 6 hung
threads and 4 busy threads, another thread will be started (bringing
the number of busy threads back to 5 once it picks up the request).
Later these extra threads may be collected again if some of the hung
threads become un-hung.  A thread is hung if it has been working for
longer than ``hung_thread_limit`` (default 30 seconds).
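
The spawning rule can be written as a small predicate.  This is a sketch
of the rule as described here, not Paste's own code; the helper name and
the list-of-state-strings input are hypothetical.

.. code-block:: python

    from typing import Sequence


    def should_spawn_worker(states: Sequence[str], spawn_if_under: int) -> bool:
        """Decide whether a new request warrants an extra worker thread.

        ``states`` holds "idle", "busy" or "hung" for each worker.
        Hypothetical helper illustrating the rule; not Paste's own code.
        """
        if "idle" in states:
            return False  # an idle worker can take the request
        # Every worker is busy or hung: spawn only if too few are still busy.
        return states.count("busy") < spawn_if_under


    # The example from the text: 10 workers, 6 hung and 4 busy.
    workers = ["hung"] * 6 + ["busy"] * 4
    print(should_spawn_worker(workers, spawn_if_under=5))  # True

Once the new thread picks up the request there are 5 busy threads, so
the same check returns ``False`` and no further threads are added.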

Every so often, the server will check all the threads for error
conditions.  This happens every ``hung_check_period`` requests
(default 100).  At this time, if there are more threads than needed
(because extra ones were spawned under ``spawn_if_under``), some
threads may be collected.  Any thread that has been working for longer
than ``kill_thread_limit`` (default 1800 seconds, i.e., 30 minutes)
will be killed.
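
A sketch of the periodic check follows, assuming request-counting as
described above and timestamps in seconds.  ``HungCheckScheduler`` and
``threads_to_kill`` are hypothetical names used only for illustration.

.. code-block:: python

    from typing import Dict, List, Optional


    class HungCheckScheduler:
        """Counts requests and reports when a full thread check is due."""

        def __init__(self, hung_check_period: int = 100) -> None:
            self.hung_check_period = hung_check_period
            self.requests_since_check = 0

        def record_request(self) -> bool:
            self.requests_since_check += 1
            if self.requests_since_check >= self.hung_check_period:
                self.requests_since_check = 0
                return True  # time to inspect every worker
            return False


    def threads_to_kill(now: float,
                        working_since: Dict[str, Optional[float]],
                        kill_thread_limit: float = 1800) -> List[str]:
        """Names of threads working longer than ``kill_thread_limit``."""
        return sorted(
            name for name, started in working_since.items()
            if started is not None and now - started > kill_thread_limit
        )


    scheduler = HungCheckScheduler(hung_check_period=3)
    print([scheduler.record_request() for _ in range(6)])
    # [False, False, True, False, False, True]
    print(threads_to_kill(2000.0, {"w1": 0.0, "w2": 1500.0, "w3": None}))
    # ['w1']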

To kill a thread, the ``ctypes`` module must be installed.  This will
raise an exception (``SystemExit``) in the thread, which should cause
the thread to stop.  It can take quite a while for this to actually
take effect, sometimes on the order of several minutes.  This uses a
non-public API (hence the ``ctypes`` requirement), so it might not
work in all cases.  I've tried it in pure Python code and with a hung
socket, and in both cases it worked.  As soon as the thread is killed
(before it is actually dead), another worker is added to the pool.
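
The usual way to raise an exception in another thread from Python is the
C-API function ``PyThreadState_SetAsyncExc``, reached through
``ctypes.pythonapi``.  This is presumably the kind of non-public API meant
above.  The sketch below assumes CPython 3.7 or later (where the thread id
is an ``unsigned long``) and is not Paste's actual code.  The exception is
only delivered when the target thread next executes Python bytecode, so a
thread stuck inside a long C call won't see it until that call returns.
That is one reason a kill can take so long.

.. code-block:: python

    import ctypes
    import threading
    import time


    def inject_exception(thread: threading.Thread,
                         exc_type: type = SystemExit) -> bool:
        """Schedule ``exc_type`` to be raised inside ``thread``.

        Returns True if exactly one thread state was modified.
        """
        if thread.ident is None:
            return False  # the thread was never started
        tid = ctypes.c_ulong(thread.ident)
        modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
            tid, ctypes.py_object(exc_type))
        if modified > 1:
            # Should not happen; clear the pending exception to be safe.
            ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
            return False
        return modified == 1


    def hung_request(log: list) -> None:
        """Stand-in for a request handler that never finishes."""
        try:
            while True:
                time.sleep(0.01)
        except SystemExit:
            log.append("SystemExit raised")
            raise
        finally:
            log.append("finally ran")


    if __name__ == "__main__":
        log: list = []
        worker = threading.Thread(target=hung_request, args=(log,), daemon=True)
        worker.start()
        time.sleep(0.05)                 # let it enter the loop
        print(inject_exception(worker))  # True
        worker.join(timeout=2)
        print(worker.is_alive(), log)    # False ['SystemExit raised', 'finally ran']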

If the killed thread lives longer than ``dying_thread_limit`` (default
300 seconds, i.e., 5 minutes), it is considered a zombie.

Zombie threads are not handled specially unless you set
``max_zombies_before_die``.  If you set this and there are more than
this many zombie threads, then the entire process will be killed.
This is useful if you are running the server under a process
monitor, such as ``start-stop-daemon``, ``daemontools``, ``runit``, or
``paster serve --monitor``.  To make the process die, it may call
``os._exit``, which is considered an impolite way to exit a process
(akin to ``kill -9``).  It *will* try to run the functions registered
with ``atexit``, except for the thread cleanup functions, which would
block for as long as any threads are still alive.
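
A minimal sketch of that decision follows.  The ``cleanup`` list stands in
for the ``atexit`` functions, an unset option is modeled as ``None`` or
``0``, and the exit status ``3`` is arbitrary.  All of these are
assumptions, not Paste's code.  ``exit_func`` defaults to ``os._exit`` but
can be replaced so the logic can be exercised without ending the process.

.. code-block:: python

    import os
    from typing import Callable, Iterable, Optional


    def maybe_die(zombie_count: int,
                  max_zombies_before_die: Optional[int],
                  cleanup: Iterable[Callable[[], None]] = (),
                  exit_func: Callable[[int], None] = os._exit) -> bool:
        """Exit the process if there are too many zombie threads.

        Hypothetical sketch: ``cleanup`` stands in for the ``atexit``
        functions, and ``exit_func`` can be swapped out in tests.
        """
        if not max_zombies_before_die:
            return False  # option unset (None or 0): zombies are tolerated
        if zombie_count <= max_zombies_before_die:
            return False
        for func in cleanup:
            try:
                func()
            except Exception:
                pass  # best effort; a failing hook must not block the exit
        exit_func(3)  # arbitrary non-zero status; os._exit never returns
        return True   # only reached when exit_func returns (e.g. in tests)


    exits = []
    print(maybe_die(3, 2, exit_func=exits.append), exits)  # True [3]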

Notification
------------

If you set ``error_email`` (including setting it globally in a Paste
Deploy ``[DEFAULT]`` section), you will be notified of two error
conditions: when hung threads are killed, and when the process is
killed due to too many zombie threads.

Missed Cases
------------

If you have a worker pool size of 10, and 11 slow or hung requests
come in, the first 10 will be handed off, but the server won't know
yet that they will hang.  The last request will stay stuck in a queue
until another request comes in.  When a later request arrives (after
``hung_thread_limit`` seconds), the server will notice the problem and
add more threads, and the 11th request will finally be processed.

If a trickle of bad requests keeps coming in, the number of hung
threads will keep increasing.  With the default ``hung_check_period``
of 100 requests, the periodic check may not clean them up fast enough.

Killing threads is not something Python really supports.  Corruption
of the process, memory leaks, or who knows what might occur.  For the
most part the threads seem to be killed in a fairly simple manner:
an exception is raised, and ``finally`` blocks do get executed.  But
this hasn't been tried much in production, so there's not much
experience with it.

watch_threads
-------------

If you want to see what's going on in your process, you can install
the application ``egg:Paste#watch_threads`` (in the
``paste.debug.watchthreads`` module).  This lets you see requests and
how long they have been running.  In Python 2.5 and later you can also
see tracebacks of the running requests; in earlier versions you can
only see request data (URLs, User-Agent, etc.).  If you set
``allow_kill = true``, you can also kill threads from the application.
The thread pool is intended to run reliably without intervention, but
this can help debug problems or give you a feeling for what causes
problems in the site.
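
Python 2.5 added ``sys._current_frames()``, which maps each thread's
identifier to the frame it is currently executing.  That is presumably
what makes tracebacks of running requests possible.  The sketch below
uses it directly with ``traceback.format_stack``.  It is an illustration,
not the ``watch_threads`` implementation, and the function name is
hypothetical.

.. code-block:: python

    import sys
    import threading
    import traceback
    from typing import Dict


    def dump_thread_stacks() -> Dict[str, str]:
        """Map each live thread's name to its current stack, as text."""
        frames = sys._current_frames()  # {thread ident: innermost frame}
        stacks = {}
        for thread in threading.enumerate():
            frame = frames.get(thread.ident)
            if frame is not None:
                stacks[thread.name] = "".join(traceback.format_stack(frame))
        return stacks


    if __name__ == "__main__":
        for name, stack in dump_thread_stacks().items():
            print(f"--- {name} ---")
            print(stack)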

This does open up privacy problems, as it gives you access to all the
request data in the site, including cookies, IP addresses, etc.  It
shouldn't be left on in a public setting.

socket_timeout
--------------

The HTTP server (not the thread pool) also accepts a
``socket_timeout`` argument.  It is turned off by default.  You might
find it helpful to turn it on.
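
A socket timeout bounds how long a blocking read can wait.  A worker
serving a stalled client then gets an error instead of waiting
indefinitely.  The sketch below shows the effect with the standard
``socket`` module.  It assumes Paste applies ``socket_timeout`` to
connections in a similar way via ``settimeout``, and the helper name is
hypothetical.

.. code-block:: python

    import socket
    from typing import Optional


    def recv_with_timeout(sock: socket.socket, timeout: float,
                          bufsize: int = 1024) -> Optional[bytes]:
        """Read from ``sock``, giving up after ``timeout`` seconds."""
        sock.settimeout(timeout)
        try:
            return sock.recv(bufsize)
        except socket.timeout:  # alias of TimeoutError since Python 3.10
            return None  # the peer sent nothing in time


    if __name__ == "__main__":
        server_side, client_side = socket.socketpair()
        print(recv_with_timeout(server_side, 0.1))  # None: client is silent
        client_side.sendall(b"GET / HTTP/1.0\r\n\r\n")
        print(recv_with_timeout(server_side, 0.1))
        server_side.close()
        client_side.close()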