/* Make a thread the running thread.  The thread must previously have
   been sleeping, and not holding the CPU semaphore. This will set the
   thread state to VgTs_Runnable, and the thread will attempt to take
   the CPU semaphore.  By the time it returns, tid will be the running
   thread. */
extern void VG_(set_running) ( ThreadId tid );

/* Set a thread into a sleeping state.  Before the call, the thread
   must be runnable, and holding the CPU semaphore.  When this call
   returns, the thread will be set to the specified sleeping state,
   and will not be holding the CPU semaphore.  Note that another
   thread could be running by the time this call returns, so the
   caller must be careful not to touch any shared state.  It is also
   the caller's responsibility to actually block until the thread is
   ready to run again. */
extern void VG_(set_sleeping) ( ThreadId tid, ThreadStatus state );


The master semaphore is run_sema in vg_scheduler.c.


(what happens at a fork?)

VG_(scheduler_init) registers sched_fork_cleanup as a child atfork
handler.  sched_fork_cleanup, among other things, reinitializes the
semaphore with a new pipe so the process has its own.
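
A minimal sketch of a pipe-backed semaphore and its reinitialization in
a forked child.  This is an illustration only, not the code in
vg_scheduler.c: the type pipe_sema_t and all four function names are
hypothetical, and it assumes a single token byte in the pipe stands for
"semaphore available".

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical pipe-backed semaphore: one byte sitting in the pipe
   means the semaphore is available. */
typedef struct {
   int pipe[2];   /* pipe[0] = read end, pipe[1] = write end */
} pipe_sema_t;

/* Create a fresh pipe and put one token in it.  Returns 0 on success. */
static int pipe_sema_init(pipe_sema_t *s)
{
   if (pipe(s->pipe) != 0)
      return -1;
   return write(s->pipe[1], "T", 1) == 1 ? 0 : -1;
}

/* Take the semaphore: block until the token can be read. */
static void pipe_sema_down(pipe_sema_t *s)
{
   char tok;
   ssize_t r;
   do {
      r = read(s->pipe[0], &tok, 1);
   } while (r == -1 && errno == EINTR);
   assert(r == 1);
}

/* Release the semaphore: put the token back. */
static void pipe_sema_up(pipe_sema_t *s)
{
   ssize_t r = write(s->pipe[1], "T", 1);
   assert(r == 1);
}

/* What a child-side atfork handler might do: the inherited pipe is
   still shared with the parent, so drop it and build a private one. */
static int pipe_sema_reinit_after_fork(pipe_sema_t *s)
{
   close(s->pipe[0]);
   close(s->pipe[1]);
   return pipe_sema_init(s);
}
```

Without the reinitialization, parent and child would be reading and
writing tokens on the same pipe, so a release in one process could wake
a thread in the other.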

--------------------------------------------------------------------

Re:   New World signal handling
From: Jeremy Fitzhardinge <jeremy@goop.org>
To:   Julian Seward <jseward@acm.org>
Date: Mon Mar 14 09:03:51 2005

Well, the big-picture things to be clear about are:

   1. signal handlers are process-wide global state
   2. signal masks are per-thread (there's no notion of a process-wide
      signal mask)
   3. a signal can be targeted to either
         1. the whole process (any eligible thread is picked for
            delivery), or
         2. a specific thread

1 is why it is always a bug to temporarily reset a signal handler (say,
for SIGSEGV), because if any other thread happens to be sent one in that
window it will cause havoc (I think there's still one instance of this
in the symtab stuff).
2 is the meat of your questions; more below.
3 is responsible for some of the nitty detail in the signal stuff, so
it's worth bearing in mind to understand it all. (Note that even if a
signal is targeting the whole process, it's only ever delivered to one
particular thread; there's no such thing as a broadcast signal.)

While a thread is running core code or generated code, it has almost
all its signals blocked (all but the fault signals: SEGV, BUS, ILL, etc).

Every N basic blocks, each thread calls VG_(poll_signals) to see what
signals are pending for it.  poll_signals grabs the next pending signal
which the client signal mask doesn't block, and sets it up for delivery;
it uses the sigtimedwait() syscall to fetch blocked pending signals
rather than have them delivered to a signal handler.   This means that
we avoid the complexity of having signals delivered asynchronously via
the signal handlers; we can just poll for them synchronously when
they're easy to deal with.
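
A minimal sketch of that polling technique, using only the POSIX
sigtimedwait() call with a zero timeout.  The helper name
poll_one_signal is hypothetical, and this is not the body of
VG_(poll_signals); it only shows how a blocked, pending signal can be
fetched synchronously instead of being delivered to a handler.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: fetch one pending signal from 'wanted' without
   blocking.  The signals in 'wanted' must already be blocked by the
   caller.  Returns the signal number, or 0 if none is pending. */
static int poll_one_signal(const sigset_t *wanted, siginfo_t *info)
{
   struct timespec zero = { 0, 0 };   /* zero timeout: just poll */
   int sig;

   assert(wanted != NULL);
   sig = sigtimedwait(wanted, info, &zero);
   return sig > 0 ? sig : 0;
}
```

A caller would block the signals of interest with sigprocmask(), run
for a while, and then call poll_one_signal() at a point where it is
convenient to set up delivery.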

Fault signals, being caused by a specific instruction, are the exception
because they can't be held off; if they're blocked when an instruction
raises one, the kernel will just summarily kill the process.  Therefore,
they need to be always unblocked, and the signal handler is called when
an instruction raises one of these exceptions. (It's also necessary to
call poll_signals after any syscall which may raise a signal, since
signal-raising syscalls are considered to be synchronous with respect to
their signal; ie, calling kill(getpid(), SIGUSR1) will call the handler
for SIGUSR1 before kill is seen to complete.)

The one time when the thread's real signal mask actually matches the
client's requested signal mask is while running a blocking syscall.  We
have to set things up to accept signals during a syscall so that we get
the right signal-interrupts-syscall semantics.  The tricky part about
this is that there's no general atomic
set-signal-mask-and-block-in-syscall mechanism, so we need to fake it
with the stuff in VGA_(_client_syscall)/VGA_(interrupted_syscall).
These two basically form an explicit state machine, where the state
variable is the instruction pointer, which allows it to determine what
point the syscall got to when the async signal happens.  By keeping the
window where signals are actually unblocked very narrow, the number of
possible states is pretty small.
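
A minimal sketch of "the instruction pointer is the state variable".
The four labels, the phase names and classify_ip are all hypothetical;
the real code may distinguish more or different states.  The idea is
only that comparing the interrupted IP against known addresses in the
syscall wrapper tells the handler how far the syscall got.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical addresses marking stages of the syscall wrapper code. */
typedef struct {
   uintptr_t setup;      /* first instruction: about to unblock signals */
   uintptr_t restart;    /* the syscall instruction itself              */
   uintptr_t complete;   /* first instruction after the syscall         */
   uintptr_t finished;   /* signals are blocked again from here on      */
} syscall_labels_t;

typedef enum {
   SC_OUTSIDE,       /* not in the wrapper at all               */
   SC_NOT_STARTED,   /* signal arrived before the syscall began */
   SC_IN_KERNEL,     /* signal interrupted the syscall itself   */
   SC_RETURNED       /* syscall returned, result not yet saved  */
} syscall_phase_t;

/* Map an interrupted instruction pointer to a phase. */
static syscall_phase_t classify_ip(const syscall_labels_t *l, uintptr_t ip)
{
   assert(l->setup < l->restart && l->restart < l->complete
          && l->complete < l->finished);
   if (ip < l->setup || ip >= l->finished)
      return SC_OUTSIDE;
   if (ip < l->restart)
      return SC_NOT_STARTED;
   if (ip < l->complete)
      return SC_IN_KERNEL;
   return SC_RETURNED;
}
```

The narrower the range between setup and finished, the fewer distinct
cases a function like this has to tell apart.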

This is all quite nice because the kernel does almost all the work of
determining which thread should get a signal, what the correct action
for a syscall when it has been interrupted is, etc.  Particularly nice
is that we don't need to worry about all the queuing semantics, and the
per-signal special cases (which is, roughly, signals 1-32 are not queued
except when they are, and signals 33-64 are queued except when they aren't).

BUT, there's another complexity: because the Unix signal mechanism has
been overloaded to deal with two separate kinds of events (asynchronous
signals raised by kill(), and synchronous faults raised by an
instruction), we can't block a signal for one form and not the other.
That is, because we have to leave SIGSEGV unblocked for faulting
instructions, it also leaves us open to getting an async SIGSEGV sent
with kill(pid, SIGSEGV).

To handle this, there's a small per-thread signal queue (I'm using tid
0's queue for "signals sent to the whole process" - a hack, I'll
admit).  If an async SIGSEGV (etc) signal appears, then it is pushed
onto the appropriate queue.  VG_(poll_signals) also checks these queues
for pending signals to decide what signal to deliver next.  These
queues are only manipulated with *all* signals blocked, so there's no
risk of two concurrent async signal handlers modifying the queues at
once.  Also, because the likelihood of actually being sent an async
SIGSEGV is pretty low, the queues are only allocated on demand.
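
A minimal sketch of such a queue: allocated on first use, slot 0
standing for the whole process, and only touched with all signals
blocked.  Every name here (sig_queue_t, queues, queue_signal,
next_queued_signal, N_SLOTS, QUEUE_LEN) is hypothetical, and the sketch
is single-threaded; it also uses calloc(), which is not
async-signal-safe, so real handler code would need its own allocator.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

#define N_SLOTS    4   /* hypothetical; slot 0 = "whole process" */
#define QUEUE_LEN  8   /* hypothetical capacity per queue        */

typedef struct {
   int head, count;
   int sigs[QUEUE_LEN];
} sig_queue_t;

static sig_queue_t *queues[N_SLOTS];   /* all NULL until first use */

/* Queue an async signal for slot 'tid'.  Returns 0 on success, -1 if
   the queue is full or could not be allocated. */
static int queue_signal(int tid, int signo)
{
   sigset_t all, saved;
   int ret = -1;

   assert(tid >= 0 && tid < N_SLOTS);
   sigfillset(&all);
   sigprocmask(SIG_SETMASK, &all, &saved);   /* no handler can run now */

   if (queues[tid] == NULL)                  /* allocate on demand */
      queues[tid] = calloc(1, sizeof(sig_queue_t));
   if (queues[tid] != NULL && queues[tid]->count < QUEUE_LEN) {
      sig_queue_t *q = queues[tid];
      q->sigs[(q->head + q->count) % QUEUE_LEN] = signo;
      q->count++;
      ret = 0;
   }

   sigprocmask(SIG_SETMASK, &saved, NULL);   /* restore previous mask */
   return ret;
}

/* Return the oldest queued signal for slot 'tid', or 0 if none. */
static int next_queued_signal(int tid)
{
   sigset_t all, saved;
   int signo = 0;

   assert(tid >= 0 && tid < N_SLOTS);
   sigfillset(&all);
   sigprocmask(SIG_SETMASK, &all, &saved);

   if (queues[tid] != NULL && queues[tid]->count > 0) {
      sig_queue_t *q = queues[tid];
      signo = q->sigs[q->head];
      q->head = (q->head + 1) % QUEUE_LEN;
      q->count--;
   }

   sigprocmask(SIG_SETMASK, &saved, NULL);
   return signo;
}
```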



There are two mechanisms to prevent disaster if multiple threads get
signals concurrently.  One is that a signal handler is set up to block a
set of signals while the signal is being delivered.  Valgrind's handlers
block all signals, so there's no risk of a new signal being delivered to
the same thread until the old handler has finished.

The other is that if the thread which receives the signal is not running
(ie, doesn't hold the run_sema, which implies it must be waiting for a
syscall to complete), then the signal handler will grab the run_sema
before making any global state changes.  Since the only time an async
signal can actually arrive asynchronously is during a blocking syscall,
this should be the case every time. (And since synchronous signals are
always the result of running an instruction, we should already be
holding run_sema.)


Valgrind will occasionally generate signals for itself. These are always
synchronous faults as a result of an instruction fetch or something an
instruction did.  The two mechanisms are the synth_fault_* functions,
which are used to signal a problem while fetching an instruction, and
getting generated code to call a helper which contains a fault-raising
instruction (used to deal with illegal/unimplemented instructions and
for instructions whose only job is to raise exceptions).

That all explains how signals come in, but the second part is how they
get delivered.

The main function for this is VG_(deliver_signal).  There are three cases:

   1. the process is ignoring the signal (SIG_IGN)
   2. the process is using the default handler (SIG_DFL)
   3. the process has a handler for the signal

In general, VG_(deliver_signal) shouldn't be called for ignored signals;
if it has been called, it assumes the ignore is being overridden (if an
instruction gets a SEGV etc, SIG_IGN is ignored and treated as SIG_DFL).

VG_(deliver_signal) handles the default handler case, and the
client-specified signal handler case.

The default handler case is relatively easy: the signal's default action
is either Terminate, or Ignore.  We can ignore Ignore.

Terminate always kills the entire process; there's no such thing as a
thread-specific signal death. Terminate comes in two forms: with
coredump, or without.  vg_default_action() will write a core file, and
then will tell all the threads to start terminating; it then longjmps
back to the current thread's scheduler loop.  The scheduler loop will
terminate immediately, and the master_tid thread will wait for all the
others to exit before shutting down the process (this is the same
mechanism as exit_group).

Delivering a signal to a client-side handler modifies the thread state so
that there's a signal frame on the stack, and the instruction pointer is
pointing to the handler.  The fiddly bit is that there are two
completely different signal frame formats: old and RT.  While in theory
the exact shape of these frames on stack is abstracted, there are real
programs which know exactly where various parts of the structures are on
stack (most notably, g++'s exception throwing code), which is why it has
to have two separate pieces of code for each frame format.  Another
tricky case is dealing with the client stack running out/overflowing
while setting up the signal frame.

Signal return is also interesting.  There are two syscalls, sigreturn
and rt_sigreturn, which a signal handler will use to resume execution.
The client will call the right one for the frame it was passed, so the
core doesn't need to track that state.  The tricky part is moving the
frame's register state back into the thread's state, particularly all
the FPU state reformatting gunk.  Also, *sigreturn checks for new
pending signals after the old frame has been cleaned up, since there's a
requirement that all deliverable pending signals are delivered before
the mainline code makes progress.  This means that a program could
live-lock on signals, but that's what would happen running natively...

Another thing to watch for: programs which unwind the stack (like gdb,
or exception throwers) recognize the existence of a signal frame by
looking at the code the return address points to: if it is one of the
two specific signal return sequences, it knows it's a signal frame.
That's why the signal handler return address must point to a very
specific set of instructions.
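
A minimal sketch of how an unwinder might make that check.  The text
above does not name an architecture; the byte sequences below are the
32-bit x86 Linux return sequences as commonly documented, and should be
treated as an assumption here.  The names frame_kind_t and
classify_return_code are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed i386 sequence: popl %eax; movl $119, %eax; int $0x80
   (119 being the sigreturn syscall number on i386). */
static const unsigned char old_sigreturn_code[] =
   { 0x58, 0xb8, 0x77, 0x00, 0x00, 0x00, 0xcd, 0x80 };

/* Assumed i386 sequence: movl $173, %eax; int $0x80
   (173 being the rt_sigreturn syscall number on i386). */
static const unsigned char rt_sigreturn_code[] =
   { 0xb8, 0xad, 0x00, 0x00, 0x00, 0xcd, 0x80 };

typedef enum { FRAME_NORMAL, FRAME_OLD_SIGNAL, FRAME_RT_SIGNAL } frame_kind_t;

/* Inspect the bytes a return address points at and decide whether the
   frame above it is a signal frame, and of which format. */
static frame_kind_t classify_return_code(const unsigned char *code, size_t len)
{
   assert(code != NULL);
   if (len >= sizeof old_sigreturn_code
       && memcmp(code, old_sigreturn_code, sizeof old_sigreturn_code) == 0)
      return FRAME_OLD_SIGNAL;
   if (len >= sizeof rt_sigreturn_code
       && memcmp(code, rt_sigreturn_code, sizeof rt_sigreturn_code) == 0)
      return FRAME_RT_SIGNAL;
   return FRAME_NORMAL;
}
```

Because the match is byte-for-byte, a return stub that is functionally
equivalent but encoded differently would not be recognized.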


What else.  Ah, the two internal signals.

SIGVGKILL is pretty straightforward: it's just used to dislodge a thread
from being blocked in a syscall, so that we can get the thread to
terminate in a timely fashion.

SIGVGCHLD is used by a thread to tell the master_tid that it has
exited.  However, the only time the master_tid cares about this is when
it has already exited, and it's waiting for everyone else to exit.  If
the master_tid hasn't exited, then this signal is ignored.  It isn't
enough to simply block it, because that will cause a pile of queued
SIGVGCHLDs to build up, eventually clogging the kernel's signal delivery
mechanism.  If it's unblocked and ignored, it doesn't interrupt syscalls
and it doesn't accumulate.
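
A minimal sketch of the difference between blocking and ignoring, using
SIGUSR1 as a stand-in for the internal signal.  The helper names are
hypothetical.  It only looks at the pending bit for one signal; it does
not reproduce the kernel's queue filling up.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

static void noop_handler(int signo) { (void)signo; }

/* Raise 'signo' once and report whether it is left pending (1) or
   not (0).  With block_instead_of_ignore set, the signal has a handler
   but is blocked; otherwise it is unblocked and set to SIG_IGN. */
static int left_pending(int signo, int block_instead_of_ignore)
{
   sigset_t one, pend;
   int result;

   sigemptyset(&one);
   sigaddset(&one, signo);

   if (block_instead_of_ignore) {
      signal(signo, noop_handler);
      sigprocmask(SIG_BLOCK, &one, NULL);
   } else {
      signal(signo, SIG_IGN);
      sigprocmask(SIG_UNBLOCK, &one, NULL);
   }

   raise(signo);
   sigpending(&pend);
   result = sigismember(&pend, signo);

   /* Clean up: switching to SIG_IGN discards anything still pending,
      after which it is safe to unblock and restore the default. */
   signal(signo, SIG_IGN);
   sigprocmask(SIG_UNBLOCK, &one, NULL);
   signal(signo, SIG_DFL);
   return result;
}
```

In the blocked case the signal stays pending until something consumes
it; in the ignored case it is dropped as soon as it is generated.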


I hope that helps clarify things.  And explain why there's so much stuff
in there: it's tracking a very complex and arcane underlying set of
machinery.

    J

--------------------------------------------------------------------

>I've been seeing references to 'master thread' around the place.
>What distinguishes the master thread from the rest?  Where does
>the requirement to have a master thread come from?
>
It used to be tid 1, but I had to generalize it.

The master_tid isn't very special; its main job is at process shutdown.
It waits for all the other threads to exit, and then produces all the
final reports. Until it exits, it's just a normal thread, with no other
responsibilities.

The alternative to having a master thread would be to make whichever
thread exits last be responsible for emitting all the output.  That
would work, but it would make the results a bit asynchronous (that is,
if the main thread exits and the others hang around for a while, anyone
waiting on the process would see it as having exited, but no results
would have been produced).

VG_(master_tid) is a variable to handle the case where a threaded program
forks.  In the first process, the master_tid will be 1.  If that program
creates a few threads, and then, say, thread 3 forks, the child process
will have a single thread in it.  In the child, master_tid will be 3.
It was easier to make the master thread a variable than to try to work
out how to rename thread 3 to 1 after a fork.
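
A minimal sketch of that bookkeeping, as a pure simulation with no real
fork().  The table, the state names, MAX_THREADS and
fork_cleanup_sketch are hypothetical, as is the local ThreadId typedef;
only the idea that the child makes the forking thread its master comes
from the text above.

```c
#include <assert.h>

typedef unsigned int ThreadId;   /* assumed representation */

enum { MAX_THREADS = 8 };        /* hypothetical table size */
typedef enum { TS_EMPTY, TS_ALIVE } slot_state_t;

static slot_state_t threads[MAX_THREADS];
static ThreadId master_tid = 1;  /* tid 1 in the first process */

/* What the child side of a fork might do when thread 'me' forked:
   every other thread slot is gone in the child, and 'me' keeps its
   tid and becomes the master. */
static void fork_cleanup_sketch(ThreadId me)
{
   ThreadId tid;

   assert(me >= 1 && me < MAX_THREADS);
   for (tid = 1; tid < MAX_THREADS; tid++) {
      if (tid != me)
         threads[tid] = TS_EMPTY;
   }
   master_tid = me;
}
```

Keeping the forking thread's tid and updating one variable avoids
having to renumber any per-thread state in the child.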

    J

--------------------------------------------------------------------

Re:   Fwd: Documentation of kernel's signal routing ?
From: David Woodhouse <...>
To:   Julian Seward <jseward@acm.org>

> Regarding sys_clone created threads.  I have a vague idea that
> there is a notion of 'thread group'.  I further understand that if
> one thread in a group calls sys_exit_group then all threads in that
> group exit.  Whereas if a thread calls sys_exit then just that
> thread exits.
>
> I'm pretty hazy on this:

Hmm, so am I :)

> * Is the above correct?

Yes, I believe so.

> * How is thread-group membership defined/changed?

By specifying CLONE_THREAD in the flags to clone(), you remain part of
the same thread group as the parent. In a single-threaded process, the
thread group id (tgid) is the same as the pid.

Linux just has tasks, which sometimes happen to share VM -- and now with
NPTL we also share other stuff like signals, etc. The 'pid' in Linux is
what POSIX would call the 'thread id', and the 'tgid' in Linux is
equivalent to the POSIX 'pid'.
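
A minimal Linux-only sketch of the two ids as seen from user space.
The wrapper names are hypothetical.  It relies on getpid() returning
the thread group id and the gettid syscall returning the per-task id,
so the two agree in a single-threaded process.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The kernel's per-task id (what the text calls the Linux 'pid'). */
static pid_t kernel_task_id(void)
{
   return (pid_t)syscall(SYS_gettid);
}

/* The thread group id (what POSIX calls the pid). */
static pid_t thread_group_id(void)
{
   return getpid();
}

/* True when the calling task is the first task of its thread group. */
static int is_group_leader(void)
{
   return kernel_task_id() == thread_group_id();
}
```

In a thread created with CLONE_THREAD, kernel_task_id() would differ
from thread_group_id() while thread_group_id() stays the same for the
whole group.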

> * Do you know offhand how LinuxThreads and NPTL use thread groups?

I believe that LT doesn't use the kernel's concept of thread groups at
all. LT predates the kernel's support for proper POSIX-like sharing of
anything much but memory, so uses only the CLONE_VM (and possibly
CLONE_FILES) flags. I don't _think_ it uses CLONE_SIGHAND -- it does
most of its work by propagating signals manually between threads.

NPTL uses thread groups as generated by the CLONE_THREAD flag, which is
what invokes the POSIX-related thread semantics.

>   Is it the case that each LinuxThreads thread is in its own
>   group whereas all NPTL threads [in a process] are in a single
>   group?

Yes, that's my understanding.

--
dwmw2