| ptrace and NPTL, the missing manpage |
| |
| == Signals == |
| |
| A signal sent to a ptrace'd process or thread causes only the thread |
| that receives it to stop and report to the attached process. |
| |
| Use tgkill to target a signal (for example, SIGSTOP) at a particular |
| thread. If you use kill, the signal could be delivered to another |
| thread in the same process. |
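| |
| As a minimal sketch of the above: glibc has not always provided a |
| tgkill wrapper, so this example goes through syscall(2) directly. |
| The helper name send_signal_to_thread is hypothetical and not part of |
| any library. |

```c
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: deliver `sig` to thread `tid` of thread group `tgid`.
 * Returns 0 on success, or -1 with errno set, like tgkill(2) itself. */
int send_signal_to_thread(pid_t tgid, pid_t tid, int sig)
{
    assert(tgid > 0 && tid > 0);   /* tgkill rejects non-positive IDs */
    return (int)syscall(SYS_tgkill, tgid, tid, sig);
}
```

| For example, send_signal_to_thread(pid, tid, SIGSTOP) targets one |
| thread, whereas kill(pid, SIGSTOP) leaves the choice of thread to the |
| kernel. |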
| |
| Note that SIGSTOP differs from its usual behavior when a process is |
| being traced. Usually, a SIGSTOP sent to any thread in a thread group |
| will stop all threads in the thread group. When a thread is traced, |
| however, a SIGSTOP affects only the receiving thread (and any other |
| threads in the thread group that are not traced). |
| |
| SIGKILL behaves as it does for non-traced processes. It affects all |
| threads in the process and terminates them without the WIFSTOPPED |
| event generated by other signals. However, if PTRACE_O_TRACEEXIT is |
| set, the attached process will still receive PTRACE_EVENT_EXIT events |
| before receiving WIFSIGNALED events. |
| |
| See "Following thread death" for a caveat regarding signal delivery to |
| zombie threads. |
| |
| == Waiting on threads == |
| |
| Cloned threads in ptrace'd processes are treated similarly to cloned |
| threads in your own process. Thus, you must use the __WALL option in |
| order to receive notifications from threads created by the child |
| process. Similarly, the __WCLONE option will wait only on |
| notifications from threads created by the child process and *not* on |
| notifications from the initial child thread. |
| |
| Even when waiting on a specific thread's PID using waitpid or similar, |
| __WALL or __WCLONE is necessary; otherwise waitpid fails with ECHILD. |
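| |
| A minimal sketch of a wait call that always passes __WALL, so that |
| both the initial child thread and cloned threads are reported. The |
| helper name wait_for_tracee is hypothetical, and retrying on EINTR is |
| an added assumption rather than something the text above requires. |

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: wait for a state change in `pid` (or in any child
 * if pid is -1), including threads created with clone. Retries on EINTR.
 * Returns the ID that changed state, or -1 with errno set. */
pid_t wait_for_tracee(pid_t pid, int *status)
{
    assert(status != NULL);
    for (;;) {
        pid_t r = waitpid(pid, status, __WALL);
        if (r != -1 || errno != EINTR)
            return r;
    }
}
```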
| |
| == Attaching to existing threads == |
| |
| libthread_db (which gdb uses) attaches to existing threads by pulling |
| the pthread data structures out of the traced process. A much easier |
| way is to traverse the /proc/PID/task directory, though it's unclear |
| how the semantics of these two approaches differ. |
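| |
| A minimal sketch of the /proc/PID/task traversal, assuming /proc is |
| mounted and readable. The helper name list_task_ids and its |
| fixed-capacity output array are hypothetical choices made for |
| brevity. |

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: store up to `cap` thread IDs of process `pid` in
 * `tids`, read from /proc/PID/task. Returns the number of threads found
 * (which may exceed `cap`), or -1 if the directory cannot be opened. */
int list_task_ids(pid_t pid, pid_t *tids, int cap)
{
    char path[64];
    assert(tids != NULL && cap >= 0);
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0' || tid <= 0)
            continue;                 /* skips "." and ".." */
        if (count < cap)
            tids[count] = (pid_t)tid;
        count++;
    }
    closedir(dir);
    return count;
}
```

| The result is only a snapshot: threads may be created or may exit |
| between the listing and any later attach. |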
| |
| Unfortunately, if the main thread has exited (but the overall process |
| has not), it sticks around as a zombie process. This zombie will |
| appear in the /proc/PID/task directory, but trying to attach to it |
| will yield EPERM. In this case, the third field of the |
| /proc/PID/task/PID/stat file will be "Z". Attempting to open the stat |
| file is also a convenient way to detect races between listing the task |
| directory and the thread exiting. Incidentally, gdb will simply |
| fail to attach to a process whose main thread is a zombie. |
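| |
| The stat check can be sketched as follows. The helper name |
| read_task_state is hypothetical. The sketch searches for the last |
| ')' because the second field (the command name) may itself contain |
| spaces or parentheses. |

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the state character (third field) from
 * /proc/PID/task/TID/stat, or 0 if the file cannot be opened or parsed,
 * which typically means the thread exited after the directory was listed. */
char read_task_state(pid_t pid, pid_t tid)
{
    char path[64], buf[1024];
    assert(pid > 0 && tid > 0);
    snprintf(path, sizeof path, "/proc/%d/task/%d/stat", (int)pid, (int)tid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The state field follows the closing ')' of the command name. */
    char *close_paren = strrchr(buf, ')');
    if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
        return 0;
    return close_paren[2];
}
```

| A result of 'Z' for the main thread means an attach attempt will fail |
| as described above; a result of 0 suggests the thread is already |
| gone. |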
| |
| Because new threads may be created while the debugger is in the |
| process of attaching to existing threads, the debugger must repeatedly |
| re-list the task directory until it has attached to (and thus stopped) |
| every thread listed. |
| |
| In order to follow new threads created by existing threads, |
| PTRACE_O_TRACECLONE must be set on each thread attached to. |
| |
| == Following new threads == |
| |
| With the child process stopped, use PTRACE_SETOPTIONS to set the |
| PTRACE_O_TRACECLONE option. This option is per-thread, and thus must |
| be set on each existing thread individually. When an existing thread |
| with PTRACE_O_TRACECLONE set spawns a new thread, the existing thread |
| will stop with a wait status in which status >> 8 equals |
| (SIGTRAP | (PTRACE_EVENT_CLONE << 8)), and the PID of the new thread |
| can be retrieved with PTRACE_GETEVENTMSG on the creating thread. At |
| this time, the new thread will exist, but will initially |
| be stopped with a SIGSTOP. The new thread will automatically be |
| traced and will inherit the PTRACE_O_TRACECLONE option from its |
| parent. The attached process should wait on the new thread to receive |
| the SIGSTOP notification. |
| |
| When using waitpid(-1, ...), don't rely on the parent thread reporting |
| a SIGTRAP before receiving the SIGSTOP from the new child thread. |
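| |
| Since the two notifications can arrive in either order, a wait loop |
| has to classify each status on its own. A minimal sketch of that |
| classification follows; the helper names ptrace_event_of and |
| is_sigstop_stop are hypothetical. |

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical helper: if `status` (as returned by waitpid) describes a
 * ptrace event stop, return the PTRACE_EVENT_* code; otherwise return 0. */
int ptrace_event_of(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xffff;
}

/* Hypothetical helper: true if `status` is a plain SIGSTOP stop, as
 * reported by a freshly cloned thread before it runs any code. */
int is_sigstop_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP
        && (status >> 16) == 0;
}
```

| A debugger would typically record a SIGSTOP from an unknown thread ID |
| and match it up with the PTRACE_EVENT_CLONE report once that arrives, |
| or the other way around. |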
| |
| Without PTRACE_O_TRACECLONE, newly cloned threads will not be |
| ptrace'd. As a result, signals received by new threads will be |
| handled in the usual way. Their effects may reach the parent and |
| then be reported to the attached process as if they originated in the |
| parent, possibly in unexpected ways. |
| |
| == Following thread death == |
| |
| If any thread with the PTRACE_O_TRACEEXIT option set exits (either by |
| returning or pthread_exit'ing), the tracing process will receive an |
| immediate PTRACE_EVENT_EXIT. At this point, the thread will still |
| exist. The exit status, encoded as for wait, can be queried using |
| PTRACE_GETEVENTMSG on the exiting thread's PID. The thread should be |
| continued so it can actually exit, after which its wait behavior is |
| the same as for a thread without the PTRACE_O_TRACEEXIT option. |
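| |
| The sequence above can be sketched end to end with a single-threaded |
| child. This is an illustration under assumptions, not a debugger: |
| the function name observe_exit_event is hypothetical, the child uses |
| PTRACE_TRACEME rather than being attached to, and the sketch assumes |
| the environment permits ptrace. |

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that exits with `code`, trace it with
 * PTRACE_O_TRACEEXIT, and return the wait-style status reported by
 * PTRACE_GETEVENTMSG at the exit stop, or -1 on any failure. */
long observe_exit_event(int code)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);              /* let the parent set options first */
        _exit(code);
    }

    int status = 0;
    unsigned long msg = 0;

    /* Initial SIGSTOP stop. */
    if (waitpid(child, &status, __WALL) != child || !WIFSTOPPED(status))
        goto fail;
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACEEXIT) == -1)
        goto fail;
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1)
        goto fail;

    /* Exit stop: the thread still exists at this point. */
    if (waitpid(child, &status, __WALL) != child)
        goto fail;
    if (!WIFSTOPPED(status)
        || (status >> 8) != (SIGTRAP | (PTRACE_EVENT_EXIT << 8)))
        goto fail;
    if (ptrace(PTRACE_GETEVENTMSG, child, NULL, &msg) == -1)
        goto fail;

    /* Continue so the thread can actually exit, then reap it. */
    ptrace(PTRACE_CONT, child, NULL, NULL);
    waitpid(child, &status, __WALL);
    return (long)msg;

fail:
    kill(child, SIGKILL);
    waitpid(child, &status, __WALL);
    return -1;
}
```

| The returned value is encoded as for wait, so WIFEXITED and |
| WEXITSTATUS can be applied to it after converting it to an int. |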
| |
| If a non-main thread exits (either by returning or pthread_exit'ing), |
| its corresponding process will also exit, producing a WIFEXITED event |
| (after the process is continued from a possible PTRACE_EVENT_EXIT |
| event). It is *not* necessary for another thread to pthread_join for |
| this to happen. |
| |
| If the main thread exits by returning, then all threads will exit, |
| first generating a PTRACE_EVENT_EXIT event for each thread if |
| appropriate, then producing a WIFEXITED event for each thread. |
| |
| If the main thread exits using pthread_exit, then it enters a |
| non-waitable zombie state. It will still produce an immediate |
| PTRACE_EVENT_EXIT event, but the WIFEXITED event will be delayed |
| until the entire process exits. This state exists so that shells |
| don't think the process is done until all of the threads have exited. |
| Unfortunately, signals cannot be delivered to non-waitable zombies. |
| Most notably, SIGSTOP cannot be delivered; as a result, when you |
| broadcast SIGSTOP to all of the threads, you must not wait for |
| non-waitable zombies to stop. Furthermore, any ptrace request on a |
| non-waitable zombie, including PTRACE_DETACH, will fail with ESRCH. |
| |
| == Multi-threaded debuggers == |
| |
| If the debugger itself is multi-threaded, ptrace calls must come from |
| the same thread that originally attached to the traced thread. The |
| kernel simply compares the PID of the caller of ptrace against the |
| tracer PID of the process passed to ptrace. Because each debugger |
| thread has a different PID, calling ptrace from a different thread |
| might as well be calling it from a different process, and the call |
| will fail with ESRCH. |
| |
| wait, on the other hand, does not have this restriction. Any debugger |
| thread can wait on any thread in the attached process. |