
Hacker School Day 39 – pstree and Weird /proc Behavior

I wrote a little pstree clone in Go. It’s a very minimal implementation and just scrapes /proc to get information about each process. The process data includes the parent PID, and then a second pass over that data matches children to parents. It’s got proper Godoc-formatted comments and I think it’s structured nicely for how tiny it is.
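For the curious, here is a rough sketch of that two-pass approach. It is an illustration rather than the actual clone: the names Process, parseStat, scanProc, childrenByParent and printTree are made up for this example, and it assumes the parent PID is read from /proc/<pid>/stat (it could just as well come from /proc/<pid>/status).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
)

// Process holds the fields this sketch pulls out of /proc/<pid>/stat.
type Process struct {
	PID, PPID int
	Name      string
}

// parseStat parses the contents of a /proc/<pid>/stat file. The command name
// sits in parentheses and may itself contain spaces or parentheses, so the
// remaining fields are located relative to the last ')'.
func parseStat(stat string) (Process, error) {
	lparen := strings.IndexByte(stat, '(')
	rparen := strings.LastIndexByte(stat, ')')
	if lparen < 0 || rparen < lparen {
		return Process{}, fmt.Errorf("malformed stat line: %q", stat)
	}
	pid, err := strconv.Atoi(strings.TrimSpace(stat[:lparen]))
	if err != nil {
		return Process{}, err
	}
	// After the name come the state and then the parent PID.
	rest := strings.Fields(stat[rparen+1:])
	if len(rest) < 2 {
		return Process{}, fmt.Errorf("short stat line: %q", stat)
	}
	ppid, err := strconv.Atoi(rest[1])
	if err != nil {
		return Process{}, err
	}
	return Process{PID: pid, PPID: ppid, Name: stat[lparen+1 : rparen]}, nil
}

// scanProc is the first pass: read every /proc/<pid>/stat it can.
func scanProc() ([]Process, error) {
	dirs, err := filepath.Glob("/proc/[0-9]*")
	if err != nil {
		return nil, err
	}
	var procs []Process
	for _, dir := range dirs {
		data, err := os.ReadFile(filepath.Join(dir, "stat"))
		if err != nil {
			continue // the process probably exited mid-scan
		}
		if p, err := parseStat(string(data)); err == nil {
			procs = append(procs, p)
		}
	}
	return procs, nil
}

// childrenByParent is the second pass: group PIDs under their parent PID.
func childrenByParent(procs []Process) map[int][]int {
	children := make(map[int][]int)
	for _, p := range procs {
		children[p.PPID] = append(children[p.PPID], p.PID)
	}
	for _, kids := range children {
		sort.Ints(kids)
	}
	return children
}

// printTree prints pid and its descendants, indenting two spaces per level.
func printTree(names map[int]string, children map[int][]int, pid, depth int) {
	fmt.Printf("%s%s(%d)\n", strings.Repeat("  ", depth), names[pid], pid)
	for _, kid := range children[pid] {
		printTree(names, children, kid, depth+1)
	}
}

func main() {
	procs, err := scanProc()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	names := make(map[int]string, len(procs))
	for _, p := range procs {
		names[p.PID] = p.Name
	}
	children := childrenByParent(procs)
	// Processes whose parent PID is 0 (such as init) are the roots.
	for _, root := range children[0] {
		printTree(names, children, root, 0)
	}
}
```

Keeping the parsing and the parent/child matching as separate functions means both can be exercised without touching a real /proc.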

I’ve done a lot of work in /proc before at Linode, across various internal projects and primarily on Longview, a Linux system statistics collection service.

I noticed that my code’s output differed from pstree’s. I didn’t realize at first that pstree includes threads in its output. So when I ran:

pstree -p

(to include the PID number), I saw PIDs that didn’t exist in my output. It went like this:

  1. Run pstree -p
  2. See PIDs I don’t have in my output, even though I’m globbing over /proc/[0-9]*
  3. Become curious and run ls /proc/<weird PID>
  4. See that it is actually a directory with contents like a PID would have
  5. Run ls /proc/ | grep <weird PID> and see nothing

It turns out that /proc actually contains directories for threads at the top level, but they never show up in a directory listing; they only appear when you ask for them by name. You can confirm this yourself, like I did, with plain old brute force:

Typically, to get thread information, you look in the task directory under a specific PID’s directory (/proc/<PID>/task). There you can find subdirectories, each named after a thread ID. The top-level thread directories are especially odd because the proc filesystem is normally really great about providing symlinks to entities that really live in other places. You can see this by looking at the symlinks under /proc/self, which is itself a symlink to the directory named after the PID of the command you just ran (whoa):

find /proc/self/ -maxdepth 1 -type l -ls

You’ll see a few items, including entries like /proc/self/exe, which is a symlink to the executable (in this case, /usr/bin/find).

I checked the source of pstree with apt-get source psmisc (psmisc is the package that ships pstree) and confirmed that they’re doing the sensible thing and scraping /proc/<PID>/task, but you could blindly barrel through and check every possible PID up to pid_max. You can see pid_max on your system by running: cat /proc/sys/kernel/pid_max

I looked through the source for the /proc filesystem for a little bit to see if I could find a reason other than a missing if-check. In my head, it would be bad to make these actual symlinks, because that would break the applications that just glob over /proc/[0-9]*. So unless there’s another neat reason (which would give me something better to search for), it seems like they shouldn’t be accessible at all.

I’ll keep looking in the meantime, in between the TCP congestion algorithm stuff, which has been on a minor hold. I spent a lot of time yesterday going through the recent IETF journal that came out, and ended up with a ton of neat links that might be helpful. I’ll throw the good ones in the next blog post 🙂

(I’m not sure if it’s day 39. I skipped Wednesday – Friday of last week for Thanksgiving. Either way, there’s not much time left.)
