
Getting Process Invocations: The Hard Way

Typical output from running ps aux includes a bunch of lines that look like this:

stan 25906 0.0 0.0 832424 1428 pts/5 SNl+ Oct29 0:46 go run youtube-gif-go.go

The rightmost column is the process name. Unless the process has modified it, this is the invocation of the process: the letters that were typed* to start the command.

The ps utility gets this information from /proc, a virtual filesystem that the kernel populates. Each process gets its own directory, and within that directory there are lots of interesting files. The shorthand for talking about these files is “/proc/pid/...”, which is easier to write than “/proc/<process ID number goes here>/...” over and over.

The ps utility reads /proc/pid/cmdline, a file that holds the string that was used to start this process. The string is null-separated and null-terminated. So if you run “cat /proc/pid/cmdline”, you’ll see something like this (using the example process from above):

gorunyoutube-gif-go.go

If you run cat again with the --show-all flag, you’ll see:

go^@run^@youtube-gif-go.go^@

The “^@” symbols stand for the null character. Rather than blindly memorizing this, you can always figure it back out with man pages! The table in man ascii lists NUL in the same row as “@” (their codes are exactly 64 apart), and the man page for cat describes the use of the caret (“^”) symbol for non-printable characters.
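The caret notation is simple enough to reproduce by hand. Here is a minimal Python sketch of the idea, not the actual cat implementation: the names caret_notation and show_all are made up for illustration, it only handles 7-bit ASCII, and it ignores the special treatment that cat gives to line endings.

```python
def caret_notation(byte: int) -> str:
    """Render one 7-bit ASCII byte in caret notation (illustrative helper)."""
    if byte < 0 or byte > 127:
        raise ValueError("only 7-bit ASCII is handled in this sketch")
    if byte < 32:
        # Control characters become "^" plus the character 64 places later
        # in the ASCII table. NUL is 0 and "@" is 64, hence "^@".
        return "^" + chr(byte + 64)
    if byte == 127:
        # DEL is conventionally written "^?".
        return "^?"
    return chr(byte)


def show_all(data: bytes) -> str:
    """Apply caret notation to every byte in a buffer."""
    return "".join(caret_notation(b) for b in data)


print(show_all(b"go\0run\0youtube-gif-go.go\0"))
```

Running this on the bytes from the example process prints the same go^@run^@youtube-gif-go.go^@ line shown above.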

So the ps utility replaces those NUL characters with spaces and shows them to you like this:

go run youtube-gif-go.go
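The same NUL-to-space conversion can be sketched in a few lines of Python. This is only an approximation of what ps does, and the function names format_cmdline and read_cmdline are my own, not anything from ps or the kernel.

```python
def format_cmdline(raw: bytes) -> str:
    """Turn the raw contents of a cmdline file into a printable string."""
    # Drop any trailing NUL terminator, then turn the NUL separators into spaces.
    text = raw.rstrip(b"\0").replace(b"\0", b" ")
    return text.decode("utf-8", errors="replace")


def read_cmdline(pid="self") -> str:
    """Read and format /proc/<pid>/cmdline (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return format_cmdline(f.read())
```

Calling read_cmdline() with no arguments on a Linux machine should show how the Python interpreter itself was invoked. Passing a process ID instead of “self” shows the invocation of that process.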

How does /proc/pid/cmdline get created?

Files under /proc are populated by the kernel. When you read a file under this filesystem, the kernel gives you the answer. You can use regular Linux tools to access information that’s normally only inside the kernel, which is why the proc filesystem is so powerful.

Finding out what the kernel actually does to populate the /proc/pid/cmdline file requires some code-mining skills. The short answer is that it steals the information directly from that process’s stack. Unlike some of the other files under a process’s /proc directory, the cmdline file is actually read from an offset inside the other process’s memory.

It turns out that we can do the same thing!

Enter processname-trbl

I wrote processname-trbl to demonstrate how you can take the seriously scenic tour on the way to getting a process name. It takes a process ID number as a command line parameter. Since this tool takes advantage of a couple of different files under proc, it may be helpful to cross-reference this post with the proc man page, available under “man proc”. It’s also important to note that you don’t need a fancy compiled program to do this. If you follow the steps below, you can get the same information using standard Linux commands in a bash script. When I started exploring this, my prototype was just a short Perl script that converted some hex to decimal and generated bash commands.

The top-level summary is that this code goes through a process’s stack memory and finds the process name. This works because when a process starts, the kernel saves that information at the bottom of the stack, meaning the end with the highest addresses. The terminology can be confusing, but this post, titled “Where is the top of the stack on x86?”, does a fantastic job of explaining. It’s like having a sheet of paper: the kernel writes three pieces of information and tries to make sure the last word lines up perfectly with the bottom right corner of the paper. The rest of the sheet above it is where the process can write.

But first the tool needs to know what memory to look at. It does this by reading that process’s /proc/pid/maps file. This file shows where different things are mapped in a process’s address space. You can see what this file looks like by running cat /proc/self/maps. The “self” name is special in proc: it always refers to the process doing the reading, so here it means “show me the maps file for this process that I’m about to run right now”. It’s very useful for exploring, since you don’t need to look up a process ID beforehand.

Towards the end of the output you’ll see an entry named “[stack]”. This is the location of the process’s stack. This is the chunk of information we want to steal. This post on Memory Layout and the Stack describes the stack in more detail.
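Picking the stack entry out of a maps file can be sketched like this in Python. The function name find_stack_range is made up for this post, and the sample addresses are invented for illustration. Real values differ from process to process.

```python
def find_stack_range(maps_text: str) -> tuple:
    """Return (start, end) of the [stack] mapping as integers."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Each line is: address-range, permissions, offset, device, inode,
        # and an optional name. The main stack is named "[stack]".
        if len(fields) >= 6 and fields[5] == "[stack]":
            start_hex, end_hex = fields[0].split("-")
            # The addresses are written in hex without a "0x" prefix.
            return int(start_hex, 16), int(end_hex, 16)
    raise LookupError("no [stack] entry found")


# Made-up sample in the shape of a maps file, for illustration only.
SAMPLE_MAPS = (
    "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example\n"
    "7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0 [stack]\n"
)

start, end = find_stack_range(SAMPLE_MAPS)
print(hex(start), hex(end))
```

On a real system you would pass in the contents of /proc/pid/maps rather than the sample string.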

The file /proc/pid/mem lets you access the memory of a running process. There are rules about who is allowed to access it, and they limit how you can explore. The /proc/pid/maps file told us where to look inside this file by giving us the start and end addresses, in hex, of the stack. We can convert that hex to decimal and seek that many bytes into the file to read only the stack, and then start stealing information from the bottom of the stack.
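The seek-and-read step might look like the following Python sketch. The function name read_range is hypothetical. It works on any file path, so you can try it on an ordinary file before pointing it at /proc/pid/mem, where the access rules mentioned above apply and the read may be refused.

```python
def read_range(path: str, start: int, end: int) -> bytes:
    """Read the bytes between two offsets of a file such as /proc/<pid>/mem."""
    # buffering=0 keeps Python from reading ahead past the range we asked for,
    # which matters when the bytes after the range are not mapped at all.
    with open(path, "rb", buffering=0) as mem:
        mem.seek(start)
        # A single unbuffered read may return fewer bytes than requested;
        # a sturdier version would loop until it has everything.
        return mem.read(end - start)


# Offsets in a maps file are hex strings, so convert them before seeking.
offset = int("7ffc1a2b3000", 16)  # made-up address, for illustration only
print(offset)
```

Combined with the start and end of the “[stack]” entry, this returns the raw bytes of the whole stack, and the interesting part is at the very end of that buffer.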

Knowing specifically where to read from the stack takes some additional trickery, and more kernel code reading. When a process starts, the kernel writes the process invocation, the environment variables, and then the filename all at once, with the goal of making the end of the filename line up with the very bottom right-hand side of the stack/sheet of paper. So we start at the very end and read backwards. Since it’s backwards, we see these values in the opposite order:

  1. The filename.
  2. The environment variables that were set when the process started. The values can be super long or short, so we get the length of the environment from /proc/pid/environ. We go backwards that many characters to get the next part.
  3. The invocation! This is the same information that gets put into /proc/pid/cmdline, which is used by ps to show process names.
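The three backwards steps above can be sketched in Python on a plain bytes buffer. This is not the processname-trbl source, just an illustration under some assumptions: the buffer is the tail end of the stack, the filename may be followed by a few NUL padding bytes, and the number of arguments is passed in as a hypothetical argc parameter so the sketch knows where the invocation starts.

```python
def split_stack_tail(tail: bytes, environ_len: int, argc: int):
    """Walk backwards from the end of the stack: filename, environment, args."""
    # 1. The filename: skip any NUL padding at the very end, then back up
    #    to the NUL that terminates the string before it.
    name_end = len(tail.rstrip(b"\0"))
    name_start = tail.rfind(b"\0", 0, name_end) + 1
    filename = tail[name_start:name_end]

    # 2. The environment: go back as many bytes as /proc/<pid>/environ is long.
    env_start = name_start - environ_len
    environ = tail[env_start:name_start]

    # 3. The invocation: step back over argc NUL-terminated strings.
    pos = env_start
    for _ in range(argc):
        # pos - 1 is the terminator of the string we are stepping over,
        # so search for the NUL that ends the string before that one.
        pos = tail.rfind(b"\0", 0, pos - 1) + 1
    args = tail[pos:env_start].split(b"\0")[:-1]
    return filename, environ, args
```

Joining the returned args with spaces gives the same string that ps prints. The environment length would come from the size of /proc/pid/environ, as described in step 2.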

That’s it! By now we know how the program was invoked, and can steal that information right away. The processname-trbl command prints lots of help text to standard error, but the short output is proof enough:

$ ./processname-trbl 25906 2>/dev/null
go run youtube-gif-go.go

That’s the dime tour of /proc and some weird things you can do. Remember this the next time you’re somehow magically stranded on a system and need to get a process invocation without using /proc/pid/cmdline 🙂

(PS: There are a lot of things I had to skim over in this article to keep it a reasonable length. If you’re interested, there’s lots more for you to explore! Look into what it means to ptrace a process and why that’s relevant to this exercise, the Yama security module and its corresponding ptrace_scope flag, how to set a process title, and the difference between /proc/self/comm and /proc/self/cmdline. Please reach out to me if you find anything neat!)

* not necessarily typed out by a human at a keyboard
