I spent a good chunk of last night working more on my minimal pstree clone. I turned a very-nice-looking codebase into a less-nice-looking one. I wanted to see how I’d go about making it faster.
I’m not sure who needs a blazing-fast pstree program, but it was fun and I’m at Hacker School, so I figured I’d give it a shot. If premature optimization is the root of all evil, then I am a leaf.
I wrote a small benchmark that would run the Populate() function, the main work in my clone:
BenchmarkPopulate 500 5910036 ns/op
Benchmarks in Go are just for-loops, where the testing framework decides how many iterations to run. If the code takes a long time, it’ll do fewer iterations. In this case, it ran Populate() 500 times, with an average run time of 5910036 nanoseconds per run, which is 5.91 milliseconds. That’s pretty fast.
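A benchmark for Populate() looks something like the sketch below. The Populate body here is a stand-in (the real one walks the process table), and I’m using testing.Benchmark so the sketch runs on its own without `go test`:

```go
package main

import (
	"fmt"
	"testing"
)

// Populate is a stand-in for the real function, which builds the
// process tree; here it just does some busywork worth timing.
func Populate() {
	sum := 0
	for i := 0; i < 10000; i++ {
		sum += i
	}
	_ = sum
}

// A Go benchmark is a plain for-loop: the framework picks b.N,
// choosing fewer iterations when each one is slow.
func BenchmarkPopulate(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Populate()
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside `go test`,
	// which keeps this sketch self-contained.
	r := testing.Benchmark(BenchmarkPopulate)
	fmt.Printf("BenchmarkPopulate\t%d\t%d ns/op\n", r.N, r.NsPerOp())
}
```

Under `go test -bench .`, the same BenchmarkPopulate function produces the one-line report shown above.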
But not fast enough! Where can the fat be trimmed? I profiled the code using Go’s profiling suite. It’s easy enough to do:
- Run the benchmark with -cpuprofile cpu.out
- Run the profiler, giving it the cpu.out file created above and a compiled version of the benchmark. The profiler uses the symbols from the binary to resolve function names in the trace.
You can do lots of stuff with the profiler. I started by looking at the top 10 functions appearing in the CPU trace. You can also generate SVGs that help visualize where most of the time is spent. Over several iterations of changes, I brought the benchmark time down. After all the changes, my benchmark looked like this:
BenchmarkPopulate 1000 1560530 ns/op
It got a lot quicker: roughly a 3.8x speedup. I was going to describe the changes in the rest of this post, but that would make it terrible to read, so I’ll write a smaller post for each change. The big takeaway here is honestly that premature optimization is bad and will hurt the readability of your code. You lose out on describing your intent for the sake of speed. It’s okay for me to go wild here because this is Hacker School and everyone can be crazy if they want. Plus, I wrote the pretty version already, and opening the box is way more fun than collecting it.
Here are the followup posts: