in Hacker School

Hacker School Day 21-22: Programming Pearls

I’ve gone back to working through Programming Pearls in Go. Prior to Hacker School I had finished the first column’s exercises in order to learn Go. It’s been fun working through the problems, since they’re mostly self-contained. It’s easy to spend as much time as I want working through all of the different ways to solve it, and after I finish, I’m essentially starting fresh.

Sad Code

I tried to clean up my old code unsuccessfully. I did lots of weird things — I think I was getting tired of Go’s verbosity at that point and tried swimming upstream instead. I wrote a function that generates benchmark functions, realized the Go testing package wouldn’t pick up those functions, then ran those benchmark’s under the testing side (instead of the benchmark side). It short, it’s a mess.

In reviewing the old Go code, I was also getting used to strongly typed code again. The first column from Programming Pearls deals with bit vectors, and the code has a good amount of casting going on that could be avoided. There’s some other repetition going on as well.

Binary

On Monday I played with the encoding/binary package. I wanted to write a file with lots of numbers in a concise format. The first column of Programming Pearls drives a lot of the questions using files filled with numbers. In my tests for the first column I generated a newline-delimited file of integers. Creating these and reading them back in (and parsing them as integers) takes a while and made the filesize much larger than it had to be. Using the binary package let me create huge lists of integers using less file space and let me read those numbers in quickly.

Using the binary encoding made debugging a pain in the neck, and reminded me of the Unix philosophy:

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

It wasn’t so bad though. I was writing 32-bit integers to the file in big-endian format. It was easy to take a peak at the top of the file with head and xxd:

head -c4 file.txt | xxd -b -c4

I said I’d avoid perl for the rest of Hacker School, but it was too useful here 🙂 I threw together a one-liner for listing the numbers in the file. This reads the whole file in, and prints out each number:

perl -0400 -n -E 'say $_ for (unpack("N*", $_));' < file.txt

Go Profiling

I took another detour when I tried profiling the binary encoding code. First, I tried profiling on my Mac. The profile is useless on Mac:

737 100.0% 100.0%      737 100.0% runtime.mach_semaphore_timedwait

It looks like the bug report for this is still open here. Determined to see how awesome my new code was, I switched over to my Linode. I generated the CPU profile, loaded up the profiler, and saw lots of calls to the reflect package. This was confusing and requires some explanation. In my encoding code, I pass a slice of uint32 integers to encode’s Write method. Looking at the code for encode/binary’s Write method shows the following structure: a giant switch statement for commonly encoded things (like a slice of uint32 integers), and then a fallback block of code for everything else, using reflect. So why was my code using reflect?

I went to do what any sad programmer does and decided to add print statements to the core Go code. When I opened the file, I noticed I didn’t have the chunk of the switch statement for uint32 slices. I was running Debian’s packaged version of Go, which is behind the latest release. I used this article for creating .deb files for latest Go and installed the newest version of Go. I profiled the code again and…

I got this:

264  65.3%  65.3%      264  65.3% etext

“What is etext?” Good question! Nobody knows — it’s a bug in Go 1.2. I stopped profiling for the day.

Binary search

Column 2 is titled “Aha! Algorithms” and covers some fun problems. It’s also fun seeing the difference between the two editions of the book. In my first edition copy, the question is essentially “Given a tape containing 1 million 20-bit integers, find an integer not in the list. How would you do this with limited memory?”. The second edition is the same, except with 4 billion 32-bit integers.

The discussion centers around the idea that you can find an integers not in the list by dividing the list repeatedly. Iterate over the integers by looking at the first bit of each number. If it’s a 0, write it to a file, and if it’s a 1, write it to a different file. The file with fewer numbers in it becomes the target for the next search. On the next pass, look at the second bit to divide again. In the end, you’ll hit a point where one of your files (zeros or ones) will be empty, representing a number that isn’t in the list.

I finished implementing this yesterday and discussed the answer with some of the people here at Hacker School. We debated this answer versus a disk-backed sort-and-iterate solution (naive), and the more interesting disk-backed bit array answer. The conclusion was that at the least, this one wins since you’re not writing all over the place. I’m going to try to throw both of these answers together and compare the benchmarks.

I’d also like to see how easy it would be to visualize the search as it happens. Even some ASCII art would be great. I have two pages in my notebook that look like a crazy person came in and started drawing trees.

So today I plan on continuing with this problem from programming pearls, along with attending some talks from other Hacker Schoolers.

Write a Comment

Comment