Matt Robinson

My nerdy blog

Starting at LivingSocial

I’m a little late to write this post since I’m already starting my second week, but I’ve got a new job with LivingSocial. I’m excited about the projects that I’ll have the opportunity to work on and they’ve been hiring a lot of really awesome Ruby developers, so I hope to learn a lot from my new coworkers. I’m in Washington DC right now for my week of orientation since that’s where the company is based, but I’ll be working full time in Portland still, in fact just 5 blocks from my old office.

Leaving Puppet Labs was a difficult decision since they’re a great company that’s also got a lot of great stuff coming down the pipeline. They’re improving, growing and innovating, and I wish them the best to continue to do so. They’ve got a lot of great people with more starting all the time – the company is many, many times the size it was when I started at around 10 employees. I’ll really miss being paid to work on open source software and doing so much work on non-web applications. I like web apps, but it’s certainly nice to do something else from time to time.

I’ll still be looking for ways to contribute to Puppet as I can, now as a community member – maybe I’ll finally finish that patch to have command line completion for Puppet – but first I’ve got a lot of new code to get familiar with.

Rank Your Git Contributors Lines of Code (Then Be Careful What You Do With the Data)

Lately at work there’s been a bit of an obsession with metrics – I’m not sure it’s healthy. Code metrics can be interesting, but in the wrong hands or used the wrong way, they are misleading and harmful. It’s still to be seen what the outcome will be in this case. I’m more than a little worried.

That said, even I like to get some numbers on code, especially unfamiliar code, to help me figure out a few things. For example, I’ve been using CLOC (Count Lines of Code) for years to find out quickly how big a project is and what languages are used in what proportion.

The first metric that’s started to be tracked at work is lines of code added and deleted per release, per author. I’ve occasionally glanced at these kinds of numbers in the past too, and even wrote a little script (git-rank) to get the same kind of info when I was on a code deletion spree. I’ve found these numbers to be useful for getting a general idea of who is most knowledgeable about a project or some files within the project.

However, the numbers can easily be misleading. For example, if someone is committing a lot of style reformats, vendoring code from other projects, or autogenerating documentation, their numbers will be very high but they may know very little about the codebase. For a release of a Rails project I’ve worked on, the metrics gathered showed that I had added 248,475 lines and deleted 458,020 lines. That’s obviously suspicious, although it does make me look very active. These are the kinds of numbers you should dig into a little to figure out what’s happening, which is why my little git-rank script makes it easy to break those line totals down by file, and then exclude files for the next count.

$ git rank v1.1.1..v1.1.9 --all-authors-breakdown --author "Matt Robinson"
Matt Robinson         603061
                      1 vendor/gems/json_pure-1.5.1/tests/fixtures/fail14.json
                      1 vendor/gems/json_pure-1.5.1/tests/fixtures/fail12.json
                      1 vendor/rails/railties/test/vendor/gems/dummy-gem-a-0.4.0/lib/dummy-gem-a.rb
                      ........
                      21418 vendor/gems/haml-3.0.13/lib/haml/precompiler.rbc
                      24953 vendor/gems/haml-3.0.13/test/sass/engine_test.rbc
                      30038 vendor/gems/haml-3.0.13/test/haml/engine_test.rbc
Matt Robinson         603061

It’s pretty obvious that most of what I did was update vendored gems. If you’re trying to figure out who did the most work between the releases I specified, you probably want to ignore the vendor directory when counting lines of code.

$ git rank v1.1.1..v1.1.9 --exclude-file vendor
Josh Cooper           4
Nigel Kersten         8
Andreas Zuber         11
Michael Stahnke       11
Jacob Helwig          15
nfagerlund            37
Daniel Pittman        283
Max Martin            418
Nick Lewis            543
Randall Hansen        901
Pieter van de Bruggen 1018
Matt Robinson         1661

Now we’re starting to get a more realistic picture. From here we could dig deeper by listing all the files again for all the authors, and then maybe start excluding lines based on a regex if there were automated changes, and then maybe look at additions vs deletions in the count (I’m summing them together for these numbers).
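To make the counting idea concrete, here is a rough Python sketch of the same kind of tally. It is not how git-rank itself works (that is a separate script, and I am not reproducing its internals); it simply assumes the log text came from `git log --numstat --format='author:%an'`, where the `author:` prefix is a convention chosen for this example. The names in the sample log are made up.

```python
from collections import Counter

# Hypothetical sample of `git log --numstat --format='author:%an'` output.
SAMPLE_LOG = (
    "author:Alice\n"
    "\n"
    "10\t2\tlib/app.rb\n"
    "500\t0\tvendor/gems/example.rb\n"
    "-\t-\tpublic/logo.png\n"
    "author:Bob\n"
    "\n"
    "3\t1\tREADME\n"
)


def rank_authors(numstat_log, exclude=()):
    """Sum added plus deleted lines per author, smallest total first.

    Paths starting with any prefix in `exclude` are skipped, which is the
    same idea as leaving the vendor directory out of the count.
    """
    totals = Counter()
    author = None
    for line in numstat_log.splitlines():
        if line.startswith("author:"):
            author = line[len("author:"):].strip()
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank separator lines and anything unexpected
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report '-' instead of line counts
        if any(path.startswith(prefix) for prefix in exclude):
            continue
        totals[author] += int(added) + int(deleted)
    return sorted(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, count in rank_authors(SAMPLE_LOG, exclude=("vendor/",)):
        print(name, count)
```

Against a real repository you would feed it the output of that git log command for a range such as v1.1.1..v1.1.9. Additions and deletions are summed together here, as in the numbers above, so splitting them apart would mean keeping two counters instead of one.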

However, I hope people never think they can get the whole picture, or even most of the picture, of a code base and who is contributing the most from something as arbitrary as lines of code counts. I feel like that’s almost too obvious to say, but I’ve heard horror stories of managers who tie reviews to metrics like this. I hope I never personally experience such a thing.

P.S. If anyone tries using the ‘git-rank’ project I mentioned above, please keep in mind it’s a hacky little side project that has probably got some bugs, and you should take the numbers it spits out with a grain of salt. What a weird phrase that is.

Open Source Platform Benefits From Closed Source Features

When I first started working for Puppet Labs, one of the main reasons was that I would get to contribute to open source software as a full time job. However, for much of the summer and fall I was leading a team to create a closed source feature built on top of the open source projects in the Puppet ecosystem, and it turned out pretty well if I do say so myself – Live Management. It’s a web app addon to Puppet Dashboard that exposes the command line interface of the cool server orchestration framework MCollective. MCollective lets you execute commands on large numbers of systems in parallel, including doing things like triggering Puppet runs, starting and stopping services, or gathering lots of realtime information about your infrastructure from Facter. It’s like the big red button for your entire infrastructure: so easy to use even your manager can now blow up your infrastructure with a few clicks.

Besides building on the open source projects of Puppet, MCollective, Puppet Dashboard and Facter, we also used a ton of other open source software directly (Rails, Sinatra, BatmanJS, jQuery) and many more indirectly. One of our team members, pvande, was able to contribute a lot of code and help back to the BatmanJS project – we had to since it’s alpha software and we were using it in a commercial application.

As I worked on the project, I sometimes felt sad that it wasn’t being open sourced. I knew the rationalization that at some point the company has to make a product that people will pay us money for, but that wasn’t the rationalization that helped me accept it most at the end. What helped was knowing that by building on top of other open source projects, we ended up contributing a lot back to them. Puppet got bug fixes and small new features to help us manage resources, MCollective got plugin improvements and feedback on how it can better support web app use cases and scale testing, Puppet Dashboard got some CSS cleanup and UI improvements, and BatmanJS got lots of real world use, feedback and code.

Besides, it’s free to try out for fewer than 10 nodes, and since it’s written in interpreted languages, people can view the source code if they really want to. There’s nothing stopping someone from writing a similar thing open sourced, and now we’ve done some of the hard work of improving the platform they would also build on. This does mean that we’ll have to continue to innovate and improve on our closed source features that sell product, but also on our open source platform that enables anyone to build their own features to make their life managing systems easier.

Tmux Is Awesome - Even Better Than Screen

I’ve been using GNU screen for years now as a way to deal with all the terminals I need to use at once (terminal multiplexing is what I talk to my wife about when I know she doesn’t want to hear about work). I started out using it, as many do, because I needed to do some work over ssh on a remote terminal, and losing that connection was painful without using screen. After getting used to it, I found I liked using it even when I wasn’t working over ssh because it gave me a way to easily manage a bunch of terminal sessions.

Then one day I wanted to do remote pairing and had heard of people doing this in screen. It’s a bit of a pain to set up with screen (shared login or chmod to set uid bit), but then I discovered tmux – and realized how many things screen didn’t have that I now don’t want to live without. The main thing being vertical splits. I used to think vertical splits were just for vim sessions, but I was wrong. The tmux FAQ has a much fuller list of differences from screen that are nice (auto naming windows, more intuitive help, easier multiuser, multiple sessions). In tmux vs screen, tmux wins hands down. The only thing I’ve found I miss about screen is that my vim sessions were part of the scrollback buffer, but in tmux they disappear from tmux’s scrollback when you exit vim.

Why not iTerm2, Terminator for GNOME, or (insert other OS specific terminal app here)? Did I mention they’re OS specific? I want a terminal solution that works on any terminal the same way, and over ssh connections. Oh, and open source and BSD licensed is nice too even if I never look at the source code.

Version Control Your Configuration / Dot Files

I’ve been meaning to write this post forever, but once I started doing what I suggest here it seemed so obvious that I have trouble remembering what life was like before I did it. Once you’ve been working in a terminal for long enough, you realize how much more productive you can be with your aliases, scripts, prompts, plugins and other configurations set up the way you like. This becomes especially apparent when you either work on someone else’s machine, or something happens and you have to rebuild yours and you realize just how slow you are in comparison.

How do you avoid this slowdown and prevent loss of all those great hacks and configurations you’ve built up? Version control all the files that enable your productive environment. Here’s my repo for all those files.

https://github.com/mmrobins/config-files

I’ve got bash, zsh, vim, screen, tmux, irb, irssi, ruby scripts, puppet scripts and more in this git repository. I spend a lot of time working remotely via ssh or on virtual machines on my development box, so it’s important not only that I have these files available, but also that I have a fast way to get them all in place. I also want changes that I make to these files to be easy to commit back to my repository. To accomplish this, I make all these files symlinks to the files in my repository so that if I edit ~/.bashrc with my great new alias that saves all sorts of typing, it’s actually editing the .bashrc in my git repository. Then I have a little perl script that, when run, puts all these symlinks in place for me automatically and backs up any files I’m moving out of the way to create the links, just in case.

https://github.com/mmrobins/config-files/blob/master/create_symlinks
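For anyone who wants the gist without reading the Perl, here is a minimal Python sketch of the same idea: link each file from the repository into the home directory, and move anything already there out of the way first. This is not a port of the script linked above; the function name, the `.bak.<timestamp>` backup naming and the explicit list of file names are all assumptions made for illustration, and it assumes a POSIX system where creating symlinks is allowed.

```python
import os
import time


def link_dotfiles(repo_dir, home_dir, names):
    """Symlink each named file from repo_dir into home_dir.

    Anything already at the destination is renamed with a timestamped
    .bak suffix before the link is created. Returns the backup paths.
    """
    backups = []
    for name in names:
        source = os.path.abspath(os.path.join(repo_dir, name))
        target = os.path.join(home_dir, name)
        already_linked = (
            os.path.islink(target)
            and os.path.realpath(target) == os.path.realpath(source)
        )
        if already_linked:
            continue  # nothing to do, the link already points at the repo
        if os.path.lexists(target):
            # Keep the old file (or old link) around, just in case.
            backup = "%s.bak.%d" % (target, int(time.time()))
            os.rename(target, backup)
            backups.append(backup)
        os.symlink(source, target)
    return backups


if __name__ == "__main__":
    home = os.path.expanduser("~")
    repo = os.path.join(home, "config-files")
    for path in link_dotfiles(repo, home, [".bashrc", ".vimrc"]):
        print("backed up", path)
```

Running it a second time is harmless, since files that already point at the repository are skipped rather than backed up again.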

Now whenever I end up in a new environment on the terminal, I can get all my productivity in place with a few simple commands from the home directory.

git clone git://github.com/mmrobins/config-files.git
./config-files/create_symlinks
source .bashrc

Now vim has my plugins and configuration, bash has my prompt, if GNU Screen or tmux are available they’re how I like them, my aliases work, etc. There’s always going to be tweaks necessary for this to work cross platform and on different machines with different privileges, but since committing those changes back to the git repo is so easy, it becomes a natural part of my workflow to have these incremental improvements to my work environment available everywhere I work.

Working at Puppet Labs

I’m excited to say that I started a new job last week at Puppet Labs. This means I’m back to working with Ruby code (I don’t think I’ll miss Perl much), not going to be doing much web development and pumped to be working on an open source project. The development team is small and really smart, and it’s fascinating to see how things operate in a startup company that’s based on a widely used, mature code base with a lot of community involvement.

There’s a lot of new things to learn, and finally working on a Mac and retraining my fingers on where the control key is may be one of the hardest. Just kidding, although that has taken up more of my brain power this last week than I’d like. Some of the cool new areas I’m looking forward to exploring in Puppet include parsers (Puppet has its own language), client server models other than with a web browser and working with directed graphs (finally going to get a chance to use all that graph math from college).

I’ll miss my coworkers at Rentrak and wish them luck with their code. I’ll be going to Open Source Bridge next week, so I’ll see some of them there I assume.

Reputation Addiction on Stackoverflow

I’ve run across Stack Overflow plenty of times in the last year or so while looking for answers, but until now I hadn’t been motivated to sign up and post questions and answers. I finally decided it might be a good thing to try out when I saw they had a jobs section, because presumably the company you’re applying to might be impressed with your participation on the website. So I signed up thinking I’d give it a try and quickly lose interest as I always have participating in message boards or IRC since I always found the signal to noise ratio to be very low.

However, the simple little reputation system they’ve implemented on Stack Overflow has kept me interested for at least a couple weeks – probably will for longer. The reputation points make me feel like it’s something of a game where I’m trying to improve my score. I’m just over 400 reputation, and have started to look around to see what silly little badges I might be able to easily earn. I’ve even put up one of their silly flair badges on this site for now. We’ll see how long that stays up.

In fact, I think the minor addition of these badges helps make the whole reputation system more fun than a lot of other web sites that do ‘points’ or ‘karma’ or something else to measure your participation. I’m specifically thinking of the Y Combinator Hacker News Site that has these points. I tried to get some for a while, but I feel like there’s too many people who just sit around posting links and spouting out useless comments all day to compete for mind share. Even Stack Overflow suffers from this a little bit, where often the first person to answer gets a ton of points even if a better answer comes along later. Overall though, the community feel is still really good and there’s a nice balance of users who have enough earned power to do things to organize the site, and users who just need an occasional question answered. I’ll have to remember this as a resource when I get stuck on things.

Update – Jan 3 2010 – The initial fun has worn off some, partly because you have to be almost the first person to respond to questions to stand a chance of getting your answer accepted. I just don’t have the time to watch for new questions that frequently. What I will say is that there’s a treasure trove of great answers that I’ve been using as a resource more and more often. It’s hard to even come up with original questions to ask for points. Guess I need to start working on harder problems – or at least more obscure ones :–)

Flickr Finally Has Competition - Cheaper Storage on Google Web Albums

I’ve been using Flickr for quite a few years now because it’s a great website for photo sharing. Perhaps most importantly to me, it’s offered super cheap online storage – less than $25 for unlimited storage. However, offline I’ve been using Google’s Picasa software to do all my photo organization – tagging, captioning, touch ups, sorting into albums, and recently geotagging and face recognition. The one thing that I’ve really wanted was a great way to sync between Picasa and somewhere online – but for cheap. Google was charging $75 for 40GB, which is about how much photo data I have that I’d like to back up – way too much. Now it’s $20 for 80GB. Sweet. I’m in the process of uploading everything now.

Not only does Google’s Picasa Web Albums sync data between online and desktop, it syncs tags and captions. Now I can upload my photos and tag them from anywhere with an internet connection if I get the urge to organize. Then I can sync those changes back down.

So far the main thing I’m missing from Flickr is the post to blog feature. I’m finding a ton of plugins for WordPress that supposedly help with this, but I just want one good one. I tried one already called Goldengate that used up all my PHP memory. No thanks. I tried another, whose name I’ve already forgotten, that was just terrible. I’m sure a good one is out there, and once I find it, I think I’ll be ready to switch over from Flickr completely.

Master of Software Engineering From PSU

I’d been wanting to take some computer science courses for a while now because after being out of school for five years, I miss it. Besides, my employer pays for a bit of the education costs. But it’s hard to find graduate level courses at convenient times and locations. I stumbled across the OMSE program (Oregon Master of Software Engineering) from someone’s blog post that was linked to from an Ignite Portland web page, and saw that they offer online classes for most of the courses, and convenient evening times for the face to face classes. Bingo.

I’m taking the first course, OMSE 500, Principles of Software Engineering, right now. The course is an overview of the rest of the program, and it’s helping me realize that software engineering is quite a different topic from computer science. The course is all discussion based, covering topics such as project management, system architecture, development methodologies and other high level topics, but we never actually look at code. I’m not sure if I like that or not yet. I wonder sometimes if I would get burned out on code if I worked a normal week coding, and then had classes where all I did was code on top of that. On the other hand I feel like the OMSE courses will prepare me more to be a project manager than a better programmer.

What I like best about the course so far is getting a lot of perspective and stories from the fellow students. A prerequisite for taking the classes is that you’ve been working in software development for a few years, so it’s interesting to hear about the real world problems that people face as opposed to the type of fellow students I had when I was finishing my undergrad where almost nobody had any real experience outside of homework assignments. It really drives home the point that with software being as ubiquitous as it is now, some of the biggest challenges in developing it are managing how programmers work with each other since there’s not much that is done by a single person anymore.

I plan to at least get the certificate, which is 5 courses and should take me a little over a year. The full masters program is 13 or so courses, but I’m not sure yet that I wouldn’t rather focus on more computer science courses. The biggest downside to the program is that it’s expensive – over $1500 for 3 credits. I can’t imagine paying that if work didn’t chip in for most of it. I suppose they charge a little extra since most people enrolled have their employers paying for it.