How do you navigate a large unknown codebase?

2017-07-09

One of the projects I worked on while I was at the Recurse Center was a research project, which attempted to answer the questions:

A few months ago, navigating large projects seemed like a daunting and mystical task. At the Recurse Center, I spent time exploring open source projects with more experienced programmers. Those experiences brought home to me that there are strategies people use to do this - it isn't magic - and inspired me to try to find out what they are.

I spent the last week of my batch at the Recurse Center researching this (after spending a couple of weeks planning and discussing it). I learnt a lot during that week, and the process was fascinating. What I found changed my understanding of what I wanted to find out, and opened new questions. I also learnt about how to better carry out the project, something I am still learning. I realized that I wanted to continue working on this project past that week; I decided to do a part two.

Being at the Recurse Center not only facilitated this project, but allowed me to have the realizations that led me to start it in the first place. For three months I was surrounded by programmers of different backgrounds and experiences, who were incredibly generous with their time and knowledge.

This post will briefly cover:

I will not go into too much detail, as I am hoping some of the people who read this might want to help with part two :)

how I carried out the research

I found an open source project, and a task to do on that project. I found people who were comfortable working on large unknown codebases and asked if they had an hour to help me with the project. In the end I did this with seven different people. People were so generous with their time that I didn't have the chance to work with everyone who wanted to help in that week.

I spent an hour or so with each person - I would give them the task and let them work on it for 45 minutes to 1 hour, and ask questions at the end. This wasn't enough time to complete the task, but the project is about the process rather than the end point. I recorded what they did by taking screenshots using Capture Workspace, and taking notes.

The way I captured people's thought processes changed slightly over the week. At the beginning I didn't want to interfere with the process at all, so I took notes and tried to avoid asking distracting questions. Sometimes I'd ask at the end why they had done something at a particular point, but they had taken so many small decisions throughout the hour that it was difficult for me to keep track of them, and for them to remember why they'd taken each one. After a few sessions I realized that the approach people take is a lot more important for this project than the tools they use. I started to see patterns that made me a lot more interested in why someone does something than in what they did. I started asking people to try to vocalize what they were thinking, without feeling the need to explain it, and I asked clarifying questions when I needed to. That worked pretty well.

some of the things I found in part one

When I started this project, I think I was expecting to come out at the end with a list of tools and strategies people use. I am now thinking about what I learnt from part one in the following categories:

I came across several tools (i.e. tools people have installed, or even scripts they have written themselves) and strategies (e.g. close reading of file names and of names within the code). I saw that having a mental toolbox to draw on is really important, but I also started to realize how important the other aspects are. I was surprised to find that approach seemed to be more important than the tools used.

I noticed a difference between macro strategies, which were more about a problem solving approach that guided people through the whole task, and micro strategies, which were more about a specific action. There were a lot of similarities as well as differences in terms of problem solving approaches and macro strategies. There were some approaches that everyone seemed to use to different degrees. Getting to see each person's style was one of the most enjoyable parts of this project. It's also the area about which I have more questions than answers: for example, I would like to better understand the macro strategies I noticed. I want to learn more about this in part two.

I noticed a lot of similarity in the attitudes people had. Everyone was very positive - there was a lot of curiosity and even joy. People would say things like 'that's so cool', 'neat', and 'wow' a lot - even when something didn't work! I started to ask people about this. One person said they 'deliberately learned to see challenge and learning like that'. Someone else said that they didn't feel like that when they started working with large codebases, and that they had cultivated this attitude. Another thing I noticed was that people were comfortable with the unknown. If things didn't work, or they didn't know something, it didn't reflect on their ability as a programmer - it was a neutral fact, or even something to be expected. I was told 'things not working a lot of the time is part of programming'. One person described this as a chance to learn something. There seemed to be a shared confidence that whatever the problem is, it can (eventually) be fixed.

part two

Some changes I will make for part two:

I have a few people to work with for part two, and I am looking for more. If spending an hour exploring an open source project sounds like fun to you, and you want to help me with this project, please get in touch.