Empirically, I Have No Idea
How We Don’t Know What We Know
Previously on Locally Sourced. Who can even remember? Sorry, it’s been a while. I’ve been stuck writing two-thirds of posts and not quite wanting to finish them, but if you are reading this, I assume that means that I finally finished this one. In the meantime, I’ve sold a book-shaped object to Pragmatic that will be about Tailwind CSS. It’s part of the Pragmatic Answers line, which means it’ll be short, and it’s what I’m doing to keep me busy until the Stimulus book is releasable.
If you are reading this, you probably have strong opinions about what makes developers and software teams work better. You might believe that TDD is great, or that Object-Oriented design lowers the cost of change. Or you might think that immutable functional code lowers the number of defects. Or that pair programming makes things better. Or that Vim makes you a better coder.
How strongly do you hold these beliefs? Why do you believe them? What would it take for you to change your mind?
I find the “change your mind” question really interesting, especially because there are so many things in software that feel like they should have empirical answers and just, you know, don’t.
For example, I have a strongly held belief that TDD can help me write better software and that I enjoy writing software with TDD more than I otherwise would.
That belief came to me first from trusted experts telling me TDD would be helpful, and then, critically, from my own experience. (I had a really good first TDD experience, and that probably makes a big difference – see this conversation with Sandi Metz.) That said, my belief in TDD is weaker than it was a few years ago. Now I say TDD “can help”; ten years ago, I probably would have said “will help”. Changing my mind here was a mix of personal experience (especially the experiences I had teaching TDD and trying TDD on web design problems in particular), trusted experts, and trusted novices telling me it was hard.
I have a mildly strong belief that Single Page Apps are more complexity than most web sites need. This comes partially from my first real try at an SPA (previously I had been on-board with them), but has also been reinforced by continued experience, even though I know a lot of people disagree. My mind could be changed on this one, I think. I vaguely suspect that there’s a future generation of frameworks that will help.
Thinking about this, I change my mind based on:
- Personal experience. Where possible I like to try one new thing in a project so as to widen my personal experience.
- A fairly small group of trusted experts, some of whom I know and some of whom I don’t, who I have agreed with enough in the past that when they say something surprising I take note. (For example, both Dave Thomas and Penelope Phippen have talked about doing less TDD, causing me to at least think about what I believe.)
- A much larger group of people I work with or encounter in the community, especially if a lot of them are moving in the same direction. This is often more a way of finding new ideas to evaluate.
Here’s one way it works. When I first saw Tailwind, I thought it looked awkward and hard to use. A trusted expert (Gary Bernhardt, IIRC) wrote about how useful he was finding Tailwind, and a lot of people around me seemed to be talking about it, which convinced me to give it a try on a side project, and I liked it quite a bit.
You may notice that empirical data in the form of scientific proof is absent from the list of things that change my mind. If you know me, that might seem weird, since I actually studied how programmers learn as a grad student and have read many empirical studies about programmers, and even wrote a couple of them.
But when I think about what kind of “scientific” data would make me change my mind about, say, TDD, it’s really hard for me to imagine data so convincing that it would cause me to doubt my own experiences. I think this is precisely because I spent so much time reading social science papers of various kinds and also preparing the data in my own work. There’s nothing like trying to turn a big pile of data into an argument to make you skeptical of other people’s big-piles-of-data-based arguments.
The immediate cause of this whole post was a blog post looking at evidence that short methods lead to more bugs. And I don’t really want to re-litigate that argument. There’s some data, it’s suggestive. It works against a reasonably strongly held belief. The methodology is questionable enough that if you want to question it, you certainly can.
But it did lead me to think about why exactly I find this kind of empirical data so unsatisfying and not effective at changing my mind.
First off, what makes code “better” is a highly complex social science question. You can’t just say that some technique caused a coder to do a toy problem in 15 minutes instead of 30 minutes, say it makes the coder twice as fast, and call it a day. There are long-term effects, and there are effects that might only show up with dozens of developers on the same project. The prior experience of the developers is probably relevant.
Successful coding is a complex problem with a lot of potential confounding factors. And while you can do research in that kind of environment (social sciences do move forward, after all), it’s not as straightforward a scientific method as you might use in physics or something.
You can’t do a double-blind study of programmers in the real world… well, you can, but you have to make a lot of simplifying assumptions. You either need a toy problem, or a very specific set of developers, or some way to limit the confounding factors.
In all these cases, the problem is not necessarily getting something that looks like a significant effect, but making the leap that the effect will apply to environments other than the original test. That’s the part that makes it hard for these studies to be convincing.
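To make the confounding worry a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the observations, the five-year cutoff between "senior" and "junior", and the helper names are all my assumptions, not data from any real study. It shows how a technique can look like it saves time in a naive comparison, and then show no effect at all once you compare developers with similar experience.

```python
from statistics import mean

# Entirely hypothetical observations, invented for this sketch:
# (used_technique, years_experience, minutes_to_finish_toy_problem)
observations = [
    (True, 10, 15), (True, 9, 16), (True, 8, 14), (True, 1, 30),
    (False, 10, 15), (False, 1, 31), (False, 2, 29), (False, 1, 30),
]

def average_minutes(rows):
    """Average completion time for a list of observations."""
    return mean([minutes for _, _, minutes in rows])

def gap(rows):
    """Minutes apparently saved: average without the technique minus average with it."""
    with_technique = [r for r in rows if r[0]]
    without_technique = [r for r in rows if not r[0]]
    return average_minutes(without_technique) - average_minutes(with_technique)

# Naive comparison: ignore experience entirely.
naive_gap = gap(observations)

# Stratified comparison: only compare developers with similar experience.
# The five-year cutoff is an arbitrary assumption.
seniors = [r for r in observations if r[1] >= 5]
juniors = [r for r in observations if r[1] < 5]
senior_gap = gap(seniors)
junior_gap = gap(juniors)

print(f"naive gap:  {naive_gap} minutes")
print(f"senior gap: {senior_gap} minutes")
print(f"junior gap: {junior_gap} minutes")
```

In this invented data set the technique group happens to be mostly senior developers, so the naive gap really measures experience. A real study has to worry about many confounders like this at once, and most of them are harder to see than years of experience.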
Social science has other mechanisms, including research that is more qualitative or descriptive. The goal is more to explain how pieces work together than to prove that one method is better than the other. In some sense, we kind of all do this informally as we take good practices from one project to the next. That said, I’m not aware of work that has tried to do this kind of study of programming teams more formally – I’m almost sure that it’s out there somewhere (I’d bet that most really big tech companies have done something like this), but I don’t think insights from that kind of work have spread.
I don’t know what my conclusion here is, except that it’s probably worth understanding how little foundation there is for any practice we do, and being flexible in the face of people or teams that seem to be successful doing something else. Oh, and read The Leprechauns of Software Engineering, which is a very good book about how claims without data can take on a life of their own.