Another Refactoring Story: ActiveRecord Lists
I’ve now tried to write this post like three times, and one thing that’s clear to me again is why you don’t see more write-ups of even moderately complicated real-ish problems. It’s hard to get enough of the context across to make the code sensible without being overwhelming.
But there’s a Rails and ActiveRecord structure that this example gets to that I think is useful, so I’m going to try again. (Spoiler alert: the trick is refactoring ActiveRecord has_many
lists to wrap them in a dedicated list class.)
Here’s the context…
Elmer is a project tracking tool and the main unit is the Card
. I’ve talked about Elmer’s code a few times already.
Cards have some data attached. For our purposes here, the important attribute is status
, representing where the card is along its path. The current path is “attic” to “unstarted” to “started” to “done”. Cards can move back and forth through the status path, so a card could be “done” but then it might retreat back to “started” for some reason or another.
For project data collection purposes, Elmer keeps track of when a card enters a new state. There’s a class called Calculator
that takes a card and calculates how many days it was in progress, how many business days, and whether it’s currently part of the active project.
For historical comparison purposes, I added functionality by which I can get the historical data for a card, to give me its status in the project and its estimated size for any day that the card has existed. The idea is to use this data to see if the projected end date is moving or not.
However, with this feature added, Elmer now has two completely independent ways of determining the history of a card’s status:
- Cards have attributes called
date_added
,date_started
anddate_finished
. These attributes are updated when the card’s state changes using state-change methods calledadvance!
andretreat
. - Cards also have a list of associated objects called
CardSnapshot
. Each CardSnapshot represents a a date and the status or size of a card on that date. The entire list of snapshots is a history of the card over time. These snapshots are created or updated using ActiveRecord callbacks when the card is saved if the save changes the size or status. Complicating things, there is only one snapshot per day, if the size or status changes again on the same day, the existing snapshot for that day is updated rather than a new one created. I do this because Elmer doesn’t care about units of time shorter than a day.
These two mechanisms are redundant — the information in the card attributes can be completely gleaned by examining the Card Snapshots.
This is duplication in the strict sense of what Don’t Repeat Yourself (DRY) means. If I want to know when a card was started, I have two sources of truth, and it’s not hard to see how they might get out of sync. Just updating the status without using the special advance!
or retreat!
methods would do it.
Using the attributes on Card has two problems. One is that the attribute names are hard-coded to match the names of the statuses, which would be a problem if and when I ever decide to change the status names.
More importantly, there’s a requirement for the historical data that I’m not sure how to do with just the attributes — the ability to recreate the state of the card as it appeared on a particular day, no matter how the card has moved back or forth since then.
In other words, if the card was done
on Tuesday, but we revert it back to started
on Wednesday then finish it again on Thursday, if we look back and generate historical graphs then Tuesday’s historical data should reflect the card being done, and Wednesday’s shouldn’t, even though the card is finished again in the future.
I don’t think you can do this with the attributes alone, going back and forth past a given status wipes out older times you might have entered the status. You can do this with the data history, but it’s not clear to me as I start exactly how I want to manage it. Specifically, the Calculator still depends on the Card’s present-day state in a way that confounds doing historical data.
So, the plan is to replace the attributes with the snapshots, first of all to remove the duplication and then see what that gives us.
I’d also like to preserve the API — I have a lot of code that depends on the existence of getters like card.date_started
, and I’d prefer not to change it.
Luckily for me, at this point the basic behavior of both the original attributes and the card snapshots is well covered by tests, so I’m pretty confident that I can start with my existing tests and if they all pass, I’m likely fine.
What I want to do is replace all attribute getters like date_started
with new versions that look through the list of snapshots to infer the correct date.
First, I removed all the attribute columns via a migration
class MoveDateToSnapshot < ActiveRecord::Migration[7.0]
def change
remove_column(:cards, :date_added, :date)
remove_column(:cards, :date_deleted, :date)
remove_column(:cards, :date_finished, :date)
remove_column(:cards, :date_started, :date)
end
end
There’s no real data yet, and even if there was, all the running projects already have snapshots, so nothing should be lost here.
Replacing the attributes calls for a bunch of similar methods that all have the same pattern.
The simple logic here would be to search for the one and only snapshot in the list whose status matches what we are looking for and return its date.
However, in the actual data, the card might have gone back and forth over the actual status leading to multiple snapshots with the same status, or there might be zero cards with the same status if the card has jumped through multiple statuses on the same day.
Because I’m me, I metaprogammed them1 — this is the quick to green, un-refactored version:
Status.sorted_statuses.each do |a_status_value|
define_method(:"date_#{a_status_value.verbed}") do
return nil unless achieved?(a_status_value)
card_snapshots
.select { _1.advance? }
.sort_by { [-_1.status, _1.snapshot_date] }
.select { _1.status_value.after_or_eq?(a_status_value) }
.last
&.snapshot_date
end
end
This gives us a set of attribute methods for each status that look like this sample:
def date_started
return nil if unless achieved?(Status["started"])
card_snapshots
.select { _1.advance? }
.sort_by { [-_1.status, _1.snapshot_date] }
.select { _1.status_value.after_or_eq?(Status["started"]) }
.last
&.snapshot_date
end
So, the code:
- Returns
nil
if the card has not currently achieved the status — in the sample method, if the current state of the card is before:started
, returnnil
- Then we filter out the card snapshots to try and find the most recent relevant one: first we limit our list to only snapshots that are an
advance?
, meaning that they have moved the status forward from the previous snapshot. - Then we sort such that the latest statuses are first, with ties broken by snapshot date, filter down to the remaining snapshots that are after or equal the one we are testing, and pick the last one (trust me), and grab its
snapshot_date
.
All told, this should handle the weird cases where there are more than one or less than one matching snapshot. I think…
The exact details of the algorithm don’t matter — even though I just spent 500 words explaining them. What’s important is this:
All the history calculations are now completely separated from the card itself — all the data we need for the calculation is encapsulated in the list of snapshots. We do use the current status of the card, but, importantly, that’s equivalent to the status of the most recent snapshot.
In other words, we can refactor these attributes to this separate class:
class CardSnapshotList
include Enumerable
include HasValueObject
attr_reader :card_snapshots
value_object_for(:status, Status)
value_object_for(:size, Size, default_value: "sm")
def initialize(*card_snapshots)
@card_snapshots = card_snapshots.flatten
end
delegate :each, :empty?, to: :card_snapshots
delegate :status, :project, :in_project?, :size, to: :last
def last
card_snapshots.max_by(&:snapshot_date)
end
def achieved?(a_status_value)
status_value.after?(a_status_value)
end
Status.sorted_statuses.each do |a_status_value|
define_method(:"date_#{a_status_value.verbed}") do
return nil unless achieved?(a_status_value)
last_snapshot_at_or_beyond(a_status_value)&.snapshot_date
end
end
def last_snapshot_at_or_beyond(a_status_value)
in_status_order.select { _1.advance? }
.select { _1.status_value.after_or_eq?(a_status_value) }
.last
end
def in_status_order
sort_by { [-_1.status, _1.snapshot_date] }
end
end
We also need a little glue in the Card
class to make the attributes continue to delegate appropriately:
def card_snapshot_list
CardSnapshotList.new(card_snapshots)
end
Status.sorted_statuses.each do |a_status_value|
delegate :"date_#{a_status_value.verbed}", to: :card_snapshot_list
end
This all works, the tests pass. There is probably a performance issue eventually about how advance?
is calculated, but that should be solvable should it ever be an issue.
So…
We’ve fixed our issues with having duplicate sources of truth, and the code much better suited to handle different kinds of statuses over time.
The code has gotten more complicated, but so has the business logic — if we weren’t also trying to allow statuses to go back and forth, the code would still be pretty simple.
Yes, there’s metaprogramming, and I know, but if you wanted to get rid of it you could replace all the dynamic date_started
methods with something like date_of_status(:started)
, it’d all work pretty much the same.
More to the point, we’ve completely solved our issue with being able to recreate historical status exactly as it looked on an arbitrary day.
Here’s how:
Before this refactor, the Calculator
class that does all the of calculations on a card to get the duration of time spent on it, and so on, takes a Card
as an argument.
But with all the timing data now siphoned off into CardSnapshotList
, we can pass that class a CardSnapshotList
instead. Everything still works exactly same because we’ve preserved the method interface, both classes respond to date_started
and card_snapshots
.
And once we can do duration calculations on a whole set of snapshots, we can do duration calculations on a partial set of snapshots…
def at(previous_date)
CardSnapshotList.new(card_snapshots.select { _1.snapshot_date <= previous_date })
end
The method can go either in Card
or CardSnapshotList
, it works either way.
All we need to do is pass a truncated list of the card snapshots through the day we care about, and we’ll have preserved the state of the card as it looked on that day.
This trick — encapsulating lists of objects, especially has_many
ActiveRecord relations — is a useful one that I don’t see suggested very often. (I’ve often felt it’d be interesting for Rails to create list classes for ActiveRecord models by default). Hopefully this example starts to show you why that is.
In many cases there’s some kind of custom list traversal with a domain-specific meaning. For example, you might have
- A custom way of calculating a total value of the list — for example a list of Elmer cards might have custom size calculation.
- Other custom ways of converting the list to a single value, as this case converts a list of snapshots to single start values for each status.
- A custom way of filtering, for example a list of Elmer cards might have a constructor or factory that takes a search.
- Custom conversions: a list of Elmer cards might need to convert to a list of their statuses.
The point is that if those domain specific features of lists are useful in one context, there is a really good chance they are useful in others — especially if the ActiveRecord class has a relationship with multiple parent classes
I notice, for example, that the Elmer Project
class right now has four methods that start with either cards.select
or cards.map
. And one method that starts with people.map
. The Elmer Team
class also has_many
cards and has_many
people, and it’s highly likely that Team
is going to need most or all of those methods on a Team list of cards once I do cross-project team wide displays. (This doesn’t even count searching through cards, which I haven’t written yet). Creating a CardList
class means that behavior is encapsulated. And this example shows that once the behavior is encapsulated, it’s also possible that new behavior will be easier to write and manage.
Let’s just do this. Here’s an ActiveRecord concern:
concern :HasListClass do
class_methods do
def list_class_for(attribute, class_name = nil)
class_name ||= "#{attribute.to_s.classify}List"
define_method :"#{attribute.to_s.singularize}_list" do
class_name.constantize.new(send(attribute))
end
end
end
end
The classify
method is an ActiveModel feature that converts the attribute name from something like :card_snapshots
to the default class name CardSnapshotList
, including converting the name to a singular noun. (It’s what ActiveRecord already uses for table names from has_many
declarations.) The defined method does exactly what our glue method did above: creates an instance of the list class with the internal list as an argument.
Then our existing Card
code can change to:
includes HasListClass
list_class_for(:card_snapshots)
And the method card_snapshot_list
still exists and works.
It’s not much of a change, but we’re creating a new convention, and now I can easily adjust the project class
includes HasListClass
list_class_for(:people)
list_class_for(:cards)
This gives me methods called card_list
and person_list
(the CardList
and PersonList
classes have to exist for the methods to work).
Wait, while I’m here, let’s take it another step, we’re going to want common features to all the list classes, here’s a parent class pulling out from CardSnapshotList
class ActiveRecordList
include Enumerable
attr_reader :elements
def self.from(elements_or_list)
if elements_or_list.is_a?(self)
return elements_or_list
end
new(elements_or_list)
end
def initialize(*elements)
@elements = elements.flatten
end
delegate :each, :empty?, to: :elements
end
Now I can make CardSnapshotList
a subclass of ActiveRecord
list and remove the duplicate lines from that class, and I’ve got a foundation of common behavior for my list classes that will make them instantly Enumerable and instantly useful.
I can then move all the cards.select
and cards.map
method from Project
to the new CardList
class. It works, but there are some references that need to change from, say project.cards_at
to project.card_list.at
. I think I’m okay with that change, if it bothers me, I can always add a delegate
call.
And that’s refactoring to encapsulate a list of objects in ActiveRecord.
-
I also did setters, which I needed temporarily to make the tests pass, but they were very slow and it wound up being easier to change the tests to create and update snapshots directly. I’m not showing the setters here because they would up being unrelated to the general point. ↩︎