How Not To Use Static Typing In Ruby
Last time, I took a short example and examined in some detail what you would gain by adding static typing to it and what it would cost to use static typing.
What I didn’t do was explain how I might handle the problem without static typing.
For reference, here’s the example again. Consider this to be part of a larger system, and don’t worry too much about the rest of the world:
```ruby
class CheckoutService
  def checkout(user, items, amount, status)
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end

class ManagePayment
  def manage_payment(user, items, amount, status)
    # make the user pay
    HandleShipping.new.handle_shipping(user, status)
  end
end

class HandleShipping
  def handle_shipping(user, status)
    send_item_to(user.address)
  end
end
```
The problem, as originally presented to me, was: “Even though `address` isn’t used until the third step, none of the steps should happen if the user input doesn’t have an address”.
A statically typed system solves this problem by preventing you from passing an object as `user` if it isn’t of the class `User`, which presumably has an `address` attribute. A dynamically typed system needs to do something else.
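To make the constraint concrete, here is a minimal, runnable sketch of the original three steps. The `User` struct, the `LOG` array, and the `send_item_to` stand-in are hypothetical additions so the code runs on its own; the point is only that a non-`User` gets past checkout and payment and fails in the third step.

```ruby
# Hypothetical stand-ins so the example runs without the rest of the system
User = Struct.new(:name, :address)
LOG = []

class CheckoutService
  def checkout(user, items, amount, status)
    LOG << :checkout
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end

class ManagePayment
  def manage_payment(user, items, amount, status)
    LOG << :payment # imagine money actually changing hands here
    HandleShipping.new.handle_shipping(user, status)
  end
end

class HandleShipping
  def handle_shipping(user, status)
    send_item_to(user.address)
  end

  private

  def send_item_to(address)
    LOG << [:shipped, address]
  end
end

begin
  # A String isn't a User, but nothing notices until HandleShipping
  CheckoutService.new.checkout("ann@example.com", [], 100, :pending)
rescue NoMethodError => e
  LOG << [:error, e.name]
end

LOG # => [:checkout, :payment, [:error, :address]]
```

The payment step has already run by the time the `NoMethodError` shows up, which is exactly what the constraint is meant to prevent.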
In the previous article we talked about what other potential data validation errors might need to be handled, but in this one I’m going to focus on failing fast if the type isn’t correct. The other potential validation issues still need to be handled, but that’s true in both cases, so I’ll focus on the structures specific to dynamic typing.
Also, I’m not going to worry about editor tooling. Static typing is better for that, though some of these options do get tooling support from Solargraph, Ruby LSP, or RubyMine.
Option 1: Do Nothing
Really. This is an option. Okay, technically it doesn’t conform to the constraint, but in the real world, in this situation you should at least raise the question of how much damage is actually done if the code does not fast-fail.
I realize that this is kind of cheating for the purposes of the problem, but my whole point here is that there are scenarios where the tradeoff of the flexibility might be worth the occasional error (in practice this would mean waiting until the code gets to the actual call in `handle_shipping` to catch the error):
- In some cases, something else is doing the data validation and you are pretty sure that the incoming data is going to be valid. (This is perhaps another thing that is more often true in the smaller team / less complex code setup.) This is the “if a non-`User` gets here, 10 other terrible things will have already happened, and this code path is the least of our problems” scenario.
- In a related case, you have a setup where something else will fail loudly in this code. In this example, you might have a call to `address` (or some other `User`-specific attribute) before the other calls happen, so it will naturally fail.
- It might not actually be terrible for the point of failure to be in `handle_shipping`. The problem is set up so that it is, but if this code isn’t actually about money or irreversible real-world outcomes, you should make sure that the constraint is real and not an obstacle. The simpler code that waits to catch the error may have benefits down the road. Malka Older has a quote in this book about “the human tendency to romanticize the imposition of unnecessary obstacles”, and wow does that apply to developers. Fight it.
- Alternately, it might be easier to mitigate an error after it happens than to prevent it before it happens.
Anyway, doing nothing is an option here.
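A minimal sketch of the “something else will fail loudly” case, assuming a hypothetical `User` struct and a `PAYMENTS` array standing in for the payment step: reading `user.address` before anything irreversible happens means a non-`User` raises `NoMethodError` early, with no explicit check.

```ruby
# Hypothetical stand-ins: a User with an address, and a list of charges
User = Struct.new(:name, :address)
PAYMENTS = []

class CheckoutService
  def checkout(user, items, amount, status)
    # Touch a User-specific attribute before anything irreversible happens.
    # A non-User raises NoMethodError here, before the charge below.
    destination = user.address
    PAYMENTS << amount # stand-in for ManagePayment
    destination        # stand-in for handing off to shipping
  end
end
```

Note that anything else with an `address` method will also get through, which may or may not be what you want.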
Option 1a: YARD
This is somewhere between doing nothing and actually type checking, but the YARD documentation tool does allow you to annotate with type information:
```ruby
class CheckoutService
  # @param user [User]
  # @param items [Array<Item>]
  # @param amount [Integer]
  # @param status [Symbol]
  # @return [void]
  def checkout(user, items, amount, status)
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end
```
This doesn’t actually give you runtime checking, but RubyMine does parse this and will give you editor hints based on the YARD comments (other tools might as well), plus you get nifty documentation to boot.
I feel there’s a very strong chance that we’ll get some kind of YARD -> RBS bridge sooner or later.
Option 2: Actually Type Check
You can trivially do a real runtime type-check in Ruby:
```ruby
class CheckoutService
  def checkout(user, items, amount, status)
    raise RuntimeError unless user.kind_of?(User)
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end
```
And while I grant you that it’s less elegant than a static type declaration, it will prevent what we are told we want to prevent.
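A minimal sketch of that guard clause in action, with a hypothetical `User` struct and a `PAYMENTS` array standing in for `ManagePayment`. I’ve added a message to the `RuntimeError`, which the original leaves out:

```ruby
# Hypothetical stand-ins: a User struct and a list of charges
User = Struct.new(:name, :address)
PAYMENTS = []

class CheckoutService
  def checkout(user, items, amount, status)
    # Fail fast: nothing below runs unless we were handed a User
    raise RuntimeError, "expected a User, got #{user.class}" unless user.kind_of?(User)

    PAYMENTS << amount # stand-in for ManagePayment
    :ok
  end
end
```

Passing `"ann@example.com"` raises `expected a User, got String` before anything is charged.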
Most Ruby style guides will tell you to avoid this usage (see pages 361-2 of the Pickaxe, for example) for two basic reasons:
- You are getting all the costs of static typing in terms of making the code less flexible and almost none of the benefits: Ruby editors and tools likely won’t accept that `user` is a `User` even after the guard clause.
- If you are doing any logic based on the class of an object, you are potentially replicating what a late-binding system does, and you should just use method calls.
Have I done this in Ruby? Yes. Most often, as we’ll see, because it’s useful in the factory methods you write for coercion. Less frequently, as a way to avoid monkey patching code I don’t own. But there’s almost always a better way than integrating the type check with the business logic.
Option 3: Modified Type Check
Most Ruby style guides will tell you that if you want to do logic based on the type of an object, don’t use the class – classes aren’t types in Ruby – but check whether the object responds to a method:
```ruby
class CheckoutService
  def checkout(user, items, amount, status)
    raise RuntimeError unless user.respond_to?(:address)
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end
```
Pickaxe suggests this on page 362, including this joke:
“Will you get thrown out of the duck typing club if you check the parameter against a class? No, you won’t. The duck typing club doesn’t check to see whether you’re a member anyway.”
I wish I could take credit for the joke, but it predates me and sounds like Dave.
From a Ruby style standpoint, `respond_to?` is considered incrementally better than `kind_of?`. Why? Because you keep the potential of dynamically extending the code. If we eventually have `Customer` objects here, the code will still work as long as they respond to `address`, but the code will still fail if you pass in a `String` or `nil` or something else random.
There are a couple of downsides, the main one being that if you are dependent on more than one method of `user`, this gets unwieldy quickly.
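When the guard depends on several methods, the checks pile up. One way to at least keep them in one place is to list the required methods and report which ones are missing. This is a sketch, not a recommendation from the article; `REQUIRED_USER_METHODS`, `ensure_user_like!`, and `Shopper` are hypothetical names, and the list still has to be kept in sync with every piece of code that uses `user`:

```ruby
# Hypothetical: every method the checkout steps call on a user
REQUIRED_USER_METHODS = %i[address email payment_method].freeze

def ensure_user_like!(object)
  missing = REQUIRED_USER_METHODS.reject { |name| object.respond_to?(name) }
  return object if missing.empty?

  raise RuntimeError, "#{object.inspect} is missing #{missing.join(', ')}"
end

# Hypothetical duck-typed shopper that satisfies the list
Shopper = Struct.new(:address, :email, :payment_method)
```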
I find that I don’t actually use this one very often in practice; I usually jump to the next step.
Option 4: Coercion to Existing Classes
One way to look at static typing is that it’s preventative — you are dealing with complex and potentially messy data by limiting the kinds of data that can be passed to a method. You are building a fence around your code.
That’s obviously appealing, but the result is that you are placing a burden on any of the method’s callers to adjust their data to the shape you are expecting.
There are a couple of potential problems here. The theoretical problem is that this way of managing data inverts what is supposed to be the flow of knowledge in an object-oriented program, where a class is supposed to manage its own data. Another theoretical problem is that your fence might be too strict, so that actually useful usages are prevented. The practical problem is that if there are a lot of callers to a method, that’s potentially a lot of call sites that are each adding complexity to adjust their data to match what the method expects.
Alternately, we could make our method more welcoming in what data it accepts. This puts the complexity involved in making sure the data is the correct shape inside the method or class. Rather than a fence, this is a toll road in: a road where you do still have to get past a barrier.
One place this might work in our example is the `status` parameter, which presumably could be a string or a symbol.
Strings and symbols are a particularly fraught type issue in Ruby because they are so similar and because external data sources often don’t recognize symbols.
A very common type problem in Ruby goes like this:
- Ruby code treats an attribute as a symbol
- The attribute round-trips to a database or a JSON payload and comes back as a string.
- The Ruby code does an equality test on the data against a symbol, which always fails and won’t raise an error. (Or the data is used as the key in a hash and never matches the symbol key, which is basically the same thing.)
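The round trip is easy to reproduce with the standard library’s `json`; the `order` hash here is a made-up example:

```ruby
require "json"

# Made-up example: Ruby code stores status as a Symbol...
order = { status: :shipped }

# ...and it round-trips through JSON (or a database column) as a String
round_tripped = JSON.parse(JSON.generate(order))

round_tripped[:status]                     # => nil, the key is now "status"
round_tripped["status"] == :shipped        # => false, and nothing raises
round_tripped["status"].to_sym == :shipped # => true once normalized
```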
I don’t consider this a reason to use static typing in general, but I do consider it an inconvenience in the way Ruby interacts with the outside world that needs to be dealt with.
I frequently normalize potential string/symbol data as it comes in — usually this would be in the initializer, but I don’t have an initializer in this example yet.
```ruby
class CheckoutService
  def checkout(user, items, amount, status)
    status = status&.to_sym
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end
```
We could do something similar for `amount`, to convert it to `BigDecimal` or use the money gem.
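A sketch of what normalizing `amount` to `BigDecimal` might look like, using a hypothetical `normalize_amount` helper (the money gem would be the other route and isn’t shown). It assumes `amount` arrives as an `Integer`, `Float`, `String`, or `BigDecimal`:

```ruby
require "bigdecimal"

# Hypothetical helper: always hand back a BigDecimal amount
def normalize_amount(value)
  case value
  when BigDecimal then value
  when Integer, String then BigDecimal(value) # raises ArgumentError on junk like "abc"
  when Float then BigDecimal(value.to_s)      # go through "19.99" rather than the binary float
  else raise RuntimeError, "can't convert #{value.inspect} to an amount"
  end
end

normalize_amount("19.99") == BigDecimal("19.99") # => true
normalize_amount(19.99) == BigDecimal("19.99")   # => true
```

In recent Ruby versions `bigdecimal` is distributed as a bundled gem, so under Bundler it may need to be listed in your Gemfile.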
I will also do things like call `symbolize_keys` on a `Hash` to normalize the hash keys, or alternately use the Rails `HashWithIndifferentAccess` so I don’t need to care what the input type is.
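Outside of Rails, plain Ruby’s `Hash#transform_keys` does the same job as ActiveSupport’s `symbolize_keys`; the `params` hash below is a made-up example of string-keyed input:

```ruby
# Made-up example of string-keyed input, e.g. parsed JSON
params = { "status" => "pending", "amount" => "19.99" }

# Plain-Ruby equivalent of ActiveSupport's symbolize_keys
normalized = params.transform_keys(&:to_sym)

normalized[:status] # => "pending"
params[:status]     # => nil
```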
The common argument against using things like `HashWithIndifferentAccess` is that it encourages, or at least enables, sloppiness on the part of the programmers. In the specific case of string/symbol, I don’t think that’s a problem: Rails developers have been using `HashWithIndifferentAccess` for 20 years, and I’d bet a very high number of them use it regularly without knowing it exists.
In the general case, you do need to be careful — I used to do this a lot in code:
```ruby
def thing(user_or_id)
  user = user_or_id.is_a?(Integer) ? User.find(user_or_id) : user_or_id
  # more stuff
end
```
I liked it, but the problem is that if somebody accidentally passes in an id for a different class, this code will blithely convert it to a user, and you get a very subtle bug.
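Here’s the subtle bug in a self-contained sketch. `User`, `Company`, and the in-memory `USERS` table are hypothetical stand-ins for ActiveRecord models; the company’s id happens to match a real user’s id, so nothing raises:

```ruby
# Hypothetical stand-ins for two ActiveRecord models
User = Struct.new(:id, :name)
Company = Struct.new(:id, :name)

# Hypothetical in-memory table behind User.find
USERS = { 7 => User.new(7, "Ann") }.freeze

class User
  def self.find(id)
    USERS.fetch(id)
  end
end

def thing(user_or_id)
  user = user_or_id.is_a?(Integer) ? User.find(user_or_id) : user_or_id
  # more stuff
  user
end

acme = Company.new(7, "Acme")

# Meant to pass a user, passed the company's id by mistake:
thing(acme.id).name # => "Ann", a real user, just the wrong one
```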
This is where somebody tells me about their TypeScript code where every object’s ID was a unique type so you avoid that problem, and I tell them about Rails Global ID, which effectively allows you to do the same thing.
You could get fancy with this with a little monkey patching… This uses the object system to ensure that we’ve got a `User` object, given either a `User` or a `GlobalID`:
```ruby
class GlobalID
  def find_if(klass)
    raise RuntimeError unless klass == model_class
    find
  end
end

class ActiveRecord::Base
  def find_if(klass)
    raise RuntimeError unless klass == self.class
    self
  end
end

def thing(user_or_global_id)
  user = user_or_global_id.find_if(User)
end
```
I like this; the main problem is that you are less likely to be dealing in Rails `GlobalID`s, but I could see this being valuable if you have a method that is called both normally (with a `User`) and from a background job (with a `GlobalID`).
You can do coercion for our non-literal classes as well.
Let’s say we had a method like this:
```ruby
class User
  def self.from(object)
    case object
    when User
      object
    else
      raise RuntimeError, "#{object} isn't a user"
    end
  end
end
```
(I’m using the `case` statement here because then I don’t need to use `kind_of?`.)
Now, I take my original method and run my user through that:
```ruby
class CheckoutService
  def checkout(user, items, amount, status)
    vetted_user = User.from(user)
    ManagePayment.new.manage_payment(vetted_user, items, amount, status)
  end
end
```
Okay, big whoop, it’s the same type check but buried behind some abstraction… But I could extend it to be more forgiving about what types got passed in.
```ruby
class User
  def self.from(object)
    if object.is_a?(GlobalID)
      object = object.find
    end
    case object
    when Company
      object.contact_user
    when String
      User.find_by(email: object)
    when User
      object
    else
      raise RuntimeError, "#{object} isn't a user"
    end
  end
end
```
(The Global ID check is separate from the `case` statement because if the Global ID is for a `Company`, you’d still want the conversion to happen.)
And so on, depending on how casual you want to be about accepting values.
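A runnable version of the forgiving `User.from`, with hypothetical in-memory stand-ins for `find_by` and `Company#contact_user`. The `GlobalID` branch is left out because it needs Rails, and, as in the original, an unknown email comes back as `nil` rather than raising:

```ruby
# Hypothetical stand-ins for the models in the article
Company = Struct.new(:name, :contact_user)

User = Struct.new(:email, :address) do
  # Hypothetical in-memory replacement for ActiveRecord's find_by
  def self.find_by(email:)
    ALL_USERS.find { |user| user.email == email }
  end

  def self.from(object)
    case object
    when Company then object.contact_user
    when String then find_by(email: object) # nil if no user has that email
    when User then object
    else raise RuntimeError, "#{object} isn't a user"
    end
  end
end

ALL_USERS = [User.new("ann@example.com", "1 Main St")].freeze
```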
And, for what it’s worth, if you do want some static typing, and you type check the coercion method (written here as an RBS signature)…
```rbs
class User
  def self.from: (untyped object) -> User
end
```
You get a lot of the benefit of static typing (your tooling will be able to treat `vetted_user` as a `User`), but you don’t have to limit the set of parameters that your method takes.
(I’m talking myself into a very counterintuitive position where I could imagine not type checking parameters to methods, but at least occasionally type checking the return values on the theory you get a lot of the benefit of tooling without losing much flexibility. I don’t completely believe this yet, but I’d be into trying it once.)
Validated Objects
A slightly different structure that was pointed out to me allows you to incorporate all your validations into the type system by having a `ValidUser` and an `InvalidUser` class. There are a lot of ways to do this; an extension of what we’ve got looks like this:
```ruby
class User
  def self.from(object)
    if object.is_a?(GlobalID)
      object = object.find
    end
    user = case object
           when Company
             object.contact_user
           when String
             User.find_by(email: object)
           when User
             object
           else
             raise RuntimeError, "can't convert #{object} to a User"
           end
    # Wrap the result so validity becomes part of the type;
    # find_by can return nil, which is treated as invalid
    user&.valid? ? ValidUser.new(user) : InvalidUser.new(user)
  end
end
```
I think we’ll talk more about this in a future post about getting rid of `if` statements, but the idea here is that the `InvalidUser` then acts more like a null object and prevents the future operations from happening.
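A minimal sketch of the null-object idea. The `valid?` rule, the `ship_to` method, and the `vet` helper are hypothetical; what matters is that `ValidUser` and `InvalidUser` share an interface, so later steps can call `ship_to` without an `if`, and the invalid case simply does nothing:

```ruby
# Hypothetical validity rule: a user needs a non-empty address
User = Struct.new(:name, :address) do
  def valid?
    !address.nil? && !address.empty?
  end
end

# Wraps a user that passed validation; operations actually happen
class ValidUser
  def initialize(user)
    @user = user
  end

  def ship_to(shipments)
    shipments << @user.address
  end
end

# Null object: same interface, but every operation is a no-op
class InvalidUser
  def initialize(user)
    @user = user
  end

  def ship_to(_shipments)
    nil
  end
end

def vet(user)
  user&.valid? ? ValidUser.new(user) : InvalidUser.new(user)
end
```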
A digression about initializers
If you have a quote-unquote “service object” that effectively has one main public method (which is basically what we have here), there are three ways to structure the code in Ruby:
Class method:
```ruby
class Service
  def self.call(arg1, arg2)
  end
end
```
Instance method, empty initializer:
```ruby
class Service
  def initialize
  end

  def call(arg1, arg2)
  end
end
```
Instance method, full initializer:
```ruby
class Service
  def initialize(arg1, arg2)
  end

  def call
  end
end
```
You should almost always use the last version.
There are a couple of reasons; the relevant one here is that it allows us to ensure that our data normalization happens.
In our example, we’re potentially coercing the `user`, the `items`, the `status`, the `amount`… it’s a lot to do in the actual business logic class. It’s easier and clearer to do this:
```ruby
class CheckoutService
  attr_reader :user, :items, :amount, :status

  def initialize(user, items, amount, status)
    @user = User.from(user)
    @items = items
    @amount = amount
    @status = status&.to_sym
  end

  def checkout
    # do some things
    ManagePayment.new.manage_payment(user, items, amount, status)
  end
end
```
There are a couple of advantages to this structure. A big one is that if we add a second public method besides `checkout`, we don’t have to re-do all the data shaping.
There aren’t very many ways to enforce the order of methods in Ruby, but `initialize` is one of them: the object has to be initialized before it’s used, so you can guarantee the data is validated before `checkout` runs.
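A sketch of that guarantee, with a hypothetical `User` struct and a plain `is_a?` check standing in for `User.from`: bad input raises inside `new`, so there is never a `CheckoutService` instance on which `checkout` could be called.

```ruby
# Hypothetical stand-in for the real User model
User = Struct.new(:name, :address)

class CheckoutService
  attr_reader :user, :items, :amount, :status

  def initialize(user, items, amount, status)
    # Coercion and validation live here, so they run before any public method
    raise RuntimeError, "#{user.inspect} isn't a user" unless user.is_a?(User)

    @user = user
    @items = items
    @amount = amount
    @status = status&.to_sym
  end

  def checkout
    # Hypothetical stand-in for the payment and shipping steps
    [status, user.address]
  end
end
```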
And… if you are into the partial static typing idea, you can type the instance variables and not the arguments to initialize, and you can still get a lot of the tooling benefit of typing.
Option 5: Coercion to New Class
One downside of putting all this initialization and whatever in the initializer is that this set of objects is effectively shared across the three services. Assuming we don’t want to duplicate the validation each time we call the new service, another option is to create a new class that contains all the validation and then use that:
```ruby
class CheckoutOrder
  attr_reader :user, :items, :amount, :status

  def initialize(user:, items:, amount:, status:)
    @user = User.from(user)
    @items = items
    @amount = amount
    @status = status&.to_sym
  end
end
```
This time I’ve made the arguments keyword arguments — my normal practice if there are more than two arguments.
Then:
```ruby
class CheckoutService
  attr_reader :checkout_order

  def initialize(user, items, amount, status)
    @checkout_order = CheckoutOrder.new(user:, items:, amount:, status:)
  end

  def checkout
    # do some things
    ManagePayment.new(checkout_order).manage_payment
  end
end

class ManagePayment
  attr_reader :checkout_order

  def initialize(checkout_order)
    @checkout_order = checkout_order
  end

  def manage_payment
    HandleShipping.new(checkout_order).handle_shipping
  end
end

class HandleShipping
  attr_reader :checkout_order

  def initialize(checkout_order)
    @checkout_order = checkout_order
  end

  def handle_shipping
    send_item_to(checkout_order.user.address)
  end
end
```
This passes the full order object to the `HandleShipping` class, which in the previous examples didn’t actually use all the attributes. I don’t have a problem with that, but it is a difference in the code.
Like the other examples, this code will flag the type error immediately on the attempt to create the `CheckoutOrder` object, and basically any code path into the other methods will need to be checked when that object is created. I guess technically, you’d also want to consider validating that the `initialize` methods of the other classes receive a `CheckoutOrder`.
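If you did want that extra check, one way to sketch it is a guard in `HandleShipping#initialize`. `User` is a hypothetical struct, `CheckoutOrder` uses a plain `is_a?` check in place of `User.from`, and `handle_shipping` returns the address instead of calling `send_item_to`:

```ruby
# Hypothetical stand-in for the real User model
User = Struct.new(:name, :address)

class CheckoutOrder
  attr_reader :user, :items, :amount, :status

  def initialize(user:, items:, amount:, status:)
    raise RuntimeError, "#{user.inspect} isn't a user" unless user.is_a?(User)

    @user = user
    @items = items
    @amount = amount
    @status = status&.to_sym
  end
end

class HandleShipping
  attr_reader :checkout_order

  def initialize(checkout_order)
    # Refuse anything that isn't an already-vetted order
    unless checkout_order.is_a?(CheckoutOrder)
      raise RuntimeError, "expected a CheckoutOrder, got #{checkout_order.class}"
    end

    @checkout_order = checkout_order
  end

  def handle_shipping
    checkout_order.user.address # stand-in for send_item_to
  end
end
```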
This could be combined with the valid/invalid factory to give a `ValidCheckoutOrder` vs. `InvalidCheckoutOrder` if you wanted.
What I like about this one is that it makes it easy to have the common validation logic shared across the different parts of this workflow. My experience is that these kinds of value classes tend to wind up attracting useful functionality that would otherwise awkwardly be attached to one of the other objects.
The downside is the increased complexity, so you’d want to do this in cases where there is shared validation logic across multiple users of these classes.
Next
This got super long, so I just want to conclude with these statements:
- You can do runtime type checking in Ruby if you must
- You can also do implicit type management by using the object system to shape the data as you need it.
Next post will go more into the extracting class techniques and why they are valuable.