Arel, Part III: Set Subtraction - Rails
In modern life science research, it's relatively common practice to work with large datasets. As time goes on and new information is gleaned from experiments or analytics, elements in that original set may be deemed irrelevant or uninteresting and are removed from the active dataset.
As developers, the notion of subtracting one set from another is a common one.
In Ruby, for example, we have methods like
constructs exist in SQL, like the Postgres'
Unfortunately, ActiveRecord doesn't provide an interface for database set subtraction. Luckily for us, Arel does.
Let's say that you have the following models in your system:
class Post < ActiveRecord::Base has_many :posts_authors has_many :authors, through: :posts_authors end class PostsAuthor < ActiveRecord::Base belongs_to :post belongs_to :author end class Author < ActiveRecord::Base has_many :posts_authors has_many :posts, through: :posts_authors end
Let's further assume that your client wants to be able to search posts for those that do not have one or more specific authors tied to them. The following query is trivial to write using ActiveRecord and might appear to be the correct solution.
SELECT posts.* FROM posts INNER JOIN posts_authors ON posts_authors.post_id = posts.id INNER JOIN authors ON authors.id = posts_authors.author_id WHERE authors.id NOT IN <given_author_ids>
However, imagine the situation where the post Rails is the best! has two
authors: Emerson Huitt and Mike Simpson. If you search for posts not written by
Emerson with the above query, you'll actually see Rails is the best! in your
search results. Why? The row in the
posts_authors join table that's tied to
the post "Rails is the best!" and the author "Mike Simpson" satisfies the
condition where the user is not Emerson Huitt, giving us a false positive in our
Luckily, there are a few different ways we can approach this problem. In this post, we'll be using set subtraction.
If we were writing raw SQL for, say, an Oracle database, we could do the following:
SELECT * FROM posts MINUS SELECT * FROM posts INNER JOIN authors ON authors.post_id = posts.id WHERE authors.id IN <given_author_ids>
Looking over the Arel documentation, it may not be immediately apparent how to
accomplish this. Luckily, though, instances of
Arel::SelectManager have the
except method, which we can utilize to get our desired query.
First, we want to store the Arel tables of our two models as variables for convenience and reusability.
posts = Post.arel_table authors = Author.arel_table
Next, we need to figure out how to get the ids of the posts that are tied to the author ids that we want to exclude.
unwanted_post_ids = Post .joins(:authors) .select(posts[:id]) .where(authors[:id].in(unwanted_author_ids))
Here, we're doing an
INNER JOIN between the
authors tables and
posts.id for all posts tied to our unwanted authors. Now, we just
need to subtract our unwanted posts from the rest of the rows in the
target_post_ids = posts .project(posts[:id]) .except(unwanted_post_ids)
Now that we have our Arel set up, we just need to pass it along to ActiveRecord
to get back our
ActiveRecord::Relation of posts.
Post.where( posts[:id].in(target_post_ids) )
Voilà! We now have all of the posts in our database that have not been tied to
our given authors packaged up into a neat
Learning More About Arel
This brings our series on Arel to a close. Hopefully, these last few posts have
shown that Arel can be a hugely powerful asset in the Rails toolbox. If you'd
like to learn more about the features of Arel not covered in these posts or the
Arel README, I'd encourage you to dive into the source code itself. The
tests give some example use
cases while offering a relatively easy way to see some of the different methods
that might be of use to you, such as
does_not_match_all. If you
haven't already been exposed to the visitor
pattern, you may want to
familiarize yourself with it first to have a better understanding of Arel's