ManyToManyFields in Django have a big inefficiency when running a query to get objects that share a
ManyToMany relation with another object. Consider these three models:
We can run this query to see what Authors are available in a particular bookstore:
What’s The Problem?
That QuerySet produces the following SQL:
Look at the subquery. There’s an unnecessary join against the table
book. The information returned by that subquery is the
ids of books associated with a particular bookstore. This information is available directly from M2M table
So What Can We Do?
We’re really looking at two problems here:
- Django doesn’t realize we only need the
ids of the books and can obtain them without accessing the
booktable. (I’m not going to dive into this one)
- Even if the developer realizes, there is no easy way to extract that information using the ORM.
This second issue can be solved by subclassing two Django classes and creating an attribute that collects the data we need in a more efficicent way. Here’s the code I came up with:
NewManyToManyField will give the
ManyRelatedManager accessed through the attribute
books the attribute
id_list, which is a method that returns a
QuerySet containing the list of the
ids of the target model that have a connection to the current object. In this case it returns a list of book ids in the B&N book store. If we declare our models using the
NewManyToManyField we can take advantage of this new attribute:
Now we can use the following code:
Which produces this SQL
Check out the subquery. We’ve saved a
JOIN against the book table. This may seem like an insignificant optimization, but if there are a lot of people on your site visting pages that require this type of query, losing that
JOIN can really speed things up.
How does it work?
During class creation of a model, a method called
contribute_to_class is called for every database field. This method gives the class an attribute that is the name of the field and gives an instance access to the value of that field. For
ManyToManyFields this attribute is a
ReverseManyRelatedObjectsDescriptor object, which returns an instance of
ManyRelatedManager for accessing the related rows of the related table. The code subclasses
ReverseManyRelatedObjectsDescriptor, catches the
ManyToManyRelated manager on the way out, and then assigns it the attribute
id_list, which is method that returns a
QuerySet that only accesses the M2M table and contains the
ids of the related objects.
There is still a lot of room for improvement, but this gets the job done. For starters, the
ManyRelatedManager gets created behind the scenes here and is then overwritten. Also,
id_list isn’t a bound method of the
ManyRelatedManager. I know both this issues could be solved by jumping down one more level and also subclassing
ManyRelatedManager, but I think this is good enough for now.