ManyToManyFields in Django have a big inefficiency when running a query to get objects that share a ManyToMany relation with another object. Consider these three models:
We can run this query to see what Authors are available in a particular bookstore:
What’s The Problem?
That QuerySet produces the following SQL:
Look at the subquery. There’s an unnecessary join against the table book. The information returned by that subquery is the ids of books associated with a particular bookstore. This information is available directly from M2M table bookstore_book.
So What Can We Do?
We’re really looking at two problems here:
- Django doesn’t realize we only need the
ids of the books and can obtain them without accessing thebooktable. (I’m not going to dive into this one) - Even if the developer realizes, there is no easy way to extract that information using the ORM.
This second issue can be solved by subclassing two Django classes and creating an attribute that collects the data we need in a more efficicent way. Here’s the code I came up with:
This NewManyToManyField will give the ManyRelatedManager accessed through the attribute books the attribute id_list, which is a method that returns a QuerySet containing the list of the ids of the target model that have a connection to the current object. In this case it returns a list of book ids in the B&N book store. If we declare our models using the NewManyToManyField we can take advantage of this new attribute:
Now we can use the following code:
Which produces this SQL
Check out the subquery. We’ve saved a JOIN against the book table. This may seem like an insignificant optimization, but if there are a lot of people on your site visting pages that require this type of query, losing that JOIN can really speed things up.
How does it work?
During class creation of a model, a method called contribute_to_class is called for every database field. This method gives the class an attribute that is the name of the field and gives an instance access to the value of that field. For ManyToManyFields this attribute is a ReverseManyRelatedObjectsDescriptor object, which returns an instance of ManyRelatedManager for accessing the related rows of the related table. The code subclasses ReverseManyRelatedObjectsDescriptor, catches the ManyToManyRelated manager on the way out, and then assigns it the attribute id_list, which is method that returns a QuerySet that only accesses the M2M table and contains the ids of the related objects.
There is still a lot of room for improvement, but this gets the job done. For starters, the ManyRelatedManager gets created behind the scenes here and is then overwritten. Also, id_list isn’t a bound method of the ManyRelatedManager. I know both this issues could be solved by jumping down one more level and also subclassing ManyRelatedManager, but I think this is good enough for now.


