Misconceptions with "select_related" in Django
Django's select_related QuerySet method is a great way to reduce the query count when joining tables but it should come with small print in the documentation. This post goes over that small print.
🥱 TLDR: you can only use
select_related in forward
OneToOne relationships as well as backward
OneToOne relationships. You can't use it in any
ManyToMany relationships or in reverse
select_related QuerySet method is a great way to easily reduce the query count when you know you will be following
ForeignKey relationships after the query has been performed. From the Django documentation:
select_relatedis a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.
Take this simple example:
class Genre(...): name = models.CharField(...) class Album(...): name = models.CharField(...) genre = models.ForeignKey(...)
select_related when we fetch the albums, we can avoid any further queries that would arise from traversing the
ForeignKey relationship on an album. Here's a run-through example:
from django import db rock = Genre(name="Rock") folk = Genre(name="Folk") zep = Album(name="Led Zeppelin", genre=rock) ok = Album(name="Ok Computer", genre=rock) ck = Album(name="Pink Moon", genre=folk) # Without using select_reated, there is no join in the query, and we can see that there a total # of 4 queries after we loop through our albums printing their genres albums = Album.objects.all() print albums.query > SELECT "sandbox_album"."id", "sandbox_album"."name", "sandbox_album"."genre_id" FROM "sandbox_album" for album in albums: print album.genre.name print len(db.connection.queries) > 4 db.reset_queries() # When using select_reated, there is an join in the query and there is only one # query in total, even after looping through the albums albums = Album.objects.select_related("genre").all() print albums.query > SELECT "sandbox_album"."id", "sandbox_album"."name", "sandbox_album"."genre_id", "sandbox_genre"."id", "sandbox_genre"."name" FROM "sandbox_album" INNER JOIN "sandbox_genre" ON ("sandbox_album"."genre_id" = "sandbox_genre"."id") for album in albums: print album.genre.name print len(db.connection.queries) > 1
So surely you can optimise the reverse
ForeignKey relationships too? … No!
Take a look:
genres = Genre.objects.select_related('album').all() print genres.query > SELECT "sandbox_genre"."id", "sandbox_genre"."name" FROM "sandbox_genre" for genre in genres: for album in genre.album_set.all(): print album.genre.name print len(db.connection.queries) > 4
Actually, the documentation also mentions this specifically but having been bitten by this shortcoming numerous times I think it's worth repeating:
select_relatedworks by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason,
select_relatedgets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
So what are the options? Well luckily, Django 1.4 introduces a new
prefetch_related QuerySet method:
prefetch_relatedon the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using
select_related, in addition to the foreign key and one-to-one relationships that are supported by
select_related. It also supports prefetching of GenericRelation and GenericForeignKey.
If you are stuck with Django < 1.4, there is a more laborious option. You can roll a solution yourself using an approach such as this