Misconceptions with "select_related" in Django
Django's select_related QuerySet method is a great way to reduce the query count when joining tables but it should come with small print in the documentation. This post goes over that small print.
TLDR
You can only use select_related
in forward ForeignKey
and OneToOne
relationships as well as backward OneToOne
relationships. You can't use it
in any ManyToMany
relationships or in reverse ForeignKey
relationships.
Django’s select_related
QuerySet method is a great way to easily reduce the query count when you know you will be following ForeignKey
relationships after the query has been performed. From the Django documentation:
select_related
is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.
Take this simple example:
class Genre(...):
name = models.CharField(...)
class Album(...):
name = models.CharField(...)
genre = models.ForeignKey(...)
By using select_related
when we fetch the albums, we can avoid any further queries that would arise from traversing the ForeignKey
relationship on an album. Here's a run-through example:
from django import db
rock = Genre(name="Rock")
folk = Genre(name="Folk")
zep = Album(name="Led Zeppelin", genre=rock)
ok = Album(name="Ok Computer", genre=rock)
ck = Album(name="Pink Moon", genre=folk)
# Without using select_reated, there is no join in the query, and we can see that there a total
# of 4 queries after we loop through our albums printing their genres
albums = Album.objects.all()
print albums.query
> SELECT "sandbox_album"."id", "sandbox_album"."name", "sandbox_album"."genre_id" FROM "sandbox_album"
for album in albums:
print album.genre.name
print len(db.connection.queries)
> 4
db.reset_queries()
# When using select_reated, there is an join in the query and there is only one
# query in total, even after looping through the albums
albums = Album.objects.select_related("genre").all()
print albums.query
> SELECT "sandbox_album"."id", "sandbox_album"."name", "sandbox_album"."genre_id", "sandbox_genre"."id", "sandbox_genre"."name" FROM "sandbox_album" INNER JOIN "sandbox_genre" ON ("sandbox_album"."genre_id" = "sandbox_genre"."id")
for album in albums:
print album.genre.name
print len(db.connection.queries)
> 1
So surely you can optimise the reverse ForeignKey
relationships too? … No!
Take a look:
genres = Genre.objects.select_related('album').all()
print genres.query
> SELECT "sandbox_genre"."id", "sandbox_genre"."name" FROM "sandbox_genre"
for genre in genres:
for album in genre.album_set.all():
print album.genre.name
print len(db.connection.queries)
> 4
Actually, the documentation also mentions this specifically but having been bitten by this shortcoming numerous times I think it's worth repeating:
select_related
works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason,select_related
gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
So what are the options? Well luckily, Django 1.4 introduces a new prefetch_related
QuerySet method:
prefetch_related
on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done usingselect_related
, in addition to the foreign key and one-to-one relationships that are supported byselect_related
. It also supports prefetching of GenericRelation and GenericForeignKey.
If you are stuck with Django < 1.4, there is a more laborious option. You can roll a solution yourself using an approach such as this