Misconceptions with "select_related" in Django

Django's select_related QuerySet method is a great way to reduce the query count when joining tables but it should come with small print in the documentation. This post goes over that small print.

🥱 TLDR: you can only use select_related in forward ForeignKey and OneToOne relationships as well as backward OneToOne relationships. You can't use it in any ManyToMany relationships or in reverse ForeignKey relationships.

Django’s select_related QuerySet method is a great way to easily reduce the query count when you know you will be following ForeignKey relationships after the query has been performed. From the Django documentation:

select_related is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.

Take this simple example:

class Genre(...):
  name = models.CharField(...)
class Album(...):
  name = models.CharField(...)
  genre = models.ForeignKey(...)

By using select_related when we fetch the albums, we can avoid any further queries that would arise from traversing the ForeignKey relationship on an album. Here's a run-through example:

from django import db
rock = Genre(name="Rock")
folk = Genre(name="Folk")
zep = Album(name="Led Zeppelin", genre=rock)
ok = Album(name="Ok Computer", genre=rock)
ck = Album(name="Pink Moon", genre=folk)
# Without using select_reated, there is no join in the query, and we can see that there a total
# of 4 queries after we loop through our albums printing their genres
albums = Album.objects.all()
print albums.query
> SELECT "sandbox_album"."id", "sandbox_album"."name", "sandbox_album"."genre_id" FROM "sandbox_album"
for album in albums:
  print album.genre.name
print len(db.connection.queries)
> 4
# When using select_reated, there is an join in the query and there is only one
# query in total, even after looping through the albums
albums = Album.objects.select_related("genre").all()
print albums.query
> SELECT "sandbox_album"."id", "sandbox_album"."name", "sandbox_album"."genre_id", "sandbox_genre"."id", "sandbox_genre"."name" FROM "sandbox_album" INNER JOIN "sandbox_genre" ON ("sandbox_album"."genre_id" = "sandbox_genre"."id")
for album in albums:
  print album.genre.name
print len(db.connection.queries)
> 1

So surely you can optimise the reverse ForeignKey relationships too? … No!

Take a look:

genres = Genre.objects.select_related('album').all()
print genres.query
> SELECT "sandbox_genre"."id", "sandbox_genre"."name" FROM "sandbox_genre"
for genre in genres:
  for album in genre.album_set.all():
    print album.genre.name
print len(db.connection.queries)
> 4

Actually, the documentation also mentions this specifically but having been bitten by this shortcoming numerous times I think it's worth repeating:

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.

So what are the options? Well luckily, Django 1.4 introduces a new prefetch_related QuerySet method:

prefetch_related on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey.

If you are stuck with Django < 1.4, there is a more laborious option. You can roll a solution yourself using an approach such as this