Misconceptions with "select_related" in Django

Django's select_related QuerySet method is a great way to reduce the query count when joining tables but it should come with small print in the documentation. This post goes over that small print.

TLDR

You can only use select_related in forward ForeignKey and OneToOne relationships as well as backward OneToOne relationships. You can't use it in any ManyToMany relationships or in reverse ForeignKey relationships.

Django’s select_related QuerySet method is a great way to easily reduce the query count when you know you will be following ForeignKey relationships after the query has been performed. From the Django documentation:

select_related is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.

Take this simple example:

class Genre(...):
  name = models.CharField(...)
 
class Album(...):
  name = models.CharField(...)
  genre = models.ForeignKey(...)

By using select_related when we fetch the albums, we can avoid any further queries that would arise from traversing the ForeignKey relationship on an album. Here's a run-through example:

from django import db
 
rock = Genre(name="Rock")
folk = Genre(name="Folk")
zep = Album(name="Led Zeppelin", genre=rock)
ok = Album(name="Ok Computer", genre=rock)
ck = Album(name="Pink Moon", genre=folk)
 
# Without using select_reated, there is no join in the query, and we can see that there a total
# of 4 queries after we loop through our albums printing their genres
albums = Album.objects.all()
print albums.query
> SELECT "sandbox_album"."id", "sandbox_album"."name", "sandbox_album"."genre_id" FROM "sandbox_album"
for album in albums:
  print album.genre.name
print len(db.connection.queries)
> 4
db.reset_queries()
 
# When using select_reated, there is an join in the query and there is only one
# query in total, even after looping through the albums
albums = Album.objects.select_related("genre").all()
print albums.query
> SELECT "sandbox_album"."id", "sandbox_album"."name", "sandbox_album"."genre_id", "sandbox_genre"."id", "sandbox_genre"."name" FROM "sandbox_album" INNER JOIN "sandbox_genre" ON ("sandbox_album"."genre_id" = "sandbox_genre"."id")
for album in albums:
  print album.genre.name
print len(db.connection.queries)
> 1

So surely you can optimise the reverse ForeignKey relationships too? … No!

Take a look:

genres = Genre.objects.select_related('album').all()
print genres.query
> SELECT "sandbox_genre"."id", "sandbox_genre"."name" FROM "sandbox_genre"
 
for genre in genres:
  for album in genre.album_set.all():
    print album.genre.name
print len(db.connection.queries)
> 4

Actually, the documentation also mentions this specifically but having been bitten by this shortcoming numerous times I think it's worth repeating:

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.

So what are the options? Well luckily, Django 1.4 introduces a new prefetch_related QuerySet method:

prefetch_related on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey.

If you are stuck with Django < 1.4, there is a more laborious option. You can roll a solution yourself using an approach such as this