GenericForeignKey Deep Filtering

One of the many "batteries" Django comes with is GenericForeignKey (often shortened to GFK). I'm not necessarily the biggest fan of that particular battery (that might be a topic for another post?), but it's hard to deny that GFKs can enable some pretty nifty use-cases. Recently at work I was tasked with implementing a kind of deep filtering of a model that used a GFK, and came up with a technique that seems generic (hehe) enough to be worth sharing.

Quick refresher: regular foreign keys

In order to show the limitations of GFKs that led me to create my "deep filtering" technique, let's first start with a quick example involving a regular ForeignKey. I'll go for the classic Book model, this time with a related Review model that will come in handy later.

from django.db import models

class Book(models.Model):
    author = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    title = models.CharField(max_length=200)


class Review(models.Model):
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    reviewer = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    score = models.PositiveIntegerField()

Now if you want to list all reviews attached to a book whose title contains the word "Django", you can do Review.objects.filter(book__title__icontains="django"). The nifty __ double-underscore syntax of Django's ORM enables "jumping" over any foreign key. You can even do it multiple times: Review.objects.filter(book__author__username="baptiste") will list all reviews attached to a book authored by the user whose username is baptiste. Neat!
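If you're curious what those double underscores turn into, here's a rough sketch that uses Python's built-in sqlite3 module instead of the ORM. The table and column names are simplified stand-ins made up for this example (Django's real ones depend on your app label), and the SQL only approximates what the ORM would generate. The point is that each __ hop becomes one JOIN:

```python
import sqlite3

# Hypothetical, simplified schema; "app_" stands in for your real app label.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE app_book (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES auth_user(id),
        title TEXT
    );
    CREATE TABLE app_review (
        id INTEGER PRIMARY KEY,
        book_id INTEGER REFERENCES app_book(id),
        reviewer_id INTEGER REFERENCES auth_user(id),
        score INTEGER
    );

    INSERT INTO auth_user VALUES (1, 'baptiste'), (2, 'alice');
    INSERT INTO app_book VALUES (1, 1, 'Django for Ponies'), (2, 2, 'Flask for Cats');
    INSERT INTO app_review VALUES (1, 1, 2, 5), (2, 2, 1, 3);
""")

# Rough equivalent of Review.objects.filter(book__author__username="baptiste"):
# review -> book is the first hop, book -> author is the second.
rows = conn.execute(
    """
    SELECT app_review.id
    FROM app_review
    JOIN app_book ON app_review.book_id = app_book.id
    JOIN auth_user ON app_book.author_id = auth_user.id
    WHERE auth_user.username = ?
    """,
    ("baptiste",),
).fetchall()
review_ids = [row[0] for row in rows]
```

Only the review of the book written by baptiste comes back; the review that baptiste wrote about someone else's book does not, because the query follows book.author rather than reviewer.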

Generic Foreign Keys

Whereas a regular foreign key points to a single model class (boring!), a generic foreign key can point to any model you wish (exciting!). Let's try an example, inspired by the real-life LogEntry model from Django's admin:

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models


class LogEntry(models.Model):
    timestamp = models.DateTimeField(auto_now_add=True)
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    affected = GenericForeignKey("content_type", "object_id")
    message = models.TextField()

The LogEntry model is meant to track events on different objects within our codebase. It has a timestamp to store when the event happened, a user to know who triggered the event, a message where we can store a description of the event, and finally an affected generic foreign key that lets us attach the log entry to any model.

"Deep filtering"

The problem I was trying to solve was that I wanted to get a list of all log entries that "affected" a given user. This could be because the entry was attached directly to the user instance, but it could also be because it was attached to a book whose author was the user, or to a review written by the user, and so on.

With a regular foreign key, we could have used __ filtering like we showed in the previous section, but that's not possible with a generic foreign key: the target table can differ from one row to the next, so the ORM has no single table to join against.
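To make that limitation concrete, here's a small sketch of how a generic foreign key is stored, again with the standard sqlite3 module. The schema is a made-up simplification (in particular, Django's real content types table stores an app label and a model name rather than a table name), and resolve_target is a hypothetical helper. It only mimics the idea of "look up the content type first, then fetch the object":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE app_review (id INTEGER PRIMARY KEY, score INTEGER);
    -- Simplified stand-in for Django's content types table
    CREATE TABLE content_type (id INTEGER PRIMARY KEY, table_name TEXT);
    CREATE TABLE log_entry (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER REFERENCES content_type(id),
        object_id INTEGER,  -- no REFERENCES here: the target table varies per row
        message TEXT
    );

    INSERT INTO app_book VALUES (1, 'Django for Ponies');
    INSERT INTO app_review VALUES (1, 5);
    INSERT INTO content_type VALUES (1, 'app_book'), (2, 'app_review');
    -- Same object_id, different content types: two unrelated targets
    INSERT INTO log_entry VALUES (1, 1, 1, 'book created'), (2, 2, 1, 'review posted');
""")


def resolve_target(entry_id):
    """Follow a log entry to its target row, using one extra query per entry."""
    table, object_id = conn.execute(
        """
        SELECT content_type.table_name, log_entry.object_id
        FROM log_entry
        JOIN content_type ON content_type.id = log_entry.content_type_id
        WHERE log_entry.id = ?
        """,
        (entry_id,),
    ).fetchone()
    # The table name comes from our own content_type rows, not from user input.
    row = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (object_id,)).fetchone()
    return table, row
```

Both log entries carry object_id 1, yet one resolves to a book and the other to a review. The table to look in is only known after reading each row, which is why a plain JOIN can't express the relation.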

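If we restrict the problem to a single model, it becomes easier to solve. Say for example that we want to get all log entries that are attached directly to a given user USER (an instance of the django.contrib.auth.models.User model). A GenericForeignKey can't be used directly in filter(), so we filter on its two underlying fields instead:

LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(User),
    object_id=USER.pk,
)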

Though it's a bit more complicated, it's also possible to get all entries that are attached to a book where USER is the author:

LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(Book),
    object_id__in=Book.objects.filter(author=USER)
)

This approach also works for reviews by USER:

LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(Review),
    object_id__in=Review.objects.filter(reviewer=USER)
)

Or even getting entries attached to a review for one of USER's books:

LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(Review),
    object_id__in=Review.objects.filter(book__author=USER)
)

CASE WHEN to the rescue

The idea is to generalize the approach of the last three examples by creating a mapping of model -> Q object, where the Q object is used to filter down the model queryset:

from django.contrib.auth.models import User
from django.contrib.contenttypes.models import ContentType
from django.db.models import BooleanField, Case, Q, Value, When


def is_affected(user):
    Q_OBJS = {
        Book: Q(author=user),
        Review: Q(book__author=user) | Q(reviewer=user),
    }

    whens = [
        # The entry is directly attached to the user
        When(content_type=ContentType.objects.get_for_model(User), then=Q(object_id=user.pk))
    ]

    for model_class, qobj in Q_OBJS.items():
        content_type = ContentType.objects.get_for_model(model_class)
        # Used with __in, this queryset becomes a subquery on the model's primary key
        object_ids = model_class.objects.filter(qobj)
        whens.append(When(content_type=content_type, then=Q(object_id__in=object_ids)))

    return Case(*whens, default=Value(False), output_field=BooleanField())

Now that we have this function, getting a list of log entries that affect USER becomes as simple as:

LogEntry.objects.filter(is_affected(USER))

Voilà!
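If you want a feel for what the database ends up doing, here's an approximation of the resulting query, written by hand and run through the standard sqlite3 module. This is a sketch under assumptions, not the SQL Django actually emits: the table names, the content type ids (10, 20, 30) and the affected_entry_ids helper are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE app_book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE app_review (
        id INTEGER PRIMARY KEY, book_id INTEGER, reviewer_id INTEGER, score INTEGER
    );
    CREATE TABLE log_entry (
        id INTEGER PRIMARY KEY, content_type_id INTEGER, object_id INTEGER, message TEXT
    );

    INSERT INTO auth_user VALUES (1, 'baptiste'), (2, 'alice');
    INSERT INTO app_book VALUES (1, 1, 'Django for Ponies'), (2, 2, 'Flask for Cats');
    -- review 1: alice on baptiste's book; review 2: baptiste on alice's book;
    -- review 3: alice on her own book
    INSERT INTO app_review VALUES (1, 1, 2, 5), (2, 2, 1, 3), (3, 2, 2, 4);
    INSERT INTO log_entry VALUES
        (1, 10, 1, 'user 1 logged in'), (2, 10, 2, 'user 2 logged in'),
        (3, 20, 1, 'book 1 edited'), (4, 20, 2, 'book 2 edited'),
        (5, 30, 1, 'review 1 posted'), (6, 30, 2, 'review 2 posted'),
        (7, 30, 3, 'review 3 posted');
""")

# Made-up content type ids for User, Book and Review.
USER_CT, BOOK_CT, REVIEW_CT = 10, 20, 30

# One WHEN branch per content type, each with its own condition or subquery.
AFFECTED_SQL = """
    SELECT id FROM log_entry
    WHERE CASE
        WHEN content_type_id = :user_ct THEN object_id = :user_id
        WHEN content_type_id = :book_ct THEN object_id IN (
            SELECT id FROM app_book WHERE author_id = :user_id
        )
        WHEN content_type_id = :review_ct THEN object_id IN (
            SELECT r.id
            FROM app_review r
            JOIN app_book b ON b.id = r.book_id
            WHERE b.author_id = :user_id OR r.reviewer_id = :user_id
        )
        ELSE 0
    END
    ORDER BY id
"""


def affected_entry_ids(user_id):
    """Return the ids of log entries that affect the given user."""
    params = {
        "user_ct": USER_CT,
        "book_ct": BOOK_CT,
        "review_ct": REVIEW_CT,
        "user_id": user_id,
    }
    return [row[0] for row in conn.execute(AFFECTED_SQL, params)]
```

Each WHEN branch plays the role of one entry in the whens list from is_affected: the content type picks the branch, and the branch decides whether the object_id qualifies. The ELSE 0 corresponds to default=Value(False), so entries attached to any other model are filtered out.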