Splitting strings with Postgres and Django

Jan. 31, 2023

The Django docs give an example of querying an `ArrayField` with some really powerful lookups. But what if we inherited a project, and the prior developer didn't know about this Postgres-specific field type?

Imagine we have this version of the Post model, with a simple CharField for the tags...

class Post(models.Model):
    name = models.CharField(max_length=200)
    tags = models.CharField(max_length=200, blank=True)  # not ArrayField 😞

    def __str__(self):
        return self.name

...and by convention, the tags are filled out with a pipe separator, like this: "thoughts|django". It turns out we can almost query this field the same way as is done in the docs, but there's a gotcha:

>>> Post.objects.create(name="First post", tags="thoughts|django")
>>> Post.objects.create(name="Second post", tags="thoughts")
>>> Post.objects.create(name="Third post", tags="tutorial|django")
>>> Post.objects.create(name="Fourth post", tags="though")

>>> Post.objects.filter(tags__contains="thoughts")
<QuerySet [<Post: First post>, <Post: Second post>]>

>>> Post.objects.filter(tags__contains="django")
<QuerySet [<Post: First post>, <Post: Third post>]>

>>> Post.objects.filter(tags__contains="django").filter(tags__contains="thoughts")
<QuerySet [<Post: First post>]>

>>> Post.objects.filter(tags__contains="though")
<QuerySet [<Post: Second post>, <Post: Fourth post>]>

The gotcha is that substrings match! We probably weren't expecting to get Second post back in that last query.

So how do we get around this? One solution would be to OR together a bunch of filters:

>>> Post.objects.filter(
>>>     Q(tags="though") |
>>>     Q(tags__contains="|though|") |
>>>     Q(tags__startswith="though|") |
>>>     Q(tags__endswith="|though")
>>> )
<QuerySet [<Post: Fourth post>]>

This works, but it's not great. It's ugly, and while I think I've covered every case, it doesn't feel very safe. The same could be said for a regex solution:

>>> Post.objects.filter(tags__regex=r"(?:^|[\|]+)though(?:[\|]+|$)")
<QuerySet [<Post: Fourth post>]>

Again, it probably works, but... well, regex is gross, and hard to understand when you come back to this code months from now, or hand it off to someone else.

Use the power of Postgres!

A much better solution is to use some built-in power we get from Django and Postgres: annotations and database functions! Specifically, we're going to use Postgres' string_to_array function.

>>> posts = Post.objects.annotate(
>>>     tag_list=Func(
>>>         F("tags"),
>>>         Value("|"),
>>>         function="string_to_array",
>>>         output_field=ArrayField(CharField()),
>>>     )
>>> )
>>> posts.filter(tag_list__contains=["though"])
<QuerySet [<Post: Fourth post>]>

Note that we query the tag_list queryset using the ArrayField syntax (with the brackets), because it is an ArrayField. This means we can also use all the other powerful lookups like contained_by, overlap, etc.

If this is the kind of query you're doing a lot, it may be worth adding a custom manager that automatically annotates the queryset with this field.

class TaggedPostManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().annotate(
            tag_list=Func(
                F("tags"),
                Value("|"),
                function="string_to_array",
                output_field=ArrayField(CharField()),
            )
        )


class Post(models.Model):
    name = models.CharField(max_length=200)
    tags = CharField(max_length=200, blank=True)

    objects = models.Manager()
    objects_with_tags = TaggedPostManager()

    def __str__(self):
        return self.name

Now we can query our Posts like so:

>>> Post.objects.create(name="First post", tags="thoughts|django")
>>> Post.objects.create(name="Second post", tags="thoughts")
>>> Post.objects.create(name="Third post", tags="tutorial|django")
>>> Post.objects.create(name="Fourth post", tags="though")

>>> Post.objects_with_tags.filter(tag_list__contians=["though"])
<QuerySet [<Post: Fourth post>]>

>>> Post.objects_with_tags.filter(tag_list__len=2)
<QuerySet [<Post: First post>, <Post: Third post>]>

Handy!

Return to blog