Imagine we have this version of the Post
model, with a simple CharField
for the tags...
class Post(models.Model):
name = models.CharField(max_length=200)
tags = models.CharField(max_length=200, blank=True) # not ArrayField 😞
def __str__(self):
return self.name
...and by convention, the tags are filled out with a pipe separator, like this: "thoughts|django"
. It turns out we can almost query this field the same way as is done in the docs, but there's a gotcha:
>>> Post.objects.create(name="First post", tags="thoughts|django")
>>> Post.objects.create(name="Second post", tags="thoughts")
>>> Post.objects.create(name="Third post", tags="tutorial|django")
>>> Post.objects.create(name="Fourth post", tags="though")
>>> Post.objects.filter(tags__contains="thoughts")
<QuerySet [<Post: First post>, <Post: Second post>]>
>>> Post.objects.filter(tags__contains="django")
<QuerySet [<Post: First post>, <Post: Third post>]>
>>> Post.objects.filter(tags__contains="django").filter(tags__contains="thoughts")
<QuerySet [<Post: First post>]>
>>> Post.objects.filter(tags__contains="though")
<QuerySet [<Post: Second post>, <Post: Fourth post>]>
The gotcha is that substrings match! We probably weren't expecting to get Second post
back in that last query.
So how do we get around this? One solution would be to OR together a bunch of filters:
>>> Post.objects.filter(
>>> Q(tags="though") |
>>> Q(tags__contains="|though|") |
>>> Q(tags__startswith="though|") |
>>> Q(tags__endswith="|though")
>>> )
<QuerySet [<Post: Fourth post>]>
This works, but it's not great. It's ugly, and while I think I've covered every case, it doesn't feel very safe. The same could be said for a regex solution:
>>> Post.objects.filter(tags__regex=r"(?:^|[\|]+)though(?:[\|]+|$)")
<QuerySet [<Post: Fourth post>]>
Again, it probably works, but... well, regex is gross, and hard to understand when you come back to this code months from now, or hand it off to someone else.
A much better solution is to use some built-in power we get from Django and Postgres: annotations and database functions! Specifically, we're going to use Postgres' string_to_array
function.
>>> posts = Post.objects.annotate(
>>> tag_list=Func(
>>> F("tags"),
>>> Value("|"),
>>> function="string_to_array",
>>> output_field=ArrayField(CharField()),
>>> )
>>> )
>>> posts.filter(tag_list__contains=["though"])
<QuerySet [<Post: Fourth post>]>
Note that we query the tag_list
queryset using the ArrayField
syntax (with the brackets), because it is an ArrayField
. This means we can also use all the other powerful lookups like contained_by
, overlap
, etc.
If this is the kind of query you're doing a lot, it may be worth adding a custom manager that automatically annotates the queryset with this field.
class TaggedPostManager(models.Manager):
def get_queryset(self):
return super().get_queryset().annotate(
tag_list=Func(
F("tags"),
Value("|"),
function="string_to_array",
output_field=ArrayField(CharField()),
)
)
class Post(models.Model):
name = models.CharField(max_length=200)
tags = CharField(max_length=200, blank=True)
objects = models.Manager()
objects_with_tags = TaggedPostManager()
def __str__(self):
return self.name
Now we can query our Post
s like so:
>>> Post.objects.create(name="First post", tags="thoughts|django")
>>> Post.objects.create(name="Second post", tags="thoughts")
>>> Post.objects.create(name="Third post", tags="tutorial|django")
>>> Post.objects.create(name="Fourth post", tags="though")
>>> Post.objects_with_tags.filter(tag_list__contians=["though"])
<QuerySet [<Post: Fourth post>]>
>>> Post.objects_with_tags.filter(tag_list__len=2)
<QuerySet [<Post: First post>, <Post: Third post>]>
Handy!