Bulk update or create in Django 4.1

Sept. 6, 2022


Django 4.1 quietly added a really powerful feature to the bulk_create queryset method that effectively turns it into bulk_update_or_create.

The release notes do not make a big deal about this. Buried in the “Minor features” section, you’ll find:

QuerySet.bulk_create() now supports updating fields when a row insertion fails uniqueness constraints.

This doesn’t sound very exciting, but it turns out it’s a huge performance win for certain use cases. Let’s look at an example.

I’m working on a side project related to public transit, and I need to periodically request “static” route and stop data from transit agencies. For context, TriMet in Portland, OR has just over 9,000 stops across its system, each with various attributes. Here’s a simplified version of the code to update stop data from a given transit agency:

for stop_data in stop_list:
    savable_data = map_field_values(
        stop_data, agency.stop_field_mapping
    )
    stop_identifier = savable_data.pop("stop_identifier")
    stop, created = Stop.objects.update_or_create(
        agency=agency,
        stop_identifier=stop_identifier,
        defaults=savable_data,
    )
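
To make the cost of that loop concrete, here is a rough stand-in written against Python's built-in sqlite3 module rather than Django. The `stop` table, its columns, and the `update_or_create` helper are hypothetical, and Django's real `update_or_create` does more (it wraps the work in a transaction, for one), but the shape is the same: look the row up, then write it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop ("
    " agency_id INTEGER NOT NULL,"
    " stop_identifier TEXT NOT NULL,"
    " name TEXT NOT NULL,"
    " UNIQUE (agency_id, stop_identifier))"
)

executed = []  # every SQL statement sent while syncing stops


def run(sql, params=()):
    """Execute a statement and record it so we can count round trips."""
    executed.append(sql)
    return conn.execute(sql, params)


def update_or_create(agency_id, stop_identifier, name):
    """Hypothetical helper: one lookup plus one write per stop."""
    row = run(
        "SELECT rowid FROM stop WHERE agency_id = ? AND stop_identifier = ?",
        (agency_id, stop_identifier),
    ).fetchone()
    if row is None:
        run(
            "INSERT INTO stop (agency_id, stop_identifier, name) VALUES (?, ?, ?)",
            (agency_id, stop_identifier, name),
        )
        return True  # created
    run("UPDATE stop SET name = ? WHERE rowid = ?", (name, row[0]))
    return False  # updated


# Made-up sample data: 100 stops for a single agency.
stop_list = [(1, str(i), f"Stop {i}") for i in range(100)]
for stop in stop_list:
    update_or_create(*stop)

print(len(executed))
```

Syncing 100 stops this way issues 200 statements, so the same pattern applied to 9,000 stops means at least 18,000 round trips to the database.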

Since we’re only doing this occasionally, this is okay, but we are making O(n) queries to update this data, because update_or_create needs at least one lookup and one write for every stop (note there have been slightly hacky ways around this, but they're not ideal). Here’s how we can significantly improve the performance using the new features of bulk_create:

stops = []
for stop_data in stop_list:
    savable_data = map_field_values(
        stop_data, agency.stop_field_mapping
    )
    stop = Stop(agency=agency, **savable_data)
    stops.append(stop)

Stop.objects.bulk_create(
    stops,
    update_conflicts=True,
    update_fields=agency.stop_field_mapping.keys(),
    unique_fields=["agency_id", "stop_identifier"],  # note we must use agency_id not agency
)

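Boom! We just updated (or created) the data for 9,000 stops in a single query! (Depending on the database backend and the batch_size argument, Django may split the insert into a few batches, but that is still far fewer than one query per stop.)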
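
Under the hood, on PostgreSQL and SQLite this relies on the database's INSERT ... ON CONFLICT ... DO UPDATE ("upsert") syntax, which is also why the columns named in unique_fields need to be covered by a unique constraint. Here is a minimal sketch of that statement using Python's built-in sqlite3 module (SQLite 3.24 or newer). The `stop` table and the sample rows are made up for illustration, and the exact SQL Django generates will differ in its details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop ("
    " agency_id INTEGER NOT NULL,"
    " stop_identifier TEXT NOT NULL,"
    " name TEXT NOT NULL,"
    # The upsert below targets this unique constraint.
    " UNIQUE (agency_id, stop_identifier))"
)
# One stop already exists and should be updated, not duplicated.
conn.execute("INSERT INTO stop VALUES (1, 'A', 'Old name')")

rows = [(1, "A", "New name"), (1, "B", "Brand new stop")]
placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
params = [value for row in rows for value in row]

# A single multi-row INSERT: rows that collide on the unique
# constraint are updated in place, the rest are inserted.
conn.execute(
    "INSERT INTO stop (agency_id, stop_identifier, name) "
    f"VALUES {placeholders} "
    "ON CONFLICT (agency_id, stop_identifier) "
    "DO UPDATE SET name = excluded.name",
    params,
)

result = conn.execute(
    "SELECT stop_identifier, name FROM stop ORDER BY stop_identifier"
).fetchall()
print(result)
```

Stop A keeps its row but gets the new name, and stop B is inserted, all from one statement.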
