Async task queues @ Insurami 🤖

Introduction

At Insurami, we provide landlords and their tenants with an alternative to traditional lease deposits. Landlords can refer new and existing tenants to our platform to obtain a deposit guarantee quote.

In our backend, this translates into a sequence of I/O operations and network requests to third-party services that can take longer than the user might expect. In addition, an HTTP request can fail temporarily, in which case the entire processing flow breaks and data in the backend is left in an invalid state.

To tackle this problem, we wanted to break down the backend processing code into individual steps that can be run in sequence, retried if necessary, and rolled back as a whole if needed. Following these principles, we can start treating our backend processing as transactional, giving us confidence that our data stays consistent.

Implementation

Task queues provide a way to execute code asynchronously by sending messages to a queue, where they are picked up and processed by a worker process running in the background. There are many task queue frameworks available, but we chose Celery because it is a well-established solution and is compatible with AWS message queues, which we had already integrated into our platform. Amazon Simple Queue Service (SQS) acts as our message broker, facilitating communication between processes.
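As a rough illustration of that setup, the sketch below points a Celery app at SQS. The app name, region and option values are assumptions chosen for the example, not our production settings; with a bare "sqs://" URL, credentials are read from the standard AWS environment.

```python
from celery import Celery

# "insurami" is a placeholder app name. The bare "sqs://" URL tells Celery to
# use SQS as the broker and to pick up AWS credentials from the environment.
app = Celery("insurami", broker="sqs://")

app.conf.broker_transport_options = {
    "region": "eu-west-1",       # assumed region
    "visibility_timeout": 3600,  # seconds before an unacknowledged message is redelivered
    "polling_interval": 1,       # seconds between polls of the queue
}
```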

Celery provides the means to execute code blocks sequentially with the chain primitive. When the execution chain is started, every block of code runs as a task, and its result is passed as the input to the next task.

Every task can be configured to retry when an exception is raised inside it. A maximum number of retries can be set, so that tasks don't keep failing indefinitely when recovery is not possible.

It is also possible to set an interval between retries, to avoid repeating executions too close together. A well-known strategy for calculating the interval is exponential backoff, in which the interval grows longer with each retry. The algorithm to generate such intervals was introduced in this post. Celery 4 introduced a native auto-retry option, with built-in exponential backoff available from version 4.2.
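To make the idea concrete, here is a small standalone sketch of an exponential backoff calculation. The function name and its defaults are our own choices for illustration (the defaults resemble Celery's retry_backoff settings), and it is not Celery's actual implementation.

```python
import random


def backoff_interval(retries, factor=1, maximum=600, jitter=False):
    """Return the delay in seconds before the next retry.

    The delay doubles with every retry and is capped at `maximum`.
    With jitter enabled, a random delay between 0 and that value is used,
    so that many failing tasks don't all retry at the same moment.
    """
    countdown = min(maximum, factor * (2 ** retries))
    if jitter:
        countdown = random.randrange(countdown + 1)
    return countdown


intervals = [backoff_interval(n) for n in range(6)]
print(intervals)  # [1, 2, 4, 8, 16, 32]
```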

The code

We have briefly covered the main concepts behind the implementation of async task queues. Now it is time to see what all that looks like in code:

To add the auto-retry configuration and a custom on_failure handler, we subclass celery.Task with our own InsuramiTask class.