By Elan Hasson
Senior Software Engineer
Imagine a deployed application consisting of a single process hosting a web service and background job processor within a single service component. The background job processor accesses shared resources, such as tasks stored in a database table acting as a work queue. However, when multiple replicas of the application run simultaneously, each replica may attempt to process the same task, causing race conditions or duplicated work. These problems are compounded when the application is implemented as a monolith and cannot split the worker code from the web service or reduce the number of replicas, making resource coordination even more challenging.
While different solutions exist, leases offer a simple and scalable solution to resource management challenges. In this article, we’ll cover how to use leases to share resources in multi-instance environments to ensure high availability and scalability in modern applications.
Leases provide temporary, time-bound access to resources, automatically revoking access after a fixed duration unless explicitly renewed. This approach prevents race conditions and ensures that resources are released even in the case of failure.
The sample application demonstrates an end-to-end implementation of a centralized lease service, a task management service, and task processors. Also included is a tool that generates fake work (random sleep intervals) for the processors to consume.
Below we explore the various components of the sample application that demonstrate how leases can be used in a Task Processor service deployed on DigitalOcean’s App Platform.
We are using PostgreSQL because it is a battle-tested implementation of a fully ACID-compliant transactional database with strong guarantees. Instead of implementing ACID ourselves, we delegate the complexity of the locking mechanism to the database.
The application uses Prisma to:
The Prisma abstractions do not support all of the Postgres-specific functionality we’re using and in those cases we’re using queryRaw.
This service is crucial for managing the lifecycle of worker processes, ensuring that only one instance of a worker can hold a lease on a task (resource) at any given time, thus preventing race conditions and ensuring safe state transitions.
The implementation found in the repo is designed to be generalized and has no knowledge of any of the task processing code.This service is essential for managing worker processes’ lifecycles. It ensures that only one worker instance can hold a lease on a task (resource) at a time, preventing race conditions and ensuring safe state transitions. The API routes enable the creation, renewal, release, and retrieval of leases.
The leases client is a crucial component of the lease management service, responsible for interacting with the lease API to acquire, renew, and release leases. It provides a structured way to manage leases, ensuring that resources are properly leased and renewed as needed. An added bonus feature of the leases client is it can optionally automatically renew the lease for you.
Below is the schema for the leases
table:
CREATE TABLE IF NOT EXISTS public.leases
(
id integer NOT NULL DEFAULT nextval('leases_id_seq'::regclass),
resource text COLLATE pg_catalog."default" NOT NULL,
holder text COLLATE pg_catalog."default",
created_at timestamp(3) without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
renewed_at timestamp(3) without time zone,
released_at timestamp(3) without time zone,
expires_at timestamp(3) without time zone,
CONSTRAINT leases_pkey PRIMARY KEY (id)
)
/api/leases
GET
: Fetches the status of the worker and the list of all leases.
POST
: Creates a new lease or updates an existing one if it has expired.
Request body:
{
"resource": "resource_name",
"holder": "holder_name"
}
The POST
method in the lease service is designed to create a new lease or update an existing one if it has expired. Here’s how it works and why it’s implemented this way:
INSERT INTO leases (resource, holder, expires_at)
VALUES (${resource}, ${holder}, NOW() + INTERVAL '30 seconds')
ON CONFLICT (resource)
DO UPDATE
SET
holder = ${holder},
created_at = NOW(),
renewed_at = null,
released_at = null,
expires_at = NOW() + INTERVAL '30 seconds'
WHERE leases.expires_at <= NOW()
RETURNING *;
INSERT INTO
leases statement attempts to insert a new lease with the specified resource, holder, and an expiration time set to 30 seconds from the current time (NOW() + INTERVAL '30 seconds')
.ON CONFLICT
(resource) clause specifies that if a lease with the same resource already exists, the conflict should be resolved by updating the existing lease.DO UPDAT
E clause updates the existing lease with the new holder, resets the created_at
timestamp to the current time, and clears the renewed_at
and released_at
fields. The expires_at
field is also updated to 30 seconds from the current time.WHERE leases.expires_at <= NOW()
condition ensures that the update only occurs if the existing lease has already expired. This prevents overwriting an active lease.RETURNING *
clause returns the newly inserted or updated lease record.ON CONFLICT
ensures that the creation or update of a lease is atomic. This means that the operation is completed in a single step, reducing the risk of race conditions.ON CONFLICT
clause allows the system to handle cases where multiple requests might try to create a lease for the same resource simultaneously. By updating the existing lease if it has expired, we ensure that only one active lease exists for a resource at any given time.WHERE leases.expires_at <= NOW()
condition ensures that only expired leases are updated. This prevents active leases from being prematurely overwritten, maintaining the integrity of the lease system.This approach ensures that the lease service can reliably create and update leases while preventing race conditions and maintaining the integrity of the lease system.
/api/leases/active
GET
: Fetches the list of active leases.
/api/leases/expired
GET
: Fetches the list of expired leases.
/api/leases/renew
PUT
: Renews a lease by extending its expiration time.
Body:
{
"resource": "resource_name",
"holder": "holder_name"
}
The PUT
method in the lease service is designed to renew an existing lease by extending its expiration time. Here’s how it works and why it’s implemented this way:
UPDATE leases
SET
renewed_at = NOW(),
expires_at = NOW() + INTERVAL '30 seconds'
WHERE
holder = ${holder}
AND resource = ${resource}
AND expires_at > NOW()
AND released_at is null
RETURNING *;
UPDATE
leases statement specifies that we are updating the leases table.renewed_at
is set to the current timestamp (NOW())
, indicating when the lease was last renewed.expires_at
is set to 30 seconds from the current timestamp (NOW() + INTERVAL '30 seconds'
), extending the lease’s expiration time.WHERE holder = ${holder} AND resource = ${resource} AND expires_at > NOW() AND released_at is null:
holder = ${holder}
: Matches the lease with the specified holder.resource = ${resource}
: Matches the lease with the specified resource.expires_at > NOW()
: Ensures that only leases that have not yet expired are updated.released_at is null
: Ensures that only leases that have not been releeased are updated.RETURNING *
: Returns the updated lease record. This is useful for confirming the update and providing the updated lease details in the response.WHERE
clause ensures that only leases that have not already been released are updated. This prevents the release of leases that are already marked as released.released_at
and expires_at
timestamps ensure that the lease’s release time and expiration time are accurately tracked.RETURNING *
clause provides immediate feedback on the updated lease, allowing the application to respond with the updated lease details./api/leases/renewed
GET
: Fetches the list of renewed leases.
/api/leases/released
GET
: Fetches the list of released leases.
/api/leases/[id]
GET
: Fetches a lease by its ID.
DELETE
: Releases (deletes) a lease by its ID. The behavior is the same as the DELETE /api/leases/release
route, except the lease’s ID is included in the WHERE
clause of the UPDATE
statement.
Body:
{
"resource": "resource_name",
"holder": "holder_name"
}
The Task Service is responsible for managing tasks within the application. It provides various API endpoints to create, update, retrieve, and manage the lifecycle of tasks. The Task Service encapsulates Leases API to guarantee a single task can only be processed by a single worker. Of course, this assumes that the workers obey the rules of the API and do not perform work on the task once the lease has expired.
Below is the database schema for the tasks
table:
CREATE TABLE IF NOT EXISTS public.tasks
(
id integer NOT NULL DEFAULT nextval('tasks_id_seq'::regclass),
task_data jsonb NOT NULL,
scheduled_at timestamp(3) without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
processor text COLLATE pg_catalog."default",
last_heartbeat_at timestamp(3) without time zone,
must_heartbeat_before timestamp(3) without time zone,
started_at timestamp(3) without time zone,
processed_at timestamp(3) without time zone,
task_output jsonb,
CONSTRAINT tasks_pkey PRIMARY KEY (id)
)
POST /api/tasks/next
The POST
method in this route retrieves the next available tasks for processing, ensurring that a lease is acquired for the task.
The Get Next Task
query is designed to retrieve the next available task for processing:
SELECT *
FROM tasks
WHERE
processed_at is null
ORDER BY scheduled_at ASC
LIMIT 1
OFFSET ${tasksToSkip}
FOR UPDATE;
SELECT * FROM tasks
WHERE processed_at is null
processed_at
column is null
for tasks that are still pending processing.ORDER BY scheduled_at ASC
scheduled_at
timestamp in ascending order. This ensures that tasks scheduled earlier are prioritized for processing.LIMIT 1
OFFSET ${tasksToSkip}
${tasksToSkip}
variable is used to dynamically adjust the number of tasks to skip. This is useful for iterating through tasks if the first few are already leased or otherwise unavailable.FOR UPDATE
scheduled_at
timestamp, the query ensures that tasks scheduled earlier are processed first. This helps in maintaining the correct order of task execution.FOR UPDATE
clause is crucial for preventing race conditions. It ensures that once a task is selected, it is locked for the current transaction, preventing other processors from picking it up simultaneously, allowing us to acquire a lease for the resource (task).OFFSET ${tasksToSkip}
clause allows the query to skip tasks that might already be leased or otherwise unavailable. This helps in efficiently finding the next available task without repeatedly querying the same tasks./api/tasks/[id]/heartbeat
The PUT method route is used to update the heartbeat of a task. This ensures that the task is still being processed by the correct processor and renews the lease to prevent it from being picked up by another processor. If a task processor fails to heartbeat in time, the underlying lease expires, and the processor should abandon and discard all work and get a new task to process.
$transaction
method to ensure atomicity and consistency.FOR UPDATE
clause to lock the row.200 OK
response with a message indicating that the task is not assigned to the processor.409 Conflict
response with a message indicating that the task has already been processed.PUT
request to the lease service to renew the lease for the task.404
, it returns a 500 Internal Server Error
response with an appropriate error message.409 Conflict
response with a message indicating that the task lease has expired.lastHeartBeatAt
and mustHeartBeatBefore
fields with the renewed lease timestamps.202 Accepted
status.FOR UPDATE
clause locks the task row, preventing other transactions from modifying it simultaneously.PUT /api/tasks/[id]/complete
The PUT method route is used to mark a task as completed. It ensures that the task is assigned to the correct processor, renews the lease to prevent other processors from picking it up, and updates the task’s status.
$transaction
method to ensure atomicity and consistency.FOR UPDATE
clause to lock the row.200 OK
response with a message indicating that the task was not found.200 OK
response with a message indicating that the task is not assigned to the processor.409 Conflict
response with a message indicating that the task has already been processed.PUT
request to the lease service to renew the lease for the task.500 Internal Server Error
response with an appropriate error message.409 Conflict
response with a message indicating that the task lease has expired.lastHeartBeatAt
, mustHeartBeatBefore
, processedAt
, and taskOutput
fields with the renewed lease timestamps and the provided task output.202 Accepted
status.FOR UPDATE
clause locks the task row, preventing other transactions from modifying it simultaneously.FOR UPDATE
, allowing the task processor to retry and not worry about the lease expiring out from under it.These routes use simple queries to work with the task list:
/api/tasks/[id]
)/api/tasks/processed
)/api/tasks/started
)/api/tasks
)The task worker is a worker process used to continuously fetch, process, and complete tasks from the task service. It ensures that tasks are processed reliably by periodically sending heartbeats to renew leases and handling potential errors gracefully.
The worker follows a structured loop to continuously fetch and process tasks while ensuring safe execution through a leasing mechanism. This prevents multiple workers from handling the same task simultaneously and ensures that tasks are completed reliably. Below, we break down how the worker operates.
The worker begins by querying the task queue to retrieve the next available task. If no task is found, it waits for a short period before retrying. This prevents unnecessary load on the task queue while ensuring new tasks are picked up promptly.
Once a task is assigned, the worker starts a heartbeat loop to maintain its lease. This lease prevents other workers from claiming the same task while it’s being processed. The worker must send periodic heartbeat signals to renew its lease. If it fails to do so, the lease expires, and the task becomes available for reassignment.
During execution, the worker processes the task step by step. It introduces:
If the worker encounters a simulated failure, it exits immediately, forcing the task to be retried later. Otherwise, it continues processing until completion.
Once the worker finishes executing the task:
This design ensures:
By using this approach, we achieve fault-tolerant, distributed task execution that can scale efficiently.
It’s essential to test how workers handle tasks under real-world conditions. The Task Generator is designed to simulate a workload by continuously creating tasks with random execution times. These tasks mimic real jobs that a worker might process, such as resizing images, transcoding video, sending emails, or any other work that must be done reliably.
By using a lease-based mechanism, we ensure that only one instance of the generator is actively producing tasks at any time, preventing duplication while still allowing multiple instances to be deployed. This setup helps us observe system behavior, validate worker performance, and uncover potential issues in task execution and lease management.
In a distributed system where multiple instances of a service are running behind a load balancer, maintaining a singleton process (one that should run only once at a time) is challenging. The Task Generator demonstrates this problem by ensuring that only one instance can actively generate tasks while all instances respond to status queries, albeit incorrectly.
The task generator is responsible for:
{"sleep_duration_seconds": 5}
) to define processing duration.When checking the generator’s status, you may notice inconsistent reports:
STARTED
, indicating an instance has the lease.STOPPED
, because that request hit an instance without the lease.This behavior highlights a fundamental challenge of running a singleton service in a load-balanced environment.
To stop the generator, the request must reach the instance holding the lease. Since API requests are distributed randomly:
The complete application is available in the GitHub repository. To deploy the application on DigitalOcean’s App Platform:
Deploy to DO button
in the repository’s README.Note: Be sure to delete the app when you’re finished. Not doing so will result in real charges to your DigitalOcean account.
Leases provide an elegant solution to managing shared resources in multi-instance environments. By implementing leases with PostgreSQL, you can achieve safe, time-bound access to resources, automated expiration, and simplified recovery from failures. DigitalOcean’s managed database offerings make adopting this approach straightforward and reliable. Try out the sample application to experience the benefits of lease-based resource management firsthand.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
Building App Platform.
This textbox defaults to using Markdown to format your answer.
You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!
Get paid to write technical tutorials and select a tech-focused charity to receive a matching donation.
Full documentation for every DigitalOcean product.
The Wave has everything you need to know about building a business, from raising funding to marketing your product.
Stay up to date by signing up for DigitalOcean’s Infrastructure as a Newsletter.
New accounts only. By submitting your email you agree to our Privacy Policy
Scale up as you grow — whether you're running one virtual machine or ten thousand.
Sign up and get $200 in credit for your first 60 days with DigitalOcean.*
*This promotional offer applies to new accounts only.