### Separate cache key storage from result storage
To store cache records separately from the cached value, you can configure a cache policy to use a custom storage location. Here’s an example of a cache policy configured to store cache records in a local directory, `~/prefect/storage`:
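For example (a minimal sketch, assuming Prefect 3's `CachePolicy.configure` method with a `key_storage` argument and the built-in `INPUTS` and `TASK_SOURCE` policies; the task itself is illustrative):

```python
from prefect import task
from prefect.cache_policies import INPUTS, TASK_SOURCE

# Keep cache records (the key metadata) in a local directory,
# separate from wherever the task result itself is persisted.
cache_policy = (INPUTS + TASK_SOURCE).configure(key_storage="~/prefect/storage")


@task(cache_policy=cache_policy)
def my_cached_task(x: int) -> int:  # illustrative task
    return x + 42
```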
To store cache records in a remote object store such as S3, pass a storage block instead:
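For example (a sketch assuming the `S3Bucket` and `AwsCredentials` blocks from the `prefect-aws` collection; the bucket name is illustrative):

```python
from prefect import task
from prefect.cache_policies import INPUTS, TASK_SOURCE
from prefect_aws import AwsCredentials, S3Bucket

# Storage block pointing at the bucket that should hold cache records.
s3_bucket = S3Bucket(
    credentials=AwsCredentials(),
    bucket_name="my-prefect-cache-bucket",  # illustrative bucket name
)

cache_policy = (INPUTS + TASK_SOURCE).configure(key_storage=s3_bucket)


@task(cache_policy=cache_policy)
def my_cached_task(x: int) -> int:
    return x + 42
```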
### Isolate cache access
You can control concurrent access to cache records by setting the `isolation_level` parameter on the cache policy. Prefect supports two isolation levels: `READ_COMMITTED` and `SERIALIZABLE`.
By default, cache records operate with a `READ_COMMITTED` isolation level. This guarantees that reading a cache record will see the latest committed cache value, but allows multiple executions of the same task to occur simultaneously.

Consider the following example:
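The sketch below illustrates the point, assuming two tasks that both use the `INPUTS` policy and are invoked with the same argument from separate threads (the task bodies and names are illustrative):

```python
import threading

from prefect import task
from prefect.cache_policies import INPUTS


@task(cache_policy=INPUTS)
def my_task_version_1(x: int) -> int:
    print("my_task_version_1 running")
    return x + 42


@task(cache_policy=INPUTS)
def my_task_version_2(x: int) -> int:
    print("my_task_version_2 running")
    return x + 43


if __name__ == "__main__":
    # Both tasks compute the same cache key from the same input, but under
    # READ_COMMITTED they can still run at the same time, so both print.
    t1 = threading.Thread(target=my_task_version_1, args=(1,))
    t2 = threading.Thread(target=my_task_version_2, args=(1,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```

With `READ_COMMITTED` isolation, running this script can print both messages, because nothing prevents the two executions from proceeding concurrently.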
To prevent this, use the `SERIALIZABLE` isolation level. This ensures that only one execution of a task occurs at a time for a given cache record via a locking mechanism.
When setting `isolation_level` to `SERIALIZABLE`, you must also provide a `lock_manager` that implements locking logic for your system.
Here’s an updated version of the previous example that uses `SERIALIZABLE` isolation:
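A sketch assuming the filesystem-based `FileSystemLockManager` (configured with a directory for its lock files) and the `IsolationLevel` enum from `prefect.transactions`:

```python
import threading
from pathlib import Path

from prefect import task
from prefect.cache_policies import INPUTS
from prefect.locking.filesystem import FileSystemLockManager
from prefect.transactions import IsolationLevel

cache_policy = INPUTS.configure(
    isolation_level=IsolationLevel.SERIALIZABLE,
    lock_manager=FileSystemLockManager(
        lock_files_directory=Path.home() / ".prefect" / "locks"
    ),
)


@task(cache_policy=cache_policy)
def my_task_version_1(x: int) -> int:
    print("my_task_version_1 running")
    return x + 42


@task(cache_policy=cache_policy)
def my_task_version_2(x: int) -> int:
    print("my_task_version_2 running")
    return x + 43


if __name__ == "__main__":
    # With SERIALIZABLE isolation, whichever task acquires the lock first runs;
    # the other waits on the lock and then reads the committed cache record.
    t1 = threading.Thread(target=my_task_version_1, args=(2,))
    t2 = threading.Thread(target=my_task_version_2, args=(2,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```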
**Locking in a distributed setting**

To manage locks in a distributed setting, you will need to use a storage system for locks that is accessible by all of your execution infrastructure. We recommend using the `RedisLockManager` provided by `prefect-redis` in conjunction with a shared Redis instance:
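For example (a sketch assuming `prefect-redis` is installed; the connection details are illustrative and should point at your shared Redis instance):

```python
from prefect import task
from prefect.cache_policies import INPUTS
from prefect.transactions import IsolationLevel
from prefect_redis import RedisLockManager

cache_policy = INPUTS.configure(
    isolation_level=IsolationLevel.SERIALIZABLE,
    # Point these at a Redis instance reachable from all execution infrastructure.
    lock_manager=RedisLockManager(host="my-redis-host", port=6379, db=0),
)


@task(cache_policy=cache_policy)
def my_cached_task(x: int) -> int:
    return x + 42
```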
### Coordinate caching across multiple tasks
To coordinate cache writes across tasks, you can run multiple tasks within a single transaction:
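A minimal sketch, assuming the `transaction` context manager from `prefect.transactions`; the task and flow names, the returned data, and the `fail` flag are illustrative:

```python
from prefect import flow, task
from prefect.cache_policies import INPUTS
from prefect.transactions import transaction


@task(cache_policy=INPUTS)
def load_data() -> str:
    return "some-data"


@task(cache_policy=INPUTS)
def process_data(data: str, fail: bool) -> int:
    if fail:
        raise RuntimeError("Error! Abort!")
    return len(data)


@flow
def multi_task_cache(fail: bool = True):
    # Cache writes for both tasks are deferred until the transaction commits,
    # so neither result is cached unless both tasks succeed.
    with transaction():
        data = load_data()
        process_data(data=data, fail=fail)
```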
On the first run (with `fail=True`), the flow fails in the `process_data` task after the `load_data` task has succeeded.
However, because caches are only written to when a transaction is committed, the `load_data` task will not write a result to its cache key location until the `process_data` task succeeds as well.
On a subsequent run with `fail=False`, both tasks will be re-executed and the results will be cached.
### Handling Non-Serializable Objects
You may have task inputs that can’t (or shouldn’t) be serialized as part of the cache key. There are two direct approaches to handle this, both of which are based on the same idea: adjust the serialization logic so that only certain properties of an input are serialized.

- Using a custom cache key function (first sketch below)
- Using Pydantic’s custom serialization on your input types (second sketch below)
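The first approach might look like the following sketch, which assumes Prefect's `CacheKeyFnPolicy` and `RUN_ID` policies; the model, field, and task names are illustrative:

```python
from prefect import task
from prefect.cache_policies import RUN_ID, CacheKeyFnPolicy
from prefect.context import TaskRunContext
from pydantic import BaseModel, ConfigDict


class NotSerializable:
    def __getstate__(self):
        raise TypeError("I refuse to be serialized!")


class ContainsNonSerializableObject(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    name: str
    bad_object: NotSerializable


def custom_cache_key_fn(context: TaskRunContext, parameters: dict) -> str:
    # Only the serializable `name` field participates in the cache key.
    return parameters["some_object"].name


@task(cache_policy=CacheKeyFnPolicy(cache_key_fn=custom_cache_key_fn) + RUN_ID)
def use_object(some_object: ContainsNonSerializableObject) -> str:
    return f"Used {some_object.name}"
```

The second approach attaches the logic to the input type itself with Pydantic's `model_serializer` (again a sketch; Pydantic v2 is assumed), so anything that serializes the model, including the `INPUTS` cache policy, only sees the serializable fields:

```python
from prefect import task
from prefect.cache_policies import INPUTS
from pydantic import BaseModel, ConfigDict, model_serializer


class NotSerializable:
    def __getstate__(self):
        raise TypeError("I refuse to be serialized!")


class ContainsNonSerializableObject(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    name: str
    bad_object: NotSerializable

    @model_serializer
    def ser_model(self) -> dict:
        # Drop the non-serializable field whenever the model is serialized.
        return {"name": self.name}


@task(cache_policy=INPUTS)
def use_object(some_object: ContainsNonSerializableObject) -> str:
    return f"Used {some_object.name}"
```

The key difference between these approaches: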
- Use Pydantic models when you want consistent serialization across your application
- Use custom cache key functions when you need different caching logic for different tasks