tasks
prefect.tasks
Module containing the base workflow task class and decorator. For most use cases, using the @task decorator is preferred.
Functions
task_input_hash
A task cache key implementation which hashes all inputs to the task using a JSON or cloudpickle serializer. If any arguments are not JSON serializable, the pickle serializer is used as a fallback. If cloudpickle fails, this will return a null key indicating that a cache key could not be generated for the given inputs.
Args:
- context: the active TaskRunContext
- arguments: a dictionary of arguments to be passed to the underlying task
Returns:
- a string hash if hashing succeeded, else None
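The fallback chain described above can be sketched with the standard library alone. This is a minimal illustration, not Prefect's implementation: the name sketch_input_hash is hypothetical, the standard pickle module stands in for cloudpickle, SHA-256 is an arbitrary choice of digest, and only the arguments are hashed.

```python
import hashlib
import json
import pickle
from typing import Any, Dict, Optional


def sketch_input_hash(arguments: Dict[str, Any]) -> Optional[str]:
    """Hash arguments: JSON first, pickle as a fallback, None if both fail."""
    try:
        # sort_keys makes the hash independent of keyword order.
        payload = json.dumps(arguments, sort_keys=True).encode()
    except (TypeError, ValueError):
        try:
            # Assumption: stdlib pickle stands in for cloudpickle here.
            payload = pickle.dumps(arguments)
        except Exception:
            # Neither serializer worked, so no cache key can be produced.
            return None
    return hashlib.sha256(payload).hexdigest()


key = sketch_input_hash({"a": 1, "b": 2})
no_key = sketch_input_hash({"g": (i for i in range(3))})  # generators cannot be pickled
print(no_key)  # None
```

A None result corresponds to the "null key" case: the inputs could not be serialized, so nothing is cached for them.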
exponential_backoff
A task retry backoff utility that configures exponential backoff for task retries. The exponential backoff design matches the urllib3 implementation.
Args:
- backoff_factor: the base delay for the first retry; subsequent retries will increase the delay time by powers of 2.
Returns:
- a callable that can be passed to the task constructor
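To make the delay schedule concrete, here is a small sketch of a utility with the same shape: a function that takes a backoff factor and returns a callable mapping a retry count to a list of delays. The name sketch_exponential_backoff is hypothetical and the body is an illustration of the doubling rule described above, not the library's source.

```python
from typing import Callable, List


def sketch_exponential_backoff(backoff_factor: float) -> Callable[[int], List[float]]:
    """Return a callable mapping a retry count to a list of doubling delays."""

    def delays_for(retries: int) -> List[float]:
        # Delay n (0-based) is backoff_factor * 2**n, so the first retry
        # waits exactly backoff_factor seconds.
        return [backoff_factor * 2**n for n in range(retries)]

    return delays_for


delays = sketch_exponential_backoff(10)(3)
print(delays)  # [10, 20, 40]
```

A callable of this shape fits the retry_delay_seconds option, which accepts a callable that receives the total number of retries and returns a list of delays.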
task
Decorator to designate a function as a task in a Prefect workflow.
This decorator may be used for asynchronous or synchronous functions.
Args:
- name: An optional name for the task; if not provided, the name will be inferred from the given function.
- description: An optional string description for the task.
- tags: An optional set of tags to be associated with runs of this task. These tags are combined with any tags defined by a prefect.tags context at task runtime.
- version: An optional string specifying the version of this task definition.
- cache_key_fn: An optional callable that, given the task run context and call parameters, generates a string key; if the key matches a previous completed state, that state result will be restored instead of running the task again.
- cache_expiration: An optional amount of time indicating how long cached states for this task should be restorable; if not provided, cached states will never expire.
- task_run_name: An optional name to distinguish runs of this task; this name can be provided as a string template with the task’s keyword arguments as variables, or a function that returns a string.
- retries: An optional number of times to retry on task run failure.
- retry_delay_seconds: Optionally configures how long to wait before retrying the task after failure. This is only applicable if retries is nonzero. This setting can either be a number of seconds, a list of retry delays, or a callable that, given the total number of retries, generates a list of retry delays. If a number of seconds, that delay will be applied to all retries. If a list, each retry will wait for the corresponding delay before retrying. When passing a callable or a list, the number of configured retry delays cannot exceed 50.
- retry_jitter_factor: An optional factor that defines how much a retry delay can be jittered in order to avoid a “thundering herd”.
- persist_result: A toggle indicating whether the result of this task should be persisted to result storage. Defaults to None, which indicates that the global default should be used (which is True by default).
- result_storage: An optional block to use to persist the result of this task. Defaults to the value set in the flow the task is called in.
- result_storage_key: An optional key under which to store the result when persisted. Defaults to a unique identifier.
- result_serializer: An optional serializer to use to serialize the result of this task for persistence. Defaults to the value set in the flow the task is called in.
- timeout_seconds: An optional number of seconds indicating a maximum runtime for the task. If the task exceeds this runtime, it will be marked as failed.
- log_prints: If set, print statements in the task will be redirected to the Prefect logger for the task run. Defaults to None, which indicates that the value from the flow should be used.
- refresh_cache: If set, cached results for the cache key are not used. Defaults to None, which indicates that a cached result from a previous execution with matching cache key is used.
- on_failure: An optional list of callables to run when the task enters a failed state.
- on_completion: An optional list of callables to run when the task enters a completed state.
- retry_condition_fn: An optional callable run when a task run returns a Failed state. Should return True if the task should continue to its retry policy (e.g. retries=3), and False if the task should end as failed. Defaults to None, indicating the task should always continue to its retry policy.
- viz_return_value: An optional value to return when the task dependency tree is visualized.
- asset_deps: An optional list of upstream assets that this task depends on.
Returns:
- A callable Task object which, when called, will submit the task for execution.
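The three accepted forms of retry_delay_seconds (a number, a list, or a callable) can be easier to follow as code. The sketch below shows one way such a setting could be expanded into a list of delays. The helper name sketch_resolve_delays is hypothetical, and how a list shorter than the retry count is handled is deliberately left out because the description above does not cover it.

```python
from typing import Callable, List, Sequence, Union

MAX_RETRY_DELAYS = 50  # limit stated in the retry_delay_seconds description

DelaySpec = Union[float, Sequence[float], Callable[[int], List[float]]]


def sketch_resolve_delays(spec: DelaySpec, retries: int) -> List[float]:
    """Expand a retry-delay setting into a list of delays."""
    if callable(spec):
        # A callable receives the total number of retries.
        delays = list(spec(retries))
    elif isinstance(spec, (int, float)):
        # A single number applies the same delay to every retry.
        return [spec] * retries
    else:
        # A list pairs each retry with the corresponding delay.
        delays = list(spec)
    if len(delays) > MAX_RETRY_DELAYS:
        raise ValueError("cannot configure more than 50 retry delays")
    return delays


print(sketch_resolve_delays(5, 3))             # [5, 5, 5]
print(sketch_resolve_delays([1, 10, 100], 3))  # [1, 10, 100]
```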
Examples:
Define a simple task
Define an async task
Define a task with tags and a description
Define a task with a custom name
Define a task that retries 3 times with a 5 second delay between attempts
Define a task that is cached for a day based on its inputs
Classes
TaskRunNameCallbackWithParameters
Methods:
is_callback_with_parameters
TaskOptions
A TypedDict representing all available task configuration options.
This can be used with Unpack to provide type hints for **kwargs.
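As a rough illustration of the pattern, the sketch below types **kwargs with a TypedDict through Unpack. ExampleOptions is a hypothetical three-key stand-in, not the real TaskOptions, and configure is an invented helper. Unpack is imported only for type checkers and the annotation is written as a string, so the snippet also runs on interpreters whose typing module lacks Unpack.

```python
from typing import TYPE_CHECKING, Any, Dict, TypedDict

if TYPE_CHECKING:
    # Available in typing on Python 3.11+; only type checkers need it here.
    from typing import Unpack


class ExampleOptions(TypedDict, total=False):
    """Hypothetical subset of options, for illustration only."""

    name: str
    retries: int
    timeout_seconds: float


def configure(**kwargs: "Unpack[ExampleOptions]") -> Dict[str, Any]:
    """Collect keyword options; a type checker validates keys and value types."""
    return dict(kwargs)


options = configure(name="extract", retries=3)
print(options)  # {'name': 'extract', 'retries': 3}
```

With this annotation, a type checker can flag a misspelled key or a wrongly typed value at the call site, which a bare **kwargs: Any cannot do.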
Task
A Prefect task definition.
Wraps a function with an entrypoint to the Prefect engine. Calling this class within a flow function creates a new task run.
To preserve the input and output types, we use the generic type variables P and R for “Parameters” and “Returns” respectively.
Args:
- fn: The function defining the task.
- name: An optional name for the task; if not provided, the name will be inferred from the given function.
- description: An optional string description for the task.
- tags: An optional set of tags to be associated with runs of this task. These tags are combined with any tags defined by a prefect.tags context at task runtime.
- version: An optional string specifying the version of this task definition.
- cache_policy: A cache policy that determines the level of caching for this task.
- cache_key_fn: An optional callable that, given the task run context and call parameters, generates a string key; if the key matches a previous completed state, that state result will be restored instead of running the task again.
- cache_expiration: An optional amount of time indicating how long cached states for this task should be restorable; if not provided, cached states will never expire.
- task_run_name: An optional name to distinguish runs of this task; this name can be provided as a string template with the task’s keyword arguments as variables, or a function that returns a string.
- retries: An optional number of times to retry on task run failure.
- retry_delay_seconds: Optionally configures how long to wait before retrying the task after failure. This is only applicable if retries is nonzero. This setting can either be a number of seconds, a list of retry delays, or a callable that, given the total number of retries, generates a list of retry delays. If a number of seconds, that delay will be applied to all retries. If a list, each retry will wait for the corresponding delay before retrying. When passing a callable or a list, the number of configured retry delays cannot exceed 50.
- retry_jitter_factor: An optional factor that defines how much a retry delay can be jittered in order to avoid a “thundering herd”.
- persist_result: A toggle indicating whether the result of this task should be persisted to result storage. Defaults to None, which indicates that the global default should be used (which is True by default).
- result_storage: An optional block to use to persist the result of this task. Defaults to the value set in the flow the task is called in.
- result_storage_key: An optional key under which to store the result when persisted. Defaults to a unique identifier.
- result_serializer: An optional serializer to use to serialize the result of this task for persistence. Defaults to the value set in the flow the task is called in.
- timeout_seconds: An optional number of seconds indicating a maximum runtime for the task. If the task exceeds this runtime, it will be marked as failed.
- log_prints: If set, print statements in the task will be redirected to the Prefect logger for the task run. Defaults to None, which indicates that the value from the flow should be used.
- refresh_cache: If set, cached results for the cache key are not used. Defaults to None, which indicates that a cached result from a previous execution with matching cache key is used.
- on_failure: An optional list of callables to run when the task enters a failed state.
- on_completion: An optional list of callables to run when the task enters a completed state.
- on_commit: An optional list of callables to run when the task’s idempotency record is committed.
- on_rollback: An optional list of callables to run when the task rolls back.
- retry_condition_fn: An optional callable run when a task run returns a Failed state. Should return True if the task should continue to its retry policy (e.g. retries=3), and False if the task should end as failed. Defaults to None, indicating the task should always continue to its retry policy.
- viz_return_value: An optional value to return when the task dependency tree is visualized.
- asset_deps: An optional list of upstream assets that this task depends on.
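The task_run_name option accepts either a string template or a function that returns a string. The sketch below shows one plausible way to resolve both forms. It assumes str.format-style placeholders and a zero-argument callable; the helper name sketch_resolve_run_name is hypothetical, and none of this is taken from the library's source.

```python
from typing import Any, Callable, Dict, Union


def sketch_resolve_run_name(
    task_run_name: Union[str, Callable[[], str]], parameters: Dict[str, Any]
) -> str:
    """Resolve a run name from a template string or a zero-argument callable."""
    if callable(task_run_name):
        return task_run_name()
    # Assumption: {placeholders} are filled from the task's keyword arguments.
    return task_run_name.format(**parameters)


name = sketch_resolve_run_name("load-{table}-{day}", {"table": "orders", "day": "mon"})
print(name)  # load-orders-mon
```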
Methods:
ismethod
isclassmethod
isstaticmethod
with_options
Create a new task from the current object, updating provided options.
Args:
- name: A new name for the task.
- description: A new description for the task.
- tags: A new set of tags for the task. If given, existing tags are ignored, not merged.
- cache_key_fn: A new cache key function for the task.
- cache_expiration: A new cache expiration time for the task.
- task_run_name: An optional name to distinguish runs of this task; this name can be provided as a string template with the task’s keyword arguments as variables, or a function that returns a string.
- retries: A new number of times to retry on task run failure.
- retry_delay_seconds: Optionally configures how long to wait before retrying the task after failure. This is only applicable if retries is nonzero. This setting can either be a number of seconds, a list of retry delays, or a callable that, given the total number of retries, generates a list of retry delays. If a number of seconds, that delay will be applied to all retries. If a list, each retry will wait for the corresponding delay before retrying. When passing a callable or a list, the number of configured retry delays cannot exceed 50.
- retry_jitter_factor: An optional factor that defines the factor to which a retry can be jittered in order to avoid a “thundering herd”.
- persist_result: A new option for enabling or disabling result persistence.
- result_storage: A new storage type to use for results.
- result_serializer: A new serializer to use for results.
- result_storage_key: A new key for the persisted result to be stored at.
- timeout_seconds: A new maximum time for the task to complete in seconds.
- log_prints: A new option for enabling or disabling redirection of print statements.
- refresh_cache: A new option for enabling or disabling cache refresh.
- on_completion: A new list of callables to run when the task enters a completed state.
- on_failure: A new list of callables to run when the task enters a failed state.
- retry_condition_fn: An optional callable run when a task run returns a Failed state. Should return True if the task should continue to its retry policy, and False if the task should end as failed. Defaults to None, indicating the task should always continue to its retry policy.
- viz_return_value: An optional value to return when the task dependency tree is visualized.
Returns:
- A new Task instance.
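The copy-with-overrides behavior can be pictured with a plain dataclass. This is an analogy rather than Prefect code: SketchTask is a hypothetical stand-in with three fields, and dataclasses.replace plays the role of with_options. It shows the two properties stated above, namely that a new object is returned and that a provided tag set replaces the old one instead of merging with it.

```python
from dataclasses import dataclass, replace
from typing import Any, FrozenSet


@dataclass(frozen=True)
class SketchTask:
    """Hypothetical stand-in holding a few task options."""

    name: str
    retries: int = 0
    tags: FrozenSet[str] = frozenset()

    def with_options(self, **overrides: Any) -> "SketchTask":
        # Build a new object; fields not overridden are carried over unchanged.
        return replace(self, **overrides)


base = SketchTask(name="extract", tags=frozenset({"etl"}))
tuned = base.with_options(retries=3, tags=frozenset({"nightly"}))

print(base.retries, tuned.retries)  # 0 3
print(sorted(tuned.tags))           # ['nightly']
```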
Examples:
Create a new task from an existing task and update the name:
Create a new task from an existing task and update the retry settings:
Use a task with updated options within a flow:
on_completion
on_failure
on_commit
on_rollback
submit
Submit a run of the task to the engine.
Will create a new task run in the backing API and submit the task to the flow’s task runner. This call only blocks execution while the task is being submitted; once it is submitted, the flow function will continue executing.
This method is always synchronous, even if the underlying user function is asynchronous.
Args:
- *args: Arguments to run the task with
- return_state: Return the result of the task run wrapped in a Prefect State.
- wait_for: Upstream task futures to wait for before starting the task
- **kwargs: Keyword arguments to run the task with
Returns:
- If return_state is False, a future allowing asynchronous access to the state of the task
- If return_state is True, a future wrapped in a Prefect State allowing asynchronous access to the state of the task
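For readers new to futures, the submit-then-wait pattern described above resembles the one in Python's standard concurrent.futures module. The sketch below is only an analogy using ThreadPoolExecutor: it does not use Prefect, and Prefect's future objects are a different type with their own methods. The function slow_square is an invented example workload.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x: int) -> int:
    time.sleep(0.05)  # stand-in for real work
    return x * x


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns as soon as the work is queued, not when it finishes.
    future = pool.submit(slow_square, 4)
    print("submitted, caller keeps running")
    # result() blocks until the work has completed.
    result = future.result()

print(result)  # 16
```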
Examples:
Define a task
Run a task in a flow
Wait for a task to finish
Use the result from a task in a flow
Run an async task in an async flow
Run a sync task in an async flow
Enforce ordering between tasks that do not exchange data
map
Submit a mapped run of the task to a worker.
Must be called within a flow run context. Will return a list of futures that should be waited on before exiting the flow context to ensure all mapped tasks have completed.
Must be called with at least one iterable and all iterables must be the same length. Any arguments that are not iterable will be treated as a static value and each task run will receive the same value.
Will create as many task runs as the length of the iterable(s) in the backing API and submit the task runs to the flow’s task runner. This call blocks if given a future as input while the future is resolved. It also blocks while the tasks are being submitted; once they are submitted, the flow function will continue executing.
This method is always synchronous, even if the underlying user function is asynchronous.
Args:
- *args: Iterable and static arguments to run the tasks with
- return_state: Return a list of Prefect States that wrap the results of each task run.
- wait_for: Upstream task futures to wait for before starting the task
- **kwargs: Keyword iterable arguments to run the task with
Returns:
- A list of futures allowing asynchronous access to the state of the tasks
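The expansion rules for mapped arguments (at least one iterable, equal lengths, non-iterables repeated for every run) can be sketched in plain Python. Everything here is illustrative: sketch_expand and the Static marker are hypothetical names, Static only mimics the idea of marking an iterable as a constant, and the sketch treats just lists and tuples as iterables.

```python
from typing import Any, Dict, List


class Static:
    """Hypothetical marker: treat the wrapped value as a constant, even if iterable."""

    def __init__(self, value: Any) -> None:
        self.value = value


def sketch_expand(**kwargs: Any) -> List[Dict[str, Any]]:
    """Expand mapped keyword arguments into one parameter dict per run."""
    # Assumption: only lists and tuples count as iterables in this sketch.
    mapped = {k: v for k, v in kwargs.items() if isinstance(v, (list, tuple))}
    if not mapped:
        raise ValueError("at least one iterable argument is required")
    lengths = {len(v) for v in mapped.values()}
    if len(lengths) != 1:
        raise ValueError("all iterables must be the same length")
    # Everything else is passed unchanged to every run.
    static = {
        k: (v.value if isinstance(v, Static) else v)
        for k, v in kwargs.items()
        if k not in mapped
    }
    return [
        {**static, **{k: v[i] for k, v in mapped.items()}}
        for i in range(lengths.pop())
    ]


runs = sketch_expand(x=[1, 2, 3], y=10, z=Static([7, 8]))
print(len(runs))  # 3
print(runs[0])    # {'y': 10, 'z': [7, 8], 'x': 1}
```

Each dict in the result corresponds to the parameters of one task run, which is why the number of runs equals the length of the iterables.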
Examples:
Define a task
Create mapped tasks
Wait for all mapped tasks to finish
Use the result from mapped tasks in a flow
Enforce ordering between tasks that do not exchange data
Use a non-iterable input as a constant across mapped tasks
Use unmapped to treat an iterable argument as a constant
apply_async
Create a pending task run for a task worker to execute.
Args:
- args: Arguments to run the task with
- kwargs: Keyword arguments to run the task with
Returns:
- A PrefectDistributedFuture object representing the pending task run
Examples:
Define a task
Create a pending task run for the task
Wait for a task to finish
TODO: Enforce ordering between tasks that do not exchange data
delay
An alias for apply_async with simpler calling semantics.
Avoids having to use explicit “args” and “kwargs” arguments. Arguments will pass through as-is to the task.
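The difference in calling style can be shown with a tiny stand-in class. SketchQueueTask is hypothetical and returns plain dicts where the real methods return a future for the pending task run; the point is only that delay forwards its positional and keyword arguments into the explicit args and kwargs containers that apply_async expects.

```python
from typing import Any, Dict, Optional, Tuple


class SketchQueueTask:
    """Hypothetical stand-in that only records how it was called."""

    def apply_async(
        self,
        args: Optional[Tuple[Any, ...]] = None,
        kwargs: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        # Explicit containers: positional values in args, keywords in kwargs.
        return {"args": tuple(args or ()), "kwargs": dict(kwargs or {})}

    def delay(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        # Same call, but arguments pass through as-is.
        return self.apply_async(args=args, kwargs=kwargs)


t = SketchQueueTask()
same = t.delay(1, 2, unit="s") == t.apply_async(args=(1, 2), kwargs={"unit": "s"})
print(same)  # True
```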
Examples:
Define a task
Create a pending task run for the task
Wait for a task to finish
Use the result from a task in a flow
MaterializingTask
A task that materializes Assets.
Args:
- assets: List of Assets that this task materializes (can be str or Asset)
- materialized_by: An optional tool that materialized the asset, e.g. “dbt” or “spark”
- **task_kwargs: All other Task arguments
Methods: