Experimental: This feature is experimental and may change in the future.
Prerequisites
- Prefect Client Version 3.1.12 or later
- Prefect Cloud account (SLAs are only available in Prefect Cloud)
- (If using Terraform) Prefect Terraform Provider version 2.22.0 or later
Service Level Agreements
Service Level Agreements (SLAs) help you set and monitor performance standards for your data stack. By establishing specific thresholds for flow runs on your Deployments, you can automatically detect when your system isn’t meeting expectations. When you set up an SLA, you define specific performance criteria, such as a maximum runtime of 10 minutes for a flow. If a flow run exceeds this threshold, the system generates an alert event. You can then use these events to trigger automated responses, whether that’s sending notifications to your team or initiating other corrective actions through automations.
Defining SLAs
To define an SLA, add it to the deployment through a prefect.yaml file, a .deploy method, the CLI, or the Terraform Provider:
Defining SLAs in your prefect.yaml file
prefect.yaml SLA
Defining SLAs using a .deploy method
.deploy SLA
Defining SLAs with the Prefect CLI
CLI SLA
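As a rough illustration of the CLI route, the sketch below builds a shell-safe command string from an SLA definition. It assumes that prefect deploy accepts a JSON-encoded SLA through a --sla option and that the SLA object uses name, duration, and severity keys; the helper name build_sla_deploy_command and the values are hypothetical. Confirm the exact option and fields with prefect deploy --help before relying on it.

```python
import json
import shlex


def build_sla_deploy_command(sla: dict) -> str:
    """Return a shell command string that passes `sla` as JSON.

    Assumption: `prefect deploy` takes a JSON string via `--sla`.
    """
    # shlex.quote wraps the JSON so the shell treats it as one argument.
    return "prefect deploy --sla " + shlex.quote(json.dumps(sla))


# Hypothetical Time to Completion SLA: 600 seconds, with a severity label.
example_sla = {"name": "my-sla", "duration": 600, "severity": "high"}

if __name__ == "__main__":
    print(build_sla_deploy_command(example_sla))
```

Generating the JSON with json.dumps rather than typing it by hand avoids quoting mistakes when the SLA definition grows.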
Defining SLAs with the Terraform Provider
Terraform Provider
Types of SLAs
Time to Completion SLA
A Time to Completion SLA describes how long flow runs should take and the severity of going over that duration. The SLA is triggered when a flow run takes longer than the specified duration to complete. The Time to Completion SLA requires one unique parameter:
- duration: The maximum allowed duration in seconds before the SLA is violated
For example, with a duration of 600 seconds (10 minutes), the backend emits a prefect.sla.violation event if a flow run does not complete within that timeframe.
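The rule described above can be sketched in plain Python. This is only an illustration of the threshold check, not Prefect's implementation: the function name violates_time_to_completion is hypothetical, and it assumes the clock starts when the run starts and that a run still in progress is measured up to the current time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def violates_time_to_completion(
    start_time: datetime,
    end_time: Optional[datetime],
    now: datetime,
    duration_seconds: float,
) -> bool:
    """Return True if the run has taken longer than `duration_seconds`.

    Assumption: a run that has not finished (end_time is None) is
    measured from its start up to `now`.
    """
    finished_or_now = end_time if end_time is not None else now
    return (finished_or_now - start_time) > timedelta(seconds=duration_seconds)


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    # A run that finished after 11 minutes, checked against a 600-second SLA.
    finished = start + timedelta(minutes=11)
    print(violates_time_to_completion(start, finished, finished, 600))
```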
Frequency SLA
A Frequency SLA describes the interval a deployment should run at and the severity of missing that interval. The Frequency SLA requires one unique parameter:
- stale_after: The maximum allowed time between flow runs (e.g. 1 hour for hourly jobs)
For example, consider a Frequency SLA with a stale_after of 1 hour. This SLA triggers when more than an hour passes between Completed flow runs.
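To make the staleness rule concrete, here is a minimal sketch that scans a list of completion timestamps for gaps longer than stale_after. It is an illustration under stated assumptions rather than how the backend evaluates the SLA: the function name find_stale_gaps is hypothetical, stale_after is modeled as a timedelta, and only gaps between consecutive Completed runs are checked.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def find_stale_gaps(
    completed_at: List[datetime], stale_after: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return (previous, next) completion pairs whose gap exceeds `stale_after`."""
    ordered = sorted(completed_at)
    # Pair each completion time with the one that follows it.
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > stale_after
    ]


if __name__ == "__main__":
    base = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
    completions = [base, base + timedelta(hours=1), base + timedelta(hours=2, minutes=30)]
    for earlier, later in find_stale_gaps(completions, timedelta(hours=1)):
        print(f"gap of {later - earlier} after {earlier.isoformat()}")
```

A gap of exactly one hour does not count as stale here, matching the wording "more than an hour".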
Lateness SLA
A Lateness SLA describes the severity of a flow run failing to start at its scheduled time. The Lateness SLA requires one unique parameter:
- within: The amount of startup time allotted to a flow run before the SLA is violated.
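The lateness check can be sketched the same way. This is a simplified illustration, not the backend's logic: the function name violates_lateness is hypothetical, within is modeled as a timedelta because the unit is not stated here, and a run that has not started yet is measured up to the current time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def violates_lateness(
    scheduled_start: datetime,
    actual_start: Optional[datetime],
    now: datetime,
    within: timedelta,
) -> bool:
    """Return True if the run started (or is still waiting) later than `within`.

    Assumption: a run that has not started (actual_start is None) is
    measured from its scheduled time up to `now`.
    """
    started_or_now = actual_start if actual_start is not None else now
    return (started_or_now - scheduled_start) > within


if __name__ == "__main__":
    scheduled = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
    # Still not started 12 minutes after the scheduled time; 10 minutes allotted.
    now = scheduled + timedelta(minutes=12)
    print(violates_lateness(scheduled, None, now, timedelta(minutes=10)))
```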
Monitoring SLAs
You can monitor SLAs in the Prefect Cloud UI. On the runs page, the SLA status appears in the top-level metrics.
Setting up an automation
To set up an automation to notify a team or to take other actions when an SLA is triggered, you can use the automations feature. To create the automation, first you’ll need to create a trigger.
- Choose trigger type ‘Custom’.
- Choose any event matching: prefect.sla.violation
- For “From the Following Resources” choose: prefect.flow-run.*
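The trigger settings above amount to two filters: an exact event name and a wildcard on the resource ID. The sketch below shows that matching logic on a simplified event dictionary. It is an illustration only: the function name matches_sla_trigger is hypothetical, and the event shape (an event name plus a resource mapping with a prefect.resource.id key) is an assumption about how the event is represented.

```python
from fnmatch import fnmatchcase

EVENT_NAME = "prefect.sla.violation"
RESOURCE_PATTERN = "prefect.flow-run.*"


def matches_sla_trigger(event: dict) -> bool:
    """Return True if `event` is an SLA violation from a flow run resource."""
    resource_id = event.get("resource", {}).get("prefect.resource.id", "")
    # Exact match on the event name, wildcard match on the resource ID.
    return event.get("event") == EVENT_NAME and fnmatchcase(
        resource_id, RESOURCE_PATTERN
    )


if __name__ == "__main__":
    sample = {
        "event": "prefect.sla.violation",
        "resource": {"prefect.resource.id": "prefect.flow-run.1234"},
    }
    print(matches_sla_trigger(sample))
```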
