Prefect 3 is a major release for workflow orchestration, bringing new features, performance improvements, and expanded capabilities for data engineering. This overview covers the main additions and enhancements in the release.

Most Prefect 2 users can upgrade without changes to their existing workflows. Please review the upgrade guide for more information.

Open source events and automation system

One of the most anticipated features in Prefect 3 is the introduction of the events and automation system to the open-source package. Previously exclusive to Prefect Cloud, this powerful system now allows all users to create sophisticated, event-driven workflows.

With this new capability, you can trigger actions based on specific event payloads, cancel runs if certain conditions aren’t met, or automate workflow runs based on external events. For instance, you could initiate a data processing pipeline automatically when a new file lands in an S3 bucket. The system also enables you to receive notifications for various system health events, giving you greater visibility and control over your workflows.
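To make the trigger-and-action idea concrete, here is a minimal plain-Python sketch of matching an incoming event against an automation rule. It does not use Prefect's API. The `Event` and `Automation` classes, the `s3.object.created` event name, and the `start_pipeline` action are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Event:
    """A hypothetical event: a name plus a free-form payload."""
    name: str
    payload: dict[str, Any] = field(default_factory=dict)


@dataclass
class Automation:
    """A hypothetical rule: run `action` when a matching event arrives."""
    expect: str                                   # event name to react to
    condition: Callable[[dict[str, Any]], bool]   # predicate on the payload
    action: Callable[[Event], Any]                # what to do on a match

    def handle(self, event: Event) -> Any:
        # React only to the expected event whose payload passes the condition.
        if event.name == self.expect and self.condition(event.payload):
            return self.action(event)
        return None


def start_pipeline(event: Event) -> str:
    # Stand-in for kicking off a workflow run for the new file.
    return f"processing s3://{event.payload['bucket']}/{event.payload['key']}"


on_new_csv = Automation(
    expect="s3.object.created",
    condition=lambda p: p.get("key", "").endswith(".csv"),
    action=start_pipeline,
)

result = on_new_csv.handle(
    Event("s3.object.created", {"bucket": "raw-data", "key": "2024/orders.csv"})
)
ignored = on_new_csv.handle(
    Event("s3.object.created", {"bucket": "raw-data", "key": "notes.txt"})
)
print(result)
print(ignored)
```

The rule fires for the CSV upload and ignores the text file, which is the same shape as triggering on a specific event payload while skipping events that do not meet the condition.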

New transactional interface

Another major addition in Prefect 3 is the new transactional interface, which makes it easier to build resilient and idempotent pipelines. With it, you can group tasks into transactions and automatically roll back their side effects on failure.

For example, you can define rollback behaviors for your tasks, ensuring that any side effects are cleanly reversed if a transaction fails. This is particularly useful for maintaining data consistency in complex workflows involving multiple steps or external systems.
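The rollback pattern can be sketched in plain Python. This is an illustration of the idea only, not Prefect's transactional interface. The `Transaction` class, its `on_rollback` method, and the `write_file` and `quality_check` helpers are hypothetical names invented for this example.

```python
import os
import tempfile


class Transaction:
    """Hypothetical helper: collect undo steps and run them if the block fails."""

    def __init__(self):
        self._rollbacks = []

    def on_rollback(self, undo, *args):
        # Register an undo step for a side effect that just happened.
        self._rollbacks.append((undo, args))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Undo in reverse order so later side effects are reversed first.
            for undo, args in reversed(self._rollbacks):
                undo(*args)
        return False  # let the original error propagate


def write_file(txn, path, contents):
    # Side effect: create a file, then register how to undo it.
    with open(path, "w") as f:
        f.write(contents)
    txn.on_rollback(os.remove, path)


def quality_check(path):
    # Fail the transaction if the written data looks wrong.
    with open(path) as f:
        if len(f.read()) < 10:
            raise ValueError("file contents too short")


path = os.path.join(tempfile.mkdtemp(), "side-effect.txt")
try:
    with Transaction() as txn:
        write_file(txn, path, "short")
        quality_check(path)  # raises, which triggers the rollback
except ValueError:
    pass

file_was_rolled_back = not os.path.exists(path)
print(file_was_rolled_back)
```

Because the quality check fails inside the transaction, the file written by the earlier step is removed, so a retry starts from a clean state.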

Flexible task execution

Prefect 3 has no restrictions on where tasks can run. Tasks can be nested within other tasks, allowing for more flexible and modular workflows; they can also be called outside of flows, essentially enabling Prefect to function as a background task service. You can now run tasks autonomously, apply them asynchronously, or delay their execution as needed. This flexibility opens up new possibilities for task management and execution strategies in your data pipelines.

Enhanced client-side engine

Prefect 3 comes with a thoroughly reworked client-side engine. The new engine supports nesting tasks within other tasks, which makes workflows more modular. It also supports generator tasks, which allow more flexible and efficient handling of iterative processes.
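As a rough illustration of what a generator task looks like, the sketch below uses an ordinary Python generator that yields batches one at a time instead of building the full list up front. The task decorator is deliberately omitted so the example runs on the standard library alone. The `read_batches` function and its parameters are hypothetical.

```python
from typing import Iterator


def read_batches(rows: list[int], size: int) -> Iterator[list[int]]:
    """Yield successive slices of `rows`, each at most `size` items long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Each batch is produced only when the consumer asks for the next one.
batches = list(read_batches(list(range(7)), 3))
print(batches)
```

Downstream steps can start working on the first batch before the last one has been produced, which is the efficiency gain for iterative processes.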

One of the most significant changes is that all code now runs on the main thread by default. This change improves performance and leads to more intuitive behavior, especially when dealing with shared resources or non-thread-safe operations.

Improved artifacts and variables

Prefect 3 enhances the artifacts system with new types, including progress bars and image artifacts. These additions allow for richer, more informative task outputs, improving the observability of your workflows.

The variables system has also been upgraded to support arbitrary JSON, not just strings. This expansion allows for more complex and structured data to be stored and retrieved as variables, increasing the flexibility of your workflow configurations.
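The difference between string-only and JSON-capable variables can be sketched with a small in-memory store. This is not Prefect's variables API. `VariableStore`, its `set` and `get` methods, and the `etl_config` name are hypothetical and stand in for whatever backend actually holds the values.

```python
import json


class VariableStore:
    """Hypothetical in-memory store that accepts any JSON-serializable value."""

    def __init__(self):
        self._data = {}

    def set(self, name, value):
        # Serialize on write so values that are not valid JSON fail early.
        self._data[name] = json.dumps(value)

    def get(self, name, default=None):
        raw = self._data.get(name)
        return default if raw is None else json.loads(raw)


store = VariableStore()
store.set(
    "etl_config",
    {"batch_size": 500, "regions": ["us", "eu"], "dry_run": False},
)

config = store.get("etl_config")
print(config["regions"][1])
```

With string-only variables, the nested dictionary would have to be encoded and decoded by hand in every workflow that uses it. Storing JSON directly keeps numbers, booleans, lists, and nested objects intact.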

Workers

Workers were first introduced in Prefect 2 as next-generation agents and are now standard in Prefect 3. They offer a stronger governance model for infrastructure, improved monitoring of jobs and of work pool and work queue health, and more flexibility in choosing compute layers. The result is a more robust and scalable way to manage the execution of your workflows across various environments.

Performance enhancements

Prefect 3 doesn’t just bring new features; it also delivers significant performance improvements. Users running massively parallel workflows on distributed systems such as Dask and Ray will notice substantial speedups. In some benchmark cases, we’ve observed up to a 98% reduction in runtime overhead. These performance gains translate directly into faster execution times and more efficient resource utilization for your data pipelines.