Learn how to use Prefect to move data to and from AWS, Azure, and GCP blob storage.

Prerequisites

Install relevant Prefect integration library

In the CLI, install the Prefect integration library for your cloud provider:

prefect-aws provides blocks for interacting with AWS services.

pip install -U prefect-aws

Register the block types

Register the new block types with Prefect Cloud (or with your self-hosted Prefect server instance):

prefect block register -m prefect_aws

A confirmation message in the CLI shows that several block types were registered. The new block types also appear in the UI.

Create a storage bucket

Create a storage bucket in your cloud provider account. Either make the bucket publicly accessible, or create a user or service account with permissions to read data from and write data to the bucket.
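Bucket names must follow the provider's naming rules. As a rough pre-check for S3, the sketch below tests a name against the core rules: 3 to 63 characters; only lowercase letters, digits, dots, and hyphens; starts and ends with a letter or digit; no adjacent dots; not formatted as an IP address. The helper `looks_like_valid_bucket_name` is hypothetical and not exhaustive, so AWS's own bucket naming documentation remains the authority.

```python
import ipaddress
import re

# Core S3 bucket naming rules (not exhaustive): 3 to 63 characters made of
# lowercase letters, digits, dots, and hyphens, starting and ending with a
# letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Hypothetical pre-check of an S3 bucket name against the core rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    # Adjacent dots are not allowed.
    if ".." in name:
        return False
    # Names formatted like an IPv4 address are not allowed.
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(looks_like_valid_bucket_name("my-bucket-name"))
```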

Create a credentials block

If the bucket is private, you can authenticate in one of two ways:

  • At deployment runtime, ensure the runtime environment is authenticated.
  • Save the credentials in a block and reference that block when you create the storage block.
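For the first option, the AWS SDKs (including boto3, which prefect-aws builds on) can read credentials from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, among other sources. A minimal sketch of a hypothetical pre-flight check for those two variables might look like this. It ignores other valid sources such as shared credential files or IAM roles, so a `False` result does not prove the environment is unauthenticated.

```python
import os
from typing import Mapping, Optional

# Standard environment variable names read by the AWS SDKs, boto3 included.
AWS_KEY_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def has_aws_env_credentials(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if both AWS key variables are set to non-empty values.

    Hypothetical helper: it inspects environment variables only, so other
    credential sources (shared credential files, IAM roles) are not detected.
    """
    env = os.environ if env is None else env
    return all(env.get(name) for name in AWS_KEY_VARS)


if __name__ == "__main__":
    if has_aws_env_credentials():
        print("AWS key variables found in the runtime environment.")
    else:
        print("No AWS key variables set; use a credentials block or another source.")
```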

To store credential details in a block, you can use either a credentials block specific to your cloud provider or a more generic Secret block. You can create blocks through the UI or with Python code.

The example below uses Python code to create an AWS credentials block.

Credentials safety

Don’t store credential values in public locations such as public git platform repositories. The examples below use environment variables to store credential values.

import os
from prefect_aws import AwsCredentials


# Read both credential values from environment variables instead of hardcoding them.
my_aws_creds = AwsCredentials(
    aws_access_key_id=os.environ.get("MY_AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.environ.get("MY_AWS_SECRET_ACCESS_KEY"),
)
my_aws_creds.save(name="my-aws-creds-block", overwrite=True)

Run the code to create the block. You should see a message that the block was created.
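`os.environ.get` returns `None` when a variable is missing, so the code above can save a block with an empty secret without complaint. One way to fail early instead is a small helper like the hypothetical `require_env` below, which is not part of Prefect or prefect-aws:

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper: unlike os.environ.get, which returns None for a
    missing variable, this stops before an empty credential can be saved.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set or is empty")
    return value
```

You could then pass `aws_secret_access_key=require_env("MY_AWS_SECRET_ACCESS_KEY")` when constructing `AwsCredentials`.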

Create a storage block

You can create a block for the chosen cloud provider using Python code or the UI. This example uses Python code.

Note that the S3Bucket block is not the same as the S3 block that ships with Prefect. The S3Bucket block used in this example is part of the prefect-aws library and provides additional capabilities.

The storage block loads the credentials block created above and references it.

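from prefect_aws import AwsCredentials, S3Bucket


# Load the saved credentials block and attach it to the bucket block.
s3bucket = S3Bucket(
    bucket_name="my-bucket-name",
    credentials=AwsCredentials.load("my-aws-creds-block"),
)
s3bucket.save(name="my-s3-bucket-block", overwrite=True)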

Run the code to create the block. You should see a message that the block was created.

Write data

Use your new block inside a flow to write data to your cloud provider.

from pathlib import Path
from prefect import flow
from prefect_aws.s3 import S3Bucket


@flow()
def upload_to_s3():
    """Flow function to upload data"""
    path = Path("my_path_to/my_file.parquet")
    aws_block = S3Bucket.load("my-s3-bucket-block")
    aws_block.upload_from_path(from_path=path, to_path=path)


if __name__ == "__main__":
    upload_to_s3()
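In the flow above, the same `Path` serves as both the local file location and the object key. S3 object keys are plain strings that use forward slashes, so if you build keys from local paths, converting them with `as_posix()` keeps them consistent across operating systems (a Windows path would otherwise render with backslashes). The helper below is a hypothetical sketch; prefect-aws may already normalize paths for you.

```python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath


def to_object_key(path: PurePath) -> str:
    """Build an S3-style object key from a local path (hypothetical helper)."""
    # as_posix() joins the path parts with "/", even for Windows paths.
    key = path.as_posix()
    # Drop any leading "/" so the key does not start with an empty segment.
    return key.lstrip("/")


if __name__ == "__main__":
    print(to_object_key(PureWindowsPath("my_path_to\\my_file.parquet")))
    print(to_object_key(Path("my_path_to/my_file.parquet")))
```

For example, you could pass `to_path=to_object_key(path)` in the `upload_from_path` call.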

Read data

Use your block to read data from your cloud provider inside a flow.

from prefect import flow
from prefect_aws import S3Bucket


@flow
def download_from_s3():
    """Flow function to download data"""
    s3_block = S3Bucket.load("my-s3-bucket-block")
    # Download a single object; get_directory is meant for whole folders.
    s3_block.download_object_to_path(
        from_path="my_path_to/my_file.parquet",
        to_path="my_path_to/my_file.parquet",
    )


if __name__ == "__main__":
    download_from_s3()
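Depending on the library version, the download may not create missing local directories for you. As a precaution, a hypothetical helper like the one below creates the target file's parent directory before the download call. It is a sketch under that assumption, not part of prefect-aws.

```python
from pathlib import Path


def prepare_download_target(local_path: str) -> Path:
    """Create the parent directory of a download target if it is missing.

    Hypothetical helper: safe to call repeatedly because exist_ok=True.
    """
    target = Path(local_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


if __name__ == "__main__":
    print(prepare_download_target("my_path_to/my_file.parquet"))
```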

Next steps

Check out the prefect-aws, prefect-azure, and prefect-gcp docs to see additional methods for interacting with cloud storage providers. Each library also contains blocks for interacting with other cloud-provider services.