Manage cloud-hosted data
Move data to and from cloud provider storage.
Learn how to use Prefect to move data to and from AWS, Azure, and GCP blob storage.
Prerequisites
- Prefect installed
- Authenticated with Prefect Cloud (or a self-hosted Prefect server instance)
- A cloud provider account
Install relevant Prefect integration library
In the CLI, install the Prefect integration library for your cloud provider:
prefect-aws provides blocks for interacting with AWS services.
```bash
pip install -U prefect-aws
```
Register the block types
Register the new block types with Prefect Cloud (or with your self-hosted Prefect server instance):
```bash
prefect block register -m prefect_aws
```
A confirmation message in the CLI shows that several block types were registered. The new block types are also listed in the UI.
Create a storage bucket
Create a storage bucket in the cloud provider account. Ensure the bucket is publicly accessible, or create a user or service account with the appropriate permissions to read data from and write data to the bucket.
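If you go the service-account route, the user or role needs permission to list the bucket and to read and write its objects. A minimal sketch of such an IAM policy document, built as a Python dict, follows. The bucket name is a placeholder, and the action list is an assumption that covers the upload and download examples on this page; adjust it to your organization's requirements.

```python
import json


def minimal_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting list/read/write on one bucket.

    Sketch only: the actions chosen here are an assumption; your
    organization may require narrower resources or extra conditions.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading and writing are granted on the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-bucket-name" is a placeholder bucket name.
    print(json.dumps(minimal_bucket_policy("my-bucket-name"), indent=2))
```

Splitting the statements this way reflects how S3 scopes permissions: bucket-level actions such as listing target the bucket ARN, while object-level actions target the `/*` object ARN.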
Create a credentials block
If the bucket is private, there are several options to authenticate:
- At deployment runtime, ensure the runtime environment is authenticated.
- Create a block with configuration details and reference it when creating the storage block.
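For the first option, one quick sanity check is whether the standard AWS credential environment variables are present before a flow runs. The sketch below assumes static keys supplied through `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, and the helper name is hypothetical. Because the AWS SDKs also read shared config files and instance or container roles, a `False` result does not prove the environment is unauthenticated.

```python
import os
from typing import Mapping

# Environment variables the AWS SDKs read for static credentials.
# This is only one of several credential sources (shared config files,
# instance or container roles, SSO), so this check is deliberately partial.
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def has_static_aws_env_credentials(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if both static AWS credential variables are set and non-empty."""
    return all(env.get(name) for name in AWS_ENV_VARS)


if __name__ == "__main__":
    print("Static AWS credentials in environment:", has_static_aws_env_credentials())
```

Taking the environment as a parameter keeps the check easy to test without touching the real process environment.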
To store credential details in a block, you can use a credentials block specific to the cloud provider or a more generic secret block. You can create blocks through the UI or with Python code.
The example below uses Python code to create a credentials block for your cloud provider.
Credentials safety
Don’t store credential values in public locations such as public Git repositories. The example below reads the secret access key from an environment variable rather than hard-coding it.
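```python
import os

from prefect_aws import AwsCredentials

# The access key ID is a placeholder; the secret access key is read from
# an environment variable so it never appears in source code.
my_aws_creds = AwsCredentials(
    aws_access_key_id="123abc",
    aws_secret_access_key=os.environ.get("MY_AWS_SECRET_ACCESS_KEY"),
)
# Save the block so flows can load it later by name.
my_aws_creds.save(name="my-aws-creds-block", overwrite=True)
```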
Run the code to create the block. You should see a message that the block was created.
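The credentials example reads the secret with `os.environ.get`, which returns `None` when the variable is unset, so a mistyped variable name can quietly produce a block without a secret. A stricter alternative is sketched below; `require_env` is a hypothetical helper, not part of Prefect.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, failing loudly if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "refusing to build a credentials block without it."
        )
    return value
```

You would then pass `aws_secret_access_key=require_env("MY_AWS_SECRET_ACCESS_KEY")` when constructing the credentials block, so a missing variable stops the script before anything is saved.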
Create a storage block
You can create a block for the chosen cloud provider using Python code or the UI. This example uses Python code.
Note that the `S3Bucket` block is not the same as the `S3` block that ships with Prefect. The `S3Bucket` block used in this example is part of the `prefect-aws` library and provides additional capabilities.
The storage block references the credentials block created above by its name.
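```python
from prefect_aws import AwsCredentials, S3Bucket

# Load the credentials block saved earlier and attach it to a new
# S3Bucket block; bucket_name is the S3 bucket to read from and write to.
s3bucket = S3Bucket(
    bucket_name="my-bucket-name",
    credentials=AwsCredentials.load("my-aws-creds-block"),
)
s3bucket.save(name="my-s3-bucket-block", overwrite=True)
```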
Run the code to create the block. You should see a message that the block was created.
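S3 rejects bucket names that break its naming rules, so it can help to check the value you pass as the bucket name before saving the block. The sketch below checks only a subset of the rules for general purpose buckets (length, allowed characters, first and last character, adjacent periods, IP-address format). It does not cover reserved prefixes and suffixes, and the function name is hypothetical.

```python
import ipaddress
import re

# Subset of S3 general purpose bucket naming rules: 3-63 characters made of
# lowercase letters, digits, periods, and hyphens, starting and ending with
# a letter or digit. Reserved prefixes and suffixes are NOT checked here.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return False for names that break any of the naming rules checked here."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:
        # Two adjacent periods are not allowed.
        return False
    try:
        # Names formatted like an IPv4 address (e.g. 192.168.5.4) are not allowed.
        ipaddress.IPv4Address(name)
        return False
    except ipaddress.AddressValueError:
        return True
```

A `True` result means only that the name passed these checks; S3 itself remains the authority when the bucket is created.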
Write data
Use your new block inside a flow to write data to your cloud provider.
```python
from pathlib import Path

from prefect import flow
from prefect_aws.s3 import S3Bucket


@flow()
def upload_to_s3():
    """Flow function to upload data"""
    path = Path("my_path_to/my_file.parquet")
    aws_block = S3Bucket.load("my-s3-bucket-block")
    # Upload the local file; to_path is the object key inside the bucket.
    aws_block.upload_from_path(from_path=path, to_path=path.as_posix())


if __name__ == "__main__":
    upload_to_s3()
```
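S3 object keys conventionally use forward slashes as folder separators, but `str()` of a `Path` on Windows uses backslashes, which S3 stores as literal characters in the key. If you derive keys from local paths, a small helper like the hypothetical `to_object_key` below keeps them consistent across operating systems; the optional `prefix` argument is an assumption for illustration.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def to_object_key(local_path: PurePath, prefix: str = "") -> str:
    """Turn a relative local path into a forward-slash S3 object key.

    `prefix` is a hypothetical folder inside the bucket to place the key under.
    """
    if local_path.is_absolute():
        raise ValueError("Pass a path relative to your project, not an absolute path.")
    key = local_path.as_posix()  # always uses "/" as the separator
    clean_prefix = prefix.strip("/")
    return f"{clean_prefix}/{key}" if clean_prefix else key


if __name__ == "__main__":
    print(to_object_key(PureWindowsPath(r"my_path_to\my_file.parquet")))
    print(to_object_key(PurePosixPath("my_path_to/my_file.parquet"), prefix="raw/"))
```

Rejecting absolute paths avoids uploading objects whose keys embed a machine-specific directory layout.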
Read data
Use your block to read data from your cloud provider inside a flow.
```python
from prefect import flow
from prefect_aws import S3Bucket


@flow
def download_from_s3():
    """Flow function to download data"""
    s3_block = S3Bucket.load("my-s3-bucket-block")
    # Download a single object; get_directory() is meant for whole folders.
    s3_block.download_object_to_path(
        from_path="my_path_to/my_file.parquet",
        to_path="my_path_to/my_file.parquet",
    )


if __name__ == "__main__":
    download_from_s3()
```
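Downloading to a local path whose parent folder does not yet exist can fail if nothing creates that folder beforehand, for example on a fresh worker. Assuming you want the flow to be robust to that, one option is to create the folder first with the hypothetical helper below and use its return value as the destination path.

```python
from pathlib import Path


def prepare_download_target(local_path: str) -> Path:
    """Create the parent directories of `local_path` and return it as a Path.

    Hypothetical helper: it creates folders only and never creates the file itself.
    """
    target = Path(local_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


if __name__ == "__main__":
    print(prepare_download_target("my_path_to/my_file.parquet"))
```

Because `exist_ok=True` is set, calling the helper repeatedly on every flow run is harmless.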
Next steps
Check out the prefect-aws, prefect-azure, and prefect-gcp docs to see additional methods for interacting with cloud storage providers.
Each library also contains blocks for interacting with other cloud-provider services.