API-sourced ETL with pandas
Build a small ETL pipeline that fetches JSON from a public API, transforms it with pandas, and writes a CSV – all orchestrated by Prefect.
Prefect turns everyday Python into production-grade workflows with zero boilerplate.
When you pair Prefect with pandas you get a versatile ETL toolkit:
- Python supplies a rich ecosystem of connectors and libraries for virtually every data source and destination.
- pandas gives you lightning-fast, expressive transforms that turn raw bits into tidy DataFrames.
- Prefect wraps the whole thing in battle-tested orchestration: automatic retries, scheduling, and observability, so you don’t have to write reams of defensive code.
The result? You spend your time thinking about what you want to build, not how to keep it alive. Point this trio at any API, database, or file system and it will move the data where you need it while handling the messy details for you.
In this article you will:
- Extract JSON from the public Dev.to REST API.
- Transform it into an analytics-friendly pandas DataFrame.
- Load the result to a CSV – ready for your BI tool of choice.
This example demonstrates these Prefect features:
- `@task` – wrap any function in retries & observability.
- `log_prints` – surface `print()` logs automatically.
- Automatic retries with back-off, no extra code.
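A minimal preview of how those three features combine; the task and flow names here are illustrative, not from the pipeline below:

```python
from prefect import flow, task

@task(retries=3, retry_delay_seconds=[5, 15, 45])  # automatic retries with back-off
def flaky_step() -> str:
    return "hello from a retried task"

@flow(log_prints=True)  # surface print() calls as Prefect logs
def demo() -> None:
    print(flaky_step())
```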
Rapid analytics from a public API
Your data team wants engagement metrics from Dev.to articles, daily. You need a quick, reliable pipeline that anyone can run locally and later schedule in Prefect Cloud.
The Solution
Write three small Python functions (extract, transform, load), add two decorators, and let Prefect handle retries, concurrency, and logging. No framework-specific hoops, just Python the way you already write it.
For more background on Prefect’s design philosophy, check out our blog post: Built to Fail: Design Patterns for Resilient Data Pipelines
Watch as Prefect orchestrates the ETL pipeline with automatic retries and logging. The flow fetches multiple pages of articles, transforms them into a structured DataFrame, and saves the results to CSV. This pattern is highly adaptable – use it to build pipelines that move data between any sources and destinations:
- APIs → Databases (Postgres, MySQL, etc.)
- APIs → Cloud Storage (S3, GCS, Azure)
- APIs → Data Warehouses (Snowflake, BigQuery, Redshift, etc.)
- And many more combinations
Code walkthrough
- Imports – Standard libraries for HTTP + pandas.
- `fetch_page` task – Downloads a single page with retries.
- `to_dataframe` task – Normalises JSON to a pandas DataFrame.
- `save_csv` task – Persists the DataFrame and logs a peek.
- `etl` flow – Orchestrates the tasks sequentially for clarity.
- Execution – A friendly `if __name__ == "__main__"` block with some basic configuration kicks things off.
Extract – fetch a single page of articles
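A minimal sketch of the extract task, assuming the `requests` library and Dev.to’s public `https://dev.to/api/articles` endpoint; the parameter names and retry delays are illustrative:

```python
import requests
from prefect import task

@task(retries=3, retry_delay_seconds=[5, 15, 45])
def fetch_page(page: int, per_page: int = 30) -> list[dict]:
    """Download one page of articles from the Dev.to REST API."""
    response = requests.get(
        "https://dev.to/api/articles",
        params={"page": page, "per_page": per_page},
        timeout=30,
    )
    response.raise_for_status()  # a failed request raises, which triggers a Prefect retry
    return response.json()
```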
Transform – convert list[dict] ➜ pandas DataFrame
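A sketch of the transform step; the column subset is an assumption about which Dev.to fields matter for engagement metrics:

```python
import pandas as pd
from prefect import task

@task
def to_dataframe(pages: list[list[dict]]) -> pd.DataFrame:
    """Flatten the raw JSON pages into one tidy DataFrame."""
    records = [article for page in pages for article in page]
    df = pd.json_normalize(records)
    # engagement-related fields returned by the Dev.to API (subset is illustrative)
    wanted = ["id", "title", "published_at", "positive_reactions_count", "comments_count"]
    return df[[col for col in wanted if col in df.columns]]
```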
Load – save DataFrame to CSV (or print preview)
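A sketch of the load step; the output filename is just a default:

```python
from pathlib import Path

import pandas as pd
from prefect import task

@task
def save_csv(df: pd.DataFrame, path: str = "dev_to_articles.csv") -> Path:
    """Persist the DataFrame and print a small preview."""
    out = Path(path)
    df.to_csv(out, index=False)
    print(f"Wrote {len(df)} rows to {out}")
    print(df.head())  # the peek that shows up in the logs
    return out
```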
Flow – orchestrate the ETL with optional concurrency
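A sketch of the flow, reusing the three task sketches above. The loop is sequential for clarity; the comment marks where Prefect’s `.submit()` would add concurrency:

```python
from prefect import flow

@flow(log_prints=True)
def etl(pages: int = 3, per_page: int = 30, path: str = "dev_to_articles.csv") -> None:
    # Sequential for clarity. For concurrent fetches, submit futures instead:
    #   futures = [fetch_page.submit(p, per_page) for p in range(1, pages + 1)]
    #   raw = [f.result() for f in futures]
    raw = [fetch_page(p, per_page) for p in range(1, pages + 1)]
    df = to_dataframe(raw)
    save_csv(df, path)
```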
Run it!
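Assuming the sketches above live together in a single `etl.py`, a friendly entry point might look like this; kick it off with `python etl.py`:

```python
if __name__ == "__main__":
    # basic configuration: three pages of 30 articles each
    etl(pages=3, per_page=30, path="dev_to_articles.csv")
```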
What just happened?
- Prefect registered a flow run and task runs for the three tasks (`fetch_page`, `to_dataframe`, `save_csv`).
- Each `fetch_page` call downloaded a page and, if it failed, would automatically retry.
- The raw JSON pages were combined into a single pandas DataFrame.
- The CSV was written to disk and a preview printed locally (the flow’s `log_prints=True` flag logs messages inside the flow body; prints inside tasks are displayed in the console).
- You can view run details, timings, and logs in the Prefect UI.
Key Takeaways
- Pure Python, powered-up – Decorators add retries and logging without changing your logic.
- Observability first – Each task run (including every page fetch) is logged and can be viewed in the UI if you have a Prefect Cloud account or a local Prefect server running.
- Composable – Swap `save_csv` for a database loader or S3 upload with one small change, as sketched below.
- Reusable – Import the `etl` flow and run it with different parameters from another flow.
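To make those last two points concrete, here is a hypothetical `save_postgres` swap-in (assuming SQLAlchemy and pandas’ `to_sql`; the connection string is a placeholder) plus a parent flow that reuses `etl`:

```python
import pandas as pd
import sqlalchemy
from prefect import flow, task

@task
def save_postgres(df: pd.DataFrame, table: str = "dev_to_articles") -> None:
    # placeholder connection string; point it at your own database
    engine = sqlalchemy.create_engine("postgresql://user:pass@localhost:5432/analytics")
    df.to_sql(table, engine, if_exists="replace", index=False)

@flow
def nightly_refresh() -> None:
    # reuse the etl flow with different parameters from another flow
    etl(pages=10, per_page=100, path="nightly_articles.csv")
```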
Prefect lets you focus on data, not orchestration plumbing – happy ETL-ing! 🎉