Simple web scraper
Learn how to scrape article content from web pages with Prefect tasks, retries, and automatic logging.
This example shows how Prefect enhances regular Python code without getting in its way. You’ll write code exactly as you normally would, and Prefect’s decorators add production-ready features with zero boilerplate.
In this example you will:
- Write regular Python functions for web scraping
- Add production features (retries, logging) with just two decorators:
  - @task - Turn any function into a retryable, observable unit
  - @flow - Compose tasks into a reliable pipeline
- Keep your code clean and Pythonic - no framework-specific patterns needed
The Power of Regular Python
Notice how the code below is just standard Python with two decorators. You could remove the decorators and the code would still work - Prefect just makes it more resilient.
- Regular Python functions? ✓
- Familiar libraries (requests, BeautifulSoup)? ✓
- Normal control flow (if/else, loops)? ✓
- Prefect’s magic? Just two decorators! ✓
Defining tasks
We separate network IO from parsing so both pieces can be retried or cached independently.
Defining a flow
@flow elevates a function to a flow – the orchestration nucleus that can call tasks, other flows, and any Python you need. We enable log_prints=True so each print() surfaces in Prefect Cloud or the local API.
Run it!
Feel free to tweak the URL list or the regex and re-run – Prefect picks up your code changes on the next run, no container builds required.
What just happened?
When you ran this script, Prefect did a few things behind the scenes:
- Turned each decorated function into a task run or flow run with structured state.
- Applied retry logic to the network call – a flaky connection would auto-retry up to 3 times.
- Captured all print() statements so you can view them in the Prefect UI or logs.
- Passed the HTML between tasks in memory – no external storage required.
Yet the code itself is standard Python. You could copy-paste the bodies of fetch_html or parse_article into a notebook and they’d work exactly the same.
Key Takeaways
- Less boilerplate, more Python – You focus on the scraping logic, Prefect adds production features.
- Observability out of the box – Every run is tracked, making debugging and monitoring trivial.
- Portability – The same script runs on your laptop today and on Kubernetes tomorrow.
- Reliability – Retries, timeouts, and state management are just one decorator away.
Happy scraping – and happy orchestrating! 🎉