Your First Scraper
Let's build a simple scraper to extract quotes from quotes.toscrape.com.
Create a file named my_scraper.py and add the following code:
```python
import asyncio

from stepwright import run_scraper, TabTemplate, BaseStep


async def main():
    # 1. Define your scraping workflow
    templates = [
        TabTemplate(
            tab="quotes",
            steps=[
                # Navigate to the website
                BaseStep(
                    id="navigate",
                    action="navigate",
                    value="https://quotes.toscrape.com"
                ),
                # Extract quotes using a foreach loop
                BaseStep(
                    id="extract_quotes",
                    action="foreach",
                    object_type="class",
                    object="quote",
                    subSteps=[
                        BaseStep(
                            id="get_text",
                            action="data",
                            object_type="class",
                            object="text",
                            key="quote_text",
                            data_type="text"
                        ),
                        BaseStep(
                            id="get_author",
                            action="data",
                            object_type="class",
                            object="author",
                            key="author",
                            data_type="text"
                        )
                    ]
                )
            ]
        )
    ]

    # 2. Run the engine
    results = await run_scraper(templates)

    # 3. Print results (automatically aggregated!)
    for i, quote in enumerate(results, 1):
        print(f"{i}. \"{quote['quote_text']}\" - {quote['author']}")


if __name__ == "__main__":
    asyncio.run(main())
```

Run the script from your terminal:
```bash
python my_scraper.py
```

Understanding the Code
- **TabTemplate**: We define a single "Tab", meaning this entire sequence runs inside one browser page.
- **`navigate` action**: Instructs the browser to go to the URL. Remember, StepWright handles the `networkidle` waits for you.
- **`foreach` action**: We tell StepWright to find every element with the class `quote` and loop over them.
- **`data` action**: Inside the loop, we extract the inner text of the `.text` and `.author` elements, assigning them the dictionary keys `quote_text` and `author`.
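Conceptually, the `foreach`/`data` pair produces one result dictionary per matched element. The plain-Python sketch below illustrates that aggregation; the dicts stand in for DOM elements and the function is a hand-rolled illustration, not StepWright's actual internals:

```python
# Toy stand-ins for the ".quote" elements found on the page.
quote_elements = [
    {"text": "Be yourself.", "author": "Oscar Wilde"},
    {"text": "Stay hungry.", "author": "Steve Jobs"},
]


def run_foreach(elements):
    """Mimic foreach + data: build one dict per element, keyed by each sub-step's key."""
    results = []
    for element in elements:
        record = {}
        # Each "data" sub-step reads one field and stores it under its configured key.
        record["quote_text"] = element["text"]
        record["author"] = element["author"]
        results.append(record)
    return results


print(run_foreach(quote_elements))
```

This is why the final `results` list contains flat dictionaries with exactly the keys you declared in the sub-steps.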
Notice that you didn't have to write any manual browser-launching code, explicit wait commands, or empty result-list initializations!
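Because the aggregated results are plain dictionaries, post-processing needs no extra tooling. For example, grouping quotes by author with the standard library (the sample list below is hard-coded in the shape shown above so the snippet runs standalone):

```python
from collections import defaultdict

# Sample data in the same shape the scraper produces.
results = [
    {"quote_text": "Be yourself.", "author": "Oscar Wilde"},
    {"quote_text": "Quality is not an act.", "author": "Aristotle"},
    {"quote_text": "Wit beyond measure is man's greatest treasure.", "author": "Oscar Wilde"},
]

# Bucket quote texts under each author.
by_author = defaultdict(list)
for quote in results:
    by_author[quote["author"]].append(quote["quote_text"])

for author, quotes in sorted(by_author.items()):
    print(f"{author}: {len(quotes)} quote(s)")
```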
