What's New in StepWright
Version 1.1.0 - The Concurrency & Data Flow Update
StepWright 1.1.0 is a major update that turns the library from a sequential task runner into a concurrent scraping engine. This release introduces parallel execution, parameterized templates, and file-based data flows, so you can treat web scraping like a standard data engineering pipeline.
Parallel & Concurrent Execution
Run multiple independent scraping workflows simultaneously in separate browser pages (tabs) that share the same browser context.
- ParallelTemplate: Group multiple separate TabTemplate objects into a single execution context. StepWright uses asyncio.gather to execute them all at once.
- ParameterizedTemplate: The main tool for scaling out. Provide a single base TabTemplate containing double-brace placeholders such as {{term}}, plus a list of values (e.g., 50 stock ticker symbols). StepWright clones the template once per value, injects that value into its clone, and runs all 50 extractions concurrently.
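The following is a minimal, self-contained sketch of the clone-and-gather behavior described above. It assumes a template is just a list of step dicts and simulates each tab with a short sleep. `clone_with_value`, `run_template`, and `run_parameterized` are hypothetical names, not StepWright's internal API.

```python
import asyncio
import copy

# Hypothetical stand-in for a TabTemplate: a plain list of step dicts.
BASE_TEMPLATE = [
    {"action": "navigate", "value": "https://example.com/search?q={{term}}"},
    {"action": "input", "value": "{{term}}"},
]


def clone_with_value(template, key, value):
    """Deep-copy the template and substitute every {{key}} placeholder."""
    placeholder = "{{" + key + "}}"
    clone = copy.deepcopy(template)
    for step in clone:
        if isinstance(step.get("value"), str):
            step["value"] = step["value"].replace(placeholder, value)
    return clone


async def run_template(steps):
    """Simulated tab: sleep briefly, then return the resolved step values."""
    await asyncio.sleep(0.01)
    return [step["value"] for step in steps]


async def run_parameterized(template, key, values):
    clones = [clone_with_value(template, key, v) for v in values]
    # gather() runs every clone concurrently and returns results in input order.
    return await asyncio.gather(*(run_template(c) for c in clones))


results = asyncio.run(
    run_parameterized(BASE_TEMPLATE, "term", ["SSD", "RAM", "GPU", "CPU"])
)
print(results[0])  # ['https://example.com/search?q=SSD', 'SSD']
```

Each clone spends its time waiting on I/O, so the total run time is roughly that of the slowest tab rather than the sum of all of them. The deep copy keeps the base template untouched for reuse.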
# Search for 4 terms at the same time
task = ParameterizedTemplate(
template=search_tmpl, # a TabTemplate whose navigate/input steps contain {{term}}
parameter_key="term",
values=["SSD", "RAM", "GPU", "CPU"]
)
# run_scraper is awaited, so call it from inside an async function (e.g. one started with asyncio.run)
results = await run_scraper([task])
print(f"Collected total: {len(results)} items")
Advanced Data Flows (Pipeline I/O)
Integrate external datasets directly into your scraping sequence using built-in file handlers, eliminating the need to write custom Python file-parsing wrappers.
- readData action: Import seeds or targets from JSON, CSV, Excel (.xlsx), or plain-text files into the collector's in-memory store.
- Enhanced foreach loops: The foreach loop can now iterate over string lists loaded via readData, running a sub-workflow for every item in the file (for example, searching for every keyword in a CSV).
- writeData action: Write the collector's contents back to the filesystem in JSON, CSV, or plain-text format as a regular workflow step.
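Here is a rough sketch of the readData, foreach, and writeData flow using only the standard library. The dict-based `collector`, the helper names, and the header-less single-column CSV are assumptions for illustration. StepWright's own handlers read from and write to real files.

```python
import csv
import io
import json

collector = {}  # hypothetical in-memory store, keyed like a step's `key` field


def read_data_csv(text, key):
    """readData-style step: load the first CSV column as a list of strings."""
    rows = csv.reader(io.StringIO(text))
    collector[key] = [row[0] for row in rows if row]
    return collector[key]


def foreach(key, sub_workflow):
    """foreach-style step: run sub_workflow once per stored item."""
    return [sub_workflow(item) for item in collector.get(key, [])]


def write_data_json(key):
    """writeData-style step: serialize one collector entry to JSON text."""
    return json.dumps({key: collector[key]})


read_data_csv("SSD\nRAM\nGPU\n", "keywords")
collector["results"] = foreach("keywords", lambda term: f"searched:{term}")
print(write_data_json("results"))
# {"results": ["searched:SSD", "searched:RAM", "searched:GPU"]}
```

With a real file you would pass `open(path, newline="")` to `csv.reader` instead of a `StringIO`.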
# Load a CSV into the collector under the key 'keywords'
BaseStep(id="load", action="readData", value="keywords.csv", data_type="csv", key="keywords")
# Iterate over that data directly inside the browser interaction flow!
BaseStep(
id="loop",
action="foreach",
value="{{keywords}}",
subSteps=[ ... ]
)
Custom Callbacks & Closure Hooks
Full flexibility for advanced edge cases: you are no longer limited to what the step configuration dictionary can express.
- Custom Actions: Run an arbitrary Python function as a BaseStep. The callback is invoked with the Playwright page, the collector, and the step object itself, so you can issue raw Playwright commands or set up complex network interceptors.
- Custom Formats: Not using CSV or JSON? Read from an XML file, for example, by passing data_type="custom" and a Python callback that parses the file and returns a list of targets to the engine.
import xml.etree.ElementTree as ET
def xml_reader(filepath, step_config):
    # Return the text of every <item> element as the list of targets
    return [el.text for el in ET.parse(filepath).findall('.//item')]
BaseStep(id="load-xml", action="readData", value="data.xml", data_type="custom", callback=xml_reader, key="items")
What's New in StepWright 1.0.0
New Features
Retry Logic
Automatically retry failed steps with configurable delays to handle flaky networks and dynamic content.
BaseStep(
id="click_button",
action="click",
object_type="id",
object="flaky-button",
retry=3, # Retry up to 3 times
retryDelay=1000 # Wait 1 second between retries
)
Benefits:
- Handle transient failures automatically
- Reduce manual intervention for flaky operations
- Configurable retry count and delay
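The following is a minimal sketch of the retry semantics. It assumes `retry` counts extra attempts after the first one and `retryDelay` is a fixed pause in milliseconds. `run_with_retry` is a hypothetical helper, not StepWright's implementation.

```python
import time


def run_with_retry(action, retry=3, retry_delay_ms=1000):
    """Call action(); on failure, retry up to `retry` more times (assumed semantics)."""
    for attempt in range(retry + 1):  # first attempt + `retry` retries
        try:
            return action()
        except Exception:
            if attempt == retry:
                raise  # out of retries: surface the last error
            time.sleep(retry_delay_ms / 1000)


attempts = {"count": 0}


def flaky_click():
    """Fails twice, then succeeds, like a button that renders late."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("element not ready")
    return "clicked"


print(run_with_retry(flaky_click, retry=3, retry_delay_ms=10))  # clicked
```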
Conditional Execution
Execute or skip steps based on JavaScript conditions for dynamic workflow control.
# Skip step if condition is true
BaseStep(
id="optional_click",
action="click",
skipIf="document.querySelector('.modal').classList.contains('hidden')"
)
# Execute only if condition is true
BaseStep(
id="conditional_data",
action="data",
onlyIf="document.querySelector('#dynamic-content') !== null"
)
Benefits:
- Dynamic workflow adaptation
- Handle different page states gracefully
- Reduce unnecessary operations
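The decision logic for skipIf and onlyIf can be sketched as shown below. In a real run, the condition strings would be evaluated in the browser, for example via Playwright's `page.evaluate`. Here a dictionary lookup stands in for the page. `should_run` is hypothetical, and checking skipIf before onlyIf is an assumption.

```python
# Hypothetical page state: condition string -> what the browser would return.
FAKE_PAGE = {
    "modal is hidden": True,
    "dynamic content exists": False,
}


def evaluate(condition):
    """Stand-in for evaluating a JavaScript condition in the page."""
    return bool(FAKE_PAGE.get(condition, False))


def should_run(step, evaluate_fn):
    """Return False if skipIf holds or onlyIf fails; otherwise True."""
    skip_if = step.get("skipIf")
    if skip_if and evaluate_fn(skip_if):
        return False
    only_if = step.get("onlyIf")
    if only_if and not evaluate_fn(only_if):
        return False
    return True


print(should_run({"skipIf": "modal is hidden"}, evaluate))         # False
print(should_run({"onlyIf": "dynamic content exists"}, evaluate))  # False
print(should_run({"id": "plain_step"}, evaluate))                  # True
```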
Smart Waiting
Wait for specific selectors to appear or change state before performing actions.
BaseStep(
id="click_after_load",
action="click",
object_type="id",
object="target-button",
waitForSelector="#loading-indicator",
waitForSelectorTimeout=5000,
waitForSelectorState="hidden" # Wait until the loading indicator is hidden
)
Benefits:
- Handle dynamic content loading
- Prevent race conditions
- More reliable element interactions
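Playwright already waits natively, for example with `page.wait_for_selector(selector, state="hidden", timeout=5000)`. This sketch only illustrates the timeout semantics: poll a state check until it holds or the deadline passes. `wait_for_state` and the simulated indicator are hypothetical.

```python
import time


def wait_for_state(is_in_state, timeout_ms=5000, poll_ms=50):
    """Poll is_in_state() until it returns True or timeout_ms elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if is_in_state():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"state not reached within {timeout_ms} ms")
        time.sleep(poll_ms / 1000)


# Simulated loading indicator that disappears after about 100 ms.
hidden_at = time.monotonic() + 0.1


def indicator_hidden():
    return time.monotonic() >= hidden_at


print(wait_for_state(indicator_hidden, timeout_ms=2000))  # True
```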
Fallback Selectors
Provide multiple selector options for increased robustness when dealing with variable page structures.
BaseStep(
id="click_with_fallback",
action="click",
object_type="id",
object="primary-button",
fallbackSelectors=[
{"object_type": "class", "object": "btn-primary"},
{"object_type": "xpath", "object": "//button[contains(text(), 'Submit')]"}
]
)
Benefits:
- Handle page structure variations
- Increase scraping success rate
- Support multiple page layouts
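One way to picture fallback resolution is to build the primary selector first, then try each fallback in order and use the first one that matches. The `to_selector` mapping (id to `#id`, class to `.class`, xpath to Playwright's `xpath=` prefix) and the `exists` callback are assumptions. On a real page, `exists` could be backed by a Playwright locator count.

```python
def to_selector(object_type, obj):
    """Hypothetical mapping from (object_type, object) to a Playwright selector."""
    if object_type == "id":
        return f"#{obj}"
    if object_type == "class":
        return f".{obj}"
    if object_type == "xpath":
        return f"xpath={obj}"
    return obj  # assume anything else is already a selector


def first_matching(step, exists):
    """Try the primary selector, then each fallback; return the first that exists."""
    candidates = [{"object_type": step["object_type"], "object": step["object"]}]
    candidates += step.get("fallbackSelectors", [])
    for candidate in candidates:
        selector = to_selector(candidate["object_type"], candidate["object"])
        if exists(selector):
            return selector
    return None


step = {
    "object_type": "id",
    "object": "primary-button",
    "fallbackSelectors": [
        {"object_type": "class", "object": "btn-primary"},
        {"object_type": "xpath", "object": "//button[contains(text(), 'Submit')]"},
    ],
}
# Simulated layout where the primary id is missing but the class is present.
present = {".btn-primary"}
print(first_matching(step, lambda s: s in present))  # .btn-primary
```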
Enhanced Click Interactions
Support for double-click, right-click, modifier keys, and force clicks.
# Double click
BaseStep(id="double_click", action="click", doubleClick=True)
# Right click (context menu)
BaseStep(id="right_click", action="click", rightClick=True)
# Modifier keys (Ctrl/Cmd+Click)
BaseStep(id="multi_select", action="click", clickModifiers=["Control"])
# Force click hidden elements
BaseStep(id="force_click", action="click", forceClick=True)
Benefits:
- Handle complex UI interactions
- Support multi-select scenarios
- Click elements that aren't immediately visible
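These click options map naturally onto keyword arguments of Playwright's `page.click()`: `button`, `click_count`, `modifiers`, and `force`. The translation function below is a hypothetical sketch of that mapping, not StepWright's handler, which may use `dblclick` or locators instead.

```python
def click_options(step):
    """Hypothetical translation of click flags into page.click() keyword arguments."""
    options = {}
    if step.get("rightClick"):
        options["button"] = "right"
    if step.get("doubleClick"):
        options["click_count"] = 2
    if step.get("clickModifiers"):
        options["modifiers"] = list(step["clickModifiers"])
    if step.get("forceClick"):
        options["force"] = True  # bypass Playwright's actionability checks
    return options


print(click_options({"doubleClick": True}))  # {'click_count': 2}
print(click_options({"rightClick": True, "clickModifiers": ["Control"]}))
# {'button': 'right', 'modifiers': ['Control']}
```

A handler could then call something like `await page.click("#target", **click_options(step))`.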
Input Enhancements
More control over input behavior with clearing and human-like typing delays.
# Clear before input (default: True)
BaseStep(
id="clear_and_input",
action="input",
clearBeforeInput=True
)
# Human-like typing with delays
BaseStep(
id="human_like_input",
action="input",
inputDelay=100 # 100ms delay between each character
)
Benefits:
- Better form interaction
- Mimic human behavior
- Handle pre-filled fields correctly
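Below is a simulated sketch of clearBeforeInput and inputDelay, using a fake field object instead of a browser. With real Playwright, the rough equivalents are `locator.fill("")` to clear the field and `locator.press_sequentially(text, delay=100)` to type with a per-key delay. `FakeField` and `type_into` are hypothetical.

```python
import asyncio


class FakeField:
    """Hypothetical stand-in for an <input> element."""

    def __init__(self, value=""):
        self.value = value


async def type_into(field, text, clear_before_input=True, input_delay_ms=0):
    """Optionally clear the field, then type one character at a time."""
    if clear_before_input:
        field.value = ""
    for char in text:
        field.value += char
        if input_delay_ms:
            await asyncio.sleep(input_delay_ms / 1000)  # pause between keystrokes


prefilled = FakeField("old query")
asyncio.run(type_into(prefilled, "gpu", input_delay_ms=10))
print(prefilled.value)  # gpu
```

Without clearing, the new text is appended to whatever the field already held. This is why clearing is the safer default for pre-filled forms.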
Advanced Data Extraction
Regex extraction, JavaScript transformations, required fields, and default values.
# Extract with regex
BaseStep(
id="extract_price",
action="data",
regex=r"\$(\d+\.\d+)",
regexGroup=1
)
# Transform with JavaScript
BaseStep(
id="transform_data",
action="data",
transform="value.toUpperCase().trim()"
)
# Required field with default
BaseStep(
id="get_required_data",
action="data",
required=True,
defaultValue="N/A"
)
Benefits:
- Extract structured data from unstructured text
- Transform data on-the-fly
- Handle missing data gracefully
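This sketch shows one way regex, regexGroup, transform, required, and defaultValue could compose for a single extracted string. Two details are assumptions: the ordering (regex, then transform, then the default/required check) and the rule that a default value wins over raising. `transform` is a Python callable here, whereas StepWright takes a JavaScript expression.

```python
import re


def extract(raw, regex=None, regex_group=0, transform=None,
            required=False, default_value=None):
    """Apply regex, then transform, then default/required handling (assumed order)."""
    value = raw
    if value is not None and regex:
        match = re.search(regex, value)
        value = match.group(regex_group) if match else None
    if value is not None and transform:
        value = transform(value)
    if value is None or value == "":
        if default_value is not None:
            return default_value
        if required:
            raise ValueError("required field is missing")
    return value


print(extract("Price: $19.99 (incl. tax)", regex=r"\$(\d+\.\d+)", regex_group=1))  # 19.99
# Python equivalent of the JavaScript transform value.toUpperCase().trim()
print(extract("  laptop ", transform=lambda v: v.upper().strip()))  # LAPTOP
print(extract(None, required=True, default_value="N/A"))  # N/A
```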
Element State Validation
Ensure elements are visible and enabled before performing actions.
BaseStep(
id="click_visible",
action="click",
requireVisible=True,
requireEnabled=True
)
Benefits:
- Prevent errors from invalid interactions
- Ensure elements are ready before actions
- More reliable scraping
Human-like Behavior
Add random delays to mimic human interaction patterns.
BaseStep(
id="human_like_action",
action="click",
randomDelay={"min": 500, "max": 2000}
)
Benefits:
- Reduce detection by anti-bot systems
- More natural browsing patterns
- Better for testing user interactions
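The following sketches how a randomDelay value might be drawn. It assumes a uniform distribution between min and max milliseconds, since the distribution StepWright actually uses isn't stated. `random_delay_ms` is hypothetical, and a seeded `random.Random` keeps the demo reproducible.

```python
import random


def random_delay_ms(spec, rng=random):
    """Draw a delay in milliseconds uniformly from [spec['min'], spec['max']]."""
    low, high = spec["min"], spec["max"]
    if low > high:
        raise ValueError("randomDelay min must not exceed max")
    return rng.uniform(low, high)


rng = random.Random(42)  # seeded so the demo is reproducible
delays = [random_delay_ms({"min": 500, "max": 2000}, rng) for _ in range(5)]
print([round(d) for d in delays])
```

Before each action, an async runner could then `await asyncio.sleep(delay / 1000)`.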
New Page Actions
Comprehensive set of page manipulation and information retrieval actions.
Reload Page
BaseStep(id="reload", action="reload", waitUntil="networkidle")
Get Current URL
BaseStep(id="get_url", action="getUrl", key="current_url")
Get Page Title
BaseStep(id="get_title", action="getTitle", key="page_title")
Meta Tags Management
# Get specific meta tag
BaseStep(id="get_description", action="getMeta", object="description", key="meta")
# Get all meta tags
BaseStep(id="get_all_meta", action="getMeta", key="all_meta")
Cookies Management
# Get all cookies
BaseStep(id="get_cookies", action="getCookies", key="cookies")
# Get specific cookie
BaseStep(id="get_session", action="getCookies", object="session_id", key="session")
# Set cookie
BaseStep(id="set_cookie", action="setCookies", object="preference", value="dark_mode")
LocalStorage & SessionStorage
# Get/Set localStorage
BaseStep(id="get_storage", action="getLocalStorage", object="key", key="value")
BaseStep(id="set_storage", action="setLocalStorage", object="key", value="value")
# Get/Set sessionStorage
BaseStep(id="get_session", action="getSessionStorage", object="key", key="value")
BaseStep(id="set_session", action="setSessionStorage", object="key", value="value")
Viewport Operations
# Get viewport size
BaseStep(id="get_viewport", action="getViewportSize", key="viewport")
# Set viewport size
BaseStep(id="set_viewport", action="setViewportSize", value="1920x1080")
Enhanced Screenshot
# Full page screenshot
BaseStep(id="screenshot", action="screenshot", value="./page.png", data_type="full")
# Element screenshot
BaseStep(id="element_screenshot", action="screenshot", object_type="id", object="content")
Wait for Selector (Explicit Action)
BaseStep(
id="wait_for_element",
action="waitForSelector",
object_type="id",
object="dynamic-content",
value="visible",
wait=5000
)
Evaluate JavaScript
BaseStep(
id="custom_js",
action="evaluate",
value="() => document.querySelector('.counter').textContent",
key="counter_value"
)
Benefits:
- Complete page manipulation capabilities
- Extract comprehensive page information
- Support advanced testing scenarios
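Several of these actions take compact string values. For example, setViewportSize receives "1920x1080", while Playwright's `page.set_viewport_size()` expects a dict with `width` and `height`. Below is a hypothetical parser for that conversion; it is not StepWright's own code.

```python
def parse_viewport(value):
    """Parse a 'WIDTHxHEIGHT' string into a {'width': ..., 'height': ...} dict."""
    width, sep, height = value.lower().partition("x")
    if not sep:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    return {"width": int(width), "height": int(height)}


print(parse_viewport("1920x1080"))  # {'width': 1920, 'height': 1080}
```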
Enhanced Error Handling
New options for graceful error handling and continuation.
# Skip step if error occurs
BaseStep(
id="optional_step",
action="click",
skipOnError=True
)
# Continue even if element not found
BaseStep(
id="optional_data",
action="data",
continueOnEmpty=True
)
Benefits:
- More resilient scraping workflows
- Handle optional elements gracefully
- Reduce workflow failures
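The following sketches how skipOnError and continueOnEmpty might change a runner's control flow. The assumption that an empty result otherwise aborts the workflow is illustrative only, so check StepWright's behavior before relying on it. `run_steps` and the `run` callables are hypothetical.

```python
def run_steps(steps):
    """Hypothetical runner illustrating skipOnError and continueOnEmpty."""
    results = {}
    for step in steps:
        try:
            value = step["run"]()
        except Exception:
            if step.get("skipOnError"):
                continue  # swallow the error and move to the next step
            raise
        if value in (None, "", []) and not step.get("continueOnEmpty"):
            # Assumed default: an empty result stops the workflow.
            raise RuntimeError(f"step {step['id']!r} returned nothing")
        results[step["id"]] = value
    return results


def broken_click():
    raise RuntimeError("element detached from the page")


steps = [
    {"id": "optional_step", "run": broken_click, "skipOnError": True},
    {"id": "optional_data", "run": lambda: None, "continueOnEmpty": True},
    {"id": "page_title", "run": lambda: "Example Domain"},
]
print(run_steps(steps))  # {'optional_data': None, 'page_title': 'Example Domain'}
```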
Code Organization Improvements
Modular Handler Architecture
Action handlers have been reorganized into a dedicated handlers/ subfolder for better maintainability:
- data_handlers.py: Data extraction logic with transformations
- file_handlers.py: File download and PDF operations
- loop_handlers.py: Foreach loops and new tab/window handling
- page_actions.py: Page-related actions (reload, getUrl, cookies, storage, etc.)
Benefits:
- Better code organization
- Easier maintenance and testing
- Clear separation of concerns
Testing Enhancements
Comprehensive Test Coverage
- New test file test_new_features.py with 28+ test cases
- Enhanced test page test_page_enhanced.html covering a range of scenarios
- Tests cover all new features, including edge cases
Benefits:
- Higher code quality
- Regression prevention
- Confidence in new features
API Changes
Backward Compatibility
All new features are 100% backward compatible. Existing code will continue to work without modifications.
New Optional Fields
All new BaseStep fields are optional, maintaining backward compatibility:
- retry, retryDelay
- skipIf, onlyIf
- waitForSelector, waitForSelectorTimeout, waitForSelectorState
- fallbackSelectors
- clickModifiers, doubleClick, forceClick, rightClick
- clearBeforeInput, inputDelay
- required, defaultValue, regex, regexGroup, transform
- timeout, waitUntil
- randomDelay
- requireVisible, requireEnabled
- skipOnError, continueOnEmpty
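Backward compatibility of this kind usually comes from giving every new field a neutral default. The trimmed-down dataclass below is a hypothetical illustration, not StepWright's BaseStep definition. It shows that old-style calls which omit the new fields still construct valid steps.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepSketch:
    """Hypothetical, trimmed-down step model; not StepWright's BaseStep."""

    id: str
    action: str
    object_type: Optional[str] = None
    object: Optional[str] = None
    # Newer options default to "off", so call sites written before they existed still work.
    retry: int = 0
    retryDelay: int = 0
    skipIf: Optional[str] = None
    fallbackSelectors: List[dict] = field(default_factory=list)  # fresh list per step


old_style = StepSketch(id="click", action="click", object_type="id", object="button")
new_style = StepSketch(id="click", action="click", object_type="id", object="button", retry=2)
print(old_style.retry, new_style.retry)  # 0 2
```

Using `default_factory` for the list field avoids sharing one mutable default across every step.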
New Actions
Added to the action field type:
- reload, getUrl, getTitle, getMeta
- getCookies, setCookies
- getLocalStorage, setLocalStorage
- getSessionStorage, setSessionStorage
- getViewportSize, setViewportSize
- screenshot, waitForSelector, evaluate
Use Cases
Real-World Scenarios Enabled
E-commerce Scraping
- Handle dynamic product loading with waitForSelector
- Extract prices with regex: regex=r"\$(\d+\.\d+)"
- Retry flaky add-to-cart buttons
Form Automation
- Fill forms with human-like typing delays
- Handle conditional form fields with skipIf/onlyIf
- Validate form state before submission
Social Media Scraping
- Handle infinite scroll with fallback selectors
- Extract metadata with getMeta
- Manage authentication with cookies/localStorage
Testing Scenarios
- Test different viewport sizes
- Capture screenshots at different stages
- Evaluate custom JavaScript for assertions
Robust Scraping
- Retry failed operations automatically
- Handle missing elements gracefully
- Adapt to different page layouts
Migration Guide
No Migration Required!
Since all new features are optional and backward compatible, no code changes are required.
Optional: Adopt New Features
You can gradually adopt new features as needed:
# Old way (still works)
BaseStep(id="click", action="click", object_type="id", object="button")
# New way (with enhancements)
BaseStep(
id="click",
action="click",
object_type="id",
object="button",
retry=2,
waitForSelector="#loading",
requireVisible=True
)
Version: 1.0.0
Release Date: Oct 31, 2025
Compatibility: Python 3.8+
