TableExtraction
Fetch a web page, extract the main data table, and save as CSV.
Usage
python TableExtraction.py --url "<URL>" [--timeout-ms 15000]
Parameters
- •--url (required): The web page URL to analyze
- •--timeout-ms (optional, default 15000): Network timeout in milliseconds
Requirements
- •Python 3.8+
- •Dependencies: requests, beautifulsoup4, lxml, pandas
Response
Returns JSON with fields:
- •url: The final URL (after redirects)
- •csv_path: Path to the saved CSV file
- •rows: Number of data rows extracted (excluding header)
- •status: "ok" or "error"
- •error: Error message (if status is "error")
How It Works
- •Fetches the webpage HTML
- •Uses pandas to find all HTML tables
- •Selects the largest table (by row × column count)
- •Converts to CSV format
- •Saves to datastore as {PageName}_table.csv
Logging
CSV files are saved to: {OPENCLAW_OUTPUT_ROOT}/datastore/YYYY/MM/DD/{PageName}_table.csv
If OPENCLAW_OUTPUT_ROOT is not set, defaults to ~/.openclaw/data
Examples
Extract table from FTSE-100 page: python TableExtraction.py --url "https://www.hl.co.uk/shares/stock-market-summary/ftse-100"
Custom timeout: python TableExtraction.py --url "https://example.com/data" --timeout-ms 30000