Scrape and Extract
Extract data from a single URL or a small set of elements.
Trigger
The user wants to:
- Get the text of an element (h1, p, div, etc.)
- Extract all links from a page
- Get image URLs or alt text
- Extract table data into JSON or CSV
- Pull specific attributes like `href`, `src`, or `data-*`
- Get clean Markdown or stripped text from a page
- Extract structured data (JSON-LD, metadata, forms)
Workflow
- Navigate and Extract: Use `webscraper text` or `webscraper extract` with the `--url` flag.

  ```bash
  webscraper text "h1" --url "https://example.com"
  ```

- Format Selection: Choose between `json`, `csv`, `plain`, or `table` using `--format`.

  ```bash
  webscraper extract links --url "https://example.com" --format csv
  ```

- Multiple Selectors: Use `batch selectors` for pulling multiple different elements at once.

  ```bash
  webscraper batch selectors "h1,p,a" --url "https://example.com"
  ```

- Structured Content: Use `extract` subcommands for specialized extractions.

  ```bash
  webscraper extract markdown --url "https://example.com"
  webscraper extract schema --url "https://example.com"
  webscraper extract forms --url "https://example.com"
  webscraper extract meta --url "https://example.com"
  webscraper extract xpath "//div/@class" --url "https://example.com"
  webscraper extract regex "\d{3}-\d{4}" --url "https://example.com"
  ```

- Proxy/User-Agent: Use global options for sites that block default requests.

  ```bash
  webscraper --user-agent "MyBot/2.0" --proxy "http://proxy:8080" extract links --url "URL"
  ```
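
The Proxy/User-Agent step can be wrapped in a small retry helper. The sketch below is one possible approach, assuming `webscraper` is on `PATH` and returns a non-zero exit status when a request is blocked or navigation fails; this document does not confirm that behavior. The function name `links_with_fallback` and the `UA` and `PROXY` variables are hypothetical, with values copied from the Proxy/User-Agent example.

```bash
#!/usr/bin/env bash
# Sketch only: retry a link extraction with a custom User-Agent and proxy
# when the default request fails.
# ASSUMPTION: `webscraper` exits non-zero on failure (not stated in this document).

UA="MyBot/2.0"               # placeholder value
PROXY="http://proxy:8080"    # placeholder value

# links_with_fallback URL   (hypothetical helper name)
links_with_fallback() {
  local url="$1" out

  # First attempt: default request settings. Output is buffered so that a
  # failed attempt does not leak partial data to stdout.
  if out=$(webscraper extract links --url "$url"); then
    printf '%s\n' "$out"
    return 0
  fi

  # Second attempt: global options go before the subcommand.
  webscraper --user-agent "$UA" --proxy "$PROXY" extract links --url "$url"
}

# Example: links_with_fallback "https://example.com"
```

Buffering the first attempt keeps stdout clean, so the caller sees the result of exactly one request.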
Output
- Extracted data in the requested format (stdout)
- Error message with suggestion if selectors are not found or navigation fails
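
Because extracted data goes to stdout, it can be redirected straight into a file. The following is a minimal sketch, assuming `webscraper` signals failure with a non-zero exit status and writes its error message to stderr (neither is stated here). `save_links` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: save every link on a page to a CSV file.
# ASSUMPTION: `webscraper` exits non-zero on failure and reports errors on stderr.

# save_links URL OUTFILE   (hypothetical helper name)
save_links() {
  local url="$1" out="$2"
  local tmp="$out.tmp"

  # Write to a temporary file first so a failed run never leaves a
  # partial CSV at the destination path.
  if webscraper extract links --url "$url" --format csv > "$tmp"; then
    mv "$tmp" "$out"
  else
    rm -f "$tmp"
    echo "extraction failed for $url" >&2
    return 1
  fi
}

# Example: save_links "https://example.com" links.csv
```

Any error message from the tool still reaches the terminal, since only stdout is redirected.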