Apify Scrapers
Overview
Scrape content from major social platforms, Google Maps, and company websites using Apify actors. Each platform's script uses settings tuned to balance cost against result quality.
Quick Decision Tree
What do you want to scrape?
│
├── Social Media Posts
│   ├── Twitter/X → references/twitter.md
│   │   └── Script: scripts/scrape_twitter_ai_trends.py
│   │
│   ├── Reddit → references/reddit.md
│   │   └── Script: scripts/scrape_reddit_ai_tech.py
│   │
│   ├── LinkedIn → references/linkedin.md
│   │   └── Script: scripts/scrape_linkedin_posts.py
│   │
│   ├── Instagram → references/instagram.md
│   │   └── Script: scripts/scrape_instagram.py
│   │   └── Modes: profile, posts, hashtag, reels, comments
│   │
│   ├── Facebook → references/facebook.md
│   │   └── Script: scripts/scrape_facebook.py
│   │   └── Modes: page, posts, reviews, groups, marketplace
│   │
│   ├── TikTok → references/multi-platform.md
│   │   └── Script: scripts/scrape_multi_platform.py
│   │
│   └── YouTube → references/multi-platform.md
│       └── Script: scripts/scrape_multi_platform.py
│
├── Business/Places
│   ├── Google Maps businesses → references/google-maps.md
│   │   └── Script: scripts/scrape_google_maps.py
│   │   └── Modes: search, place, reviews
│   │
│   └── Contact info from websites → references/contact-enrichment.md
│       └── Script: scripts/scrape_contact_info.py
│       └── Extract: emails, phone numbers, social profiles
│
├── Auto-detect URL type → references/url-detect.md
│   └── Script: scripts/scrape_content_by_url.py
│
├── Trend Analysis (NEW)
│   └── Enriched trend analysis → workflows/trend-analysis.md
│       └── Script: scripts/analyze_trends.py
│       └── Features: velocity scoring, lifecycle staging, opportunity scoring
│
└── Workflows (multi-step)
    ├── Lead generation → workflows/lead-generation.md
    ├── Influencer discovery → workflows/influencer-discovery.md
    ├── Competitor analysis → workflows/competitor-intel.md
    ├── Trend analysis → workflows/trend-analysis.md
    └── Competitor Ads Intelligence (NEW) → workflows/competitor-ads.md
        └── Script: scripts/scrape_competitor_ads.py
        └── Platforms: Facebook Ads Library, Google Ads Transparency
        └── Features: Spend estimates, creative analysis, benchmarking
Environment Setup
# Required in .env
APIFY_TOKEN=apify_api_xxxxx
Get your API token: https://console.apify.com/account/integrations
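As a minimal sketch of how a script might pick up the token, the helper below reads APIFY_TOKEN from the process environment and falls back to a simple KEY=VALUE .env file, using only the standard library. The function names load_env_file and get_apify_token are hypothetical; the actual scripts may load the file differently (for example with a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_apify_token(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    token = os.environ.get("APIFY_TOKEN") or load_env_file(path).get("APIFY_TOKEN")
    if not token:
        raise RuntimeError("APIFY_TOKEN is not set in the environment or .env")
    return token.strip()
```

The error message names the variable but never echoes its value, which keeps the token out of logs.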
Common Usage Patterns
Scrape Twitter Trends
python scripts/scrape_twitter_ai_trends.py --query "AI agents" --max-tweets 50
Scrape Reddit Discussions
python scripts/scrape_reddit_ai_tech.py --subreddits "MachineLearning,LocalLLaMA" --max-posts 100
Scrape LinkedIn Author
python scripts/scrape_linkedin_posts.py author "https://linkedin.com/in/username" --max-posts 30
Auto-detect and Scrape URL
python scripts/scrape_content_by_url.py "https://x.com/user/status/123456"
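For illustration only, URL-type detection can be approximated by matching the hostname against known platform domains. The mapping and the detect_platform function below are assumptions made for this sketch and are not taken from scrape_content_by_url.py, whose actual rules are described in references/url-detect.md.

```python
from urllib.parse import urlparse

# Hypothetical hostname-to-platform map for this sketch.
PLATFORM_HOSTS = {
    "x.com": "twitter",
    "twitter.com": "twitter",
    "reddit.com": "reddit",
    "linkedin.com": "linkedin",
    "instagram.com": "instagram",
    "facebook.com": "facebook",
    "tiktok.com": "tiktok",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
}


def detect_platform(url):
    """Return a platform label for a URL, or None if the host is unknown."""
    host = (urlparse(url).hostname or "").lower()
    # Match the domain itself or any subdomain of it (www., m., old., ...).
    for domain, platform in PLATFORM_HOSTS.items():
        if host == domain or host.endswith("." + domain):
            return platform
    return None
```

Requiring a leading dot for subdomain matches stops a host such as netflix.com from being mistaken for x.com.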
Scrape Instagram Profile
python scripts/scrape_instagram.py profile "https://instagram.com/username" --max-posts 20
Scrape Instagram Hashtag
python scripts/scrape_instagram.py hashtag "#artificialintelligence" --max-posts 50
Scrape Instagram Reels
python scripts/scrape_instagram.py reels "https://instagram.com/username" --max-reels 30
Scrape Facebook Page
python scripts/scrape_facebook.py page "https://facebook.com/pagename" --max-posts 50
Scrape Facebook Reviews
python scripts/scrape_facebook.py reviews "https://facebook.com/pagename" --max-reviews 100
Scrape Facebook Marketplace
python scripts/scrape_facebook.py marketplace "laptops in san francisco" --max-items 30
Scrape Google Maps Businesses
python scripts/scrape_google_maps.py search "AI consulting firms in New York" --max-results 50
Scrape Google Maps Reviews
python scripts/scrape_google_maps.py reviews "ChIJN1t_tDeuEmsRUsoyG83frY4" --max-reviews 100
Extract Contact Info from Websites
python scripts/scrape_contact_info.py "https://example.com" --depth 2
Bulk Contact Enrichment
python scripts/scrape_contact_info.py --urls-file companies.txt --output contacts.json
Scrape Competitor Ads (Single Competitor)
python scripts/scrape_competitor_ads.py "Nike" --platforms facebook google --country US --days 30
Compare Multiple Competitors' Ads
python scripts/scrape_competitor_ads.py "Nike" "Adidas" "Puma" --compare --output comparison.json
Discover Advertisers by Keyword
python scripts/scrape_competitor_ads.py --search "running shoes" --country US --max-ads 200
Filter Competitor Ads by Media Type
python scripts/scrape_competitor_ads.py "Netflix" "Disney+" --platforms facebook --media-types video --days 7
Analyze Trends (NEW)
# Analyze specific topic with enrichments
python scripts/analyze_trends.py "artificial intelligence" --sources google instagram tiktok --days 90
# Discover trending topics in category
python scripts/analyze_trends.py --category technology --discover --top 50
# Compare multiple trends
python scripts/analyze_trends.py "AI" "blockchain" "metaverse" --compare
# Export HTML trend report
python scripts/analyze_trends.py "sustainable fashion" --format html --output trend_report.html
Cost Estimates
| Platform | Actor | Cost per Item |
|---|---|---|
| Twitter/X | kaitoeasyapi/twitter-x-data-tweet-scraper | ~$0.00025 |
| Reddit | trudax/reddit-scraper | ~$0.001-0.005 |
| LinkedIn | harvestapi/linkedin-post-search | ~$0.01-0.05 |
| YouTube | streamers/youtube-scraper | ~$0.01-0.05 |
| TikTok | clockworks/tiktok-scraper | ~$0.005 |
| Instagram (profile) | apify/instagram-profile-scraper | ~$0.005 |
| Instagram (posts) | apify/instagram-post-scraper | ~$0.002-0.005 |
| Instagram (hashtag) | apify/instagram-hashtag-scraper | ~$0.002-0.005 |
| Instagram (reels) | apify/instagram-reel-scraper | ~$0.005-0.01 |
| Instagram (comments) | apify/instagram-comment-scraper | ~$0.001-0.003 |
| Facebook (page) | apify/facebook-pages-scraper | ~$0.005-0.01 |
| Facebook (posts) | apify/facebook-posts-scraper | ~$0.003-0.005 |
| Facebook (reviews) | apify/facebook-reviews-scraper | ~$0.002-0.005 |
| Facebook (groups) | apify/facebook-groups-scraper | ~$0.005-0.01 |
| Facebook (marketplace) | apify/facebook-marketplace-scraper | ~$0.005-0.01 |
| Google Maps (search) | compass/crawler-google-places | ~$0.01-0.02 |
| Google Maps (place) | compass/google-maps-business-scraper | ~$0.01 |
| Google Maps (reviews) | compass/google-maps-reviews-scraper | ~$0.003-0.005 |
| Contact Enrichment | lukaskrivka/contact-info-scraper | ~$0.01-0.03 |
| Google Trends | apify/google-trends-scraper | ~$0.01 |
| Trend Analysis (multi) | Multiple actors | ~$0.50-1.50/run |
| Facebook Ads Library | apify/facebook-ads-scraper | ~$0.75/1K ads |
| Facebook Ads (alt) | curious_coder/facebook-ads-library-scraper | ~$0.50/1K ads |
| Google Ads Transparency | lexis-solutions/google-ads-scraper | ~$1.00/1K ads |
| Google Ads (alt) | xtech/google-ad-transparency-scraper | ~$0.80/1K ads |
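To turn the per-item figures into a rough budget before a run, multiply the item count by the low and high ends of the range. The sketch below copies a few rows from the Cost Estimates table; the dictionary keys and the estimate_cost function are hypothetical names, and the figures are approximations, not billing data.

```python
# Per-item cost in USD as (low, high), copied from the Cost Estimates table.
COST_PER_ITEM = {
    "twitter": (0.00025, 0.00025),
    "reddit": (0.001, 0.005),
    "linkedin": (0.01, 0.05),
    "tiktok": (0.005, 0.005),
    "google_maps_search": (0.01, 0.02),
}


def estimate_cost(platform, items):
    """Return the (low, high) estimated cost in USD for scraping `items` results."""
    low, high = COST_PER_ITEM[platform]
    return round(low * items, 4), round(high * items, 4)


print(estimate_cost("twitter", 50))   # (0.0125, 0.0125)
print(estimate_cost("reddit", 100))   # (0.1, 0.5)
```

So the 50-tweet example costs about a cent, while 100 Reddit posts land between $0.10 and $0.50.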
Output Location
All scraped data saves to .tmp/ with timestamped filenames:
- •.tmp/twitter_ai_trends_YYYYMMDD.json
- •.tmp/reddit_ai_tech_YYYYMMDD.json
- •.tmp/linkedin_posts_YYYYMMDD_HHMMSS.json
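The naming scheme can be reproduced with a small helper. This is a sketch under the assumption that the stamps are the local date (and optionally time) of the run; output_path and save_items are hypothetical names, not functions from the scripts.

```python
import json
from datetime import datetime
from pathlib import Path


def output_path(prefix, include_time=False, now=None, base_dir=".tmp"):
    """Build <base_dir>/<prefix>_YYYYMMDD.json, or _YYYYMMDD_HHMMSS with include_time."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S" if include_time else "%Y%m%d")
    return Path(base_dir) / f"{prefix}_{stamp}.json"


def save_items(items, prefix, **kwargs):
    """Write items as JSON and return the path that was written."""
    path = output_path(prefix, **kwargs)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(items, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

With a date-only stamp, a second run on the same day writes to the same filename; the HHMMSS variant avoids that.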
Security Notes
Credential Handling
- •Store APIFY_TOKEN in .env file (never commit to git)
- •Rotate API tokens periodically via Apify Console
- •Never log or print API tokens in script output
- •Use environment variables, not hardcoded values
Data Privacy
- •Scraped data contains only publicly available content
- •Social media posts may include PII (names, handles, profile info)
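- •Data is stored locally in the .tmp/ directory
- •Apify keeps run results in your account's storage after a run completes (unnamed datasets expire after the plan's data retention period); delete datasets you no longer need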
- •Consider data minimization - only scrape what you need
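One way to apply data minimization is to drop every field you do not need before writing results to disk. The sketch below assumes scraped items are plain dictionaries; the minimize function and the field names in the usage example are hypothetical, since each actor returns its own schema.

```python
def minimize(items, keep_fields):
    """Return copies of the items that contain only the whitelisted fields."""
    return [
        {key: item[key] for key in keep_fields if key in item}
        for item in items
    ]


# Hypothetical item shape: keep the post text and URL, drop profile details.
posts = [{"text": "New model released", "url": "https://example.com/p/1",
          "authorName": "Jane Doe", "authorFollowers": 1200}]
print(minimize(posts, ["text", "url"]))
# [{'text': 'New model released', 'url': 'https://example.com/p/1'}]
```

An allowlist is safer than a blocklist here: new PII fields added by an actor update are dropped by default.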
Access Scopes
- •Default Apify tokens have full account access; Apify Console also supports scoped tokens with limited permissions
- •Use separate Apify accounts for different projects if needed
- •Monitor usage via Apify Console dashboard
Compliance Considerations
- •Terms of Service: Respect each platform's ToS (Twitter, Reddit, LinkedIn)
- •Rate Limiting: Actors have built-in rate limiting to avoid bans
- •Robots.txt: Some actors may bypass robots.txt - use responsibly
- •GDPR: Scraped PII may be subject to GDPR if it concerns EU residents
- •Ethical Use: Only scrape public data; never bypass authentication
- •Proxy Ethics: Residential proxies should be used ethically
Troubleshooting
Common Issues
Issue: Actor run failed
Symptoms: Script terminates with "Actor run failed" or timeout error
Cause: Invalid actor ID, insufficient proxy credits, or actor configuration issue
Solution:
- •Verify the actor ID is correct in the script
- •Check Apify Console for actor run logs
- •Ensure proxy settings match actor requirements
- •Try running with default proxy settings first
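When debugging a failed run, it helps to surface the run status and run ID so the matching log can be found in Apify Console. The sketch below assumes a client object that behaves like ApifyClient from the apify-client Python package, where actor(...).call(...) returns a run dictionary and dataset(...).iterate_items() yields results. The run_actor wrapper is a hypothetical name, and run_input is passed through untouched because its fields are actor-specific.

```python
def run_actor(client, actor_id, run_input):
    """Run an actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None or run.get("status") != "SUCCEEDED":
        status = run.get("status") if run else "no run returned"
        run_id = run.get("id") if run else "unknown"
        raise RuntimeError(
            f"Actor {actor_id} did not succeed "
            f"(status: {status}, run ID: {run_id}); "
            "check the run log in Apify Console"
        )
    # Results of a run are stored in its default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With the apify-client package installed, a real client would be created with ApifyClient(token) and passed as the first argument. Passing the client in, instead of constructing it inside the function, also makes the wrapper easy to exercise with a stub.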
Issue: Empty results returned
Symptoms: Script completes but returns 0 items
Cause: Content blocked by platform, invalid query, or proxy being detected
Solution:
- •Try a different proxy type (residential vs datacenter)
- •Simplify the search query
- •Reduce the number of results requested
- •Check if the platform is blocking scraping attempts
Issue: Rate limited by platform
Symptoms: Script fails with 429 errors or "rate limited" messages
Cause: Too many requests in a short time period
Solution:
- •Add delays between requests (actor settings)
- •Reduce concurrent requests
- •Use proxy rotation
- •Wait and retry after a cooldown period
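The "wait and retry" step can be sketched as exponential backoff with jitter. This is a generic pattern, not code from the scripts; retry_with_backoff and its parameters are hypothetical, and a real script would probably catch only the specific rate-limit error and not every exception.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
```

With the defaults, the waits are roughly 2, 4, and 8 seconds. The random jitter keeps several parallel scripts from retrying at the same instant.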
Issue: Invalid API token
Symptoms: Authentication error or "invalid token" message
Cause: Token expired, revoked, or incorrectly set
Solution:
- •Regenerate API token in Apify Console
- •Verify token is correctly set in .env file
- •Check for leading/trailing whitespace in token
- •Ensure APIFY_TOKEN environment variable is loaded
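The local checks above can be automated without ever printing the token. The sketch below is a hypothetical helper (diagnose_token is not part of the scripts) that reports common formatting problems; it cannot tell whether a token has expired or been revoked, which only the Apify API can confirm.

```python
def diagnose_token(token):
    """Return a list of formatting problems found in an APIFY_TOKEN value."""
    if not token:
        return ["APIFY_TOKEN is empty or not loaded"]
    problems = []
    stripped = token.strip()
    if token != stripped:
        problems.append("token has leading or trailing whitespace")
    if stripped.strip("\"'") != stripped:
        problems.append("token is wrapped in quotes")
    if stripped.endswith("xxxxx"):
        problems.append("token looks like the placeholder from the setup example")
    return problems
```

A typical call would be diagnose_token(os.environ.get("APIFY_TOKEN")), where an empty list means no formatting problem was detected.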
Issue: Proxy connection errors
Symptoms: Connection timeout or proxy errors
Cause: Proxy pool exhausted or geo-restriction issues
Solution:
- •Switch proxy type (basic, residential, or datacenter)
- •Verify proxy credit balance in Apify Console
- •Try a different proxy country/region
- •Disable proxy to test if that's the root cause
Resources
Platform References
- •references/twitter.md - Twitter/X scraping details
- •references/reddit.md - Reddit scraping with subreddit targeting
- •references/linkedin.md - LinkedIn post scraping (author or search mode)
- •references/instagram.md - Instagram profile, posts, hashtag, reels, and comments scraping
- •references/facebook.md - Facebook page, posts, reviews, groups, and marketplace scraping
- •references/multi-platform.md - TikTok and YouTube scraping
- •references/url-detect.md - Auto-detect URL type and scrape
Business/Places References
- •references/google-maps.md - Google Maps business search, place details, and reviews
- •references/contact-enrichment.md - Extract emails, phone numbers, and social profiles from websites
Workflow References
- •workflows/lead-generation.md - Multi-step lead generation workflow
- •workflows/influencer-discovery.md - Find and analyze influencers across platforms
- •workflows/competitor-intel.md - Competitive intelligence gathering workflow
- •workflows/trend-analysis.md - Enriched multi-platform trend analysis with scoring
Integration Patterns
Scrape and Enrich
Skills: apify-scrapers → parallel-research
Use case: Scrape social media posts, then enrich with deep research
Flow:
- •Scrape Twitter/Reddit for mentions of a topic
- •Extract company names or URLs from posts
- •Use parallel-research to get detailed info on each company
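The extraction step in this flow can be sketched with a regular expression over the post text. The sketch assumes each scraped post is a dictionary with its body in a "text" field, which is a hypothetical field name; adjust it to the actual output of the scraper you use. extract_domains is likewise an illustrative name.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")


def extract_domains(posts, text_field="text"):
    """Collect unique domains linked in post text, in first-seen order."""
    seen = []
    for post in posts:
        for url in URL_PATTERN.findall(post.get(text_field) or ""):
            # Trim punctuation that often trails a URL inside a sentence.
            cleaned = url.rstrip(".,;:!?)\"'")
            host = (urlparse(cleaned).hostname or "").lower()
            if host.startswith("www."):
                host = host[4:]
            if host and host not in seen:
                seen.append(host)
    return seen
```

The resulting domain list can then be handed to parallel-research, one lookup per company.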
Scrape and Summarize
Skills: apify-scrapers → content-generation
Use case: Create newsletter content from social media trends
Flow:
- •Scrape trending AI posts from Twitter
- •Pass scraped data to content-generation summarize
- •Generate a formatted newsletter section
Scrape and Archive
Skills: apify-scrapers → google-workspace
Use case: Save scraped data to Google Drive for team access
Flow:
- •Scrape LinkedIn posts from target accounts
- •Format data as CSV or JSON
- •Upload to Google Drive client folder via google-workspace
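The formatting step can be sketched with the standard csv module. The items_to_csv function and the column names in the usage example are hypothetical; the columns available depend on the scraper's output. The upload itself is handled by the google-workspace skill and is not shown.

```python
import csv
import io


def items_to_csv(items, fields):
    """Serialize a list of dicts to CSV text, keeping only the given columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        # Missing fields become empty cells so every row has the same width.
        writer.writerow({field: item.get(field, "") for field in fields})
    return buffer.getvalue()


# Hypothetical item shape for illustration.
rows = [{"author": "Jane Doe", "text": "Hello, world", "likes": 12}]
csv_text = items_to_csv(rows, ["author", "text"])
```

Using csv.DictWriter means commas and quotes inside post text are escaped correctly, which naive string joining would get wrong.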
Trend Analysis + Content Strategy
Skills: apify-scrapers (trend-analysis) → content-generation
Use case: Identify trending topics and create content strategy
Flow:
- •Run trend analysis: python scripts/analyze_trends.py "AI productivity" --sources all
- •Review lifecycle stage and opportunity score
- •Use content-generation to create content for high-opportunity trends
- •Focus on emerging trends with high velocity scores
Competitive Trend Monitoring
Skills: apify-scrapers (trend-analysis) → parallel-research
Use case: Monitor competitor visibility in trending topics
Flow:
- •Analyze industry trends: python scripts/analyze_trends.py --category "your-industry" --discover
- •Compare your brand vs competitors in those trends
- •Use parallel-research for deep dive on gaps
- •Generate competitive intelligence report