Faker: Synthetic Data Generation
Overview
Faker is a Python library that generates realistic fake data for testing and development. It supports hundreds of data types across dozens of locales, including names, addresses, emails, company information, and free-form text.
Quick Start
bash
python scripts/generate_users.py --output users.csv --count 100
Scripts
scripts/generate_users.py
Generate fake user records with Username and Email fields.
bash
python scripts/generate_users.py --output <output.csv> --count <n>
Parameters:
- --output: Output CSV file path
- --count: Number of records to generate (default: 100)
Output columns: Username, Email
scripts/generate_companies.py
Generate fake company profiles with name, address, and phone.
bash
python scripts/generate_companies.py --output <output.csv> --count <n>
Parameters:
- --output: Output CSV file path
- --count: Number of records to generate (default: 5)
Output columns: Company Name, Address, Phone
scripts/replace_text.py
Replace real text content with synthetic fake text while preserving paragraph structure.
bash
python scripts/replace_text.py --input <original.txt> --output <anonymized.txt>
Parameters:
- --input: Source text file
- --output: Output file with replaced content
Important Notes
- Randomized output: Results differ on each run (can be controlled via a seed if needed)
- Locale support: Default is en_US; other locales are available via the Faker API
- Dependency: Requires the faker package (pip install faker)