SKILL.md
--- frontmatter
name: faker-synthetic-data-generation
description: Generate realistic fake data including usernames, emails, company profiles, and anonymized text. Use when creating test datasets, populating databases with dummy records, replacing PII in documents, or generating CSV files with synthetic user/company data.

Faker: Synthetic Data Generation

Overview

Faker is a Python library that generates realistic fake data for testing and development. It supports hundreds of data types across dozens of locales, including names, addresses, emails, company information, and free-form text.

Quick Start

bash
python scripts/generate_users.py --output users.csv --count 100

Scripts

scripts/generate_users.py

Generate fake user records with Username and Email fields.

bash
python scripts/generate_users.py --output <output.csv> --count <n>

Parameters:

  • --output — Output CSV file path
  • --count — Number of records to generate (default: 100)

Output columns: Username, Email

scripts/generate_companies.py

Generate fake company profiles with name, address, and phone.

bash
python scripts/generate_companies.py --output <output.csv> --count <n>

Parameters:

  • --output — Output CSV file path
  • --count — Number of records (default: 5)

Output columns: Company Name, Address, Phone

scripts/replace_text.py

Replace real text content with synthetic fake text while preserving paragraph structure.

bash
python scripts/replace_text.py --input <original.txt> --output <anonymized.txt>

Parameters:

  • --input — Source text file
  • --output — Output file with replaced content

Important Notes

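  • Randomized output — Results differ on each run unless the Faker generator is seeded, for example with Faker.seed(<n>)
  • Locale support — Default is en_US; other locales are selected by passing a locale code to the Faker constructor, for example Faker("de_DE")
  • Dependency — Requires the faker package (pip install faker)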