AgentSkillsCN

fabricate-trading-day

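Generates pickle files containing fake players for incomplete trading days. Use this skill when the user asks to "create fake data", "generate pickle with fakes", "add fakers", "fabricate trading day", or needs to create /db23/parsed_excel_files/{day}.pickle from incomplete CSV whale data (before the complete A5 Excel data is available). It builds 30-40K row datasets from ~300 whale transactions by adding balanced fake players and PT transactions.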

SKILL.md
--- frontmatter
name: fabricate-trading-day
description: Generate pickle files with fake players for incomplete trading days. Use when the user asks to "create fake data", "generate pickle with fakes", "add fakers", "fabricate trading day", or needs to create /db23/parsed_excel_files/{day}.pickle from incomplete CSV whale data (before the complete A5 Excel data is available). Creates 30-40K row datasets from ~300 whale transactions by adding balanced fake players and PT transactions.

Fabricate Trading Day

Generate /db23/parsed_excel_files/{day}.pickle from incomplete CSV whale data.

Quick Start

```bash
python scripts/generate_pickle.py \
  --csv /path/to/whale_data.csv \
  --day 2026_01_06
```

Output: /db23/parsed_excel_files/2026_01_06.pickle (30-40K rows, balanced, ready for pipeline)

What It Does

Before market open, only a partial CSV (~300 whale transactions) is available. This skill builds a complete dataset (30-40K rows) for dashboards by:

  1. Loading CSV whale data (real transactions)
  2. Adding fake players from reusable pool to balance books
  3. Creating PT transactions (3 pairs per stock, all fake)
  4. Padding to 30-40K rows
  5. Verifying all requirements
  6. Saving pickle file
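The balancing idea in step 2 can be sketched as follows. This is a minimal illustration, not the logic of generate_pickle.py: the function name `balancing_rows`, the single offsetting row per stock, and the sample names are all assumptions. The real script presumably spreads the offset across many fake players from the pool.

```python
from collections import defaultdict


def balancing_rows(whale_rows, fake_names):
    """Return fake rows that offset each stock's net whale imbalance.

    Hypothetical helper: emits one fake row per unbalanced stock.
    """
    net = defaultdict(int)  # stock -> whale buys minus whale sells
    for row in whale_rows:
        net[row["stk"]] += row["buy"] - row["sell"]

    unbalanced = [(stk, gap) for stk, gap in sorted(net.items()) if gap != 0]
    fakes = []
    for index, (stk, gap) in enumerate(unbalanced):
        # Whales net-bought -> the fake sells the gap, and vice versa.
        buy = -gap if gap < 0 else 0
        sell = gap if gap > 0 else 0
        fakes.append({
            "stk": stk,
            "name": fake_names[index % len(fake_names)],
            "buy_order": buy, "buy": buy,
            "sell_order": sell, "sell": sell,
        })
    return fakes


# Made-up sample data for illustration only.
whales = [
    {"stk": "AAA", "name": "whale-1", "buy": 1000, "sell": 400},
    {"stk": "AAA", "name": "whale-2", "buy": 0, "sell": 100},
    {"stk": "BBB", "name": "whale-3", "buy": 0, "sell": 300},
]
fakes = balancing_rows(whales, ["fake-1", "fake-2"])
```

With this sample, whales net-buy 500 shares of AAA and net-sell 300 shares of BBB, so the sketch emits one fake seller of 500 AAA and one fake buyer of 300 BBB.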

Your deliverable: ONE pickle file

User handles: Pipeline steps 2-6 to process your pickle

CSV Format

Required columns: Stock, Account, Name, Buy Order, Buy, Sell Order, Sell, Date

Number format: Plain integers/floats - NO period as thousands separator

  • ✓ Correct: 670000 or 670000.0
  • ✗ Wrong: 670.000 (Vietnamese format)

Empty cells: Treated as 0
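A minimal sketch of reading a CSV under these rules is shown below. The helper names, the choice of which columns are numeric, the sample row, and the regex that rejects period-grouped values are all assumptions for illustration. The regex is only a heuristic: it would also reject a genuine three-decimal float such as 1.500.

```python
import csv
import io
import re

# Heuristic (assumption): digit groups of three joined by periods,
# e.g. "670.000" or "2.466.500".
THOUSANDS_PERIOD = re.compile(r"^\d{1,3}(\.\d{3})+$")

# Assumed to be the numeric columns among the required ones.
NUMERIC_COLUMNS = ("Buy Order", "Buy", "Sell Order", "Sell")


def parse_cell(text):
    """Convert one CSV cell to int; empty cells count as 0."""
    text = (text or "").strip()
    if not text:
        return 0
    if THOUSANDS_PERIOD.match(text):
        raise ValueError(f"period used as thousands separator: {text!r}")
    return int(float(text))


def read_rows(handle):
    """Read CSV records and convert the numeric columns in place."""
    rows = []
    for record in csv.DictReader(handle):
        for column in NUMERIC_COLUMNS:
            record[column] = parse_cell(record.get(column))
        rows.append(record)
    return rows


# Made-up sample row; the real Date format is not specified here.
sample = io.StringIO(
    "Stock,Account,Name,Buy Order,Buy,Sell Order,Sell,Date\n"
    "AAA,001,Example Trader,670000,670000.0,,,2026-01-06\n"
)
rows = read_rows(sample)
```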

Script Options

  • --csv: Input CSV path (required)
  • --day: Day string YYYY_MM_DD (required)
  • --fake-pool: Fake pool path (default: /db23/parsed_excel_files/fake_player_pool.pickle)
  • --output: Output path (default: /db23/parsed_excel_files/{day}.pickle)
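For orientation, the options above could be declared with argparse roughly like this. It is a sketch under assumptions, not the actual contents of generate_pickle.py: the function names and the date check on --day are hypothetical.

```python
import argparse
from datetime import datetime

DEFAULT_FAKE_POOL = "/db23/parsed_excel_files/fake_player_pool.pickle"
DEFAULT_OUTPUT_TEMPLATE = "/db23/parsed_excel_files/{day}.pickle"


def day_string(value):
    """Accept a YYYY_MM_DD string that names a real calendar date."""
    try:
        datetime.strptime(value, "%Y_%m_%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY_MM_DD, got {value!r}")
    return value


def build_parser():
    parser = argparse.ArgumentParser(
        description="Generate a pickle for an incomplete trading day."
    )
    parser.add_argument("--csv", required=True, help="input CSV path")
    parser.add_argument("--day", required=True, type=day_string,
                        help="day string, YYYY_MM_DD")
    parser.add_argument("--fake-pool", default=DEFAULT_FAKE_POOL,
                        help="fake player pool path")
    parser.add_argument("--output", default=None,
                        help="output path (default derived from --day)")
    return parser


def resolve_args(argv):
    """Parse argv and fill in the day-based default output path."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        args.output = DEFAULT_OUTPUT_TEMPLATE.format(day=args.day)
    return args


args = resolve_args(["--csv", "/path/to/whale_data.csv", "--day", "2026_01_06"])
```

The output default is resolved after parsing because it depends on the value of --day.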

Data Requirements

The script enforces these requirements:

  1. Row count: 30,000-40,000
  2. Balance: sum(buy) == sum(sell) per stock AND total
  3. PT: 3 pairs per stock (6 rows per stock), all fake, 1000 shares each
  4. Types: int32 for volumes, int64 for is_pt, float64 for price
  5. No NaN in: stk, name, id, address
  6. Order >= matched: buy_order >= buy, sell_order >= sell
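Most of these requirements can be checked with plain Python over a list of row dictionaries, as sketched below. The function names are hypothetical, and the sketch assumes PT rows count toward the per-stock balance. It skips the dtype requirement, which only applies once the rows are in a typed table. Per-stock balance implies total balance, so the total is not checked separately.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("stk", "name", "id", "address")


def is_missing(value):
    """True for None or NaN (NaN is the only value unequal to itself)."""
    return value is None or value != value


def check_requirements(rows, min_rows=30_000, max_rows=40_000):
    """Return a list of human-readable problems; empty means all passed."""
    problems = []
    if not min_rows <= len(rows) <= max_rows:
        problems.append(f"row count {len(rows)} outside {min_rows}-{max_rows}")

    buys, sells, pt_rows = defaultdict(int), defaultdict(int), defaultdict(int)
    for number, row in enumerate(rows):
        if any(is_missing(row.get(field)) for field in REQUIRED_FIELDS):
            problems.append(f"row {number}: missing identity field")
            continue  # cannot attribute this row to a stock
        if row["buy_order"] < row["buy"] or row["sell_order"] < row["sell"]:
            problems.append(f"row {number}: order volume below matched volume")
        buys[row["stk"]] += row["buy"]
        sells[row["stk"]] += row["sell"]
        pt_rows[row["stk"]] += 1 if row["is_pt"] else 0

    for stk in sorted(buys):
        if buys[stk] != sells[stk]:
            problems.append(f"{stk}: unbalanced ({buys[stk]} bought, {sells[stk]} sold)")
        if pt_rows[stk] != 6:
            problems.append(f"{stk}: expected 6 PT rows, found {pt_rows[stk]}")
    return problems
```

Returning every problem, rather than stopping at the first, makes a failed run easier to diagnose.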

Verification

Script automatically verifies and reports:

  • Whale sums match CSV input exactly
  • Each stock balanced individually
  • Total balanced
  • Row count in 30-40K range
  • No NaN in critical fields
  • Correct data types

All checks must pass before pickle is saved.
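The "verify, then save" gate can be sketched as below. The function `save_if_valid` and the two example checks are illustrative assumptions. The real script may store a different object type in the pickle and runs the full list of checks above.

```python
import pickle


def save_if_valid(rows, path, checks):
    """Write rows to path only when every named check returns True."""
    failed = [name for name, check in checks.items() if not check(rows)]
    if failed:
        # Nothing is written when any check fails.
        raise ValueError("verification failed: " + ", ".join(failed))
    with open(path, "wb") as handle:
        pickle.dump(rows, handle, protocol=pickle.HIGHEST_PROTOCOL)


# Two example checks; a hypothetical subset of the full verification list.
checks = {
    "total balanced": lambda rows: sum(r["buy"] for r in rows) == sum(r["sell"] for r in rows),
    "row count": lambda rows: 30_000 <= len(rows) <= 40_000,
}
```

Because the checks run before the file is opened, a failed verification never leaves a partial pickle on disk.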

Critical: NO Scaling

The script parses CSV values as-is. Numbers are already correct.

Previous bug to avoid:

  • Old scripts used .replace('.', '') for Vietnamese format (2.466.500)
  • When the CSV uses float format (2466500.0), stripping the period appends the fractional digit to the integer and produces a 10x error: "2466500.0".replace('.', '') → "24665000"
  • Solution: Parse as int(float(value)) - NO string replacement
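The difference between the two approaches is easy to reproduce. In this short sketch the function name `parse_volume` is hypothetical; the value is the one from the example above.

```python
def parse_volume(value):
    """Parse a plain integer or float string as a whole number of shares."""
    return int(float(value))


raw = "2466500.0"

# Old approach: stripping periods glues the fractional "0" onto the integer.
buggy = int(raw.replace(".", ""))

# Current approach: go through float, then truncate to int.
fixed = parse_volume(raw)

print(buggy)  # 24665000
print(fixed)  # 2466500
```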

Pipeline Context

Your pickle replaces Step 1 of the 6-step pipeline:

| Step | Script | Your Role |
|------|--------|-----------|
| 1 | parse_excel_file.py | YOU REPLACE THIS |
| 2 | label_parsed_excel_file_new.py | User runs |
| 3-6 | ... | User runs |

Pipeline location: /Users/sotola/PycharmProjects/mac_local_m4

Your boundary: Generate pickle → Stop. User handles rest.

Related Documentation

  • Full onboarding: /Users/sotola/PycharmProjects/db23/docs/onboarding-incomplete-data-processing.md
  • Pipeline guide: /Users/sotola/PycharmProjects/db23/sops/six-step-ingestion-pipeline-doc.md
  • Detailed instructions: /Users/sotola/PycharmProjects/db23/ai/generated_doc/generate-incomplete-day-pickle-instructions.md