Fabricate Trading Day

Generate /db23/parsed_excel_files/{day}.pickle from incomplete CSV whale data.

Quick Start

```bash
python scripts/generate_pickle.py \
  --csv /path/to/whale_data.csv \
  --day 2026_01_06
```

Output: /db23/parsed_excel_files/2026_01_06.pickle (30-40K rows, balanced, ready for pipeline)

Before market open, only a partial CSV (~300 whale transactions) is available. This skill builds a complete dataset (30-40K rows) for the dashboards from that partial CSV.

Your deliverable: ONE pickle file

User handles: Pipeline steps 2-6 to process your pickle

Required columns: Stock, Account, Name, Buy Order, Buy, Sell Order, Sell, Date

Number format: Plain integers/floats - NO period as thousands separator

Empty cells: Treated as 0
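The input rules above can be sketched as a small loader. This is a minimal illustration, not the code in `generate_pickle.py`: the helper names `to_number` and `read_whale_rows` are hypothetical, the choice of which columns are numeric is an assumption, and the sample row is made up.

```python
import csv
import io

REQUIRED_COLUMNS = ["Stock", "Account", "Name", "Buy Order", "Buy",
                    "Sell Order", "Sell", "Date"]
# Assumption: these four columns hold the numeric values.
NUMERIC_COLUMNS = ["Buy Order", "Buy", "Sell Order", "Sell"]


def to_number(cell):
    """Return 0 for an empty cell, otherwise parse the cell as int(float(...))."""
    text = (cell or "").strip()
    return int(float(text)) if text else 0


def read_whale_rows(handle):
    """Read CSV rows, check the required columns, and convert numeric cells."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    rows = []
    for row in reader:
        for col in NUMERIC_COLUMNS:
            row[col] = to_number(row[col])
        rows.append(row)
    return rows


# Made-up sample: one buy-only row with empty sell cells.
sample = io.StringIO(
    "Stock,Account,Name,Buy Order,Buy,Sell Order,Sell,Date\n"
    "AAA,001,Example Trader,3,2466500.0,,,2026_01_06\n"
)
rows = read_whale_rows(sample)
```

Failing on a missing column up front keeps a malformed CSV from producing a pickle that only breaks later in the pipeline.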

• --csv: Input CSV path (required)
• --day: Day string YYYY_MM_DD (required)
• --fake-pool: Fake pool path (default: /db23/parsed_excel_files/fake_player_pool.pickle)
• --output: Output path (default: /db23/parsed_excel_files/{day}.pickle)
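For reference, the four options above could be declared with `argparse` roughly as follows. This is a sketch under assumptions: the function names `build_parser` and `parse_args` are hypothetical, and the real script may resolve the `{day}` default differently.

```python
import argparse

DEFAULT_DIR = "/db23/parsed_excel_files"


def build_parser():
    """Declare the four documented options."""
    parser = argparse.ArgumentParser(
        description="Generate a trading-day pickle from partial whale CSV data."
    )
    parser.add_argument("--csv", required=True, help="Input CSV path")
    parser.add_argument("--day", required=True, help="Day string, YYYY_MM_DD")
    parser.add_argument(
        "--fake-pool",
        default=f"{DEFAULT_DIR}/fake_player_pool.pickle",
        help="Fake pool path",
    )
    # The default depends on --day, so it is filled in after parsing.
    parser.add_argument("--output", default=None, help="Output path")
    return parser


def parse_args(argv=None):
    """Parse arguments and derive the default output path from --day."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        args.output = f"{DEFAULT_DIR}/{args.day}.pickle"
    return args


args = parse_args(["--csv", "/path/to/whale_data.csv", "--day", "2026_01_06"])
```

With only `--csv` and `--day` given, `args.output` resolves to `/db23/parsed_excel_files/2026_01_06.pickle`, matching the Quick Start output.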

The script enforces these requirements:

Script automatically verifies and reports:

All checks must pass before pickle is saved.

The script parses CSV values as-is. Numbers are already correct.

Previous bug to avoid:

• Old scripts used .replace('.', '') to strip the thousands separators of the Vietnamese number format (2.466.500)
• When the CSV uses float format (2466500.0) instead, the same call removes the decimal point and produces a 10x error: "2466500.0".replace('.', '') → "24665000"
• Solution: Parse as int(float(value)) - NO string replacement
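The bug and its fix can be reproduced in a few lines. The function names `parse_old` and `parse_fixed` are hypothetical labels for the two approaches described above, not names taken from the scripts.

```python
def parse_old(value):
    """Old approach: strip every period, assuming Vietnamese thousands separators."""
    return int(str(value).replace(".", ""))


def parse_fixed(value):
    """Current approach: treat the period as a decimal point."""
    return int(float(value))


old_on_vietnamese = parse_old("2.466.500")   # correct for the old format
old_on_float = parse_old("2466500.0")        # decimal point removed: 10x too large
fixed_on_float = parse_fixed("2466500.0")    # correct for plain floats
```

Note that `parse_fixed("2.466.500")` raises `ValueError`, so a CSV that still uses period separators fails loudly instead of being silently misread. This is why the input must use plain integers or floats.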

Your pickle replaces Step 1 of the 6-step pipeline:

| Step | Script | Your Role |
|------|--------|-----------|
| 1 | `parse_excel_file.py` | ← YOU REPLACE THIS |
| 2 | `label_parsed_excel_file_new.py` | User runs |
| 3-6 | ... | User runs |

Pipeline location: /Users/sotola/PycharmProjects/mac_local_m4

Your boundary: Generate pickle → Stop. User handles rest.