Data Research Protocol

Name: research
Rating: 78
Author: dmitryprg-ai

Principle: DATA FIRST, CODE SECOND.

Workflow

•LOAD -- Load data, verify accessibility
•SCHEMA -- Show structure (types, shape, samples)
•PROFILE -- Find risks (nulls, duplicates, anomalies)
•HYPOTHESIS -- State what we expect to find, in a form the data can disprove
•EXPERIMENT -- One small test
•DOCUMENT -- Record findings in 5W+H format (who, what, when, where, why, how)
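
As an illustration of the LOAD step, here is a minimal sketch that assumes the data is a CSV file read with pandas. The helper name `load_csv` and its specific checks are assumptions made for this example, not part of the protocol itself.

```python
from pathlib import Path

import pandas as pd


def load_csv(path: str) -> pd.DataFrame:
    """Load a CSV file, failing loudly if it is missing or empty."""
    file = Path(path)
    # Verify accessibility before trying to parse anything.
    if not file.is_file():
        raise FileNotFoundError(f"Data file not found: {file}")
    if file.stat().st_size == 0:
        raise ValueError(f"Data file is empty: {file}")

    df = pd.read_csv(file)
    # A file with only a header row parses fine but holds no data.
    if df.empty:
        raise ValueError(f"Data file has a header but no rows: {file}")

    print(f"Loaded {len(df)} rows x {df.shape[1]} columns from {file}")
    return df
```

A call such as `df = load_csv("data/orders.csv")` (a hypothetical path) would then feed the SCHEMA step. Raising an error instead of returning an empty frame keeps later steps from running on data that never loaded.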

Schema Analysis (MANDATORY before any conclusions)

python

# Assumes df is an already loaded pandas DataFrame.
print(f"Shape: {df.shape}")
print(f"dtypes:\n{df.dtypes}")
print(f"head:\n{df.head()}")
print(f"nunique:\n{df.nunique()}")
print(f"nulls:\n{df.isnull().sum()}")

Risk Profiling

Risk	Check	Action
Missing data	`df.isnull().sum()`	Document, decide handling
Duplicates	`df.duplicated().sum()`	Investigate
Wrong types	`df.dtypes`, then manual inspection	Convert types
Outliers	`df.describe()`	Investigate
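
The checks in the table can be gathered into one report. The sketch below is one possible way to do that. The function name `profile_risks`, the 1.5 * IQR outlier rule, and the "numeric stored as text" heuristic are assumptions added for illustration; `df.describe()` remains the quick first look.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def profile_risks(df: pd.DataFrame) -> dict:
    """Collect the four risk checks from the table into one report."""
    numeric = df.select_dtypes(include="number")

    # Outliers: count values outside 1.5 * IQR per numeric column.
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    # Wrong types: text columns whose non-null values all parse as numbers.
    numeric_as_text = []
    for col in df.columns:
        if not (is_object_dtype(df[col]) or is_string_dtype(df[col])):
            continue
        values = df[col].dropna()
        if len(values) and pd.to_numeric(values, errors="coerce").notna().all():
            numeric_as_text.append(col)

    return {
        "nulls": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "numeric_stored_as_text": numeric_as_text,
        "iqr_outliers": outside.sum().to_dict(),
    }
```

The report only flags candidates. Each flagged item still needs the action from the table: document, investigate, or convert.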

Mini-Experiment Protocol

python

# EXPERIMENT: [Description]
# HYPOTHESIS: [What we expect]
expected = 0  # placeholder: set the expected row count BEFORE running the check
result = df[df['column'] == 'value'].shape[0]
print(f"Result: {result}")
print(f"Expected: {expected}")
print(f"Status: {'PASS' if result == expected else 'FAIL'}")

Rules:

•One question per experiment
•Fast (< 30 seconds)
•Logged (print results)
•Compared with expectation
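
The four rules can be enforced by a small wrapper. This is a sketch under assumptions: `run_experiment` and its parameters are hypothetical names introduced here, and the 30-second limit is reported as a warning rather than enforced by interrupting the check.

```python
import time
from typing import Any, Callable


def run_experiment(
    description: str,
    hypothesis: str,
    check: Callable[[], Any],
    expected: Any,
    time_limit_s: float = 30.0,
) -> bool:
    """Run one check, log it, and compare the result with the expectation."""
    start = time.perf_counter()
    result = check()  # one question per experiment: a single callable
    elapsed = time.perf_counter() - start

    passed = bool(result == expected)
    print(f"EXPERIMENT: {description}")
    print(f"HYPOTHESIS: {hypothesis}")
    print(f"Result: {result}")
    print(f"Expected: {expected}")
    print(f"Status: {'PASS' if passed else 'FAIL'} ({elapsed:.2f}s)")
    if elapsed > time_limit_s:
        print(f"WARNING: took longer than {time_limit_s:.0f}s; split the experiment")
    return passed
```

With a DataFrame `df` in scope, a call could look like `run_experiment("Count paid orders", "Exactly 2 orders are paid", lambda: int((df["status"] == "paid").sum()), expected=2)`, where the `status` column is a made-up example. Passing `expected` as an argument forces the expectation to be written down before the result is seen.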

Cognitive Bias Prevention

•Do NOT analyze only the first N records (sampling bias)
•Do NOT look only for confirmations (confirmation bias)
•Analyze ALL data
•Actively look for DISPROOF of hypothesis
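
A short sketch of why these rules matter, using a small made-up `orders` table (all data and column names here are hypothetical). A check on the first rows confirms the hypothesis by accident, while a check on all rows finds the counterexample.

```python
import pandas as pd

# Hypothetical data: the only violating row sits past the first three records.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4, 5],
        "amount": [10.0, 20.0, 30.0, -5.0, 40.0],
    }
)

# Hypothesis: every order amount is positive.
holds = orders["amount"] > 0

# Looking only at the first N rows confirms the hypothesis by accident.
head_only_ok = bool(holds.head(3).all())

# Looking at ALL rows and pulling out the violations disproves it.
all_ok = bool(holds.all())
counterexamples = orders[~holds]

print(f"First 3 rows support the hypothesis: {head_only_ok}")
print(f"All rows support the hypothesis: {all_ok}")
print(f"Counterexamples:\n{counterexamples}")
```

Selecting the rows where the hypothesis fails, rather than counting the rows where it holds, makes the search for disproof the default.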