Folder Organization Best Practices
Expert guidance for organizing project directories, establishing file naming conventions, and maintaining clean, navigable project structures for research and development work.
When to Use This Skill
- •Setting up new projects
- •Reorganizing existing projects
- •Establishing team conventions
- •Creating reproducible research structures
- •Managing data-intensive projects
Core Principles
- •Predictability - Standard locations for common file types
- •Scalability - Structure grows gracefully with project
- •Discoverability - Easy for others (and future you) to navigate
- •Separation of Concerns - Code, data, documentation, outputs separated
- •Version Control Friendly - Large/generated files excluded appropriately
Standard Project Structure
Research/Analysis Projects
code
project-name/
├── README.md # Project overview and getting started
├── .gitignore # Exclude data, outputs, env files
├── environment.yml # Conda environment (or requirements.txt)
├── data/ # Input data (often gitignored)
│ ├── raw/ # Original, immutable data
│ ├── processed/ # Cleaned, transformed data
│ └── external/ # Third-party data
├── notebooks/ # Jupyter notebooks for exploration
│ ├── 01-exploration.ipynb
│ ├── 02-analysis.ipynb
│ └── figures/ # Notebook-generated figures
├── src/ # Source code (reusable modules)
│ ├── __init__.py
│ ├── data_processing.py
│ ├── analysis.py
│ └── visualization.py
├── scripts/ # Standalone scripts and workflows
│ ├── download_data.sh
│ └── run_pipeline.py
├── tests/ # Unit tests
│ └── test_analysis.py
├── docs/ # Documentation
│ ├── methods.md
│ └── references.md
├── results/ # Analysis outputs (gitignored)
│ ├── figures/
│ ├── tables/
│ └── models/
└── config/ # Configuration files
└── analysis_config.yaml
Development Projects
code
project-name/
├── README.md
├── .gitignore
├── setup.py # Package configuration
├── requirements.txt # or pyproject.toml
├── src/
│ └── package_name/
│ ├── __init__.py
│ ├── core.py
│ └── utils.py
├── tests/
│ ├── test_core.py
│ └── test_utils.py
├── docs/
│ ├── api.md
│ └── usage.md
├── examples/ # Example usage
│ └── example_workflow.py
└── .github/ # CI/CD workflows
└── workflows/
└── tests.yml
Bioinformatics/Workflow Projects
code
project-name/ ├── README.md ├── data/ │ ├── raw/ # Raw sequencing data │ ├── reference/ # Reference genomes, annotations │ └── processed/ # Workflow outputs ├── workflows/ # Galaxy .ga or Snakemake files │ ├── preprocessing.ga │ └── assembly.ga ├── config/ │ ├── workflow_params.yaml │ └── sample_sheet.tsv ├── scripts/ # Helper scripts │ ├── submit_workflow.py │ └── quality_check.py ├── results/ # Final outputs │ ├── figures/ │ ├── tables/ │ └── reports/ └── logs/ # Workflow execution logs
File Naming Conventions
General Rules
- •
Use lowercase with hyphens or underscores
- •✅
data-analysis.pyordata_analysis.py - •❌
DataAnalysis.pyordata analysis.py
- •✅
- •
Be descriptive but concise
- •✅
process-telomere-data.py - •❌
script.pyorprocess_all_the_telomere_sequencing_data_from_experiments.py
- •✅
- •
Use consistent separators
- •Choose either hyphens or underscores and stick with it
- •Convention: hyphens for file names, underscores for Python modules
- •
Include version/date for important outputs
- •✅
report-2026-01-23.pdformodel-v2.pkl - •❌
report-final-final-v3.pdf
- •✅
Numbered Sequences
For sequential files (notebooks, scripts), use zero-padded numbers:
code
notebooks/ ├── 01-data-exploration.ipynb ├── 02-quality-control.ipynb ├── 03-statistical-analysis.ipynb └── 04-visualization.ipynb
Data Files
Include metadata in filename when possible:
code
data/raw/ ├── sample-A_hifi_reads_2026-01-15.fastq.gz ├── sample-B_hifi_reads_2026-01-15.fastq.gz └── reference_genome_v3.fasta
Directory Management Best Practices
What to Version Control
DO commit:
- •Source code
- •Documentation
- •Configuration files
- •Small test datasets (<1MB)
- •Requirements/environment files
- •README files
DON'T commit:
- •Large data files (use
.gitignore) - •Generated outputs
- •Environment directories (
venv/,conda-env/) - •Logs
- •Temporary files
- •API keys/secrets
.gitignore Template
gitignore
# Python __pycache__/ *.py[cod] *$py.class .venv/ venv/ *.egg-info/ # Jupyter .ipynb_checkpoints/ *.ipynb_checkpoints # Data data/raw/ data/processed/ *.fastq.gz *.bam *.vcf.gz # Outputs results/ outputs/ *.png *.pdf *.html # Logs logs/ *.log # Environment .env environment.local.yml # OS .DS_Store Thumbs.db
Data Organization
Raw Data is Sacred
- •Never modify raw data - Always keep originals untouched
- •Store in
data/raw/and make it read-only if possible - •Document data provenance (where it came from, when downloaded)
Processed Data Hierarchy
code
data/ ├── raw/ # Original, immutable ├── interim/ # Intermediate processing steps ├── processed/ # Final, analysis-ready data └── external/ # Third-party data
Documentation Standards
README.md Essentials
Every project should have a README with:
markdown
# Project Name Brief description ## Installation How to set up the environment ## Usage How to run the analysis/code ## Project Structure Brief overview of directories ## Data Where data lives and how to access it ## Results Where to find outputs
Code Documentation
- •Docstrings for all functions/classes
- •Comments for complex logic
- •CHANGELOG.md for tracking changes
- •TODO.md for tracking work (gitignored or removed before merge)
Common Anti-Patterns to Avoid
❌ Flat structure with everything in root
code
project/ ├── script1.py ├── script2.py ├── data.csv ├── output1.png ├── output2.png └── final_really_final_v3.xlsx
❌ Ambiguous naming
code
notebooks/ ├── notebook1.ipynb ├── test.ipynb ├── analysis.ipynb └── analysis_new.ipynb
❌ Mixed concerns
code
project/ ├── src/ │ ├── analysis.py │ ├── data.csv # Data in source code directory │ └── figure1.png # Output in source code directory
Cleanup and Maintenance
Regular Maintenance Tasks
- •Archive old branches - Delete merged feature branches
- •Clean temp files - Remove
TODO.md,NOTES.mdfrom completed work - •Update documentation - Keep README current with changes
- •Review .gitignore - Ensure large files aren't tracked
- •Organize notebooks - Rename/renumber as project evolves
End-of-Project Checklist
- • README complete and accurate
- • Code documented
- • Tests passing
- • Large files gitignored
- • Working files removed (TODO.md, scratch notebooks)
- • Final outputs in
results/ - • Environment files current
- • License added (if applicable)
Integration with Other Skills
This skill works well with:
- •python-environment - Environment setup and management
- •claude-collaboration - Team workflow best practices
- •jupyter-notebook-analysis - Notebook organization standards
Templates and Tools
Quick Project Setup
bash
# Create standard research project structure
mkdir -p data/{raw,processed,external} notebooks scripts src tests docs results config
touch README.md .gitignore environment.yml
Cookiecutter Templates
Consider using cookiecutter for standardized project templates:
- •
cookiecutter-data-science- Data science projects - •
cookiecutter-research- Research projects - •Custom team templates