Folder Organization Best Practices

Expert guidance for organizing project directories, establishing file naming conventions, and maintaining clean, navigable project structures for research and development work.

When to Use This Skill

•Setting up new projects
•Reorganizing existing projects
•Establishing team conventions
•Creating reproducible research structures
•Managing data-intensive projects

Core Principles

•Predictability - Standard locations for common file types
•Scalability - Structure grows gracefully with the project
•Discoverability - Easy for others (and future you) to navigate
•Separation of Concerns - Code, data, documentation, outputs separated
•Version Control Friendly - Large/generated files excluded appropriately

Standard Project Structure

Research/Analysis Projects

code

project-name/
├── README.md                 # Project overview and getting started
├── .gitignore               # Exclude data, outputs, env files
├── environment.yml          # Conda environment (or requirements.txt)
├── data/                    # Input data (often gitignored)
│   ├── raw/                # Original, immutable data
│   ├── processed/          # Cleaned, transformed data
│   └── external/           # Third-party data
├── notebooks/               # Jupyter notebooks for exploration
│   ├── 01-exploration.ipynb
│   ├── 02-analysis.ipynb
│   └── figures/            # Notebook-generated figures
├── src/                     # Source code (reusable modules)
│   ├── __init__.py
│   ├── data_processing.py
│   ├── analysis.py
│   └── visualization.py
├── scripts/                 # Standalone scripts and workflows
│   ├── download_data.sh
│   └── run_pipeline.py
├── tests/                   # Unit tests
│   └── test_analysis.py
├── docs/                    # Documentation
│   ├── methods.md
│   └── references.md
├── results/                 # Analysis outputs (gitignored)
│   ├── figures/
│   ├── tables/
│   └── models/
└── config/                  # Configuration files
    └── analysis_config.yaml

Development Projects

code

project-name/
├── README.md
├── .gitignore
├── setup.py                 # Package configuration
├── requirements.txt         # or pyproject.toml
├── src/
│   └── package_name/
│       ├── __init__.py
│       ├── core.py
│       └── utils.py
├── tests/
│   ├── test_core.py
│   └── test_utils.py
├── docs/
│   ├── api.md
│   └── usage.md
├── examples/                # Example usage
│   └── example_workflow.py
└── .github/                 # CI/CD workflows
    └── workflows/
        └── tests.yml

Bioinformatics/Workflow Projects

code

project-name/
├── README.md
├── data/
│   ├── raw/                # Raw sequencing data
│   ├── reference/          # Reference genomes, annotations
│   └── processed/          # Workflow outputs
├── workflows/               # Galaxy .ga or Snakemake files
│   ├── preprocessing.ga
│   └── assembly.ga
├── config/
│   ├── workflow_params.yaml
│   └── sample_sheet.tsv
├── scripts/                # Helper scripts
│   ├── submit_workflow.py
│   └── quality_check.py
├── results/                # Final outputs
│   ├── figures/
│   ├── tables/
│   └── reports/
└── logs/                   # Workflow execution logs

File Naming Conventions

General Rules

•Use lowercase with hyphens or underscores
  •✅ data-analysis.py or data_analysis.py
  •❌ DataAnalysis.py or data analysis.py
•Be descriptive but concise
  •✅ process-telomere-data.py
  •❌ script.py or process_all_the_telomere_sequencing_data_from_experiments.py
•Use consistent separators
  •Choose either hyphens or underscores and stick with it
  •Convention: hyphens for file names, underscores for Python modules (a hyphenated .py file cannot be imported with a plain import statement)
•Include version/date for important outputs
  •✅ report-2026-01-23.pdf or model-v2.pkl
  •❌ report-final-final-v3.pdf
Numbered Sequences

For sequential files (notebooks, scripts), use zero-padded numbers:

code

notebooks/
├── 01-data-exploration.ipynb
├── 02-quality-control.ipynb
├── 03-statistical-analysis.ipynb
└── 04-visualization.ipynb
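
Zero padding matters because file browsers and `ls` usually sort names as plain strings. The sketch below shows the effect and builds names in the pattern used above; `numbered_name` is a hypothetical helper, and the two-digit width is an assumption that holds for up to 99 files.

```python
import re


def numbered_name(index: int, title: str, ext: str = ".ipynb", width: int = 2) -> str:
    """Build a name such as '03-statistical-analysis.ipynb'."""
    # Lowercase the title and collapse anything that is not a-z or 0-9 into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{index:0{width}d}-{slug}{ext}"


titles = ["Data exploration", "Quality control", "Statistical analysis", "Visualization"]
names = [numbered_name(i, t) for i, t in enumerate(titles, start=1)]
print(names)

# Without padding, plain string sorting puts 10 before 2.
print(sorted(["1-a.ipynb", "2-b.ipynb", "10-c.ipynb"]))
# With padding, string order and numeric order agree.
print(sorted(["01-a.ipynb", "02-b.ipynb", "10-c.ipynb"]))
```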

Data Files

Include metadata in filename when possible:

code

data/raw/
├── sample-A_hifi_reads_2026-01-15.fastq.gz
├── sample-B_hifi_reads_2026-01-15.fastq.gz
└── reference_genome_v3.fasta
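
Metadata in a file name is most useful when scripts can read it back. The sketch below assumes the layout `<sample>_<data-type>_<YYYY-MM-DD>.<extension>`, which is inferred from the two read files above and is not a stated standard; `parse_data_filename` and its field names are hypothetical.

```python
import re
from datetime import date

# Assumed layout: <sample>_<data-type>_<YYYY-MM-DD>.<extension>
_DATA_NAME = re.compile(
    r"^(?P<sample>[A-Za-z0-9-]+)"
    r"_(?P<kind>[a-z0-9_]+)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})"
    r"\.(?P<ext>.+)$"
)


def parse_data_filename(name: str):
    """Return the metadata encoded in a data file name, or None if it does not fit."""
    match = _DATA_NAME.match(name)
    if match is None:
        return None
    return {
        "sample": match["sample"],
        "kind": match["kind"],
        # Raises ValueError for impossible dates such as 2026-13-40.
        "date": date.fromisoformat(match["date"]),
        "extension": match["ext"],
    }


print(parse_data_filename("sample-A_hifi_reads_2026-01-15.fastq.gz"))
print(parse_data_filename("reference_genome_v3.fasta"))  # no date, so no match
```

Files that do not follow the layout, such as the versioned reference genome, return `None` and can be handled by a separate rule.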

Directory Management Best Practices

What to Version Control

DO commit:

•Source code
•Documentation
•Configuration files
•Small test datasets (<1MB)
•Requirements/environment files
•README files

DON'T commit:

•Large data files (use .gitignore)
•Generated outputs
•Environment directories (venv/, conda-env/)
•Logs
•Temporary files
•API keys/secrets
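
A quick way to apply the size guideline is to list files above a threshold before committing. The sketch below scans the working tree only; it does not ask git which files are tracked, so treat its output as candidates for `.gitignore` rather than a definitive report. The function name and the 1,000,000-byte default are assumptions.

```python
from pathlib import Path


def find_large_files(root, limit_bytes=1_000_000, skip_dirs=(".git",)):
    """Return (relative path, size in bytes) for files above the limit, largest first."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip anything inside an excluded directory such as .git.
        if any(part in skip_dirs for part in relative.parts):
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                hits.append((relative.as_posix(), size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files(".")` from the project root returns an empty list when nothing exceeds the limit.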

.gitignore Template

gitignore

# Python
__pycache__/
*.py[cod]
*$py.class
.venv/
venv/
*.egg-info/

# Jupyter
.ipynb_checkpoints/
*.ipynb_checkpoints

# Data
data/raw/
data/processed/
*.fastq.gz
*.bam
*.vcf.gz

# Outputs
# Note: the *.png, *.pdf and *.html patterns below match anywhere in the
# repository, including docs/. Re-include tracked assets with a later
# negation such as !docs/**/*.png
results/
outputs/
*.png
*.pdf
*.html

# Logs
logs/
*.log

# Environment
.env
environment.local.yml

# OS
.DS_Store
Thumbs.db

Data Organization

Raw Data is Sacred

•Never modify raw data - Always keep originals untouched
•Store in data/raw/ and make it read-only if possible
•Document data provenance (where it came from, when downloaded)
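
These three points can be combined into one registration step that runs when a file first lands in data/raw/. The sketch below records a source label, a SHA-256 checksum, and a timestamp in a JSON manifest, then removes write permission from the file. The manifest name `PROVENANCE.json`, the function names, and the record fields are assumptions for illustration.

```python
import hashlib
import json
import stat
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large inputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def register_raw_file(path, source, manifest="PROVENANCE.json"):
    """Record where a raw file came from, then make the file read-only."""
    path = Path(path)
    manifest_path = path.parent / manifest
    entries = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    entries[path.name] = {
        "source": source,
        "sha256": sha256_of(path),
        "recorded": datetime.now(timezone.utc).isoformat(),
    }
    manifest_path.write_text(json.dumps(entries, indent=2))
    # Clear the write bits for user, group, and others.
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return entries[path.name]
```

Re-hashing a file later and comparing against the manifest shows whether the raw data has changed since it was registered.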

Processed Data Hierarchy

code

data/
├── raw/                    # Original, immutable
├── interim/                # Intermediate processing steps
├── processed/              # Final, analysis-ready data
└── external/               # Third-party data

Documentation Standards

README.md Essentials

Every project should have a README with:

markdown

# Project Name

Brief description

## Installation

How to set up the environment

## Usage

How to run the analysis/code

## Project Structure

Brief overview of directories

## Data

Where data lives and how to access it

## Results

Where to find outputs
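
The section list in the template above can be checked automatically, for example in CI. The sketch below looks for second-level (or deeper) Markdown headings with the expected titles; the function name and the decision to ignore case are assumptions.

```python
import re

REQUIRED_SECTIONS = ("Installation", "Usage", "Project Structure", "Data", "Results")


def missing_readme_sections(text, required=REQUIRED_SECTIONS):
    """Return the required section titles that have no matching '##' heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{2,}[ \t]+(.+?)[ \t]*$", text, flags=re.MULTILINE)
    }
    return [title for title in required if title.lower() not in headings]


draft = "# Project Name\n\nBrief description\n\n## Installation\n\n## Usage\n"
print(missing_readme_sections(draft))
```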

Code Documentation

•Docstrings for all functions/classes
•Comments for complex logic
•CHANGELOG.md for tracking changes
•TODO.md for tracking work (gitignored or removed before merge)

Common Anti-Patterns to Avoid

❌ Flat structure with everything in root

code

project/
├── script1.py
├── script2.py
├── data.csv
├── output1.png
├── output2.png
└── final_really_final_v3.xlsx

❌ Ambiguous naming

code

notebooks/
├── notebook1.ipynb
├── test.ipynb
├── analysis.ipynb
└── analysis_new.ipynb

❌ Mixed concerns

code

project/
├── src/
│   ├── analysis.py
│   ├── data.csv          # Data in source code directory
│   └── figure1.png       # Output in source code directory
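
Mixed concerns are easy to detect by file extension. The sketch below walks a source directory and reports files that look like data or outputs; the two suffix sets are assumptions and should be adapted to the file types a project actually uses.

```python
from pathlib import Path

# Assumed suffix sets; extend them to match the project.
DATA_SUFFIXES = {".csv", ".tsv", ".xlsx", ".parquet"}
OUTPUT_SUFFIXES = {".png", ".pdf", ".html"}


def misplaced_files(src_dir):
    """Return (relative path, kind) for data or output files found under src_dir."""
    src_dir = Path(src_dir)
    found = []
    for path in sorted(src_dir.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in DATA_SUFFIXES:
            found.append((path.relative_to(src_dir).as_posix(), "data"))
        elif suffix in OUTPUT_SUFFIXES:
            found.append((path.relative_to(src_dir).as_posix(), "output"))
    return found
```

For the layout shown above, `misplaced_files("project/src")` would report `data.csv` as data and `figure1.png` as an output, leaving `analysis.py` alone.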

Cleanup and Maintenance

Regular Maintenance Tasks

•Prune old branches - Delete feature branches once they are merged
•Clean temp files - Remove TODO.md, NOTES.md from completed work
•Update documentation - Keep README current with changes
•Review .gitignore - Ensure large files aren't tracked
•Organize notebooks - Rename/renumber as project evolves

End-of-Project Checklist

•README complete and accurate
•Code documented
•Tests passing
•Large files gitignored
•Working files removed (TODO.md, scratch notebooks)
•Final outputs in results/
•Environment files current
•License added (if applicable)

Integration with Other Skills

This skill works well with:

•python-environment - Environment setup and management
•claude-collaboration - Team workflow best practices
•jupyter-notebook-analysis - Notebook organization standards

Templates and Tools

Quick Project Setup

bash

# Create standard research project structure
# Note: the {a,b,c} brace expansion needs bash or zsh; plain POSIX sh does not expand it
mkdir -p data/{raw,processed,external} notebooks scripts src tests docs results config
touch README.md .gitignore environment.yml
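
Where a shell with brace expansion is not available (for example on Windows), the same directories and files can be created from Python. This is a minimal sketch; the `scaffold` function name is hypothetical, and the lists simply mirror the shell command above.

```python
from pathlib import Path

DIRECTORIES = [
    "data/raw", "data/processed", "data/external",
    "notebooks", "scripts", "src", "tests", "docs", "results", "config",
]
FILES = ["README.md", ".gitignore", "environment.yml"]


def scaffold(root):
    """Create the research project skeleton under root; safe to run more than once."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates an empty file and leaves existing content untouched.
        (root / name).touch(exist_ok=True)
    return root
```

Calling `scaffold("project-name")` creates the skeleton in a new folder, and running it again on an existing project only fills in what is missing.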

Cookiecutter Templates

Consider using cookiecutter for standardized project templates:

•cookiecutter-data-science - Data science projects
•cookiecutter-research - Research projects
•Custom team templates
