Software Developer Skill
Purpose
Implement production-quality bioinformatics software from technical specifications with comprehensive testing, documentation, and error handling.
When to Use This Skill
Use this skill when you need to:
- •Implement software from an architecture specification
- •Write production-ready code (not exploratory analysis)
- •Create command-line tools or packages
- •Build reusable libraries
- •Ensure code quality through testing
Workflow Integration
Pattern: Receive Spec → Implement → Test → Document → Deliver
code
Systems Architect provides technical spec
↓
Software Developer implements
↓ (copilot reviews continuously)
Biologist Commentator validates biological correctness
↓
Production-ready software
Core Capabilities
1. Implementation from Spec
- •Translate architecture into working code
- •Modular, reusable functions/classes
- •Follow coding standards (PEP 8)
- •Type hints for clarity
2. Error Handling
- •Try/except with informative messages
- •Validate inputs
- •Graceful failure
- •Logging for debugging
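The four practices above can be combined in one small function. The following is a minimal sketch, not part of this skill's assets: `validate_sample_sheet`, the `sample_id` column, and the CSV layout are hypothetical names chosen for illustration.

```python
import csv
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Hypothetical required column, chosen only for this illustration.
REQUIRED_COLUMNS = ("sample_id",)


def validate_sample_sheet(path: Path) -> list[dict[str, str]]:
    """Read a CSV sample sheet and fail early with actionable messages."""
    if not path.is_file():
        raise FileNotFoundError(
            f"Sample sheet not found: {path}\n"
            f"Expected location: {path.absolute()}"
        )
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []  # None for a completely empty file
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            raise ValueError(
                f"Sample sheet {path} is missing column(s): {', '.join(missing)}\n"
                f"Found columns: {', '.join(header) or '(none)'}"
            )
        rows = list(reader)
    if not rows:
        raise ValueError(f"Sample sheet {path} has a header but no samples")
    logger.info("Validated %d samples from %s", len(rows), path)
    return rows
```

Each failure names the file, states what was expected, and reports what was found, so the user can fix the input without reading the source.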
3. Testing
- •Unit tests (pytest)
- •Integration tests
- •Edge case coverage
- •>80% code coverage goal
4. Documentation
- •Docstrings (Google style)
- •README with usage examples
- •API reference
- •Troubleshooting guide
5. CLI Interface
- •argparse or Click
- •Help messages
- •Progress bars for long operations
- •Sensible defaults
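For projects that prefer the standard library over Click, the same points can be covered with argparse. This is a rough sketch under assumptions: the option names mirror the `qc_pipeline` CLI template in this document, and `report_progress` is a hypothetical stdlib stand-in for a real progress-bar library.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser with help text and defaults for every option."""
    parser = argparse.ArgumentParser(
        prog="qc_pipeline",
        description="Run QC pipeline on samples.",
        # Appends "(default: ...)" to each option's help message.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("sample_file", help="CSV file listing samples")
    parser.add_argument("-o", "--output", default="results/", help="Output directory")
    parser.add_argument("-t", "--threads", type=int, default=4, help="Number of threads")
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose logging")
    return parser


def report_progress(done: int, total: int) -> None:
    """Rewrite a single status line on stderr so stdout stays clean for results."""
    sys.stderr.write(f"\rProcessed {done}/{total} samples")
    if done == total:
        sys.stderr.write("\n")
    sys.stderr.flush()


# Parsing an explicit argument list keeps the example runnable without a shell.
args = build_parser().parse_args(["samples.csv", "--threads", "8"])
```

Keeping parser construction in its own function lets tests call `build_parser().parse_args([...])` directly instead of spawning a subprocess.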
Standard Package Structure
Use assets/package_structure_template/:
code
project_name/
├── src/
│   ├── __init__.py
│   ├── module1.py
│   ├── module2.py
│   └── cli.py
├── tests/
│   ├── test_module1.py
│   ├── test_module2.py
│   ├── fixtures/
│   └── test_data/
├── docs/
│   ├── usage.md
│   └── api.md
├── README.md
├── setup.py
├── pyproject.toml
├── requirements.txt
├── environment.yml
└── .gitignore
Code Quality Standards
Docstring Format (Google Style)
python
import pandas as pd


def calculate_cpm(counts: pd.DataFrame) -> pd.DataFrame:
    """Calculate counts per million (CPM) normalization.

    Args:
        counts: Raw count matrix (genes × samples).

    Returns:
        CPM-normalized counts with the same shape as `counts`.

    Raises:
        ValueError: If counts contain negative values.

    Examples:
        >>> counts = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
        >>> cpm = calculate_cpm(counts)
        >>> round(float(cpm['A'].sum()))  # each sample sums to ~1,000,000
        1000000
    """
    if (counts < 0).any().any():
        raise ValueError("Counts cannot be negative")
    # Divide each column (sample) by its total, then scale to one million.
    return (counts / counts.sum(axis=0)) * 1e6
Error Handling
python
from pathlib import Path

import pandas as pd

# ✅ Good: Informative error messages
# `filepath` is the input path supplied by the caller.
try:
    data = pd.read_csv(filepath)
except FileNotFoundError as exc:
    raise FileNotFoundError(
        f"Data file not found: {filepath}\n"
        f"Expected location: {Path(filepath).absolute()}"
    ) from exc
except pd.errors.EmptyDataError as exc:
    raise ValueError(
        f"Data file is empty: {filepath}\n"
        "Check that the file was generated correctly"
    ) from exc
Logging
python
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


def process_samples(sample_list):
    logger.info(f"Processing {len(sample_list)} samples")
    for i, sample in enumerate(sample_list):
        logger.debug(f"Processing sample {i+1}/{len(sample_list)}: {sample}")
        # ... processing code ...
    logger.info("Processing complete")
Testing with pytest
python
# tests/test_normalization.py
import pytest
import pandas as pd
import numpy as np

from src.normalization import calculate_cpm


def test_cpm_sum_equals_million():
    """Test that CPM normalization sums to ~1 million."""
    counts = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})
    cpm = calculate_cpm(counts)
    assert np.allclose(cpm.sum(axis=0), 1e6)


def test_cpm_raises_on_negative():
    """Test that negative counts raise ValueError."""
    counts = pd.DataFrame({'A': [-10, 20], 'B': [30, 40]})
    with pytest.raises(ValueError, match="negative"):
        calculate_cpm(counts)


def test_cpm_handles_zero_sum():
    """Test behavior when column sums to zero."""
    counts = pd.DataFrame({'A': [0, 0], 'B': [10, 20]})
    cpm = calculate_cpm(counts)
    # Plain division yields NaN for an all-zero column (0/0). If the spec
    # calls for raising instead, replace these asserts with pytest.raises.
    assert cpm['A'].isna().all()
    assert np.isclose(cpm['B'].sum(), 1e6)
CLI Template
See assets/cli_template.py:
python
#!/usr/bin/env python3
"""
QC Pipeline CLI

Usage:
    qc_pipeline samples.csv --output results/
"""
import click
import logging
from pathlib import Path


@click.command()
@click.argument('sample_file', type=click.Path(exists=True))
@click.option('--output', '-o', default='results/', help='Output directory')
@click.option('--threads', '-t', default=4, help='Number of threads')
@click.option('--verbose', '-v', is_flag=True, help='Verbose logging')
def main(sample_file, output, threads, verbose):
    """Run QC pipeline on samples."""
    # Setup logging
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(level=level)
    logger = logging.getLogger(__name__)

    # Create the output directory (click.Path already checked that sample_file exists)
    output_dir = Path(output)
    output_dir.mkdir(parents=True, exist_ok=True)

    logger.info(f"Processing samples from {sample_file}")
    logger.info(f"Output directory: {output_dir}")
    logger.info(f"Using {threads} threads")

    # Main logic
    try:
        # ... pipeline code ...
        logger.info("Pipeline complete!")
    except Exception as e:
        logger.error(f"Pipeline failed: {e}")
        raise


if __name__ == '__main__':
    main()
Testing Strategy
1. Unit Tests
Test individual functions in isolation.
2. Integration Tests
Test components working together.
3. Regression Tests
Save expected outputs and compare them against the current outputs.
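One way to keep a saved baseline is a JSON file checked into the test data directory. The sketch below is illustrative only: `check_against_baseline` and its `update` flag are hypothetical names, and it assumes the result is a JSON-serializable dict with string keys.

```python
import json
from pathlib import Path


def check_against_baseline(result: dict, baseline_path: Path, update: bool = False) -> None:
    """Compare `result` with a stored JSON baseline.

    With update=True the baseline is (re)written instead of compared, which is
    how an intentional change in output gets accepted.
    """
    if update:
        baseline_path.parent.mkdir(parents=True, exist_ok=True)
        baseline_path.write_text(json.dumps(result, indent=2, sort_keys=True))
        return
    if not baseline_path.exists():
        raise FileNotFoundError(
            f"No baseline at {baseline_path}; run once with update=True to create it"
        )
    expected = json.loads(baseline_path.read_text())
    assert result == expected, (
        f"Output differs from baseline {baseline_path}\n"
        f"expected: {expected}\n"
        f"current:  {result}"
    )
```

A missing baseline raises rather than being created silently, so a deleted or misplaced file cannot turn the check into a no-op. For floating-point outputs an exact comparison may be too strict; a tolerance-based comparison such as `math.isclose` is a common alternative.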
4. Edge Case Tests
- •Empty input
- •Single element
- •All zeros
- •Missing values
- •Very large input
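The cases above lend themselves to a table-driven check. The following sketch uses a simplified, list-based stand-in named `cpm` (hypothetical, so that it runs without pandas) rather than the `calculate_cpm` function shown earlier. Its behavior for empty and all-zero input is an assumption made for the example, not a requirement of this skill.

```python
import math


def cpm(values: list[float]) -> list[float]:
    """Scale one sample's counts to counts per million (list-based stand-in)."""
    if any(v is None or math.isnan(v) for v in values):
        raise ValueError("Counts contain missing values")
    if any(v < 0 for v in values):
        raise ValueError("Counts cannot be negative")
    if not values:
        return []  # assumption: empty input yields empty output
    total = sum(values)
    if total == 0:
        # Assumption: an all-zero sample is an error rather than NaN.
        raise ValueError("Total count is zero; CPM is undefined")
    return [v / total * 1e6 for v in values]


# (description, input, expected exception type or None for success)
EDGE_CASES = [
    ("empty input", [], None),
    ("single element", [7], None),
    ("all zeros", [0, 0, 0], ValueError),
    ("missing values", [1, float("nan")], ValueError),
    ("very large input", [1] * 1_000_000, None),
]


def run_edge_cases() -> None:
    """Run every edge case; raise AssertionError naming the first failure."""
    for name, values, expected_error in EDGE_CASES:
        if expected_error is not None:
            try:
                cpm(values)
            except expected_error:
                continue
            raise AssertionError(f"{name}: expected {expected_error.__name__}")
        result = cpm(values)
        assert len(result) == len(values), name
        if values:
            assert math.isclose(sum(result), 1e6), name
```

In a pytest suite the same table maps directly onto `pytest.mark.parametrize`, which reports each case as a separate test.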
Copilot Integration
During implementation:
- •Write code section
- •Copilot reviews immediately
- •Fix critical issues before proceeding
- •Iterate until approved
- •Move to next section
Quality Checklist
Before delivery:
- • All code passes tests (pytest)
- • >80% test coverage
- • All public functions documented
- • Error messages are actionable
- • CLI help message clear
- • README with installation + usage
- • Example data/workflow provided
- • Copilot approved (no critical issues)
- • Biologist validated (biological correctness)
References
For detailed standards:
- •references/coding_standards.md - PEP 8, naming, function length
- •references/testing_patterns.md - pytest, fixtures, mocking
- •references/error_handling_guide.md - Exception hierarchy, logging
- •references/documentation_standards.md - Docstrings, README, API docs
Scripts
Available in scripts/:
- •project_template_generator.py - Creates project structure
- •test_runner.py - Runs pytest with coverage
Success Criteria
Code is ready for production when:
- • Implements full specification
- • All tests pass
- • Coverage >80%
- • Documentation complete
- • CLI functional
- • Copilot approved
- • Biologist validated
- • Ready for deployment