Project Architecture Patterns
Purpose
Establish consistent architecture patterns for the rRNA-Phylo project, covering backend design, service organization, API conventions, and testing strategies.
When to Use
This skill activates when:
- Setting up new services or modules
- Designing API endpoints
- Implementing background tasks
- Working with configuration or dependencies
- Writing tests
- Integrating external tools (HMMER, BLAST, alignment tools)
Tech Stack
Core Backend
- FastAPI: Modern async web framework
- Pydantic: Data validation and settings
- SQLAlchemy: ORM for database operations
- Celery: Distributed task queue for long-running jobs
- Redis: Message broker and cache
- PostgreSQL: Primary database (SQLite for dev)
Scientific Computing
- Biopython: Sequence parsing and analysis
- NumPy/Pandas: Numerical computing and data manipulation
- scikit-learn: ML baseline models
- PyTorch: Deep learning (Phase 2+)
Testing & Quality
- pytest: Testing framework
- pytest-asyncio: Async test support
- pytest-cov: Coverage reporting
- httpx: Async HTTP client for API tests
- ruff: Fast linter
- black: Code formatter
Project Structure
backend/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI application
│   ├── config.py            # Settings (via Pydantic)
│   │
│   ├── models/              # SQLAlchemy ORM models
│   │   ├── __init__.py
│   │   ├── base.py          # Base model class
│   │   ├── job.py           # Job model
│   │   ├── sequence.py      # Sequence model
│   │   └── tree.py          # Phylogenetic tree model
│   │
│   ├── schemas/             # Pydantic schemas (API contracts)
│   │   ├── __init__.py
│   │   ├── common.py        # Shared schemas
│   │   ├── rrna.py          # rRNA detection schemas
│   │   ├── phylo.py         # Phylogenetics schemas
│   │   └── job.py           # Job status schemas
│   │
│   ├── api/                 # API routes
│   │   ├── __init__.py
│   │   ├── deps.py          # Dependency injection
│   │   └── v1/
│   │       ├── __init__.py
│   │       ├── router.py    # Main router
│   │       ├── rrna.py      # rRNA endpoints
│   │       ├── phylo.py     # Phylogenetics endpoints
│   │       └── jobs.py      # Job management
│   │
│   ├── services/            # Business logic layer
│   │   ├── rrna/            # rRNA detection service
│   │   ├── phylo/           # Phylogenetics service
│   │   └── sequences/       # Sequence processing
│   │
│   ├── workers/             # Celery tasks
│   │   ├── __init__.py
│   │   ├── celery_app.py    # Celery config
│   │   └── tasks.py         # Task definitions
│   │
│   ├── core/                # Core utilities
│   │   ├── __init__.py
│   │   ├── errors.py        # Custom exceptions
│   │   └── logging.py       # Logging setup
│   │
│   └── db/                  # Database utilities
│       ├── __init__.py
│       ├── session.py       # DB session management
│       └── migrations/      # Alembic migrations
│
├── tests/
│   ├── conftest.py          # Pytest fixtures
│   ├── unit/                # Unit tests
│   ├── integration/         # Integration tests
│   └── fixtures/            # Test data
│
├── requirements.txt
├── pyproject.toml
└── Dockerfile
Core Patterns
1. Configuration Management
Pattern: Use Pydantic Settings for type-safe configuration.
# app/config.py
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Application settings."""

    # API Settings
    api_v1_prefix: str = "/api/v1"
    project_name: str = "rRNA-Phylo"
    debug: bool = False

    # Database
    database_url: str = "sqlite:///./rrna_phylo.db"

    # Celery
    celery_broker_url: str = "redis://localhost:6379/0"
    celery_result_backend: str = "redis://localhost:6379/0"

    # External Tools
    hmmer_path: str = "/usr/local/bin/hmmsearch"
    blast_path: str = "/usr/local/bin/blastn"

    # Data Paths
    hmm_profiles_dir: str = "./data/hmm_profiles"
    blast_db_dir: str = "./data/blast_dbs"

    # ML Models
    ml_models_dir: str = "./models"

    # Load overrides from .env; variable names are matched case-insensitively
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=False)


@lru_cache()
def get_settings() -> Settings:
    """Get cached settings instance."""
    return Settings()
Usage in dependencies:
# app/api/deps.py
from fastapi import Depends
from app.config import Settings, get_settings
async def get_config() -> Settings:
    return get_settings()
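Because `get_settings()` is wrapped in `lru_cache`, the settings object is built once per process. The sketch below illustrates that caching behavior using only the standard library; `DemoSettings`, `get_demo_settings`, and the `DEMO_DEBUG` variable are hypothetical stand-ins, not part of the project, and a dataclass is used here in place of Pydantic's `BaseSettings`.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical stand-in for Settings; reads one value from the environment."""
    debug: bool


@lru_cache()
def get_demo_settings() -> DemoSettings:
    # Evaluated on the first call only; later calls return the cached instance.
    return DemoSettings(debug=os.environ.get("DEMO_DEBUG", "false").lower() == "true")


os.environ["DEMO_DEBUG"] = "false"
first = get_demo_settings()

os.environ["DEMO_DEBUG"] = "true"
cached = get_demo_settings()      # still the instance built before the change

get_demo_settings.cache_clear()   # e.g. in a test fixture that changes the environment
refreshed = get_demo_settings()   # rebuilt from the current environment
```

In tests that modify environment variables, calling `cache_clear()` on the cached accessor is one way to make the change visible.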
2. Service Layer Pattern
Pattern: Encapsulate business logic in service classes.
# app/services/rrna/detector.py
from typing import List, Optional

from app.config import Settings
from app.schemas.rrna import RRNADetectionResult
from app.services.rrna.hmm import HMMDetector
# Assumption: PatternDetector lives in a sibling module; adjust the path to the real location
from app.services.rrna.pattern import PatternDetector


class RRNADetectorService:
    """Service for detecting rRNA in sequences."""

    def __init__(self, settings: Settings):
        self.settings = settings
        self.hmm_detector = HMMDetector(settings.hmm_profiles_dir)
        self.pattern_detector = PatternDetector()

    async def detect(
        self,
        sequence: str,
        methods: Optional[List[str]] = None,
        min_confidence: float = 0.8,
    ) -> List[RRNADetectionResult]:
        """
        Detect rRNA in sequence using multiple methods.

        Args:
            sequence: DNA/RNA sequence string
            methods: Detection methods to use (defaults to HMM and pattern)
            min_confidence: Minimum confidence threshold

        Returns:
            List of detection results
        """
        # Avoid a mutable default argument; build the default list per call
        if methods is None:
            methods = ["hmm", "pattern"]

        results: List[RRNADetectionResult] = []

        if "hmm" in methods:
            hmm_results = await self.hmm_detector.detect(sequence)
            results.extend(hmm_results)

        if "pattern" in methods:
            pattern_results = await self.pattern_detector.detect(sequence)
            results.extend(pattern_results)

        # Filter by confidence
        results = [r for r in results if r.confidence >= min_confidence]

        # Deduplicate and merge
        return self._merge_results(results)

    def _merge_results(
        self,
        results: List[RRNADetectionResult],
    ) -> List[RRNADetectionResult]:
        """Merge overlapping detections."""
        # Merge logic is left to the implementation; return the input unchanged
        # so that detect() still honors its declared return type.
        return results
Usage in API:
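# app/api/v1/rrna.py
from fastapi import APIRouter, Depends

from app.api.deps import get_rrna_detector
from app.schemas.rrna import RRNADetectionRequest
from app.services.rrna.detector import RRNADetectorService

router = APIRouter()


@router.post("/detect")
async def detect_rrna(
    request: RRNADetectionRequest,
    detector: RRNADetectorService = Depends(get_rrna_detector),
):
    """Detect rRNA in the submitted sequence."""
    # Taking the Pydantic schema as the parameter makes FastAPI read a JSON body;
    # a bare `sequence: str` parameter would be treated as a query parameter.
    results = await detector.detect(
        request.sequence,
        methods=[m.value for m in request.methods],
        min_confidence=request.min_confidence,
    )
    return {"results": results}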
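The service above leaves `_merge_results` unimplemented. One possible merge policy is sketched below, assuming half-open `[start, end)` coordinates and that overlapping hits of the same rRNA type should collapse into one span carrying the highest confidence. `Detection` and `merge_overlapping` are hypothetical names, and `Detection` is a reduced stand-in for `RRNADetectionResult`; the project may well choose a different policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical, reduced stand-in for RRNADetectionResult."""
    rrna_type: str
    start: int
    end: int
    confidence: float


def merge_overlapping(results: List[Detection]) -> List[Detection]:
    """Collapse overlapping hits of the same type into one span."""
    merged: List[Detection] = []
    # Sorting by type, then position, puts candidates for merging next to each other.
    for hit in sorted(results, key=lambda r: (r.rrna_type, r.start, r.end)):
        last = merged[-1] if merged else None
        if last is not None and last.rrna_type == hit.rrna_type and hit.start < last.end:
            # Overlap: widen the span and keep the better confidence.
            last.end = max(last.end, hit.end)
            last.confidence = max(last.confidence, hit.confidence)
        else:
            # Append a copy so the caller's objects are never mutated.
            merged.append(Detection(hit.rrna_type, hit.start, hit.end, hit.confidence))
    return merged
```

Hits that merely touch (one ends where the next starts) are kept separate under this policy.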
3. Pydantic Schemas (API Contracts)
Pattern: Define clear input/output schemas with validation.
# app/schemas/rrna.py
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field, field_validator


class RRNAType(str, Enum):
    """Supported rRNA types."""
    SSU_16S = "16S"
    SSU_18S = "18S"
    LSU_23S = "23S"
    LSU_28S = "28S"
    FIVE_S = "5S"
    FIVE_EIGHT_S = "5.8S"


class DetectionMethod(str, Enum):
    """Detection methods."""
    HMM = "hmm"
    BLAST = "blast"
    PATTERN = "pattern"
    ML = "ml"


class RRNADetectionRequest(BaseModel):
    """Request schema for rRNA detection."""
    sequence: str = Field(..., min_length=100, max_length=100000)
    methods: List[DetectionMethod] = Field(
        default=[DetectionMethod.HMM, DetectionMethod.PATTERN]
    )
    min_confidence: float = Field(default=0.8, ge=0.0, le=1.0)

    # Pydantic v2 validator style, matching the v2-only `pattern=` and
    # `json_schema_extra` options used elsewhere in this module
    @field_validator("sequence")
    @classmethod
    def validate_sequence(cls, v: str) -> str:
        """Ensure sequence contains only valid nucleotides."""
        valid_chars = set("ACGTUNacgtun-")
        if not set(v).issubset(valid_chars):
            raise ValueError("Sequence contains invalid characters")
        return v.upper()


class ConservedRegion(BaseModel):
    """Conserved region within detected rRNA."""
    name: str
    start: int
    end: int
    sequence: str
    confidence: float


class RRNADetectionResult(BaseModel):
    """Result schema for rRNA detection."""
    rrna_type: RRNAType
    start: int
    end: int
    length: int
    confidence: float
    method: DetectionMethod
    score: float

    # Quality metrics
    completeness: float = Field(..., ge=0.0, le=1.0)
    quality: str = Field(..., pattern="^(high|medium|low|very_low)$")

    # Optional details
    conserved_regions: Optional[List[ConservedRegion]] = None
    secondary_structure: Optional[str] = None

    model_config = ConfigDict(
        json_schema_extra={
            "example": {
                "rrna_type": "16S",
                "start": 0,
                "end": 1542,
                "length": 1542,
                "confidence": 0.95,
                "method": "hmm",
                "score": 1250.5,
                "completeness": 0.98,
                "quality": "high",
            }
        }
    )
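The nucleotide check in the request schema can also live in a plain function, which lets services and unit tests reuse it without constructing a Pydantic model. The following is a minimal sketch under that assumption; `normalize_sequence` and `VALID_NUCLEOTIDES` are hypothetical names, and reporting the offending characters is an addition not present in the schema validator.

```python
VALID_NUCLEOTIDES = frozenset("ACGTUN-")


def normalize_sequence(raw: str) -> str:
    """Uppercase a sequence and reject characters outside the allowed alphabet."""
    seq = raw.upper()
    # Collect offending characters so the error message is actionable.
    invalid = sorted(set(seq) - VALID_NUCLEOTIDES)
    if invalid:
        raise ValueError(f"Sequence contains invalid characters: {''.join(invalid)}")
    return seq
```

A schema validator could then delegate to this function and simply return its result.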
4. Dependency Injection
Pattern: Use FastAPI's dependency injection for services.
# app/api/deps.py
from typing import Generator

from fastapi import Depends
from sqlalchemy.orm import Session

from app.config import Settings, get_settings
from app.db.session import SessionLocal
from app.services.rrna.detector import RRNADetectorService


# Database dependency
def get_db() -> Generator:
    """Get database session."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()


# Service dependencies
def get_rrna_detector(
    settings: Settings = Depends(get_settings)
) -> RRNADetectorService:
    """Get rRNA detector service."""
    return RRNADetectorService(settings)


def get_phylo_service(
    settings: Settings = Depends(get_settings),
    db: Session = Depends(get_db)
):
    """Get phylogenetic analysis service."""
    from app.services.phylo.tree_builder import PhyloService
    return PhyloService(settings, db)
5. Error Handling
Pattern: Use custom exceptions and FastAPI exception handlers.
# app/core/errors.py
class RRNAPhyloException(Exception):
    """Base exception for rRNA-Phylo."""
    pass


class SequenceValidationError(RRNAPhyloException):
    """Raised when sequence validation fails."""
    pass


class DetectionError(RRNAPhyloException):
    """Raised when rRNA detection fails."""
    pass


class AlignmentError(RRNAPhyloException):
    """Raised when sequence alignment fails."""
    pass


class TreeBuildingError(RRNAPhyloException):
    """Raised when tree building fails."""
    pass


# app/main.py
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

from app.core.errors import RRNAPhyloException

app = FastAPI()


@app.exception_handler(RRNAPhyloException)
async def rrna_phylo_exception_handler(
    request: Request,
    exc: RRNAPhyloException
):
    """Handle custom exceptions."""
    return JSONResponse(
        status_code=400,
        content={
            "error": exc.__class__.__name__,
            "message": str(exc)
        }
    )
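The handler above answers every custom exception with status 400. If different failure classes should map to different status codes, a lookup keyed on the exception type is one option. The sketch below redefines a subset of the exception hierarchy so it runs on its own; `STATUS_BY_EXCEPTION`, `status_for`, and the specific status codes chosen are assumptions for illustration, not project decisions.

```python
from typing import Dict, Type


class RRNAPhyloException(Exception):
    """Base exception (redefined here so the sketch is self-contained)."""


class SequenceValidationError(RRNAPhyloException):
    """Raised when sequence validation fails."""


class DetectionError(RRNAPhyloException):
    """Raised when rRNA detection fails."""


# Hypothetical mapping: client-side input problems vs. server-side tool failures.
STATUS_BY_EXCEPTION: Dict[Type[Exception], int] = {
    SequenceValidationError: 422,
    DetectionError: 500,
}


def status_for(exc: Exception, default: int = 400) -> int:
    """Return the status code for the closest mapped class in the exception's MRO."""
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_EXCEPTION:
            return STATUS_BY_EXCEPTION[cls]
    return default
```

Walking the MRO means subclasses of a mapped exception inherit its status code unless they get their own entry.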
6. Async Task Processing (Celery)
Pattern: Use Celery for long-running tasks.
# app/workers/celery_app.py
from celery import Celery

from app.config import get_settings

settings = get_settings()

celery_app = Celery(
    "rrna_phylo",
    broker=settings.celery_broker_url,
    backend=settings.celery_result_backend
)

celery_app.conf.update(
    task_serializer="json",
    accept_content=["json"],
    result_serializer="json",
    timezone="UTC",
    enable_utc=True,
    task_track_started=True,
    task_time_limit=3600,  # 1 hour max
)
# app/workers/tasks.py
from typing import List

from app.config import get_settings
from app.db.session import SessionLocal
from app.services.phylo.tree_builder import PhyloService
from app.workers.celery_app import celery_app


@celery_app.task(bind=True)
def build_phylogenetic_tree(
    self,
    sequences: List[dict],
    method: str,
    parameters: dict
):
    """
    Build phylogenetic tree (long-running task).

    Args:
        self: Task instance (for progress updates)
        sequences: List of sequences
        method: Tree building method
        parameters: Method parameters
    """
    # PhyloService takes settings and a DB session, as in get_phylo_service (app/api/deps.py)
    db = SessionLocal()
    try:
        service = PhyloService(get_settings(), db)

        # Update progress
        self.update_state(state="PROGRESS", meta={"stage": "alignment"})

        # Align sequences
        alignment = service.align_sequences(sequences)

        self.update_state(state="PROGRESS", meta={"stage": "tree_building"})

        # Build tree
        tree = service.build_tree(alignment, method, parameters)

        return {"tree": tree, "status": "completed"}
    finally:
        # Exceptions propagate unchanged: Celery records the FAILURE state
        # and traceback itself, so no manual update_state is needed here.
        db.close()
API integration:
# app/api/v1/phylo.py
from fastapi import APIRouter

# Assumption: TreeBuildRequest is defined alongside the other phylogenetics schemas
from app.schemas.phylo import TreeBuildRequest
from app.workers.tasks import build_phylogenetic_tree

router = APIRouter()


@router.post("/tree")
async def create_tree(request: TreeBuildRequest):
    """Submit tree building job."""
    # Submit Celery task
    task = build_phylogenetic_tree.delay(
        sequences=request.sequences,
        method=request.method,
        parameters=request.parameters
    )
    return {
        "job_id": task.id,
        "status": "submitted"
    }


@router.get("/tree/{job_id}")
async def get_tree_status(job_id: str):
    """Get tree building job status."""
    task = build_phylogenetic_tree.AsyncResult(job_id)
    return {
        "job_id": job_id,
        "status": task.state,
        # On failure task.result holds the exception instance, which is not
        # JSON-serializable, so expose the return value only on success.
        "result": task.result if task.successful() else None,
        "error": str(task.result) if task.failed() else None
    }
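The status endpoint has to turn a Celery state plus whatever the result backend stored into a JSON-safe body. The sketch below isolates that mapping in a pure function so it can be tested without a broker. `summarize_task` is a hypothetical helper; it assumes the payload is the task's return value on `SUCCESS`, the raised exception on `FAILURE`, and the `meta` dict for the custom `PROGRESS` state used by the task above.

```python
import json
from typing import Any, Dict


def summarize_task(job_id: str, state: str, payload: Any) -> Dict[str, Any]:
    """Build a JSON-serializable status body from a task state and its payload."""
    body: Dict[str, Any] = {
        "job_id": job_id,
        "status": state,
        "result": None,
        "error": None,
        "progress": None,
    }
    if state == "SUCCESS":
        body["result"] = payload
    elif state == "FAILURE":
        # The payload is an exception instance; keep only its message.
        body["error"] = str(payload)
    elif state == "PROGRESS" and isinstance(payload, dict):
        body["progress"] = payload
    return body


# Usage: a failed job still produces a body that serializes cleanly.
failed = summarize_task("job-1", "FAILURE", ValueError("alignment failed"))
encoded = json.dumps(failed)
```

States with no payload of interest, such as `PENDING`, fall through with all optional fields left as `None`.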
Testing Patterns
1. Test Structure
# tests/conftest.py
import pytest
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from app.api.deps import get_db  # get_db is defined in app/api/deps.py
from app.main import app
# Assumption: the declarative Base is the "Base model class" in app/models/base.py
from app.models.base import Base

# Test database
SQLALCHEMY_TEST_URL = "sqlite:///./test.db"


@pytest.fixture(scope="session")
def test_engine():
    """Create test database engine."""
    # TestClient may call the app from another thread, which SQLite rejects by default
    engine = create_engine(
        SQLALCHEMY_TEST_URL, connect_args={"check_same_thread": False}
    )
    Base.metadata.create_all(bind=engine)
    yield engine
    Base.metadata.drop_all(bind=engine)


@pytest.fixture(scope="function")
def test_db(test_engine):
    """Create test database session."""
    TestSessionLocal = sessionmaker(bind=test_engine)
    db = TestSessionLocal()
    try:
        yield db
    finally:
        db.close()


@pytest.fixture
def client(test_db):
    """Create test client."""
    def override_get_db():
        yield test_db

    app.dependency_overrides[get_db] = override_get_db
    yield TestClient(app)
    app.dependency_overrides.clear()


@pytest.fixture
def sample_16s_sequence():
    """Sample 16S rRNA sequence for testing."""
    return "AGAGTTTGATCCTGGCTCAG..."  # Truncated
2. Unit Test Example
# tests/unit/test_rrna_detector.py
import pytest

from app.config import get_settings
from app.services.rrna.detector import RRNADetectorService


@pytest.mark.asyncio
async def test_detect_16s_rrna(sample_16s_sequence):
    """Test 16S rRNA detection."""
    # get_settings is a plain function, not a pytest fixture, so import and call it
    detector = RRNADetectorService(get_settings())
    results = await detector.detect(sample_16s_sequence)

    assert len(results) > 0
    assert results[0].rrna_type == "16S"
    assert results[0].confidence >= 0.8
3. Integration Test Example
# tests/integration/test_api.py
def test_detect_rrna_endpoint(client, sample_16s_sequence):
    """Test rRNA detection API endpoint."""
    response = client.post(
        "/api/v1/rrna/detect",
        json={
            "sequence": sample_16s_sequence,
            "methods": ["hmm", "pattern"],
            "min_confidence": 0.8
        }
    )

    assert response.status_code == 200
    data = response.json()
    assert "results" in data
    assert len(data["results"]) > 0
API Design Conventions
1. RESTful Endpoints
# rRNA Detection
POST /api/v1/rrna/detect # Detect rRNA
GET /api/v1/rrna/types # List supported types
# Phylogenetics
POST /api/v1/phylo/align # Align sequences
POST /api/v1/phylo/tree # Build tree
POST /api/v1/phylo/bootstrap # Bootstrap analysis
POST /api/v1/phylo/consensus # Consensus tree
# Jobs
GET /api/v1/jobs # List jobs
GET /api/v1/jobs/{id} # Get job status
DELETE /api/v1/jobs/{id} # Cancel job
# Health
GET /health # Health check
GET /metrics # Prometheus metrics
2. Response Format
# Success response
{
    "success": true,
    "data": { ... },
    "metadata": {
        "timestamp": "2025-11-20T12:00:00Z",
        "version": "1.0.0"
    }
}

# Error response
{
    "success": false,
    "error": {
        "code": "VALIDATION_ERROR",
        "message": "Invalid sequence format",
        "details": { ... }
    }
}
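The two envelopes above can be produced by small helper functions so that every route returns the same shape. This is a minimal sketch, not the project's implementation: `success_response`, `error_response`, and `API_VERSION` are hypothetical names, and the version string is copied from the example payload.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

API_VERSION = "1.0.0"  # assumption: taken from the example payload


def success_response(data: Any, now: Optional[datetime] = None) -> Dict[str, Any]:
    """Wrap a payload in the success envelope with a UTC timestamp."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "success": True,
        "data": data,
        "metadata": {
            "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "version": API_VERSION,
        },
    }


def error_response(
    code: str, message: str, details: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build the error envelope; details defaults to an empty object."""
    return {
        "success": False,
        "error": {"code": code, "message": message, "details": details or {}},
    }
```

Accepting `now` as a parameter keeps the timestamp deterministic in tests.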
Best Practices
✅ DO
- Use type hints everywhere
- Write docstrings for public APIs
- Use Pydantic for validation
- Implement proper logging
- Write tests for new features
- Use async/await for I/O operations
- Separate concerns (routes, services, models)
- Use dependency injection
- Handle errors gracefully
- Document API with examples
❌ DON'T
- Mix business logic with API routes
- Use synchronous I/O in async functions
- Hardcode configuration values
- Skip input validation
- Ignore exceptions
- Write god classes
- Couple services tightly
- Skip tests
- Use global state
- Block the event loop
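To illustrate the last two DON'T items about synchronous I/O and blocking the event loop: a blocking call can be handed to a worker thread with `asyncio.to_thread` (Python 3.9+) so other coroutines keep running. In this sketch `blocking_work` and `heartbeat` are hypothetical stand-ins for a synchronous tool invocation and for other concurrent request handling.

```python
import asyncio
import time
from typing import List, Tuple


def blocking_work(seconds: float) -> str:
    """Stand-in for a synchronous call such as a subprocess or file read."""
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks: List[int]) -> None:
    """Stand-in for other coroutines that must stay responsive."""
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.01)


async def main() -> Tuple[str, List[int]]:
    ticks: List[int] = []
    # The blocking call runs in a thread while heartbeat proceeds on the loop.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_work, 0.05),
        heartbeat(ticks),
    )
    return result, ticks


result, ticks = asyncio.run(main())
```

Calling `blocking_work` directly inside a coroutine would instead stall every other task for the full duration of the sleep.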
External Tool Integration Pattern
# app/services/rrna/hmm.py
import asyncio
import subprocess
import tempfile
from pathlib import Path

from app.core.errors import DetectionError


class HMMDetector:
    """HMM-based rRNA detection using HMMER."""

    def __init__(self, profiles_dir: str, hmmsearch_path: str = "hmmsearch"):
        self.profiles_dir = Path(profiles_dir)
        self.hmmsearch_path = hmmsearch_path

    async def detect(self, sequence: str, rrna_type: str) -> dict:
        """Run HMMER search."""
        profile = self.profiles_dir / f"{rrna_type}.hmm"

        # Write the query to a temp FASTA file that exists for the duration of the search
        with tempfile.NamedTemporaryFile(mode="w", suffix=".fasta") as seq_file:
            seq_file.write(f">query\n{sequence}\n")
            seq_file.flush()

            # subprocess.run blocks, so run it in a worker thread
            # to keep the event loop free
            result = await asyncio.to_thread(
                subprocess.run,
                [
                    self.hmmsearch_path,
                    "-o", "/dev/null",          # discard the human-readable report
                    "--tblout", "/dev/stdout",  # tabular hits go to stdout
                    str(profile),
                    seq_file.name,
                ],
                capture_output=True,
                text=True,
                timeout=300,
            )

        if result.returncode != 0:
            raise DetectionError(f"HMMER failed: {result.stderr}")

        # Parse output
        return self._parse_hmmer_output(result.stdout)
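The detector above calls `_parse_hmmer_output`, which is not shown. A possible parser is sketched below, assuming the whitespace-delimited per-target table that `hmmsearch --tblout` writes: comment lines start with `#`, and each hit row has 18 fixed columns followed by a free-text description. `parse_tblout` and the keys of the returned dictionaries are hypothetical, and other HMMER programs (for example `nhmmer`) use a different column layout.

```python
from typing import Any, Dict, List


def parse_tblout(text: str) -> List[Dict[str, Any]]:
    """Parse hmmsearch --tblout text into one dict per hit row."""
    hits: List[Dict[str, Any]] = []
    for line in text.splitlines():
        # Skip blank lines and the '#'-prefixed header/footer comments.
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed columns, then a description that may contain spaces.
        fields = line.split(maxsplit=18)
        if len(fields) < 18:
            continue  # malformed row; ignore rather than guess
        hits.append(
            {
                "target": fields[0],
                "query": fields[2],
                "evalue": float(fields[4]),  # full-sequence E-value
                "score": float(fields[5]),   # full-sequence bit score
                "bias": float(fields[6]),
                "description": fields[18] if len(fields) > 18 else "",
            }
        )
    return hits
```

Mapping these hits onto `RRNADetectionResult` (coordinates, confidence, quality) would need additional information, such as the domain table, that this sketch does not cover.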
Related Skills: rRNA-prediction-patterns, ml-integration-patterns
Line Count: < 500 lines ✅