Research Software Development Skill

This skill provides guidance for research-driven software development, where the primary goals are scientific correctness, reproducibility, and clarity, while applying engineering rigor selectively when software evolves toward long-lived tools, shared codebases, or community-facing packages.

It draws inspiration from Clean Architecture and Domain-Driven Design (DDD), but adapts them to the realities of scientific computing, numerical modeling, and exploratory research workflows (e.g., InSAR/GNSS processing, geophysical modeling, and data analysis pipelines).

Guiding Philosophy

•Science first, engineering second: prioritize correctness, traceability, and interpretability over premature abstraction.
•Reproducibility over cleverness: code should make experiments easy to reproduce and audit.
•Progressive rigor: notebooks and scripts are acceptable early; architectural structure increases as code matures.
•Models are research assets: algorithms, physical models, and inversion logic are more valuable than infrastructure glue.

Code Style Rules

General Principles

•Early return pattern: prefer early returns over deeply nested conditionals to improve readability.
•Avoid copy-paste logic in scientific workflows; extract reusable functions for components such as:
  - forward models
  - kernels / Green's functions
  - misfit and regularization terms
•Decompose functions that exceed 80 lines when possible.
•Split files that exceed 200 lines only if doing so improves conceptual clarity (not just to satisfy style).
•Prefer pure functions for scientific computation (inputs → outputs, no hidden state).
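
The sketch below combines two of these rules: guard clauses that exit early on invalid input, and a pure function whose result depends only on its arguments. The InSAR-flavoured function name and the convention that the line-of-sight (LOS) vector points from ground to satellite are illustrative assumptions, not part of this guide.

```python
import numpy as np


def project_enu_to_los_m(enu_displacement_m, los_unit_vector_enu):
    """Project east/north/up displacements (metres) onto a radar line of sight.

    Hypothetical convention (an assumption): the LOS unit vector points from
    the ground toward the satellite, so positive output means motion toward it.
    Pure function: no globals, no file access, no plotting.
    """
    enu = np.asarray(enu_displacement_m, dtype=float)
    los = np.asarray(los_unit_vector_enu, dtype=float)

    # Guard clauses: reject invalid input early instead of nesting the main logic.
    if enu.ndim == 0 or enu.shape[-1] != 3:
        raise ValueError(f"expected last axis of length 3 (E, N, U), got shape {enu.shape}")
    if los.shape != (3,):
        raise ValueError(f"expected one 3-component LOS vector, got shape {los.shape}")
    if not np.isclose(np.linalg.norm(los), 1.0):
        raise ValueError("LOS vector must have unit length")

    # Dot product over the component axis; works for one point or an (N, 3) batch.
    return enu @ los
```

Because the function touches no hidden state, it can be tested, reused in an inversion, or called from a notebook without side effects.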

Numerical and Scientific Code Practices

•Make units, coordinate systems, and conventions explicit in names and documentation.
•Avoid magic numbers; define physical constants and hyperparameters clearly.
•Favor readability over micro-optimizations unless the code is on a proven performance-critical path.
•Numerical stability and physical interpretability take precedence over abstraction purity.
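
A minimal sketch of explicit units, named constants, and a documented sign convention. It assumes a Sentinel-1 C-band centre frequency of 5.405 GHz. The phase-to-displacement sign convention is also an assumption, because processing chains differ on it.

```python
import math

# Physical constants and sensor parameters, with units in the names.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
SENTINEL1_CENTER_FREQUENCY_HZ = 5.405e9  # assumption: Sentinel-1 C-band centre frequency
SENTINEL1_WAVELENGTH_M = SPEED_OF_LIGHT_M_PER_S / SENTINEL1_CENTER_FREQUENCY_HZ


def unwrapped_phase_to_los_displacement_m(
    unwrapped_phase_rad: float,
    wavelength_m: float = SENTINEL1_WAVELENGTH_M,
) -> float:
    """Convert unwrapped interferometric phase (radians) to LOS displacement (metres).

    Sign convention (an assumption; processors differ): positive LOS
    displacement means motion toward the satellite, so positive phase maps
    to negative displacement.
    """
    # Two-way path: one 2*pi phase cycle corresponds to half a wavelength of LOS motion.
    return -wavelength_m * unwrapped_phase_rad / (4.0 * math.pi)
```

Deriving the wavelength from named constants, rather than typing a bare 0.0555, lets a reviewer check where the number comes from.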

Library-First (but Research-Aware) Approach

Preferred Strategy

•ALWAYS check existing scientific libraries before writing custom implementations:
  - numerical linear algebra (e.g., BLAS/LAPACK wrappers)
  - optimization and inversion libraries
  - geospatial and remote-sensing toolkits
•Reuse well-tested libraries for:
  - IO, file formats, coordinate transforms
  - plotting and visualization
  - parallelization and acceleration
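
As an illustration of library reuse, the sketch below fits an offset, a linear rate, and an annual sinusoid to one GNSS position component with `numpy.linalg.lstsq`, a LAPACK-backed solver. It avoids a hand-rolled normal-equation solve such as `inv(G.T @ G) @ G.T @ d`, which squares the condition number. The function name and the choice of model terms are assumptions for illustration.

```python
import numpy as np


def fit_trend_and_annual_signal(time_yr, position_mm) -> dict:
    """Least-squares fit of offset + rate + annual sinusoid (hypothetical helper)."""
    t = np.asarray(time_yr, dtype=float)
    d = np.asarray(position_mm, dtype=float)
    two_pi_t = 2.0 * np.pi * t

    # Design matrix columns: offset, linear rate, annual sine, annual cosine.
    design_matrix = np.column_stack([np.ones_like(t), t, np.sin(two_pi_t), np.cos(two_pi_t)])

    # Library solver instead of custom linear algebra.
    coefficients, _residuals, _rank, _singular_values = np.linalg.lstsq(
        design_matrix, d, rcond=None
    )
    offset_mm, rate_mm_per_yr, annual_sin_mm, annual_cos_mm = coefficients
    return {
        "offset_mm": float(offset_mm),
        "rate_mm_per_yr": float(rate_mm_per_yr),
        "annual_amplitude_mm": float(np.hypot(annual_sin_mm, annual_cos_mm)),
    }
```

With noise-free synthetic input, for example `3 + 2 t + 1.5 sin(2πt)`, the fit should recover the offset, the rate, and the annual amplitude.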

When Custom Code Is Justified

Custom implementations are appropriate when:

•The logic encodes domain-specific physical models (e.g., fault slip, viscoelastic relaxation).
•Existing libraries do not support required assumptions or geometries.
•The code represents a research contribution rather than infrastructure.
•Full transparency is required for peer review and reproducibility.
•Performance-critical kernels require tailored optimization.

Scientific novelty is a valid reason to write custom code.
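
For example, a short, transparent forward model is often worth writing by hand, because every physical assumption stays visible to reviewers. The sketch uses the textbook Mogi point source in an elastic half-space purely as an illustration; it is not a model this guide prescribes. It assumes a homogeneous isotropic half-space, a source radius much smaller than its depth, and a volume change ΔV of the source itself.

```python
import numpy as np


def mogi_surface_displacement_m(
    radial_distance_m,
    source_depth_m: float,
    volume_change_m3: float,
    poisson_ratio: float = 0.25,
) -> tuple[np.ndarray, np.ndarray]:
    """Surface (radial, vertical) displacement of a Mogi point source, in metres.

        u_r = (1 - nu) * dV / pi * r / R**3
        u_z = (1 - nu) * dV / pi * d / R**3,   with R = sqrt(r**2 + d**2)
    """
    if source_depth_m <= 0:
        raise ValueError(f"source depth must be positive, got {source_depth_m} m")
    r = np.asarray(radial_distance_m, dtype=float)
    distance_cubed_m3 = (r**2 + source_depth_m**2) ** 1.5
    prefactor_m3 = (1.0 - poisson_ratio) * volume_change_m3 / np.pi
    radial_m = prefactor_m3 * r / distance_cubed_m3
    vertical_m = prefactor_m3 * source_depth_m / distance_cubed_m3
    return radial_m, vertical_m
```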

Architecture and Structure

Research-Oriented Clean Architecture

•Keep scientific core logic independent of:
  - plotting frameworks
  - file system layout
  - command-line interfaces
•Separate clearly:
  - physical / mathematical models
  - numerical solvers
  - data preparation and visualization

Typical layering (conceptual, not rigid):

•Domain layer: equations, physical assumptions, forward/inverse models
•Application layer: experiment setup, parameter sweeps, inversion workflows
•Infrastructure layer: IO, plotting, parallel execution, external tools
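
A compact sketch of the three layers in one file, using the classic arctan (buried screw dislocation) interseismic velocity profile as a stand-in domain model. The model choice and all function names are illustrative assumptions. In a real project, each layer would usually live in its own module.

```python
import csv
import io
import math


# --- Domain layer: physics only, no I/O, no plotting ---
def interseismic_velocity_mm_per_yr(distance_from_fault_km, slip_rate_mm_per_yr, locking_depth_km):
    """Fault-parallel velocity of a buried screw dislocation: (s / pi) * atan(x / D)."""
    if locking_depth_km <= 0:
        raise ValueError(f"locking depth must be positive, got {locking_depth_km} km")
    return (slip_rate_mm_per_yr / math.pi) * math.atan(distance_from_fault_km / locking_depth_km)


# --- Application layer: experiment logic (a parameter sweep) ---
def sweep_locking_depth(distances_km, observed_mm_per_yr, slip_rate_mm_per_yr, candidate_depths_km):
    """Return (locking_depth_km, rms_misfit_mm_per_yr) for each candidate depth."""
    results = []
    for depth_km in candidate_depths_km:
        residuals = [
            obs - interseismic_velocity_mm_per_yr(x, slip_rate_mm_per_yr, depth_km)
            for x, obs in zip(distances_km, observed_mm_per_yr)
        ]
        rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
        results.append((depth_km, rms))
    return results


# --- Infrastructure layer: serialization only ---
def sweep_results_to_csv(results) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["locking_depth_km", "rms_misfit_mm_per_yr"])
    writer.writerows(results)
    return buffer.getvalue()
```

Swapping CSV for NetCDF, or adding a plot, touches only the infrastructure layer. The physics and the sweep stay unchanged.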

Naming Conventions (Critical for Research Code)

•AVOID vague names: utils, helpers, misc, test2
•USE names that encode scientific meaning:
  - TriangularDislocationKernel
  - AfterSlipInversion
  - ViscoelasticRelaxationModel
•Prefer names that will still make sense to you in five years, and to another researcher today.
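
A before/after sketch of the same computation. The exponential transient, A·(1 − exp(−t/τ)), is used only as a familiar empirical postseismic form to illustrate naming; the function names are hypothetical.

```python
import math


# Vague: what are x, a, b? Which units? Which model?
def f(x, a, b):
    return a * (1 - math.exp(-x / b))


# Descriptive: the name states the model; parameters carry meaning and units.
def exponential_postseismic_displacement_mm(
    time_since_earthquake_yr: float,
    amplitude_mm: float,
    relaxation_time_yr: float,
) -> float:
    """Empirical exponential postseismic transient: A * (1 - exp(-t / tau))."""
    return amplitude_mm * (1.0 - math.exp(-time_since_earthquake_yr / relaxation_time_yr))
```

Both functions compute the same numbers. Only the second can be understood without the paper or the original author at hand.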

Separation of Concerns (Applied Pragmatically)

•Do NOT mix physical equations with plotting logic.
•Do NOT embed hard-coded experiment parameters inside core model functions.
•Keep inversion configuration separate from inversion algorithms.
•Allow notebooks/scripts to orchestrate experiments, but keep them thin.

Scripts orchestrate; libraries explain.
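
One way to apply these rules is sketched below. Experiment choices live in a frozen configuration object, the inversion algorithm receives only what it needs, and a thin `run_experiment` function orchestrates. The configuration fields, their defaults, and the zeroth-order Tikhonov (damped least-squares) solver are assumptions for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class DampedInversionConfig:
    """Experiment settings, kept out of the algorithm (hypothetical fields)."""
    damping: float = 0.1
    noise_std_m: float = 0.002
    random_seed: int = 42


def damped_least_squares(design_matrix: np.ndarray, data: np.ndarray, damping: float) -> np.ndarray:
    """Minimise ||G m - d||^2 + damping^2 ||m||^2; knows nothing about files or plots."""
    if damping < 0:
        raise ValueError(f"damping must be non-negative, got {damping}")
    n_params = design_matrix.shape[1]
    # The augmented system [G; damping * I] m = [d; 0] is equivalent to the damped problem.
    augmented_matrix = np.vstack([design_matrix, damping * np.eye(n_params)])
    augmented_data = np.concatenate([data, np.zeros(n_params)])
    model, *_ = np.linalg.lstsq(augmented_matrix, augmented_data, rcond=None)
    return model


def run_experiment(config: DampedInversionConfig) -> np.ndarray:
    """Thin orchestration: build synthetic data, call the algorithm, return the model."""
    rng = np.random.default_rng(config.random_seed)
    design_matrix = rng.normal(size=(50, 3))
    true_model = np.array([1.0, -0.5, 0.25])
    data = design_matrix @ true_model + rng.normal(scale=config.noise_std_m, size=50)
    return damped_least_squares(design_matrix, data, config.damping)
```

A notebook or CLI then only builds a `DampedInversionConfig` and calls `run_experiment`. The algorithm never sees experiment defaults.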

Anti-Patterns to Avoid in Research Software

•Re-implementing standard numerical methods without justification.
•Over-engineering early-stage exploratory code.
•Monolithic scripts that mix:
  - data loading
  - modeling
  - inversion
  - visualization
•"One-off" hacks silently becoming production research code.

Remember:

Undocumented scientific code is irreproducible science.

Code Quality and Reproducibility

•Ensure deterministic behavior where possible (fixed random seeds, pinned solver choices and settings).
•Use clear error messages for invalid physical assumptions or parameter ranges.
•Aim to keep functions under 50 lines when feasible (the 80-line threshold above is the point at which to decompose), but allow exceptions for mathematically cohesive blocks.
•Prefer explicit over implicit behavior, even if slightly verbose.
•Minimal but meaningful documentation:
  - what problem is solved
  - what assumptions are made
  - what each parameter represents
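
A minimal sketch of two of these practices. The first is an error message that names the violated physical bound and the offending value. The second is seeded randomness through a local NumPy `Generator` rather than global state. The helper names are hypothetical.

```python
import numpy as np


def validate_poisson_ratio(poisson_ratio: float) -> None:
    """Raise a descriptive error for a physically inadmissible Poisson's ratio."""
    # Bounds for an isotropic linear-elastic solid: -1 < nu <= 0.5 (NaN also fails).
    if not -1.0 < poisson_ratio <= 0.5:
        raise ValueError(
            f"Poisson's ratio must satisfy -1 < nu <= 0.5 for an isotropic elastic solid; "
            f"got {poisson_ratio}. Check the rheology parameters."
        )


def synthetic_noise_m(n_samples: int, noise_std_m: float, seed: int) -> np.ndarray:
    """Gaussian noise realisation that is identical for identical seeds."""
    if noise_std_m < 0:
        raise ValueError(f"noise standard deviation must be non-negative, got {noise_std_m} m")
    # A local Generator keeps runs repeatable and independent of global random state.
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=noise_std_m, size=n_samples)
```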

Transition to Engineering-Grade Software

When research code evolves into:

•shared group tools
•open-source packages
•long-term modeling frameworks

then progressively introduce:

•stronger module boundaries
•stricter testing
•clearer public APIs
•more conventional Clean Architecture discipline
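
"Stricter testing" for scientific code often means checking a forward model against analytic limits and symmetries, not just checking that it runs. The sketch below writes pytest-style `test_*` functions (plain `assert`, so they also run without pytest) for the arctan interseismic model. That model is used here only as an illustrative assumption.

```python
import math


def interseismic_velocity_mm_per_yr(distance_from_fault_km, slip_rate_mm_per_yr, locking_depth_km):
    """Fault-parallel velocity of a buried screw dislocation: (s / pi) * atan(x / D)."""
    return (slip_rate_mm_per_yr / math.pi) * math.atan(distance_from_fault_km / locking_depth_km)


def test_velocity_is_zero_on_fault_trace():
    assert interseismic_velocity_mm_per_yr(0.0, 30.0, 15.0) == 0.0


def test_velocity_is_antisymmetric_about_fault():
    for x_km in (1.0, 10.0, 100.0):
        assert math.isclose(
            interseismic_velocity_mm_per_yr(-x_km, 30.0, 15.0),
            -interseismic_velocity_mm_per_yr(x_km, 30.0, 15.0),
        )


def test_far_field_velocity_approaches_half_slip_rate():
    # Far from the fault, atan(x / D) -> pi / 2, so v -> s / 2.
    far_field = interseismic_velocity_mm_per_yr(1.0e6, 30.0, 15.0)
    assert math.isclose(far_field, 15.0, rel_tol=1e-4)
```

Tests of this kind encode physics that must hold regardless of implementation details, so they survive refactoring.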

Core Principle

In research software, clarity and correctness define quality; architecture exists to preserve them as complexity grows.