
foundry


SKILL.md
---
name: foundry
description: "Foundry protein design toolkit reference. Auto-invoked when the user discusses protein design, structure prediction, inverse folding, RFdiffusion3, RosettaFold3, ProteinMPNN, LigandMPNN, SolubleMPNN, Enhanced MPNN, or running Foundry tools on GPUs."
user-invocable: false
allowed-tools: Bash, Read, Write, Glob, Grep
---

Foundry Protein Design Toolkit Reference

Foundry provides core protein design and prediction tools. All tools require GPU compute and are designed to run on Vast.ai instances using a pre-built Docker image that includes all tools, checkpoints, and dependencies.

Tools Overview

| Tool | CLI command | Purpose | GPU VRAM |
|---|---|---|---|
| RFdiffusion3 (RFD3) | `rfd3 design` | All-atom generative protein structure design | 24-48 GB+ |
| RosettaFold3 (RF3) | `rf3 fold` | Biomolecular structure prediction | 24-48 GB+ |
| ProteinMPNN/LigandMPNN | `mpnn` | Fixed-backbone inverse folding sequence design | 16-24 GB |
| Enhanced MPNN | `mpnn` | Designability-optimized inverse folding (fine-tuned LigandMPNN) | 16-24 GB |

Docker Image

A pre-built Docker image based on vastai/base-image:cuda-12.4.1-auto is available with everything pre-installed:

  • Python 3.12, PyTorch with CUDA 12.4
  • Foundry with all model extras (rc-foundry[all])
  • All checkpoints pre-downloaded to /root/.foundry/checkpoints
  • Enhanced MPNN weights included

When launching Vast.ai instances, use this image with:

```bash
vastai create instance <OFFER_ID> --image <FOUNDRY_DOCKER_IMAGE> --disk 64 --ssh
```

No --onstart-cmd is needed — the image is ready to use immediately. The working directory is /workspace.

Available Checkpoints

All checkpoints are pre-installed in the Docker image at /root/.foundry/checkpoints:

| Name | File | Tool |
|---|---|---|
| `rfd3` | `rfd3_latest.ckpt` | RFdiffusion3 |
| `rf3` | `rf3_foundry_01_24_latest_remapped.ckpt` | RosettaFold3 (latest, recommended) |
| `rf3_preprint_921` | `rf3_foundry_09_21_preprint_remapped.ckpt` | RF3 (benchmark, 09/21 cutoff) |
| `rf3_preprint_124` | `rf3_foundry_01_24_preprint_remapped.ckpt` | RF3 (preprint) |
| `proteinmpnn` | `proteinmpnn_v_48_020.pt` | ProteinMPNN |
| `ligandmpnn` | `ligandmpnn_v_32_010_25.pt` | LigandMPNN |
| `solublempnn` | `solublempnn_v_48_020.pt` | SolubleMPNN |
| `enhanced_mpnn` | `enhanced_mpnn_step_80000.pt` | Enhanced MPNN |

Checkpoint environment variable: `FOUNDRY_CHECKPOINT_DIRS=/root/.foundry/checkpoints`
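A small shell check can confirm that the checkpoints listed above are present on a freshly launched instance. This is a sketch built on two assumptions. First, it treats `FOUNDRY_CHECKPOINT_DIRS` as a single directory and falls back to `/root/.foundry/checkpoints` when the variable is unset. Second, `check_checkpoints` is a hypothetical helper, not part of Foundry.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report which expected checkpoint files exist.
# Assumes FOUNDRY_CHECKPOINT_DIRS names a single directory.
check_checkpoints() {
  local dir="${FOUNDRY_CHECKPOINT_DIRS:-/root/.foundry/checkpoints}"
  local missing=0 f
  for f in \
    rfd3_latest.ckpt \
    rf3_foundry_01_24_latest_remapped.ckpt \
    rf3_foundry_09_21_preprint_remapped.ckpt \
    rf3_foundry_01_24_preprint_remapped.ckpt \
    proteinmpnn_v_48_020.pt \
    ligandmpnn_v_32_010_25.pt \
    solublempnn_v_48_020.pt \
    enhanced_mpnn_step_80000.pt; do
    if [ -f "$dir/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=$((missing + 1))
    fi
  done
  # Non-zero status when anything is missing, so the check can gate a script.
  [ "$missing" -eq 0 ]
}
```

Run `check_checkpoints` after defining it. It prints one line per file and exits non-zero if any file is absent.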


Tool 1: RFdiffusion3 (RFD3)

All-atom generative model for designing protein structures under complex constraints: protein binders, enzyme active sites, nucleic acid binders, small molecule binders, symmetric assemblies.

CLI

```bash
rfd3 design out_dir=<OUTPUT_DIR> inputs=<INPUT_JSON> [OPTIONS]
```

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| `out_dir` | required | Output directory |
| `inputs` | required | Input JSON/YAML file |
| `ckpt_path` | `rfd3` | Checkpoint (auto-resolved from registry) |
| `skip_existing` | `True` | Skip existing outputs |
| `diffusion_batch_size` | `8` | Designs per batch |
| `n_batches` | `1` | Number of batches |
| `dump_trajectories` | `False` | Save denoising trajectory (large files) |
| `prevalidate_inputs` | `False` | Validate inputs before loading model |
| `low_memory_mode` | `False` | Memory-efficient tokenization |

Sampler Parameters

| Parameter | Default | Description |
|---|---|---|
| `inference_sampler.num_timesteps` | `200` | Diffusion steps |
| `inference_sampler.step_scale` | `1.5` | Diversity vs. designability tradeoff |
| `inference_sampler.noise_scale` | `1.003` | Noise scaling |
| `inference_sampler.use_classifier_free_guidance` | `False` | Enable classifier-free guidance (CFG) |
| `inference_sampler.cfg_scale` | `1.5` | CFG scale (if enabled) |
| `inference_sampler.kind` | `default` | `default` or `symmetry` |
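This sketch assumes the parameters in the two tables above are passed in the same `key=value` form as `out_dir` and `inputs`, and that dotted names such as `inference_sampler.step_scale` work the same way. The file names are hypothetical. The run requests 4 batches of 8 designs, 32 backbones in total, and only calls `rfd3` if it is on the `PATH`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical paths; adjust to your own input spec and output location.
batch_size=8
n_batches=4
args=(
  out_dir=outputs/binders
  inputs=binder_design.json
  diffusion_batch_size="$batch_size"
  n_batches="$n_batches"
  inference_sampler.num_timesteps=200
  inference_sampler.step_scale=1.5
)

# Total backbones written = designs per batch x number of batches.
total=$((batch_size * n_batches))
echo "requesting $total designs"

if command -v rfd3 > /dev/null 2>&1; then
  rfd3 design "${args[@]}"
else
  echo "rfd3 not on PATH; would run: rfd3 design ${args[*]}" >&2
fi
```

Scaling `n_batches` while keeping `diffusion_batch_size` fixed increases the number of designs without changing per-batch GPU memory.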

Input JSON Format

The input is a JSON file mapping example names to design specifications:

```json
{
    "design_name": {
        "input": "./path/to/target.pdb",
        "contig": "40-120,/0,A1-100",
        "length": "140-160",
        "ligand": "NAI,ACT",
        "unindex": "A108,A139",
        "select_fixed_atoms": {
            "A108": "ND2,CG",
            "A139": "OG,CB,CA"
        },
        "select_hotspots": {
            "E64": "CD2,CZ"
        },
        "is_non_loopy": true,
        "infer_ori_strategy": "hotspots",
        "dialect": 2
    }
}
```

Key input fields:

  • input: Path to target structure (PDB/CIF)
  • contig: Contig specification — defines which residues to keep/design and chain breaks
  • length: Length range for designed protein (e.g., "140-160")
  • ligand: Comma-separated ligand residue names to include
  • unindex: Motif residues whose sequence position is not fixed; the model chooses where to place them in the design
  • select_fixed_atoms: Per-residue atom selections to fix
  • select_hotspots: Target hotspot residues for interface design
  • is_non_loopy: Disable loop-only design mode
  • infer_ori_strategy: How to infer the design's orientation (e.g., hotspots)
  • dialect: Input dialect version (use 2 for latest)
  • partial_t: Partial diffusion timestep (for refinement)
  • ori_token: Orientation token indices
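As a concrete starting point, the snippet below writes a binder-style input spec that uses only the fields listed above. The target path, the hotspot residue `A45`, and its atom names are hypothetical placeholders. Atom names must match the actual residue type in your target. The `contig` and `length` values are reused from the example above. Passing `prevalidate_inputs=True` asks RFD3 to check the spec before the model is loaded.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder target and hotspot; replace with your own structure and residues.
cat > binder_design.json << 'EOF'
{
    "binder_example": {
        "input": "./target.pdb",
        "contig": "40-120,/0,A1-100",
        "length": "140-160",
        "select_hotspots": {
            "A45": "CD2,CZ"
        },
        "infer_ori_strategy": "hotspots",
        "is_non_loopy": true,
        "dialect": 2
    }
}
EOF

# Validate the spec up front so input errors surface before the model loads.
if command -v rfd3 > /dev/null 2>&1; then
  rfd3 design out_dir=outputs/binders inputs=binder_design.json prevalidate_inputs=True
fi
```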

Output

  • {name}.cif — Designed all-atom structures
  • {name}.json — Full design metadata
  • Trajectory files (if dump_trajectories=True)

Tool 2: RosettaFold3 (RF3)

All-atom biomolecular structure prediction for proteins, nucleic acids, ligands, and complexes.

CLI

```bash
rf3 fold inputs=<INPUT_FILE_OR_DIR> [OPTIONS]
```

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| `inputs` | required | JSON, CIF, or PDB file; a list of files; or a directory |
| `out_dir` | `./` | Output directory |
| `ckpt_path` | auto | Checkpoint path |
| `n_recycles` | `10` | Number of recycling iterations |
| `diffusion_batch_size` | `5` | Number of output structures |
| `num_steps` | `200` | Diffusion sampling steps (50 is faster, similar quality) |
| `early_stopping_plddt_threshold` | `0.5` | Skip low-confidence predictions |
| `seed` | `null` | Random seed |
| `dump_trajectories` | `False` | Save denoising trajectories |
| `skip_existing` | `False` | Skip existing predictions |
| `one_model_per_file` | `False` | Write each model to a separate file |
| `annotate_b_factor_with_plddt` | `False` | Store pLDDT in the B-factor column |
| `template_noise_scale` | `1e-5` | Template noise |

Structural Control

  • template_selection — AtomSelection syntax for template regions (e.g., "[A, B/*/1-42]")
  • ground_truth_conformer_selection — Fix ligand conformations (e.g., "[C, D]")
  • cyclic_chains — List of chain IDs to cyclize

Input JSON Format

```json
[
    {
        "name": "example_prediction",
        "components": [
            {
                "seq": "MTSENPLLALREK...",
                "chain_id": "A",
                "msa_path": "path/to/protein.a3m"
            },
            {
                "ccd_code": "MG"
            },
            {
                "smiles": "[nH]1cc[nH+]c1"
            },
            {
                "path": "path/to/ligand.sdf"
            }
        ],
        "template_selection": ["A"],
        "ground_truth_conformer_selection": ["C"]
    }
]
```

Component types:

  • seq + optional msa_path — Protein/nucleic acid sequence (non-canonical residues are written in parentheses, e.g. (PTM))
  • ccd_code — CCD compound code (e.g., MG, NAG)
  • smiles — Small molecule SMILES string
  • path — Structure file (CIF, PDB, SDF)
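The sketch below uses two `seq` components to fold a designed binder (chain A) together with its target (chain B), as a complex-level check. The sequences are short placeholders and the file names are hypothetical. It also assumes that numeric and boolean options such as `num_steps=50` and `annotate_b_factor_with_plddt=True` take the same `key=value` form as `inputs`. No `msa_path` is given here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder sequences: substitute a designed binder and the real target.
binder_seq="MSEELLKKALELAKKG"
target_seq="MKTAYIAKQRQISFVKSHFSRQ"

# Unquoted EOF so the shell expands the sequence variables.
cat > rf3_complex.json << EOF
[
    {
        "name": "binder_vs_target",
        "components": [
            {"seq": "${binder_seq}", "chain_id": "A"},
            {"seq": "${target_seq}", "chain_id": "B"}
        ]
    }
]
EOF

if command -v rf3 > /dev/null 2>&1; then
  # num_steps=50: faster, with similar quality per the parameter table.
  rf3 fold inputs=rf3_complex.json out_dir=outputs/rf3 \
    num_steps=50 seed=0 annotate_b_factor_with_plddt=True
fi
```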

AtomSelection syntax: CHAIN/RES_NAME/RES_ID/ATOM_NAME

  • A — all atoms in chain A
  • A/*/5-10 — residues 5-10 in chain A
  • B/*/1-42, B/*/49-63 — multiple regions (CDR framework)

Output

  • {name}_metrics.csv — Overall confidence metrics (pTM, pLDDT, ipTM)
  • {name}.score — Granular per-atom metrics
  • {name}_model_0.cif.gz ... {name}_model_N.cif.gz — Predicted structures
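To triage many predictions, the per-design metrics CSVs can be filtered with standard tools. The sketch below keeps the header plus every row whose value in a named column meets a threshold. The column names in `{name}_metrics.csv` are not listed above, so `iptm` in the usage line is an assumption; check the header of your own files first. The helper name `filter_by_metric` is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# filter_by_metric FILE COLUMN MIN
# Hypothetical helper: print the header and rows where COLUMN >= MIN.
# Assumes a plain comma-separated file with no quoted commas.
filter_by_metric() {
  local file=$1 col=$2 min=$3
  awk -F, -v col="$col" -v min="$min" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print
      next
    }
    ($c + 0) >= (min + 0) { print }
  ' "$file"
}
```

Usage, assuming an `iptm` column: `for f in outputs/rf3/*_metrics.csv; do filter_by_metric "$f" iptm 0.8; done`.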

Tool 3: ProteinMPNN / LigandMPNN / SolubleMPNN / Enhanced MPNN

Lightweight inverse-folding models for fixed-backbone protein sequence design.

CLI

```bash
mpnn --model_type <MODEL_TYPE> --structure_path <STRUCTURE> [OPTIONS]
```

Or from JSON config:

```bash
mpnn --config_json config.json
```

Model Variants

| Variant | `model_type` | Checkpoint | Use case |
|---|---|---|---|
| ProteinMPNN | `protein_mpnn` | `proteinmpnn_v_48_020.pt` | Standard protein sequence design |
| LigandMPNN | `ligand_mpnn` | `ligandmpnn_v_32_010_25.pt` | Design around small molecules, DNA, ions |
| SolubleMPNN | `protein_mpnn` | `solublempnn_v_48_020.pt` | Solubility-optimized design |
| Enhanced MPNN | `ligand_mpnn` | `enhanced_mpnn_step_80000.pt` | Designability-optimized (highest success rate) |

Enhanced MPNN

Enhanced MPNN is a fine-tuned LigandMPNN trained with ResiDPO (Residue-level Designability Preference Optimization). Instead of optimizing for native sequence recovery like standard MPNN, it directly optimizes for designability — whether designed sequences fold into the target structure with high confidence (as measured by AlphaFold2 pLDDT scores).

Performance improvements over standard LigandMPNN:

  • Enzyme design: ~2.7x higher success rate (17.6% vs 6.6%)
  • Binder design: ~2.3x higher success rate (16.1% vs 7.1%)
  • Makes previously "undesignable" backbones designable — doubles the fraction of viable backbone scaffolds

How it works:

  • ResiDPO decouples preference learning from KL regularization at the residue level
  • Residue-level Preference Learning (RPL) improves positions where pLDDT is low
  • Residue-level Constraint Learning (RCL) preserves knowledge at positions already working well
  • Trained on 19k PDB structures with AF2 pLDDT as the reward signal

When to use Enhanced MPNN (recommended for most de novo design):

  • Binder or enzyme design pipelines (RFD3 → MPNN → RF3 validation)
  • Any scenario where standard MPNN yields low designability/validation rates
  • Maximizing the fraction of designs that pass structure validation

When to prefer standard MPNN/LigandMPNN:

  • When sequence recovery (similarity to native) matters more than designability
  • Benchmarking against published results using original models

Usage — same CLI as LigandMPNN, just swap the checkpoint:

```bash
mpnn --model_type ligand_mpnn \
  --checkpoint_path /root/.foundry/checkpoints/enhanced_mpnn_step_80000.pt \
  --structure_path backbone.cif \
  --temperature 0.1 \
  --number_of_batches 8
```

Key Parameters

Global:

  • --model_type: protein_mpnn or ligand_mpnn
  • --checkpoint_path — Path to model weights
  • --is_legacy_weights — Set True for original repository weights (not needed for Enhanced MPNN)
  • --out_directory — Output directory
  • --write_fasta — Write FASTA output (default: True)
  • --write_structures — Write designed structures (default: True)

Per-input:

  • --structure_path — Input structure (CIF/PDB)
  • --batch_size — Sequences per batch (default: 1)
  • --number_of_batches — Number of batches (default: 1)
  • --temperature — Sampling temperature, controls diversity (default: 0.1)
  • --seed — Random seed
  • --designed_chains — Chains to redesign
  • --fixed_chains — Chains to keep fixed
  • --designed_residues — Specific residues to design
  • --fixed_residues — Specific residues to fix

Advanced:

  • --omit — Amino acids to exclude (default: ["UNK"])
  • --bias — Per-residue logit bias
  • --structure_noise — Noise level (default: 0.0)
  • --symmetry_residues — Residues for symmetric design
  • --homo_oligomer_chains — Homo-oligomer chains

JSON Config Format

```json
{
    "checkpoint_path": "/root/.foundry/checkpoints/enhanced_mpnn_step_80000.pt",
    "model_type": "ligand_mpnn",
    "out_directory": "./outputs/",
    "inputs": [
        {
            "structure_path": "complex.pdb",
            "name": "example",
            "seed": 42,
            "batch_size": 1,
            "number_of_batches": 5,
            "temperature": 0.1,
            "fixed_chains": ["A"],
            "designed_chains": ["B"]
        }
    ]
}
```
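For a batch of RFD3 backbones, the config can be generated instead of written by hand. The sketch below emits one `inputs` entry per `.cif` file in a directory, using the Enhanced MPNN checkpoint and only keys that appear in the config above. The chain that holds the designed binder in RFD3 output is not documented here, so it is passed in explicitly. The helper name is hypothetical, and file names are assumed to contain no characters that need JSON escaping.

```bash
#!/usr/bin/env bash
set -euo pipefail

# make_mpnn_config BACKBONE_DIR OUT_DIR DESIGNED_CHAIN > config.json
# Hypothetical helper; paths must not need JSON escaping.
make_mpnn_config() {
  local backbone_dir=$1 out_dir=$2 chain=$3
  local ckpt=/root/.foundry/checkpoints/enhanced_mpnn_step_80000.pt
  local first=1 cif name
  printf '{\n    "checkpoint_path": "%s",\n    "model_type": "ligand_mpnn",\n' "$ckpt"
  printf '    "out_directory": "%s",\n    "inputs": [\n' "$out_dir"
  for cif in "$backbone_dir"/*.cif; do
    [ -e "$cif" ] || continue  # glob matched nothing
    [ "$first" -eq 1 ] || printf ',\n'
    first=0
    name=$(basename "$cif" .cif)
    printf '        {"structure_path": "%s", "name": "%s", "number_of_batches": 8, "temperature": 0.1, "designed_chains": ["%s"]}' \
      "$cif" "$name" "$chain"
  done
  printf '\n    ]\n}\n'
}
```

Usage: `make_mpnn_config outputs/binders outputs/mpnn A > mpnn_config.json && mpnn --config_json mpnn_config.json`. Here `A` is a placeholder chain label.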

Output

  • {name}_sequences_*.fasta — Designed sequences
  • {name}_*.cif — Designed structures (if write_structures=True)

Common Design Workflows

Protein Binder Design Pipeline

  1. RFD3: Generate binder backbones targeting a protein → outputs CIF structures
  2. Enhanced MPNN: Design sequences for the generated backbones (~2.3x better success) → outputs FASTA sequences
  3. RF3: Validate designed sequences fold correctly → outputs confidence metrics
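Step 3 needs the MPNN FASTA output converted into RF3 input JSON. The minimal sketch below rests on several assumptions:

  • Each FASTA record holds one single-chain binder sequence. Multi-chain records, or a record holding the input backbone's own sequence, would need extra handling.
  • The optional target sequence is added as chain B so that RF3 folds the complex.
  • The helper name and the `{prefix}_{n}` naming scheme are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# fasta_to_rf3_json FASTA PREFIX [TARGET_SEQ] > rf3_inputs.json
# Hypothetical helper: one RF3 job per FASTA record, binder as chain A.
fasta_to_rf3_json() {
  awk -v prefix="$2" -v target="${3:-}" '
    function emit() {
      if (seq == "") return
      if (n > 0) printf ",\n"
      n++
      printf "    {\"name\": \"%s_%d\", \"components\": [{\"seq\": \"%s\", \"chain_id\": \"A\"}", prefix, n, seq
      if (target != "") printf ", {\"seq\": \"%s\", \"chain_id\": \"B\"}", target
      printf "]}"
      seq = ""
    }
    BEGIN { print "[" }
    /^>/ { emit(); next }
    { gsub(/[[:space:]]/, ""); seq = seq $0 }
    END { emit(); printf "\n]\n" }
  ' "$1"
}
```

The resulting file can be passed to `rf3 fold inputs=rf3_inputs.json`.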

Enzyme Design Pipeline

  1. RFD3: Design enzyme scaffolds around a ligand/substrate
  2. Enhanced MPNN: Design ligand-aware sequences (~2.7x better designability) → outputs FASTA sequences
  3. RF3: Validate predicted structure matches design

Sequence Optimization

  1. Start with an existing structure (PDB/CIF)
  2. ProteinMPNN/SolubleMPNN: Redesign sequences for stability/solubility
  3. RF3: Predict structure of redesigned sequences
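The redesign step (step 2) can reuse the `mpnn` flags listed earlier. In the sketch below, SolubleMPNN is selected with `--model_type protein_mpnn` plus its checkpoint, as in the model-variant table. The structure path, output directory, and seed are placeholders. Whether this checkpoint needs `--is_legacy_weights` is not stated above, so check that before relying on the command. Commands are printed rather than executed unless `DRY_RUN=0`; that switch is a convention of this sketch, not a Foundry setting.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ckpt_dir=/root/.foundry/checkpoints
structure=existing_protein.pdb   # placeholder input structure

# Redesign sequences with SolubleMPNN (16 samples = 4 per batch x 4 batches).
run mpnn --model_type protein_mpnn \
  --checkpoint_path "$ckpt_dir/solublempnn_v_48_020.pt" \
  --structure_path "$structure" \
  --out_directory outputs/soluble \
  --batch_size 4 --number_of_batches 4 \
  --temperature 0.1 --seed 0
```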

Vast.ai GPU Recommendations

| Tool | Min VRAM | Recommended GPU | Notes |
|---|---|---|---|
| RFD3 | 24 GB | A100 40GB, RTX 4090 | Large designs need 48GB+ |
| RF3 | 24 GB | A100 40GB, RTX 4090 | Multi-chain complexes need more |
| MPNN (all variants) | 8 GB | RTX 4090, RTX 3090 | Very lightweight |
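Before a long run, it can help to confirm that the instance's GPU meets the minimums in the table. This sketch reads total memory for the first GPU from `nvidia-smi`, which reports it in MiB. It treats 1 GB as 1000 MiB, a deliberately loose conversion: a card sold as 24 GB reports slightly under 24 GiB, and would otherwise fail. The `VRAM_MIB` override is an assumption of this sketch, so the check can be exercised without a GPU.

```bash
#!/usr/bin/env bash
set -euo pipefail

# vram_ok MIN_GB: succeed if the first GPU has at least MIN_GB of memory.
vram_ok() {
  local min_gb=$1 mib
  # With these flags nvidia-smi prints memory.total as a bare MiB number.
  mib=${VRAM_MIB:-$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)}
  mib=${mib//[[:space:]]/}
  [ "$mib" -ge $((min_gb * 1000)) ]
}
```

Usage: `vram_ok 24 || echo "GPU below the 24 GB minimum for RFD3/RF3" >&2`.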

Use the pre-built Foundry Docker image; no setup commands are needed, and the instance is ready to run immediately after launch.