
gcell-protein

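Protein structure and interaction analysis with gcell. Use this skill when users ask about:
- Getting protein sequences from gene names
- Predicting protein structures with AlphaFold2 and getting pLDDT scores
- Looking up UniProt protein information
- Visualizing 3D protein structures
- Analyzing protein-protein interactions with the STRING database
Triggers: protein structure, AlphaFold, pLDDT, UniProt, protein sequence, 3D structure, protein interaction, STRING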

SKILL.md
---
name: gcell-protein
description: |
  Protein structure and interaction analysis using gcell. Use this skill when users ask about:
  - Protein sequences from gene names
  - AlphaFold2 structure predictions and pLDDT scores
  - UniProt protein information
  - 3D protein structure visualization
  - Protein-protein interactions (STRING database)
  Triggers: protein structure, AlphaFold, pLDDT, UniProt, protein sequence, 3D structure, protein interaction, STRING
---

Protein Structure Analysis

Get Protein Sequences

python
from gcell.protein.data import (
    get_seq_from_gene_name,
    get_uniprot_from_gene_name
)

# Get the protein sequence for a gene name
# (separate variables so earlier results are not overwritten)
tp53_seq = get_seq_from_gene_name('TP53')
egfr_seq = get_seq_from_gene_name('EGFR')
brca1_seq = get_seq_from_gene_name('BRCA1')

# Get UniProt accession
uniprot_id = get_uniprot_from_gene_name('TP53')
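
When several genes are needed, the lookups above can be wrapped in a small batch helper. The sketch below is not part of gcell: `fetch_sequences` and its `fetch` parameter are hypothetical names. The lookup function is passed in as an argument, so the helper assumes only that a failed lookup either raises an exception or returns an empty value.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_sequences(
    genes: Iterable[str],
    fetch: Callable[[str], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Look up each gene once and return (sequences, failed_genes)."""
    sequences: Dict[str, str] = {}
    failed: List[str] = []
    for gene in dict.fromkeys(genes):  # de-duplicate while keeping input order
        try:
            seq = fetch(gene)
        except Exception:  # the real error type is unknown, so catch broadly
            seq = None
        if seq:
            sequences[gene] = str(seq)
        else:
            failed.append(gene)
    return sequences, failed
```

With gcell installed, this would presumably be called as `fetch_sequences(['TP53', 'EGFR', 'BRCA1'], get_seq_from_gene_name)`.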

AlphaFold2 Confidence Scores

python
from gcell.protein.data import get_lddt_from_gene_name

# Get pLDDT (predicted local distance difference test) scores.
# Higher scores mean higher confidence in the predicted structure.
tp53_plddt = get_lddt_from_gene_name('TP53')
egfr_plddt = get_lddt_from_gene_name('EGFR')

# pLDDT interpretation:
# > 90: Very high confidence
# 70-90: Confident
# 50-70: Low confidence
# < 50: Very low confidence (likely disordered)
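
The bands listed above can be turned into a quick per-protein summary. This is a minimal sketch, assuming the pLDDT scores arrive as a one-dimensional sequence of per-residue values on the 0-100 scale; the source does not state the return type of `get_lddt_from_gene_name`, and `summarize_plddt` is a hypothetical helper, not a gcell function.

```python
from bisect import bisect_right
from collections import Counter

# Band edges follow the interpretation listed above.
BAND_EDGES = [50.0, 70.0, 90.0]
BAND_NAMES = ["very low", "low", "confident", "very high"]


def summarize_plddt(plddt):
    """Return the fraction of residues that fall in each pLDDT band."""
    scores = [float(s) for s in plddt]
    if not scores:
        raise ValueError("no pLDDT scores given")
    # bisect_right maps each score to a band index 0..3;
    # a score exactly on an edge (50, 70, 90) goes to the higher band.
    counts = Counter(BAND_NAMES[bisect_right(BAND_EDGES, s)] for s in scores)
    return {name: counts[name] / len(scores) for name in BAND_NAMES}
```

A large "very low" fraction suggests that much of the protein is predicted with little confidence, which often corresponds to disordered regions.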

Full Protein Analysis

python
from gcell.protein.protein import Protein

# Load proteins from gene names
egfr = Protein.from_gene_name('EGFR')
protein = Protein.from_gene_name('TP53')  # used in the examples below

# Access protein data
print(protein.sequence)
print(protein.length)
print(protein.plddt)  # AlphaFold confidence

# 3D structure visualization
protein.plot_structure()  # Interactive 3D view

Protein-Protein Interactions

python
from gcell.protein.string import get_string_interactions

# Get interactions from STRING database
interactions = get_string_interactions('TP53')

# Filter by confidence score
# (assumes 'score' is on a 0-1 scale; if the values run 0-1000, use 700 instead)
high_conf = interactions[interactions['score'] > 0.7]
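
The same filter can be combined with sorting to list the strongest partners first. The sketch below runs on a toy table rather than real STRING output: only the `score` column comes from the example above, while the `partner` column, the placeholder names, the values, and the `top_partners` helper are all made up for illustration.

```python
import pandas as pd


def top_partners(interactions: pd.DataFrame, min_score: float = 0.7, n: int = 10) -> pd.DataFrame:
    """Keep rows scoring above min_score and return the n highest-scoring ones."""
    kept = interactions[interactions["score"] > min_score]
    return kept.sort_values("score", ascending=False).head(n).reset_index(drop=True)


# Toy stand-in for the table returned by get_string_interactions('TP53').
# Column 'partner' and every value here are hypothetical.
toy = pd.DataFrame(
    {
        "partner": ["PARTNER_B", "PARTNER_A", "PARTNER_C", "PARTNER_D"],
        "score": [0.95, 0.99, 0.70, 0.41],
    }
)
best = top_partners(toy, min_score=0.7, n=2)
```

Because the comparison is strict, a row scoring exactly 0.7 is excluded, matching the `> 0.7` filter above.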

Key Classes and Functions

| Name | Purpose |
| --- | --- |
| Protein | Full protein analysis class |
| get_seq_from_gene_name() | Get amino acid sequence |
| get_uniprot_from_gene_name() | Get UniProt ID |
| get_lddt_from_gene_name() | Get AlphaFold pLDDT scores |
| get_string_interactions() | Get protein interactions |

Data Sources

  • Protein sequences: UniProt
  • Structures: AlphaFold Database
  • Interactions: STRING Database
  • Data cached in: ~/.gcell_data/cache/
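
Since downloaded data accumulates in the cache directory, it can be useful to check how much space it takes. The following sketch uses only the standard library and the cache path listed above; `cache_usage` is a hypothetical helper, and nothing is assumed about how gcell lays out files inside the directory.

```python
from pathlib import Path


def cache_usage(cache_dir="~/.gcell_data/cache/"):
    """Return (file_count, total_bytes) for a directory, or (0, 0) if it does not exist."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return 0, 0
    # Walk the directory recursively and count regular files only.
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    count, size = cache_usage()
    print(f"{count} cached files, {size / 1e6:.1f} MB")
```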