Perform a multi-species comparative genomics analysis for: $ARGUMENTS
Use the MCP tools available to you to find orthologs, retrieve sequences, align them, and report on conservation. Follow the steps below in order. If a step fails for a particular species, note the gap and continue with the remaining species.
## Input Parsing
Parse the user input to identify:
- Gene identifier: a gene symbol (e.g., `TP53`), an Ensembl gene ID (e.g., `ENSG00000141510`), or an NCBI Gene ID.
- Species list: extract the species names and convert common names to Ensembl species names:
  - human → `homo_sapiens`
  - mouse → `mus_musculus`
  - rat → `rattus_norvegicus`
  - zebrafish → `danio_rerio`
  - chicken → `gallus_gallus`
  - frog → `xenopus_tropicalis`
  - fly / fruit fly → `drosophila_melanogaster`
  - worm → `caenorhabditis_elegans`
  - dog → `canis_lupus_familiaris`
  - cat → `felis_catus`
  - pig → `sus_scrofa`
  - cow → `bos_taurus`
If the user says "across vertebrates" or similar, use: human, mouse, zebrafish, chicken (4 representative species). If no species are specified, default to: human, mouse, zebrafish.
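The parsing rules above reduce to a lookup table plus two default panels. Here is a minimal Python sketch, assuming the request arrives as free text; `resolve_species`, `COMMON_TO_ENSEMBL`, and the panel constants are hypothetical names. The word-boundary regex matching is one possible approach, not a required one.

```python
import re

# Hypothetical lookup table mirroring the common-name list above
COMMON_TO_ENSEMBL = {
    "human": "homo_sapiens",
    "mouse": "mus_musculus",
    "rat": "rattus_norvegicus",
    "zebrafish": "danio_rerio",
    "chicken": "gallus_gallus",
    "frog": "xenopus_tropicalis",
    "fruit fly": "drosophila_melanogaster",
    "fly": "drosophila_melanogaster",
    "worm": "caenorhabditis_elegans",
    "dog": "canis_lupus_familiaris",
    "cat": "felis_catus",
    "pig": "sus_scrofa",
    "cow": "bos_taurus",
}
VERTEBRATE_PANEL = ["homo_sapiens", "mus_musculus", "danio_rerio", "gallus_gallus"]
DEFAULT_PANEL = ["homo_sapiens", "mus_musculus", "danio_rerio"]


def resolve_species(user_text: str) -> list[str]:
    """Return Ensembl species names in the order they appear in the request."""
    text = user_text.lower()
    # "across vertebrates" (but not "invertebrates") selects the 4-species panel
    if re.search(r"\bvertebrates?\b", text):
        return list(VERTEBRATE_PANEL)
    hits = []
    for common, ensembl in COMMON_TO_ENSEMBL.items():
        # Accept either the common name or the Ensembl name itself
        for pattern in (common, ensembl):
            for match in re.finditer(rf"\b{re.escape(pattern)}\b", text):
                hits.append((match.start(), ensembl))
    hits.sort()  # order by position in the request
    species: list[str] = []
    for _, ensembl in hits:
        if ensembl not in species:  # "fruit fly" also matches "fly"
            species.append(ensembl)
    return species or list(DEFAULT_PANEL)
```

Species come back in the order they appear in the request, so the first one can serve as the reference. Plurals such as "mice" are not recognized. A gene symbol that doubles as a common name (the catalase gene `CAT`, for instance) would also be misread as a species, so strip the gene identifier from the text before calling it.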
## Data Gathering Steps
### 1. Reference Gene Information
- If the input is a gene symbol, call `ensembl_lookup_gene` with the symbol and species `homo_sapiens` (or the first species listed) to get the Ensembl gene ID.
- Call `datasets_summary_gene` with the gene symbol (taxon: human, or the reference species if it differs) for NCBI gene metadata (full name, summary).
- Note the reference Ensembl gene ID for the next step.
### 2. Find Orthologs
- Call `ensembl_get_homologs` with the reference Ensembl gene ID and `homology_type: "orthologues"`.
- From the results, extract the ortholog Ensembl gene IDs for each requested target species.
- If a requested species has no ortholog in the results, note it as "No ortholog found."
- Record the percent identity values reported by Ensembl for each ortholog pair.
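To make the extraction concrete, the sketch below assumes each homology record resembles the Ensembl REST homology response: a `type` such as `ortholog_one2one`, plus `source` and `target` objects carrying `species`, `id`, `perc_id` and, possibly, `perc_pos`. The actual output of `ensembl_get_homologs` may differ, so treat these field names as assumptions. `extract_orthologs` is a hypothetical helper.

```python
def extract_orthologs(homologies, target_species):
    """Split homology records into per-species ortholog hits and missing species.

    Field names follow an Ensembl-REST-like layout; this is an assumption.
    """
    hits = {sp: [] for sp in target_species}
    for record in homologies:
        # Skip paralogues and anything else that is not an orthologue
        if not record.get("type", "").startswith("ortholog"):
            continue
        target = record.get("target", {})
        species = target.get("species")
        if species in hits:
            hits[species].append({
                "gene_id": target.get("id"),
                "homology_type": record["type"],
                # % of the target protein identical to the reference
                "perc_id_target": target.get("perc_id"),
                # % of the reference protein identical to the target
                "perc_id_source": record.get("source", {}).get("perc_id"),
                # % positives, only if the record carries it
                "perc_pos_target": target.get("perc_pos"),
            })
    found = {sp: h for sp, h in hits.items() if h}
    missing = [sp for sp, h in hits.items() if not h]  # report as "No ortholog found."
    return found, missing
```

All hits for a species are kept, so one-to-many orthologs remain visible and the report can state which one was used. The source and target identity values are normalized to different sequence lengths, so the two numbers for one pair can legitimately differ.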
### 3. Retrieve Protein Sequences
For the reference gene and each ortholog found:
- Call `ensembl_get_sequence` with the Ensembl gene ID, `seq_type: "protein"`, and `format: "json"`.
- Store the protein sequence and its length.
If a protein sequence is not available for a gene ID, try the canonical transcript ID instead.
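The fallback rule fits in a few lines. In this sketch, `fetch_protein` and `canonical_transcript` are hypothetical callables standing in for the `ensembl_get_sequence` call and for however the canonical transcript ID is obtained (possibly from the gene lookup result). Neither signature comes from the tools themselves.

```python
from typing import Callable, Optional


def get_protein(
    gene_id: str,
    fetch_protein: Callable[[str], Optional[str]],
    canonical_transcript: Callable[[str], Optional[str]],
) -> Optional[dict]:
    """Fetch a protein by gene ID, falling back to the canonical transcript ID."""
    seq = fetch_protein(gene_id)
    used_id = gene_id
    if not seq:
        # Only look up the transcript when the gene-level request failed
        transcript_id = canonical_transcript(gene_id)
        if transcript_id:
            seq = fetch_protein(transcript_id)
            used_id = transcript_id
    if not seq:
        return None  # record the gap and continue with the next species
    # Drop surrounding whitespace and a trailing stop symbol, if present
    seq = seq.strip().rstrip("*")
    return {"id": used_id, "sequence": seq, "length": len(seq)}
```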
### 4. Pairwise Alignments
For each ortholog protein sequence, align it against the reference protein (human by default):
- Call `sequence_align` with the two protein sequences, `sequence_type: "protein"`, and `mode: "global"`.
- Record the alignment score, percent identity, gap count, and alignment length, plus percent similarity (positives) if the tool reports it, since the report table has a % Positives column.

If there are 3 or more species, also align one pair of distant non-reference species (e.g., mouse vs. zebrafish) to show the range of divergence.
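To clarify what the recorded numbers mean, here is a textbook Needleman-Wunsch global alignment in Python. It uses a flat match/mismatch/linear-gap scheme instead of a substitution matrix such as BLOSUM62 with affine gaps, so its scores will not match those from `sequence_align`, whose scoring parameters are not specified here. `global_align` is a hypothetical name. Percent identity is computed as identical columns divided by alignment length, and gap count as the number of columns that contain a gap.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> dict:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    top.reverse(); bottom.reverse()

    length = len(top)
    identical = sum(x == y for x, y in zip(top, bottom))  # gaps never pair with gaps
    gap_columns = sum(x == "-" or y == "-" for x, y in zip(top, bottom))
    return {
        "score": score[n][m],
        "aligned_a": "".join(top),
        "aligned_b": "".join(bottom),
        "length": length,
        "percent_identity": round(100 * identical / length, 1) if length else 0.0,
        "gaps": gap_columns,
    }
```

Other tools may divide by the shorter sequence length instead, which is one reason alignment-derived identity can differ from the percent identity Ensembl reports in step 2. Scores produced under different scoring schemes are not directly comparable either.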
### 5. Sequence Statistics (Optional)
For the reference protein:
- Call `sequence_stats` with the protein sequence to get its molecular weight and amino acid composition.
- Note any unusual composition differences across species if they are evident from the alignments.
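Molecular weight and composition are easy to cross-check by hand. A minimal sketch, assuming standard average residue masses (amino acid mass minus one water) plus one water per chain. The function names are hypothetical. Non-standard residues such as selenocysteine (`U`) or `X` raise an error rather than being guessed.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus H2O)
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Average molecular weight of a linear polypeptide, in daltons."""
    seq = protein.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def composition(protein: str) -> dict[str, float]:
    """Percent of each residue type, sorted by one-letter code."""
    seq = protein.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in sorted(counts.items())}
```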
### 6. Domain Conservation Check (Optional)
- Call `interpro_get_domains` with the reference protein's UniProt accession; if the accession is not known, find it first with `uniprot_search`.
- Note the key functional domains and their positions; these regions are expected to be highly conserved.
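Once domain boundaries are known, conservation inside each domain can be measured directly from a pairwise alignment. This sketch assumes domains arrive as `(name, start, end)` tuples in 1-based, inclusive reference-protein coordinates and that the aligned sequences use `-` for gaps. The real `interpro_get_domains` output format may differ, and `domain_identity` is a hypothetical helper.

```python
def domain_identity(ref_aln: str, orth_aln: str, domains) -> dict:
    """Percent of reference residues in each domain aligned to an identical residue."""
    # Map each reference position (1-based) to the (ref, ortholog) column it occupies
    aligned_to = {}
    pos = 0
    for r, o in zip(ref_aln, orth_aln):
        if r != "-":
            pos += 1
            aligned_to[pos] = (r, o)
    result = {}
    for name, start, end in domains:
        key = f"{name} ({start}-{end})"  # keep repeated domains distinct
        positions = [p for p in range(start, end + 1) if p in aligned_to]
        if not positions:
            result[key] = None  # domain lies outside the aligned reference
            continue
        same = sum(aligned_to[p][0] == aligned_to[p][1] for p in positions)
        result[key] = round(100 * same / len(positions), 1)
    return result
```

Residues inserted in the ortholog (gap in the reference) have no reference coordinate and are ignored, so a domain with a large insertion can look more conserved than it is; report such insertions separately. Domain positions also refer to the UniProt sequence, which is not guaranteed to be identical to the Ensembl protein used in the alignment, so confirm the two agree before mapping.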
## Report Format
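Present the analysis as a structured comparative genomics report in the format below. The values in the example table are illustrative placeholders, not data to copy: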
```markdown
# Comparative Genomics Report: [GENE SYMBOL]

## Gene Overview
- Full name, function summary (from NCBI/Ensembl)
- Reference species and Ensembl gene ID
- Number of species analyzed

## Ortholog Summary
| Species | Ensembl Gene ID | Protein Length | % Identity to [Reference] | % Positives |
|---------|-----------------|----------------|---------------------------|-------------|
| Human (reference) | ENSG... | 393 aa | — | — |
| Mouse | ENSMUSG... | 390 aa | 77.8% | 85.2% |
| Zebrafish | ENSDARG... | 373 aa | 52.1% | 66.3% |

## Pairwise Alignments
For each species pair aligned:
- **[Reference] vs [Species]**: X% identity, Y gaps, alignment length Z
- Key observations: conserved regions, notable insertions/deletions

## Conservation Analysis
Summarize the overall conservation pattern:
- Which regions are most conserved (relate to known domains if domain data was retrieved)
- Which regions show the most divergence
- Overall trend: is this gene highly conserved, moderately conserved, or rapidly evolving?
- Note any species-specific insertions or deletions

## Functional Domain Context
If domain data was retrieved:
- List key domains with positions
- Note whether these domains span the conserved regions

## Evolutionary Insights
Brief interpretation:
- What does the conservation pattern suggest about functional constraints?
- Are there species-specific adaptations visible in the sequence differences?
- How does the conservation level compare to expectations for this gene family?

## Data Sources
List which databases were queried and whether each returned data successfully for each species.
```
Keep the report factual — only include data returned by the tools. Do not hallucinate sequences, identity scores, or ortholog relationships. If alignment data is unavailable for a species, note "Alignment not performed — sequence unavailable."