Perform a multi-species comparative genomics analysis for: $ARGUMENTS

Use the MCP tools available to you to find orthologs, retrieve sequences, align them, and report on conservation. Follow the steps below in order. If a step fails for a particular species, note the gap and continue with the remaining species.

Input Parsing

Parse the user input to identify:

•Gene identifier: a gene symbol (e.g., TP53), Ensembl gene ID (e.g., ENSG00000141510), or NCBI Gene ID
•Species list: extract species names. Convert common names to Ensembl species names:
  •human → homo_sapiens
  •mouse → mus_musculus
  •rat → rattus_norvegicus
  •zebrafish → danio_rerio
  •chicken → gallus_gallus
  •frog → xenopus_tropicalis
  •fly / fruit fly → drosophila_melanogaster
  •worm → caenorhabditis_elegans
  •dog → canis_lupus_familiaris
  •cat → felis_catus
  •pig → sus_scrofa
  •cow → bos_taurus

If the user says "across vertebrates" or similar, use: human, mouse, zebrafish, chicken (4 representative species). If no species are specified, default to: human, mouse, zebrafish.
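
The species rules above can be sketched as a small lookup. This is only an illustration: the function name `resolve_species` and the regex-based matching are assumptions, not part of any tool, and a real parser would also need to accept Ensembl species names passed in directly.

```python
import re

# Mapping copied from the list above.
COMMON_TO_ENSEMBL = {
    "human": "homo_sapiens",
    "mouse": "mus_musculus",
    "rat": "rattus_norvegicus",
    "zebrafish": "danio_rerio",
    "chicken": "gallus_gallus",
    "frog": "xenopus_tropicalis",
    "fly": "drosophila_melanogaster",
    "fruit fly": "drosophila_melanogaster",
    "worm": "caenorhabditis_elegans",
    "dog": "canis_lupus_familiaris",
    "cat": "felis_catus",
    "pig": "sus_scrofa",
    "cow": "bos_taurus",
}
VERTEBRATE_SET = ["human", "mouse", "zebrafish", "chicken"]
DEFAULT_SET = ["human", "mouse", "zebrafish"]


def resolve_species(text):
    """Return Ensembl species names mentioned in free text (hypothetical helper)."""
    lowered = text.lower()
    if "across vertebrates" in lowered:
        return [COMMON_TO_ENSEMBL[name] for name in VERTEBRATE_SET]

    # Whole-word matches only, with an optional plural "s",
    # so "category" does not match "cat".
    hits = []
    for name, ensembl in COMMON_TO_ENSEMBL.items():
        match = re.search(r"\b" + re.escape(name) + r"s?\b", lowered)
        if match:
            hits.append((match.start(), ensembl))

    # Keep the order in which species appear and drop duplicates
    # ("fruit fly" and "fly" resolve to the same Ensembl name).
    ordered = []
    for _, ensembl in sorted(hits):
        if ensembl not in ordered:
            ordered.append(ensembl)

    # No species named: fall back to the default set.
    return ordered or [COMMON_TO_ENSEMBL[name] for name in DEFAULT_SET]
```

The sketch matches only the literal phrase "across vertebrates"; the "or similar" case needs looser matching than this.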

Data Gathering Steps

1. Reference Gene Information

•If the input is a gene symbol, call ensembl_lookup_gene with the symbol and species homo_sapiens (or the first species listed) to get the Ensembl gene ID.
•Call datasets_summary_gene with the gene symbol (taxon: human) for NCBI gene metadata (full name, summary).
•Note the reference Ensembl gene ID for the next step.

2. Find Orthologs

•Call ensembl_get_homologs with the reference Ensembl gene ID, homology_type: "orthologues".
•From the results, extract the ortholog Ensembl gene IDs for each of the requested target species.
•If a requested species has no ortholog in the results, note it as "No ortholog found."
•Record the percent identity values reported by Ensembl for each ortholog pair.
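
As a rough sketch of the extraction in this step, the function below filters a homology result down to the requested species. It assumes the tool passes through the Ensembl REST homology layout (`data` → `homologies` → `target` with `id`, `species`, `perc_id`); the actual output of ensembl_get_homologs may be shaped differently, and `pick_orthologs` is a hypothetical name.

```python
def pick_orthologs(homology_response, target_species):
    """Map each requested species to its ortholog record or a gap note.

    Assumes an Ensembl-REST-like dict; adjust the keys if the tool
    returns a different structure.
    """
    found = {}
    for entry in homology_response.get("data", []):
        for homology in entry.get("homologies", []):
            target = homology.get("target", {})
            species = target.get("species")
            # Simplification: keep only the first hit per species,
            # which ignores extra one-to-many orthologs.
            if species in target_species and species not in found:
                found[species] = {
                    "gene_id": target.get("id"),
                    "perc_id": target.get("perc_id"),
                    "type": homology.get("type"),
                }
    # Species without a hit get the gap note from the step above.
    return {sp: found.get(sp, "No ortholog found.") for sp in target_species}
```

Keeping the gap note in the same mapping makes it harder to silently drop a species from the final report.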

3. Retrieve Protein Sequences

For the reference gene and each ortholog found:

•Call ensembl_get_sequence with the Ensembl gene ID, seq_type: "protein", format: "json".
•Store the protein sequence and its length.

If a protein sequence is not available for a gene ID, try the canonical transcript ID instead.
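
The gene-then-transcript fallback can be pictured as below. This is a sketch only: `fetch` is a hypothetical stand-in for whatever wraps the ensembl_get_sequence call, and is assumed to return a sequence string or `None`.

```python
def get_protein(gene_id, canonical_transcript_id, fetch):
    """Try the gene ID first, then the canonical transcript ID.

    `fetch` is a hypothetical callable: identifier -> sequence string or None.
    Returns None when neither identifier yields a sequence, so the caller
    can record the gap instead of inventing data.
    """
    for identifier in (gene_id, canonical_transcript_id):
        if not identifier:
            continue
        sequence = fetch(identifier)
        if sequence:
            return {
                "id_used": identifier,
                "sequence": sequence,
                "length": len(sequence),
            }
    return None
```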

4. Pairwise Alignments

For each ortholog protein sequence, align it against the reference protein (human unless a different reference species was used in step 1):

•Call sequence_align with the two protein sequences, sequence_type: "protein", mode: "global".
•Record: alignment score, percent identity, gap count, alignment length.

If three or more species are analyzed, also consider running one additional pairwise comparison between two distant non-reference species (e.g., mouse vs zebrafish) to show the range of divergence.
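
To make the recorded metrics concrete, here is a minimal sketch of how they could be derived from two gapped, equal-length aligned strings. It does not describe how sequence_align computes them. Two conventions are assumptions: percent identity is taken over the full alignment length, and the gap count is the number of gap columns, not gap openings.

```python
def summarize_alignment(aligned_ref, aligned_other):
    """Compute alignment length, percent identity, and gap count.

    Inputs are aligned strings of equal length with "-" for gaps.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have the same length")

    length = len(aligned_ref)
    pairs = list(zip(aligned_ref, aligned_other))
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    # Count every column that contains a gap in either sequence.
    gaps = sum(1 for a, b in pairs if a == "-" or b == "-")

    identity = round(100.0 * matches / length, 1) if length else 0.0
    return {
        "alignment_length": length,
        "identity_pct": identity,
        "gap_count": gaps,
    }
```

Tools differ in the denominator they use for percent identity (alignment length, shorter sequence, or ungapped columns), so values from different sources should be compared with care.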

5. Sequence Statistics (Optional)

For the reference protein:

•Call sequence_stats with the protein sequence to get molecular weight and amino acid composition.
•Note any unusual composition differences across species if evident from the alignments.

6. Domain Conservation Check (Optional)

•Call interpro_get_domains with the UniProt accession of the reference protein. If the accession is not known, look it up first with uniprot_search for the reference gene.
•Note which key functional domains exist — these regions are expected to be highly conserved.
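
One possible way to check whether a domain is conserved is to restrict the identity calculation to the domain's coordinates on the reference protein. The sketch below assumes 1-based inclusive domain positions and a gapped pairwise alignment as input; `domain_identity` is a hypothetical helper, not a tool.

```python
def domain_identity(aligned_ref, aligned_other, start, end):
    """Percent identity over reference positions start..end (1-based, inclusive).

    Columns where the reference has a gap carry no reference coordinate and
    are skipped, so insertions in the other species are ignored here.
    Returns None if the domain does not overlap the reference sequence.
    """
    ref_pos = 0
    covered = 0
    matches = 0
    for a, b in zip(aligned_ref, aligned_other):
        if a == "-":
            continue
        ref_pos += 1
        if start <= ref_pos <= end:
            covered += 1
            if a == b:
                matches += 1
    if covered == 0:
        return None
    return round(100.0 * matches / covered, 1)
```

Comparing this per-domain value with the whole-protein identity gives a quick indication of whether a domain is more conserved than the rest of the sequence.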

Report Format

Present the analysis as a structured comparative genomics report:


# Comparative Genomics Report: [GENE SYMBOL]

## Gene Overview
- Full name, function summary (from NCBI/Ensembl)
- Reference species and Ensembl gene ID
- Number of species analyzed

## Ortholog Summary

| Species | Ensembl Gene ID | Protein Length | % Identity to [Reference] | % Positives |
|---------|-----------------|----------------|---------------------------|-------------|
| Human (reference) | ENSG... | 393 aa | — | — |
| Mouse | ENSMUSG... | 390 aa | 77.8% | 85.2% |
| Zebrafish | ENSDARG... | 373 aa | 52.1% | 66.3% |

## Pairwise Alignments

For each species pair aligned:
- **[Reference] vs [Species]**: X% identity, Y gaps, alignment length Z
- Key observations: conserved regions, notable insertions/deletions

## Conservation Analysis

Summarize the overall conservation pattern:
- Which regions are most conserved (relate to known domains if domain data was retrieved)
- Which regions show the most divergence
- Overall trend: is this gene highly conserved, moderately conserved, or rapidly evolving?
- Note any species-specific insertions or deletions

## Functional Domain Context
If domain data was retrieved:
- List key domains with positions
- Note whether these domains span the conserved regions

## Evolutionary Insights
Brief interpretation:
- What does the conservation pattern suggest about functional constraints?
- Are there species-specific adaptations visible in the sequence differences?
- How does the conservation level compare to expectations for this gene family?

## Data Sources
List which databases were queried and whether each returned data successfully for each species.

Keep the report factual — only include data returned by the tools. Do not hallucinate sequences, identity scores, or ortholog relationships. If alignment data is unavailable for a species, note "Alignment not performed — sequence unavailable."
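
As an illustration of the "only include data returned by the tools" rule, the Ortholog Summary table can be rendered so that any missing value appears as an explicit placeholder and is never filled in. The function name, the row keys, and the `n/a` placeholder are all assumptions made for this sketch.

```python
def _pct(value, missing):
    """Format a percentage, or return the placeholder when no value exists."""
    return f"{value:.1f}%" if value is not None else missing


def render_ortholog_table(rows, reference_label="Human", missing="n/a"):
    """Render the Ortholog Summary table as markdown text.

    Each row is a dict with hypothetical keys: species, gene_id, length,
    identity, positives. Any absent value is shown as `missing`.
    """
    lines = [
        f"| Species | Ensembl Gene ID | Protein Length | % Identity to {reference_label} | % Positives |",
        "|---|---|---|---|---|",
    ]
    for row in rows:
        length = row.get("length")
        length_text = f"{length} aa" if length is not None else missing
        lines.append(
            f"| {row['species']} | {row.get('gene_id') or missing} | {length_text} "
            f"| {_pct(row.get('identity'), missing)} | {_pct(row.get('positives'), missing)} |"
        )
    return "\n".join(lines)
```

A species with no ortholog then shows up as a row of placeholders, which keeps the gap visible in the report.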