TCGA Survival Analysis
Perform survival analysis on TCGA data using the cBioPortal API and lifelines.
Step 1: Environment Check
Before starting the analysis, verify that required dependencies are available:
bash
# Check required packages
python -c "import pandas, numpy, matplotlib, requests, lifelines; print('All dependencies OK')"
Dependency check strategy (by priority):
- •Check the currently activated environment (or user-specified environment)
- •Missing packages → Install with uv pip install first
- •If uv unavailable → Use pip install
- •If installation fails or conflicts → Only then create a new conda environment
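The priority order above can also be checked from Python. The following is a minimal sketch, not part of the original workflow: the helper names `missing_packages` and `install_command` are hypothetical, and the script only prints a suggested command rather than installing anything.

```python
import importlib.util
import shutil
import sys

REQUIRED = ["pandas", "numpy", "matplotlib", "requests", "lifelines"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def install_command(packages):
    """Build an install command: prefer uv if it is on PATH, else fall back to pip."""
    if shutil.which("uv"):
        return ["uv", "pip", "install", *packages]
    return [sys.executable, "-m", "pip", "install", *packages]


missing = missing_packages(REQUIRED)
if missing:
    # Print the command instead of running it, so nothing is installed implicitly.
    print("Missing:", ", ".join(missing))
    print("Suggested:", " ".join(install_command(missing)))
else:
    print("All dependencies OK")
```

Creating a new conda environment is left as a manual last resort, as in the list above.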
Required dependencies:
- •pandas
- •numpy
- •matplotlib
- •requests
- •lifelines
To install missing packages:
bash
# Prefer uv (faster)
uv pip install lifelines
# Or use pip
pip install lifelines
Step 2: Quick Start
python
import pandas as pd
import requests
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

# cBioPortal API
CBIOPORTAL_API = "https://www.cbioportal.org/api"
study_id = "luad_tcga_pan_can_atlas_2018"  # LUAD Pan-Cancer Atlas

# Fetch EGFR expression data
# ... (see workflow.md for details)

# Perform Kaplan-Meier analysis
# `time` and `event` are the survival durations and event indicators
# produced by the fetch/merge step elided above
kmf = KaplanMeierFitter()
kmf.fit(time, event, label='Gene High')
kmf.plot_survival_function()
Workflow
See workflow.md for detailed workflow, including:
- •Environment check - Verify dependencies, install as needed
- •Data download - Fetch expression and clinical data from cBioPortal
- •Data processing - Merge expression and survival data
- •Survival analysis - Kaplan-Meier analysis and Log-rank test
- •Visualization - Survival curves and expression distribution plots
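The data-processing step can be sketched as follows. This assumes the clinical data arrives in long format (one row per patient and attribute, with `patientId`, `clinicalAttributeId`, and `value` fields) and that expression values are keyed by `patientId`. The records below are invented for illustration; workflow.md remains the reference for the real procedure.

```python
import pandas as pd

# Hypothetical long-format clinical records: one row per (patient, attribute).
clinical_long = pd.DataFrame([
    {"patientId": "P1", "clinicalAttributeId": "OS_MONTHS", "value": "12.5"},
    {"patientId": "P1", "clinicalAttributeId": "OS_STATUS", "value": "1:DECEASED"},
    {"patientId": "P2", "clinicalAttributeId": "OS_MONTHS", "value": "40.1"},
    {"patientId": "P2", "clinicalAttributeId": "OS_STATUS", "value": "0:LIVING"},
])

# Hypothetical expression values for one gene.
expression = pd.DataFrame([
    {"patientId": "P1", "value": 8.2},
    {"patientId": "P2", "value": 3.4},
    {"patientId": "P3", "value": 5.0},  # no clinical record; dropped by the inner join
])

# Long -> wide: one row per patient, one column per clinical attribute.
clinical = clinical_long.pivot(
    index="patientId", columns="clinicalAttributeId", values="value"
).reset_index()

# Clinical values arrive as strings, so convert the survival time to numbers.
clinical["OS_MONTHS"] = pd.to_numeric(clinical["OS_MONTHS"], errors="coerce")

# Keep only patients that have both expression and clinical data.
merged = expression.rename(columns={"value": "expression"}).merge(
    clinical, on="patientId", how="inner"
)
print(merged)
```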
Key Functions
| Module | Function |
|---|---|
| requests | cBioPortal API data retrieval |
| lifelines | Kaplan-Meier and Cox regression |
| pandas | Data processing and merging |
Supported Cancer Types
Common cBioPortal TCGA study IDs:
- •luad_tcga_pan_can_atlas_2018 - Lung Adenocarcinoma (LUAD)
- •lusc_tcga_pan_can_atlas_2018 - Lung Squamous Cell Carcinoma (LUSC)
- •brca_tcga_pan_can_atlas_2018 - Breast Cancer (BRCA)
- •coadread_tcga_pan_can_atlas_2018 - Colorectal Cancer
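A study ID is the starting point for building API requests. The sketch below shows one way an expression request might be assembled. It rests on assumptions that should be checked against the cBioPortal API documentation: that the mRNA profile ID is `<study_id>_rna_seq_v2_mrna`, that the sample list ID is `<study_id>_all`, and that the molecular-data fetch endpoint accepts an `entrezGeneIds` and `sampleListId` body. The helper names are hypothetical.

```python
import requests

CBIOPORTAL_API = "https://www.cbioportal.org/api"


def expression_request(study_id, entrez_gene_ids):
    """Return (url, payload) for an mRNA expression query.

    ASSUMPTION: the profile and sample-list IDs follow the
    "<study_id>_rna_seq_v2_mrna" and "<study_id>_all" naming patterns.
    """
    profile_id = f"{study_id}_rna_seq_v2_mrna"
    url = f"{CBIOPORTAL_API}/molecular-profiles/{profile_id}/molecular-data/fetch"
    payload = {
        "entrezGeneIds": list(entrez_gene_ids),
        "sampleListId": f"{study_id}_all",
    }
    return url, payload


def fetch_expression(study_id, entrez_gene_ids, timeout=60):
    """POST the query and return the decoded JSON (performs a network call)."""
    url, payload = expression_request(study_id, entrez_gene_ids)
    response = requests.post(url, json=payload, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()


# Building the request is offline; only fetch_expression touches the network.
url, payload = expression_request("luad_tcga_pan_can_atlas_2018", [1956])  # EGFR
print(url)
print(payload)
```

Separating request construction from the network call makes it possible to inspect the URL and body before anything is sent.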
Troubleshooting
- •Network issues: The cBioPortal API requires a stable network connection
- •Gene names: Query by Entrez Gene ID rather than gene symbol (e.g., EGFR = 1956)
- •Survival data: Check that the OS_MONTHS and OS_STATUS columns exist before fitting
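The survival-data check can be automated before fitting. This is a minimal sketch with a hypothetical helper, `prepare_survival`. It assumes OS_STATUS values look like `1:DECEASED` or `0:LIVING`, with the digit before the colon acting as the event flag; if a study encodes status differently, the parsing line needs adjusting.

```python
import pandas as pd

REQUIRED_COLUMNS = ("OS_MONTHS", "OS_STATUS")


def prepare_survival(df):
    """Validate survival columns and derive a numeric event indicator."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise KeyError(f"Missing survival columns: {missing}")

    out = df.copy()
    # Non-numeric placeholders become NaN and are dropped below.
    out["OS_MONTHS"] = pd.to_numeric(out["OS_MONTHS"], errors="coerce")
    # ASSUMPTION: status strings are "<flag>:<label>", e.g. "1:DECEASED".
    flag = out["OS_STATUS"].astype(str).str.split(":").str[0]
    out["event"] = pd.to_numeric(flag, errors="coerce")

    out = out.dropna(subset=["OS_MONTHS", "event"])
    return out.astype({"event": int})


demo = pd.DataFrame({
    "OS_MONTHS": ["12.5", "40.1", "[Not Available]"],
    "OS_STATUS": ["1:DECEASED", "0:LIVING", "0:LIVING"],
})
clean = prepare_survival(demo)
print(clean)
```

The resulting `OS_MONTHS` and `event` columns can then be passed as the durations and event indicators to `KaplanMeierFitter.fit`.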