research-plan

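Refines a research question through literature review and data feasibility assessment, and produces a structured research plan. Use this skill when the user already has a research hypothesis or a preliminary research idea and wants to develop it into a complete plan before starting formal analysis.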

SKILL.md
---
name: research-plan
description: Refine a research question through literature review and data feasibility checks, then produce a structured research plan. Use when the user has a hypothesis or research idea and wants to develop it into a full plan before analysis.
allowed-tools: Bash, Read, Write, WebSearch, AskUserQuestion
user-invocable: true
---

Research Plan Skill

Take a research question (from /hypothesis, user text, or docs/research_ideas.md), refine it through literature review and data feasibility checks, and produce a structured research plan document.

Workflow

Step 1: Accept Input

Identify the research question source:

  • If invoked after /hypothesis: use the generated hypothesis
  • If the user provides a question directly: use that
  • If neither: read docs/research_ideas.md and present PROPOSED ideas for the user to choose

Confirm you have:

  • A research question
  • A tentative hypothesis (H0 and H1)
  • A target organism, pathway, or data type

If any of these are missing, ask the user.
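The checklist above could be captured as a small validation step. The following is a minimal sketch, not part of the skill itself: the `ResearchInput` class, its field names, and the placeholder strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ResearchInput:
    """Hypothetical container for the Step 1 inputs (names are illustrative)."""
    question: Optional[str] = None
    h0: Optional[str] = None       # null hypothesis
    h1: Optional[str] = None       # alternative hypothesis
    target: Optional[str] = None   # organism, pathway, or data type


def missing_fields(inp: ResearchInput) -> list[str]:
    """Return the names of fields that are absent or blank."""
    return [
        f.name
        for f in fields(inp)
        if not (getattr(inp, f.name) or "").strip()
    ]


# Placeholder values only; a real run would use the user's own text.
draft = ResearchInput(question="{research question}", h1="{alternative hypothesis}")
print(missing_fields(draft))  # ['h0', 'target']
```

Any name returned by `missing_fields` marks an item to ask the user about before moving on to Step 2.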

Step 2: Literature Check

Invoke /literature-review internally to search for existing work:

  1. Search for the specific research question / hypothesis
  2. Identify: prior results, methods used, organisms studied, gaps
  3. Present findings to the user: "Here's what's already known about this topic..."
  4. Store references in the project's references.md (created by /literature-review)

Step 3: Interactive Refinement Loop

This step is the key differentiator: iterate with the user based on what the literature reveals.

  1. Present the literature context and ask: "Given what's already known, do you want to refine the hypothesis?"
  2. Offer concrete options:
    • Narrow scope: Focus on a specific organism, phylum, or gene category
    • Change organism: Switch to a species with better data coverage in BERDL
    • Adjust approach: Use a different statistical method or comparison
    • Pivot question: The literature reveals a more interesting gap to address
    • Proceed as-is: The original hypothesis is still novel and testable
  3. If the user refines, run additional targeted literature searches as needed
  4. Allow one to three iterations, stopping as soon as the user is satisfied
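The control flow of this loop can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: `choose` and `revise` are hypothetical callbacks standing in for the user prompt and for the follow-up literature search plus rewrite, and the option strings simply mirror the list above.

```python
from typing import Callable

MAX_ITERATIONS = 3  # upper bound taken from the iteration limit in the list above

OPTIONS = (
    "narrow scope",
    "change organism",
    "adjust approach",
    "pivot question",
    "proceed as-is",
)


def refine(
    hypothesis: str,
    choose: Callable[[str], str],
    revise: Callable[[str, str], str],
) -> tuple[str, int]:
    """Iterate until the user proceeds as-is or the iteration cap is reached.

    Returns the final hypothesis and the number of rounds used.
    """
    rounds = 0
    while rounds < MAX_ITERATIONS:
        rounds += 1
        choice = choose(hypothesis)
        if choice not in OPTIONS:
            raise ValueError(f"unknown option: {choice}")
        if choice == "proceed as-is":
            break
        hypothesis = revise(hypothesis, choice)
    return hypothesis, rounds


# Scripted stand-ins for the user: refine once, then accept.
answers = iter(["narrow scope", "proceed as-is"])
final, rounds = refine(
    "{hypothesis}",
    choose=lambda h: next(answers),
    revise=lambda h, c: f"{h} [{c}]",
)
print(final, rounds)  # {hypothesis} [narrow scope] 2
```

Capping the loop keeps the refinement bounded even if the user never picks "proceed as-is".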

Step 4: Data Feasibility Check

Verify the hypothesis can actually be tested with BERDL data:

  1. Table verification: Use the /berdl REST API (read-only) to confirm that:
    • The required tables exist and have the expected columns
    • Column names and types match what the analysis needs (check via the schema endpoint)
  2. Coverage check: Query row counts for the relevant tables:
    • How many species/genomes are available?
    • What fraction have the needed annotations? (e.g., "28% of genomes have environmental embeddings")
  3. Pitfall scan: Read docs/pitfalls.md and docs/performance.md for known issues with the target tables
  4. Performance tier: Estimate whether the analysis can be done via REST API or requires JupyterHub:

| Expected Scale | Tier | Recommendation |
|---|---|---|
| < 100K rows | REST API | Direct queries, .toPandas() OK |
| 100K – 10M rows | Mixed | Filter/aggregate in SQL, small results via REST |
| > 10M rows | JupyterHub only | PySpark DataFrames, no .toPandas() |

Present the feasibility summary to the user. If the data doesn't support the hypothesis, suggest alternatives.
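The coverage and tier decisions reduce to simple arithmetic once row counts are known. The sketch below assumes the counts have already been retrieved; it makes no BERDL calls, and the function names are hypothetical. The thresholds come from the performance-tier table above, with the boundary value of exactly 10M rows assigned to the "Mixed" tier as an assumption.

```python
def performance_tier(estimated_rows: int) -> str:
    """Map an estimated row count to the tier named in the table above."""
    if estimated_rows < 100_000:
        return "REST API"
    if estimated_rows <= 10_000_000:
        return "Mixed"
    return "JupyterHub only"


def coverage_fraction(annotated: int, total: int) -> float:
    """Fraction of items (e.g. genomes) that carry the needed annotation."""
    if total <= 0:
        raise ValueError("total must be positive")
    return annotated / total


# Illustrative numbers only, not real BERDL counts.
print(performance_tier(50_000))               # REST API
print(performance_tier(2_500_000))            # Mixed
print(performance_tier(40_000_000))           # JupyterHub only
print(f"{coverage_fraction(28, 100):.0%}")    # 28%
```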

Step 5: Produce Research Plan Document

Generate projects/{project_id}/research_plan.md:

````markdown
# Research Plan: {Title}

## Research Question
{Refined question after literature review}

## Hypothesis
- **H0**: {Null hypothesis}
- **H1**: {Alternative hypothesis}

## Literature Context
{Summary of what's known, key references, identified gaps}
{Full references stored in projects/{id}/references.md}

## Query Strategy

### Tables Required
| Table | Purpose | Estimated Rows | Filter Strategy |
|---|---|---|---|
| {table} | {why needed} | {count} | {how to filter} |

### Key Queries
1. **{Description}**:
```sql
{query}
```
2. ...

### Performance Plan
- **Tier**: {REST API / JupyterHub}
- **Estimated complexity**: {simple / moderate / complex}
- **Known pitfalls**: {list from pitfalls.md}

## Analysis Plan

### Notebook 1: Data Exploration
- **Goal**: {what to verify/explore}
- **Expected output**: {CSV/figures}

### Notebook 2: Main Analysis
- **Goal**: {core analysis}
- **Expected output**: {CSV/figures}

### Notebook 3: Visualization (if needed)
- **Goal**: {figures for findings}

## Expected Outcomes
- **If H1 supported**: {interpretation}
- **If H0 not rejected**: {interpretation}
- **Potential confounders**: {list}

## Authors
{from user or carried forward from /hypothesis}
````
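The "Tables Required" section of the plan template could be filled programmatically from the feasibility results. The helper below is a hypothetical sketch: its name and its input shape (a list of dicts keyed by the column headers) are assumptions, and the example row reuses the template placeholders rather than real table names.

```python
HEADER = ["Table", "Purpose", "Estimated Rows", "Filter Strategy"]


def tables_required_md(rows: list[dict[str, str]]) -> str:
    """Render rows as the pipe table used in the research plan template."""
    lines = [
        "| " + " | ".join(HEADER) + " |",
        "|" + "---|" * len(HEADER),
    ]
    for row in rows:
        # A missing key raises KeyError, which surfaces incomplete rows early.
        lines.append("| " + " | ".join(row[h] for h in HEADER) + " |")
    return "\n".join(lines)


example = [
    {
        "Table": "{table}",
        "Purpose": "{why needed}",
        "Estimated Rows": "{count}",
        "Filter Strategy": "{how to filter}",
    }
]
print(tables_required_md(example))
```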

Step 6: Create Project Directory Structure

Create the project directory with initial files:

```
projects/{project_id}/
├── research_plan.md    # The plan document from Step 5
├── references.md       # Created by /literature-review in Step 2
├── README.md           # Skeleton with question/hypothesis filled in
├── notebooks/          # Empty, populated by /notebook
├── data/               # Empty, populated during analysis
└── figures/            # Empty, populated during analysis
```
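Creating this layout takes only a few lines of standard-library Python. The sketch below is one possible approach, not the skill's prescribed method: `create_project_skeleton` is a hypothetical name, and the demo writes into a temporary directory rather than a real repository.

```python
import tempfile
from pathlib import Path


def create_project_skeleton(root: Path, project_id: str) -> Path:
    """Create projects/{project_id}/ with empty subdirectories and files."""
    project = root / "projects" / project_id
    for sub in ("notebooks", "data", "figures"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    for name in ("research_plan.md", "references.md", "README.md"):
        # touch() leaves existing content alone, so a references.md already
        # written in Step 2 is not overwritten.
        (project / name).touch(exist_ok=True)
    return project


with tempfile.TemporaryDirectory() as tmp:
    project = create_project_skeleton(Path(tmp), "demo_project")
    print(sorted(p.name for p in project.iterdir()))
```

Because every call uses `exist_ok=True`, the function is safe to re-run on a project that already exists.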

Generate a skeleton `README.md` following the structure of existing projects (see `projects/pangenome_openness/README.md` for reference):

```markdown
# {Title}

## Research Question
{Refined question}

## Hypothesis
{H0 and H1}

## Approach
{High-level approach from the research plan}

## Data Sources
- **Database**: {database name} on BERDL Delta Lakehouse
- **Tables**: {list of tables with brief descriptions}

## Key Findings
*TBD — run notebooks and use `/synthesize` to complete this section.*

## Notebooks
| Notebook | Purpose |
|----------|---------|
| *To be generated by `/notebook`* | |

## Visualizations
| Figure | Description |
|--------|-------------|
| *TBD* | |

## Data Files
| File | Description |
|------|-------------|
| *TBD* | |

## Related Projects
{Any prior projects this builds on}

## Authors
{Authors}

## Future Directions
*TBD — use `/synthesize` to complete this section.*
```

Step 7: Suggest Next Steps

After creating the plan, tell the user:

"Research plan created at projects/{project_id}/research_plan.md. Next steps:

  1. Use /notebook to generate analysis notebooks from this plan
  2. Upload notebooks to BERDL JupyterHub and run them
  3. Use /synthesize to interpret results and draft findings"

Integration

  • Reads from: /hypothesis output, docs/research_ideas.md, docs/pitfalls.md, docs/performance.md
  • Calls: /literature-review (for literature search), /berdl (read-only schema/count checks)
  • Produces: research_plan.md, skeleton README.md, project directory structure
  • Consumed by: /notebook (reads research_plan.md to generate notebooks)

Pitfall Detection

When you encounter errors, unexpected results, retry cycles, performance issues, or data surprises during this task, follow the pitfall-capture protocol. Read .claude/skills/pitfall-capture/SKILL.md and follow its instructions to determine whether the issue should be added to docs/pitfalls.md.