langgraph-testing

Write pytest test cases for LangGraph agents and graph structures. Suitable for testing state machines, graph nodes, agent flows, or partial execution paths.

SKILL.md
--- frontmatter
name: langgraph-testing
description: Writes tests for LangGraph agents and graphs with pytest. Use when testing state machines, graph nodes, agent flows, or partial execution paths.

LangGraph Testing with Pytest

Core principle: Test graph behavior at multiple levels — individual nodes, partial flows, and complete executions. Use MemorySaver for test isolation.

"Create your graph before each test where you use it, then compile it within tests with a new checkpointer instance." — LangGraph Docs

Why this matters: LangGraph agents are stateful. Each test needs isolated state to prevent cross-contamination. The checkpointer enables partial execution testing and state inspection.

Testing Levels

| Level | What to Test | Tool |
| --- | --- | --- |
| Node | Individual node logic | compiled.nodes["name"].invoke() |
| Partial | Subset of graph flow | update_state() + interrupt_after |
| End-to-End | Complete graph execution | compiled.invoke() with thread_id |
| Integration | Real LLMs with recording | pytest-recording / vcrpy |

Graph Factory Pattern

Always create graphs fresh in tests:

python
from typing_extensions import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END


class MyState(TypedDict):
    messages: list
    status: str


# Placeholder nodes so the example runs; swap in the real node functions.
def process_node(state: MyState) -> dict:
    return {"messages": state["messages"] + ["processed"]}


def validate_node(state: MyState) -> dict:
    return {"status": "completed"}


def create_graph() -> StateGraph:
    """Build an uncompiled graph; each test compiles its own copy."""
    graph = StateGraph(MyState)
    graph.add_node("process", process_node)
    graph.add_node("validate", validate_node)
    graph.add_edge(START, "process")
    graph.add_edge("process", "validate")
    graph.add_edge("validate", END)
    return graph


def test_graph_execution():
    checkpointer = MemorySaver()  # Fresh checkpointer per test
    graph = create_graph()
    compiled = graph.compile(checkpointer=checkpointer)

    result = compiled.invoke(
        {"messages": [], "status": "pending"},
        config={"configurable": {"thread_id": "test-1"}},
    )
    assert result["status"] == "completed"

When to Use Each Test Type

code
Testing individual node logic? → Node test (bypasses checkpointer)
Testing specific subflow? → Partial execution test
Testing complete workflow? → End-to-end test
Testing with real LLM? → Integration test with recording

Mocking Strategy

Default: Use GenericFakeChatModel for deterministic tests

Use real LLMs only for:

  • Integration/E2E tests with HTTP recording
  • Validating prompt changes

Mock utilities:

  • GenericFakeChatModel — Mock text/tool responses
  • pytest-recording — Record/replay HTTP calls
  • MemorySaver — In-memory checkpointer for tests

Test Configuration

toml
# pyproject.toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
testpaths = ["tests"]

Anti-Patterns

| Pattern | Fix |
| --- | --- |
| Shared checkpointer across tests | Create fresh MemorySaver per test |
| Hardcoded thread_ids | Use unique IDs or parametrize |
| Testing node via full graph | Use compiled.nodes["name"].invoke() |
| No interrupt points in flow test | Use interrupt_after for partial tests |
| Real LLM calls without recording | Add @pytest.mark.vcr() decorator |
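One way to avoid hardcoded thread IDs is a small helper plus a fixture, sketched below. The names `new_thread_config` and `thread_config` are hypothetical and chosen for this example.

```python
import uuid

import pytest


def new_thread_config(prefix: str = "test") -> dict:
    """Return a LangGraph config dict with a thread_id unique to this call."""
    return {"configurable": {"thread_id": f"{prefix}-{uuid.uuid4().hex}"}}


@pytest.fixture
def thread_config() -> dict:
    # Each test that requests this fixture gets its own thread_id.
    return new_thread_config()
```

A test then takes `thread_config` as an argument and passes it as `config=` to `invoke()`, `update_state()`, or `get_state()`.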

Quality Checklist

  • Graph created fresh per test (factory pattern)
  • MemorySaver used for test checkpointing
  • Thread IDs unique per test
  • LLM responses mocked or recorded
  • Node tests isolated from graph flow
  • State transitions verified
  • Error conditions handled


Remember: Isolated state. Fresh graphs. Test at the right level.