LangGraph Testing with Pytest
Core principle: Test graph behavior at multiple levels — individual nodes, partial flows, and complete executions. Use MemorySaver for test isolation.
"Create your graph before each test where you use it, then compile it within tests with a new checkpointer instance." — LangGraph Docs
Why this matters: LangGraph agents are stateful. Each test needs isolated state to prevent cross-contamination. The checkpointer enables partial execution testing and state inspection.
Testing Levels
| Level | What to Test | Tool |
|---|---|---|
| Node | Individual node logic | graph.nodes["name"].invoke() |
| Partial | Subset of graph flow | update_state() + interrupt_after |
| End-to-End | Complete graph execution | graph.invoke() with thread_id |
| Integration | Real LLMs with recording | pytest-recording / vcrpy |
Graph Factory Pattern
Always create graphs fresh in tests:
python
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from typing_extensions import TypedDict
def create_graph() -> StateGraph:
class MyState(TypedDict):
messages: list
status: str
graph = StateGraph(MyState)
graph.add_node("process", process_node)
graph.add_node("validate", validate_node)
graph.add_edge(START, "process")
graph.add_edge("process", "validate")
graph.add_edge("validate", END)
return graph
def test_graph_execution():
checkpointer = MemorySaver() # Fresh checkpointer per test
graph = create_graph()
compiled = graph.compile(checkpointer=checkpointer)
result = compiled.invoke(
{"messages": [], "status": "pending"},
config={"configurable": {"thread_id": "test-1"}}
)
assert result["status"] == "completed"
When to Use Each Test Type
code
Testing individual node logic? → Node test (bypasses checkpointer) Testing specific subflow? → Partial execution test Testing complete workflow? → End-to-end test Testing with real LLM? → Integration test with recording
Mocking Strategy
Default: Use GenericFakeChatModel for deterministic tests
Only use real LLMs:
- •Integration/E2E tests with HTTP recording
- •Validating prompt changes
Mock utilities:
- •
GenericFakeChatModel— Mock text/tool responses - •
pytest-recording— Record/replay HTTP calls - •
MemorySaver— In-memory checkpointer for tests
Test Configuration
toml
# pyproject.toml [tool.pytest.ini_options] asyncio_mode = "auto" asyncio_default_fixture_loop_scope = "function" testpaths = ["tests"]
Anti-Patterns
| Pattern | Fix |
|---|---|
| Shared checkpointer across tests | Create fresh MemorySaver per test |
| Hardcoded thread_ids | Use unique IDs or parametrize |
| Testing node via full graph | Use graph.nodes["name"].invoke() |
| No interrupt points in flow test | Use interrupt_after for partial tests |
| Real LLM calls without recording | Add @pytest.mark.vcr() decorator |
Quality Checklist
- • Graph created fresh per test (factory pattern)
- • MemorySaver used for test checkpointing
- • Thread IDs unique per test
- • LLM responses mocked or recorded
- • Node tests isolated from graph flow
- • State transitions verified
- • Error conditions handled
Language-Specific Patterns
- •Python: See references/python.md
Remember: Isolated state. Fresh graphs. Test at the right level.