MCTS Simulation Phase
You are executing the SIMULATION (rollout) phase of Monte Carlo Tree Search.
LLM as Heuristic Policy
Use your knowledge to:
- Guide the rollout toward realistic outcomes
- Evaluate terminal states with meaningful scores
- Detect dead ends early to save computation
Simulation Algorithm
- Start from the expanded node
- Roll out to a terminal state:
  - Select actions using the LLM policy (not random!)
  - Simulate state transitions
  - Continue until a terminal state or the max depth is reached
- Evaluate the outcome:
  - Success: positive reward (e.g., 1.0)
  - Partial success: proportional reward (e.g., 0.5)
  - Failure: zero reward (mcts_simulate reports rewards in [0, 1], so negative values do not apply)
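The steps above can be sketched as a short loop. This is a minimal illustration, not the tool's implementation: `simulate`, `RolloutResult`, and the four callables are hypothetical names, and the `policy` callable stands in for the LLM's action choice.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class RolloutResult:
    terminal_state: Any
    reward: float
    rollout_path: List[Any] = field(default_factory=list)


def simulate(
    state: Any,
    policy: Callable[[Any], Any],
    transition: Callable[[Any, Any], Any],
    is_terminal: Callable[[Any], bool],
    evaluate: Callable[[Any], float],
    max_depth: int = 10,
) -> RolloutResult:
    """Roll out from `state` until it is terminal or `max_depth` steps are taken."""
    path: List[Any] = []
    for _ in range(max_depth):
        if is_terminal(state):
            break
        action = policy(state)  # informed choice standing in for the LLM policy
        state = transition(state, action)
        path.append(action)
    # Evaluate whatever state the rollout ended in, terminal or not.
    return RolloutResult(state, evaluate(state), path)


# Toy usage: count up from 0; reaching 3 counts as success.
result = simulate(
    state=0,
    policy=lambda s: "increment",
    transition=lambda s, a: s + 1,
    is_terminal=lambda s: s >= 3,
    evaluate=lambda s: 1.0 if s == 3 else 0.0,
)
print(result)
```

If `max_depth` is hit first, the loop simply stops and the last state is evaluated as-is, which is why a truncated rollout in the toy example scores 0.0.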
Using MCP Tools
Call mcts_simulate with:
- node_id: The node to simulate from
- max_depth: Maximum rollout depth (default: 10)
- evaluation_criteria: What constitutes success
The tool returns:
- terminal_state: The final state reached
- reward: Numerical evaluation in [0, 1]
- rollout_path: Sequence of actions taken
- reasoning: Explanation of the evaluation
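As a rough illustration of the shapes involved, the sketch below builds a set of arguments and sanity-checks a response. Only the parameter and field names come from the lists above; the values, the `validate_response` helper, and the example response are made up for illustration, and the actual MCP call is not shown.

```python
import json

# Hypothetical arguments for mcts_simulate; the values are placeholders.
request = {
    "node_id": "node-42",
    "max_depth": 10,
    "evaluation_criteria": "All unit tests pass and edge cases are handled",
}


def validate_response(response: dict) -> dict:
    """Check for the four documented fields and a reward inside [0, 1]."""
    required = ("terminal_state", "reward", "rollout_path", "reasoning")
    missing = [key for key in required if key not in response]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not 0.0 <= response["reward"] <= 1.0:
        raise ValueError(f"reward out of range: {response['reward']}")
    return response


# Made-up response, shaped like the fields listed above.
example_response = {
    "terminal_state": "tests passing, one edge case unhandled",
    "reward": 0.75,
    "rollout_path": ["write_parser", "add_tests", "run_tests"],
    "reasoning": "Core behavior works; empty-input case is not covered.",
}

validate_response(example_response)
print(json.dumps(request, indent=2))
```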
Simulation Strategy
For the current context: $ARGUMENTS
Rollout Policy
Instead of a random rollout, use an informed policy:
- At each step, consider 2-3 likely actions
- Choose based on domain knowledge
- Prefer actions that lead to decisive outcomes
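One way to picture this policy is as a shortlist-then-pick step. The sketch below is an assumption-laden illustration: `Candidate`, its `plausibility` and `decisive` fields, and the `decisive_bonus` weight are hypothetical stand-ins for the judgment the LLM applies, not part of any tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    plausibility: float  # hypothetical domain-knowledge score in [0, 1]
    decisive: bool       # True if the action tends to settle the outcome quickly


def choose_action(
    candidates: List[Candidate], k: int = 3, decisive_bonus: float = 0.1
) -> Optional[Candidate]:
    """Shortlist the k most plausible actions, then prefer decisive ones."""
    shortlist = sorted(candidates, key=lambda c: c.plausibility, reverse=True)[:k]
    if not shortlist:
        return None  # no available action: treat as a dead end
    return max(
        shortlist,
        key=lambda c: c.plausibility + (decisive_bonus if c.decisive else 0.0),
    )


candidates = [
    Candidate("refactor_module", 0.6, decisive=False),
    Candidate("run_tests", 0.55, decisive=True),
    Candidate("read_docs", 0.2, decisive=False),
    Candidate("rewrite_from_scratch", 0.1, decisive=True),
]
chosen = choose_action(candidates)
print(chosen.name)  # run_tests
```

Here `run_tests` wins because its decisiveness bonus lifts it just above the slightly more plausible `refactor_module`; `rewrite_from_scratch` never makes the shortlist of three.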
Evaluation Criteria
For Research:
- Does the path lead to valid conclusions?
- Is evidence sufficient and reliable?
- Are there logical gaps?
For Planning:
- Does the plan achieve the goal?
- Are resources within budget?
- Are there critical risks?
For Coding:
- Does the solution work correctly?
- Is the code clean and maintainable?
- Are edge cases handled?
Reward Assignment
reward = completeness * correctness * efficiency
Where each factor is in [0, 1]:
- completeness: How much of the goal is achieved
- correctness: How valid the solution is
- efficiency: How elegant or optimal the solution is
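A small worked example of the formula, with a hypothetical helper name (`compute_reward`) and made-up factor values:

```python
def compute_reward(completeness: float, correctness: float, efficiency: float) -> float:
    """Multiply the three factors after checking that each lies in [0, 1]."""
    factors = {
        "completeness": completeness,
        "correctness": correctness,
        "efficiency": efficiency,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return completeness * correctness * efficiency


# Goal fully achieved, but only half correct and half as efficient as hoped.
print(compute_reward(1.0, 0.5, 0.5))  # 0.25

# A zero in any factor zeroes the whole reward.
print(compute_reward(1.0, 0.0, 1.0))  # 0.0
```

Because the factors are multiplied, the reward stays in [0, 1] and a weakness in any one factor pulls the total down sharply; reporting the three factors alongside the product gives the breakdown.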
Output
After simulation, report:
- Terminal state reached
- Reward value with breakdown
- Key insights from the rollout
- Any observations to record
Proceed to BACKPROPAGATION with the reward.