cuOpt Server Skill
Prerequisites: Read
cuopt-user-rules/SKILL.mdfirst for behavior rules.
Deploy and use the cuOpt REST server for production optimization workloads.
Before You Start: Required Questions
Ask these if not already clear:
- •
Problem type?
- •Routing (VRP/TSP/PDP)?
- •LP/MILP?
- •(Note: QP not supported via REST)
- •
Deployment target?
- •Local development?
- •Docker/Kubernetes?
- •Cloud service?
- •
Client preference?
- •curl (quick testing)
- •Python requests
- •cuopt-sh-client library
Server Capabilities
| Problem Type | REST Support |
|---|---|
| Routing | ✓ |
| LP | ✓ |
| MILP | ✓ |
| QP | ✗ |
Starting the Server
Direct (Development)
bash
python -m cuopt_server.cuopt_service --ip 0.0.0.0 --port 8000
Docker (Production)
bash
docker run --gpus all -d \ -p 8000:8000 \ -e CUOPT_SERVER_PORT=8000 \ --name cuopt-server \ nvidia/cuopt:latest-cuda12.9-py3.13
Verify Running
bash
curl http://localhost:8000/cuopt/health
# Expected: {"status": "healthy"}
API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
/cuopt/health | GET | Health check |
/cuopt/request | POST | Submit optimization request |
/cuopt/solution/{reqId} | GET | Get solution by request ID |
/cuopt.yaml | GET | OpenAPI specification |
/cuopt/docs | GET | Swagger UI |
Workflow
- •POST problem to
/cuopt/request→ getreqId - •Poll
/cuopt/solution/{reqId}until solution ready - •Parse response
Routing Request Example
curl
bash
REQID=$(curl -s -X POST "http://localhost:8000/cuopt/request" \
-H "Content-Type: application/json" \
-H "CLIENT-VERSION: custom" \
-d '{
"cost_matrix_data": {
"data": {"0": [[0,10,15],[10,0,12],[15,12,0]]}
},
"travel_time_matrix_data": {
"data": {"0": [[0,10,15],[10,0,12],[15,12,0]]}
},
"task_data": {
"task_locations": [1, 2],
"demand": [[10, 20]],
"task_time_windows": [[0, 100], [0, 100]],
"service_times": [5, 5]
},
"fleet_data": {
"vehicle_locations": [[0, 0]],
"capacities": [[50]],
"vehicle_time_windows": [[0, 200]]
},
"solver_config": {"time_limit": 5}
}' | jq -r '.reqId')
echo "Request ID: $REQID"
# Poll for solution
sleep 2
curl -s "http://localhost:8000/cuopt/solution/$REQID" \
-H "CLIENT-VERSION: custom" | jq .
Python
python
import requests
import time
SERVER = "http://localhost:8000"
HEADERS = {"Content-Type": "application/json", "CLIENT-VERSION": "custom"}
payload = {
"cost_matrix_data": {
"data": {"0": [[0,10,15],[10,0,12],[15,12,0]]}
},
"travel_time_matrix_data": {
"data": {"0": [[0,10,15],[10,0,12],[15,12,0]]}
},
"task_data": {
"task_locations": [1, 2],
"demand": [[10, 20]],
"task_time_windows": [[0, 100], [0, 100]], # optional
"service_times": [5, 5] # optional
},
"fleet_data": {
"vehicle_locations": [[0, 0]],
"capacities": [[50]],
"vehicle_time_windows": [[0, 200]] # optional
},
"solver_config": {"time_limit": 5}
}
# Submit
resp = requests.post(f"{SERVER}/cuopt/request", json=payload, headers=HEADERS)
req_id = resp.json()["reqId"]
# Poll
for _ in range(30):
resp = requests.get(f"{SERVER}/cuopt/solution/{req_id}", headers=HEADERS)
result = resp.json()
if "response" in result:
print(result["response"]["solver_response"])
break
time.sleep(1)
LP/MILP Request Example
bash
curl -s -X POST "http://localhost:8000/cuopt/request" \
-H "Content-Type: application/json" \
-H "CLIENT-VERSION: custom" \
-d '{
"csr_constraint_matrix": {
"offsets": [0, 2, 4],
"indices": [0, 1, 0, 1],
"values": [2.0, 3.0, 4.0, 2.0]
},
"constraint_bounds": {
"upper_bounds": [240.0, 200.0],
"lower_bounds": ["ninf", "ninf"]
},
"objective_data": {
"coefficients": [40.0, 30.0],
"scalability_factor": 1.0,
"offset": 0.0
},
"variable_bounds": {
"upper_bounds": ["inf", "inf"],
"lower_bounds": [0.0, 0.0]
},
"maximize": true,
"solver_config": {"time_limit": 60}
}'
Terminology: REST vs Python API
CRITICAL: REST API uses different terminology than Python API.
| Concept | Python API | REST API |
|---|---|---|
| Orders/Jobs | order_locations | task_locations |
| Time windows | set_order_time_windows() | task_time_windows |
| Service times | set_order_service_times() | service_times |
| Transit matrix | add_transit_time_matrix() | travel_time_matrix_data |
Common Payload Mistakes
Wrong field names
json
// ❌ WRONG
"transit_time_matrix_data": {...}
// ✅ CORRECT
"travel_time_matrix_data": {...}
Wrong capacity format
json
// ❌ WRONG - per vehicle "capacities": [[50], [50]] // ✅ CORRECT - per dimension across all vehicles "capacities": [[50, 50]]
Missing required fields
Routing requires at minimum:
- •
cost_matrix_data - •
task_data.task_locations - •
fleet_data.vehicle_locations - •
fleet_data.capacities
Response Structure
Routing Success
json
{
"reqId": "abc123",
"response": {
"solver_response": {
"status": 0,
"solution_cost": 45.0,
"vehicle_data": {
"0": {"route": [0, 1, 2, 0], "arrival_times": [...]}
}
}
}
}
LP/MILP Success
json
{
"reqId": "abc123",
"response": {
"status": "Optimal",
"objective_value": 1600.0,
"primal_solution": [30.0, 60.0]
}
}
Error Handling
422 Validation Error
Check the error message for field issues:
bash
curl ... | jq '.error'
Compare against OpenAPI spec at /cuopt.yaml
500 Server Error
- •Check server logs
- •Capture
reqIdfor debugging - •Try with smaller problem
Polling Returns Empty
- •Solution still computing - keep polling
- •Check
solver_config.time_limit
Server Configuration
Environment Variables
bash
CUOPT_SERVER_PORT=8000 CUOPT_SERVER_HOST=0.0.0.0
Command Line Options
bash
python -m cuopt_server.cuopt_service \ --ip 0.0.0.0 \ --port 8000 \ --workers 4
Production Considerations
Health Checks
bash
# Kubernetes liveness probe curl -f http://localhost:8000/cuopt/health # Readiness check curl -f http://localhost:8000/cuopt/health
Resource Limits
yaml
# Kubernetes example
resources:
limits:
nvidia.com/gpu: 1
memory: "32Gi"
requests:
memory: "16Gi"
Scaling
- •GPU is the bottleneck - one server per GPU
- •Use load balancer for multiple GPUs
- •Queue requests to avoid overwhelming
OpenAPI Specification
Full API spec available at:
- •Runtime:
http://localhost:8000/cuopt.yaml - •Source:
docs/cuopt/source/cuopt_spec.yaml - •Swagger UI:
http://localhost:8000/cuopt/docs
Examples
See resources/ for complete examples:
- •Routing examples — VRP, PDP via REST
- •LP/MILP examples — Linear programming via REST
When to Escalate
Switch to cuopt-debugging if:
- •Consistent 5xx errors
- •Unexpected solution results
Switch to cuopt-developer if:
- •Need to modify server behavior
- •Need new endpoints