--- frontmatter
name: napi-interop
description: Rust↔JS interop architecture in duroxide-node. Use when modifying the napi bridge, adding ScheduledTask types, fixing cross-thread issues, changing tracing delegation, or debugging block_in_place / ThreadsafeFunction behavior.

napi-rs Interop Architecture

Overview

duroxide-node bridges Rust's duroxide runtime to Node.js via napi-rs. The interop has two distinct paths: orchestrations (generator-based, synchronous blocking) and activities (async, Promise-based). Using the wrong calling pattern on either path causes silent replay corruption, deadlocks, or dropped futures.

File Map

| File | Role |
| --- | --- |
| src/handlers.rs | Core interop: orchestration handler loop, activity invocation, global context maps, select/race/join, activity cancellation |
| src/types.rs | ScheduledTask enum, the protocol between JS and Rust |
| src/lib.rs | napi entry point, #[napi] trace functions |
| src/runtime.rs | JsRuntime, wraps duroxide::Runtime, registers handlers |
| src/client.rs | JsClient, wraps duroxide::Client |
| src/provider.rs | JsSqliteProvider |
| src/pg_provider.rs | JsPostgresProvider |
| lib/duroxide.js | JS generator driver, OrchestrationContext, ActivityContext |

Orchestration Interop (Blocking Generator Loop)

The replay engine calls poll_once() on the handler future. If the future isn't ready after that single poll, it is dropped. The future returned by call_async awaits a JS callback, so it is never ready on the first poll and would be dropped before the callback fires.

Solution: block_in_place + block_on

rust
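fn call_create_blocking(&self, payload: String) -> Result<GeneratorStepResult, String> {
    let create_fn = self.create_fn.clone();
    // Block this tokio worker until the JS callback has returned.
    let json = tokio::task::block_in_place(|| {
        tokio::runtime::Handle::current().block_on(async {
            create_fn.call_async::<String>(payload).await
        })
    })
    .map_err(|e| e.to_string())?;
    // Assumption: the JS side returns the step result as a JSON string and
    // GeneratorStepResult implements serde::Deserialize.
    serde_json::from_str(&json).map_err(|e| e.to_string())
}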

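This blocks the tokio thread synchronously until the JS callback has completed on the Node event loop thread. block_in_place tells tokio that the thread is about to block, so other tasks can be moved off it. It only works on the multi-threaded runtime and panics on a current_thread runtime.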

Orchestration Handler Sequence

code
Rust (tokio thread)                         JS (Node event loop)
───────────────────                         ────────────────────
1. invoke(ctx, input)
   ├─ Store ctx in ORCHESTRATION_CTXS[instance_id]
   ├─ call_create_blocking(payload) ──────► createGenerator(payload)
   │                                         ├─ Create OrchestrationContext
   │                                         ├─ Create generator: fn(ctx, input)
   │                                         ├─ gen.next() → first yield
   │                                         └─ Return { status: 'yielded', task }
   │◄────────────────────────────────────────┘
   ├─ Loop:
   │   ├─ execute_task(ctx, task)           // Real DurableFuture or replay
   │   ├─ call_next_blocking(result) ──────► nextStep(result)
   │   │                                     ├─ gen.next(value) or gen.throw(err)
   │   │                                     └─ Return next task or completion
   │   │◄────────────────────────────────────┘
   │   └─ If completed/error: break
   └─ Remove ctx from ORCHESTRATION_CTXS
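
The JS half of this sequence can be sketched in plain JavaScript. This is a simplified model, not the code in lib/duroxide.js: the `generators` registry, the `id` field, the `'completed'` and `'error'` status strings, and passing the orchestration function directly instead of a payload are all assumptions made for illustration. Only the names `createGenerator` and `nextStep` and the `{ status: 'yielded', task }` shape come from the diagram above.

```javascript
// Hypothetical registry of live generators, keyed by an id handed back to Rust.
const generators = new Map();
let nextId = 0;

// Turn one generator step into the result object returned to Rust.
function toStepResult(id, step) {
  if (step.done) {
    generators.delete(id); // finished: drop the generator
    return { status: 'completed', id, output: step.value };
  }
  return { status: 'yielded', id, task: step.value };
}

// Run one step, converting a thrown exception into an 'error' result.
function runStep(id, stepFn) {
  try {
    return toStepResult(id, stepFn());
  } catch (err) {
    generators.delete(id);
    return { status: 'error', id, message: String((err && err.message) || err) };
  }
}

// Create the generator and run it up to its first yield.
function createGenerator(orchestrationFn, ctx, input) {
  const id = nextId++;
  const gen = orchestrationFn(ctx, input);
  generators.set(id, gen);
  return runStep(id, () => gen.next());
}

// Resume the generator with the outcome of the task Rust just executed.
function nextStep(id, result) {
  const gen = generators.get(id);
  if (!gen) return { status: 'error', id, message: `unknown generator ${id}` };
  return runStep(id, () =>
    'err' in result ? gen.throw(new Error(result.err)) : gen.next(result.ok));
}

// Usage: a one-activity orchestration driven by hand instead of by Rust.
function* hello(ctx, input) {
  const greeting = yield { type: 'activity', name: 'Greet', input };
  return greeting.toUpperCase();
}
const first = createGenerator(hello, {}, 'world');
const second = nextStep(first.id, { ok: 'hello world' });
console.log(first.status, second.status, second.output);
```

Every call here is synchronous from the generator's point of view, which is why the Rust side has to block until each call returns rather than await it.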

Key Rules for Orchestration Interop

  1. Always use call_*_blocking methods for JS calls from the orchestration handler — never call_async().await
  2. Store ctx in ORCHESTRATION_CTXS before calling JS — JS tracing needs it immediately
  3. Remove ctx from ORCHESTRATION_CTXS on ALL exit paths (success, error, and early return)
  4. Call dispose_fn on completion to clean up the JS generator

Activity Interop (Async Promise)

Activities are simpler — they use normal call_async with a two-phase await:

rust
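// Each await yields a Result; assumes the enclosing fn returns Result<String, String>.
let result: String = self.callback
    .call_async::<napi::bindgen_prelude::Promise<String>>(payload)
    .await  // Phase 1: get the Promise object
    .map_err(|e| e.to_string())?
    .await  // Phase 2: resolve the Promise
    .map_err(|e| e.to_string())?;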

Activities are NOT dropped by poll_once() — they run to completion on the worker dispatcher.

Activity Handler Sequence

code
Rust                                        JS
────                                        ──
invoke(ctx, input)
  ├─ Generate unique token (act-0, act-1, ...)
  ├─ Store ctx in ACTIVITY_CTXS[token]
  ├─ call_async(payload).await.await ──────► wrappedFn(payload)
  │                                          ├─ Parse ctx, create ActivityContext
  │                                          ├─ Call user's async function
  │                                          └─ Return JSON result
  │◄─────────────────────────────────────────┘
  └─ Remove token from ACTIVITY_CTXS
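
A minimal sketch of the JS side of this sequence follows. The `wrapActivity` helper, the `{ ctx, input }` payload shape, and the `ActivityContext` fields other than `_traceToken` are assumptions for illustration; the real wrapper in lib/duroxide.js may differ.

```javascript
// Hypothetical ActivityContext: carries only what the payload provides.
class ActivityContext {
  constructor(info) {
    this.instanceId = info.instanceId;
    this._traceToken = info.traceToken; // token Rust stored in ACTIVITY_CTXS
  }
}

// Wrap a user function so Rust can call it with one JSON string and get a
// Promise<string> back (the two awaits on the Rust side).
function wrapActivity(userFn) {
  return async function wrappedFn(payload) {
    const { ctx, input } = JSON.parse(payload); // assumed payload shape
    const activityCtx = new ActivityContext(ctx);
    const output = await userFn(activityCtx, input);
    return JSON.stringify(output === undefined ? null : output);
  };
}

// Usage: drive the wrapper by hand with a payload Rust might send.
const greet = wrapActivity(async (ctx, name) => `hello ${name} from ${ctx.instanceId}`);
const payload = JSON.stringify({
  ctx: { instanceId: 'inst-1', traceToken: 'act-0' },
  input: 'world',
});
const pending = greet(payload); // a Promise, as the first Rust await would see it
pending.then((json) => console.log(json));
```

Because `wrappedFn` is an async function, calling it returns a Promise immediately (phase 1), and the JSON result only arrives once that Promise resolves (phase 2).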

Cross-Thread Tracing

JS callbacks run on the Node event loop thread. Rust contexts live on tokio threads. Thread-locals don't cross this boundary.

Solution: Global HashMaps protected by Mutex

rust
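use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};

// Activity contexts, keyed by atomic token (unique per invocation)
static ACTIVITY_CTXS: LazyLock<Mutex<HashMap<String, ActivityContext>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

// Orchestration contexts, keyed by instance_id
static ORCHESTRATION_CTXS: LazyLock<Mutex<HashMap<String, OrchestrationContext>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));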

JS calls napi functions that look up the Rust context:

javascript
// In OrchestrationContext (fire-and-forget, no yield)
traceInfo(message) {
    orchestrationTraceLog(this.instanceId, 'info', String(message));
}

// In ActivityContext (fire-and-forget)
traceInfo(message) {
    activityTraceLog(this._traceToken, 'info', String(message));
}

Rust napi functions delegate to the stored context:

rust
#[napi]
pub fn orchestration_trace_log(instance_id: String, level: String, message: String) {
    handlers::orchestration_trace(&instance_id, &level, &message);
    // → ORCHESTRATION_CTXS.get(instance_id).trace(level, message)
    //   which internally checks is_replaying
}

Rules for Tracing

  1. Never expose is_replaying to JS — the Rust OrchestrationContext.trace() handles suppression
  2. Always use global maps, not thread-locals — JS runs on a different thread
  3. Clean up map entries on ALL exit paths — leaked entries cause stale traces
  4. Use atomic tokens for activities (not instance_id) — multiple activities for the same instance can run concurrently
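
Rule 4 can be seen in a small JavaScript model. The real maps are the Rust statics shown earlier; the `ContextMap` class and its method names are hypothetical and exist only to show what goes wrong when two concurrent activities share one key.

```javascript
// Toy stand-in for a global context map (the real one is a Rust Mutex<HashMap>).
class ContextMap {
  constructor() { this.entries = new Map(); }
  store(key, ctx) { this.entries.set(key, ctx); }
  remove(key) { this.entries.delete(key); }
  lookup(key) { return this.entries.get(key); }
}

// Two activities of the same orchestration instance run at the same time.
const byInstance = new ContextMap();
byInstance.store('inst-1', { activity: 'A' });
byInstance.store('inst-1', { activity: 'B' }); // overwrites A's context
byInstance.remove('inst-1');                   // A finishes and cleans up
const lostB = byInstance.lookup('inst-1');     // B is still running, its context is gone

// Keyed by a per-invocation token, the two entries are independent.
let counter = 0;
const nextToken = () => `act-${counter++}`;    // mirrors act-0, act-1, ...
const byToken = new ContextMap();
const tokenA = nextToken();
const tokenB = nextToken();
byToken.store(tokenA, { activity: 'A' });
byToken.store(tokenB, { activity: 'B' });
byToken.remove(tokenA);                        // A finishes
const keptB = byToken.lookup(tokenB);          // B's context is still there

console.log(lostB, keptB);
```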

ScheduledTask Protocol

JS yields plain objects. Rust deserializes them via serde_json into ScheduledTask enum variants:

rust
use serde::Deserialize;

#[derive(Deserialize)]
// rename_all on the enum renames variant names only (Activity -> "activity"),
// not the fields inside each variant.
#[serde(tag = "type", rename_all = "camelCase")]
pub enum ScheduledTask {
    Activity { name: String, input: String },
    ActivityWithRetry { name: String, input: String, retry: RetryPolicyConfig },
    // Field renaming is per variant. Assumes JS sends camelCase keys (delayMs, instanceId).
    #[serde(rename_all = "camelCase")]
    Timer { delay_ms: u64 },
    WaitEvent { name: String },
    SubOrchestration { name: String, input: String },
    #[serde(rename_all = "camelCase")]
    SubOrchestrationWithId { name: String, instance_id: String, input: String },
    #[serde(rename_all = "camelCase")]
    Orchestration { name: String, instance_id: String, input: String },
    NewGuid,
    UtcNow,
    ContinueAsNew { input: String },
    Join { tasks: Vec<ScheduledTask> },
    Select { tasks: Vec<ScheduledTask> },
}
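
On the JS side, the protocol amounts to yielding plain objects whose `type` is the camelCase form of a variant name. The sketch below is illustrative only: the builder names (`scheduleActivity`, `waitForEvent`, `all`) and the choice to JSON-serialize `input` are assumptions, not the confirmed API of lib/duroxide.js.

```javascript
// Hypothetical builders for the plain objects an orchestration yields.
// Each `type` is the camelCase form of a ScheduledTask variant name.
const tasks = {
  // input is a String on the Rust side, so it is serialized here (assumption).
  scheduleActivity: (name, input) =>
    ({ type: 'activity', name, input: JSON.stringify(input) }),
  waitForEvent: (name) => ({ type: 'waitEvent', name }),
  newGuid: () => ({ type: 'newGuid' }),
  continueAsNew: (input) =>
    ({ type: 'continueAsNew', input: JSON.stringify(input) }),
  all: (list) => ({ type: 'join', tasks: list }),
};

// Usage: what Rust would receive for a join over two tasks.
const yielded = tasks.all([
  tasks.scheduleActivity('Greet', { name: 'world' }),
  tasks.waitForEvent('Approved'),
]);
console.log(JSON.stringify(yielded, null, 2));
```

Unit variants such as NewGuid carry no fields, so the object consists of the `type` tag alone.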

Adding a New ScheduledTask Type

  1. Add variant to ScheduledTask in src/types.rs with correct serde attributes
  2. Add execution branch in execute_task() in src/handlers.rs
  3. If it should work in select/race, add branch in make_select_future()
  4. If it should work in join/all, add branch in make_join_future()
  5. Add JS method to OrchestrationContext in lib/duroxide.js returning { type: '...', ... }
  6. Add TypeScript type to index.d.ts
  7. Add test in __tests__/e2e.test.js
  8. Rebuild: npx napi build --platform

Provider Polymorphism

napi-rs doesn't support trait objects in constructors. JsRuntime uses factory methods:

rust
// In runtime.rs
#[napi]
impl JsRuntime {
    #[napi(constructor)]
    pub fn new(provider: &JsSqliteProvider, ...) -> Self { ... }  // SQLite

    #[napi(factory)]
    pub fn from_postgres(provider: &JsPostgresProvider, ...) -> Self { ... }  // PG
}

JS wrapper detects provider type:

javascript
if (provider._type === 'postgres') {
    this._native = JsRuntime.fromPostgres(provider._native, options);
} else {
    this._native = new JsRuntime(provider._native, options);
}

When adding a new provider, follow this same pattern — constructor for default, factory for others.

select/race Implementation

select maps to Rust's ctx.select2(), which requires exactly 2 futures. make_select_future() converts a ScheduledTask to Pin<Box<dyn Future<Output = String> + Send + '_>>:

rust
fn make_select_future(ctx: &OrchestrationContext, task: ScheduledTask)
    -> Pin<Box<dyn Future<Output = String> + Send + '_>>

Supported in select: Activity, ActivityWithRetry, Timer, WaitEvent, SubOrchestration, SubOrchestrationWithId, SubOrchestrationVersioned, SubOrchestrationVersionedWithId. Unsupported: Join, Select (nested — rejected with error), ContinueAsNew, NewGuid, UtcNow.
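
The two constraints above (exactly two tasks, supported types only) could also be checked on the JS side before yielding. The following is a hypothetical pre-check, not something the source confirms exists in lib/duroxide.js; the camelCase tags are assumed from the serde attributes on ScheduledTask.

```javascript
// Task types listed above as supported in select (camelCase tags assumed).
const SELECTABLE = new Set([
  'activity', 'activityWithRetry', 'timer', 'waitEvent',
  'subOrchestration', 'subOrchestrationWithId',
  'subOrchestrationVersioned', 'subOrchestrationVersionedWithId',
]);

// Hypothetical builder mirroring the Rust-side rules: exactly two tasks,
// each of a supported type. Returns the select task or throws.
function select(a, b, ...rest) {
  if (a === undefined || b === undefined || rest.length > 0) {
    throw new Error('select takes exactly 2 tasks');
  }
  for (const task of [a, b]) {
    if (!SELECTABLE.has(task.type)) {
      throw new Error(`task type '${task.type}' is not supported in select`);
    }
  }
  return { type: 'select', tasks: [a, b] };
}

// Usage: race an activity against an external event; nesting is rejected.
const race = select(
  { type: 'activity', name: 'Fetch', input: '"url"' },
  { type: 'waitEvent', name: 'Cancel' },
);
let rejected = null;
try {
  select(race, { type: 'join', tasks: [] });
} catch (err) {
  rejected = err.message;
}
console.log(race.tasks.length, rejected);
```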

join/all Implementation

join maps to Rust's ctx.join(), which requires Vec<F> with same output type. make_join_future() normalizes all task types to Pin<Box<dyn Future<Output = String>>> with {ok:v}/{err:e} JSON output:

  • Activity: {ok: result} or {err: message}
  • Timer: {ok: null} (timers return ())
  • WaitEvent: {ok: eventData}
  • Sub-orchestration: {ok: result} or {err: message}

Supported in join: the same task types as select. Unsupported: Join, Select (nested, rejected with an error), ContinueAsNew, NewGuid, UtcNow.
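
A caller might consume the normalized output roughly as follows. This is a sketch under assumptions: that the join results reach JS as an array of JSON strings, and that a helper like `unwrapJoinResults` (a hypothetical name) does the parsing.

```javascript
// Each join result is a JSON string shaped {ok: value} or {err: message}.
// Hypothetical helper: parse them, keeping failures as values instead of throwing.
function unwrapJoinResults(rawResults) {
  return rawResults.map((raw) => {
    const parsed = JSON.parse(raw);
    return 'err' in parsed
      ? { succeeded: false, error: parsed.err }
      : { succeeded: true, value: parsed.ok };
  });
}

// Usage: an activity result, a timer (ok: null), and a failed sub-orchestration.
const results = unwrapJoinResults([
  '{"ok":"42"}',
  '{"ok":null}',
  '{"err":"child failed"}',
]);
console.log(results);
```

Checking for the `err` key rather than a truthy `ok` matters because a timer legitimately produces `{ok: null}`.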

Activity Cancellation

ctx.isCancelled() checks the Rust CancellationToken via ACTIVITY_CTXS global map:

rust
pub fn activity_is_cancelled(token: &str) -> bool {
    ACTIVITY_CTXS.lock().unwrap().get(token)
        .map(|ctx| ctx.is_cancelled())
        .unwrap_or(false)
}

Cancellation mechanism: when lock renewal fails, the runtime calls cancellation_token.cancel(), and the next ctx.isCancelled() call returns true. Detection latency = workerLockTimeoutMs / 2.
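
Since cancellation is cooperative, an activity has to poll `ctx.isCancelled()` itself. The sketch below shows one way to do that; `processItems`, the fake context, and the 30000 ms timeout are illustrative assumptions, and only `ctx.isCancelled()` and the latency formula come from the text above.

```javascript
// Hypothetical long-running activity that checks ctx.isCancelled() between
// units of work so it can stop soon after the runtime cancels it.
async function processItems(ctx, items) {
  const done = [];
  for (const item of items) {
    if (ctx.isCancelled()) {
      return { cancelled: true, done };
    }
    await Promise.resolve(); // stand-in for real async work on one item
    done.push(item);
  }
  return { cancelled: false, done };
}

// Detection latency from the formula above, for an assumed timeout.
const workerLockTimeoutMs = 30000; // illustrative value only
const detectionLatencyMs = workerLockTimeoutMs / 2;

// Usage: a fake ctx that reports cancellation from the third check onward.
let checks = 0;
const fakeCtx = { isCancelled: () => ++checks > 2 };
const outcome = processItems(fakeCtx, ['a', 'b', 'c']);
outcome.then((r) => console.log(r, detectionLatencyMs));
```

How often to check is a trade-off: each check is a napi call that takes the ACTIVITY_CTXS lock, so checking once per unit of work is usually enough.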

Common Pitfalls

| Pitfall | What Happens | Fix |
| --- | --- | --- |
| Using call_async().await in orchestration handler | Future dropped by poll_once(); JS callback never executes | Use block_in_place + block_on |
| Thread-local for cross-thread context | Lookup returns None; traces silently fail | Use global HashMap |
| Exposing is_replaying to JS as static field | Stale after replay→live transition mid-execution | Let Rust handle it via ctx.trace() |
| Forgetting to clean up global map entries | Memory leak + stale context references | Clean up on ALL exit paths (Ok, Err, early return) |
| cargo build instead of npx napi build --platform | JS loads stale .node binary; changes don't take effect | Always use napi build |
| Missing serde(rename_all) on new ScheduledTask variants | Deserialization fails silently; task type not recognized | Match JS naming convention (camelCase) |

Build Requirements

bash
# MUST use napi build (not cargo build) for the .node binary
npx napi build --platform           # Debug
npx napi build --platform --release # Release

# Cargo build alone only produces a .dylib/.so — JS can't load it