AgentSkillsCN

ruvector-attention-wasm

WebAssembly 注意力机制,采用 SIMD 加速的 FlashAttention、MultiHeadAttention 以及 CrossAttention,适用于浏览器与边缘端的推理计算。适用于在浏览器中运行注意力计算、将 Transformer 层部署至边缘设备、构建客户端侧注意力管道,或为 Web 应用程序引入 WASM 加速的注意力机制。

SKILL.md
--- frontmatter
name: "ruvector-attention-wasm"
description: "WebAssembly attention mechanisms with SIMD-accelerated FlashAttention, MultiHeadAttention, and CrossAttention for browser and edge inference. Use when running attention computations in browsers, deploying transformer layers to edge devices, building client-side attention pipelines, or adding WASM-accelerated attention to web applications."

@ruvector/attention-wasm

WebAssembly-compiled attention mechanisms providing browser-native and edge-compatible FlashAttention, MultiHeadAttention, CrossAttention, and LinearAttention with SIMD acceleration and zero server dependency.

Quick Reference

TaskCode
Installnpx @ruvector/attention-wasm@latest
Import (Node)import { WasmFlashAttention } from '@ruvector/attention-wasm';
Import (Browser)import init, { WasmFlashAttention } from '@ruvector/attention-wasm'; await init();
Createconst attn = new WasmFlashAttention({ heads: 8, dim: 64 });
Forwardconst out = attn.forward(Q, K, V);
Scoresconst scores = attn.attentionScores(Q, K);

Installation

Install: npx @ruvector/attention-wasm@latest See Installation Guide for the full ecosystem.

Key API

WasmFlashAttention

WASM-compiled memory-efficient FlashAttention with IO-awareness.

Node.js:

typescript
import { WasmFlashAttention } from '@ruvector/attention-wasm';

const attn = new WasmFlashAttention({
  heads: 8,
  dim: 64,
  blockSize: 256,
});

Browser:

typescript
import init, { WasmFlashAttention } from '@ruvector/attention-wasm';

await init(); // Initialize WASM module

const attn = new WasmFlashAttention({
  heads: 8,
  dim: 64,
  blockSize: 256,
});

Constructor Options:

OptionTypeDefaultDescription
headsnumber8Number of attention heads
dimnumber64Per-head dimension
blockSizenumber256Block size for tiling
causalbooleanfalseApply causal mask
dropoutnumber0.0Attention dropout rate
scalenumber1/sqrt(dim)Attention score scaling
simdbooleantrueUse WASM SIMD instructions

Methods:

MethodReturnsDescription
forward(Q, K, V, mask?)Float32ArrayCompute flash attention output
attentionScores(Q, K)Float32ArrayGet raw attention weights
benchmark(seqLen)BenchmarkResultRun performance benchmark
getMemoryUsage()MemoryInfoWASM memory stats
dispose()voidFree WASM memory

WasmMultiHeadAttention

WASM-compiled standard multi-head attention.

typescript
import init, { WasmMultiHeadAttention } from '@ruvector/attention-wasm';
await init();

const mha = new WasmMultiHeadAttention({
  modelDim: 512,
  heads: 8,
});

Constructor Options:

OptionTypeDefaultDescription
modelDimnumberrequiredModel embedding dimension
headsnumber8Number of attention heads
dropoutnumber0.0Attention dropout rate
biasbooleantrueInclude projection biases
kdimnumbermodelDimKey dimension
vdimnumbermodelDimValue dimension

Methods:

MethodReturnsDescription
forward(Q, K, V, mask?)Float32ArrayMulti-head attention forward
getAttentionWeights()Float32ArrayGet last attention weights
loadWeights(weights)voidLoad pretrained projection weights
dispose()voidFree WASM memory

WasmCrossAttention

WASM-compiled cross-attention between two sequences.

typescript
import init, { WasmCrossAttention } from '@ruvector/attention-wasm';
await init();

const cross = new WasmCrossAttention({
  queryDim: 512,
  keyDim: 768,
  heads: 8,
});

Constructor Options:

OptionTypeDefaultDescription
queryDimnumberrequiredQuery sequence dimension
keyDimnumberrequiredKey/value sequence dimension
headsnumber8Number of attention heads
dropoutnumber0.0Attention dropout

WasmLinearAttention

WASM-compiled O(n) linear attention approximation.

typescript
import init, { WasmLinearAttention } from '@ruvector/attention-wasm';
await init();

const linear = new WasmLinearAttention({
  dim: 64,
  heads: 8,
  featureMap: 'elu',
});

Constructor Options:

OptionTypeDefaultDescription
dimnumberrequiredPer-head dimension
headsnumber8Number of attention heads
featureMapstring'elu'Kernel: 'elu', 'relu', 'dpfp'
epsnumber1e-6Numerical stability epsilon

Common Patterns

Browser Transformer Inference

typescript
import init, { WasmFlashAttention } from '@ruvector/attention-wasm';

await init();

const selfAttn = new WasmFlashAttention({ heads: 8, dim: 64, causal: true });
const output = selfAttn.forward(queryTokens, keyTokens, valueTokens);

// Render output in browser
document.getElementById('result').textContent = decodeOutput(output);

Edge Device Attention

typescript
import { WasmLinearAttention } from '@ruvector/attention-wasm';

// O(n) linear attention for resource-constrained devices
const attn = new WasmLinearAttention({ dim: 32, heads: 4, featureMap: 'elu' });
const output = attn.forward(Q, K, V);
console.log(`Memory: ${attn.getMemoryUsage().heapUsed} bytes`);

Bundler Configuration (Vite)

typescript
// vite.config.ts
import wasm from 'vite-plugin-wasm';

export default {
  plugins: [wasm()],
  optimizeDeps: { exclude: ['@ruvector/attention-wasm'] },
};

RAN DDD Context

Bounded Context: Learning

References