
ruvector-attention-wasm-pkg

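High-performance WASM attention mechanisms for Transformers and LLMs: MultiHead, Flash, and Hyperbolic attention. Use it to run Transformer inference in the browser, add attention layers to edge ML pipelines, or accelerate LLM token processing with WebAssembly.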

SKILL.md
---
name: ruvector-attention-wasm-pkg
description: "High-performance WASM attention mechanisms for transformers and LLMs - MultiHead, Flash, and Hyperbolic attention. Use when running transformer inference in browsers, adding attention layers to edge ML pipelines, or accelerating LLM token processing with WebAssembly."
---

ruvector-attention-wasm

High-performance WebAssembly attention mechanisms optimized for transformer models and LLMs. Provides MultiHead, Flash, and Hyperbolic attention implementations that run in browsers, Node.js, and edge runtimes.

Quick Reference

| Task | Code |
| --- | --- |
| Import | `import init, { MultiHeadAttention, FlashAttention, HyperbolicAttention } from 'ruvector-attention-wasm';` |
| Initialize | `await init();` |
| Multi-head attention | `mha.forward(q, k, v, shape)` |
| Flash attention | `FlashAttention.forward(q, k, v, config)` |
| Hyperbolic attention | `HyperbolicAttention.forward(q, k, v, config)` |

Installation

Hub install (recommended): npx agentdb@latest includes this package. Standalone: npx ruvector-attention-wasm@latest

Node.js Usage

typescript
import init, {
  MultiHeadAttention,
  FlashAttention,
  HyperbolicAttention,
} from 'ruvector-attention-wasm';

await init();

// Multi-Head Attention
const mha = new MultiHeadAttention({
  numHeads: 8,
  headDim: 64,
  dropout: 0.1,  // training only (see MHAConfig)
});

const seqLen = 128;
const dim = 512;
const q = new Float32Array(seqLen * dim);  // Query
const k = new Float32Array(seqLen * dim);  // Key
const v = new Float32Array(seqLen * dim);  // Value

const output = mha.forward(q, k, v, { seqLen, dim });
console.log(`Output shape: ${seqLen} x ${dim}`);

// Flash Attention (memory-efficient, O(N) memory)
const flashOutput = FlashAttention.forward(q, k, v, {
  seqLen,
  dim,
  blockSize: 64,
  causal: true,
});

// Hyperbolic Attention (for hierarchical data)
const hyperOutput = HyperbolicAttention.forward(q, k, v, {
  seqLen,
  dim,
  curvature: -1.0,
});

Browser Usage

html
<script type="module">
  // A bare specifier like this one needs a bundler or an import map to resolve in the browser.
  import init, { FlashAttention } from 'ruvector-attention-wasm';
  await init();

  const q = new Float32Array(64 * 256);
  const k = new Float32Array(64 * 256);
  const v = new Float32Array(64 * 256);
  const out = FlashAttention.forward(q, k, v, { seqLen: 64, dim: 256, causal: true });
</script>

Key API

MultiHeadAttention

Standard scaled dot-product attention with multiple heads.

typescript
const mha = new MultiHeadAttention(config: MHAConfig);
const output = mha.forward(q: Float32Array, k: Float32Array, v: Float32Array, shape: ShapeInfo): Float32Array;

MHAConfig:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| numHeads | number | 8 | Number of attention heads |
| headDim | number | 64 | Dimension per head |
| dropout | number | 0.0 | Dropout rate (training only) |
| scale | number | 1/sqrt(headDim) | Attention scale factor |

ShapeInfo:

| Parameter | Type | Description |
| --- | --- | --- |
| seqLen | number | Sequence length |
| dim | number | Model dimension (numHeads * headDim) |
| batchSize | number | Batch size (default: 1) |
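
To make the relationship between `numHeads`, `headDim`, `dim`, and `scale` concrete, here is a minimal plain-TypeScript sketch of multi-head scaled dot-product attention. It is a reference for checking shapes and small outputs, not the package's WASM implementation. The function name `referenceMultiHead` is hypothetical, and the row-major `[seqLen, dim]` buffer layout (with each head occupying a contiguous `headDim`-wide slice of every row) is an assumption, since the layout is not stated above.

```typescript
// Reference sketch only: plain TypeScript, not the WASM code path.
// Assumption: q, k, v are row-major [seqLen, dim] with dim = numHeads * headDim.
function referenceMultiHead(
  q: Float32Array,
  k: Float32Array,
  v: Float32Array,
  seqLen: number,
  numHeads: number,
  headDim: number,
  scale: number = 1 / Math.sqrt(headDim),
): Float32Array {
  const dim = numHeads * headDim;
  for (const buf of [q, k, v]) {
    if (buf.length !== seqLen * dim) {
      throw new RangeError(`expected ${seqLen * dim} elements, got ${buf.length}`);
    }
  }
  const out = new Float32Array(seqLen * dim);
  const scores = new Float64Array(seqLen);

  for (let h = 0; h < numHeads; h++) {
    const off = h * headDim; // column offset of this head inside each row
    for (let i = 0; i < seqLen; i++) {
      // Scaled dot product of query i against every key of this head.
      let max = -Infinity;
      for (let j = 0; j < seqLen; j++) {
        let dot = 0;
        for (let d = 0; d < headDim; d++) {
          dot += q[i * dim + off + d] * k[j * dim + off + d];
        }
        scores[j] = dot * scale;
        if (scores[j] > max) max = scores[j];
      }
      // Numerically stable softmax over the row of scores.
      let sum = 0;
      for (let j = 0; j < seqLen; j++) {
        scores[j] = Math.exp(scores[j] - max);
        sum += scores[j];
      }
      // Weighted sum of the value rows, written into this head's slice.
      for (let j = 0; j < seqLen; j++) {
        const w = scores[j] / sum;
        for (let d = 0; d < headDim; d++) {
          out[i * dim + off + d] += w * v[j * dim + off + d];
        }
      }
    }
  }
  return out;
}
```

With all-zero queries and keys every score is equal, so each output row is the column-wise mean of the value rows. That makes a convenient sanity check for a small input.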

FlashAttention

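Memory-efficient attention computed in tiles of `blockSize` keys. Working memory grows as O(N) in the sequence length N, instead of the O(N^2) needed to materialize the full attention matrix.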

typescript
FlashAttention.forward(
  q: Float32Array, k: Float32Array, v: Float32Array,
  config: FlashConfig
): Float32Array

FlashConfig:

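| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| seqLen | number | required | Sequence length |
| dim | number | required | Model dimension |
| blockSize | number | 64 | Tile block size |
| causal | boolean | false | Apply causal mask |
| numHeads | number | 1 | Attention heads |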
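
The idea behind tiled computation can be sketched in plain TypeScript: keys are visited one block at a time, and a running maximum and running sum keep the softmax exact without ever storing a full row of N scores. This is a simplified single-head illustration under assumptions (row-major `[seqLen, dim]` buffers, tiling over keys only, hypothetical name `tiledAttention`). It does not show how the package implements `FlashAttention` internally.

```typescript
// Sketch of tiled attention with an online softmax (single head).
// Assumption: row-major [seqLen, dim] buffers; only keys are tiled here.
function tiledAttention(
  q: Float32Array,
  k: Float32Array,
  v: Float32Array,
  seqLen: number,
  dim: number,
  blockSize = 64,
  causal = false,
): Float32Array {
  if (blockSize <= 0) throw new RangeError("blockSize must be positive");
  const scale = 1 / Math.sqrt(dim);
  const out = new Float32Array(seqLen * dim);
  const acc = new Float64Array(dim);        // unnormalized output for one query
  const tile = new Float64Array(blockSize); // scores for the current key block

  for (let i = 0; i < seqLen; i++) {
    acc.fill(0);
    let runMax = -Infinity; // largest score seen so far for query i
    let runSum = 0;         // sum of exp(score - runMax) seen so far
    const lastKey = causal ? i : seqLen - 1;

    for (let start = 0; start <= lastKey; start += blockSize) {
      const end = Math.min(start + blockSize, lastKey + 1);

      // Scores for this block only.
      let tileMax = -Infinity;
      for (let j = start; j < end; j++) {
        let dot = 0;
        for (let d = 0; d < dim; d++) dot += q[i * dim + d] * k[j * dim + d];
        tile[j - start] = dot * scale;
        if (tile[j - start] > tileMax) tileMax = tile[j - start];
      }

      // Rescale what was accumulated under the old maximum.
      const newMax = Math.max(runMax, tileMax);
      const rescale = Math.exp(runMax - newMax); // 0 on the first block
      runSum *= rescale;
      for (let d = 0; d < dim; d++) acc[d] *= rescale;

      // Fold the block into the running sum and accumulator.
      for (let j = start; j < end; j++) {
        const p = Math.exp(tile[j - start] - newMax);
        runSum += p;
        for (let d = 0; d < dim; d++) acc[d] += p * v[j * dim + d];
      }
      runMax = newMax;
    }

    for (let d = 0; d < dim; d++) out[i * dim + d] = acc[d] / runSum;
  }
  return out;
}
```

Because the rescaling step is exact, the result should not depend on `blockSize` beyond floating-point rounding. Comparing a run with `blockSize: 1` against one with a large block is a quick way to check that.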

HyperbolicAttention

Attention in hyperbolic space for hierarchical and tree-structured data.

typescript
HyperbolicAttention.forward(
  q: Float32Array, k: Float32Array, v: Float32Array,
  config: HyperbolicConfig
): Float32Array

HyperbolicConfig:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| seqLen | number | required | Sequence length |
| dim | number | required | Model dimension |
| curvature | number | -1.0 | Hyperbolic curvature |
| numHeads | number | 1 | Attention heads |
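
One common way to define attention in hyperbolic space is to score each query-key pair by their geodesic distance in the Poincaré ball and take a softmax over the negated distances. The sketch below illustrates that idea in plain TypeScript. It assumes the Poincaré-ball model and a row-major `[seqLen, dim]` layout, and the names `poincareDistance` and `hyperbolicAttentionWeights` are hypothetical. The model and scoring rule the package actually uses are not stated above, so treat this as an illustration of how `curvature` can enter the computation rather than a description of `HyperbolicAttention`.

```typescript
// Geodesic distance in the Poincaré ball of curvature K < 0 (c = -K > 0).
// Points must satisfy c * ||x||^2 < 1.
function poincareDistance(
  x: Float32Array,
  y: Float32Array,
  curvature = -1.0,
): number {
  if (curvature >= 0) throw new RangeError("curvature must be negative");
  const c = -curvature;
  let xx = 0;
  let yy = 0;
  let diff = 0;
  for (let d = 0; d < x.length; d++) {
    xx += x[d] * x[d];
    yy += y[d] * y[d];
    const t = x[d] - y[d];
    diff += t * t;
  }
  const denom = (1 - c * xx) * (1 - c * yy);
  if (denom <= 0) throw new RangeError("points must lie inside the ball");
  return Math.acosh(1 + (2 * c * diff) / denom) / Math.sqrt(c);
}

// Attention weights as a softmax over negated distances.
// Returns a row-major [seqLen, seqLen] matrix whose rows sum to 1.
function hyperbolicAttentionWeights(
  q: Float32Array,
  k: Float32Array,
  seqLen: number,
  dim: number,
  curvature = -1.0,
): Float32Array {
  const weights = new Float32Array(seqLen * seqLen);
  const row = new Float64Array(seqLen);
  for (let i = 0; i < seqLen; i++) {
    const qi = q.subarray(i * dim, (i + 1) * dim);
    let sum = 0;
    for (let j = 0; j < seqLen; j++) {
      const kj = k.subarray(j * dim, (j + 1) * dim);
      // Distances are non-negative, so exp(-d) stays in (0, 1].
      row[j] = Math.exp(-poincareDistance(qi, kj, curvature));
      sum += row[j];
    }
    for (let j = 0; j < seqLen; j++) weights[i * seqLen + j] = row[j] / sum;
  }
  return weights;
}
```

Distances grow rapidly as points approach the boundary of the ball, which is why this geometry suits tree-like data: nodes deep in a hierarchy can sit near the boundary and remain far apart from one another.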

Utility Functions

typescript
import { softmax, scaledDotProduct, attentionMask } from 'ruvector-attention-wasm';

// Softmax over a flat array
const probs = softmax(logits: Float32Array, dim: number): Float32Array;

// Scaled dot-product attention (single head)
const attn = scaledDotProduct(q: Float32Array, k: Float32Array, v: Float32Array, dim: number): Float32Array;

// Create causal attention mask
const mask = attentionMask(seqLen: number, causal: boolean): Float32Array;
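
For readers who want to check the utilities against known values, the following plain-TypeScript sketch shows what a row-wise softmax and a causal mask typically compute. Both functions are hypothetical reference versions, not the package's exports. They assume that `dim` is the row length of the flat array and that the mask is additive (`0` where attention is allowed, `-Infinity` where key position `j` lies after query position `i`). The package may use different conventions.

```typescript
// Reference sketch: softmax applied independently to each row of length `dim`.
function referenceSoftmax(logits: Float32Array, dim: number): Float32Array {
  if (dim <= 0 || logits.length % dim !== 0) {
    throw new RangeError("logits length must be a positive multiple of dim");
  }
  const out = new Float32Array(logits.length);
  for (let row = 0; row < logits.length; row += dim) {
    // Subtract the row maximum before exponentiating to avoid overflow.
    let max = -Infinity;
    for (let d = 0; d < dim; d++) max = Math.max(max, logits[row + d]);
    let sum = 0;
    for (let d = 0; d < dim; d++) {
      const e = Math.exp(logits[row + d] - max);
      out[row + d] = e;
      sum += e;
    }
    for (let d = 0; d < dim; d++) out[row + d] /= sum;
  }
  return out;
}

// Reference sketch: additive [seqLen, seqLen] mask, row-major.
// Assumption: 0 = visible, -Infinity = masked (future position).
function referenceCausalMask(seqLen: number, causal: boolean): Float32Array {
  const mask = new Float32Array(seqLen * seqLen); // all zeros: nothing masked
  if (causal) {
    for (let i = 0; i < seqLen; i++) {
      for (let j = i + 1; j < seqLen; j++) mask[i * seqLen + j] = -Infinity;
    }
  }
  return mask;
}
```

Adding such a mask to the raw scores before the softmax drives the masked weights to exactly zero, so each position attends only to itself and earlier positions.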
