AgentSkillsCN

@ruvector/attention

面向 Node.js 的高性能注意力机制,包括 MultiHeadAttention、FlashAttention 以及 CrossAttention。适用于实现 Transformer 式注意力层、构建自定义注意力管道、利用 Flash 算法高效处理大上下文注意力,或在智能体协同中集成注意力机制。

SKILL.md
--- frontmatter
name: "@ruvector/attention"
description: "High-performance attention mechanisms for Node.js including MultiHeadAttention, FlashAttention, and CrossAttention. Use when implementing transformer-style attention layers, building custom attention pipelines, performing efficient large-context attention with Flash algorithm, or integrating attention into agent coordination."

@ruvector/attention

High-performance attention mechanism implementations for Node.js, including MultiHeadAttention, FlashAttention (2.49x-7.47x speedup), CrossAttention, and specialized variants for AI agent coordination.

Quick Reference

TaskCode
Installnpx @ruvector/attention@latest
Importimport { FlashAttention } from '@ruvector/attention';
Createconst attn = new FlashAttention({ heads: 8, dim: 64 });
Forwardconst out = await attn.forward(Q, K, V);
Scoreconst scores = await attn.attentionScores(Q, K);

Installation

Hub install (recommended): npx ruvector@latest includes this package. Standalone: npx @ruvector/attention@latest See Installation Guide for the full ecosystem.

Key API

FlashAttention

Memory-efficient attention with IO-awareness (Dao et al., 2022).

typescript
import { FlashAttention } from '@ruvector/attention';

const attn = new FlashAttention({
  heads: 8,
  dim: 64,
  blockSize: 256,
});

Constructor Options:

OptionTypeDefaultDescription
headsnumber8Number of attention heads
dimnumber64Per-head dimension
blockSizenumber256Block size for tiling
causalbooleanfalseApply causal mask
dropoutnumber0.0Attention dropout rate
scalenumber1/sqrt(dim)Attention score scaling

Methods:

MethodReturnsDescription
forward(Q, K, V, mask?)Promise<Tensor>Compute flash attention
attentionScores(Q, K)Promise<Tensor>Get raw attention weights
benchmark(seqLen)Promise<BenchmarkResult>Run performance benchmark

MultiHeadAttention

Standard multi-head attention with projections.

typescript
import { MultiHeadAttention } from '@ruvector/attention';

const mha = new MultiHeadAttention({
  modelDim: 512,
  heads: 8,
  dropout: 0.1,
});

Constructor Options:

OptionTypeDefaultDescription
modelDimnumberrequiredModel embedding dimension
headsnumber8Number of attention heads
dropoutnumber0.0Attention dropout rate
biasbooleantrueInclude projection biases
kdimnumbermodelDimKey dimension (for cross-attention)
vdimnumbermodelDimValue dimension (for cross-attention)

Methods:

MethodReturnsDescription
forward(Q, K, V, mask?)Promise<Tensor>Multi-head attention forward
getAttentionWeights()TensorGet last attention weights
setProjections(W)voidSet Q/K/V projection matrices

CrossAttention

Attention between two different sequences.

typescript
import { CrossAttention } from '@ruvector/attention';

const cross = new CrossAttention({
  queryDim: 512,
  keyDim: 768,
  heads: 8,
});

Constructor Options:

OptionTypeDefaultDescription
queryDimnumberrequiredQuery sequence dimension
keyDimnumberrequiredKey/value sequence dimension
headsnumber8Number of attention heads
dropoutnumber0.0Attention dropout

LinearAttention

O(n) linear attention approximation.

typescript
import { LinearAttention } from '@ruvector/attention';

const linear = new LinearAttention({
  dim: 64,
  heads: 8,
  featureMap: 'elu',
});

Constructor Options:

OptionTypeDefaultDescription
dimnumberrequiredPer-head dimension
headsnumber8Number of attention heads
featureMapstring'elu'Kernel feature map: 'elu', 'relu', 'dpfp'
epsnumber1e-6Numerical stability epsilon

Common Patterns

Transformer Block with Flash Attention

typescript
import { FlashAttention } from '@ruvector/attention';

const selfAttn = new FlashAttention({ heads: 8, dim: 64, causal: true });
const crossAttn = new FlashAttention({ heads: 8, dim: 64 });

// Self-attention
const selfOut = await selfAttn.forward(x, x, x);
// Cross-attention with encoder output
const crossOut = await crossAttn.forward(selfOut, encoderOut, encoderOut);

Agent Coordination via Attention

typescript
import { MultiHeadAttention } from '@ruvector/attention';

const coordinator = new MultiHeadAttention({ modelDim: 256, heads: 4 });

// Each agent's state as a "token"
const agentStates = stackAgentEmbeddings(agents);
const coordinated = await coordinator.forward(agentStates, agentStates, agentStates);
// coordinated[i] = attention-weighted blend of all agent states for agent i

Long-Context with Linear Attention

typescript
import { LinearAttention } from '@ruvector/attention';

const attn = new LinearAttention({ dim: 64, heads: 8, featureMap: 'elu' });
// O(n) instead of O(n^2) -- handles 100K+ token contexts
const output = await attn.forward(Q, K, V);

RAN DDD Context

Bounded Context: Learning

References