AgentSkillsCN

data-structures

在选择、实现或对数据结构进行推理时使用。涵盖数组、链表、栈、队列、哈希表、树(BST、AVL、红黑树、B 树、Trie)、堆,以及图(邻接表、邻接矩阵)。基于 Knuth 的 TAOCP 第 1 卷。 适用场景:根据访问模式选择数据结构、理解操作复杂度、实现基础数据结构、比较数据结构的权衡。 切勿用于:在数据结构上进行图算法(使用图算法)、对数据进行排序(使用排序与搜索)。

SKILL.md
--- frontmatter
name: data-structures
description: |
    Use when selecting, implementing, or reasoning about data structures. Covers arrays, linked lists, stacks, queues, hash tables, trees (BST, AVL, Red-Black, B-Tree, Trie), heaps, and graphs (adjacency list, adjacency matrix). Based on Knuth's TAOCP Vol. 1.
    USE FOR: choosing data structures by access pattern, understanding operation complexities, implementing fundamental data structures, comparing data structure tradeoffs
    DO NOT USE FOR: graph algorithms on data structures (use graph-algorithms), sorting data (use sorting-searching)
license: MIT
metadata:
  displayName: "Data Structures"
  author: "Tyler-R-Kendrick"
compatibility: claude, copilot, cursor

Data Structures

Overview

Data structures are the foundation of efficient software. The choice of data structure determines the complexity of every operation your program performs. Knuth's The Art of Computer Programming, Volume 1: Fundamental Algorithms provides the definitive treatment of information structures -- from linear lists through trees and multilinked structures. This skill covers the major data structures, their operation complexities, and guidance on when to use each.

Arrays

A contiguous block of memory storing elements of the same type, accessed by index.

OperationTime
Access by indexO(1)
Search (unsorted)O(n)
Search (sorted)O(log n)
Insert at endO(1) amortized (dynamic array)
Insert at positionO(n)
Delete at positionO(n)

Use when: Random access is frequent, data size is known or grows by appending, cache locality matters.

Linked Lists

Elements (nodes) are stored non-contiguously; each node contains data and a pointer to the next (and optionally previous) node.

Variants

  • Singly linked: Each node points to the next. Traversal is forward only.
  • Doubly linked: Each node points to both next and previous. Traversal in both directions.
  • Circular: The last node points back to the first, forming a cycle.
OperationSinglyDoubly
Access by indexO(n)O(n)
SearchO(n)O(n)
Insert at headO(1)O(1)
Insert at tail (with tail pointer)O(1)O(1)
Insert at position (given pointer)O(1)O(1)
Delete at headO(1)O(1)
Delete at position (given pointer)O(n) for singly, O(1) for doublyO(1)

Use when: Frequent insertions/deletions at arbitrary positions, no need for random access, implementing stacks/queues.

Stacks

Last-In, First-Out (LIFO) structure.

OperationTime
PushO(1)
PopO(1)
Peek/TopO(1)
SearchO(n)

Implementations: Array-based (dynamic array) or linked-list-based.

Use when: Undo operations, expression evaluation/parsing, DFS traversal, call stack simulation, balanced parentheses checking.

Queues

Standard Queue (FIFO)

First-In, First-Out structure.

OperationTime
EnqueueO(1)
DequeueO(1)
Peek/FrontO(1)
SearchO(n)

Deque (Double-Ended Queue)

Insert and remove from both ends.

OperationTime
Insert front/backO(1)
Remove front/backO(1)
Peek front/backO(1)

Priority Queue

Elements are dequeued by priority, not insertion order. Typically implemented with a heap.

OperationTime (binary heap)
InsertO(log n)
Extract-min/maxO(log n)
Peek min/maxO(1)
Decrease keyO(log n)

Use when: BFS traversal (standard queue), scheduling (priority queue), sliding window problems (deque).

Hash Tables

Store key-value pairs with near-constant-time access by hashing keys to array indices.

Collision Handling

  • Chaining: Each bucket holds a linked list (or other collection) of entries that hash to the same index.
    • Simple to implement. Performance degrades to O(n/k) with poor hash distribution.
  • Open Addressing: All entries stored in the array itself. On collision, probe for the next open slot.
    • Linear probing: Check the next slot sequentially. Simple but suffers from clustering.
    • Quadratic probing: Check slots at quadratic intervals. Reduces clustering.
    • Double hashing: Use a second hash function to determine probe step. Best distribution.
OperationAverageWorst
InsertO(1)O(n)
SearchO(1)O(n)
DeleteO(1)O(n)

Load factor: Ratio of stored elements to table size. Keep below 0.7-0.75 for good performance; resize (rehash) when exceeded.

Use when: Fast key-based lookup is critical, keys are hashable, order does not matter.

Trees

Binary Search Tree (BST)

Each node has at most two children; left child < parent < right child.

OperationAverageWorst (degenerate)
SearchO(log n)O(n)
InsertO(log n)O(n)
DeleteO(log n)O(n)

AVL Tree

Self-balancing BST where the height difference between left and right subtrees of any node is at most 1.

OperationTime
SearchO(log n)
InsertO(log n)
DeleteO(log n)

Tradeoff: Strictly balanced, so faster lookups than Red-Black trees, but more rotations on insert/delete.

Red-Black Tree

Self-balancing BST with color properties guaranteeing that the longest path is at most twice the shortest.

OperationTime
SearchO(log n)
InsertO(log n)
DeleteO(log n)

Tradeoff: Less strictly balanced than AVL, so fewer rotations on insert/delete but slightly slower lookups. Used in most standard library map/set implementations (C++ std::map, Java TreeMap).

B-Tree

Generalized self-balancing tree where each node can have many children. Designed for systems that read/write large blocks of data.

OperationTime
SearchO(log n)
InsertO(log n)
DeleteO(log n)

Use when: Databases and file systems -- minimizes disk I/O by maximizing keys per node.

Trie (Prefix Tree)

Tree where each node represents a character; paths from root to leaves represent strings.

OperationTime
SearchO(m) where m = key length
InsertO(m)
DeleteO(m)
Prefix searchO(m)

Use when: Autocomplete, spell checking, IP routing tables, prefix-based searching.

Heaps

A complete binary tree satisfying the heap property.

Min-Heap

Parent <= children. Root is the minimum element.

Max-Heap

Parent >= children. Root is the maximum element.

OperationTime
InsertO(log n)
Extract min/maxO(log n)
Peek min/maxO(1)
Build heapO(n)
Decrease/increase keyO(log n)

Implementations: Typically a binary heap backed by an array. For better amortized performance, consider Fibonacci heaps (O(1) amortized insert and decrease-key).

Use when: Priority queues, heap sort, finding k-th largest/smallest, median maintenance.

Graphs

A graph G = (V, E) consists of vertices V and edges E connecting pairs of vertices.

Representations

Adjacency List

Each vertex stores a list of its neighbors.

OperationTime
Add vertexO(1)
Add edgeO(1)
Remove edgeO(degree)
Check edge existsO(degree)
SpaceO(V + E)

Best for: Sparse graphs (E << V^2). Most graph algorithms prefer this representation.

Adjacency Matrix

A V x V matrix where entry (i, j) indicates whether an edge exists between vertices i and j.

OperationTime
Add vertexO(V^2) -- resize
Add edgeO(1)
Remove edgeO(1)
Check edge existsO(1)
SpaceO(V^2)

Best for: Dense graphs (E close to V^2), when fast edge existence checks are needed, small graphs.

Choosing the Right Data Structure

NeedData StructureWhy
Fast index-based accessArrayO(1) random access
Fast insertions/deletions anywhereLinked ListO(1) with pointer
LIFO behaviorStackPush/pop O(1)
FIFO behaviorQueueEnqueue/dequeue O(1)
Fast key-value lookupHash TableO(1) average
Ordered key-value storageBST / Red-Black TreeO(log n) with ordering
Fast lookup with frequent readsAVL TreeStrict balance
Fast lookup with frequent writesRed-Black TreeFewer rotations
Disk-based storage / databasesB-TreeMinimizes I/O
String prefix operationsTrieO(m) prefix search
Priority-based processingHeap / Priority QueueO(log n) extract-min/max
Modeling relationshipsGraph (adjacency list)Flexible structure

Best Practices

  • Choose the data structure based on the dominant operation pattern: read-heavy, write-heavy, or balanced.
  • Prefer standard library implementations -- they are well-tested and optimized for real-world usage.
  • Consider cache locality: arrays and array-backed structures (heaps, hash tables with open addressing) are more cache-friendly than pointer-based structures.
  • For concurrent access, consider concurrent variants (ConcurrentHashMap, lock-free queues).
  • Remember that theoretical complexity is not the full story -- constant factors, memory allocation patterns, and cache behavior matter in practice.
  • Reference Knuth's TAOCP Vol. 1, Chapter 2 (Information Structures) for rigorous treatment of linked structures, trees, and multilinked representations.