redis

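Expert Redis DBA skill covering cache and data-structure design, performance tuning, memory optimization, persistence (RDB/AOF), replication and cluster setup, high-availability architecture (Sentinel), and operational troubleshooting. Suited to tasks such as diagnosing latency spikes, eviction-policy problems, and hot keys, and designing safe, reliable cache usage patterns.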

SKILL.md

--- frontmatter

name: redis
description: Redis DBA skill for cache/data-structure design, performance tuning, memory optimization, persistence (RDB/AOF), replication/cluster, high availability (Sentinel), and operational troubleshooting. Use for tasks like diagnosing latency spikes, eviction issues, hot keys, and designing safe cache patterns.

redis

Use this skill for Redis design, performance, and operations (DBA/middleware) tasks.

Defaults / assumptions to confirm

•Redis mode: single instance / Sentinel / Cluster
•Version and deployment (bare metal, Docker, managed)
•Persistence: RDB / AOF / both
•Memory policy and eviction strategy
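
Most of these defaults can be read from the server's INFO output. The sketch below is illustrative only: `summarize_deployment` is a hypothetical helper, and it assumes the INFO reply has already been parsed into a dict (for example by redis-py's `info()`). The RDB snapshot schedule is not part of INFO, so it still has to be confirmed separately with `CONFIG GET save`.

```python
from typing import Any, Dict


def summarize_deployment(info: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the facts worth confirming out of an INFO-style dict.

    Assumes the standard INFO field names (redis_version, redis_mode,
    aof_enabled, maxmemory, maxmemory_policy); missing fields fall back
    to "unknown" or 0 instead of raising.
    """
    maxmemory = int(info.get("maxmemory", 0))
    return {
        "version": info.get("redis_version", "unknown"),
        # redis_mode is "standalone", "sentinel", or "cluster"
        "mode": info.get("redis_mode", "unknown"),
        "aof_enabled": bool(int(info.get("aof_enabled", 0))),
        "maxmemory_bytes": maxmemory,
        # maxmemory == 0 means no limit is configured
        "memory_capped": maxmemory > 0,
        "eviction_policy": info.get("maxmemory_policy", "unknown"),
    }
```

With redis-py this might be called as `summarize_deployment(client.info())`, where `client` is an already-connected client.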

Workflow

•Understand use-cases

•Cache vs primary store vs queue/stream.
•Data size, TTL distribution, QPS, latency SLO.
•Consistency requirements and acceptable staleness.

•Key design

•Namespacing: {app}:{domain}:{entity}:{id} (or similar)
•Avoid overly long keys; ensure stable prefixes for metrics.
•Plan for multi-tenant isolation if needed.
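
The naming rules above can be enforced in one place with a small key builder. This is a minimal sketch, not a prescribed implementation: `build_key`, the 128-character budget, the allowed character set, and the position of the tenant segment are all assumptions made for illustration. The braces in the pattern `{app}:{domain}:{entity}:{id}` are read here as placeholders; literal braces are rejected because Redis Cluster interprets them as hash tags.

```python
import re
from typing import Optional, Union

# Hypothetical limits: tune both to the application.
_SEGMENT = re.compile(r"[A-Za-z0-9_.-]+")
MAX_KEY_LENGTH = 128


def build_key(app: str, domain: str, entity: str,
              entity_id: Union[int, str],
              tenant: Optional[str] = None) -> str:
    """Build app[:tenant]:domain:entity:id with validated segments."""
    segments = [app]
    if tenant is not None:
        # Assumed placement: tenant right after the app prefix, so
        # per-tenant metrics and SCAN patterns share a stable prefix.
        segments.append(tenant)
    segments += [domain, entity, str(entity_id)]

    for segment in segments:
        # Rejects empty segments, ':' separators, spaces, and '{' / '}'.
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid key segment: {segment!r}")

    key = ":".join(segments)
    if len(key) > MAX_KEY_LENGTH:
        raise ValueError(f"key longer than {MAX_KEY_LENGTH} chars: {key!r}")
    return key
```

For example, `build_key("shop", "cart", "user", 42)` yields `shop:cart:user:42`, and passing `tenant="acme"` yields `shop:acme:cart:user:42`.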

•Data structures & patterns

•Strings/Hashes for objects, Sets/ZSets for membership/ranking, Streams for event pipelines.
•Avoid large values; prefer hashes for many small fields.
•Choose one cache pattern explicitly: Cache-Aside / Write-Through / Write-Behind.
•Prevent cache stampede: singleflight/mutex, request coalescing, jittered TTL.
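
Two of the stampede defenses listed above, jittered TTL and singleflight, can be sketched without a Redis connection. Everything here is illustrative: `jittered_ttl`, `SingleFlight`, and the 10% default spread are hypothetical names and values, and the coalescing only works inside one process. Coordinating across processes would need a Redis-side lock (for example `SET` with `NX` and `PX`), which this sketch does not attempt.

```python
import random
import threading
from typing import Any, Callable, Dict, Optional


def jittered_ttl(base_seconds: int, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> int:
    """Scale base_seconds by a random factor in [1 - spread, 1 + spread]
    so keys written together do not all expire together."""
    source = rng if rng is not None else random
    factor = 1.0 + source.uniform(-spread, spread)
    return max(1, int(base_seconds * factor))


class _Call:
    """State shared between the leader and the waiters for one key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Collapse concurrent loads of the same key into a single call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, loader: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if call is None:
                call = _Call()
                self._inflight[key] = call

        if leader:
            # Only the first caller runs the (expensive) loader.
            try:
                call.value = loader()
            except BaseException as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            # Everyone else waits for the leader's result.
            call.done.wait()

        if call.error is not None:
            raise call.error
        return call.value
```

In a cache-aside read path, the cache-miss branch would wrap the database load in `SingleFlight.do(key, loader)` and then write the result back with an expiry of `jittered_ttl(base)`.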

•Performance & reliability

•Identify hot keys, big keys, slow commands.
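•Use pipelining where safe; avoid O(N) commands that stall the event loop on large collections (e.g. KEYS, SMEMBERS, HGETALL) and iterate with SCAN/SSCAN/HSCAN instead.
•Track latency with LATENCY DOCTOR (requires latency-monitor-threshold to be set above 0) and SLOWLOG GET; also instrument latency at the client.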
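
To turn raw slow log entries into evidence about slow commands, it can help to aggregate them by command name. The sketch below assumes each entry is a dict with a `command` field (str or bytes) and a `duration` field in microseconds, which is the shape redis-py's `slowlog_get()` is understood to return; `summarize_slowlog` itself is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def summarize_slowlog(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group slow log entries by command name, worst total time first."""
    stats: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"count": 0, "total_us": 0, "max_us": 0})

    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        # The first token is the command name (KEYS, HGETALL, ...).
        name = parts[0].upper() if parts else "UNKNOWN"
        duration = int(entry["duration"])  # microseconds

        row = stats[name]
        row["count"] += 1
        row["total_us"] += duration
        row["max_us"] = max(row["max_us"], duration)

    rows = [{"command": name, **row} for name, row in stats.items()]
    rows.sort(key=lambda r: r["total_us"], reverse=True)
    return rows
```

The top rows point at the commands worth investigating first; the individual entries still carry the full arguments needed to find the specific big or hot key.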

•Memory management

•Set maxmemory and an eviction policy suitable for workload (allkeys-lru, volatile-ttl, etc.).
•Watch fragmentation and RSS vs used_memory.
•Use key TTLs and size controls to avoid unbounded growth.
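
The RSS versus used_memory comparison can be made concrete with a small check over INFO fields. This is a sketch under stated assumptions: `memory_report` is a hypothetical helper, and the thresholds (1.5 for fragmentation, 80% for usage) are placeholder values rather than recommendations. It expects the standard `used_memory`, `used_memory_rss`, and `maxmemory` fields, already parsed into a dict.

```python
from typing import Any, Dict, List


def memory_report(info: Dict[str, Any],
                  frag_warn: float = 1.5,
                  usage_warn: float = 0.8) -> Dict[str, Any]:
    """Compute fragmentation and usage ratios and collect warnings."""
    used = int(info["used_memory"])
    rss = int(info["used_memory_rss"])
    maxmemory = int(info.get("maxmemory", 0))

    frag_ratio = rss / used if used > 0 else 0.0
    usage = used / maxmemory if maxmemory > 0 else None

    warnings: List[str] = []
    if usage is None:
        warnings.append("maxmemory is 0: no memory limit is configured")
    elif usage >= usage_warn:
        warnings.append(f"used_memory is at {usage:.0%} of maxmemory")
    if frag_ratio >= frag_warn:
        warnings.append(f"fragmentation ratio {frag_ratio:.2f} is high")
    elif 0 < frag_ratio < 1.0:
        # RSS below used_memory can mean part of the data was swapped out.
        warnings.append("RSS is below used_memory: possible swapping")

    return {"frag_ratio": frag_ratio, "usage": usage, "warnings": warnings}
```

Redis also reports its own `mem_fragmentation_ratio` in INFO; computing the ratio by hand here just keeps the arithmetic visible.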

•Persistence & durability

•RDB: snapshot intervals, fork time, disk IO impact.
•AOF: fsync policy, rewrite, size growth.
•Define recovery objectives (RPO/RTO) explicitly.
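
When reasoning about RDB snapshot intervals and the resulting RPO, it can be useful to parse the `save` setting into explicit rules. The sketch below assumes the value has the form returned by `CONFIG GET save`, a space-separated list of seconds/changes pairs such as `3600 1 300 100 60 10000`. Both helper names are hypothetical, and `snapshot_due` only approximates the server's own check.

```python
from typing import List, Tuple


def parse_save_rules(value: str) -> List[Tuple[int, int]]:
    """Parse "3600 1 300 100" into [(3600, 1), (300, 100)].

    An empty string means RDB snapshots on a schedule are disabled.
    """
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected seconds/changes pairs, got: {value!r}")
    numbers = [int(token) for token in tokens]
    return list(zip(numbers[0::2], numbers[1::2]))


def snapshot_due(rules: List[Tuple[int, int]],
                 seconds_since_save: int,
                 changes_since_save: int) -> bool:
    """True if any rule is satisfied: more than `seconds` have passed
    and at least `changes` writes happened since the last save."""
    return any(seconds_since_save > seconds and changes_since_save >= changes
               for seconds, changes in rules)
```

The inputs for `snapshot_due` correspond to the INFO fields `rdb_last_save_time` (compared against the current time) and `rdb_changes_since_last_save`. With RDB alone, the writes made since the last successful snapshot are the data at risk, which is the number to compare against the stated RPO.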

•HA / scaling

•Sentinel: failover behavior, client reconnection strategy.
•Cluster: hash slots, resharding plan, multi-key operations constraints.
•Plan for multi-AZ and network partitions.
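
The multi-key constraint in Cluster follows from how keys map to hash slots: CRC16 of the key modulo 16384, with only the hash tag hashed when the key contains a non-empty `{...}` section. The sketch below reproduces that mapping offline so key designs can be checked before deployment. `key_slot` is a hypothetical helper name; it relies on Python's `binascii.crc_hqx` with an initial value of 0, which computes the same CRC16 variant (XMODEM) that the Redis Cluster specification describes.

```python
import binascii

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Return the Redis Cluster hash slot for a key."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}'
        # replaces the full key as the hashed content.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % HASH_SLOTS
```

Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot and can therefore be used together in multi-key commands and transactions. The trade-off is that a popular tag concentrates load on one slot.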

•Operations checklist

•Backups and restore drills (test in staging).
•Capacity planning: memory headroom, CPU, network bandwidth.
•Upgrade playbook and rollback plan.

Outputs

•Key/TTL design doc (prefixes, structures, TTL, max size).
•Config recommendations (maxmemory, persistence, replication).
•Troubleshooting report (symptoms → evidence → root cause → fixes).