DNS Management
Configure and automate DNS records with proper TTL strategies, DNS-as-code patterns, and troubleshooting techniques.
Purpose
Guide DNS configuration for applications, infrastructure, and services with focus on:
- •Record type selection (A, AAAA, CNAME, MX, TXT, SRV, CAA)
- •TTL strategies for propagation and caching
- •DNS-as-code automation (external-dns, OctoDNS, DNSControl)
- •Cloud DNS services comparison and selection
- •DNS-based load balancing patterns
- •Troubleshooting tools and techniques
When to Use This Skill
Apply DNS management patterns when:
- •Setting up DNS for new applications or services
- •Automating DNS updates from Kubernetes workloads
- •Configuring DNS-based failover or load balancing
- •Troubleshooting DNS propagation or resolution issues
- •Migrating DNS between providers
- •Planning DNS changes with minimal downtime
- •Implementing GeoDNS for global users
Record Type Selection
Quick Reference
Address Resolution:
- •A Record: Map hostname to IPv4 address (example.com → 192.0.2.1)
- •AAAA Record: Map hostname to IPv6 address (example.com → 2001:db8::1)
- •CNAME Record: Alias to another domain (www.example.com → example.com)
  - •Cannot be used at the zone apex (@)
  - •Cannot coexist with other records at the same name
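The two CNAME restrictions above can be checked mechanically before records are pushed. The following is a minimal sketch, assuming records are available as simple (name, type) pairs with "@" standing for the zone apex. The function name `find_cname_conflicts` and the data shape are illustrative only and do not come from any DNS tool; the check also ignores the DNSSEC record types that are allowed next to a CNAME.

```python
from collections import defaultdict


def find_cname_conflicts(records):
    """Return human-readable problems for CNAME misuse.

    `records` is an iterable of (name, record_type) tuples; "@" denotes
    the zone apex. This shape is an assumption made for illustration.
    """
    types_by_name = defaultdict(set)
    for name, record_type in records:
        types_by_name[name].add(record_type.upper())

    problems = []
    for name, types in sorted(types_by_name.items()):
        if "CNAME" not in types:
            continue
        if name == "@":
            problems.append("CNAME not allowed at zone apex (@)")
        others = sorted(types - {"CNAME"})
        if others:
            problems.append(f"CNAME at {name} coexists with: {', '.join(others)}")
    return problems
```

Running a check like this in CI before applying a zone change would surface both mistakes before a provider rejects the update.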
Email Configuration:
- •MX Record: Direct email to mail servers with priority
- •TXT Record: Email authentication (SPF, DKIM, DMARC) and verification
Service Discovery:
- •SRV Record: Specify service location (protocol, priority, weight, port, target)
Delegation and Security:
- •NS Record: Delegate subdomain to different nameservers
- •CAA Record: Restrict which Certificate Authorities can issue certificates
Cloud-Specific:
- •ALIAS Record: Like CNAME but works at zone apex (Route53 ALIAS; Cloudflare provides the same effect through CNAME flattening)
Decision Tree
Need to point domain to:
├─ IPv4 Address? → A record
├─ IPv6 Address? → AAAA record
├─ Another Domain?
│  ├─ Zone apex (@) → ALIAS/ANAME or A record
│  └─ Subdomain → CNAME
├─ Mail Server? → MX record (with priority)
├─ Email Authentication? → TXT record (SPF/DKIM/DMARC)
├─ Service Discovery? → SRV record
├─ Domain Verification? → TXT record
├─ Certificate Control? → CAA record
└─ Subdomain Delegation? → NS record
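The decision tree above can also be expressed as a small lookup, which is handy when record types are chosen by a script rather than by hand. This is a sketch only: the function name `choose_record_type` and the target keys such as "mail_server" are hypothetical labels invented for this example.

```python
def choose_record_type(target, at_apex=False):
    """Map a target kind from the decision tree to a record type."""
    if target == "domain":
        # CNAME is not valid at the apex, so fall back to ALIAS/ANAME or A.
        return "ALIAS/ANAME or A" if at_apex else "CNAME"
    table = {
        "ipv4": "A",
        "ipv6": "AAAA",
        "mail_server": "MX",
        "email_auth": "TXT",
        "service_discovery": "SRV",
        "domain_verification": "TXT",
        "certificate_control": "CAA",
        "subdomain_delegation": "NS",
    }
    if target not in table:
        raise ValueError(f"unknown target kind: {target!r}")
    return table[target]
```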
For detailed record type examples and patterns, see references/record-types.md.
TTL Strategy
Standard TTL Values
By Change Frequency:
- •Stable records: 3600-86400s (1-24 hours) - NS, stable A/AAAA
- •Normal operation: 3600s (1 hour) - Standard websites, MX
- •Moderate changes: 300-1800s (5-30 min) - Development, A/B testing
- •Failover scenarios: 60-300s (1-5 min) - Critical records needing fast updates
Key Principle: Lower TTL = faster propagation but higher DNS query load
Pre-Change Process
When planning DNS changes:
T-48h: Lower TTL to 300s
T-24h: Verify TTL propagated globally
T-0h:  Make DNS change
T+1h:  Verify new records propagating
T+6h:  Confirm global propagation
T+24h: Raise TTL back to normal (3600s)
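To turn the timeline above into concrete calendar times, the offsets can be applied to the planned change time. A minimal sketch follows; the names `STEPS` and `change_schedule` are made up for this example, and the offsets are copied from the timeline.

```python
from datetime import datetime, timedelta

# Offsets in hours relative to the change (T-0h), taken from the timeline above.
STEPS = [
    (-48, "Lower TTL to 300s"),
    (-24, "Verify TTL propagated globally"),
    (0, "Make DNS change"),
    (1, "Verify new records propagating"),
    (6, "Confirm global propagation"),
    (24, "Raise TTL back to normal (3600s)"),
]


def change_schedule(change_time):
    """Return (when, action) pairs for a change planned at `change_time`."""
    return [(change_time + timedelta(hours=h), action) for h, action in STEPS]


if __name__ == "__main__":
    for when, action in change_schedule(datetime(2025, 1, 15, 12, 0)):
        print(when.isoformat(), action)
```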
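Propagation Formula: Max Time = Old TTL + New TTL + Query Time
Example: With an old and a new TTL of 3600s each, a change can take up to about 2 hours (3600s + 3600s, plus query time) to propagate fully. Lowering the TTL to 300s ahead of the change shrinks the Old TTL term to 5 minutes.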
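The formula is simple enough to compute directly. The sketch below treats it as the conservative upper bound stated above; `max_propagation_seconds` is a hypothetical helper name, and query time defaults to zero because it is usually negligible next to the TTLs.

```python
def max_propagation_seconds(old_ttl, new_ttl, query_time=0):
    """Upper bound from the formula: Old TTL + New TTL + Query Time."""
    return old_ttl + new_ttl + query_time


# Unprepared change: both TTLs at one hour.
unprepared = max_propagation_seconds(3600, 3600)  # 7200 s = 2 hours

# Prepared change: TTL lowered to 300 s well before the switch.
prepared = max_propagation_seconds(300, 300)  # 600 s = 10 minutes
```

Comparing the two cases shows why the pre-change process lowers the TTL first.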
TTL by Use Case
| Use Case | TTL | Rationale |
|---|---|---|
| Production (stable) | 3600s | Balance speed and load |
| Before planned change | 300s | Fast propagation |
| Development/staging | 300-600s | Frequent changes |
| DNS-based failover | 60-300s | Fast recovery |
| Mail servers | 3600s | Rarely change |
| NS records | 86400s | Very stable |
For detailed TTL scenarios and calculations, see references/ttl-strategies.md.
DNS-as-Code Tools
Tool Selection by Use Case
Kubernetes DNS Automation → external-dns
- •Annotation-based configuration on Services/Ingresses
- •Automatic sync to DNS providers (20+ supported)
- •No manual DNS updates required
- •See examples/external-dns/
Multi-Provider DNS Management → OctoDNS or DNSControl
- •Version control for DNS records
- •Sync configuration across multiple providers
- •Preview changes before applying
- •OctoDNS (Python/YAML) - See examples/octodns/
- •DNSControl (JavaScript) - See examples/dnscontrol/
Infrastructure-as-Code → Terraform
- •Manage DNS alongside cloud resources
- •Provider-specific resources (aws_route53_record, etc.)
- •See examples/terraform/
Tool Comparison
| Tool | Language | Best For | Kubernetes | Multi-Provider |
|---|---|---|---|---|
| external-dns | Go | K8s automation | ★★★★★ | ★★★★ |
| OctoDNS | Python/YAML | Version control | ★★★ | ★★★★★ |
| DNSControl | JavaScript | Complex logic | ★★ | ★★★★★ |
| Terraform | HCL | IaC integration | ★★★ | ★★★★ |
Quick Start: external-dns
# Kubernetes Service with DNS annotation
apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  ports:
    - port: 80
Deploy the external-dns controller once per cluster; it then creates and updates DNS records automatically for every annotated Service or Ingress.
For complete examples, see examples/external-dns/ and references/dns-as-code-comparison.md.
Cloud DNS Provider Selection
Provider Characteristics
AWS Route53
- •Best for AWS-heavy infrastructure
- •Advanced routing policies (weighted, latency, geolocation, failover)
- •Health checks with automatic failover
- •ALIAS records for AWS resources (ELB, CloudFront, S3)
- •Pricing: $0.50/month per zone + $0.40 per million queries
Google Cloud DNS
- •Best for GCP-native applications
- •Strong DNSSEC support with automatic key rotation
- •Private zones for VPC internal DNS
- •Split-horizon DNS (different internal/external records)
- •Pricing: $0.20/month per zone + $0.40 per million queries
Azure DNS
- •Best for Azure-native applications
- •Integration with Azure Traffic Manager
- •Azure Private DNS zones
- •Azure RBAC for access control
- •Pricing: $0.50/month per zone + $0.40 per million queries
Cloudflare
- •Best for multi-cloud or cloud-agnostic
- •Consistently among the fastest DNS query times in public global benchmarks
- •Built-in DDoS protection
- •Free tier with unlimited queries
- •CDN integration
- •Pricing: Free tier, $20/month Pro, $200/month Business
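For the three usage-priced providers, the list prices above reduce to a zone fee plus a per-query charge, so a rough monthly estimate is easy to script. This sketch uses only the flat rates quoted in this section and deliberately ignores volume tiers, special query types, and Cloudflare's plan-based pricing; the names `PRICING` and `monthly_cost` are invented for this example.

```python
# (zone fee per month, price per million queries) in USD, from the lists above.
PRICING = {
    "route53": (0.50, 0.40),
    "cloud_dns": (0.20, 0.40),
    "azure_dns": (0.50, 0.40),
}


def monthly_cost(provider, zones, queries):
    """Estimate monthly cost in USD for `zones` hosted zones and `queries` lookups."""
    zone_fee, per_million = PRICING[provider]
    return zones * zone_fee + (queries / 1_000_000) * per_million
```

For example, 5 zones and 10 million queries per month come to about $6.50 on Route53 and about $5.00 on Cloud DNS under these assumptions.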
Selection Decision Tree
Choose based on:
├─ AWS-heavy? → Route53
├─ GCP-native? → Cloud DNS
├─ Azure-native? → Azure DNS
├─ Multi-cloud? → Cloudflare or OctoDNS/DNSControl
├─ Need fastest global DNS? → Cloudflare
├─ Need DDoS protection? → Cloudflare
└─ Budget-conscious? → Cloudflare (free tier) or Cloud DNS (lowest zone cost)
For detailed provider comparisons and examples, see references/cloud-providers.md.
DNS-Based Load Balancing
GeoDNS (Geographic Routing)
Return different IP addresses based on client location to:
- •Reduce latency (route to nearest data center)
- •Comply with data residency requirements
- •Distribute load across regions
Example Pattern:
Client Location → DNS Response
├─ North America → 192.0.2.1 (US data center)
├─ Europe → 192.0.2.10 (EU data center)
└─ Default → CloudFront edge (global CDN)
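Conceptually, the GeoDNS pattern above is a lookup keyed on client location with a default answer. The sketch below models only that decision logic, not a real DNS server. The continent codes, the `geo_answer` name, and the `cdn.example.net` default are assumptions made for illustration; in practice the provider infers location from the resolver address or the EDNS Client Subnet option.

```python
GEO_TARGETS = {
    "NA": "192.0.2.1",   # US data center
    "EU": "192.0.2.10",  # EU data center
}

# Hypothetical stand-in for the global CDN edge used as the default answer.
DEFAULT_TARGET = "cdn.example.net"


def geo_answer(continent_code):
    """Return the answer a GeoDNS policy would give for a client continent."""
    return GEO_TARGETS.get(continent_code.upper(), DEFAULT_TARGET)
```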
Weighted Routing
Distribute traffic by percentage for:
- •Blue-green deployments
- •Canary releases (10% to new version)
- •A/B testing
Example Pattern:
DNS Responses:
├─ 90% → 192.0.2.1 (stable version)
└─ 10% → 192.0.2.2 (canary version)
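Weighted routing picks an answer at random in proportion to the configured weights, so the 90/10 split holds only on average across many queries. The simulation below is a sketch of that behaviour using the standard library; `pick_endpoint` and `simulate` are hypothetical names. Real traffic splits are also skewed by resolver caching, which this model ignores.

```python
import random
from collections import Counter


def pick_endpoint(weights, rng=random):
    """Choose one endpoint with probability proportional to its weight."""
    endpoints = list(weights)
    return rng.choices(endpoints, weights=[weights[e] for e in endpoints], k=1)[0]


def simulate(weights, n, seed=0):
    """Count how often each endpoint is returned over `n` simulated queries."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(pick_endpoint(weights, rng) for _ in range(n))


if __name__ == "__main__":
    print(simulate({"192.0.2.1": 90, "192.0.2.2": 10}, 10_000))
```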
Health Check-Based Failover
Automatically route traffic away from unhealthy endpoints.
Pattern:
Primary: 192.0.2.1 (health checked every 30s)
├─ Healthy → Return primary IP
└─ Unhealthy → Return secondary IP (192.0.2.2)
Failover time: ~2-3 minutes = Health check failures (90s) + TTL expiration (60s)
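The failover estimate above is the detection time (check interval times the number of consecutive failures required) plus the record TTL. A small helper makes it easy to see how each setting moves the total; `failover_seconds` is a hypothetical name, and the result is an approximation that ignores resolvers holding records past their TTL.

```python
def failover_seconds(request_interval, failure_threshold, ttl):
    """Approximate worst-case failover time in seconds."""
    detection = request_interval * failure_threshold  # time to mark endpoint unhealthy
    return detection + ttl  # plus time for cached answers to expire


# Values from the pattern above: 30 s interval, 3 failures, 60 s TTL.
example = failover_seconds(30, 3, 60)  # 150 s, about 2.5 minutes
```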
For complete load balancing examples, see examples/load-balancing/.
Troubleshooting
Essential Commands
Check DNS Resolution:
# Basic query
dig example.com
# Clean output (just IP)
dig example.com +short
# Query specific DNS server
dig @8.8.8.8 example.com
dig @1.1.1.1 example.com
# Trace resolution path
dig +trace example.com
Check TTL:
dig example.com | grep -A1 "ANSWER SECTION"
# Look for TTL value (number before IN A)
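When the TTL needs to be read by a script rather than by eye, the ANSWER SECTION of dig's default output can be parsed field by field. The function below is a sketch that assumes the usual "name TTL class type data" column layout; `parse_answer_ttls` is a made-up helper name, and it operates on output text passed in as a string.

```python
def parse_answer_ttls(dig_output):
    """Extract (name, ttl, type, data) tuples from the ANSWER SECTION of dig output."""
    records = []
    in_answer = False
    for line in dig_output.splitlines():
        if line.startswith(";; ANSWER SECTION"):
            in_answer = True
            continue
        if in_answer:
            # The section ends at the first blank or comment line.
            if not line.strip() or line.startswith(";"):
                break
            parts = line.split(None, 4)
            if len(parts) == 5:
                name, ttl, _cls, rtype, data = parts
                records.append((name, int(ttl), rtype, data))
    return records
```

The TTL reported by a caching resolver counts down between queries, so a value lower than the configured TTL is expected there.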
Check Propagation:
# Multiple resolvers
dig @8.8.8.8 example.com +short          # Google
dig @1.1.1.1 example.com +short          # Cloudflare
dig @208.67.222.222 example.com +short   # OpenDNS
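The same three-resolver comparison can be automated by running dig for each resolver and comparing the answers with the expected value. This is a minimal sketch that assumes `dig` is installed and on the PATH; `dig_short` and `propagation_report` are hypothetical helper names, and the `lookup` parameter exists so the comparison logic can be exercised without network access.

```python
import subprocess

RESOLVERS = {
    "Google": "8.8.8.8",
    "Cloudflare": "1.1.1.1",
    "OpenDNS": "208.67.222.222",
}


def dig_short(resolver_ip, name):
    """Run `dig @resolver name +short` and return the sorted answer lines."""
    out = subprocess.run(
        ["dig", f"@{resolver_ip}", name, "+short"],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
    return sorted(line.strip() for line in out.splitlines() if line.strip())


def propagation_report(name, expected, lookup=dig_short):
    """Map each resolver label to True if it already returns `expected`."""
    want = sorted(expected)
    return {label: lookup(ip, name) == want for label, ip in RESOLVERS.items()}
```

A call such as `propagation_report("example.com", ["192.0.2.1"])` returns one boolean per resolver, which makes it clear which caches still hold the old record.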
Flush Local DNS Cache:
# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Windows
ipconfig /flushdns
# Linux (systemd-resolved; older releases use: sudo systemd-resolve --flush-caches)
sudo resolvectl flush-caches
Common Problems
Slow Propagation:
- •Check current TTL (old TTL must expire first)
- •Lower TTL 24-48 hours before changes
- •Use propagation checkers: whatsmydns.net, dnschecker.org
CNAME at Zone Apex:
- •Error: Cannot use CNAME at @ (zone apex)
- •Solution: Use an ALIAS record (Route53), CNAME flattening (Cloudflare), or an A record
external-dns Not Creating Records:
- •Verify annotation spelling: external-dns.alpha.kubernetes.io/hostname
- •Check domain filter matches: --domain-filter=example.com
- •Review external-dns logs for errors
- •Confirm provider credentials configured
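A frequent cause in the checklist above is a hostname that falls outside the configured domain filter. The sketch below approximates that check as a suffix match on whole labels. It is an illustration of the idea, not external-dns's actual matching code, and treating an empty filter list as "no restriction" is an assumption; `matches_domain_filter` is a made-up name.

```python
def matches_domain_filter(hostname, domain_filters):
    """Return True if `hostname` is equal to or a subdomain of any filter domain."""
    if not domain_filters:
        return True  # assumption: no filter configured means no restriction
    host = hostname.rstrip(".").lower()
    for entry in domain_filters:
        domain = entry.strip().strip(".").lower()
        # Match whole labels only, so "badexample.com" does not match "example.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```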
For detailed troubleshooting, see references/troubleshooting.md.
Common Patterns
Pattern 1: Kubernetes DNS Automation
# Deploy external-dns (once per cluster)
# The brackets are quoted so the shell does not treat them as a glob pattern
helm install external-dns external-dns/external-dns \
  --set provider=aws \
  --set 'domainFilters[0]=example.com' \
  --set policy=sync

# Then annotate Services
apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  ports:
    - port: 80
Pattern 2: Multi-Provider Sync with OctoDNS
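# octodns-config.yaml
providers:
  config:
    class: octodns.provider.yaml.YamlProvider
    directory: ./config
  # Credentials are read from environment variables via the env/ prefix
  route53:
    class: octodns_route53.Route53Provider
    access_key_id: env/AWS_ACCESS_KEY_ID
    secret_access_key: env/AWS_SECRET_ACCESS_KEY
  cloudflare:
    class: octodns_cloudflare.CloudflareProvider
    token: env/CLOUDFLARE_TOKEN

zones:
  example.com.:
    sources: [config]
    targets: [route53, cloudflare]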
Pattern 3: DNS-Based Failover
# Route53 with health checks
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "primary"

  failover_routing_policy {
    type = "PRIMARY"
  }

  health_check_id = aws_route53_health_check.primary.id
  records         = ["192.0.2.1"]
}

resource "aws_route53_record" "secondary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }

  records = ["192.0.2.2"]
}
Integration with Other Skills
infrastructure-as-code:
- •Manage DNS via Terraform/Pulumi alongside other resources
- •Zone configuration in IaC repositories
kubernetes-operations:
- •external-dns automates DNS for Kubernetes workloads
- •Ingress controller integration for automatic DNS
load-balancing-patterns:
- •DNS-based load balancing (GeoDNS, weighted routing)
- •Health checks and failover configurations
security-hardening:
- •DNSSEC for DNS integrity
- •CAA records for certificate authority control
- •DNS-based DDoS mitigation
secret-management:
- •Store DNS provider API credentials in vaults
- •Secure DDNS update mechanisms
Additional Resources
Reference Documentation:
- •references/record-types.md - Detailed record type guide with examples
- •references/ttl-strategies.md - TTL scenarios and propagation calculations
- •references/cloud-providers.md - Provider comparison and detailed features
- •references/troubleshooting.md - Common problems and solutions
- •references/dns-as-code-comparison.md - Tool comparison matrix
Examples:
- •examples/external-dns/ - Kubernetes DNS automation
- •examples/octodns/ - Multi-provider sync with YAML
- •examples/dnscontrol/ - Multi-provider with JavaScript DSL
- •examples/terraform/ - Cloud provider configurations
- •examples/load-balancing/ - GeoDNS and failover patterns
Scripts:
- •scripts/check-dns-propagation.sh - Verify propagation across resolvers
- •scripts/validate-dns-config.py - Validate DNS configuration
- •scripts/export-dns-records.sh - Export existing DNS records
- •scripts/calculate-ttl-propagation.py - Calculate propagation time
Quick Reference
Record Types Cheat Sheet
| Record | Purpose | Example |
|---|---|---|
| A | IPv4 address | example.com → 192.0.2.1 |
| AAAA | IPv6 address | example.com → 2001:db8::1 |
| CNAME | Alias to domain | www → example.com |
| MX | Mail server | 10 mail.example.com |
| TXT | Text/verification | "v=spf1 include:_spf.google.com ~all" |
| SRV | Service location | 10 60 5060 sip.example.com |
| NS | Nameserver delegation | ns1.provider.com |
| CAA | CA authorization | 0 issue "letsencrypt.org" |
TTL Cheat Sheet
| Scenario | TTL | Why |
|---|---|---|
| Stable production | 3600s | Balance speed/load |
| Before change | 300s | Fast propagation |
| Failover | 60-300s | Fast recovery |
| NS records | 86400s | Very stable |
Provider Cheat Sheet
| Provider | Best For | Key Feature |
|---|---|---|
| Route53 | AWS | Advanced routing, health checks |
| Cloud DNS | GCP | DNSSEC, private zones |
| Azure DNS | Azure | Traffic Manager integration |
| Cloudflare | Multi-cloud | Fastest, DDoS protection, free tier |
Tool Cheat Sheet
| Tool | Use When |
|---|---|
| external-dns | Kubernetes DNS automation |
| OctoDNS | Multi-provider, Python shop |
| DNSControl | Multi-provider, JavaScript preference |
| Terraform | Managing DNS with other infrastructure |