DNS Debugging
Diagnose DNS resolution failures using Blocky query logs stored in PostgreSQL. Primary scenario: a user reports a website is broken and you need to determine whether Blocky is blocking a required domain.
Tool
./scripts/blocky.py queries the log_entries table in the Blocky CNPG PostgreSQL cluster via
kubectl exec.
Run ./scripts/blocky.py --help for full usage.
Quick Reference
# Search: who queried a domain recently? ./scripts/blocky.py search homedepot -f 24h # Logs: all queries from a specific client (by IP, name, or VLAN) ./scripts/blocky.py logs -c 192.168.3.40 -f 1h ./scripts/blocky.py logs -c pixel -f 1h # Blocked: only blocked queries for a client ./scripts/blocky.py blocked -c 192.168.3.40 -f 1h # Combine filters ./scripts/blocky.py logs -c 192.168.3.40 -d homedepot -f 2h # VLAN shorthand (lan, iot, kids, guest, work, cameras) ./scripts/blocky.py blocked -c kids -f 24h # Machine-readable ./scripts/blocky.py -j blocked -c 192.168.3.40
The -c/--client flag accepts: partial IP, partial device name (from reverse DNS), CIDR notation,
or VLAN name. All matching is case-insensitive.
Diagnostic Workflow
When a user reports "website X is broken" or "app Y is not working":
- •
Identify the device. If the user names a specific device, use
-cwith the device name to target it directly:./scripts/blocky.py blocked -c pixel -f 1h. NEVER assume a VLAN based on the app or use case; always confirm or search. If the device is unknown, search for the domain to find it:./scripts/blocky.py search <domain> -f 24hand pick the client with the most recentlast_seentimestamp. - •
Get blocked queries for that client to find the offending domain:
./scripts/blocky.py blocked -c <ip-or-name> -f 1hThe blocked domain is often not the main site but a subdomain (API, CDN, auth service). - •
Verify the block by checking the
reasonfield. It indicates which blocklist group matched (e.g.,ads,threats,social). - •
Determine the fix (see Remediation below).
- •
After pushing the fix, Flux applies the change and Blocky reloads automatically (reloader annotation). Verify resolution:
./scripts/blocky.py logs -c <ip-or-name> -d <domain> -f 5mThe domain should now showRESOLVEDorCACHEDinstead ofBLOCKED.
Remediation
Option A: Add an allowlist entry (single domain fix)
Add an allowlists section to the blocking config in
kubernetes/apps/dns-private/blocky/data/config.yaml. The allowlist group name must match the
denylist group that blocked the domain.
blocking:
denylists:
ads:
- https://...
allowlists:
ads:
- |
alloweddomain.com
api.someservice.com
Allowlists take precedence over denylists within the same group. A domain present in both the deny and allow list for a group will be allowed.
Option B: Switch to a less aggressive blocklist
If false positives are frequent for a list category, consider switching to a less aggressive variant. Read the current config to identify which list tier is in use, then check upstream for available tiers.
Option C: Remove a blocklist group from a client group
If an entire category is causing problems for a VLAN, remove it from clientGroupsBlock for that
VLAN.
Configuration Reference
Read kubernetes/apps/dns-private/blocky/data/config.yaml for current:
- •VLAN-to-subnet mappings and which block groups apply to each
- •Denylist URLs and their group names
- •Any existing allowlist entries
Unimplemented Subcommands
The following subcommands were deferred. If you need one during diagnosis, implement it in
./scripts/blocky.py following the patterns of the existing subcommands, then update this skill
file to move it from this list to the Quick Reference section above.
- •top-domains: Top queried domains by count. Flags:
-f,-c,-l. GROUP BYquestion_name, ORDER BY count DESC. - •top-blocked: Top blocked domains by count. Same as top-domains but filtered to
response_type = 'BLOCKED'. Include thereasoncolumn to show which blocklist group matched. - •top-clients: Top clients by query volume. Flags:
-f,-l. GROUP BYclient_ip,client_name, ORDER BY count DESC. - •slow: Queries exceeding a duration threshold. Flags:
-f,-c,--threshold(milliseconds, default 500). Filterduration_ms >= threshold, ORDER BYduration_msDESC.
Pattern Detection
When adding allowlist entries, check git history for systemic issues:
git log --oneline --invert-grep --author="renovate" \ -- kubernetes/apps/dns-private/blocky/data/config.yaml
If you see 3+ allowlist additions in a short period, the denylist tier may be too aggressive. Propose downgrading the list tier rather than maintaining a growing allowlist.