Docker, Docker Compose & Docker Swarm Integration
Instructions
Use this skill to interact with Docker standalone, Docker Compose applications, and Docker Swarm clusters. Docker Swarm is the container orchestration platform for the Home Lab, running production and development workloads. Docker Compose is used for single-host multi-container applications and local development. Always prefer using AWX Job Templates for deployment operations over manual docker/compose commands.
Configuration
The preferred method of interaction is SSH to Swarm manager nodes or using AWX job templates for orchestration tasks.
Smart Access Pattern: Before running commands, determine if the operation should be executed via AWX (deployments, configuration changes) or direct SSH (read-only verification, troubleshooting).
Bjzy Labs defaults
- •Docker Swarm Clusters:
- •Production: 3-node manager cluster (Huey, Dewey, Louie)
- •Development: 3-node manager cluster (devHuey, devDewey, devLouie)
- •AWX Integration:
- •Job Template ID 68: "Docker Swarm - Generic" (playbook:
docker-swarm-manage.yml) - •Workflow ID 65: "Docker Swarm - Full Stack Deployment"
- •Job Template ID 68: "Docker Swarm - Generic" (playbook:
- •Vault lookup (for credentials if needed):
- •Path:
kvProd_v2/Docker/(various secrets)
- •Path:
Environment and Guardrails (Bjzy Labs)
- •
Managed Environments:
- •Production Cluster:
- •Inventory ID: 15 (Docker Swarm - Production)
- •Nodes: Huey (192.168.60.81), Dewey (192.168.60.82), Louie (192.168.60.83)
- •Hostnames: huey.bjzy.me, dewey.bjzy.me, louie.bjzy.me
- •Network: 192.168.60.0/24 (smDSwitch-ProdLAN)
- •Swarm Node IDs:
- •Huey:
8mj25u1rmr9tzdbp5oj179zlx(Manager - Reachable) - •Dewey:
ma7q2zjjolchc5bs8olga68kv(Manager - Reachable) - •Louie:
ynor96m8zqq427wj5u9mx7dok(Manager - Leader)
- •Huey:
- •Development Cluster:
- •Inventory ID: 16 (Docker Swarm - Development)
- •Nodes: devHuey (192.168.50.81), devDewey (192.168.50.82), devLouie (192.168.50.83)
- •Hostnames: devhuey.bjzy.me, devdewey.bjzy.me, devlouie.bjzy.me
- •Network: 192.168.50.0/24 (smDSwitch-DevLAN)
- •Swarm Node IDs:
- •devHuey:
stdt4v7vltry18zkmxi5fo62i(Manager - Reachable) - •devDewey:
i4wayj3rtpsfy5pl102vzop84(Manager - Leader) - •devLouie:
raw7wsjdvhddbvvliu52bp3ps(Manager - Reachable)
- •devHuey:
- •Production Cluster:
- •
Execution Rules:
- •The agent must NEVER run
docker stack deploymanually; use AWX job templates. - •The agent must NEVER modify Docker daemon configuration directly; changes must be made in the
homelab-playbooksrepo and deployed via AWX. - •SSH is allowed for read-only verification and troubleshooting.
- •Hostname safety:
devHuey!=Huey(dev vs prod are different clusters).
- •The agent must NEVER run
- •
Overlay Networks (both clusters):
- •
ingress- Default overlay network for published ports - •
traefik-public- Public-facing overlay for Traefik services - •
management- Encrypted overlay for admin services - •
monitoring- Encrypted overlay for monitoring stack (Grafana Alloy, etc.) - •
backend- Encrypted overlay for application backends
- •
- •
Storage:
- •Local storage:
/var/lib/docker(20GB LVM partition per node) - •Shared storage: MicroCeph cluster (CephFS mounts for persistent data)
- •Docker volumes: Mix of local and CephFS-backed volumes
- •Local storage:
Standard Operating Procedure (SOP)
When asked to "Deploy a service," "Check Swarm status," or "Troubleshoot containers":
- •Determine Environment: Identify if this is Production (Inventory 15) or Development (Inventory 16).
- •Choose Method:
- •Deployments/Changes: Use AWX Job Template 68 or Workflow 65.
- •Read-Only Checks: SSH directly to any manager node.
- •Verify Access: For SSH operations, use
ssh ansible@<hostname>(e.g.,ssh ansible@huey.bjzy.me). - •Execute & Monitor: Run the operation and verify results.
- •Document: If issues are found, consider enriching Keep alerts or updating documentation.
Examples
1. Check Swarm Cluster Status (Read-Only)
Use this to verify cluster health, node status, and service distribution.
- •Method: SSH to any manager node
- •Command Pattern:
# Check node status (run on any manager)
ssh ansible@huey.bjzy.me "docker node ls"
# Check service status
ssh ansible@huey.bjzy.me "docker service ls"
# Check specific service details
ssh ansible@huey.bjzy.me "docker service ps <service-name>"
# Check cluster info
ssh ansible@huey.bjzy.me "docker info --format '{{.Swarm.NodeID}}\t{{.Swarm.ControlAvailable}}\t{{.Swarm.Managers}}'"
2. Inspect Service Logs (Troubleshooting)
View logs from a specific service across all replicas.
- •Method: SSH to any manager node
- •Command Pattern:
# View service logs (last 100 lines) ssh ansible@huey.bjzy.me "docker service logs --tail 100 <service-name>" # Follow service logs in real-time ssh ansible@huey.bjzy.me "docker service logs -f <service-name>" # View logs from specific task/container ssh ansible@huey.bjzy.me "docker logs <container-id>"
3. Check Container Resource Usage (Read-Only)
Monitor resource consumption across the cluster.
- •Method: SSH to specific node
- •Command Pattern:
# Check resource usage on a specific node
ssh ansible@dewey.bjzy.me "docker stats --no-stream"
# Check disk usage
ssh ansible@louie.bjzy.me "docker system df"
# Check image storage
ssh ansible@huey.bjzy.me "docker images --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}'"
4. Deploy Service via AWX (Preferred Method)
Use AWX to deploy or update services in the cluster.
- •Method: AWX CLI
- •Command Pattern:
# Launch Docker Swarm management job
awx job_template launch 68 \
--extra_vars '{
"operation": "deploy-service",
"service_name": "my-app",
"environment": "production"
}' \
--inventory 15 \
--monitor
# Launch full stack deployment workflow
awx workflow_job_template launch 65 \
--inventory 16 \
--monitor
5. Check Network Configuration (Read-Only)
Verify overlay networks and connectivity.
- •Method: SSH to any manager node
- •Command Pattern:
# List all networks
ssh ansible@huey.bjzy.me "docker network ls"
# Inspect specific network
ssh ansible@huey.bjzy.me "docker network inspect traefik-public"
# Check which services are on a network
ssh ansible@huey.bjzy.me "docker network inspect backend --format '{{range .Containers}}{{.Name}} {{end}}'"
6. Verify Service Health (Read-Only)
Check health status of running services.
- •Method: SSH to any manager node
- •Command Pattern:
# Check service replicas and health
ssh ansible@dewey.bjzy.me "docker service ls --format 'table {{.Name}}\t{{.Replicas}}\t{{.Image}}'"
# Check for failed tasks
ssh ansible@dewey.bjzy.me "docker service ps <service-name> --filter 'desired-state=shutdown' --format 'table {{.Name}}\t{{.Node}}\t{{.Error}}'"
# Inspect service configuration
ssh ansible@dewey.bjzy.me "docker service inspect <service-name> --pretty"
7. Check Docker Daemon Status (Read-Only)
Verify Docker daemon health on individual nodes.
- •Method: SSH to specific node
- •Command Pattern:
# Check Docker service status ssh ansible@louie.bjzy.me "systemctl status docker" # Check Docker version ssh ansible@huey.bjzy.me "docker version" # Check Docker daemon configuration ssh ansible@dewey.bjzy.me "docker info"
8. List Running Containers on Node (Read-Only)
View containers running on a specific node.
- •Method: SSH to specific node
- •Command Pattern:
# List all containers on a node
ssh ansible@devhuey.bjzy.me "docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Image}}'"
# List containers with resource usage
ssh ansible@devlouie.bjzy.me "docker ps --format 'table {{.Names}}\t{{.Status}}' && docker stats --no-stream"
Docker Compose Examples
Docker Compose is used for single-host multi-container applications. Common use cases include development environments, standalone services, and applications not requiring Swarm orchestration.
IMPORTANT: Use docker compose (with a space) - this is the modern Docker CLI plugin syntax. The legacy docker-compose (with a hyphen) command is deprecated and should NOT be used.
9. Check Docker Compose Application Status (Read-Only)
View status of Docker Compose applications.
- •Method: SSH to host running compose application
- •Command Pattern:
# List running compose services (from compose directory) ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose ps" # View compose application logs ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose logs --tail 100" # Check specific service logs ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose logs -f service-name" # View compose configuration ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose config"
10. Inspect Docker Compose Resources (Read-Only)
Check networks, volumes, and containers created by Compose.
- •Method: SSH to host
- •Command Pattern:
# List compose-managed containers ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose ps -a" # View compose networks ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose config --networks" # View compose volumes ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose config --volumes" # Check resource usage of compose services ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose ps -q | xargs docker stats --no-stream"
11. Validate Docker Compose Configuration (Read-Only)
Verify compose file syntax and configuration.
- •Method: SSH to host
- •Command Pattern:
# Validate compose file syntax ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose config --quiet" # View resolved compose configuration ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose config" # Check for compose file issues ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose config 2>&1 | grep -i error"
12. Docker Compose via AWX (Preferred for Deployments)
Use AWX to deploy or manage Docker Compose applications.
- •Method: AWX CLI
- •Command Pattern:
# Launch compose deployment via AWX
awx job_template launch <template-id> \
--extra_vars '{
"operation": "deploy",
"compose_app": "myapp",
"compose_path": "/opt/myapp"
}' \
--inventory 15 \
--monitor
# Note: Specific job template IDs depend on your AWX configuration
13. Harbor Registry Operations (Read-Only)
Harbor is the container registry for the Home Lab, hosting AWX EE images, custom applications, and third-party images.
- •Method: Harbor API (primary) + Docker CLI (registry operations)
- •Registry URL:
harbor.bjzy.me - •Authentication: Robot account tokens (from Vault)
- •Note: Harbor CLI (
harbor-cli) is not installed by default; use API + Docker CLI
Harbor API Examples:
# Check Harbor API health (no auth) curl -k https://harbor.bjzy.me/api/v2/health # List projects (requires token) HARBOR_TOKEN=$(vault kv get -field=robot_token kvProd_v2/Harbor/Application-Prod) curl -k -H "Authorization: Bearer $HARBOR_TOKEN" \ https://harbor.bjzy.me/api/v2.0/projects # List images in a project curl -k -H "Authorization: Bearer $HARBOR_TOKEN" \ https://harbor.bjzy.me/api/v2.0/projects/awx/repositories # Check image manifest curl -k -H "Authorization: Bearer $HARBOR_TOKEN" \ https://harbor.bjzy.me/v2/awx/awx-ee/tags/latest
Docker Registry Operations:
# Login to Harbor (robot account) docker login harbor.bjzy.me -u robot-<project> -p <token> # Pull AWX EE image docker pull harbor.bjzy.me/awx/awx-ee:latest # Push custom image docker tag myapp:latest harbor.bjzy.me/myapp/myapp:latest docker push harbor.bjzy.me/myapp/myapp:latest # List available images (requires Harbor API) curl -k -H "Authorization: Bearer $HARBOR_TOKEN" \ https://harbor.bjzy.me/api/v2.0/repositories
14. Docker Logging Configuration (Read-Only)
Verify Docker daemon logging configuration and Loki integration.
- •Method: SSH to any node
- •Purpose: Verify logs are flowing to Grafana Loki via Alloy
Command Pattern:
# Check Docker daemon logging driver
ssh ansible@huey.bjzy.me "docker info --format '{{.LoggingDriver}}'"
# Verify daemon.json logging configuration
ssh ansible@huey.bjzy.me "sudo cat /etc/docker/daemon.json | jq -r '.\"log-driver\" // \"json-file\"'"
# Check container log paths (for troubleshooting)
ssh ansible@huey.bjzy.me "docker inspect <container-id> --format '{{.LogPath}}'"
# Verify Alloy is collecting Docker logs
ssh ansible@huey.bjzy.me "sudo systemctl status alloy"
# Check Alloy Docker log collection config
ssh ansible@huey.bjzy.me "sudo cat /etc/alloy/config.alloy | grep -A 5 'docker.*logs'"
15. Emergency Stack Deploy Reference
NOTE: Stack deploy operations should normally use AWX Job Templates. This reference is for emergency scenarios when AWX is unavailable.
- •Documentation:
docs/DOCKER_SWARM_DEPLOYMENT.md - •Playbook:
playbooks/docker-swarm-manage.yml - •AWX Template: Job Template 68 ("Docker Swarm - Generic")
- •Emergency Command Pattern:
docker stack deploy -c docker-compose.yml stack-name
For AWX failures:
- •Check AWX service status via
awx-controlskill - •Consider manual stack deploy only if AWX is completely unavailable
- •Always document any manual deployments for later AWX reconciliation
Troubleshooting
Service Won't Start
# Check service logs for errors
ssh ansible@huey.bjzy.me "docker service logs <service-name>"
# Check task failures
ssh ansible@huey.bjzy.me "docker service ps <service-name> --no-trunc"
# Inspect service constraints and placement
ssh ansible@huey.bjzy.me "docker service inspect <service-name> --format '{{.Spec.TaskTemplate.Placement}}'"
Node Shows as Down
# Check Docker service on the node ssh ansible@dewey.bjzy.me "systemctl status docker" # Check node status from manager ssh ansible@huey.bjzy.me "docker node inspect dewey.bjzy.me --pretty" # Check system logs ssh ansible@dewey.bjzy.me "sudo journalctl -u docker.service -n 50"
Network Connectivity Issues
# Check overlay network status ssh ansible@huey.bjzy.me "docker network ls --filter driver=overlay" # Inspect network for connected services ssh ansible@huey.bjzy.me "docker network inspect backend" # Test connectivity between services (if possible) ssh ansible@huey.bjzy.me "docker exec <container-id> ping <other-service-name>"
Storage Issues
# Check disk usage across nodes ssh ansible@huey.bjzy.me "df -h /var/lib/docker" ssh ansible@dewey.bjzy.me "df -h /var/lib/docker" ssh ansible@louie.bjzy.me "df -h /var/lib/docker" # Check Docker storage usage ssh ansible@huey.bjzy.me "docker system df -v" # Check for dangling images/volumes ssh ansible@huey.bjzy.me "docker images -f dangling=true" ssh ansible@huey.bjzy.me "docker volume ls -f dangling=true"
Certificate/TLS Issues
# Check Docker daemon TLS configuration
ssh ansible@huey.bjzy.me "docker info --format '{{.SecurityOptions}}'"
# Verify certificates exist
ssh ansible@huey.bjzy.me "ls -la /etc/docker/certs.d/"
# Check service-specific TLS mounts
ssh ansible@huey.bjzy.me "docker service inspect <service-name> --format '{{.Spec.TaskTemplate.ContainerSpec.Mounts}}'"
Docker Compose Application Issues
# Check compose service status ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose ps" # View compose logs for errors ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose logs --tail 200" # Check specific service that won't start ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose logs service-name" # Validate compose file ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose config" # Check compose network connectivity ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose exec service-name ping other-service" # View compose service resource usage ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose ps -q | xargs docker stats --no-stream" # Check for port conflicts ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose config | grep -A 2 ports"
Quick Reference Commands
Cluster Management (Read-Only)
# View all nodes ssh ansible@huey.bjzy.me "docker node ls" # View all services ssh ansible@huey.bjzy.me "docker service ls" # View all stacks ssh ansible@huey.bjzy.me "docker stack ls" # View cluster-wide events ssh ansible@huey.bjzy.me "docker events --filter type=service --filter type=node"
Service Inspection (Read-Only)
# Service details ssh ansible@huey.bjzy.me "docker service inspect <service-name>" # Service logs ssh ansible@huey.bjzy.me "docker service logs <service-name>" # Service tasks/replicas ssh ansible@huey.bjzy.me "docker service ps <service-name>"
Node-Specific Checks (Read-Only)
# Containers on node ssh ansible@dewey.bjzy.me "docker ps" # Resource usage on node ssh ansible@dewey.bjzy.me "docker stats --no-stream" # Images on node ssh ansible@dewey.bjzy.me "docker images"
Docker Compose Operations (Read-Only)
# View compose application status ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose ps" # View compose logs ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose logs --tail 50" # Validate compose configuration ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose config" # Check compose resource usage ssh ansible@huey.bjzy.me "cd /opt/myapp && docker compose ps -q | xargs docker stats --no-stream"
Harbor Registry Operations (Read-Only)
# Check Harbor health
curl -k https://harbor.bjzy.me/api/v2/health
# List Harbor projects (with token)
HARBOR_TOKEN=$(vault kv get -field=robot_token kvProd_v2/Harbor/Application-Prod)
curl -k -H "Authorization: Bearer $HARBOR_TOKEN" https://harbor.bjzy.me/api/v2.0/projects
# Pull AWX EE from Harbor
docker pull harbor.bjzy.me/awx/awx-ee:latest
# Check Docker logging driver
ssh ansible@huey.bjzy.me "docker info --format '{{.LoggingDriver}}'"
Important Notes
- •Never run destructive commands (
docker service rm,docker stack rm,docker swarm leave) without explicit user confirmation. - •Always verify environment (Production vs Development) before executing commands.
- •Use AWX for deployments - Direct
docker stack deployshould only be used for emergency fixes. - •Check Keep alerts first - Many Swarm issues may already have alerts in Keep AIOps.
- •Document findings - If you discover issues, enrich Keep alerts or update Notion documentation.
Related Documentation
- •Notion: Docker Swarm Management — Infrastructure Automation Guide
- •Notion: Docker Swarm Deployment Variables Backup
- •GitHub Repo:
homelab-playbooks(BjzyLabs/ansible-homelab) - •Local Docs:
docs/DOCKER_SWARM_OPERATIONS.md,docs/DOCKER_SWARM_DEPLOYMENT.md