Debug Hackathon Infrastructure
Diagnose and fix issues with the running stack.
Reference: common-issues.md - Quick lookup table for common problems
Quick Health Check
Start here for any issue:
./scripts/status.sh
If services report unhealthy immediately after a deploy, wait before debugging: they take 5-15 minutes to initialize.
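Because initialization can take several minutes, polling may be more convenient than rerunning the check by hand. The following is a minimal sketch, not something shipped in the repo: `wait_for` is a hypothetical helper, and the example call assumes `./scripts/status.sh` exits non-zero while the stack is unhealthy, which it may not do.

```shell
# Hypothetical helper: retry a command until it succeeds or the time budget is spent.
# Usage: wait_for <timeout_seconds> <interval_seconds> <command> [args...]
# The budget counts sleep time only; the command's own runtime is not included.
wait_for() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Example (assumes status.sh signals health through its exit code):
# wait_for 900 30 ./scripts/status.sh
```

If the status script always exits 0, the loop returns on the first attempt, so check its exit behavior before relying on this.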
SSH Access
./scripts/ssh.sh wa     # WorkAdventure
./scripts/ssh.sh lk     # LiveKit
./scripts/ssh.sh turn   # Coturn
./scripts/ssh.sh jitsi  # Jitsi
If SSH fails:
- Check that ssh-add -l shows keys from 1Password
- Verify instance is running in AWS Console
- Check security group allows port 22
Important
All terraform commands must be run from terraform/environments/hackathon/.
General Debugging
Once SSH'd in:
Check Initialization Status
cloud-init status
# Success: status: done
# Problem: status: running (still initializing) or status: error
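The three outcomes above can be turned into a next step with a small helper. This is an illustrative sketch only: `classify_cloud_init` and its return labels are hypothetical names, and it matches on the plain `status: ...` line described above.

```shell
# Hypothetical helper: map the text printed by `cloud-init status` to a next step.
classify_cloud_init() {
  case "$1" in
    *"status: done"*)    echo "ready" ;;         # initialization finished
    *"status: running"*) echo "wait" ;;          # still initializing
    *"status: error"*)   echo "inspect-logs" ;;  # check the cloud-init output log
    *)                   echo "unknown" ;;       # any other state
  esac
}

# Example, run on the instance:
# classify_cloud_init "$(cloud-init status)"
```

To simply block until initialization finishes, `cloud-init status --wait` is an alternative.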
View Startup Logs
cat /var/log/cloud-init-output.log # Look for errors near the end
Check Docker Containers
docker ps # All containers should show "Up X minutes"
View Container Logs
docker compose logs -f          # All containers
docker compose logs <service>   # Specific service
Check Resources
df -h    # Disk space (problem if >90%)
free -h  # Memory (problem if very low)
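To apply the 90% disk rule without reading the table by eye, the output can be filtered. A minimal sketch, assuming the POSIX `df -P` column layout (capacity in column 5, mount point in column 6); `over_threshold` is a hypothetical helper name.

```shell
# Hypothetical helper: read `df -P` output on stdin and print mount points
# whose usage is strictly above the given percentage (default 90).
# Mount points containing spaces are not handled.
over_threshold() {
  awk -v limit="${1:-90}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 > limit + 0) print $6 " " use "%"
  }'
}

# Example, run on the instance:
# df -P | over_threshold 90
```

Empty output means no filesystem is over the limit.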
Service-Specific Issues
WorkAdventure
OAuth shows template errors: OAuth templates are copied from the cloned hackathon-infra repo during deployment. If templates are missing, redeploy the WorkAdventure instance:
cd terraform/environments/hackathon
terraform apply -replace="module.workadventure.aws_instance.workadventure"
Containers keep restarting:
./scripts/ssh.sh wa
docker compose logs --tail 100 play
docker compose logs --tail 100 back
Map not loading: Maps are served from the cloned hackathon-infra repo on the EC2 instance.
./scripts/ssh.sh wa
ls /opt/workadventure/hackathon-infra/maps/default/
# If missing or outdated, sync from git:
cd /opt/workadventure/hackathon-infra && git pull
Or use ./scripts/sync-maps.sh from your local machine.
LiveKit
"OK" not returned:
./scripts/ssh.sh lk
curl -s http://localhost:7880   # Check local
docker logs caddy               # Check TLS
TLS certificate errors:
docker restart caddy
Coturn
No relay candidates in Trickle ICE test:
Most common cause - certificate permissions:
./scripts/ssh.sh turn
ls -la /etc/coturn/certs/
# Must be 644, if not:
sudo chmod 644 /etc/coturn/certs/*
docker restart coturn
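Rather than scanning the `ls -la` output by eye, the files with the wrong mode can be listed directly. A minimal sketch: `list_wrong_mode` is a hypothetical helper, and it only looks at regular files directly inside the directory, so symlinked certificates would be skipped.

```shell
# Hypothetical helper: list regular files in a directory whose mode is not exactly 644.
# Prints nothing when every file already has the expected mode.
list_wrong_mode() {
  find "$1" -maxdepth 1 -type f ! -perm 644
}

# Example, run on the Coturn instance:
# list_wrong_mode /etc/coturn/certs
```

Any path it prints is a candidate for the chmod and restart steps above.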
Jitsi
Let's Encrypt failed:
./scripts/ssh.sh jitsi
sudo /usr/share/jitsi-meet/scripts/install-letsencrypt-cert.sh
"service-unavailable" errors:
./scripts/ssh.sh jitsi
docker compose restart prosody
sleep 10
docker compose restart jicofo
docker compose restart jvb
Network Debugging
DNS Not Resolving
dig +short app.hackathon.nf-co.re
dig +short livekit.hackathon.nf-co.re
dig +short jitsi.hackathon.nf-co.re
dig +short turn.hackathon.nf-co.re
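The same lookups can be wrapped in a loop that reports only the names that fail. A minimal sketch: `unresolved_hosts` is a hypothetical helper that treats empty `dig +short` output as "not resolving".

```shell
# Hypothetical helper: print each hostname for which `dig +short` returns nothing.
unresolved_hosts() {
  for host in "$@"; do
    if [ -z "$(dig +short "$host")" ]; then
      echo "$host"
    fi
  done
}

# Example:
# unresolved_hosts app.hackathon.nf-co.re livekit.hackathon.nf-co.re \
#   jitsi.hackathon.nf-co.re turn.hackathon.nf-co.re
```

No output means every name returned at least one record; it does not confirm the record points at the right instance.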
Test Port Connectivity
nc -zv livekit.hackathon.nf-co.re 443
nc -zv jitsi.hackathon.nf-co.re 443
nc -zuv turn.hackathon.nf-co.re 3478   # UDP
Check Security Groups
aws ec2 describe-security-groups --profile nf-core --region eu-west-1 \
  --filters "Name=tag:Name,Values=nfcore-hackathon-*" \
  --query 'SecurityGroups[].[GroupName,IpPermissions]' --output json
Terraform State Issues
Recovery from State Corruption
If Terraform state doesn't match AWS reality:
- Don't panic: AWS resources still exist
- Compare state vs reality (commands below)
- For resources in AWS but not state: terraform import <address> <id>
- For resources in state but not AWS: terraform state rm <address>
- Run terraform plan to verify alignment
For complete manual cleanup: See teardown skill and manual-cleanup-scripts.md
State Doesn't Match AWS
# What Terraform thinks exists
terraform state list

# What actually exists
aws ec2 describe-instances --profile nf-core --region eu-west-1 \
  --filters "Name=tag:Name,Values=nfcore-hackathon-*" \
  --query 'Reservations[].Instances[].[InstanceId,Tags[?Key==`Name`].Value|[0],State.Name]' \
  --output table
Resource in AWS but not state: terraform import <address> <id>
Resource in state but not AWS: terraform state rm <address>
State Lock Errors
- Wait 15 minutes (auto-expire)
- Check that no other terraform process is running
- NEVER force-unlock without explicit user approval
Recovery Actions
Restart Single Service
./scripts/ssh.sh <service>
docker compose restart
Full Redeploy
cd terraform/environments/hackathon
terraform destroy
terraform apply
# Wait 10-15 minutes
./scripts/status.sh
Maps and assets are automatically cloned from the hackathon-infra repo during deployment.
Browser-Side Debugging
Check Console (F12)
- JavaScript errors
- Failed network requests
- CORS errors
Common Browser Issues
| Issue | Cause | Fix |
|---|---|---|
| WebSocket failed | Back service not reachable | Check docker compose logs back |
| Map not loading | Maps not synced or git not pulled | Run ./scripts/sync-maps.sh |
| Audio/video denied | HTTPS or permissions | Ensure HTTPS, grant permissions |