Linux Administration Skill
Comprehensive Linux system administration and automation for Debian/Ubuntu/Mint environments
Service Management (systemd)
Essential Commands
bash
# Status and logs
systemctl status service-name
journalctl -u service-name -n 100 --no-pager
journalctl -u service-name --since "1 hour ago"
# Control (pick one verb per command)
systemctl start|stop|restart|reload service-name
systemctl enable|disable service-name
systemctl daemon-reload  # After editing unit files
Debugging Failed Services
bash
systemctl status service-name --no-pager -l
journalctl -u service-name -p err --no-pager
systemctl list-dependencies service-name
systemd-analyze verify /etc/systemd/system/service-name.service
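When the same triage has to be repeated across many units, the unit state can also be read programmatically. The following is a minimal sketch, assuming `systemctl show` prints one `Key=Value` pair per line; the helper names `parse_show_output` and `unit_state` are hypothetical, and the sample text is illustrative rather than captured from a real host.

```python
import subprocess


def parse_show_output(text: str) -> dict[str, str]:
    """Parse the Key=Value lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that carry no "="
            props[key] = value
    return props


def unit_state(unit: str) -> dict[str, str]:
    """Query a few state properties of a unit (needs a host running systemd)."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState,SubState,Result"],
        capture_output=True, text=True, check=True,
    )
    return parse_show_output(result.stdout)


if __name__ == "__main__":
    # Illustrative input only; on a real host call unit_state("service-name").
    sample = "ActiveState=failed\nSubState=failed\nResult=exit-code\n"
    print(parse_show_output(sample))
```

Keeping the parser separate from the `subprocess` call means it can be exercised on saved output without touching a live system.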
Custom Service Template
ini
# /etc/systemd/system/myservice.service
[Unit]
Description=My Service Description
After=network.target

[Service]
Type=simple
User=user
WorkingDirectory=/path/to/scripts
ExecStart=/path/to/venv/bin/python /path/to/scripts/service.py
Restart=on-failure
RestartSec=10s
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
Automation Patterns
Core Principles
- Error Handling: set -euo pipefail (bash), try/except (Python)
- Logging: Always log to file AND stdout
- Idempotency: Scripts are safe to run multiple times
- Configuration: Use config files, not hardcoded values
- Notifications: Alert on failures, not just successes
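The idempotency principle can be illustrated with a small helper that makes a change only when it is still needed. This is a sketch under stated assumptions: `ensure_line` is a hypothetical name, and the file path and host entry in the demo are placeholders.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to `path` unless it is already present.

    Returns True if the file was changed, False if nothing had to be done.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the existing content lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as fh:
        fh.write(f"{prefix}{line}\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "hosts.example"  # placeholder file
        print(ensure_line(target, "192.0.2.10 app01"))  # first run changes the file
        print(ensure_line(target, "192.0.2.10 app01"))  # second run is a no-op
```

Returning whether anything changed lets the caller log or notify only on real modifications.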
Bash Script Template
bash
#!/bin/bash
set -euo pipefail
SCRIPT_NAME="$(basename "$0")"
LOG_FILE="/var/log/${SCRIPT_NAME%.sh}.log"
LOCK_FILE="/tmp/${SCRIPT_NAME%.sh}.lock"
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"; }
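cleanup() { rm -f "$LOCK_FILE"; }
# Take the lock atomically: with noclobber, ">" fails if the file already exists
if ! ( set -o noclobber; echo "$$" > "$LOCK_FILE" ) 2>/dev/null; then log "ERROR: Already running"; exit 1; fi
# Install the trap only once the lock is ours, so a refused run cannot delete another instance's lock
trap cleanup EXIT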
log "Starting $SCRIPT_NAME"
# Main logic here
log "Completed $SCRIPT_NAME"
Python Script Template
python
#!/path/to/venv/bin/python
import logging
import sys
from pathlib import Path

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[logging.FileHandler('/var/log/script.log'), logging.StreamHandler(sys.stdout)]
)
logger = logging.getLogger(__name__)

def main():
    logger.info("Starting")
    try:
        pass  # Main logic
    except Exception as e:
        logger.exception("Failed: %s", e)  # also records the traceback
        sys.exit(1)
    logger.info("Completed")

if __name__ == '__main__':
    main()
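The bash template guards against concurrent runs with a lock file; a Python script can get a similar guarantee from an advisory lock. The following is a minimal sketch, assuming a Linux host and using `fcntl.flock`; the name `single_instance` and the lock path are hypothetical. Unlike a plain lock file, the kernel drops the lock when the process exits, so a crash leaves nothing stale behind.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path: str):
    """Hold an exclusive, non-blocking flock on lock_path for the block's duration.

    Raises BlockingIOError if another holder already has the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


if __name__ == "__main__":
    lock = os.path.join(tempfile.gettempdir(), "script.example.lock")  # placeholder path
    try:
        with single_instance(lock):
            print("lock acquired, running main logic")
    except BlockingIOError:
        print("already running")
```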
Cron vs Systemd Timers
| Feature | Cron | Systemd Timer |
|---|---|---|
| Logging | Manual | Automatic (journalctl) |
| Missed runs | Lost | Persistent=true catches up |
| Dependencies | None | Can require other services |
Recommendation: Use systemd timers for production.
Systemd Timer Setup
ini
# /etc/systemd/system/task.timer
[Unit]
Description=Task Timer

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
ini
# /etc/systemd/system/task.service
[Unit]
Description=Task

[Service]
Type=oneshot
User=user
ExecStart=/path/to/script.sh
StandardOutput=journal
StandardError=journal
bash
sudo systemctl daemon-reload && sudo systemctl enable --now task.timer
systemctl list-timers --all
Cron Syntax
bash
# minute hour day month weekday command
0 2 * * * /path/to/scripts/backup.sh >> /var/log/backup.log 2>&1
*/15 * * * * /path/to/scripts/health_check.sh
0 18 * * 1-5 /path/to/scripts/report.sh
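To check what a schedule field means before installing a crontab entry, the field can be expanded into its concrete values. This is a simplified sketch, not a full cron parser: the hypothetical `expand_field` handles only `*`, single numbers, ranges, comma lists, and `/step`, and does not support month or weekday names.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one numeric cron field into the sorted list of values it matches.

    `low` and `high` are the field's bounds, e.g. 0 and 59 for minutes.
    """
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start_text, end_text = rng.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(rng)
        if step < 1 or not (low <= start <= end <= high):
            raise ValueError(f"invalid cron field part: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


if __name__ == "__main__":
    print(expand_field("*/15", 0, 59))  # minute field of the health check entry
    print(expand_field("1-5", 0, 7))    # weekday field of the report entry
```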
Log Analysis
journalctl Patterns
bash
journalctl -n 100 -o short-precise
journalctl -f
journalctl --since "2025-10-02 14:00" --until "2025-10-02 15:00"
journalctl -p err
journalctl -g "error|fail|exception" -n 500
Application Log Patterns
bash
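# 5xx responses (status is field 9 in the default "combined" format)
awk '$9 ~ /^5[0-9][0-9]$/' /var/log/nginx/access.log | tail -20
# Slow requests; assumes $request_time was added as the last field of log_format
awk '$NF > 1.0' /var/log/nginx/access.log | tail -20
# Top 10 client IPs by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10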
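For reports that go beyond one-liners, the same two questions (which clients are busiest, which requests returned 5xx) can be answered with a field-aware parse, which avoids mistaking a three-digit byte count for a status code. This is a sketch assuming nginx's default "combined" log format; `summarize` is a hypothetical helper and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# client IP, ident, user, [timestamp], "request", status
LINE_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) ')


def summarize(lines):
    """Return (requests per client IP, list of lines with a 5xx status)."""
    per_ip = Counter()
    server_errors = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not look like combined format
        per_ip[match["ip"]] += 1
        if match["status"].startswith("5"):
            server_errors.append(line)
    return per_ip, server_errors


if __name__ == "__main__":
    # Illustrative lines only, not taken from a real log.
    sample = [
        '192.0.2.10 - - [02/Oct/2025:14:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"',
        '192.0.2.10 - - [02/Oct/2025:14:00:02 +0000] "GET /api HTTP/1.1" 502 157 "-" "curl/8.0"',
        '192.0.2.20 - - [02/Oct/2025:14:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"',
    ]
    per_ip, server_errors = summarize(sample)
    print(per_ip.most_common(10))
    print(len(server_errors))
```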
Network Troubleshooting
Diagnostics
bash
ping -c 4 google.com
nslookup hostname.domain
nc -zv host.domain 443
ss -tulpn | grep LISTEN
ip route show
traceroute host.domain
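Inside a health-check script, the TCP reachability test done by `nc -zv` can be reproduced with the standard library. A minimal sketch; `port_open` is a hypothetical name and the host and port in the demo are placeholders.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name resolution failed
        return False


if __name__ == "__main__":
    print(port_open("127.0.0.1", 443))  # placeholder target
```

A successful connect only shows that something is listening; it says nothing about whether the application behind the port is healthy.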
Firewall (ufw)
bash
sudo ufw status verbose
sudo ufw allow 443/tcp comment 'HTTPS'
sudo ufw allow from 192.0.2.0/24 to any port 22
sudo ufw status numbered && sudo ufw delete 5  # 5 = rule number from the numbered list
Static IP (netplan)
yaml
# /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    ens33:
      addresses: [192.0.2.100/24]
      # gateway4 is deprecated in current netplan; a default route replaces it
      routes:
        - to: default
          via: 192.0.2.1
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
# Apply: sudo netplan apply
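A common static-IP mistake is a gateway that does not sit inside the interface's subnet. The check below is a small sketch using the standard `ipaddress` module; `gateway_reachable` is a hypothetical helper, and the addresses are the documentation-range values from the example above.

```python
import ipaddress


def gateway_reachable(address_cidr: str, gateway: str) -> bool:
    """Return True if `gateway` is on the interface's subnet and differs from its own IP."""
    iface = ipaddress.ip_interface(address_cidr)
    gw = ipaddress.ip_address(gateway)
    return gw in iface.network and gw != iface.ip


if __name__ == "__main__":
    print(gateway_reachable("192.0.2.100/24", "192.0.2.1"))
    print(gateway_reachable("192.0.2.100/24", "198.51.100.1"))
```

Running a check like this before applying the configuration helps avoid locking yourself out of a remote host.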
Container Management
For Docker/Podman container operations, see the container-testing skill.
Performance Monitoring
bash
htop
ps aux --sort=-%cpu | head -10
ps aux --sort=-%mem | head -10
iostat -x 1 5
df -h
du -sh /var/log/* | sort -rh | head -10
find /home -type f -size +100M -exec ls -lh {} \;
Package Management (apt)
bash
sudo apt update && sudo apt upgrade -y
sudo apt install package-name -y
sudo apt remove package-name
sudo apt --fix-broken install
sudo dpkg --configure -a
User and Permission Management
bash
sudo adduser username
sudo usermod -aG groupname username
sudo chown -R user:group directory/
chmod 755 script.sh
namei -l /path/to/file
getfacl file && setfacl -m u:username:rwx file
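Permission problems often sit in a parent directory rather than on the file itself, which is what `namei -l` reveals. The following sketch produces a similar per-component listing from Python; `walk_permissions` is a hypothetical name, and it reports only mode strings, not owners or groups.

```python
import os
import stat
import tempfile
from pathlib import Path


def walk_permissions(path: str) -> list[tuple[str, str]]:
    """Return (component, mode string) for each path component, root first.

    Uses lstat, so a symlink is reported as a link rather than followed.
    """
    target = Path(os.path.abspath(path))
    components = [*reversed(target.parents), target]
    return [(str(p), stat.filemode(p.lstat().st_mode)) for p in components]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = os.path.join(tmp, "config.conf")  # placeholder file
        open(conf, "w").close()
        os.chmod(conf, 0o640)
        for component, mode in walk_permissions(conf):
            print(mode, component)
```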
Troubleshooting Workflows
Service Won't Start
bash
systemctl status service-name
journalctl -u service-name -n 50 --no-pager
systemctl cat service-name
namei -l /etc/service-name/config.conf
High CPU
bash
ps aux --sort=-%cpu | head -5
strace -p PID && lsof -p PID
kill PID
Disk Full
bash
df -h
du -sh /* | sort -rh | head -10
sudo journalctl --vacuum-time=7d
find /tmp -type f -atime +7 -delete
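Before running a destructive cleanup such as `find ... -delete`, it can help to list what would be removed. This is a dry-run sketch under stated assumptions: `stale_files` is a hypothetical helper, it compares access times against a simple cutoff (so its rounding differs slightly from `find -atime +7`), and it never deletes anything.

```python
import os
import tempfile
import time
from pathlib import Path


def stale_files(root: str, days: int) -> list[Path]:
    """List regular files under `root` whose last access is more than `days` days ago."""
    cutoff = time.time() - days * 86400
    return sorted(
        p for p in Path(root).rglob("*")
        if not p.is_symlink() and p.is_file() and p.lstat().st_atime < cutoff
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old.tmp"
        old.write_text("stale")
        ten_days_ago = time.time() - 10 * 86400
        os.utime(old, (ten_days_ago, ten_days_ago))  # backdate atime and mtime
        (Path(tmp) / "new.tmp").write_text("fresh")
        for path in stale_files(tmp, 7):
            print("would delete:", path)
```

Note that filesystems mounted with `noatime` or `relatime` do not update access times on every read, so files can look older than their actual last use.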
Network Issues
bash
ip addr show && ip route show
ping -c 4 $(ip route | grep default | awk '{print $3}')
cat /etc/resolv.conf && nslookup google.com
sudo ufw status
Automation Validation Checklist
Before deploying:
- Script runs successfully manually
- Error handling tested (force failures)
- Logs written and readable
- Idempotent (run twice, same result)
- Timer/cron syntax verified
- Permissions correct
Pattern Recognition: The same 20 commands solve 80% of problems. Good automation disappears. Bad automation wakes you at 2 AM.