ALB Log Analyzer
Analyze AWS Application Load Balancer (ALB) logs using DuckDB for fast, flexible S3-based queries.
Quick Start
Secure Method (Recommended for Named Profiles)
For AWS named profiles, use these secure scripts that keep credentials private:
# One-time setup (credentials read from ~/.aws/credentials) ./scripts/setup_with_profile.sh your-profile-name # Load logs (profile name only, credentials stay secure) ./scripts/load_with_profile.sh your-profile-name 's3://bucket/path/**/*.log.gz' # Analyze ./scripts/analyze.sh errors ./scripts/analyze.sh performance
Security: Only profile names appear in commands and logs. Credentials are read internally from ~/.aws/credentials.
Standard Method
Basic workflow for ALB log analysis:
- •Setup DuckDB with AWS extensions
- •Load logs from S3 into a table
- •Run analysis queries (errors, performance, traffic, etc.)
- •Export results or create custom queries
Prerequisites
Ensure DuckDB is installed:
# Install DuckDB (if not already installed) brew install duckdb # macOS # or download from https://duckdb.org/docs/installation/
AWS credentials must be configured. See AWS Credentials Setup section below for details.
Database File Location
By default, the DuckDB database file is stored at:
/tmp/alb-log-analyzer-${USER}/alb_analysis.duckdb
Important: The database file is automatically deleted when you run setup or load commands. This ensures:
- •Clean start with each analysis
- •No accumulation of old data in /tmp
- •Reduced disk space usage
This temporary location also:
- •Keeps your skill directory clean
- •Prevents data from being packaged with the skill
- •May be cleared on system restart
To use a custom location, set the DB_FILE environment variable:
export DB_FILE=/path/to/custom/database.duckdb
Setup
Method 1: Using CREDENTIAL_CHAIN (Recommended)
This method automatically detects AWS credentials from environment variables, ~/.aws/credentials, or IAM roles:
./scripts/analyze.sh setup
Method 2: Using Environment Variables
If CREDENTIAL_CHAIN doesn't work, explicitly set environment variables.
For default profile:
# Set AWS credentials export AWS_ACCESS_KEY_ID=your_key_id export AWS_SECRET_ACCESS_KEY=your_secret_key export AWS_DEFAULT_REGION=ap-northeast-1 # Setup DuckDB ./scripts/analyze.sh setup-env
For named profile:
# Export credentials from your profile export AWS_PROFILE=your-profile-name export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id --profile your-profile-name) export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key --profile your-profile-name) export AWS_DEFAULT_REGION=$(aws configure get region --profile your-profile-name) # Setup DuckDB ./scripts/analyze.sh setup-env
Method 3: Manual SQL
duckdb alb_analysis.duckdb < scripts/setup.sql # or duckdb alb_analysis.duckdb < scripts/setup_s3_env.sql
Verify Setup
Diagnose your AWS credentials configuration:
./scripts/analyze.sh diagnose
This will show:
- •Extension installation status
- •Environment variable configuration
- •Configured secrets in DuckDB
AWS Credentials Setup
DuckDB needs AWS credentials to access S3. Choose one of these methods:
Option 1: AWS CLI Configuration (Recommended)
If you have AWS CLI configured, DuckDB can use those credentials:
# Check if AWS CLI is configured aws s3 ls # If not configured, run: aws configure
Option 2: Environment Variables
Export credentials in your shell:
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY export AWS_DEFAULT_REGION=ap-northeast-1
Option 3: AWS Credentials File
Create or edit ~/.aws/credentials:
[default] aws_access_key_id = AKIAIOSFODNN7EXAMPLE aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
And ~/.aws/config:
[default] region = ap-northeast-1
Option 4: AWS Profile (For Named Profiles)
If you're using a named AWS profile (not [default]), you need to explicitly export the credentials.
Easy way (using helper script):
# List available profiles aws configure list-profiles # Load credentials from your profile source scripts/load_profile.sh your-profile-name # Setup and use ./scripts/analyze.sh setup-env ./scripts/analyze.sh load 's3://your-bucket/path/**/*.log.gz'
Manual way:
# Check your profiles aws configure list-profiles # Export credentials from your profile export AWS_PROFILE=your-profile-name export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id --profile your-profile-name) export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key --profile your-profile-name) export AWS_DEFAULT_REGION=$(aws configure get region --profile your-profile-name) # Setup DuckDB with these credentials ./scripts/analyze.sh setup-env # Load logs ./scripts/analyze.sh load 's3://your-bucket/path/**/*.log.gz'
Why this is needed: DuckDB doesn't automatically read from named AWS profiles. It only supports environment variables or the [default] profile. By exporting the profile's credentials as environment variables, DuckDB can access them.
Option 5: IAM Role (EC2/ECS)
If running on EC2 or ECS, DuckDB can use the instance's IAM role automatically.
Loading Logs
Load ALB logs from S3 into a DuckDB table.
Using the analyze.sh script
# Load logs from S3 ./scripts/analyze.sh load 's3://my-bucket/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/11/**/*.log.gz'
Manual SQL approach
- •Generate SQL from template:
# Replace placeholders
sed -e "s|{{TABLE_NAME}}|alb_logs|g" \
-e "s|{{S3_PATH}}|s3://my-bucket/path/to/logs/**/*.log.gz|g" \
scripts/load_template.sql > load_logs.sql
- •Execute:
duckdb alb_analysis.duckdb < load_logs.sql
S3 Path Patterns
Use glob patterns to load multiple files:
# Single month s3://bucket/AWSLogs/account-id/elasticloadbalancing/region/2024/11/**/*.log.gz # Multiple months s3://bucket/AWSLogs/account-id/elasticloadbalancing/region/2024/*/**/*.log.gz # Specific day s3://bucket/AWSLogs/account-id/elasticloadbalancing/region/2024/11/01/**/*.log.gz
Analysis Tasks
Error Analysis
Analyze HTTP errors, status code distributions, and error patterns:
./scripts/analyze.sh errors
This provides:
- •Status code distribution
- •Error details (non-200 responses)
- •5xx errors by hour
- •Common error reasons
Performance Analysis
Analyze response times and latency:
./scripts/analyze.sh performance
This provides:
- •Response time statistics (avg, p50, p95, p99)
- •Slowest requests
- •Response time trends by hour
- •Slow request percentage
Custom Queries
Execute custom SQL queries:
# Using a custom SQL file ./scripts/analyze.sh query my_query.sql # Or directly with DuckDB duckdb alb_analysis.duckdb
Common Analysis Patterns
Find requests from specific IP
SELECT * FROM alb_logs WHERE client_ip_port LIKE '192.168.1.%' ORDER BY timestamp DESC;
Analyze specific URL path
SELECT
elb_status_code,
COUNT(*) as count,
ROUND(AVG(target_processing_time), 3) as avg_time
FROM alb_logs
WHERE request LIKE '%/api/users%'
GROUP BY elb_status_code;
Traffic by time of day
SELECT
EXTRACT(HOUR FROM timestamp) as hour,
COUNT(*) as request_count
FROM alb_logs
GROUP BY hour
ORDER BY hour;
Filter by date range
SELECT * FROM alb_logs WHERE timestamp BETWEEN '2024-11-01' AND '2024-11-30' AND elb_status_code >= 500;
Advanced Usage
Create aggregated summaries
CREATE TABLE daily_summary AS
SELECT
DATE_TRUNC('day', timestamp) as day,
COUNT(*) as total_requests,
SUM(CASE WHEN elb_status_code >= 500 THEN 1 ELSE 0 END) as errors_5xx,
ROUND(AVG(target_processing_time), 3) as avg_response_time
FROM alb_logs
GROUP BY day;
Export results
-- Export to CSV COPY (SELECT * FROM alb_logs WHERE elb_status_code >= 500) TO 'errors.csv' (HEADER, DELIMITER ','); -- Export to Parquet COPY alb_logs TO 'alb_logs.parquet' (FORMAT PARQUET);
Multiple table analysis
Load logs into separate tables for comparison:
# Load last week's logs
sed -e "s|{{TABLE_NAME}}|alb_logs_last_week|g" \
-e "s|{{S3_PATH}}|s3://bucket/logs/2024/10/**/*.log.gz|g" \
scripts/load_template.sql | duckdb alb_analysis.duckdb
# Compare with this week
SELECT
'last_week' as period,
COUNT(*) as requests,
AVG(target_processing_time) as avg_time
FROM alb_logs_last_week
UNION ALL
SELECT
'this_week' as period,
COUNT(*) as requests,
AVG(target_processing_time) as avg_time
FROM alb_logs;
Troubleshooting
S3 Access Issues
If you get "Access Denied" or credential errors when loading from S3:
Step 1: Diagnose the issue
./scripts/analyze.sh diagnose
This shows your AWS credential configuration status.
Step 2: Verify AWS credentials work
Test with AWS CLI:
aws s3 ls s3://your-bucket/path/to/logs/
If this fails, fix your AWS credentials first.
Step 3: Try different setup methods
3a. If using default AWS profile:
# Set credentials export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id) export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key) export AWS_DEFAULT_REGION=$(aws configure get region) # Setup DuckDB with environment variables ./scripts/analyze.sh setup-env # Try loading again ./scripts/analyze.sh load 's3://your-bucket/path/**/*.log.gz'
3b. If using a named AWS profile:
# Check which profile you're using aws configure list-profiles # Replace 'your-profile-name' with your actual profile export AWS_PROFILE=your-profile-name export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id --profile your-profile-name) export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key --profile your-profile-name) export AWS_DEFAULT_REGION=$(aws configure get region --profile your-profile-name) # Setup DuckDB with environment variables ./scripts/analyze.sh setup-env # Try loading again ./scripts/analyze.sh load 's3://your-bucket/path/**/*.log.gz'
Step 4: Alternative - Load from local files
If S3 access still doesn't work, download logs locally:
# Download logs from S3
aws s3 sync s3://your-bucket/path/to/logs/ ./local-logs/
# Modify load_template.sql to use local path
sed -e "s|{{TABLE_NAME}}|alb_logs|g" \
-e "s|{{S3_PATH}}|./local-logs/**/*.log.gz|g" \
scripts/load_template.sql | duckdb alb_analysis.duckdb
"Extension aws not found"
Run setup script first:
./scripts/analyze.sh setup
Or manually:
duckdb alb_analysis.duckdb < scripts/setup.sql
"Invalid Input Error" when loading
This usually means the S3 path pattern doesn't match any files:
# Check if files exist
aws s3 ls --recursive s3://your-bucket/path/to/logs/ | head
# Verify the path pattern matches your bucket structure
# ALB logs are typically at:
# s3://bucket/AWSLogs/{account-id}/elasticloadbalancing/{region}/{year}/{month}/{day}/*.log.gz
Empty results after loading
# Check table contents duckdb alb_analysis.duckdb -c "SELECT COUNT(*) FROM alb_logs" # If 0, verify S3 path and try loading again
Out of memory
Use persistent database instead of :memory::
# Already using persistent storage by default # alb_analysis.duckdb is a file-based database
Permission errors on scripts
chmod +x scripts/analyze.sh
Region-specific issues
Ensure the region matches your S3 bucket:
export AWS_DEFAULT_REGION=ap-northeast-1 # Change to your region ./scripts/analyze.sh setup-env
References
ALB Log Schema
See references/alb_schema.md for complete field definitions and data types.
Key fields:
- •
timestamp: Request timestamp - •
elb_status_code: HTTP status from ALB - •
target_status_code: HTTP status from target - •
request: Full HTTP request line - •
target_processing_time: Response time from target - •
client_ip_port: Client address and port - •
target_ip_port: Target address and port - •
error_reason: Error details if applicable - •
transformed_host,transformed_uri,request_transform_status: Request transformation fields
Query Examples
See references/query_examples.md for comprehensive query examples including:
- •Error analysis queries
- •Performance metrics
- •Traffic analysis
- •SSL/TLS analysis
- •Data export patterns
Scripts Reference
analyze.sh
Main analysis script with subcommands:
# Setup (uses CREDENTIAL_CHAIN) ./scripts/analyze.sh setup # Setup with environment variables (if CREDENTIAL_CHAIN doesn't work) ./scripts/analyze.sh setup-env # Diagnose S3 access and credentials ./scripts/analyze.sh diagnose # Load logs ./scripts/analyze.sh load '<s3_path>' # Analyze errors ./scripts/analyze.sh errors # Analyze performance ./scripts/analyze.sh performance # Custom query ./scripts/analyze.sh query <sql_file> # Options --db <file> # Database file (default: alb_analysis.duckdb) --table <name> # Table name (default: alb_logs)
setup_with_profile.sh (Secure - Recommended)
Securely setup DuckDB with AWS profile. Credentials are read internally and never exposed in commands:
# Usage ./scripts/setup_with_profile.sh your-profile-name # Security: Only profile name is visible, credentials read from ~/.aws/credentials
Security advantage: Credentials stay private. Only profile names appear in command history and logs.
load_with_profile.sh (Secure - Recommended)
Securely load logs with AWS profile. Credentials are read internally and never exposed in commands:
# Usage ./scripts/load_with_profile.sh your-profile-name 's3://bucket/path/**/*.log.gz' [table_name] # Security: Only profile name and S3 path are visible
Security advantage: Credentials stay private. Only profile names appear in command history and logs.
load_profile.sh (Legacy - For Manual Setup)
Helper script to load AWS credentials from a named profile into environment variables:
# Usage source scripts/load_profile.sh your-profile-name # This exports: # - AWS_PROFILE # - AWS_ACCESS_KEY_ID # - AWS_SECRET_ACCESS_KEY # - AWS_DEFAULT_REGION
Note: Must be sourced (not executed) to export variables to your current shell.
Security consideration: This exports credentials to environment variables, which may be visible in process listings. Use setup_with_profile.sh and load_with_profile.sh for better security.
SQL Files
- •
setup.sql: Initialize DuckDB extensions with CREDENTIAL_CHAIN - •
setup_s3_env.sql: Initialize DuckDB extensions with environment variables - •
diagnose_s3.sql: Diagnose AWS credentials and S3 access - •
load_template.sql: Template for loading logs (requires variable substitution) - •
analyze_errors.sql: Error analysis queries - •
analyze_performance.sql: Performance analysis queries