CloudTrail Log Analysis
Overview
AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS accounts. CloudTrail records AWS Management Console sign-in events and API calls made through the AWS Management Console, AWS SDKs, command-line tools, and other software. Trails log management events by default; data events are recorded only when they are explicitly enabled.
By using CloudTrail, you can detect unusual activity in your AWS environment, such as unexpected changes to security groups or IAM users, as well as identify potential security threats. For example, you can use CloudTrail to detect and respond to unauthorized access attempts, unauthorized changes to resources, or suspicious activity in your AWS accounts. Additionally, CloudTrail can be used to create audit trails of resource changes and to ensure compliance with internal policies and industry regulations.
For security analysis, it can be helpful to import the relevant CloudTrail logs into a local analysis environment.
CloudTrail delivers its log files to S3 as compressed JSON. Once downloaded to a local filesystem, these files can be very effective for threat hunting. This skill provides guidance on local analysis of compressed JSON log files that have been retrieved from S3.
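As an illustration of the kind of detection described above, the following minimal sketch counts denied API calls per identity in records that have already been parsed into Python dictionaries. The field names (`errorCode`, `eventName`, `userIdentity.arn`) are standard CloudTrail record fields; the function name `denied_calls_by_identity` and the set of error codes treated as "denied" are assumptions made for this example and are not exhaustive.

```python
from collections import Counter

# Assumption: an illustrative, non-exhaustive set of "denied" error codes.
DENIED_CODES = {
    "AccessDenied",
    "UnauthorizedOperation",
    "Client.UnauthorizedOperation",
}


def denied_calls_by_identity(records):
    """Count denied API calls per (identity ARN, event name)."""
    counts = Counter()
    for rec in records:
        if rec.get("errorCode") in DENIED_CODES:
            # userIdentity may be missing or null in some records
            arn = (rec.get("userIdentity") or {}).get("arn", "unknown")
            counts[(arn, rec.get("eventName", "unknown"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample records, for demonstration only
    sample = [
        {"eventName": "GetObject", "errorCode": "AccessDenied",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"}},
        {"eventName": "ListBuckets",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"}},
    ]
    for key, n in denied_calls_by_identity(sample).most_common():
        print(key, n)
```

A high count of denied calls for a single identity is a starting point for review, not proof of an attack; misconfigured automation can produce the same pattern.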
General Instructions
- •Perform web searches for the AWS CloudTrail log file format to help parse and understand the content of the cloud events.
- •Review references for guidance on how to search for attack patterns.
- •Create a time-stamped markdown file for any work performed (analyst_log-YY-MM-DD-HH-MM.md) to capture analysis steps and sample data.
- •Create persistent scripts for all but the most trivial tasks using the naming convention analyze_[topic].py or parse_[topic].sh. Do not remove any scripts after creation.
- •Do not clean up temporary output from analyst scripts; rename them with a suffix of YY-MM-DD-HH-MM.md.
- •Prevent Out of Memory (OOM) errors, particularly on low-resource systems:
  - •DuckDB: Use DuckDB as the primary engine for large data; it handles disk spilling automatically. Set PRAGMA memory_limit='2GB' (adjust as needed) to constrain usage.
  - •Streaming & Chunking: Use polars.scan_ndjson() for lazy loading or Python generators to process files record-by-record. Avoid json.load() on massive files.
  - •Pre-reduction: Use jq to filter and flatten data before ingesting into Python/Pandas.
- •Do NOT use analogies to explain concepts.
- •Ensure all analysis files are in .md format.
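The streaming and file-naming instructions above could be sketched as follows. This is a minimal example, not a definitive implementation: the helper names `analyst_log_name` and `iter_records` are hypothetical, and the code assumes each log file is a single JSON object with a top-level `Records` array. It holds one file in memory at a time rather than the whole dataset, and it falls back to the built-in `json` module when `orjson` is not installed.

```python
import gzip
import sys
from datetime import datetime

try:
    import orjson
    loads = orjson.loads
except ImportError:  # fallback so the sketch still runs without orjson
    import json
    loads = json.loads


def analyst_log_name(now=None):
    """Build a file name following the analyst_log-YY-MM-DD-HH-MM.md convention."""
    now = now or datetime.now()
    return now.strftime("analyst_log-%y-%m-%d-%H-%M.md")


def iter_records(paths):
    """Yield CloudTrail records one at a time from .json or .json.gz files."""
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rb") as fh:
            doc = loads(fh.read())  # only this one file is held in memory
        # Assumption: records live under the top-level "Records" key
        yield from doc.get("Records", [])


if __name__ == "__main__":
    # File names come from the command line; nothing is hardcoded
    for record in iter_records(sys.argv[1:]):
        print(record.get("eventTime"), record.get("eventName"))
```

Because `iter_records` is a generator, downstream code can filter or count events without building a list of every record first.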
Quick Start
- •Initialize environment: uv venv && source .venv/bin/activate
- •Install dependencies: uv pip install duckdb orjson polars pandas matplotlib
Common Recipes
jq: Flatten CloudTrail Records
cat *.json | jq -c '.Records[]' > flattened.jsonl
DuckDB: Direct Ingestion
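import duckdb
# Persist the ingested events in a database file in the current directory
con = duckdb.connect('analysis.db')
# Each CloudTrail file holds a top-level "Records" array; unnest it so the table has one row per event
con.execute("CREATE TABLE events AS SELECT r.* FROM (SELECT unnest(Records) AS r FROM read_json_auto('*.json', format='auto', records='true'))")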
Python Coding Style
- •Use Python or jq to parse and analyze log files.
- •Use Python duckdb to ingest and analyze data. Save data you are confident in as a persistent .db file in the current directory.
- •Review existing Python code in the current directory before writing new code to solve problems.
- •Use uv to create virtual environments and install libraries. Maintain a requirements.txt file.
- •Use orjson instead of the built-in json library for better performance.
- •Use Python polars to convert JSON to parquet if needed.
- •Use Python pandas for statistical analysis if beneficial.
- •Create visualizations as .png files with meaningful, space-free filenames.
- •Use sys.argv for command-line arguments instead of argparse to keep syntax simple. Do not hardcode filenames.
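A script that follows these style points might look like the sketch below. It assumes its input is a flattened JSONL file with one CloudTrail record per line (such as the output of the jq recipe), reads it line by line, and writes a markdown table of event counts. The function names `count_events`, `to_markdown`, and `main` are hypothetical and chosen for this example only.

```python
import sys
from collections import Counter

try:
    import orjson
    loads = orjson.loads
except ImportError:  # fallback so the sketch still runs without orjson
    import json
    loads = json.loads


def count_events(jsonl_path):
    """Count records per (eventSource, eventName), reading one line at a time."""
    counts = Counter()
    with open(jsonl_path, "rb") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            rec = loads(line)
            key = (rec.get("eventSource", "unknown"), rec.get("eventName", "unknown"))
            counts[key] += 1
    return counts


def to_markdown(counts, limit=20):
    """Render the most common events as a markdown table."""
    lines = ["| eventSource | eventName | count |", "| --- | --- | --- |"]
    for (source, name), n in counts.most_common(limit):
        lines.append(f"| {source} | {name} | {n} |")
    return "\n".join(lines)


def main(jsonl_path, out_path):
    """Write the event-count table for jsonl_path to out_path."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(to_markdown(count_events(jsonl_path)) + "\n")


if __name__ == "__main__":
    # Plain sys.argv handling; both file names come from the command line
    if len(sys.argv) != 3:
        print(f"usage: {sys.argv[0]} <flattened.jsonl> <output.md>", file=sys.stderr)
    else:
        main(sys.argv[1], sys.argv[2])
```

Saved under a name that matches the naming convention (for example analyze_event_counts.py, a hypothetical name), it would be run with the flattened JSONL file and a time-stamped .md output file as its two arguments.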