Dataset Exploration
Systematically explore an Axiom dataset to understand its structure, content, and potential use cases.
Arguments
When invoked with a dataset name (e.g., /explore-dataset logs), the name is available as $ARGUMENTS.
Exploration Protocol
1. List Available Datasets
If no dataset is specified, list the available ones:
axiom dataset list -f json
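The choice between listing datasets and going straight to schema discovery can be sketched as a small shell helper. This is only an illustration: `first_step` is a hypothetical name, and it prints the command from this protocol rather than running it, so it can be tried without the CLI installed.

```shell
# Hypothetical helper: print the first command to run for a given argument.
# An empty argument means no dataset was named, so list datasets instead.
first_step() {
  dataset="${1:-}"
  if [ -z "$dataset" ]; then
    printf '%s\n' "axiom dataset list -f json"
  else
    printf '%s\n' "axiom query \"['$dataset'] | getschema\" --start-time -1h"
  fi
}

first_step ""      # no dataset given: prints the listing command
first_step "logs"  # dataset given: prints the schema discovery command
```

In a slash-command context, the value passed to the helper would be `$ARGUMENTS`.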
2. Schema Discovery
Always start here. Discover actual field names and types:
axiom query "['<dataset>'] | getschema" --start-time -1h
Identify:
- Field names and types
- Dotted fields requiring bracket notation
- Timestamp fields
- Key dimensions (service, status, level)
OTel trace data: If schema contains trace_id, span_id, attributes.*, note that:
- Service fields are promoted: use ['service.name'] not ['resource.service.name']
- Custom attributes: ['attributes.custom']['field'] with tostring() for aggregations
- See axiom-apl skill's OTel reference for field mappings
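As a rough sketch of how these notes combine, the helper below builds a query that groups by the promoted service field and by one custom attribute converted with `tostring()`. The function name `otel_breakdown_query`, the column name `attr`, and the example attribute key `tenant` are assumptions for illustration, not fields from any real schema.

```shell
# Hypothetical helper: build an APL query that counts events per service
# and per custom attribute.
#   $1 = dataset name
#   $2 = key inside attributes.custom (placeholder, check getschema first)
otel_breakdown_query() {
  printf "['%s'] | extend attr = tostring(['attributes.custom']['%s']) | summarize count() by ['service.name'], attr\n" "$1" "$2"
}

# Print the query for an assumed dataset "traces" and attribute "tenant".
otel_breakdown_query traces tenant

# To run it, pass the generated query to the CLI:
# axiom query "$(otel_breakdown_query traces tenant)" --start-time -1h
```

Bracket notation is used for every dotted name so the query parses regardless of how the field is nested.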
3. Sample Data
Examine actual values:
axiom query "['<dataset>'] | limit 10" --start-time -1h -f json
Look for:
- Data structure and relationships
- Field value formats
- Data quality issues
4. Volume Analysis
Understand data volume patterns:
axiom query "['<dataset>'] | summarize count() by bin(_time, 1h) | sort by _time asc" --start-time -24h
Analyze:
- Event volume over time
- Data freshness
- Collection gaps
5. Categorical Field Analysis
For each key categorical field (status, level, service):
axiom query "['<dataset>'] | summarize count() by <field> | top 20 by count_" --start-time -1h
Identify:
- Value distributions
- Cardinality
- Key dimensions for filtering
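When several categorical fields are of interest, the per-field query can be generated in a loop. This is a minimal sketch: `categorical_queries` is a hypothetical helper, and the field names in the example call are placeholders to be replaced with names found during schema discovery.

```shell
# Hypothetical helper: print one top-20 distribution query per field.
#   $1    = dataset name
#   $2... = field names (bracket-quoted so dotted names also work)
categorical_queries() {
  dataset="$1"
  shift
  for field in "$@"; do
    printf "['%s'] | summarize count() by ['%s'] | top 20 by count_\n" "$dataset" "$field"
  done
}

# Print queries for three assumed fields of an assumed dataset "logs".
categorical_queries logs status level service.name

# Each printed line can then be run with:
# axiom query "<query>" --start-time -1h
```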
6. Numerical Field Statistics
For numeric fields (duration, bytes, count):
axiom query "['<dataset>'] | summarize count(), min(<field>), max(<field>), avg(<field>), percentiles(<field>, 50, 95, 99)" --start-time -1h
7. Error Pattern Detection
Search for error indicators:
axiom query "search in (['<dataset>']) 'error' or 'fail' or 'exception' | limit 20" --start-time -1h
Output Format
Provide a summary including:
## Dataset Summary: <name>

### Purpose
<What system generated this data, what it represents>

### Key Fields
| Field | Type | Description |
|-------|------|-------------|
| ... | ... | ... |

### Volume
- Events per hour: ~X
- Data freshness: last event at X

### Key Dimensions
- `status`: 200, 400, 500, ...
- `service.name`: api, web, worker, ...

### Recommended Queries
<Common queries for this dataset>

### Monitoring Opportunities
<What could be alerted on>
When NOT to Use
- Known datasets: If you already understand the schema, skip exploration and query directly
- Quick field check: Use getschema directly for single field lookups
- Production queries: Exploration uses expensive operations (search); extract patterns then optimize
- Repeated analysis: Once explored, document findings and reuse; don't re-explore
APL Reference
For query syntax, invoke the axiom-apl skill, which provides comprehensive documentation on operators, functions, and patterns.