New Relic
Overview
This skill provides a structured workflow for querying New Relic observability data. It ensures consistent integration with the New Relic MCP server for application performance monitoring (APM), error tracking, infrastructure metrics, distributed tracing, log analysis, and incident management.
Prerequisites
- •New Relic MCP server must be connected and accessible via API key
- •Confirm access to the relevant New Relic account and applications
- •Ensure
NEW_RELIC_API_KEYenvironment variable is set
Required Workflow
Follow these steps in order. Do not skip steps.
Step 0: Set up New Relic MCP (if not already configured)
If any MCP call fails because New Relic MCP is not connected, pause and set it up:
- •Add the New Relic MCP:
- •
codex mcp add newrelic --url https://mcp.newrelic.com/mcp/
- •
- •Enable remote MCP client:
- •Set
[features] rmcp_client = trueinconfig.tomlor runcodex --enable rmcp_client
- •Set
- •Configure API credentials:
- •Set environment variable:
NEW_RELIC_API_KEY - •The API key should have appropriate permissions (read access minimum, write for incident acknowledgement)
- •Optional: Set
NEW_RELIC_ACCOUNT_IDas default account (can be overridden per tool call)
- •Set environment variable:
After successful configuration, the user will need to restart codex. You should finish your answer and tell them so when they try again they can continue with Step 1.
Step 1
Clarify the user's goal and scope (e.g., performance investigation, error analysis, capacity planning, incident response). Confirm application names/IDs, time ranges, metric types, and alert priorities as needed.
Step 2
Select the appropriate workflow (see Practical Workflows below) and identify the New Relic MCP tools you will need. Confirm required identifiers (application name, entity GUID, incident ID) before calling tools.
Step 3
Execute New Relic MCP tool calls in logical batches:
- •Query first (metrics, applications, incidents) to gather context
- •Analyze patterns (errors, latency, throughput, resource usage)
- •For complex investigations, explain the analysis approach before executing multiple queries
- •Use NRQL for detailed analysis with proper time filters (
SINCE,UNTIL)
Step 4
Summarize findings, highlight anomalies or trends, propose next actions (further investigation, configuration changes, alert acknowledgement, incident escalation), and provide actionable recommendations.
Available Tools
Application Performance: list_apm_applications, get_app_performance, get_app_errors, get_application_slow_transactions_details, get_application_top_database_operations_details
NRQL Queries: run_nrql_query, query_logs
Incident Management: list_open_incidents, list_open_incidents_rest, acknowledge_incident, list_alert_policies
Entity Discovery: search_entities, get_entity_details, list_related_entities
Infrastructure: get_infrastructure_hosts, get_metric_data_for_host, list_metric_names_for_host
Synthetic Monitoring: list_synthetics_monitors, create_simple_browser_monitor
Dashboards (if supported): list_dashboards, get_dashboard, create_dashboard
Practical Workflows
Performance Analysis
Goal: Identify slow endpoints, database bottlenecks, and optimize application performance.
Steps:
- •List APM applications → Identify target application
- •Get app performance metrics → Check response time, throughput, Apdex
- •Query slow transactions → Find top 10 slowest endpoints with NRQL
- •Analyze database operations → Identify slow queries
- •Check infrastructure → Verify CPU/memory aren't bottlenecks
- •Provide optimization recommendations
Example NRQL:
SELECT average(duration), max(duration), count(*) FROM Transaction WHERE appName = 'MyApp' SINCE 24 hours ago FACET name ORDER BY average(duration) DESC LIMIT 10
Error Investigation
Goal: Diagnose application errors, find root causes, track error rates.
Steps:
- •Get app errors → Check error rate and count
- •Query detailed errors → Find error messages and stack traces
- •Group by error type → Identify most common errors
- •Correlate with transactions → Find affected endpoints
- •Check for recent deployments or changes
- •Provide root cause analysis
Example NRQL:
SELECT errorMessage, error.class, stackTrace, duration, transactionName FROM TransactionError WHERE appName = 'MyApp' AND error IS true SINCE 30 minutes ago LIMIT 50 ORDER BY timestamp DESC
Incident Triage
Goal: Monitor alerts, acknowledge incidents, reduce MTTR.
Steps:
- •List open incidents (filter by CRITICAL/WARNING)
- •Get incident details and affected entities
- •Acknowledge critical incidents with status update
- •Query related metrics to understand impact
- •Check for cascading failures or dependencies
- •Provide incident summary and next steps
Example:
1. list_open_incidents(priority="CRITICAL")
2. get_entity_details(guid="ENTITY_GUID")
3. acknowledge_incident(incident_id=12345, message="Investigating payment latency")
4. run_nrql_query("SELECT * FROM Transaction WHERE appName='PaymentService' AND error IS true SINCE 30 minutes ago")
Log Analysis
Goal: Search application logs to debug issues and understand system behavior.
Steps:
- •Query logs with keywords or patterns
- •Filter by log level (ERROR, WARN, INFO)
- •Group by application or environment
- •Correlate with trace IDs or transaction IDs
- •Identify patterns and anomalies
Example NRQL:
SELECT timestamp, level, message, logger, threadName FROM Log WHERE message LIKE '%timeout%' AND level = 'ERROR' SINCE 1 hour ago ORDER BY timestamp DESC LIMIT 100
Capacity Planning
Goal: Analyze resource usage trends and forecast scaling needs.
Steps:
- •Get infrastructure hosts → Check CPU, memory, disk
- •Query throughput trends over time
- •Analyze peak vs. average load
- •Check database connection pool usage
- •Identify resource constraints
- •Provide scaling recommendations
Example NRQL:
SELECT average(cpuPercent), max(cpuPercent), average(memoryUsedPercent) FROM SystemSample SINCE 7 days ago FACET hostname TIMESERIES 1 day
Infrastructure Health Check
Goal: Monitor host-level metrics and identify resource constraints.
Steps:
- •List infrastructure hosts → Get all hosts
- •Check CPU usage → Identify high CPU hosts
- •Check memory usage → Identify memory pressure
- •Check disk usage → Identify storage issues
- •Correlate with application performance
- •Provide health summary
Synthetic Monitoring
Goal: Proactively monitor availability and performance from external locations.
Steps:
- •List synthetic monitors → Check status
- •Identify failed monitors
- •Analyze failure patterns (geographic, time-based)
- •Check success rate trends
- •Create new monitors for critical endpoints
Tips for Maximum Productivity
- •Always use time filters: Add
SINCEclause to NRQL queries to limit data volume (e.g.,SINCE 1 hour ago,SINCE 24 hours ago) - •Start broad, drill down: Begin with high-level metrics (app performance, error rate), then query details
- •Use FACET for grouping: Group results by endpoint, error type, host (e.g.,
FACET name,FACET error.class) - •Leverage TIMESERIES: Visualize trends over time (e.g.,
TIMESERIES 5 minutes,TIMESERIES 1 day) - •Combine data sources: Correlate
TransactionwithTransactionError,SystemSamplewithTransaction - •Cache application IDs: Reuse application names/GUIDs across multiple queries
- •Batch related queries: Execute multiple NRQL queries in parallel when investigating complex issues
- •Use ORDER BY: Rank results (slowest endpoints, most frequent errors) with
ORDER BY average(duration) DESC
Troubleshooting
- •Authentication Errors: Verify
NEW_RELIC_API_KEYhas appropriate permissions; check account ID is correct; re-authenticate if needed - •Query Timeouts: Reduce time range (use shorter
SINCEintervals); limit result sets withLIMIT; avoid complex aggregations without filters - •Missing Data: Confirm application instrumentation is active; check data retention policies; verify entity is reporting
- •Rate Limits: Batch queries; use specific filters to reduce data volume; implement exponential backoff for retries
- •NRQL Syntax Errors: Validate metric names with
list_metric_names_for_host; check NRQL syntax at https://docs.newrelic.com/docs/query-your-data/nrql-new-relic-query-language/get-started/introduction-nrql-new-relics-query-language/ - •Incident Acknowledgement Failures: Verify incident ID is correct; check incident state (can't acknowledge already closed incidents); ensure API key has write permissions
Best Practices
Query Optimization
- •Use
LIMITto prevent excessive result sets (100-1000 rows) - •Apply
WHEREfilters beforeFACETfor better performance - •Use
TIMESERIESwith appropriate intervals (1 minute for real-time, 1 day for trends) - •Avoid
SELECT *- specify needed attributes
Security
- •Use read-only API keys for monitoring agents
- •Grant write permissions only for incident management
- •Rotate API keys regularly
- •Never commit API keys to version control
Investigation Methodology
- •Scope: Identify affected applications and time range
- •Metrics: Gather high-level performance data
- •Errors: Check for error spikes or patterns
- •Infrastructure: Verify resources aren't constrained
- •Logs: Search for error messages and stack traces
- •Correlate: Connect metrics, errors, and logs
- •Root Cause: Provide evidence-based diagnosis
- •Recommend: Actionable next steps