Performance Testing

Performance Testing Types

Load Testing

Purpose: Verify system performance under expected load

•Simulates expected user traffic and data volume
•Identifies performance bottlenecks under normal conditions
•Establishes performance baselines
•Validates SLA compliance

Key Metrics:

•Response time (average, median, p95, p99)
•Throughput (requests per second, transactions per second)
•Error rate
•Resource utilization (CPU, memory, disk, network)

Stress Testing

Purpose: Identify system breaking points

•Exceeds expected load to find limits
•Tests system recovery after failure
•Identifies failure modes and error handling
•Validates graceful degradation

Key Metrics:

•Maximum concurrent users before failure
•Maximum throughput before failure
•Time to recover after load reduction
•Error patterns and failure modes
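
One way to read a breaking point out of a step-load run is sketched below. It assumes each step records the user count, throughput, and error rate; the `StepResult` type, the 5% error budget, and the sample numbers are hypothetical and only illustrate the idea.

```python
from typing import NamedTuple, Optional, Sequence


class StepResult(NamedTuple):
    users: int          # concurrent users held during this step
    rps: float          # throughput observed during this step
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def find_breaking_point(steps: Sequence[StepResult],
                        max_error_rate: float = 0.05) -> Optional[StepResult]:
    """Return the last step within the error budget before the first failing step.

    Returns None if even the smallest step exceeds the budget.
    """
    last_ok = None
    for step in sorted(steps, key=lambda s: s.users):
        if step.error_rate > max_error_rate:
            break
        last_ok = step
    return last_ok


# Hypothetical results from a step-load run
steps = [
    StepResult(users=100, rps=480.0, error_rate=0.000),
    StepResult(users=200, rps=950.0, error_rate=0.002),
    StepResult(users=400, rps=1700.0, error_rate=0.010),
    StepResult(users=800, rps=1750.0, error_rate=0.180),
]
limit = find_breaking_point(steps)
```

In this made-up data, throughput flattens between 400 and 800 users while errors jump, so the 400-user step is reported as the last healthy level.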

Spike Testing

Purpose: Verify the system handles sudden traffic increases

•Simulates sudden traffic spikes (e.g., flash sales, viral content)
•Tests system elasticity and auto-scaling
•Validates queuing and throttling mechanisms
•Identifies race conditions under load

Key Metrics:

•Response time during spike
•Error rate during spike
•Time to stabilize after spike
•Queue depth and processing time
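
"Time to stabilize after spike" can be computed in several ways. The sketch below assumes one possible definition: the time from the end of the spike until a few consecutive samples fall back within a tolerance of the baseline response time. The function name, tolerance, and window size are illustrative choices, not a standard.

```python
from typing import Optional, Sequence, Tuple


def time_to_stabilize(samples: Sequence[Tuple[float, float]],
                      spike_end: float,
                      baseline_ms: float,
                      tolerance: float = 0.2,
                      window: int = 3) -> Optional[float]:
    """Seconds from spike_end until `window` consecutive samples are at or
    below baseline_ms * (1 + tolerance). Returns None if that never happens.

    samples: (timestamp_seconds, response_time_ms) pairs sorted by timestamp.
    """
    limit = baseline_ms * (1 + tolerance)
    streak = 0
    streak_start = None
    for ts, rt in samples:
        if ts < spike_end:
            continue  # ignore samples taken before the spike ended
        if rt <= limit:
            if streak == 0:
                streak_start = ts
            streak += 1
            if streak >= window:
                return streak_start - spike_end
        else:
            streak = 0  # a slow sample resets the run of healthy samples
    return None


# Hypothetical measurements: the spike ends at t=20s, baseline is 100 ms
samples = [(0, 100), (10, 900), (20, 1200), (30, 700), (40, 110),
           (50, 400), (60, 115), (70, 105), (80, 100)]
recovery_s = time_to_stabilize(samples, spike_end=20, baseline_ms=100)
```

Requiring several consecutive healthy samples avoids declaring recovery on a single lucky request, as at t=40s in the sample data.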

Soak Testing

Purpose: Verify stability over extended periods

•Runs sustained load for hours or days
•Identifies memory leaks and resource exhaustion
•Tests database connection pool stability
•Validates garbage collection efficiency

Key Metrics:

•Memory usage over time
•Response time trends
•Error rate over time
•Resource utilization trends
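
A simple way to turn "memory usage over time" into a number is to fit a straight line to periodic samples and look at the slope. The sketch below uses ordinary least squares; the hourly heap readings and the 5 MB/hour threshold are hypothetical.

```python
from typing import Sequence


def slope_per_hour(hours: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values against elapsed hours."""
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_x = sum(hours) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, values))
    return sxy / sxx


# Hypothetical heap readings (MB) taken hourly during a soak run
hours = [0, 1, 2, 3, 4, 5, 6, 7]
heap_mb = [512, 530, 521, 548, 560, 555, 579, 590]
growth_mb_per_hour = slope_per_hour(hours, heap_mb)
leak_suspected = growth_mb_per_hour > 5.0   # hypothetical threshold
```

A positive slope alone does not prove a leak (caches warm up, heaps grow before collection), so a trend like this is a prompt for memory profiling rather than a verdict.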

Volume Testing

Purpose: Verify performance with large data volumes

•Tests performance with realistic data sizes
•Identifies database query performance issues
•Tests file system and storage performance
•Validates data migration performance

Key Metrics:

•Query execution time with large datasets
•Index usage and effectiveness
•Storage I/O performance
•Data processing throughput
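
Index usage can be checked by asking the database for its query plan before and after adding an index. The sketch below uses SQLite from the Python standard library purely for illustration; the `users` table, the index name, and the row count are hypothetical, and other databases expose plans through their own `EXPLAIN` variants.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    ((f"user{i}@example.com", f"User {i}") for i in range(10_000)),
)

lookup = "SELECT * FROM users WHERE email = ?"
args = ("user4242@example.com",)

plan_before = query_plan(conn, lookup, args)   # expected to be a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)    # expected to search via the index
```

Comparing the two plan strings shows whether the optimizer actually picks up the new index, which matters more as the table grows.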

Performance Testing Tools

JMeter

Best for: Load and stress testing

•Open source, Java-based
•Supports multiple protocols (HTTP, JDBC, JMS, etc.)
•Distributed testing support
•Extensive plugin ecosystem
•GUI and CLI modes

xml

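<?xml version="1.0" encoding="UTF-8"?>
<!-- JMeter Test Plan Example (abridged: a plan saved from the GUI carries more attributes and properties) -->
<jmeterTestPlan>
  <hashTree>
    <TestPlan guiclass="TestPlanGui">
      <stringProp name="TestPlan.comments">Load Test</stringProp>
    </TestPlan>
    <hashTree>
      <!-- 100 threads (virtual users), started over 10 seconds -->
      <ThreadGroup guiclass="ThreadGroupGui">
        <stringProp name="ThreadGroup.num_threads">100</stringProp>
        <stringProp name="ThreadGroup.ramp_time">10</stringProp>
        <!-- Duration (seconds) applies only when the scheduler is enabled -->
        <boolProp name="ThreadGroup.scheduler">true</boolProp>
        <stringProp name="ThreadGroup.duration">60</stringProp>
      </ThreadGroup>
      <hashTree>
        <!-- GET https://example.com/api/users -->
        <HTTPSamplerProxy guiclass="HttpTestSampleGui">
          <stringProp name="HTTPSampler.domain">example.com</stringProp>
          <stringProp name="HTTPSampler.protocol">https</stringProp>
          <stringProp name="HTTPSampler.path">/api/users</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
      </hashTree>
    </hashTree>
  </hashTree>
</jmeterTestPlan>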

Gatling

Best for: High-performance load testing

•Scala-based, DSL for test scenarios
•High performance, low resource usage
•Real-time metrics and reporting
•Good for continuous integration
•Supports HTTP, WebSocket, JMS

scala

// Gatling Example
import scala.concurrent.duration._ // provides 10.seconds, 60.seconds

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class LoadTest extends Simulation {
  val httpProtocol = http.baseUrl("https://example.com")
  
  val scn = scenario("User Journey")
    .exec(http("Get Users").get("/api/users"))
    .pause(1)
    .exec(http("Get User").get("/api/users/1"))
  
  setUp(
    scn.inject(
      rampUsers(100).during(10.seconds),
      constantUsersPerSec(50).during(60.seconds)
    )
  ).protocols(httpProtocol)
}

k6

Best for: Developer-friendly performance testing

•JavaScript-based, easy to learn
•Modern CLI and cloud integration
•Good for CI/CD pipelines
•Supports HTTP/1.1, HTTP/2, WebSocket, gRPC
•Grafana integration for visualization

javascript

// k6 Example
import http from 'k6/http';
import { check, sleep } from 'k6';

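// Ramp up to 100 virtual users, hold for 60 seconds, then ramp down
export const options = {
  stages: [
    { duration: '10s', target: 100 },
    { duration: '60s', target: 100 },
    { duration: '10s', target: 0 },
  ],
};

export default function () {
  const res = http.get('https://example.com/api/users');
  // Checks record pass/fail rates but do not abort the test on failure
  check(res, {
    'status was 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1); // think time between iterations, in seconds
}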

Locust

Best for: Python-based load testing

•Python-based, easy to write tests
•Web UI for real-time monitoring
•Distributed testing support
•Good for complex user scenarios
•Event-based architecture

python

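# Locust Example
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    # Paths below are relative to the host given via --host or a class-level host attribute
    # Each simulated user waits 1 to 3 seconds between tasks
    wait_time = between(1, 3)

    @task
    def get_users(self):
        self.client.get("/api/users")

    # Weight 2: picked twice as often as get_users (default weight 1)
    @task(2)
    def get_user(self):
        self.client.get("/api/users/1")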

Key Performance Metrics

Response Time

•Average: Mean response time across all requests
•Median: Middle value, less affected by outliers
•p95: 95th percentile, 95% of requests complete within this time
•p99: 99th percentile, 99% of requests complete within this time
•Min/Max: Fastest and slowest response times
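
The difference between these statistics is easiest to see on a skewed sample. The sketch below computes them with the nearest-rank percentile method, which is one of several common definitions (tools may interpolate instead); the sample data is hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(response_times_ms: Sequence[float]) -> Dict[str, float]:
    """Compute the response time statistics listed above."""
    values = sorted(response_times_ms)
    return {
        "avg": statistics.fmean(values),
        "median": statistics.median(values),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        "min": values[0],
        "max": values[-1],
    }


# Hypothetical sample: 98 fast requests and two slow outliers
samples = [100.0] * 98 + [2000.0, 5000.0]
summary = summarize(samples)
```

Here the two outliers pull the average well above the median, and only the p99 and max values reveal how slow the worst requests were, which is why percentiles are usually reported alongside the mean.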

Throughput

•Requests Per Second (RPS): Number of requests handled per second
•Transactions Per Second (TPS): Number of business transactions per second
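•Concurrent Users: Number of simultaneous users generating load (an input that drives throughput, not a throughput measure itself)
•Hits Per Second: Number of HTTP requests per second, including requests for embedded resources such as images and scripts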
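
Concurrent users and requests per second are linked by Little's Law: in a closed-loop test, concurrent users = throughput × (response time + think time). The sketch below applies that relation under the assumption of a steady state with one request per user iteration; the function names are illustrative.

```python
def required_users(target_rps: float, avg_response_s: float, think_time_s: float) -> float:
    """Concurrent users needed to sustain target_rps (Little's Law)."""
    return target_rps * (avg_response_s + think_time_s)


def expected_rps(users: int, avg_response_s: float, think_time_s: float) -> float:
    """Throughput a fixed pool of users can generate at steady state."""
    cycle_s = avg_response_s + think_time_s  # time one user spends per iteration
    if cycle_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return users / cycle_s


# 100 users, 250 ms average response time, 1 s think time between requests
rps = expected_rps(users=100, avg_response_s=0.25, think_time_s=1.0)
users = required_users(target_rps=80, avg_response_s=0.25, think_time_s=1.0)
```

This is useful when sizing a test: 100 virtual users with a one-second pause generate roughly 80 requests per second at that response time, not 100.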

Error Rate

•HTTP Error Rate: Percentage of HTTP errors (4xx, 5xx)
•Application Error Rate: Percentage of application-level errors
•Timeout Rate: Percentage of requests that timed out
•Connection Error Rate: Percentage of connection failures

Resource Utilization

•CPU Usage: Processor utilization percentage
•Memory Usage: RAM consumption and availability
•Disk I/O: Read/write operations and latency
•Network I/O: Bandwidth utilization and latency
•Database Connections: Active and idle connection counts

Performance Profiling

Application Profiling

•CPU Profiling: Identify CPU-intensive methods
•Memory Profiling: Detect memory leaks and allocation patterns
•Thread Profiling: Identify thread contention and deadlocks
•Database Profiling: Analyze query performance and execution plans

Tools

•Java: JProfiler, VisualVM, YourKit
•Node.js: built-in profiler (node --prof), Clinic.js
•Python: cProfile, py-spy
•Go: pprof
•.NET: dotTrace, Visual Studio Profiler
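
As an illustration of CPU profiling with one of the tools listed, the sketch below wraps a call in Python's built-in `cProfile` and prints the most expensive entries by cumulative time. `handle_request` and `slow_sum_of_squares` are hypothetical stand-ins for application code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    """Deliberately wasteful: builds a full list before summing."""
    return sum([i * i for i in range(n)])


def handle_request() -> int:
    # Hypothetical stand-in for the application code under test
    return slow_sum_of_squares(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Collect the report in memory instead of printing straight to stdout
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)   # top 5 entries by cumulative time
report = buffer.getvalue()
```

Sorting by cumulative time surfaces the call paths that dominate a request, which is usually the first thing to look at before optimizing individual functions.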

Bottleneck Identification

•Database: Slow queries, missing indexes, N+1 queries
•Network: Latency, bandwidth limitations, connection pool exhaustion
•Application: Inefficient algorithms, excessive object creation
•External Services: Third-party API latency, rate limiting
•Caching: Cache misses, stale data, cache stampede

Performance Baselines and SLAs

Establishing Baselines

•Run tests in a production-like environment
•Collect metrics over multiple runs
•Account for normal variability
•Document test conditions and data
•Store baselines in version control
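
A stored baseline is most useful when each new run is compared against it automatically. The sketch below flags metrics that are worse than the baseline by more than a tolerance that stands in for normal variability; the metric names, the JSON layout, and the 10% tolerance are hypothetical.

```python
import json
from typing import Dict, List


def find_regressions(baseline: Dict[str, float],
                     current: Dict[str, float],
                     tolerance: float = 0.10) -> List[str]:
    """Names of metrics more than `tolerance` worse than the baseline.

    Assumes lower is better for every metric (latencies, error rate).
    Metrics missing from the current run are skipped.
    """
    regressions = []
    for name, base in baseline.items():
        value = current.get(name)
        if value is not None and value > base * (1 + tolerance):
            regressions.append(name)
    return regressions


# Hypothetical baseline as it might be stored in version control
baseline = json.loads(
    '{"p50_ms": 180, "p95_ms": 450, "p99_ms": 900, "error_rate": 0.001}'
)
current = {"p50_ms": 185, "p95_ms": 520, "p99_ms": 910, "error_rate": 0.001}
regressed = find_regressions(baseline, current)
```

In a CI pipeline, a non-empty result could fail the build; the tolerance should be derived from the spread seen across the multiple baseline runs.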

SLA Definitions

•Response Time SLAs: Maximum acceptable response times
•Availability SLAs: Minimum uptime requirements (e.g., 99.9%)
•Throughput SLAs: Minimum requests per second
•Error Rate SLAs: Maximum acceptable error rate

Example SLAs

code

API Response Times:
- p50 < 200ms
- p95 < 500ms
- p99 < 1000ms

Availability: 99.9% (at most 8.76 hours of downtime per year)

Error Rate: < 0.1%

Throughput: >= 1000 RPS
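
The example SLAs above can be turned into an automated check. The sketch below encodes those thresholds and also reproduces the downtime arithmetic for the availability target; the dictionary keys and function names are hypothetical, and throughput is treated as a minimum.

```python
from typing import Dict, List

# Thresholds taken from the example SLAs above
SLA = {
    "p50_ms": 200,
    "p95_ms": 500,
    "p99_ms": 1000,
    "error_rate": 0.001,   # 0.1%
    "min_rps": 1000,
}


def sla_violations(measured: Dict[str, float]) -> List[str]:
    """Describe every SLA threshold the measured results fail to meet."""
    violations = []
    for key in ("p50_ms", "p95_ms", "p99_ms", "error_rate"):
        if measured[key] >= SLA[key]:            # these SLAs are strict upper bounds
            violations.append(f"{key}: {measured[key]} (must be < {SLA[key]})")
    if measured["rps"] < SLA["min_rps"]:
        violations.append(f"rps: {measured['rps']} (must be >= {SLA['min_rps']})")
    return violations


def downtime_budget_hours(availability_pct: float, period_hours: float = 365 * 24) -> float:
    """Allowed downtime for an availability target over a period (default: one year)."""
    return period_hours * (1 - availability_pct / 100)


# Hypothetical test results
measured = {"p50_ms": 150, "p95_ms": 480, "p99_ms": 1200, "error_rate": 0.0005, "rps": 1100}
violations = sla_violations(measured)
yearly_budget = downtime_budget_hours(99.9)   # roughly 8.76 hours
```

With these hypothetical results only the p99 target is missed, which is a typical pattern when a small share of requests hits a slow path.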

Cloud-Based Performance Testing

Cloud Testing Benefits

•Scalable infrastructure on demand
•Geographic distribution
•Realistic load simulation
•Pay-as-you-go pricing
•Integration with cloud services

Cloud Testing Platforms

•AWS: EC2, Lambda, Fargate for distributed testing
•Google Cloud: Compute Engine, Cloud Functions
•Azure: Virtual Machines, Azure Functions
•Managed Services: BlazeMeter, LoadRunner Cloud, k6 Cloud

Cloud Testing Best Practices

•Use multiple regions for geographic testing
•Leverage auto-scaling for flexible load
•Monitor cloud costs during testing
•Clean up resources after testing
•Use cloud-native monitoring and logging

Performance Test Planning

Test Scenarios

•Define realistic user journeys
•Identify critical paths
•Include happy path and edge cases
•Account for different user types
•Consider peak and off-peak patterns

Load Models

•Constant Load: Steady user count over time
•Ramp-up Load: Gradually increase users
•Spike Load: Sudden increase in users
•Step Load: Incremental increases with plateaus
•Random Load: Variable user patterns
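
Each load model can be described as a function from elapsed time to a target user count. The sketch below is tool-independent and only illustrates the shapes; the function names and parameters are hypothetical, and most load testing tools provide their own way to express such profiles.

```python
import random
from typing import Callable

LoadModel = Callable[[float], int]   # elapsed seconds -> target user count


def constant(users: int) -> LoadModel:
    """Steady user count for the whole test."""
    return lambda t: users


def ramp_up(target: int, ramp_s: float) -> LoadModel:
    """Linear increase from 0 to target over ramp_s seconds, then hold."""
    return lambda t: min(target, int(target * t / ramp_s))


def spike(base: int, peak: int, start_s: float, length_s: float) -> LoadModel:
    """Base load with a sudden jump to peak for length_s seconds."""
    return lambda t: peak if start_s <= t < start_s + length_s else base


def step(step_users: int, step_s: float, max_users: int) -> LoadModel:
    """Add step_users every step_s seconds, holding each plateau, up to max_users."""
    return lambda t: min(max_users, step_users * (int(t // step_s) + 1))


def random_load(low: int, high: int, seed: int = 0) -> LoadModel:
    """Random user count in [low, high]; seeded so runs are repeatable."""
    rng = random.Random(seed)
    return lambda t: rng.randint(low, high)


# Sample each model once a minute for five minutes
models = {
    "constant": constant(100),
    "ramp": ramp_up(100, ramp_s=120),
    "spike": spike(base=50, peak=500, start_s=120, length_s=60),
    "step": step(step_users=25, step_s=60, max_users=100),
}
profile = {name: [model(t) for t in range(0, 300, 60)] for name, model in models.items()}
```

Seeding the random model keeps runs comparable, which matters when results are checked against a baseline.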

Test Data

•Use realistic data volumes
•Include edge cases and boundary values
•Account for data distribution
•Refresh data between test runs
•Consider data privacy and security

Environment Setup

•Mirror production configuration
•Use production-like data
•Monitor system resources
•Isolate test environment
•Document environment differences