Run Benchmark Tool

Purpose

Compare the performance of two ways of sending documents to Gemini: uploading them through the File API, or sending them inline with each request.

What It Tests

•File API: Upload documents once, reuse cached URIs for multiple queries
•Inline: Send raw document bytes with each request

The benchmark shuffles document order each round to prevent Gemini's native caching from affecting results.

Running the Benchmark

Basic Usage

bash

export GEMINI_API_KEY="your-key"
make build
./bin/benchmark -docs test_loan_files/loan_file_1_LN-2024-001847

With Options

bash

./bin/benchmark \
  -docs /path/to/documents \
  -rounds 20 \
  -max-docs 10 \
  -json

CLI Flags

code

-docs string      Directory containing documents (required)
-rounds int       Number of test rounds per method (default 10)
-max-docs int     Maximum income documents to use (default 6)
-json             Output results as JSON
-income           Only use income documents (default true)
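
The flag list above could be declared with Go's standard `flag` package roughly as follows. This is a minimal sketch, not the contents of cmd/benchmark/main.go: the `config` struct, the `parseFlags` helper, and the assumption that the tool uses the `flag` package at all are illustrative.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// config mirrors the documented flags. The struct and its field names are
// hypothetical; only the flag names and defaults come from the list above.
type config struct {
	docs    string
	rounds  int
	maxDocs int
	json    bool
	income  bool
}

// parseFlags parses args (without the program name) into a config.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("benchmark", flag.ContinueOnError)
	fs.StringVar(&c.docs, "docs", "", "Directory containing documents (required)")
	fs.IntVar(&c.rounds, "rounds", 10, "Number of test rounds per method")
	fs.IntVar(&c.maxDocs, "max-docs", 6, "Maximum income documents to use")
	fs.BoolVar(&c.json, "json", false, "Output results as JSON")
	fs.BoolVar(&c.income, "income", true, "Only use income documents")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	// -docs has no usable default, so reject an empty value explicitly.
	if c.docs == "" {
		return c, errors.New("-docs is required")
	}
	return c, nil
}

func main() {
	// Fixed arguments so the sketch runs without a command line.
	c, err := parseFlags([]string{"-docs", "/path/to/documents", "-rounds", "20", "-income=false"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```

If the tool does use the `flag` package, a boolean flag that defaults to true can only be switched off with the `-income=false` form, because a bare `-income` sets it to true.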

Using Makefile

bash

make benchmark DOCS=test_loan_files/loan_file_1_LN-2024-001847 ROUNDS=10

Understanding Results

Output Structure

code

PHASE 1: INLINE DOCUMENTS
  Round 1: [shuffled order] -> time, tokens
  ...

PHASE 2: FILE API
  Upload: X seconds (one-time)
  Round 1: [shuffled order] -> time, tokens
  ...

FINAL COMPARISON
  - Total time comparison
  - Average per-query time
  - Token usage
  - Winner & speedup factor
  - Break-even analysis

Key Metrics

Metric	Meaning
Upload time	One-time cost for File API
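Total time	Sum of all operations, including the upload for File API
Avg per round	Mean time per query
Min/Max round	Fastest and slowest round, showing the spread in query time
Speedup	How much faster the winner is
Break-even	Number of queries needed before File API savings cover the upload time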

Interpreting Results

File API wins when:

•Many queries run against the same documents
•Break-even point is low (< 10 queries)
•Per-query savings accumulate over many queries

Inline wins when:

•The number of queries is below the break-even point
•Each request uses different documents
•Simplicity matters more than speed

Example Output

code

TIMING COMPARISON
┌─────────────────┬──────────────────┬──────────────────┐
│ Metric          │ File API         │ Inline Docs      │
├─────────────────┼──────────────────┼──────────────────┤
│ Upload (1x)     │           1.976s │              N/A │
│ Total time      │        1m13.733s │        1m14.454s │
│ Avg per round   │           7.176s │           7.445s │
└─────────────────┴──────────────────┴──────────────────┘

BREAK-EVEN ANALYSIS
   Upload overhead:      1.976s
   Savings per query:    270ms
   Break-even at:        7.3 queries

Why Shuffled Order?

Documents are shuffled each round because:

•Gemini may cache based on content/order
•Shuffling ensures each query is "fresh"
•Gives accurate per-query timing
•More realistic for production workloads

Test Questions

The benchmark uses varied income-related questions:

•Annual/monthly income extraction
•Employer information
•YTD income calculation
•Deductions and withholdings
•Income source classification
•Tax year coverage
•And more...

Recommendations

Scenario	Recommendation
Underwriter iterating on loan	File API
One-off document analysis	Inline
Batch processing same docs	File API
Real-time different docs	Inline

Related Files

•cmd/benchmark/main.go - Benchmark implementation
•internal/gemini/client.go - Both API approaches
•internal/gemini/cache.go - File caching logic