PaperBanana - Academic Illustration Generator
Generate publication-quality academic diagrams using a multi-agent AI pipeline (Retriever, Planner, Stylist, Visualizer, Critic).
Prerequisites
- PaperBanana must be installed: pip install -e ".[google]" from the paperbanana repo
- A Google Gemini API key must be configured in the project's .env file (get a free key at https://makersuite.google.com/app/apikey):
  GOOGLE_API_KEY=your-api-key-here
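A quick pre-flight check for the API key might look like the sketch below. It assumes the .env file uses plain KEY=VALUE lines; the helper names read_env_value and has_gemini_key are illustrative and not part of PaperBanana.

```python
from pathlib import Path


def read_env_value(env_path, key):
    """Return the value assigned to `key` in a simple KEY=VALUE file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


def has_gemini_key(project_dir):
    """True when .env defines GOOGLE_API_KEY with something other than the placeholder."""
    value = read_env_value(Path(project_dir) / ".env", "GOOGLE_API_KEY")
    return value is not None and value != "your-api-key-here"
```

Running has_gemini_key on the project directory before any generate or plot call lets the caller report a missing key immediately instead of waiting for the API request to fail.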
Environment Detection
Before running commands, detect the PaperBanana installation:
- Try running python -m paperbanana.cli --help to check if paperbanana is available in the current Python environment
- If not found, search common locations:
  - Check if a paperbanana directory exists in the current workspace
  - Look for virtual environments (.venv, venv, conda envs) that may have it installed
- Set the working directory to where PaperBanana is installed (contains the paperbanana/ package and .env)
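The detection steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the virtual-environment layouts checked are the standard bin/python and Scripts/python.exe locations, and conda environments are not covered.

```python
import subprocess
import sys
from pathlib import Path


def cli_available(python=sys.executable, module="paperbanana.cli"):
    """True if `python -m <module> --help` exits cleanly."""
    try:
        result = subprocess.run(
            [python, "-m", module, "--help"],
            capture_output=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def find_project_dir(workspace):
    """Return the directory that holds the paperbanana/ package, or None."""
    root = Path(workspace)
    # Check a nested checkout first (workspace/paperbanana/paperbanana),
    # then the case where the workspace itself is the repo.
    for candidate in (root / "paperbanana", root):
        if (candidate / "paperbanana").is_dir():
            return candidate
    return None


def venv_pythons(project_dir):
    """List interpreters found in .venv or venv (POSIX and Windows layouts)."""
    found = []
    for env_name in (".venv", "venv"):
        for relative in ("bin/python", "Scripts/python.exe"):
            candidate = Path(project_dir) / env_name / relative
            if candidate.is_file():
                found.append(candidate)
    return found
```

A caller could try cli_available() first, then fall back to find_project_dir and test each interpreter from venv_pythons with cli_available(python=...).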
Command pattern:
cd <paperbanana_project_dir> && python -m paperbanana.cli <command> <args>
On Windows with conda, you may need:
powershell -Command "Set-Location '<paperbanana_dir>'; & '<python_path>' -m paperbanana.cli <command> <args> 2>&1"
Set the timeout to 300000 ms (5 minutes) for generate and plot commands, since they call external APIs.
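When the command is launched from Python rather than a shell, the same pattern (project directory as working directory, five-minute limit) could look like this sketch. build_cli_command and run_cli are illustrative names; only the python -m paperbanana.cli <command> <args> shape comes from the text above.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 300  # 5 minutes, i.e. 300000 ms


def build_cli_command(command, args, python=sys.executable):
    """Build the argument list for `python -m paperbanana.cli <command> <args>`."""
    return [python, "-m", "paperbanana.cli", command, *args]


def run_cli(project_dir, command, args, python=sys.executable, timeout=TIMEOUT_SECONDS):
    """Run the CLI from project_dir and return (returncode, combined output).

    Raises subprocess.TimeoutExpired if the command exceeds `timeout` seconds.
    """
    result = subprocess.run(
        build_cli_command(command, args, python),
        cwd=project_dir,            # replaces the `cd <paperbanana_project_dir> &&` prefix
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout, like `2>&1`
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout
```

Passing the arguments as a list avoids shell quoting differences between POSIX shells and PowerShell.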
Usage Modes
Mode 1: generate - Methodology Diagrams
Generate a methodology/architecture diagram from text.
When user provides a file path:
python -m paperbanana.cli generate --input '<file_path>' --caption '<caption>' --iterations 2
When user provides inline text (no file):
- Write the user's methodology text to a temporary file (e.g., temp_input.txt) in the PaperBanana project directory
- Run the generate command with that temp file as --input
- Clean up the temp file after completion
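The write, run, clean-up sequence can be sketched with a try/finally so the temp file is removed even when generation fails. The helper with_temp_input and its run callback are hypothetical; the callback stands in for whatever actually invokes the generate command.

```python
import os
import tempfile


def with_temp_input(project_dir, text, run):
    """Write text to a temp file in project_dir, call run(path), then delete the file."""
    fd, path = tempfile.mkstemp(
        prefix="temp_input_", suffix=".txt", dir=project_dir, text=True
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # `run` receives the temp file path to pass as --input.
        return run(path)
    finally:
        # Clean up even if the generate command fails or times out.
        if os.path.exists(path):
            os.remove(path)
```

Using mkstemp instead of a fixed temp_input.txt name is a design choice in this sketch: it avoids clobbering a file left over from a concurrent or interrupted run.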
Parameter guidelines:
- --caption: Extract or craft a concise figure caption from the user's description. Should describe what the diagram communicates (e.g., "Overview of the proposed encoder-decoder architecture").
- --iterations: Default to 2 for balanced quality/speed. Use 1 for quick drafts, 3 for highest quality.
- --vlm-model: Default is gemini-3-flash-preview. For complex diagrams, suggest gemini-3-pro-preview.
- --output: Optional output path. If not specified, saves to outputs/<run_id>/final_output.png.
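As an illustration of these guidelines, the sketch below assembles the generate arguments with the suggested default of 2 iterations and leaves the optional flags out unless they are set. The function name build_generate_args is hypothetical; the flags are the ones listed above.

```python
def build_generate_args(input_path, caption, iterations=2, vlm_model=None, output=None):
    """Return the argument list for the generate subcommand."""
    args = [
        "--input", str(input_path),
        "--caption", caption,
        "--iterations", str(iterations),
    ]
    # Omit optional flags so the CLI falls back to its own defaults.
    if vlm_model:
        args += ["--vlm-model", vlm_model]
    if output:
        args += ["--output", str(output)]
    return args
```

For a complex diagram, a call such as build_generate_args("method.txt", "Overview of the proposed encoder-decoder architecture", iterations=3, vlm_model="gemini-3-pro-preview") would request the higher-quality settings described above.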
Mode 2: plot - Statistical Plots
Generate statistical plots from CSV or JSON data.
python -m paperbanana.cli plot --data '<data_file>' --intent '<intent>' --iterations 2
- --data: Path to CSV or JSON file
- --intent: What the plot should communicate (e.g., "Bar chart comparing model accuracy across datasets")
Mode 3: evaluate - Diagram Evaluation
Compare a generated diagram against a human reference.
python -m paperbanana.cli evaluate --generated '<gen_path>' --reference '<ref_path>' --context '<text_path>' --caption '<caption>'
Argument Parsing
Parse $ARGUMENTS as follows:
| Input Pattern | Action |
|---|---|
| generate <file.txt> <caption> | Use file as --input, text as --caption |
| generate <description text> | Write text to temp file, auto-generate caption |
| plot <data.csv> <intent> | Use file as --data, text as --intent |
| evaluate <gen.png> <ref.png> | Use as --generated and --reference |
| Just a description (no subcommand) | Default to generate mode |
If the user provides only a description without specifying a mode, default to generate.
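The routing rules in the table could be implemented roughly as below. This is a sketch, not PaperBanana's own parser: the file test (an existing path, or a .txt/.md/.tex suffix) is an assumption, paths containing spaces are not handled, and the returned dictionary keys are illustrative.

```python
from pathlib import Path

MODES = ("generate", "plot", "evaluate")
TEXT_SUFFIXES = {".txt", ".md", ".tex"}  # assumed set of input file types


def looks_like_file(token):
    """Heuristic: an existing file, or a token with a known text suffix."""
    return Path(token).is_file() or Path(token).suffix.lower() in TEXT_SUFFIXES


def parse_arguments(raw):
    """Map the raw argument string to a mode and its parameters."""
    raw = raw.strip()
    head, _, rest = raw.partition(" ")
    if head in MODES:
        mode, body = head, rest.strip()
    else:
        # No subcommand: treat the whole input as a description.
        mode, body = "generate", raw

    first, _, remainder = body.partition(" ")
    remainder = remainder.strip()
    if mode == "generate":
        if first and looks_like_file(first):
            return {"mode": mode, "input": first, "caption": remainder or None}
        # Inline text: caption is None, meaning it still has to be generated.
        return {"mode": mode, "text": body, "caption": None}
    if mode == "plot":
        return {"mode": mode, "data": first, "intent": remainder}
    second = remainder.split(" ", 1)[0]
    return {"mode": mode, "generated": first, "reference": second}
```

A None caption signals that the caller should craft one from the description before invoking generate.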
After Generation
- Parse the output to find the image path (look for "Output saved to:" in stdout)
- Use the Read tool to display the generated image to the user
- Report the Run ID, iteration count, and any Critic feedback
- If the Critic suggested revisions, inform the user they can re-run with more iterations
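Extracting the image path from captured stdout can be sketched as follows. Only the "Output saved to:" marker and the default outputs/<run_id>/final_output.png layout come from this document; the function names are hypothetical, and the rest of the CLI's log format is not assumed.

```python
def find_output_path(stdout, marker="Output saved to:"):
    """Return the path after the last occurrence of the marker, or None."""
    path = None
    for line in stdout.splitlines():
        _, found, tail = line.partition(marker)
        if found and tail.strip():
            path = tail.strip()
    return path


def run_id_from_path(path):
    """Recover <run_id> when the path follows outputs/<run_id>/final_output.png."""
    parts = path.replace("\\", "/").split("/")
    if len(parts) >= 3 and parts[-1] == "final_output.png" and parts[-3] == "outputs":
        return parts[-2]
    # A custom --output path carries no run ID.
    return None
```

If find_output_path returns None, the run most likely failed, and the captured output should be shown to the user instead of an image.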
Examples
User: /paperbanana generate method.txt Overview of sparse attention transformer
Action: Run generate with the provided file and caption.
User: /paperbanana Our model consists of an encoder that processes input features through 3 convolutional layers, followed by a decoder with skip connections
Action: Write text to temp file, generate caption "Overview of the proposed encoder-decoder architecture with skip connections", run generate.
User: /paperbanana plot results.csv Bar chart comparing accuracy of ResNet vs VGG vs EfficientNet
Action: Run plot with the CSV file and the intent description.