Debugging Issues
Common issues and solutions for inference.sh apps.
Import Errors
"ModuleNotFoundError" in Production
- •Add __init__.py files to all packages
- •Add current directory to Python path:
python
import sys, os
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
- •For local packages, use editable installs in requirements.txt:
txt
-e ./local_package_directory
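When it is unclear why an import fails in production, it can help to log what Python actually sees before changing anything. The following is a minimal diagnostic sketch using only the standard library; `describe_import` is a hypothetical helper name, not part of inference.sh.

```python
import importlib.util
import sys


def describe_import(module_name: str) -> str:
    """Return a one-line report on whether `module_name` is importable."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # find_spec raises if a parent package is missing or malformed.
        spec = None
    if spec is None:
        return f"{module_name}: NOT FOUND (sys.path has {len(sys.path)} entries)"
    return f"{module_name}: found at {spec.origin}"


print(describe_import("json"))
print(describe_import("definitely_missing_package_xyz"))
```

Calling this at startup for each local package shows whether the package is missing entirely or is being resolved from an unexpected location.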
Memory Issues
CUDA Out of Memory
- •Reduce batch size
- •Use torch.float16 or bfloat16
- •Enable gradient checkpointing: model.gradient_checkpointing_enable()
- •Call torch.cuda.empty_cache() after requests
- •Increase vram in inf.yml
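One way to apply the first suggestion automatically is to retry with a smaller batch when an out-of-memory error occurs. This is a rough sketch, not an inference.sh feature: `run_with_backoff` and `run_batch` are hypothetical names, and it relies on PyTorch reporting CUDA OOM as a `RuntimeError` whose message contains "out of memory".

```python
from typing import Callable, List, Sequence


def run_with_backoff(
    items: Sequence,
    run_batch: Callable[[Sequence], List],
    batch_size: int = 8,
) -> List:
    """Process `items` in batches, halving the batch size on out-of-memory errors."""
    results: List = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(run_batch(batch))
            start += len(batch)
        except RuntimeError as err:
            # Only retry on OOM, and give up once a single item does not fit.
            if "out of memory" not in str(err).lower() or batch_size == 1:
                raise
            batch_size = max(1, batch_size // 2)
    return results
```

In a real app, `run_batch` would wrap the model's forward pass, and it may be worth calling `torch.cuda.empty_cache()` before each retry.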
Memory Leaks
Clean up after each request:
python
import gc, torch
async def run(self, input_data):
    result = self.process(input_data)
    # Collect garbage first so freed tensors can be released from the CUDA cache
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return result
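The cleanup above only runs when processing succeeds. A possible variation, sketched here under the assumption that failed requests should also release memory, wraps the work in a context manager so cleanup happens in a `finally` block. `gpu_cleanup` and `handle_request` are hypothetical names.

```python
import contextlib
import gc

try:
    import torch  # optional here so the sketch also runs without PyTorch
except ImportError:
    torch = None


@contextlib.contextmanager
def gpu_cleanup():
    """Free Python garbage and cached CUDA memory when the block exits."""
    try:
        yield
    finally:
        gc.collect()
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()


def handle_request(process, input_data):
    """Hypothetical handler: cleanup runs even if `process` raises."""
    with gpu_cleanup():
        return process(input_data)
```

Inside an app, the body of `run` could use `with gpu_cleanup():` around the call to `self.process(input_data)` in the same way.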
Device Errors
"Expected all tensors to be on the same device"
Ensure all tensors are on the same device:
python
input_tensor = input_tensor.to(self.device)
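Inputs often arrive as nested dicts or lists rather than a single tensor, and one forgotten entry is enough to trigger this error. The helper below is a small sketch of moving everything in one pass; `move_to_device` is a hypothetical name, and it treats any object with a callable `.to()` method as movable.

```python
from typing import Any


def move_to_device(obj: Any, device: Any) -> Any:
    """Recursively move tensors inside dicts, lists, and tuples to `device`."""
    if callable(getattr(obj, "to", None)):
        return obj.to(device)  # torch.Tensor exposes .to(device)
    if isinstance(obj, dict):
        return {key: move_to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, list):
        return [move_to_device(value, device) for value in obj]
    if isinstance(obj, tuple):
        return tuple(move_to_device(value, device) for value in obj)
    return obj  # ints, strings, None, etc. are returned unchanged
```

A call such as `batch = move_to_device(batch, self.device)` right before the forward pass keeps the device handling in one place.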
"CUDA not available"
- •Check inf.yml GPU requirements:
yaml
resources:
  gpu:
    count: 1
    vram: 24 # 24GB
- •Use device detection:
python
from accelerate import Accelerator
device = Accelerator().device
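If accelerate is not installed, a similar fallback can be written against PyTorch directly. This is an alternative sketch rather than the approach shown above; `pick_device` is a hypothetical name, and it falls back to "cpu" when PyTorch itself is missing.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(f"Using device: {pick_device()}")
```

Logging the chosen device at startup makes it obvious when an app expected a GPU but silently fell back to CPU.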
Model Loading Errors
"Token required for gated model"
Add HF_TOKEN to secrets:
yaml
secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
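It can also help to fail early with a clear message when the token is missing. The sketch below assumes the secret is exposed to the app as an environment variable of the same name, which this page does not confirm; `require_secret` is a hypothetical helper.

```python
import os


def require_secret(name: str) -> str:
    """Return the value of environment variable `name` or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to the app's secrets before loading gated models."
        )
    return value
```

Under that assumption, `token = require_secret("HF_TOKEN")` in setup would surface the problem before any download starts, and the value could be passed on as `snapshot_download(..., token=token)`.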
"File not found" After Download
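Don't assume a file exists at a fixed path after download; check before loading:
python
import os
from huggingface_hub import snapshot_download

model_path = snapshot_download(repo_id="org/model")
config_path = os.path.join(model_path, "config.yaml")
if os.path.exists(config_path):
    ...  # Load config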
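When the expected file is not where the code looks for it, listing what was actually downloaded usually reveals the right path. A minimal sketch, with `find_files` as a hypothetical helper name:

```python
from pathlib import Path
from typing import List


def find_files(root: str, pattern: str) -> List[Path]:
    """Return all files under `root` whose name matches the glob `pattern`."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())
```

For example, `find_files(model_path, "config.*")` would show whether the repository ships `config.json` in a subfolder instead of `config.yaml` at the top level.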
File Path Issues
Temporary Files Deleted Too Early
Pass delete=False so the file survives after the with block closes; remove it yourself when done:
python
import tempfile

with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as tmp:
    output_path = tmp.name
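With delete=False nothing removes the file automatically, so something has to clean it up eventually. The following sketch shows the full lifecycle in one place; `write_temp_output` is a hypothetical helper, and the immediate removal is only for illustration.

```python
import os
import tempfile


def write_temp_output(data: bytes, suffix: str = ".jpg") -> str:
    """Write `data` to a temp file that survives the `with` block; return its path."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name


output_path = write_temp_output(b"example bytes")
try:
    print(os.path.getsize(output_path))  # file still exists after the with block
finally:
    os.remove(output_path)  # with delete=False, cleanup is the caller's job
```

In an app that returns the path as its output, the removal would happen later, once the file has been consumed.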
Path Separators
Use os.path.join:
python
import os
# Good: os.path.join picks the right separator for the current OS
path = os.path.join("models", "config", "settings.json")
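The standard library's `pathlib` offers the same portability with an object-based API. This is an alternative to `os.path.join`, not something this page prescribes:

```python
from pathlib import Path

# The / operator joins segments using the platform's separator.
settings_path = Path("models") / "config" / "settings.json"
print(settings_path.name)    # settings.json
print(settings_path.suffix)  # .json
```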
Dependency Issues
Version Conflicts
Pin compatible versions:
txt
torch==2.6.0
numpy>=1.23.5,<2
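To confirm that the pinned versions are what actually got installed, the versions can be logged at startup. A small sketch using the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["torch", "numpy"]))
```

Comparing this output between a working local environment and the deployed app is a quick way to spot a dependency that resolved differently.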
Debug Logging
python
import logging
logging.basicConfig(level=logging.DEBUG)
async def setup(self, config):
    logging.debug(f"Config: {config}")
    logging.info("Starting model load...")
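Timing information is often the most useful thing to log when a setup step hangs or runs slowly. The context manager below is one possible sketch; `log_duration` and the logger name "app" are arbitrary choices for illustration.

```python
import contextlib
import logging
import time

logger = logging.getLogger("app")  # "app" is an arbitrary logger name


@contextlib.contextmanager
def log_duration(step: str):
    """Log how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s took %.2fs", step, time.perf_counter() - start)


logging.basicConfig(level=logging.INFO)
with log_duration("model load"):
    time.sleep(0.1)  # stand-in for the real loading code
```

Wrapping each stage of setup this way shows which stage dominates startup time without attaching a profiler.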