HuggingFace Deployment & AI Provider Management
Deploy HuggingFace models using serverless Inference API or dedicated endpoints with cost tracking.
Overview
AINative Studio supports multiple AI providers via factory pattern with cost tracking:
- •Providers: OpenAI, Anthropic, MiniMax, HuggingFace, TogetherAI
- •All use
BillingService.calculate_cost_with_markup()(100% markup)
HuggingFace Deployment Options
Option A: Inference API (Serverless)
Use Cases:
- •Audio/TTS models
- •Variable usage
- •Cost-sensitive deployments
- •Prototyping
URL Pattern:
code
https://api-inference.huggingface.co/models/{model_id}
Implementation:
python
provider = NousCoderProvider(
api_key=os.getenv("HUGGINGFACE_API_KEY"),
base_url="https://api-inference.huggingface.co"
)
Option B: Dedicated Endpoints
Use Cases:
- •Production models
- •Low-latency requirements
- •Custom inference configs
- •GPU-intensive models
Cost Structure (per hour):
- •T4: $0.60/hr
- •L4: $0.80/hr
- •A10G: $1.00/hr
- •A100: $4.00/hr
Implementation:
python
endpoint_service = HuggingFaceEndpointService(db)
endpoint = await endpoint_service.create_endpoint(
endpoint_name="my-audio-model",
model_repo="facebook/mms-tts-eng",
hardware_type="nvidia-t4",
auto_scale=True
)
Cost Tracking Flow
- •Provider Calculates Base Cost
python
duration_seconds = 30
cost_per_second = Decimal("0.0001")
base_cost_usd = duration_seconds * cost_per_second
- •Convert to Credits
python
base_cost_credits = int(base_cost_usd / Decimal("0.04"))
- •Apply Markup
python
billing_service = BillingService(db)
final_cost = await billing_service.calculate_cost_with_markup(
user_id=user.id,
base_cost=base_cost_credits,
developer_markup=Decimal("100.0")
)
- •Deduct from User Account
python
transaction = await billing_service.create_transaction(
user_id=user.id,
amount=-final_cost,
transaction_type="usage"
)
Key Implementation Steps
- •Create Provider Class
python
class YourProvider(BaseAIProvider):
async def generate_audio(self, text: str, **kwargs):
# Implementation
- •Register in Factory
python
AIProviderFactory._providers["your_provider"] = YourProvider
Cost Optimization Strategies
- •Use Inference API for variable audio models
- •Monitor usage patterns
- •Implement caching
- •Break-even analysis for dedicated endpoints
Environment Variables
bash
HUGGINGFACE_API_KEY="hf_xxxxxxxxxxxxxxxxxxxxx" HUGGINGFACE_BASE_URL="https://api-inference.huggingface.co"
Monitoring
python
# Check provider registration providers = AIProviderFactory.get_available_providers() # Monitor costs billing = BillingService(db) total_spent = await billing.get_total_usage(user_id, days=30)
References
- •HuggingFace Inference API Docs
- •HuggingFace Dedicated Endpoints
- •Existing Implementation:
src/backend/app/services/huggingface_endpoint_service.py
Refs #1179