Every LLM integration involves trade-offs: who controls your data, how much you'll pay, and whether you can switch providers.
When you paste sensitive business data into a chat application, it passes through their servers. When you build on a proprietary API, you're betting your application on someone else's pricing decisions and terms of service. When a provider shuts down a model you depend on, your production system breaks overnight.
Provider promises about data usage don't eliminate risk. Models can memorize and reproduce training data—leaking API keys and proprietary details during normal use. Terms of service can change, and data breaches at AI providers can expose your personal data.
Teams are waking up to the risks of vendor lock-in, data exposure, unpredictable costs, and losing the ability to migrate their applications. But the alternatives—open models, dedicated infrastructure, self-hosting—come with their own price tags and complexity that most guides conveniently skip over.
This article maps the LLM landscape as a spectrum of sovereignty and cost. I've organized it into five classes, from zero-control browser apps to fully self-hosted infrastructure. For each class, I examine the trade-offs between cost, control, and vendor dependency.
Let's break down where you stand in each class—and what it takes to move toward greater control.
The LLM Landscape
Class 1 starts with browser-based tools like ChatGPT—zero setup, maximum convenience, minimal control. Class 5 ends with self-hosted GPU clusters—full sovereignty, maximum responsibility. Between these extremes lie API integrations (Classes 2-3) and managed dedicated infrastructure (Class 4), each representing different balances of cost, control, and vendor dependency.

The diagram above maps these classes on a spectrum from low to high sovereignty. Most teams start in Class 1, using LLMs through a web interface, and move toward greater independence as their needs evolve. Each transition brings more control over your data and infrastructure, but also introduces new costs and technical requirements.
The path isn't always linear—teams may shift between classes as priorities change between cost optimization, compliance requirements, and operational complexity. The key is understanding where you need to be based on your actual constraints.
Class 1: Closed LLMs via UI (No Control)
Cost Model: Freemium to Subscription ($0-20/month)
Class 1 includes all services delivered as browser applications, mobile apps, or desktop apps—whether from LLM providers directly (ChatGPT, Claude, Gemini) or third parties building on their APIs (Poe, Perplexity).
Who This Is For
This class serves individuals exploring AI capabilities without technical setup, teams experimenting before committing to infrastructure, and organizations that need quick answers without IT involvement. It's ideal for proof-of-concept work and rapid prototyping where speed matters more than control.
Advantages
The main strength is immediate access with zero technical barriers. Most services offer free tiers that let you test LLM capabilities before paying anything. The $20/month Pro tier has become standard across major providers—ChatGPT Plus, Claude Pro, and Gemini Advanced all sit at this price point, unlocking higher usage limits and access to special features.
Limitations
Platforms impose strict constraints that become apparent quickly. Each tier caps how many messages you can send within a rolling time window, forcing breaks in productivity. You're restricted to specific models and context windows with no ability to customize, fine-tune, or access APIs. Integration with internal tools or workflows is impossible, and usage caps inevitably push power users toward higher tiers.
Data Control
Data governance is entirely in the provider's hands. All conversations live on their servers with no data residency guarantees. While ChatGPT and Claude offer JSON exports of your conversation history, there's no standardized format: the export won't migrate your history to another provider; it's useful only for analyzing your own data.
The key risks include:
- Deletion requests may not remove data from backups or training datasets
- No audit trail of who accessed your data or verification of true deletion
- Account suspension or service shutdown could mean complete data loss
While switching between Class 1 providers is technically easy (just create a new account), your conversation history and team workflows don't transfer. Organizations that build standard operating procedures around specific providers face significant retraining costs when switching.
Vendor Lock-In Risk: High
Migration Gotchas (Class 1 → 2):
- Chat workflows must be translated into prompts and tools
- Costs become variable and must be monitored
- Data policies must be defined before integration
Class 2: Closed LLMs via API (Limited Control)
Cost Model: Pay-Per-Use Token Pricing
Class 2 opens programmatic access to proprietary models through APIs. Major providers include OpenAI (GPT-5), Anthropic (Claude), Google (Gemini), and Microsoft (Azure OpenAI). Instead of clicking through a web interface, you send requests programmatically and receive responses that can be further processed.
Who This Is For
This class primarily serves developers integrating LLMs into existing applications or building new AI-powered products. It's also accessible to semi-technical users through low-code platforms like n8n, Dify, or Langflow, which provide visual workflows that call APIs behind the scenes. The barrier to entry is understanding API keys, authentication, and basic request/response patterns.
Advantages
Moving to APIs unlocks capabilities impossible in Class 1. You can embed LLM features directly into your applications, automate workflows without manual copying and pasting, customize the user experience, and build products around LLM capabilities. You gain control over logging and metrics—tracking exactly how your users interact with the model, measuring costs per feature, and optimizing prompts based on real usage data. Unlike Class 1's message limits, you pay only for what you use.
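That per-feature cost visibility can start as a small in-process tracker. A minimal sketch — the tier labels and per-million-token prices below are illustrative placeholders, not current provider rates:

```python
from collections import defaultdict

# Illustrative (input, output) prices per million tokens -- not real rates.
PRICES = {"fast": (1.00, 5.00), "balanced": (3.00, 15.00)}

class UsageTracker:
    """Accumulates token usage and estimated cost per application feature."""

    def __init__(self):
        self.totals = defaultdict(lambda: {"input": 0, "output": 0, "cost": 0.0})

    def record(self, feature, tier, input_tokens, output_tokens):
        in_price, out_price = PRICES[tier]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        entry = self.totals[feature]
        entry["input"] += input_tokens
        entry["output"] += output_tokens
        entry["cost"] += cost
        return cost

tracker = UsageTracker()
tracker.record("summarize", "balanced", input_tokens=1200, output_tokens=300)
tracker.record("autocomplete", "fast", input_tokens=200, output_tokens=50)
```

In production you would record these numbers from the token counts the API returns with each response, and export the totals to your metrics system.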
Cost Structure
Pricing shifts from monthly subscriptions to usage-based tokens. Each major provider offers multiple model tiers with different price-performance trade-offs. As of January 2026, here are some examples:
Anthropic (Claude 4.5 family):
- Claude Haiku 4.5: $1 per million input tokens, $5 per million output tokens (fast)
- Claude Sonnet 4.5: $3 per million input tokens, $15 per million output tokens (balanced)
- Claude Opus 4.5: $5 per million input tokens, $25 per million output tokens (premium)
Source: Anthropic Pricing
OpenAI (GPT family):
- GPT-5.2 Mini: $0.25 per million input tokens, $2 per million output tokens (fast)
- GPT-5.2: $1.75 per million input tokens, $14 per million output tokens (balanced)
- GPT-5.2 Pro: $21 per million input tokens, $168 per million output tokens (premium)
Source: OpenAI Pricing
Google (Gemini 2.5 family):
- Gemini 2.5 Flash-Lite: $0.10 per million input tokens, $0.40 per million output tokens (fast)
- Gemini 2.5 Flash: $0.30 per million input tokens, $2.50 per million output tokens (balanced)
- Gemini 2.5 Pro: $1.25 per million input tokens, $10 per million output tokens (premium)
Note: listed prices apply to prompts of up to 200k tokens
Source: Gemini Pricing
Notice the pattern: all three major providers have adopted a three-tier pricing strategy—budget models for high-volume tasks, balanced mid-tier models for general use, and premium models for complex reasoning. This market structure allows you to optimize costs by routing simple queries to cheaper models while reserving expensive models for tasks that truly need their capabilities.
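The routing pattern can be sketched as a tiny dispatch function. The model names here are placeholders and the heuristics (prompt length, an explicit reasoning flag) are illustrative stand-ins for whatever signals your application actually has:

```python
# Placeholder tier -> model mapping; substitute real model identifiers.
TIER_MODELS = {
    "budget": "small-model",
    "balanced": "mid-model",
    "premium": "large-model",
}

def route(prompt: str, needs_reasoning: bool = False) -> str:
    """Crude routing heuristic: cheap by default, escalate only when needed."""
    if needs_reasoning:
        return TIER_MODELS["premium"]   # complex multi-step tasks
    if len(prompt) > 2000:
        return TIER_MODELS["balanced"]  # long context, general quality
    return TIER_MODELS["budget"]        # short, simple queries
```

Even a heuristic this crude can cut costs substantially when most traffic is short, simple queries; more sophisticated routers classify the query with a cheap model first.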
For context, 1 million tokens is roughly 750,000 words. A typical API call might cost $0.001-0.10 depending on length and model choice. Costs can become unpredictable—a poorly designed application can generate high charges in a short period of time. This can be mitigated by using prepaid accounts and enabling limits on your credit usage.
Since pricing changes frequently and varies significantly across providers and models, you can check current rates at llm-prices.com.
Limitations
While you gain programmatic control, you inherit operational responsibilities. You must handle rate limits, implement error handling and retry logic, monitor latency that varies with provider load, and secure API keys properly. Applications become dependent on external uptime—when the provider's API goes down, your application breaks. You're also subject to breaking changes in API endpoints, requiring code updates to maintain compatibility.
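The retry logic mentioned above is typically implemented as exponential backoff with jitter. A minimal sketch, with the network call and the sleep both injectable so the pattern is testable; production code should also distinguish retryable errors (429, 5xx) from permanent ones (401, 400):

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry a flaky zero-argument call with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            # Give up only after the final attempt.
            if attempt == max_attempts - 1:
                raise
            # 0.5s, 1s, 2s, ... plus jitter to avoid synchronized retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Simulated flaky endpoint: fails twice, then succeeds.
attempts = {"count": 0}
def fake_api_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return "response"

result = with_retries(fake_api_call, sleep=lambda s: None)  # skip real sleeps
```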
Data Control
You can build application logic that filters what gets sent to the API, but everything you transmit still passes through the provider's infrastructure. This allows you to implement data governance policies—for example, stripping personally identifiable information before API calls—but doesn't change the fundamental reality: the provider processes your requests on their servers. You gain visibility through your own logging systems but not over where the processing happens.
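A PII-stripping filter of the kind described can start as simple pattern substitution. The patterns below are deliberately crude illustrations — real PII detection needs far more than a few regexes:

```python
import re

# Illustrative patterns only -- incomplete by design.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "<CARD>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
]

def scrub(text: str) -> str:
    """Replace obvious PII with placeholders before text leaves your systems."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

The key architectural point is that this runs in your code, before the API call — the provider never sees the original values, and you can log exactly what was redacted.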
Vendor Lock-In Risk: High
Each provider has proprietary API formats, models, and pricing structures. While some providers offer OpenAI-compatible endpoints, migrating between providers requires code changes, prompt re-optimization for different model behaviors, and cost recalculation. Applications built on GPT-5 cannot simply switch to Claude without testing and modification. The models themselves are not portable—you cannot download them or move to on-premises infrastructure.
Migration Gotchas (Class 2 → 3):
- API formats differ even when endpoints claim OpenAI compatibility
- Prompts often require retuning for the new model's behavior
- Establish evaluation baselines before switching
Class 3: Open LLMs via API (Shared Control)
Cost Model: Pay-Per-Use Token Pricing (Competitive)
Class 3 represents a critical shift: you still use APIs and pay per token like Class 2, but now you're accessing open-weights models rather than proprietary ones. Providers like Groq and Cerebras compete on infrastructure speed rather than model exclusivity—both host similar open models (Llama, Qwen, GPT-OSS) but differentiate through custom hardware that delivers dramatically faster inference than traditional GPUs.
Who This Is For
This class serves teams who want API convenience without permanent vendor lock-in. It's ideal for organizations evaluating long-term migration to self-hosting, companies with data sovereignty concerns planning eventual on-premises deployment, and developers building products where model portability matters. It's also perfect for teams who need production-grade speed but want the flexibility to switch providers or self-host.
Advantages
The defining advantage is infrastructure competition without model lock-in. Because models are open, multiple providers can host identical models, competing on speed, price, and reliability. If Groq's pricing becomes unfavorable, you can switch to Cerebras running the same Llama model with minimal code changes—both providers offer OpenAI-compatible APIs, making migration a matter of changing environment variables.
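Because the request payload is identical across OpenAI-compatible providers, the switch can be reduced to configuration. A sketch — the base URLs are believed-current examples and should be verified against each provider's documentation, and the model identifier is a placeholder:

```python
import os

# Verify these against each provider's docs before relying on them.
BASE_URLS = {
    "groq": "https://api.groq.com/openai/v1",
    "cerebras": "https://api.cerebras.ai/v1",
}

def chat_request(prompt, model="llama-3.1-8b"):
    """Build an OpenAI-style chat completion request from environment config.

    Migrating providers means changing LLM_PROVIDER and LLM_API_KEY;
    the payload itself stays identical because the APIs are compatible.
    """
    provider = os.environ.get("LLM_PROVIDER", "groq")
    return {
        "url": BASE_URLS[provider] + "/chat/completions",
        "headers": {"Authorization": "Bearer " + os.environ.get("LLM_API_KEY", "")},
        "json": {
            "model": model,  # placeholder id; providers use their own names
            "messages": [{"role": "user", "content": prompt}],
        },
    }

request = chat_request("Summarize this document.")
```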
Beyond portability, Class 3 providers often deliver speeds that exceed proprietary APIs. Groq's Language Processing Unit (LPU) architecture achieves 840 tokens/second on Llama 3.1 8B, while Cerebras's Wafer-Scale Engine reaches 2,200 tokens/second on the same model—dramatically faster than GPT-5 or Claude. This speed advantage enables real-time applications, multi-step reasoning workflows, and agentic systems that would be prohibitively slow on traditional APIs.
You can also test applications in the cloud and calculate exact hardware requirements before committing to self-hosting infrastructure. Unlike Class 2's proprietary models, you have a clear migration path: start with a managed API, measure actual token usage and performance needs, then move to dedicated infrastructure (Class 4) or full self-hosting (Class 5) when volumes justify the investment.
Class 3 is the most practical exit strategy class: build and benchmark via APIs, then redeploy the same weights on dedicated or self-hosted infrastructure when needed.
Cost Structure
Pricing is generally lower than proprietary models and varies significantly between providers hosting identical models. As of January 2026:
Groq Pricing:
- Llama 3.1 8B: $0.05 per million input tokens, $0.08 per million output tokens (840 tokens/second)
- Llama 3.3 70B: $0.59 per million input tokens, $0.79 per million output tokens (394 tokens/second)
- GPT OSS 120B: $0.15 per million input tokens, $0.60 per million output tokens (500 tokens/second)
Source: groq.com/pricing
Cerebras Pricing:
- Llama 3.1 8B: $0.10 per million input tokens, $0.10 per million output tokens (2,200 tokens/second)
- Llama 3.3 70B: $0.85 per million input tokens, $1.20 per million output tokens (2,100 tokens/second)
- GPT OSS 120B: $0.35 per million input tokens, $0.75 per million output tokens (3,000 tokens/second)
Source: cerebras.ai/pricing
Notice that Cerebras charges roughly 2x more than Groq for Llama 3.1 8B ($0.10 vs $0.05-0.08) but delivers nearly 3x the speed (2,200 vs 840 tokens/second). This price-performance trade-off varies by provider—some optimize for lowest cost, others for highest speed.
Both providers also offer generous free tiers, batch processing discounts (50% off), and prompt caching that reduces costs for repeated queries. Since you're not locked to a single vendor, you can optimize by choosing the fastest provider for latency-sensitive workloads and the cheapest for batch processing.
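The per-workload optimization above can be codified directly from the price/speed tables. The hard-coded figures are the January 2026 numbers quoted in this article and will go stale — in practice, load them from configuration:

```python
# (input $/Mtok, output $/Mtok, tokens/sec) for Llama 3.1 8B -- the
# January 2026 figures quoted above; these change, so treat as config.
PROVIDERS = {
    "groq": (0.05, 0.08, 840),
    "cerebras": (0.10, 0.10, 2200),
}

def pick_provider(latency_sensitive):
    """Fastest provider for interactive traffic, cheapest for batch jobs."""
    if latency_sensitive:
        return max(PROVIDERS, key=lambda p: PROVIDERS[p][2])
    return min(PROVIDERS, key=lambda p: PROVIDERS[p][0] + PROVIDERS[p][1])
```

With identical open weights on both sides, this kind of per-request provider selection carries none of the prompt-retuning risk it would in Class 2.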
For more comparisons across providers, check Artificial Analysis.
Limitations
You inherit the same operational challenges as Class 2: rate limiting, error handling, API key management, and dependency on external uptime. Multi-provider strategies can actually complicate operations—you might need separate integrations for each provider, different rate limit handling, and provider-specific optimizations.
Open models score well on many benchmarks but typically won't match GPT-5 or Claude Opus 4.5 on the hardest reasoning tasks. You're trading cutting-edge capability for flexibility, speed, and cost savings. For many applications, this trade-off makes perfect sense—a Llama model running at 2,100 tokens/second can feel more responsive than a slower proprietary model even if the underlying intelligence is slightly lower.
Data Control
Like Class 2, your data passes through provider servers during inference. However, open models offer more transparency—the model weights are public, and you can verify behavior matches the official releases. Both Groq and Cerebras explicitly state they don't train on your data, and because the models are already open, there's no incentive to do so.
More importantly, you can validate this claim. Download the same Llama 3.1 weights, run them locally, and verify outputs match the API results. This level of auditability is impossible with GPT-5 or Claude, where the models are black boxes.
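A verification harness for this audit can be provider-agnostic. Both generators here are plain callables, stubbed for illustration — in practice one would wrap the provider's API and the other a local inference server running the downloaded weights:

```python
def compare_outputs(api_generate, local_generate, prompts):
    """Compare completions from an API and a local copy of the same weights.

    With temperature 0 and identical quantization, outputs should match;
    divergence suggests different quantization, sampling settings, or a
    model that isn't what the provider claims.
    """
    mismatches = []
    for prompt in prompts:
        api_out, local_out = api_generate(prompt), local_generate(prompt)
        if api_out != local_out:
            mismatches.append({"prompt": prompt, "api": api_out, "local": local_out})
    return mismatches

# Stand-ins for the real generators, for demonstration only.
identical = lambda p: "echo: " + p
drifted = lambda p: "echo: " + p if p != "risky prompt" else "mutated output"
diffs = compare_outputs(identical, drifted, ["hello", "risky prompt"])
```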
You can implement the same filtering and governance policies as Class 2, with the added assurance that you could run the exact same model on your own infrastructure later for compliance audits or air-gapped deployments.
Vendor Lock-In Risk: Medium
This is where Class 3 excels. While you're still dependent on APIs and can't instantly migrate without code changes, switching between providers hosting the same model is dramatically easier than in Class 2. Your prompts work identically across providers (same model = same behavior), and both Groq and Cerebras offer OpenAI-compatible APIs, making integration nearly plug-and-play.
More importantly, you have a clear path to Classes 4 and 5. You can download the exact Llama 3.1 weights you've been testing via API and run them on your own infrastructure whenever you're ready. Class 3 becomes a staging ground—test in the cloud at API prices, then graduate to dedicated infrastructure when your token volumes justify it.
The main risk isn't lock-in but rather version fragmentation. Providers may use different quantizations (16-bit vs 8-bit), different context window implementations, or slightly different inference optimizations that affect outputs. However, this risk is substantially lower than being permanently locked to a model with no migration options.
Migration Gotchas (Class 3 → 4):
- Capacity planning becomes mandatory
- Runtime and quantization differences affect performance
- Establish evaluation baselines before switching
Class 4: Open LLMs as a Service (High Control)
Cost Model: Dedicated GPU Hosting (Hourly/Monthly)
Class 4 represents the Database-as-a-Service model for LLMs. You're still using a managed service, but now you get dedicated infrastructure—your own GPUs that no one else shares. Think AWS RDS versus running Postgres yourself: same software, dramatically different operational burden. Providers handle deployment, scaling, monitoring, and updates while you control which models run, where data resides, and who has access.
Who This Is For
This class serves organizations that need production-grade performance without building infrastructure teams. It's ideal for companies with data residency requirements (GDPR, HIPAA compliance), teams experiencing unpredictable performance on shared APIs (Class 3), and businesses ready to commit to monthly infrastructure costs for guaranteed capacity. It's also perfect for organizations evaluating whether to eventually self-host (Class 5) but not yet ready to manage clusters and model servers.
Advantages
The defining advantage is predictable performance with managed operations. Your models run on dedicated GPUs, so inference latency never spikes because another customer is hammering the same hardware. You get consistent throughput, reliable SLA guarantees, and the ability to architect for specific performance targets—critical for customer-facing applications where consistency is important.
Beyond performance isolation, you gain operational simplicity that Class 5 doesn't offer. Providers handle model loading, auto-scaling configuration, health monitoring, zero-downtime updates, and infrastructure maintenance. Your team focuses on application logic and model optimization rather than GPU driver updates and cluster management.
Data residency becomes enforceable. Unlike Class 3's multi-tenant clouds where you hope data stays in the right region, Class 4 providers offer region-locked deployments with contractual guarantees. You can specify EU-only infrastructure, configure private network access, and audit the exact data center location—essential for regulated industries like healthcare and finance.
Cost Structure
Pricing shifts from per-token (Classes 2-3) to per-GPU-hour or per-minute billing. You pay for dedicated hardware whether you're using it heavily or lightly, though most providers support scale-to-zero during idle periods. As of January 2026:
Scaleway Managed Inference:
- Llama 3.1 8B on L4 GPU: €0.93/hour (~€679/month)
- Llama 3.1 70B on H100 GPU: €3.40/hour (~€2,482/month)
- Mixtral 8x7B on H100 GPU: €3.40/hour (~€2,482/month)
- Mistral 7B on L4 GPU: €0.93/hour (~€679/month)
Source: Scaleway Managed Inference Pricing
Baseten Dedicated Deployments:
- T4 GPU (16 GiB): $0.63/hour (~$460/month)
- L4 GPU (24 GiB): $0.85/hour (~$621/month)
- A100 GPU (80 GiB): $4.00/hour (~$2,920/month)
- H100 GPU (80 GiB): $6.50/hour (~$4,745/month)
- B200 GPU (180 GiB): $9.98/hour (~$7,283/month)
Source: Baseten Pricing
Notice the infrastructure choice impact on costs: running Llama 3.1 8B on entry-level GPUs (Scaleway L4 at €679/month or Baseten L4 at $621/month) costs roughly the same across providers, while larger models on H100s vary more significantly (Scaleway €2,482/month vs Baseten $4,745/month). These price differences reflect infrastructure optimization strategies—Scaleway pre-optimizes specific model/GPU combinations, while Baseten offers more flexible hardware selection.
Compare this to Class 3's per-token pricing: if you're processing 50+ million tokens daily with consistent load, dedicated infrastructure often becomes more economical while delivering guaranteed performance. The breakeven point typically occurs around 100-200 million tokens monthly, depending on model size and provider rates.
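The breakeven point can be estimated with simple arithmetic. The output-token ratio below is an assumption you should replace with your own traffic mix, and the example prices are the mid-tier proprietary rates quoted earlier in this article:

```python
def breakeven_tokens_per_month(gpu_cost_per_month, price_in, price_out,
                               output_ratio=0.25):
    """Monthly token volume where dedicated GPUs match per-token API costs.

    price_in/price_out are API rates per million tokens; output_ratio is
    the assumed share of traffic that is output tokens (0.25 is a guess).
    """
    blended_price_per_million = (price_in * (1 - output_ratio)
                                 + price_out * output_ratio)
    return gpu_cost_per_month / blended_price_per_million * 1_000_000

# A $621/month dedicated L4 vs. a $3/$15-per-Mtok proprietary API:
tokens = breakeven_tokens_per_month(621, price_in=3.0, price_out=15.0)
# roughly 103.5 million tokens/month
```

Against cheap open-model APIs the breakeven volume is far higher, which is why dedicated infrastructure pays off soonest when you are replacing premium proprietary models.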
Limitations
You inherit the economic challenge of committed capacity. Unlike Class 3's pay-per-token flexibility, you're paying for GPU time whether traffic is high or low. A marketing campaign that 10x's your inference load simply costs 10x more in Class 3; in Class 4, you must provision (and pay for) enough GPUs to handle peak demand before it arrives. Most providers offer auto-scaling, but you're still paying for minimum replica counts.
You also trade some model flexibility for operational simplicity. While providers support any open-source model, deploying a brand-new architecture or custom inference optimization requires more coordination than in Class 5. You're dependent on the provider's inference stack and deployment tooling—generally excellent (Baseten's Inference Stack, Scaleway's optimized runtime) but not infinitely customizable.
Region and hardware availability varies by provider. Scaleway excels at EU data sovereignty but currently only operates in France.
Data Control
This is where Class 4 shines for compliance teams. Data flows through dedicated infrastructure you control, with contractual guarantees about residency and access. Scaleway's EU-only infrastructure means data never leaves European data centers—no "we try to keep it in Europe" ambiguity. Baseten offers region-locked deployments with HIPAA and SOC 2 Type II certification, enabling healthcare and finance deployments that would fail Class 2-3 audits.
However, you're still dependent on the provider's operational practices. They manage the underlying infrastructure, apply security patches, and configure network isolation. This is vastly better than shared APIs, but not quite the same as running everything in your own data center (Class 5).
Vendor Lock-In Risk: Medium-Low
Lock-in risk drops significantly compared to Class 2 because models are portable. You're running the same Llama 3.1 weights available everywhere—if Scaleway's pricing becomes unfavorable, you can download those exact weights and deploy to Baseten, or even graduate to Class 5 self-hosting. Your application code works identically because most providers offer OpenAI-compatible APIs.
The clearest migration path is to Class 5: when operational maturity and token volumes justify it, you take the exact model weights you've been running in Class 4 and deploy them to your own infrastructure. Class 4 becomes a proving ground—validate the economics and architecture in a managed environment, then graduate to full ownership when ready.
Migration Gotchas (Class 4 → 5):
- Infrastructure knowledge becomes mandatory
- Bare-metal behavior differs from managed stacks
- You must standardize deployment and monitoring
Class 5: Open LLMs Self-Hosted (Full Control)
Cost Model: Root Server Rental or Hardware Ownership
Class 5 represents complete infrastructure sovereignty. You're no longer calling someone else's API or trusting their managed service—you run the entire stack yourself. This splits into two distinct paths: renting dedicated GPU servers with root access where you manage everything above the metal, or purchasing hardware outright and owning the full depreciation curve. Both paths give you absolute control over data, configurations, and optimizations, but demand infrastructure expertise that Classes 1-4 don't require.
Who This Is For
This class serves organizations that cannot tolerate any external infrastructure dependency. It's ideal for defense contractors and government agencies requiring air-gapped deployments, financial institutions where data residency is legally mandated and audited, research labs needing custom kernel modifications or experimental model architectures, and high-volume applications where Class 4's managed pricing becomes prohibitively expensive at scale. It's also perfect for teams with existing GPU infrastructure expertise who view operational burden as an investment in control rather than a cost.
Advantages
The defining advantage is absolute data sovereignty and infrastructure control. Your inference requests never traverse external networks, model weights never leave your servers, and you can implement air-gapped deployments where models run completely isolated from the internet. For regulated industries, this isn't a nice-to-have—it's often the only acceptable architecture.
You also gain optimizations that managed services can't match. Want to implement experimental quantization schemes? Modify vLLM's scheduler for your specific workload? Run custom CUDA kernels? In Class 5, you have root access to everything. You can squeeze every percentage point of performance from your GPUs because you control the entire stack from Linux kernel parameters to inference engine configuration.
Cost Structure
Class 5a: Dedicated Root Servers (Rented Infrastructure)
Lambda Labs (GPU-optimized cloud):
- 1x H100 SXM (80 GB): $2.49/hour (~$1,817/month)
- 1x A100 (80 GB): $1.29/hour (~$942/month)
- 8x H100 SXM cluster: $19.92/hour (~$14,540/month)
Source: Lambda Labs Pricing
Features:
- On-demand hourly billing or reserved instances with discounts
- Pre-configured with CUDA, Docker, ML frameworks
- US-based infrastructure
Netcup vGPU Root Servers (European provider):
- RS 4000 G11s (7 GB): Starting at €98.77/month
- RS 8000 G11s (14 GB): Starting at €188.52/month
Source: Netcup vGPU Servers
Features:
- Virtual GPU slices on dedicated root servers
- Full root access with customizable configurations
- German data centers (EU data residency)
Class 5b: Owned Hardware (Capital Investment)
NVIDIA DGX Spark (Workstation/Small Server):
- Designed for small teams and individual developers
- Compact form factor suitable for office deployment
- Estimated cost: $4,000-5,000 depending on configuration
- Can function as both development workstation and inference server
Source: NVIDIA DGX Spark
NVIDIA DGX Systems (Enterprise Scale):
- DGX H100: 8x H100 80GB GPUs (~$300,000-400,000)
- DGX A100: 8x A100 80GB GPUs (~$150,000-200,000)
Custom Builds:
- Single H100 PCIe: ~$30,000-40,000
- 8x H100 custom server: ~$250,000-350,000 (excluding networking/infrastructure)
The CPU Alternative: Small Language Models (SLMs)
Not every use case requires expensive GPUs. Small Language Models (1-8 billion parameters) can run effectively on modern CPUs for lower-throughput applications. Models like Phi-3, Gemma 2B, or Llama 3.2 3B deliver impressive capabilities while running on conventional server hardware.
A 32-core CPU server can handle SLM inference for document analysis, classification tasks, or internal tooling where sub-second latency isn't critical. For teams just starting with self-hosting, CPU-based SLMs offer a lower-cost entry point before investing in GPU infrastructure.
For a comprehensive guide on running local LLMs on CPU hardware, including setup instructions and model recommendations, see Getting Started with Local LLMs.
Limitations
This is where most organizations underestimate Class 5. You're not just running models—you're operating infrastructure. This means:
- Installing and maintaining CUDA drivers across kernel updates
- Configuring vLLM, Text Generation Inference, or TensorRT-LLM with optimal parameters for your hardware
- Setting up monitoring (Prometheus, Grafana) to track GPU utilization, memory pressure, and inference latency
- Implementing high availability: what happens when a GPU fails at 2 AM?
- Securing your infrastructure: firewalls, network isolation, access controls, vulnerability patching
For owned hardware, add facilities management: power, cooling, network connectivity, physical security.
You also inherit the model deployment pipeline. In Class 4, you upload model weights and the provider handles optimization. In Class 5, you're deciding quantization schemes (FP16? FP8? INT4?), configuring tensor parallelism across GPUs, and debugging why your throughput is lower than benchmarks claim. This requires ML systems expertise that takes years to develop.
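A first-pass memory sizing for those quantization decisions is simple arithmetic. The 20% runtime overhead factor below is a rough assumption, and KV-cache growth with context length must be budgeted separately:

```python
BYTES_PER_WEIGHT = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def model_memory_gb(params_billion, quant="fp16", overhead=1.2):
    """Rough GPU memory (GB) to hold the weights plus runtime overhead.

    Ignores KV cache, which grows with context length and batch size --
    a real sizing exercise must budget for that on top.
    """
    return params_billion * BYTES_PER_WEIGHT[quant] * overhead

# A 70B-parameter model: FP16 needs multiple 80 GB GPUs; INT4 fits on one.
fp16_gb = model_memory_gb(70)          # ~168 GB
int4_gb = model_memory_gb(70, "int4")  # ~42 GB
```

The same arithmetic explains why an 8B model (roughly 19 GB at FP16 with overhead) fits comfortably on a 24 GB L4, matching the Class 4 pricing pairings above.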
Data Control
This is Class 5's existential advantage: perfect data control. Inference happens entirely within your infrastructure. You can run air-gapped deployments where servers have zero internet connectivity, implement network isolation where inference traffic never leaves your VPC, and enforce encryption at rest with keys you control, not cloud provider keys.
For compliance audits, you own the complete audit trail. Regulators ask "where does data go?" and you can demonstrate with network diagrams that data never leaves your data center. This level of proof is impossible with external APIs.
However, with great control comes great responsibility. You're accountable for security patching, access logging, and incident response. A misconfigured firewall rule in Class 5 can expose your entire model deployment—there's no provider security team watching your back.
Vendor Lock-In Risk: Very Low
Lock-in nearly disappears. You own the model weights, the infrastructure, and the inference code. Want to switch from vLLM to TensorRT-LLM? It's not a vendor negotiation. Need to move from one data center to another? Pack up the servers and redeploy.
The only "lock-in" is technical inertia. If you've built extensive automation around Lambda Labs' API or optimized heavily for a specific GPU architecture, migration carries costs. But these are technical debts you control and can pay down incrementally, not contractual dependencies.
The real risk isn't lock-in—it's abandonment. Class 5 requires sustained expertise. If your infrastructure team leaves and you can't hire replacements, you're stuck maintaining systems you don't understand. This is why Class 4 exists: some organizations deliberately choose managed services to avoid operational risk.
Operational Gotchas:
- Security risk shifts to misconfiguration
- Staff turnover becomes an existential risk
- Hardware and drivers age faster than you expect
Migration Path
Class 5 is often the graduation point, not the starting point. The typical journey:
- Start in Class 3 to validate the application
- Move to Class 4 when volume justifies dedicated infrastructure
- Graduate to Class 5a when operations expertise develops
- Eventually reach Class 5b when economics and scale demand it
This progression lets you build expertise incrementally while avoiding premature infrastructure investment.
Conclusion: Sovereignty Is a Journey
The most common mistake I see isn't choosing the "wrong" class—it's staying in one class after your requirements have changed.
Startups often burn thousands of dollars on Class 2 APIs because they fear the migration effort of moving to Class 3. Conversely, enterprise teams waste months building Class 5 setups for internal tools that a Class 4 managed instance could have solved safely.

Your goal is not to reach Class 5 immediately. Your goal is to maintain the optionality to move there.
By choosing open weights (Class 3) and standard architectures today, you buy yourself the cheapest insurance policy in the AI industry: the ability to pick up your intelligence and leave when the terms no longer serve you.
Don't let your data become a hostage to someone else's business model. Start building your exit strategy today.