AI hosting & inference

Benchmarks, pricing matrices, and buyer guides for model gateways, serverless GPU hosts, vector infrastructure, storage, and the cheapest sane ways to run small AI backends.

start here

Need GPUs without owning GPUs?

Compare RunPod, Modal, Fal, Baseten, and Replicate first.

budget agents

Hosting workers, cron jobs, and small agent stacks cheaply

Use this when “AI hosting” really means orchestration, not local inference.

model gateways

Comparing OpenRouter, Together, Groq, Fireworks, and Cerebras

Start here if you are choosing per-token APIs instead of renting raw compute.

vector infra

When pgvector wins, when Qdrant wins, and when a managed vector DB is worth it

A practical path for small RAG apps that do not need enterprise retrieval theater.

A100 rental price per hour in 2026: 40 GB and 80 GB cloud rates

Compare current 40 GB and 80 GB A100 cloud rates, product scopes, and warm-capacity costs before you rent.

2026-07-31
H100 rental price per hour in 2026: cloud GPU rates compared

Compare current H100 cloud GPU rates, pricing scopes, and always-on cost estimates before choosing an H100 provider.

2026-07-29
What GPU do you need to run Llama 70B? VRAM guide for self-hosting open models

A source-backed Llama 70B VRAM guide covering 4-bit, 8-bit, and BF16 memory estimates, quantization caveats, and the cloud GPU options that currently offer enough memory to run it.

2026-07-29
GPU cloud free credits 2026: August check—what can actually run a GPU test?

An August 2026 source check of GPU-cloud credits: which public offers can fund a real test, which require an account upgrade, and where quota or capacity can still stop you.

2026-07-27
Baseten pricing 2026: warm-replica costs explained

Baseten pricing explained: live per-minute GPU rates, 30-day warm-replica estimates, and when scale-to-zero really lowers the bill.

2026-07-26
Replicate pricing 2026: public models vs deployments

Replicate pricing explained: when hardware-second billing applies, why official models are billed by output instead, and the real monthly floor for a warm deployment.

2026-07-26
Modal pricing 2026: per-second GPU billing, explained

Modal pricing explained: live per-second GPU rates, plan gates, warm-capacity estimates, and the billing details that change the decision.

2026-07-25
RunPod pricing 2026: Pods vs Serverless GPU cost guide

Compare RunPod Pods and Serverless pricing: selected Secure Cloud GPU rates, monthly cost estimates, billing behavior, and when scale-to-zero saves money.

2026-07-24
Fal vs Baseten vs Modal for one warm H100 deployment (July 2026): cheapest managed floor or best serving surface?

A July 2026 HostFleet comparison of Fal, Baseten, and Modal for one warm H100 deployment, focused on published warm-capacity cost floors, billing behavior during cold starts and idle time, and which platform shape actually fits production inference.

2026-07-21
Fal for AI inference APIs and jobs (July 2026): serious serverless deployment controls, premium warm-GPU economics

A July 2026 HostFleet review of Fal for AI inference APIs and jobs, focused on custom deployment pricing, billed warm-runner behavior, queue and scaling controls, and why Fal fits serious workloads better than cheap always-on GPU hosting.

2026-07-20
Modal vs RunPod for one warm inference endpoint (July 2026): cleaner Python ergonomics or cheaper GPU control?

A July 2026 HostFleet comparison of Modal and RunPod for one warm inference endpoint, focused on warm-capacity cost floors, product-shape differences, and the tradeoff between Python app-platform ergonomics and cheaper GPU control.

2026-07-19
Baseten vs RunPod for one warm inference endpoint (July 2026): managed inference polish or cheaper GPU control?

A July 2026 HostFleet comparison of Baseten and RunPod for one warm inference endpoint, focused on warm-capacity cost floors, product-shape differences, and the tradeoff between managed inference polish and cheaper GPU control.

2026-07-18
Baseten vs Modal vs Replicate for one warm inference endpoint (July 2026): who makes you pay for readiness?

A July 2026 HostFleet comparison of Baseten, Modal, and Replicate for one warm inference endpoint, focused on 30-day warm-capacity cost floors, scale-to-zero tradeoffs, and where each platform shape really fits.

2026-07-17
Baseten for AI inference APIs and jobs (July 2026): polished dedicated inference, pricey once replicas stay warm

A July 2026 HostFleet review of Baseten for AI inference APIs and jobs, focused on dedicated deployment pricing, autoscaling defaults, cold-start tradeoffs, and why the real bill depends on whether replicas stay warm.

2026-07-16
Replicate for AI inference APIs and jobs (July 2026): fast to ship, expensive once you buy warm control

A July 2026 HostFleet review of Replicate for AI inference APIs and jobs, focused on the real cost shape of official models versus public models versus deployments, the warm-capacity tradeoff, and the operational edges buyers miss.

2026-07-12
DigitalOcean vs Hetzner Cloud for AI side projects (July 2026): clean cloud vs cheap cloud

A July 2026 HostFleet comparison of DigitalOcean vs Hetzner Cloud for AI side projects, focused on the honest always-on floor, backup and IP caveats, and where cleaner operations justify the higher bill.

2026-07-11
RunPod for AI inference APIs and jobs (July 2026): flexible GPU hosting with sharp billing edges

A July 2026 HostFleet review of RunPod for AI inference APIs and jobs, focused on the Pods vs Serverless split, warm-worker costs, prepaid-balance risk, and where RunPod beats cleaner abstractions like Modal.

2026-07-11
Modal for AI inference APIs and jobs (July 2026): brilliant for bursty GPU work, awkward as a cheap warm endpoint

A July 2026 HostFleet review of Modal for AI inference APIs and jobs, focused on the real cost shape of bursty serverless GPU workloads, the warm-capacity tradeoff, and when Modal beats cheaper always-on infrastructure.

2026-07-08
Hetzner Cloud for AI side projects (July 2026): cheapest serious self-hosted CPU, with the catches that still matter

A July 2026 HostFleet review of Hetzner Cloud for AI side projects, focused on the real all-in floor after IPv4 and backups, the June 2026 price shift, and when Hetzner still beats friendlier managed platforms.

2026-07-07
DigitalOcean Droplets for AI side projects (June 2026): what fits, what breaks, and when to pay for dedicated CPU

A June 2026 HostFleet guide to DigitalOcean Droplets for AI side projects, including the honest always-on floor, backup and volume math, and when shared CPU stops being the right trade.

2026-06-30
DigitalOcean vs Hetzner vs Hostinger for AI side projects (June 2026): the honest always-on floor

A June 2026 HostFleet comparison of DigitalOcean, Hetzner Cloud, and Hostinger VPS for always-on AI side projects, including the real monthly floor once backups, IPs, and billing rules are included.

2026-06-29
Hetzner vs Contabo vs Hostinger VPS for AI workloads (June 2026): which budget box actually fits an agent stack

An honest June 2026 comparison of Hetzner, Contabo, and Hostinger VPS plans for AI agent backends, RAG glue, and browser-automation workers, using current public pricing and real workload constraints.

2026-06-29
S3 alternatives for AI assets: R2 vs B2 vs Wasabi vs Tigris (June 2026)

An honest June 2026 comparison of Cloudflare R2, Backblaze B2, Wasabi, and Tigris for AI asset storage (model weights, embedding dumps, training data, generated media), with current per-GB pricing, egress rules, and where each one wins.

2026-06-29
What it costs to run an AI side project on a VPS for 30 days (June 24, 2026): honest budget ranges

A practical June 24, 2026 guide to what an AI side project really costs on a VPS for 30 days, using current Hostinger, DigitalOcean, and Hetzner pricing plus explicit workload assumptions.

2026-06-24
RunPod vs Modal vs Replicate for shipping a small inference API (June 2026): who should own the endpoint, queue, and warm pool?

A June 2026 source-backed comparison of RunPod, Modal, and Replicate for small inference APIs, focused on endpoint shape, warm-capacity control, queueing behavior, and hosting tradeoffs rather than model-pricing noise.

2026-06-22
Cloudflare Workers AI vs self-hosted GPU: when each wins (June 2026)

A practical June 2026 guide to when Cloudflare Workers AI is the right edge inference layer and when a self-hosted GPU becomes the better deployment shape.

2026-06-18
Best VPS setup for LangGraph or CrewAI (June 2026): what fits on 4 GB, 8 GB, and beyond

Best VPS for LangGraph or CrewAI in 2026: when 4 GB is enough, when 8 GB is the honest floor, and which single-box setups make sense for small self-hosted agent backends.

2026-06-13
Best hosting for AI agents on a budget (June 2026): choose by workload, not by AI branding

A June 26, 2026 HostFleet refresh of budget AI agent hosting advice, split by scheduled jobs, always-on workers, and small self-hosted stacks.

2026-06-06
Hostinger VPS for AI side projects: what fits, what breaks, and when to upgrade (June 2026)

Hostinger VPS looks cheap for AI side projects, but the right tier depends on whether you are hosting a small agent worker, a Docker stack, or trying to squeeze several AI-adjacent services onto one box.

2026-06-03
Best hosts for long-running agent workers (June 2026): where always-on costs and queue limits bite

Best hosts for long-running agent workers in 2026: a practical comparison of Railway, Render, Fly.io, Hetzner, and DigitalOcean for always-on queues, browser workers, and small agent backends.

2026-06-02
Vector database hosting for small AI apps (May 2026): when pgvector wins, when Qdrant wins, and when a managed vector DB is worth it

A practical 2026 guide to vector database hosting for small AI apps: when to keep vectors in Postgres, when to self-host Qdrant on a VPS, and when managed services like Qdrant Cloud, Pinecone, or Weaviate justify their higher floor.

2026-05-25
OpenRouter vs Together vs Groq vs Fireworks vs Cerebras: the per-token model gateways compared (April 2026)

Five major LLM API gateways, side by side: published per-token pricing, supported models, what each one routes through, and where the per-token pricing breaks down. Sourced from public pricing pages — no measured benchmarks.

2026-04-27
Serverless GPU pricing 2026: August rate matrix and deployment caveats

An August 2026 GPU rate matrix for RunPod, Modal, Baseten, and Replicate—plus the product-shape and price-transparency caveats that change the decision.

2026-04-21

Next path

Past infrastructure choice, into deployment shape

If the model or database choice is done and the open question is where the app should actually live, move on to Deploy AI Apps or compare frontend hosts in Edge.