How Usage-Based Pricing Works in SaaS
Usage-based pricing (also called metered billing or consumption pricing) charges customers based on what they consume (API calls, tokens, GB, builds, transactions, events) rather than a flat monthly subscription. Four structures dominate the market in 2026. Pure usage bills a single rate per unit (Twilio SMS, AWS S3 GET requests). Tiered volume pricing steps prices down across breakpoints (Snowflake credits, OpenAI per-token tiers). Base + overage charges a flat fee that includes a quota, then bills per unit beyond it (most analytics SaaS, Mailgun, SendGrid). Credit-based packs sell prepaid bundles that customers consume over time (CircleCI build credits, dev-tool plans). Each structure has different revenue, margin, and predictability characteristics, and this usage-based pricing calculator lets you model any of them against the same synthetic customer base. The right structure depends on your customer distribution. Pure usage maximizes alignment with value but produces high monthly variance. Tiered pricing rewards larger customers and is the default for developer APIs. Base + overage gives you a predictable revenue floor plus expansion. Credit packs hide unit prices (good for margin) and create upfront cash. Most successful metered SaaS companies (Snowflake, Twilio, Datadog, OpenAI) actually run hybrids: a base subscription plus consumption above it.
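The four structures can be sketched as four small billing functions. This is a minimal illustration, not the calculator's actual engine: the function names, rates, and breakpoints are hypothetical, and the tiered function assumes graduated billing (each band of usage is charged at its own rate) rather than charging all units at the rate of the highest tier reached.

```python
import math

def pure_usage(units, rate):
    """Pure usage: a single rate per unit."""
    return units * rate

def graduated_tiers(units, tiers):
    """Tiered pricing, graduated variant (assumption).

    tiers is an ascending list of (upper_bound, rate); use None as the
    upper bound of the open-ended top tier.
    """
    total, lower = 0.0, 0.0
    for upper, rate in tiers:
        band_top = units if upper is None else min(units, upper)
        if band_top <= lower:
            break  # usage never reaches this band
        total += (band_top - lower) * rate
        lower = band_top
    return total

def base_plus_overage(units, base_fee, quota, overage_rate):
    """Flat fee that includes a quota, then a per-unit overage rate."""
    return base_fee + max(0, units - quota) * overage_rate

def credit_packs(units, pack_size, pack_price):
    """Prepaid packs: the customer buys whole packs to cover usage."""
    return math.ceil(units / pack_size) * pack_price

# Hypothetical customer consuming 12,000 units in a month.
units = 12_000
tiers = [(1_000, 0.01), (10_000, 0.008), (None, 0.006)]
print("pure usage     :", pure_usage(units, 0.01))
print("graduated tiers:", graduated_tiers(units, tiers))
print("base + overage :", base_plus_overage(units, 50, 5_000, 0.008))
print("credit packs   :", credit_packs(units, 500, 49))
```

Running the same customer through each function is the simplest way to see how far the structures diverge before modeling a whole customer base.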
Per-Token Pricing for AI and LLM APIs
Per-token pricing is the single most-searched billing model in the AI infrastructure category right now. The math is straightforward but the cost stack is brutal. Per-token revenue equals tokens consumed × $/token. Per-token cost equals tokens consumed × inference COGS. Margin is revenue minus cost, and at low tiers it can go negative. As of early 2026, OpenAI publishes GPT-4o at roughly $0.005/1K input tokens and $0.015/1K output tokens; Anthropic publishes Claude Haiku at roughly $0.0008/1K input and $0.004/1K output; self-hosted Llama 3 runs roughly $0.0001–$0.0003/1K depending on GPU pricing. Verify current vendor rate cards before pricing your own plan. If you wrap GPT-4o and resell at $0.01/1K tokens with 1,000 customers averaging 100K tokens each (100M tokens in total), you generate $1,000 in monthly revenue but pay OpenAI roughly $500 if every token is billed at the input rate, so your gross margin is 50% before infra and support, and lower once pricier output tokens enter the mix. Most AI startups price input and output tokens separately (output is typically 2–4× more expensive), and context-window multipliers matter because longer prompts inflate the per-call cost dramatically. Always model your highest-volume tier against your worst-case inference cost (longest prompt plus most output tokens), because that is where bleeding margins hide.
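To make the input-vs-output split concrete, here is a minimal sketch of the margin arithmetic. The function name and the 50/50 token mix in the second scenario are illustrative assumptions; the rates are the approximate figures quoted above and should be replaced with current rate cards.

```python
def token_margin(input_tokens, output_tokens,
                 price_in, price_out, cost_in, cost_out):
    """Revenue, cost, margin and margin % for a token workload.

    All prices and costs are expressed per 1K tokens.
    """
    revenue = (input_tokens * price_in + output_tokens * price_out) / 1000
    cost = (input_tokens * cost_in + output_tokens * cost_out) / 1000
    margin = revenue - cost
    margin_pct = margin / revenue if revenue else 0.0
    return revenue, cost, margin, margin_pct

# Scenario from the text: 100M tokens resold at a flat $0.01/1K,
# with every token costed at the $0.005/1K input rate.
best_case = token_margin(100_000_000, 0, 0.01, 0.01, 0.005, 0.015)

# Same volume and resale price, but half the tokens are output tokens
# costed at $0.015/1K (hypothetical mix).
mixed_case = token_margin(50_000_000, 50_000_000, 0.01, 0.01, 0.005, 0.015)

for label, (rev, cost, margin, pct) in [("input only", best_case),
                                        ("50/50 mix", mixed_case)]:
    print(f"{label}: revenue=${rev:,.0f} cost=${cost:,.0f} margin={pct:.0%}")
```

A flat resale price that looks healthy on input-only math can fall to break-even once output tokens make up half the workload, which is why the worst-case mix is the one to model.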
Designing API Pricing Tiers
Tier breakpoints work best when they are anchored to your actual customer distribution rather than picked as round numbers. The standard playbook: sort customers by usage, place breakpoints at the 50th, 75th, 90th, and 95th percentiles (four breakpoints produce five tiers), and price each successive tier at a step-down rate (e.g. 100% → 80% → 60% → 40% of the pure rate for the first four tiers, with the top tier continuing the pattern). Tiered usage pricing follows three rules. First, the top tier exists to anchor enterprise, so leave it open-ended and never cap it. Second, never price below COGS in any tier (the per-tier margin engine in this tool flags bleeding tiers in red). Third, target 5%+ of customers in every tier; this is a practitioner heuristic rather than a hard rule, but an under-populated tier is a reliable signal that breakpoints need to move. The tier designer here lets you reshape breakpoints live and recomputes revenue and margin instantly. Use the "tier design for max revenue" reverse mode to grid-search optimal configurations across 3-, 4-, and 5-tier candidates while keeping margin above 50%.
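The percentile playbook can be sketched in a few lines. This is an illustration under stated assumptions: it uses the nearest-rank percentile definition (other definitions interpolate and give slightly different breakpoints), and the function names and sample data are hypothetical.

```python
import math

def percentile_breakpoints(usages, percentiles=(50, 75, 90, 95)):
    """Nearest-rank percentiles of customer usage, used as tier breakpoints."""
    ordered = sorted(usages)
    n = len(ordered)
    return [ordered[max(0, math.ceil(p * n / 100) - 1)] for p in percentiles]

def tier_population(usages, breakpoints):
    """Share of customers landing in each tier.

    A customer whose usage equals a breakpoint stays in the lower tier.
    """
    counts = [0] * (len(breakpoints) + 1)
    for u in usages:
        counts[sum(u > b for b in breakpoints)] += 1
    return [c / len(usages) for c in counts]

def under_populated(shares, minimum=0.05):
    """Indexes of tiers holding fewer customers than the 5% heuristic."""
    return [i for i, s in enumerate(shares) if s < minimum]

# Hypothetical base: 100 customers with usage 1, 2, ..., 100 units.
usages = list(range(1, 101))
breakpoints = percentile_breakpoints(usages)
shares = tier_population(usages, breakpoints)
print("breakpoints:", breakpoints)
print("tier shares:", shares)
print("under-populated tiers:", under_populated(shares))
```

On real data the usage distribution is far more skewed than this toy example, so the breakpoints will sit much closer together at the low end than at the top.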
Power-Law Customer Concentration and the Gini Coefficient
Metered SaaS exhibits one of the most extreme revenue distributions in software: the top 10% of customers regularly drives 70–80% of MRR. Three measures quantify this. The Gini coefficient ranges from 0 (perfect equality, every customer pays the same) to 1 (cliff, one customer pays everything). Below 0.5 is diversified, 0.5–0.7 is healthy power law, above 0.7 is concentrated and risky. The top-10% share is more intuitive: what percent of MRR comes from your top decile. Below 50% is diversified; above 75% is dangerous. The top-customer share is the existential one: if any single customer is more than 20% of MRR, churn risk is binary. Gini is computed using the standard discrete formula 2 × Σ(i × rᵢ) / (n × Σrᵢ) − (n+1)/n, where customers are sorted by revenue in ascending order, rᵢ is the revenue of the customer at rank i (starting at 1), and n is the number of customers. The tool runs it live across your synthetic customer base and renders the Lorenz curve so you can see whether your distribution looks like a gentle 70/30 bow or a near-vertical cliff at the right edge.
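The three concentration measures are short enough to compute directly. The sketch below implements the discrete Gini formula quoted above plus the top-decile and top-customer shares; the function names and the sample revenue list are illustrative assumptions.

```python
def gini(revenues):
    """Discrete Gini: 2 * sum(i * r_i) / (n * sum(r)) - (n + 1) / n,
    with revenues sorted ascending and ranks i starting at 1."""
    r = sorted(revenues)
    n, total = len(r), sum(r)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(r, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def top_share(revenues, fraction=0.10):
    """Share of total revenue coming from the top `fraction` of customers."""
    r = sorted(revenues, reverse=True)
    k = max(1, int(len(r) * fraction))
    return sum(r[:k]) / sum(r)

def top_customer_share(revenues):
    """Share of total revenue coming from the single largest customer."""
    return max(revenues) / sum(revenues)

# Hypothetical base: nine small customers and one whale.
mrr = [1] * 9 + [91]
print("Gini              :", round(gini(mrr), 3))
print("top-10% share     :", top_share(mrr))
print("top-customer share:", top_customer_share(mrr))
```

Note that with a finite number of customers the formula tops out at (n − 1)/n rather than exactly 1, so a ten-customer base where one customer pays everything scores 0.9.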
Gross Margin per Tier — Why Free Tiers Can Bleed Money
Per-tier margin analysis is what separates surface-level pricing math from a real diagnosis. The math: for each tier, revenue equals units consumed in that band × tier price; COGS equals units consumed × your unit cost. Margin is revenue minus COGS, and margin percentage is margin / revenue. The catch: free tiers and low first-band tiers often have negative margin if your unit cost approaches the tier price. A free tier with 1,000 included units at $0.0005/unit COGS costs you $0.50 per customer per month — for 10,000 free users that is $5,000/mo of pure burn before any conversion. Bleeding tiers are flagged in red below the ladder so the leak is visible. The healthy design rule: the top tier should generate enough margin dollars to subsidize lower tiers and still leave gross margin above 60%. If it cannot, you have a structural pricing problem — either reprice the top tier or reduce free quota.
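A minimal sketch of the per-tier margin math follows. The free-tier row reproduces the $5,000/mo burn from the example above; the paid band's volume and price are hypothetical numbers chosen only to show how a top band can subsidize the free tier, and the function names are illustrative.

```python
def tier_margins(band_units, tier_prices, unit_cost):
    """Revenue, COGS and margin for each pricing band.

    band_units[i] is the total number of units consumed inside band i
    across all customers; tier_prices[i] is that band's price per unit.
    """
    rows = []
    for units, price in zip(band_units, tier_prices):
        revenue = units * price
        cogs = units * unit_cost
        margin = revenue - cogs
        rows.append({
            "revenue": revenue,
            "cogs": cogs,
            "margin": margin,
            # Margin % is undefined for a free band with zero revenue.
            "margin_pct": margin / revenue if revenue > 0 else None,
            "bleeding": margin < 0,
        })
    return rows

def blended_margin_pct(rows):
    """Gross margin % across all bands combined."""
    revenue = sum(r["revenue"] for r in rows)
    return sum(r["margin"] for r in rows) / revenue if revenue else None

# Band 0: free tier, 10,000 users x 1,000 units at $0.0005/unit COGS.
# Band 1: hypothetical paid band, 40M units at $0.002/unit.
rows = tier_margins([10_000_000, 40_000_000], [0.0, 0.002], 0.0005)
for i, row in enumerate(rows):
    print(i, row)
print("blended margin %:", blended_margin_pct(rows))
```

Checking the blended figure against the 60% target shows whether the paid bands carry the free tier or whether the free quota needs to shrink.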
Consumption Predictability and Committed-Use Floors
Pure metered billing introduces a cash-flow problem: monthly revenue swings as customers consume more or less. The standard metric for this is the coefficient of variation (CoV), the ratio of the standard deviation of monthly MRR to mean MRR. As a working practitioner heuristic, CoV under 15% is rock-solid; 15–25% is normal for metered SaaS; above 40% is hard to defend in board meetings or fundraises, with the 25–40% range in between a gray zone. The tool simulates 12 months of usage with random-walk variance to produce a realistic CoV estimate. The fix is a committed-use floor: customers commit to a minimum monthly spend (say 20% of their average usage value) regardless of consumption. AWS, Snowflake, and most enterprise SaaS deploy committed-use discounts because they convert variance into predictable annual revenue. The slider here lets you set a 0–50% floor and watch CoV drop in real time. For a typical AI API with CoV of 35%, a 25% commitment floor commonly cuts CoV to roughly 14%, a finance-grade improvement.
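The CoV calculation and the effect of a committed-use floor can be sketched for a single customer as follows. Assumptions to note: the sketch uses the population standard deviation, defines the floor as a percentage of that customer's own average monthly spend, and uses a made-up six-month spend series; the tool's random-walk simulation is not reproduced here.

```python
import statistics

def cov(monthly_revenue):
    """Coefficient of variation: standard deviation / mean."""
    return statistics.pstdev(monthly_revenue) / statistics.mean(monthly_revenue)

def apply_floor(monthly_spend, floor_pct):
    """Bill at least floor_pct of the customer's average monthly spend."""
    floor = floor_pct * statistics.mean(monthly_spend)
    return [max(month, floor) for month in monthly_spend]

# Hypothetical customer with volatile monthly spend (in dollars).
spend = [100, 20, 180, 10, 150, 140]
floored = apply_floor(spend, 0.25)

print("billed with 25% floor:", floored)
print(f"CoV without floor: {cov(spend):.1%}")
print(f"CoV with floor   : {cov(floored):.1%}")
```

The floor only changes the months that fall below it, so the size of the CoV improvement depends on how often a customer's usage actually dips that low.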
Base + Overage vs Credit Packs vs Pure Metering
Base + overage answers a different question than pure metering. It charges a fixed monthly fee that includes a quota and bills overages per unit. Healthy plans see 40–60% of customers exceed the quota — the sweet spot. Below 40% the included quota is too generous (you are leaving expansion revenue on the table). Above 60% the base feels punitive (customers will churn or downgrade). Credit packs work differently: customers buy prepaid bundles (500 builds for $49, 2,000 for $149) and consume them over months. Pack-based pricing hides per-unit prices (margin protection), creates upfront cash (great for working capital), and pairs naturally with expiration to drive re-purchase. Pure metering is the cleanest alignment with value but produces the highest monthly variance. Most successful metered SaaS — Snowflake, Twilio, Datadog — actually run hybrids: a base subscription plus consumption above it. Toggle the structure dropdown to compare all four side by side against the same customer base.
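The 40–60% sweet-spot check is easy to run against a usage list. This is a minimal sketch: the function names, the diagnostic labels, and the sample usage figures are illustrative, and the thresholds are the heuristics stated above.

```python
def overage_share(usages, quota):
    """Fraction of customers whose usage exceeds the included quota."""
    return sum(u > quota for u in usages) / len(usages)

def diagnose_quota(share, low=0.40, high=0.60):
    """Classify a plan using the 40-60% heuristic."""
    if share < low:
        return "quota too generous"
    if share > high:
        return "base feels punitive"
    return "sweet spot"

# Hypothetical monthly usage for five customers on a 5,000-unit quota.
usages = [1_000, 3_000, 6_000, 8_000, 12_000]
for quota in (2_000, 5_000, 10_000):
    share = overage_share(usages, quota)
    print(f"quota={quota:>6}: {share:.0%} over quota -> {diagnose_quota(share)}")
```

Sweeping the quota across a range like this shows which included allowance puts the plan inside the target band for a given customer base.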
Migrating from Subscription to Usage-Based Pricing
"Is usage-based better than subscription for my SaaS?" — the only honest answer comes from modeling both against your real customer base. Three outcomes are common. Power users (top 10%) generate 2–4× more revenue under usage-based, while light users pay less. Net change is positive if your customer mix is power-law shaped (most metered SaaS); net change is negative if your customer base is uniform (typical SMB SaaS). Hybrid plans (subscription floor + usage above quota) capture most of the upside without the variance — this is why Snowflake, Datadog, and Twilio all converge on hybrid models. When migrating, grandfather existing customers for 6–12 months at the legacy price, run new logos on the new model, and protect revenue by setting the new floor at 80–90% of legacy ARPU. The Base + Overage preset above models exactly this transition. As a healthy-output checklist for infrastructure SaaS, target 60%+ gross margin, Gini between 0.55 and 0.65, and CoV under 20%. Anything outside those bands signals a pricing redesign is overdue.
Frequently Asked Questions
How do you calculate per-token pricing for an AI API?
Per-token pricing for an AI API multiplies tokens consumed by a $/token rate, often tiered by volume. Example: $0.005/1K tokens for 0–1M tokens, $0.003/1K for 1M–10M, $0.0015/1K for 10M+. Subtract per-token COGS (model inference, infrastructure) to get per-token margin, then multiply by the tokens in a call to get per-call margin. The tool above models per-token pricing across a synthetic customer base and shows margin by tier.
How do you design API pricing tiers?
Anchor breakpoints to customer-usage percentiles (50/75/90/95). Step prices down so larger customers feel rewarded, but each tier must stay above COGS. The tier designer above lets you drag breakpoints live and recomputes revenue and per-tier margin in real time.
What is the difference between usage-based pricing and subscription?
Subscription bills a fixed monthly fee regardless of usage. Usage-based pricing (also called consumption pricing) bills per unit consumed — per API call, per GB, per token. Subscription is more predictable but can underprice power users; usage-based aligns revenue to value but introduces monthly variance. Hybrid plans (base + overage) blend both.
How do you model consumption pricing for SaaS?
Simulate a customer base with a realistic usage distribution (most metered SaaS shows a power law: the top 10% drives 70%+ of revenue), price each customer through your tier engine, and stress-test monthly variance over 12 months. The Lorenz curve and coefficient-of-variation outputs in this tool quantify both concentration and predictability.
What is credit-based pricing and how is it calculated?
Credit-based pricing sells prepaid packs (e.g. 500 build-minutes for $49). Customers consume from the pack and buy more when depleted. Calculation: revenue = ceil(usage / packSize) × packPrice. Credit packs hide unit price (good for margin), expire to prevent hoarding, and create a predictable upfront-cash motion compared to pure metering.
How do you calculate LLM API pricing and margins?
LLM API margin = (price per 1K tokens × volume − inference cost × volume) / revenue. Inference cost varies by model — as of early 2026, GPT-4o sits near $0.005/1K input, Claude Haiku near $0.0008/1K, Llama on own infra near $0.0001–$0.0003/1K. Healthy LLM API gross margins typically land 50–70%; below 40% means head-tier customers are subsidizing the rest. The tool models the full distribution and flags bleeding tiers.
How do you price infrastructure SaaS (storage, compute)?
Infrastructure SaaS prices per GB-month, per compute-hour, or per request. Tiered volume discounts are standard (AWS, Snowflake, Datadog all use them). Aim for 60–70%+ gross margin at the median customer; the top tier subsidizes free / low tiers. The Infrastructure SaaS preset loads 2025-calibrated unit costs, tier breakpoints, and benchmark margin / Gini values.
What is power-law customer concentration and why does it matter?
A power-law customer concentration means a small minority of customers drives a large majority of revenue — typically the top 10% generates 70–80%+ of MRR in metered SaaS. It matters because losing one whale can crater quarterly revenue. The Gini coefficient (0=equal, 1=cliff) quantifies it: <0.5 is diversified, 0.5–0.7 is healthy power law, >0.7 is concentrated and risky. The tool computes Gini and top-N concentration live.
How do you price a developer API with a free tier?
Cap the free tier at usage that costs you under $1/customer/mo in COGS, so the freemium funnel pays for itself through conversion rather than burning cash. Then tier paid usage with a steep discount on the top tier to anchor enterprise. The free tier exists for adoption, not revenue. The Data API preset includes a 0–1K free tier so you can model the math directly.
How does base + overage pricing work?
Base + overage charges a flat monthly fee that includes a quota (e.g. $50/mo including 5,000 messages), then bills per unit beyond the quota at an overage rate (e.g. $0.008/message). On those example numbers, a customer who sends 8,000 messages pays $50 + 3,000 × $0.008 = $74. Healthy plans see 40–60% of customers exceed the quota, the sweet spot where the base feels valuable but expansion revenue is meaningful. Below 40% means the quota is too generous; above 60% means the base feels punitive.
Related SaaS Tools
- CAC Payback Calculator — match CAC payback to your usage-based ARPU
- LTV:CAC Ratio Visualizer — unit economics for metered plans
- Churn & NRR Calculator — net retention drives expansion revenue in usage-based models
- Pricing A/B Test Estimator — once your model works, optimize willingness-to-pay