OpenAI recently made headlines: its "compute margin" (the share of revenue left after covering the substantial server costs of running services like ChatGPT) reportedly reached 70% in October 2025. That is up from 35% in January 2024, a doubling in less than two years. The news, first reported by The Information, suggests a potential turning point for AI profitability at the foundation layer.

For B2B founders and investors, this shift is critical. A 35% gross margin often characterizes a services business, while 70% begins to resemble the high-margin profile of a software company. The immediate question: Have AI gross margins truly turned the corner, or is this merely creative accounting layered on top of substantial cash burn? More importantly for B2B startups, does this improvement at the foundation layer flow down to them?

The honest answer, as this analysis will explore, is likely not as much as many would hope.

The Headline Numbers Look Great, But Reality Is More Complicated

Traditional SaaS models are lauded for their unit economics: software is built once, hosted affordably, and the marginal cost per new customer approaches zero, leading to the 75-80% gross margins that attract investors. AI, however, disrupted this paradigm.

Every AI inference—each ChatGPT response or Copilot code suggestion—consumes significant computational power. GPUs and electricity are expensive, and processing billions of queries quickly escalates costs. In late 2023 and early 2024, the landscape was challenging:

  • OpenAI: Around 35% compute margins (early 2024).
  • Anthropic: Negative 94% to negative 109% gross margins in 2024, indicating infrastructure costs exceeded revenue.
  • GitHub Copilot: Microsoft reportedly lost over $20 per user per month at the $10/month price point.

Now, OpenAI's compute margin has surged to approximately 70% as of October 2025, and Anthropic expects its gross margin to reach 50% this year and 77% by 2028. These improvements at the foundation layer are genuinely positive developments.
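A margin figure hides how much the underlying cost actually moved. Assuming compute margin is defined as (revenue - compute cost) / revenue, as the definition above suggests, the compute cost per revenue dollar is simply 1 minus the margin. The minimal sketch below uses that assumption; the function names are hypothetical. It shows that going from 35% to 70% means compute cost per revenue dollar fell from about 65 cents to 30 cents, a cut of more than half, and that Anthropic's reported -94% implies roughly $1.94 of compute for every $1 of revenue.

```python
def compute_cost_share(margin: float) -> float:
    """Fraction of each revenue dollar spent on compute.

    Assumes margin = (revenue - compute_cost) / revenue, so cost / revenue = 1 - margin.
    A negative margin yields a share above 1.0 (compute costs exceed revenue).
    """
    return 1.0 - margin


def relative_cost_reduction(old_margin: float, new_margin: float) -> float:
    """Relative drop in compute cost per revenue dollar between two margins."""
    return 1.0 - compute_cost_share(new_margin) / compute_cost_share(old_margin)


if __name__ == "__main__":
    # Reported figures from the text above.
    print(f"OpenAI, Jan 2024: ${compute_cost_share(0.35):.2f} compute per $1 revenue")
    print(f"OpenAI, Oct 2025: ${compute_cost_share(0.70):.2f} compute per $1 revenue")
    print(f"Relative cost reduction: {relative_cost_reduction(0.35, 0.70):.0%}")
    print(f"Anthropic at -94%: ${compute_cost_share(-0.94):.2f} compute per $1 revenue")
```

Doubling the margin from 35% to 70% therefore required cutting compute cost per dollar of revenue by roughly 54%, not 50%.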

However, a crucial nuance is often overlooked: the decline in inference costs primarily applies to older models. Frontier models, conversely, are becoming more expensive, not cheaper.

The Treadmill Problem: Why B2B Startups Aren't Benefiting

This is where the optimistic narrative diverges for companies building AI applications. As one report noted, "Rather than falling as expected, the cost of some of the latest AI models has risen, as they use more time and computational resources to handle complicated, multistep tasks."

The rise of agentic workflows has driven a 10x-100x jump in token consumption per task since December 2023. Models like o3, DeepSeek R1, and Grok 4 use multi-step reasoning that generates massive outputs, and every token incurs a cost. One analysis of the same coding task found that an aggressive reasoning model generated 603 tokens versus 60 for a simpler model: roughly 10x the output, and therefore roughly 10x the cost at equal per-token pricing, for identical results. This is "token bloat."

In essence, while per-token costs may be falling, total costs per task are rising. This is the "treadmill problem" for B2B startups: constant pressure to deliver better results forces you onto more advanced models, which consume far more (and often pricier) reasoning tokens.
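The treadmill comes down to one line of arithmetic: cost per task = tokens per task × price per token. The sketch below reuses the 603-versus-60 token counts from the analysis cited above; the per-million-token prices are hypothetical placeholders, not any provider's rate card. It also models a scenario where the per-token price halves while tokens per task grow 10x.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    output_tokens: int        # tokens generated to finish one task
    price_per_million: float  # USD per 1M output tokens (hypothetical)

    def cost_per_task(self) -> float:
        return self.output_tokens * self.price_per_million / 1_000_000


# Token counts from the coding-task comparison; prices are illustrative only.
simple_model = TaskProfile(output_tokens=60, price_per_million=10.0)
reasoning_model = TaskProfile(output_tokens=603, price_per_million=10.0)

# Treadmill scenario: per-token price halves, but tokens per task grow 10x.
cheaper_but_bloated = TaskProfile(output_tokens=600, price_per_million=5.0)

if __name__ == "__main__":
    ratio = reasoning_model.cost_per_task() / simple_model.cost_per_task()
    print(f"Reasoning vs simple at the same price: {ratio:.2f}x")
    treadmill = cheaper_but_bloated.cost_per_task() / simple_model.cost_per_task()
    print(f"Half the per-token price, 10x the tokens: {treadmill:.1f}x")
```

Even with the per-token price cut in half, the task in the second scenario costs 5x more than before. That is the gap between falling token prices and rising task costs.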

For example, a SaaStr Fund portfolio company with $100M ARR projects $6M in incremental inference costs over the next 12 months. This isn't due to product failure; it's a strategic investment to outpace competitors, a 6-point gross margin sacrifice simply to stay ahead.
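The "6-point" figure is the extra cost divided by revenue. The sketch below makes that explicit; the $30M baseline cost of revenue is a hypothetical placeholder, since the source does not disclose the company's actual margin, and revenue is assumed to stay flat over the period.

```python
def gross_margin(revenue: float, cost_of_revenue: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cost_of_revenue) / revenue


def margin_points_lost(revenue: float, extra_cost: float) -> float:
    """Percentage points of gross margin lost when cost rises by extra_cost at flat revenue."""
    return 100.0 * extra_cost / revenue


ARR = 100_000_000             # from the example above
EXTRA_INFERENCE = 6_000_000   # projected incremental inference spend
BASELINE_COST = 30_000_000    # hypothetical: the source gives no baseline figure

if __name__ == "__main__":
    print(f"Margin hit: {margin_points_lost(ARR, EXTRA_INFERENCE):.1f} points")
    before = gross_margin(ARR, BASELINE_COST)
    after = gross_margin(ARR, BASELINE_COST + EXTRA_INFERENCE)
    print(f"Hypothetical gross margin: {before:.0%} -> {after:.0%}")
```

If ARR grows during those 12 months, the same $6M costs fewer points, so the figure is best read against today's revenue.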

This dynamic highlights a critical point: competition raises the performance bar at least as fast as falling token prices lower costs. While GPT-4-level performance has become more affordable, competitors are now deploying reasoning models and agentic workflows that consume significantly more tokens. Standing still in this environment is a recipe for failure.

OpenAI's own reasoning products, such as its Thinking and Deep Research modes, also carry higher operating costs because they demand more compute. Similarly, AI coding assistants are compelled to adopt the newest, most capable, and often most expensive LLMs, because model developers keep tuning each new release to be better at coding and debugging.

So, while GPT-3.5-level performance is dramatically cheaper than it was three years ago, users expect cutting-edge options like Claude Opus, GPT-5.2 Thinking, and genuinely effective reasoning, and application builders must bear that cost.

The Numbers That Actually Matter for B2B Startups

Looking at the application layer, where companies build products, the financial picture is stark. Bessemer's 2025 dataset shows rapidly growing AI "Supernovas" averaging only about 25% gross margins early on, while more stable "Shooting Stars" hover closer to 60%. Notably, many AI Supernovas exhibit negative gross margins, a rarity in traditional software.

To put this in context, a "good" gross margin for traditional SaaS is typically 75% or higher. Presenting 55% gross margins at a Series B funding round often leads to uncomfortable discussions about whether a company is truly a software business or a services operation. The average 25% gross margins for AI Supernovas represent a fundamentally different business model, often prioritizing market distribution over short-term profit.

Consider GitHub Copilot: reports indicated that heavy users cost Microsoft (GitHub's parent) as much as $80 per user per month in compute and model fees, and that Copilot lost more than $20 per user per month on average in early 2023. In other words, Microsoft was spending roughly $30 to serve each $10 subscriber, with far larger losses on power users. Microsoft can absorb such losses; most startups cannot.
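Per-user contribution (price minus cost to serve) shows why flat pricing breaks down with heavy users. The sketch below uses the reported $10 price and the roughly $30 average and $80 heavy-user costs. The "light user" who costs $2 a month to serve is a hypothetical assumption, included only to show how many profitable users it would take to cover one heavy user.

```python
import math

PRICE = 10.0  # USD per user per month, Copilot's price at the time


def monthly_contribution(price: float, cost_to_serve: float) -> float:
    """Subscription revenue minus compute cost for one user in one month."""
    return price - cost_to_serve


def users_to_offset(loss: float, profit_per_light_user: float) -> int:
    """How many profitable users it takes to cover one loss-making user."""
    return math.ceil(loss / profit_per_light_user)


if __name__ == "__main__":
    # Reported: about $30 average cost to serve, up to about $80 for heavy users.
    for label, cost in [("average user", 30.0), ("heavy user", 80.0)]:
        print(f"{label}: {monthly_contribution(PRICE, cost):+.0f} USD/month")
    # Hypothetical light user costing $2/month to serve, i.e. +$8 contribution.
    heavy_loss = -monthly_contribution(PRICE, 80.0)
    print(f"Light users needed per heavy user: {users_to_offset(heavy_loss, 8.0)}")
```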

The Cursor Evolution: From API Costs to Proprietary Models

The journey of Cursor (by Anysphere) offers a compelling case study. In mid-2025, Cursor exemplified the "thin wrapper" problem. One investment firm estimated Cursor was paying around $650 million annually to Anthropic while generating roughly $500 million in revenue, resulting in a negative 30% gross margin. Their AWS bills doubled in a single month when Anthropic introduced Priority Service Tiers.
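The -30% figure follows directly from the estimates: (500 - 650) / 500. The sketch below reproduces it and asks how far model spend would have to fall to reach the roughly 60% margin of Bessemer's "Shooting Stars." It treats the Anthropic bill as Cursor's only cost of revenue, which is a simplifying assumption.

```python
def gross_margin(revenue: float, cost_of_revenue: float) -> float:
    """Gross margin as a fraction of revenue (negative when costs exceed revenue)."""
    return (revenue - cost_of_revenue) / revenue


def max_cost_for_margin(revenue: float, target_margin: float) -> float:
    """Largest cost of revenue compatible with a target gross margin."""
    return revenue * (1.0 - target_margin)


REVENUE = 500.0      # USD millions, estimated annual revenue (mid-2025)
MODEL_SPEND = 650.0  # USD millions, estimated annual spend on Anthropic

if __name__ == "__main__":
    print(f"Gross margin: {gross_margin(REVENUE, MODEL_SPEND):.0%}")
    # Simplifying assumption: model spend is the only cost of revenue.
    target = max_cost_for_margin(REVENUE, 0.60)
    print(f"Spend allowed at 60% margin: ${target:.0f}M "
          f"(a {1 - target / MODEL_SPEND:.0%} cut)")
```

Under these assumptions, a 60% margin would require model spend of about $200M, roughly a 69% cut. Numbers of that size push a company toward building its own models.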

Cursor's strategic response was to build its own models. In October 2025, they launched "Composer," their first proprietary coding LLM. This reinforcement-learned mixture-of-experts model, specifically trained for agentic coding workflows, runs four times faster than comparable frontier models while maintaining similar quality. It was described as training "a big MoE model to be really good at real-world coding, and also very fast."

Benchmarks showed Composer matching "mid-frontier" systems (e.g., GPT-5 and Claude Sonnet 4.5) and generating at 250 tokens per second—twice as fast as leading fast-inference models. While Cursor still offers Anthropic, OpenAI, and Google models, it increasingly routes traffic to its own infrastructure.

By November 2025, Cursor achieved $1 billion in annualized revenue with a $29.3 billion valuation. Projections indicated gross margins improving from 74% to 85% by 2027 as they transitioned to a mix of open-source and proprietary models. As of December 2025, the company maintained single-digit monthly cash burn and $1 billion in cash reserves.

Cursor survived the margin squeeze by doing what most startups cannot: investing hundreds of millions of dollars in proprietary model infrastructure. Its CEO has said its in-house models "now generate more code than almost any other LLMs in the world." But this path required $3.5 billion in total funding and a willingness to burn cash for years, a playbook few B2B startups can reproduce.

The Real Framework: Why This Is Different From Traditional SaaS Economics

AI startups face a unique economic challenge: compute costs that scale super-linearly with model size and usage. Unlike traditional software companies, where marginal costs approach zero at scale, AI companies contend with GPU bills that can grow faster than revenue.

Traditional SaaS had variable costs like hosting, payment processing, and customer support, but these were modest relative to revenue and scaled sub-linearly. More users typically meant better economics. AI reverses this: a company with $10M ARR and $15M in compute costs appears identical to one with $10M ARR and $2M in compute costs when applying a 20x revenue multiple, yet their fundamental value differs vastly.
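The two hypothetical companies in the paragraph above can be compared side by side. This sketch treats compute as the only cost of revenue, which is a simplifying assumption. At the same 20x ARR multiple both are valued at $200M, but one loses $5M in gross profit while the other earns $8M.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    arr: float           # USD millions
    compute_cost: float  # USD millions per year; treated here as the only cost of revenue

    def revenue_valuation(self, multiple: float) -> float:
        return self.arr * multiple

    def gross_profit(self) -> float:
        return self.arr - self.compute_cost

    def gross_margin(self) -> float:
        return self.gross_profit() / self.arr


a = Company("A", arr=10.0, compute_cost=15.0)
b = Company("B", arr=10.0, compute_cost=2.0)

if __name__ == "__main__":
    for c in (a, b):
        print(f"Company {c.name}: ${c.revenue_valuation(20):.0f}M at 20x ARR, "
              f"gross profit {c.gross_profit():+.0f}M, margin {c.gross_margin():.0%}")
```

A valuation based on gross profit rather than revenue would separate the two immediately: -50% versus 80% gross margin.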

Three primary forces are compressing SaaS margins: rising cloud costs, AI inference expenses, and rising support salaries. SaaS companies that once boasted 85% margins are now adjusting to 60-70%, or are charging separately for AI so that their blended margin stays near 80%. The "AI tax" is compressing margins across the entire software industry, not just at AI-native companies.

What Actually Works: The Companies Getting Margin Math Right

So, what distinguishes companies successfully managing AI economics from those burning cash?

1. Intelligent Model Routing

Successful companies aren't using frontier models for every task. They implement routing layers that send simple queries to cheaper, smaller models and reserve frontier models for complex ones. The key question is whether a use case truly needs the top model on every request, or whether a more cost-effective model can meet the quality bar, with bursts to frontier models when necessary.
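Here is a minimal sketch of such a routing layer. It assumes two hypothetical model tiers with placeholder names and prices, plus a naive length-and-keyword heuristic standing in for whatever classifier a real product would use. The caller supplies `call_model` and `is_good_enough` (both hypothetical hooks, not any library's API). If a small-model answer fails the quality check, the router bursts to the frontier tier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_1k_tokens: float  # hypothetical price


# Hypothetical tiers: names and prices are placeholders, not real rate cards.
SMALL = Model("small-model", 0.0005)
FRONTIER = Model("frontier-model", 0.015)

# Naive keyword signals for "hard" requests; a real router might use a trained classifier.
COMPLEX_HINTS = ("refactor", "debug", "architecture", "multi-step")


def route(prompt: str, max_simple_chars: int = 400) -> Model:
    """Send long or keyword-flagged prompts to the frontier tier, the rest to the small model."""
    text = prompt.lower()
    if len(prompt) > max_simple_chars or any(hint in text for hint in COMPLEX_HINTS):
        return FRONTIER
    return SMALL


def answer(
    prompt: str,
    call_model: Callable[[Model, str], str],
    is_good_enough: Callable[[str], bool],
) -> str:
    """Try the routed model first; burst to the frontier tier if a small-model answer fails the check."""
    model = route(prompt)
    result = call_model(model, prompt)
    if model is not FRONTIER and not is_good_enough(result):
        result = call_model(FRONTIER, prompt)
    return result
```

With these placeholder prices the small tier is 30x cheaper per token. Every request it handles well is close to pure margin, and the fallback keeps quality from silently degrading.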

2. Usage-Based Pricing That Actually Works

A 2025 industry report found that 92% of AI software companies now use mixed pricing models, combining subscriptions with usage fees or offering tiered pricing for heavy usage, specifically to address margin issues. The "unlimited" usage model is becoming obsolete. By mid-2025, GitHub had announced that its formerly "unlimited" Copilot would include a generous allowance of AI requests, with additional charges for usage beyond that. Offering unlimited AI usage at a flat price means your margins end up subsidizing your power users.
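The sketch below compares a flat "unlimited" plan with a hybrid plan that includes an allowance and charges for overage. All prices, allowances, and per-request costs are hypothetical placeholders, not GitHub's or anyone else's actual rates. A typical user looks identical under both plans. The power user is where the flat plan's margin collapses.

```python
def monthly_bill(requests: int, base_fee: float, included: int, overage_rate: float) -> float:
    """Subscription fee plus a per-request charge beyond the included allowance."""
    overage = max(0, requests - included)
    return base_fee + overage * overage_rate


def gross_margin(revenue: float, cost: float) -> float:
    return (revenue - cost) / revenue


COST_PER_REQUEST = 0.02  # hypothetical inference cost per request, USD
FLAT_PRICE = 20.0        # hypothetical "unlimited" plan, USD/month

if __name__ == "__main__":
    for requests in (200, 3_000):  # a typical user and a power user (hypothetical)
        cost = requests * COST_PER_REQUEST
        hybrid = monthly_bill(requests, base_fee=20.0, included=300, overage_rate=0.04)
        print(f"{requests:>5} requests: flat margin {gross_margin(FLAT_PRICE, cost):.0%}, "
              f"hybrid margin {gross_margin(hybrid, cost):.0%} on a ${hybrid:.2f} bill")
```

Under these assumptions the power user turns a flat plan into a -200% margin, while the hybrid plan stays above 50% because the overage rate is set above the per-request cost.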

3. Building Value Beyond Token Markup

Replit's evolution illustrates this strategy. It charges $25/month for Core ($20/month billed annually), which includes $25 in usage credits. Critically, hosting an average customer's website costs Replit approximately $4, yielding an 80%+ margin on the hosting layer: classic SaaS economics. The real margin comes from services beyond basic AI assistance, such as hosting, deployments, storage, and bandwidth. Replit's Bounties marketplace also takes a clean 10% fee from posters, unrelated to inference costs.
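Layering revenue streams works because the blended margin is revenue-weighted. The sketch below takes only the $4 hosting cost and the 10% bounty fee from the text. Every other number (credit revenue and cost, hosting price, bounty volume, fee-processing cost) is a hypothetical placeholder chosen for illustration, not Replit's actual economics.

```python
from dataclasses import dataclass


@dataclass
class RevenueLayer:
    name: str
    revenue: float  # USD per customer per month
    cost: float     # USD per customer per month

    @property
    def margin(self) -> float:
        return (self.revenue - self.cost) / self.revenue


def blended_margin(layers: list[RevenueLayer]) -> float:
    """Revenue-weighted gross margin across all layers."""
    revenue = sum(layer.revenue for layer in layers)
    cost = sum(layer.cost for layer in layers)
    return (revenue - cost) / revenue


# Only the $4 hosting cost and the 10% bounty fee come from the text;
# every other number is a hypothetical placeholder.
per_customer_month = [
    RevenueLayer("AI usage credits", revenue=25.0, cost=22.0),
    RevenueLayer("hosting", revenue=20.0, cost=4.0),
    RevenueLayer("bounty fees (10% of $100)", revenue=0.10 * 100.0, cost=1.0),
]

if __name__ == "__main__":
    for layer in per_customer_month:
        print(f"{layer.name}: {layer.margin:.0%}")
    print(f"Blended: {blended_margin(per_customer_month):.0%}")
```

In this illustration the AI layer alone runs at a thin 12%, but the blended business clears 50%, so the hosting and marketplace layers do most of the margin work.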

Replit's overall gross margin has swung widely with LLM access costs, from negative 14% at the start of 2025 to a reported 36% by late 2025. By layering subscription revenue with high-margin hosting infrastructure and marketplace fees, Replit has built a model where AI is the hook but infrastructure provides the margin. It also moved from a flat 25 cents per coding task to "effort-based pricing" that can reach $2 for complex tasks, passing cost variability directly to users while keeping hosting revenue predictable. The companies surviving the margin squeeze build product depth beyond marking up API calls.
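Effort-based pricing can be sketched as a markup on each task's actual compute cost, clamped between a floor and a cap. The 1.5x markup, the $0.05 floor, and the task costs below are hypothetical; only the old $0.25 flat price and the $2 ceiling come from the text. This is not Replit's published formula.

```python
def effort_price(task_cost: float, markup: float = 1.5,
                 floor: float = 0.05, cap: float = 2.00) -> float:
    """Price a task as a markup on its actual compute cost, clamped to [floor, cap]."""
    return min(cap, max(floor, task_cost * markup))


def margin(price: float, cost: float) -> float:
    return (price - cost) / price


FLAT_PRICE = 0.25  # the old flat per-task price mentioned above

if __name__ == "__main__":
    for cost in (0.02, 0.10, 1.00):  # hypothetical easy, medium, and hard tasks
        price = effort_price(cost)
        print(f"cost ${cost:.2f}: flat margin {margin(FLAT_PRICE, cost):.0%}, "
              f"effort price ${price:.2f} at {margin(price, cost):.0%}")
```

The flat price earns a lot on easy tasks and loses badly on hard ones (-300% on the $1.00 task). Effort-based pricing holds a steadier margin. The cap still matters: with a 1.5x markup, margin starts shrinking once a task costs more than about $1.33 and turns negative above $2.00, so a real design would also need to bound task effort.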

4. Proprietary Models (The Nuclear Option)

Cursor's playbook demonstrates the potential of building proprietary models to escape the margin squeeze. Their Composer model, trained on real software engineering tasks using reinforcement learning, now handles most of their inference volume at a fraction of the cost of external APIs. However, this demands a nine-figure R&D commitment, including custom reinforcement learning infrastructure, thousands of NVIDIA GPUs, specialized MoE kernels, and hybrid sharded data parallelism—not a feasible path for most startups. A more realistic approach for many involves fine-tuning open-source models (e.g., Qwen, Llama, Mistral) for specific use cases and routing as much traffic as possible to these cheaper models, reserving frontier models for edge cases.
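The payoff from routing traffic to a fine-tuned open model depends on what share of requests it can absorb. The sketch below assumes hypothetical per-request costs (about $0.001 self-hosted versus $0.02 for a frontier API) and ignores the fixed cost of fine-tuning and serving. It shows that even at high routing shares, the remaining frontier tail dominates spend.

```python
def blended_cost(share_open: float, open_cost: float, frontier_cost: float) -> float:
    """Average cost per request when share_open of traffic goes to a fine-tuned open model."""
    if not 0.0 <= share_open <= 1.0:
        raise ValueError("share_open must be between 0 and 1")
    return share_open * open_cost + (1.0 - share_open) * frontier_cost


def frontier_share_of_spend(share_open: float, open_cost: float, frontier_cost: float) -> float:
    """Fraction of total spend still going to the frontier model."""
    return (1.0 - share_open) * frontier_cost / blended_cost(share_open, open_cost, frontier_cost)


OPEN_COST = 0.001     # hypothetical USD per request, self-hosted fine-tuned model
FRONTIER_COST = 0.02  # hypothetical USD per request, frontier API

if __name__ == "__main__":
    for share in (0.0, 0.8, 0.95):
        cost = blended_cost(share, OPEN_COST, FRONTIER_COST)
        print(f"{share:.0%} routed to open model: ${cost:.4f}/request, "
              f"{frontier_share_of_spend(share, OPEN_COST, FRONTIER_COST):.0%} of spend on frontier")
```

With 80% of traffic on the open model, average cost per request drops by 76%, yet the remaining 20% of frontier traffic still accounts for about 83% of spend. Shrinking the edge-case tail is where further savings come from.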

Is VC Subsidizing a Structurally Unprofitable Industry?

The current AI landscape is effectively subsidized. Even as inference costs fall 50-100x every few years, prices often remain below true economic cost, propped up by Big Tech, the leading labs, and their investors. OpenAI reportedly still burns $8 billion a year, and Anthropic burns billions more. AI coding startups that lose money on every user continue to raise at billion-dollar valuations.

The optimistic view holds that this is the "invest to win the market" phase and that unit economics will improve. Erik Nordlander, a general partner at Google Ventures, puts it this way: "The inference cost today, that’s the most expensive it’s ever going to be." The skeptical view is the treadmill described above: older models keep getting cheaper while the frontier models customers demand do not. Which view proves right remains to be seen.

The Bottom Line for B2B Founders

For founders building AI-native B2B products, the situation is nuanced:

  • The good news is real but limited. OpenAI and Anthropic's improved margins point to a more sustainable infrastructure layer, making it more likely that the APIs you depend on will remain available.
  • Benefits don't flow down automatically. While inference costs may fall significantly, these declines often apply to older models. Frontier models are becoming more expensive, and user expectations compel the use of these advanced, costly models.
  • The treadmill is real. You will face pressure to adopt better models, which will increase costs per task even if per-token costs decrease. Your margins will remain compressed unless actively managed.

The survival playbook for B2B AI startups is clear:

  1. Don't build thin wrappers. Develop deep workflow products with multiple revenue streams.
  2. Don't offer unlimited usage. Price your services according to your actual cost curve.
  3. Don't assume your API provider is your friend. They are often developing competing products.
  4. Don't ignore routing and optimization. Cost efficiency is paramount for winning companies.

Ultimately, the most critical metric isn't your current gross margin, but its trajectory. Are you becoming more efficient as you scale, or are you caught on a treadmill where every product improvement erodes your margins?

OpenAI's leap from 35% to 70% compute margin demonstrates that the treadmill can be escaped at the foundation layer. The open question is whether application-layer companies can achieve similar feats, or whether they are permanently trapped between declining API costs and escalating user expectations. My honest assessment: some will succeed spectacularly, many will struggle, and a significant amount of venture capital will be lost in the process.

Welcome to the real economics of AI in B2B.