FinOps & Beyond is what engineering, finance, and IT leaders read to understand FinOps, and what it means for operating models, accountability, and spend decisions.
General Tech Trends Analysis
Proactive FinOps: Stopping the Backlog at the Source
Last week, I discussed the reactive side. This week, the other half.
Reactive FinOps is throughput work. Items get cleaned and stay clean if the program runs on cadence with engineering ownership. The recurrence rate measures whether the cleanup is sticking. But recurrence above 40% is not a reactive problem — it is a design problem. The backlog is being regenerated faster than it can be cleared. Reactive work cannot fix that. The fix lives upstream, at the artifacts that record what gets built.
Two of those artifacts already exist in most engineering organizations. They are just not asked to carry cost.
Why Now
For forty years, SaaS economics rewarded carelessness about marginal cost. Build was expensive, distribution was nearly free, and 80% gross margins absorbed architectural waste without making it visible.
AI inverts that. In The Great Software Inversion, Guru Chahal documents the shift:
Companies that started as SaaS and added AI features are watching margins compress from the mid-80s into the 70s, or lower. Pure AI-native applications are running at 40-55% gross margins.
A 50% margin business cannot absorb the architectural carelessness an 80% margin business shrugged off. The same defaults and inefficient model selections that cost a slice of SaaS margin now eat into the operating model. Cost design at the artifact layer is what protects what is left.
That is the why. The how follows.
Cost as the Fourth Criterion
Issue 05 of FinOps & Beyond made the case: performance, reliability, security, cost. One decision, four criteria. Defaults are not cost-neutral. Cost becomes structural at the moment of configuration.
The follow-on question is mechanical. If cost is a criterion, it has to live somewhere. Slack threads and review minutes do not count. The artifacts that carry weight in engineering decisions are Architecture Decision Records (ADRs) and Product Requirements Documents (PRDs). Both already record tradeoffs. Neither typically records cost.
That is the gap.
The ADR Cost Profile
Most ADR templates are a variation of three sections: Context, Decision, Consequences. Cost shows up, when it shows up at all, as a vague mention under Consequences. That is not enough. The organizations I have seen succeed give cost its own section, so add a fourth and call it Cost Profile. It captures four things:
Baseline cost at expected launch volume
Cost driver (per-request, per-GB, per-hour, per-instance)
Scaling behavior at 10x volume
Nonlinearities or cliffs in the pricing model
Worked example. Decision: managed Postgres on Amazon RDS or self-managed on EC2.
The Decision section says RDS, for operational simplicity. The Consequences section notes the managed service tradeoff. The Cost Profile section says: ~$100/month baseline at db.t3.medium with Multi-AZ, IOPS-based pricing scales nonlinearly with I/O-heavy workloads, and cost at 10x volume runs 12–14x. Self-managed runs ~$50/month baseline but carries engineering time for patching, backup, and HA.
Now the decision is recorded with the cost shape attached. Six months later, when usage grows and the bill is up, the original review is documented — including what was known about scaling at the time. The artifact is what makes the decision auditable.
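One way to keep the four fields from being skipped is to store the Cost Profile as structured data next to the ADR. The sketch below is a minimal illustration, not a prescribed format: the class name CostProfile, its field names, and the method projected_cost_at_10x are hypothetical, and the cost driver wording is an assumption. The figures are the ones from the RDS example above, with the 12x to 14x multiplier taken as given rather than derived.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CostProfile:
    """Hypothetical structure for the four Cost Profile fields of an ADR."""
    baseline_monthly_usd: float             # baseline cost at expected launch volume
    cost_driver: str                        # per-request, per-GB, per-hour, per-instance
    multiplier_at_10x: Tuple[float, float]  # (low, high) cost multiplier at 10x volume
    nonlinearities: str                     # cliffs or nonlinear behavior in the pricing model

    def projected_cost_at_10x(self) -> Tuple[float, float]:
        """Return the (low, high) monthly cost range at 10x volume."""
        low, high = self.multiplier_at_10x
        return (self.baseline_monthly_usd * low, self.baseline_monthly_usd * high)


# Values from the worked RDS example; the driver label is an assumption.
rds_profile = CostProfile(
    baseline_monthly_usd=100.0,
    cost_driver="per-instance, plus IOPS",
    multiplier_at_10x=(12.0, 14.0),
    nonlinearities="IOPS-based pricing scales nonlinearly with I/O-heavy workloads",
)

low, high = rds_profile.projected_cost_at_10x()
print(f"10x volume: ${low:,.0f} to ${high:,.0f} per month")
```

Recording the multiplier as a range rather than a single number keeps the uncertainty visible when the ADR is reread six months later.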
The PRD Cost-Impact Field
PRDs are the parallel artifact on the product side. They already include success metrics, user impact, dependencies. Add a Cost Impact field.
It captures less than an ADR. Three lines:
Expected unit cost at launch volume (cost per user, per request, per inference)
Cost driver
Cost ceiling at which the feature stops being economically viable
The third line is the one that matters. Most features ship without an explicit ceiling. They are evaluated on user growth and retention. When the cost grows faster than the revenue attached to those users, the conversation happens in arrears.
A cost ceiling in the PRD changes that conversation. The product team commits to economics at write time, not at quarterly review. The engineering team has a number to design against. The finance team has a checkpoint that does not require dashboard archaeology.
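The three lines of the Cost Impact field can be sketched as a small record with a ceiling check. This is an illustration under assumptions: the class name CostImpact, its fields, and both methods are hypothetical, and the numbers are invented for the example rather than taken from any real feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostImpact:
    """Hypothetical structure for the three-line PRD Cost Impact field."""
    unit_cost_usd: float   # expected unit cost at launch volume
    cost_driver: str       # what the unit is: per user, per request, per inference
    ceiling_usd: float     # unit cost at which the feature stops being viable

    def is_viable(self, observed_unit_cost_usd: float) -> bool:
        """True while the observed unit cost is at or under the ceiling."""
        return observed_unit_cost_usd <= self.ceiling_usd

    def headroom(self, observed_unit_cost_usd: float) -> float:
        """Fraction of the ceiling still unused; negative means a breach."""
        return 1.0 - observed_unit_cost_usd / self.ceiling_usd


# Illustrative numbers only.
summarizer = CostImpact(unit_cost_usd=0.02, cost_driver="per inference", ceiling_usd=0.05)

print(summarizer.is_viable(0.03))
print(summarizer.is_viable(0.06))
```

The ceiling is the only field that can be checked mechanically after launch, which is why it is the one worth committing to at write time.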
OpenClaw in Practice (as a proxy)
OpenClaw is a useful test case. The architecture has one decision that dominates cost: which model handles which task. Every message routed through the gateway is an inference call. Every inference call has a price.
The candidates: free-tier models with prompt-tuning to compensate for quality, direct API access at per-token pricing, OAuth through Pro accounts at flat-rate pricing, and self-hosted open models with infrastructure cost. Each has a different Cost Profile shape. The free tier pushes work back onto prompt engineering. Per-token API access scales linearly — easy to budget, easy to overrun. The Pro account is flat until it is not — quotas and rate limits are nonlinearities that do not show up in a spreadsheet projection. Self-hosted only works above a sustained volume floor.
The Pro account beats per-token API pricing at low-to-mid message volume, but the math inverts above a threshold that depends on average tokens per message and which provider is being called. The threshold itself moves when prompts change, when the routing logic shifts more traffic to a stronger model, or when a new provider tier enters the market.
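The threshold can be made concrete with a small calculation. The sketch below rests on loud assumptions that the text above does not state: the flat plan is modeled as having a monthly token quota it cannot exceed, per-token pricing is a single blended rate, and every number is a placeholder rather than a real provider price. The function name flat_plan_window is hypothetical.

```python
def flat_plan_window(flat_monthly_usd, usd_per_million_tokens,
                     avg_tokens_per_message, quota_tokens_per_month):
    """Return (lower, upper) monthly message counts between which the flat plan wins.

    lower: volume at which per-token spend would equal the flat fee.
    upper: volume at which the assumed monthly token quota is exhausted.
    If lower > upper, the flat plan never wins under these assumptions.
    """
    lower = flat_monthly_usd * 1_000_000 / (usd_per_million_tokens * avg_tokens_per_message)
    upper = quota_tokens_per_month / avg_tokens_per_message
    return lower, upper


# Placeholder inputs: $20/month flat, $10 per million tokens, 10M-token quota.
short_prompts = flat_plan_window(20.0, 10.0, 2000, 10_000_000)
long_prompts = flat_plan_window(20.0, 10.0, 4000, 10_000_000)

print("2,000 tokens/message:", short_prompts)
print("4,000 tokens/message:", long_prompts)
```

Doubling the average tokens per message halves the upper bound in this model, which is the sense in which the threshold moves when prompts change. Rate limits, which act within a month rather than across it, are not modeled here.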
That is the kind of decision that does not survive a quarterly review. It survives an artifact that records it and gets revisited when the volume assumption changes.
Visibility Theater at the Decision Layer
Visibility Theater usually describes dashboards: the appearance of cost control without the ownership to act on it. It also exists earlier, at the decision layer.
Architecture reviews that nominally include cost. Design docs with a "cost considerations" line that says "we will monitor usage." PRDs that defer economics to post-launch optimization. All of it produces the appearance of cost-aware design without producing a number anyone can be held to.
ADRs and PRDs with required cost sections make that performance impossible. The artifact either has the number or it does not. The number is defended at review or it is not. There is no middle ground where "we considered cost" becomes a ceremonial line.
What This Solves
Reactive work cleans the backlog one item at a time. Cost-aware design changes what gets added to the backlog in the first place. The recurrence rate trends down not because the cadence improved, but because the upstream system stopped producing the same waste.
Reactive cleanup produces the signal. Every recurring item points to a design pattern regenerating cost faster than it can be cleared. That signal becomes input to the next ADR or PRD — the artifacts get sharper because the reactive work tells you which Cost Profile assumptions were wrong.
Without that loop, ADRs and PRDs become another form of Visibility Theater — written, filed, and ignored. With it, the reactive backlog gets shorter every quarter, not because cleanup got faster, but because fewer items are being created upstream.
If your recurrence rate is above 40%, your reactive program is not broken. Your design process is.

FinOps Signal
Structural Trend Quick Takeaway
Cost Annotation Rate: Coverage as the First Test
Most proactive FinOps programs talk about cost-aware design. Almost none measure whether the artifacts that drive design carry cost at all.
The most basic test of proactive discipline is whether ADRs and PRDs include cost as a written, defended section — not deferred, not "TBD," not punted to a follow-up ticket. Cost annotation rate measures that coverage.
It is the proactive analog to recurrence rate. Where recurrence tells you whether reactive cleanup is sticking, cost annotation tells you whether the upstream discipline is structural or whether teams are still working around it.
How to Define It
Cost annotation rate is a simple ratio. Numerator: ADRs and PRDs closed or merged this quarter where the cost section is completed with actual numbers. Denominator: total ADRs and PRDs closed or merged this quarter.
What counts as completed matters more than the formula. Specific dollar estimates, scaling assumptions, or a defended cost ceiling count. "We will monitor usage" or "see follow-up ticket" do not. A cost section that is present but empty is the same as no cost section at all — it just makes the metric look healthier than it is.
Be strict early. The metric is only useful if the bar is set, and the bar is much harder to raise once teams have shipped under a lower one.
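The ratio and the completeness bar can be sketched together. This is a minimal illustration, assuming artifacts are available as records: the Artifact class, its fields, and the PLACEHOLDERS set are hypothetical, and the placeholder matching is deliberately crude.

```python
from dataclasses import dataclass
from typing import List, Optional

# Phrases that do not count as a completed cost section (from the definition above).
PLACEHOLDERS = {"", "tbd", "we will monitor usage", "see follow-up ticket"}


@dataclass
class Artifact:
    kind: str                                  # "ADR" or "PRD"
    cost_estimate_usd: Optional[float] = None  # specific dollar estimate
    scaling_assumption: Optional[str] = None   # written scaling assumption
    cost_ceiling_usd: Optional[float] = None   # defended cost ceiling


def is_completed(a: Artifact) -> bool:
    """A section counts if it has a dollar figure, a ceiling, or a real scaling note."""
    text = (a.scaling_assumption or "").strip().lower().rstrip(".")
    has_text = text not in PLACEHOLDERS
    return a.cost_estimate_usd is not None or a.cost_ceiling_usd is not None or has_text


def cost_annotation_rate(artifacts: List[Artifact]) -> float:
    """Completed artifacts divided by all artifacts closed or merged in the quarter."""
    if not artifacts:
        raise ValueError("no artifacts closed or merged this quarter")
    return sum(is_completed(a) for a in artifacts) / len(artifacts)


quarter = [
    Artifact("ADR", cost_estimate_usd=100.0, scaling_assumption="12x to 14x at 10x volume"),
    Artifact("PRD", cost_ceiling_usd=0.05),
    Artifact("ADR", scaling_assumption="TBD"),
    Artifact("PRD"),  # cost section present but empty
]
print(cost_annotation_rate(quarter))
```

The empty PRD and the "TBD" ADR both land in the denominator only, so a present-but-empty section cannot inflate the rate.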
The AI Twist
AI features tend to ship faster than traditional infrastructure work. The artifact discipline that exists for a database migration or a network design rarely catches up to a model integration that lands in two sprints. The ADR is skipped. The PRD ships without a cost ceiling.
That is exactly the work that needs the most cost annotation, because it sits inside the inference cost structure compressing margins industry-wide. A cost annotation rate that drops specifically on AI-related artifacts is a leading indicator of margin erosion two quarters out.
The Diagnostic Threshold
Under 50% Coverage: cost is being skipped as the default. The artifact policy exists in name only.
Between 50% and 80% Coverage: cost is being captured selectively, usually on artifacts where the engineer or product manager already cared. The discipline is uneven, dependent on individual judgment rather than process.
Above 90% Coverage: cost is structural. At that point the work shifts to a different question — whether the numbers in those artifacts are any good. That is downstream, and it is a better problem to have.
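The bands translate into a short lookup. In this sketch the function name coverage_band and the band labels are hypothetical, and the boundary handling is an assumption. The thresholds as written leave coverage above 80% and up to 90% unnamed, so the code reports that range as unclassified rather than guessing which band it belongs to.

```python
def coverage_band(rate: float) -> str:
    """Map a cost annotation rate (0.0 to 1.0) to the diagnostic bands listed above."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    if rate < 0.50:
        return "skipped by default"   # policy exists in name only
    if rate <= 0.80:
        return "selective"            # depends on individual judgment
    if rate > 0.90:
        return "structural"           # next question is estimate quality
    return "unclassified"             # above 80% up to 90%: not named in the bands


for r in (0.30, 0.65, 0.85, 0.95):
    print(f"{r:.0%}: {coverage_band(r)}")
```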
Coverage Is Not Quality
A high cost annotation rate does not mean the cost analysis is correct. It only means it exists. The numbers can still be wrong. The scaling assumptions can still be optimistic. The cost ceiling can still be set at a level no one defends.
Quality is the next problem. Coverage is the first one. You cannot evaluate whether the cost numbers in your ADRs are accurate if half your ADRs do not have cost numbers in them.
If you do not know your cost annotation rate, you cannot say whether your proactive FinOps program is working or whether it is theater.

FinOps Industry
News or Market Updates - Open Source
The Pattern from Issue 01, At Scale
On April 23, Meta and Microsoft announced workforce actions on the same day. Combined, the actions affect up to 23,000 positions.
Meta is cutting roughly 8,000 jobs — 10% of its workforce — and cancelling 6,000 open roles, effective May 20. Microsoft launched its first-ever voluntary retirement program: a "Rule of 70" buyout (age plus tenure ≥ 70) covering up to 8,750 US employees, with notifications going out May 7.
Both companies reported record revenues. Both are simultaneously spending record amounts on AI infrastructure. Microsoft's fiscal 2026 capex sits at $145 billion. Meta has guided 2026 expenses to $162–169 billion, driven by infrastructure costs and AI talent compensation. Microsoft's AI and Copilot teams were explicitly exempted from the buyout. This is not a workforce reduction; it is a workforce composition change.
Issue 01 named the pattern in March:
Organizations are not simply reducing cost. They are reallocating cost. Some of that cost is being removed from payroll. But a growing share is being redirected toward infrastructure — particularly the infrastructure required to train and run AI systems.
Block, Amazon, and Oracle were the early examples. Microsoft and Meta are the same pattern, six weeks later, at greater scale and with less ambiguity. The press is now saying it directly: The Next Web described April 23 as Big Tech "converting payroll into AI capital expenditure." That is the structural shift Issue 01 traced, now visible enough that it does not require interpretation.
Fixed labor cost becomes variable infrastructure cost. The dollars do not leave the company — they cross from one line item to another, and the second one scales with usage. The unit economics question Issue 01 raised is the operational consequence.
FinOps Company Spotlight
If you would like your company included in the Spotlight, contact the CloudXray AI Team

Company: Beakpoint
Category: Cost Intelligence & Visibility
What They Do: The platform uses activity-based costing to connect every dollar of cloud spend to customers, features, and activities, transforming cloud bills into actionable business intelligence.
Why It Matters: Businesses want customer-level margin data.

Operator Playbook
Practical guide for leaders and practitioners
Building, Measuring, and Reporting Cost in ADRs and PRDs
The main piece made the case for ADRs and PRDs as the artifacts that carry cost. This is the execution.
Cost-aware design works when the artifacts are required, the entries are concrete, the measurement is structured, and the data feeds back into the next round. Six rules.
1. Make the Cost Section Required, Not Optional
ADR templates and PRD templates both need a cost section as a required field. An empty submission gets blocked or flagged at review, the same way a merge is blocked until unit tests pass.
A waiver is allowed for a specific artifact, but only with a written reason and a follow-up date. Permanent waivers are not waivers; they are policy carve-outs, and they should be visible to engineering leadership.
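The gate and the waiver rule can be sketched as a single review check. This is a minimal illustration, assuming the review tool can call a function at submission time: the names Waiver and review_gate and the four outcome strings are hypothetical, and a missing follow-up date is used here to model a permanent waiver.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Waiver:
    reason: str                # written reason, required
    follow_up: Optional[date]  # None models a permanent waiver


def review_gate(cost_section_completed: bool, waiver: Optional[Waiver] = None) -> str:
    """Decide what happens to an ADR or PRD at review time."""
    if cost_section_completed:
        return "pass"
    if waiver is None or not waiver.reason.strip():
        return "block"             # empty cost section and no written reason
    if waiver.follow_up is None:
        return "escalate"          # policy carve-out: surface to engineering leadership
    return "pass-with-waiver"      # temporary, with a date to revisit


print(review_gate(False))
print(review_gate(False, Waiver("vendor pricing not yet published", date(2026, 1, 15))))
print(review_gate(False, Waiver("legacy system", None)))
```

Returning a distinct outcome for the no-date case keeps permanent carve-outs countable instead of letting them blend in with ordinary waivers.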
2. Define What "Completed" Means
The cost section is only useful if the bar is concrete.
ADRs: baseline cost at launch volume, cost driver, scaling estimate at 10x volume
PRDs: unit cost at launch, cost driver, cost ceiling at which the feature stops being viable
"TBD," "we will monitor usage," and "see follow-up ticket" do not count. Be strict early. The bar is much harder to raise once the team has shipped under a lower one.
3. Track Three Numbers, Not One
Coverage is the first metric. Two more give you accuracy.
Cost annotation rate — percentage of ADRs and PRDs with a completed cost section
Estimate variance — actual cost at 90 days vs. the estimate written into the artifact
Ceiling adherence — percentage of features still under their PRD cost ceiling at six months
The first tells you if the discipline exists. The second tells you if the estimates are credible. The third tells you if the ceilings are real.
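The second and third numbers reduce to short calculations. The sketch below is one possible definition, not the only one: estimate variance is assumed to be a signed relative miss against the written estimate, ceiling adherence is assumed to count a feature as adherent when its cost is at or under the ceiling, and the function names and sample figures are hypothetical.

```python
from typing import List, Tuple


def estimate_variance(actual_usd_90d: float, estimated_usd: float) -> float:
    """Signed relative miss: 0.5 means actual cost ran 50% over the estimate."""
    if estimated_usd <= 0:
        raise ValueError("estimate must be positive")
    return (actual_usd_90d - estimated_usd) / estimated_usd


def ceiling_adherence(features: List[Tuple[float, float]]) -> float:
    """Share of features at or under their PRD cost ceiling at six months.

    Each item is (unit_cost_at_six_months, prd_cost_ceiling).
    """
    if not features:
        raise ValueError("no features to score")
    under = sum(1 for cost, ceiling in features if cost <= ceiling)
    return under / len(features)


# Illustrative figures only.
print(estimate_variance(1500.0, 1000.0))
print(ceiling_adherence([(0.03, 0.05), (0.06, 0.05), (0.05, 0.05), (0.10, 0.08)]))
```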
4. Sample, Do Not Audit
Reviewing every ADR and every PRD is unrealistic. Pick a sample — ten ADRs and ten PRDs from the prior quarter, drawn at random. Score each on the three numbers. Extrapolate.
The point is the trend, not the audit. Engineers will resist a full audit. They will tolerate a sample.
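Drawing the sample is a few lines with the standard library. In this sketch the function name draw_review_sample and the ID format are hypothetical; the optional seed is an added convenience so a reviewer can reproduce the same draw later.

```python
import random


def draw_review_sample(adr_ids, prd_ids, per_kind=10, seed=None):
    """Draw up to per_kind ADRs and per_kind PRDs at random, without replacement."""
    rng = random.Random(seed)
    adr_pool, prd_pool = list(adr_ids), list(prd_ids)
    adrs = rng.sample(adr_pool, min(per_kind, len(adr_pool)))
    prds = rng.sample(prd_pool, min(per_kind, len(prd_pool)))
    return adrs, prds


# Hypothetical IDs for artifacts closed in the prior quarter.
adr_ids = [f"ADR-{i:03d}" for i in range(1, 41)]
prd_ids = [f"PRD-{i:03d}" for i in range(1, 26)]

adrs, prds = draw_review_sample(adr_ids, prd_ids, seed=7)
print(len(adrs), "ADRs and", len(prds), "PRDs selected for scoring")
```

If a quarter produced fewer than ten artifacts of one kind, the function returns all of them rather than failing.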
5. Report Quarterly, Not Continuously
Monthly is too tight; the data does not move that fast. Quarterly is right.
The readout has three sections:
Coverage trend this quarter vs. prior
Top three estimate misses, with the assumption that was wrong
Ceiling breaches, with what is being done about them
The audience is engineering leadership, finance, and product. Not the FinOps team alone. The metric does not change behavior unless the people whose decisions produced it see it.
6. Feed the Signal Back into the Templates
Estimate variance and ceiling breach data are inputs to the next round of artifacts. If RDS scaling estimates are consistently off by 3x, the ADR template gets a more conservative scaling guidance note. If AI features routinely breach their ceilings within two quarters, the PRD template requires a stricter cost driver definition.
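Spotting which template needs a guidance note can be sketched as a grouping step over scored artifacts. The assumptions here are mine: "consistently off" is read as a median actual-to-estimate ratio at or above a threshold across a minimum number of artifacts, and the function name, category labels, thresholds, and figures are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def templates_needing_guidance(records, ratio_threshold=2.0, min_samples=3):
    """Return {category: median actual/estimate ratio} for categories that miss consistently.

    records: iterable of (category, estimated_usd, actual_usd).
    """
    ratios = defaultdict(list)
    for category, estimated, actual in records:
        ratios[category].append(actual / estimated)
    return {
        category: median(values)
        for category, values in ratios.items()
        if len(values) >= min_samples and median(values) >= ratio_threshold
    }


# Illustrative scored artifacts: (category, estimate, actual at 90 days).
records = [
    ("rds-scaling", 100.0, 300.0),
    ("rds-scaling", 200.0, 620.0),
    ("rds-scaling", 50.0, 140.0),
    ("cdn", 100.0, 110.0),
    ("cdn", 80.0, 85.0),
    ("cdn", 60.0, 60.0),
]
print(templates_needing_guidance(records))
```

Using the median rather than the mean keeps one extreme miss from flagging a category whose estimates are otherwise sound.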
The artifacts get sharper because the measurement points to where they were wrong. That is the loop the main piece described, executed at the template level.
Without it, the annotation rate climbs but the numbers in those annotations stay wrong. With it, the artifacts converge on accuracy over time, and the recurrence rate from last week's reactive program drops as a side effect.

