How We Measure Quality

In plain terms: Forge does not get to call itself trustworthy. It has to earn that trust against specific, measurable targets. A capability that misses these targets stays in a low-autonomy mode until it improves.

The targets

These are the numbers the system holds itself accountable to before any production pilot:

Metric	Target	Why it matters
Citation accuracy	> 95%	Every material claim should cite a real source
Rule applicability precision	> 90%	False positives create review burden
Rule applicability recall	> 85%	Missed rules can be dangerous
Missing-evidence detection recall	> 90%	Foundational for compliance and readiness
Refusal quality	> 98%	Abstain when evidence is insufficient
Numeric geometry grounding	100%	Every numeric geometry claim cites a real measurement
Tenant isolation tests	100% pass	Zero cross-tenant retrieval allowed, by construction
Decision replay completeness	> 95%	Approved decisions must be replayable from sources and evidence
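To make the table concrete, here is a minimal sketch of how measured results could be checked against these targets. The names `Target`, `TARGETS`, and `missed_targets`, and the metric keys, are hypothetical and only illustrate the idea; they are not Forge's actual implementation. Note the difference between the strict targets (for example "> 95%") and the two 100% targets, which must be met exactly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Target:
    """One row of the targets table (hypothetical representation)."""
    name: str
    threshold: float  # fraction, e.g. 0.95 for 95%
    strict: bool      # True for "> threshold", False for "must reach 100%"

    def is_met(self, measured: float) -> bool:
        if self.strict:
            return measured > self.threshold
        return measured >= self.threshold


# Thresholds copied from the table above; the key names are assumptions.
TARGETS = [
    Target("citation_accuracy", 0.95, strict=True),
    Target("rule_applicability_precision", 0.90, strict=True),
    Target("rule_applicability_recall", 0.85, strict=True),
    Target("missing_evidence_detection_recall", 0.90, strict=True),
    Target("refusal_quality", 0.98, strict=True),
    Target("numeric_geometry_grounding", 1.00, strict=False),
    Target("tenant_isolation_pass_rate", 1.00, strict=False),
    Target("decision_replay_completeness", 0.95, strict=True),
]


def missed_targets(measured: Dict[str, float]) -> List[str]:
    """Return the names of missed targets.

    A metric with no measurement counts as missed, so a capability
    cannot pass by simply not being evaluated.
    """
    return [
        t.name
        for t in TARGETS
        if t.name not in measured or not t.is_met(measured[t.name])
    ]
```

For example, a citation accuracy of exactly 95% would still count as a miss here, because the target is stated as strictly greater than 95%.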

How the targets are enforced

The targets are not aspirational. They gate what the AI is allowed to do:

A capability that misses these targets stays in observe-only or draft-only autonomy. It does not get promoted to propose-with-review or execute-after-approval until it earns the promotion through observed performance.
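The gating rule above can be sketched as a cap on autonomy. This is an illustration under stated assumptions, not the actual Forge mechanism: the `Autonomy` enum, the ordering of its levels, and the `allowed_autonomy` function are hypothetical, and the sketch treats "meets all targets" as a single boolean supplied by whatever measures performance.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Autonomy levels named in this page; the ordering is an assumption."""
    OBSERVE_ONLY = 0
    DRAFT_ONLY = 1
    PROPOSE_WITH_REVIEW = 2
    EXECUTE_AFTER_APPROVAL = 3


def allowed_autonomy(requested: Autonomy, meets_all_targets: bool) -> Autonomy:
    """Cap a capability's autonomy based on whether it meets the targets.

    A capability that misses any target may not go above draft-only.
    A capability that meets them may run at the requested level.
    """
    if meets_all_targets:
        return requested
    return min(requested, Autonomy.DRAFT_ONLY)
```

With this shape, a capability requesting execute-after-approval while missing a target is held at draft-only, and one already in observe-only stays there.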

This connects directly to L7 · Outcome Observer, which measures real-world performance, and to L5 · Capability Catalog, where autonomy levels live.

Reading a few of these

Refusal quality > 98% is an unusual target. It encodes the principle that abstaining when evidence is missing is good behavior, not a failure. See Where Humans Stay in Control.
Numeric geometry grounding 100% is non-negotiable: every numeric geometry claim must cite a real measurement. See L3 · CAD & Geometry World Model.
Tenant isolation 100% pass means any attempt to read another yard’s data must fail by construction. See Keeping Your Data Yours.
