Verification Engineering

Closing Functional Coverage Under Tape-Out Pressure

Hiroshi Watanabe · 8 min read

The scenario is familiar to most DV engineers: the tape-out date is fixed, the schedule has slipped for reasons outside your control, and you have four weeks to do what the plan allocated eight weeks for. Coverage closure is not optional — the signoff criteria are set — but you cannot simply double the simulation farm budget and halve the wall-clock time.

What you actually do in those four weeks is a triage problem, not a brute-force simulation problem. The bins you need to close are not equal in value. Some represent critical control paths through the design. Others are edge cases that will never fire in any real workload but are technically open because your constrained-random testbench never generated a 3-beat AXI burst with a particular byte-enable pattern at a specific queue depth. Distinguishing between those two classes is hard under schedule pressure, when everyone is exhausted and management is asking for daily status.

First: Classify the Open Bins

Before running another simulation, the most valuable hour you can spend is a manual pass through your open coverage bins to classify them by potential bug-impact severity. This is not a mechanical process. It requires someone who knows the design architecture.

A rough three-tier classification:

  • Tier 1 — must close: open bins on datapath control logic, protocol handshake edge cases, error-handling FSM states, and any bins with a known architectural dependency on the tape-out feature list. If a bug in this class escapes, it costs a re-spin.
  • Tier 2 — close if time allows: open bins on peripheral functionality, graceful degradation paths, and coverage that's technically required by the sign-off plan but on paths with low operational frequency in the target workload.
  • Tier 3 — waive with documentation: bins that represent architectural combinations the design spec explicitly excludes, or combinations that require a stimulus sequence that is provably unreachable under the constraints of your constrained-random testbench. These need formal waiver documentation, not more simulation time.

Most DV teams in practice do a rough version of this classification in their heads. Making it explicit — actually writing the tier down for each open bin cluster — forces the disagreements onto the table before they become implicit assumptions. You want the DV lead and the RTL architect looking at the same classification before you prioritize simulation resources.
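One way to make the classification explicit is to keep it as data rather than in people's heads. The following is a minimal sketch, assuming a hand-maintained list of bin clusters; the `Tier` and `BinCluster` names, the cluster labels, and the bin counts are all hypothetical and not taken from any coverage tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    MUST_CLOSE = 1        # a bug escaping here costs a re-spin
    CLOSE_IF_TIME = 2     # low operational frequency in the target workload
    WAIVE_WITH_DOCS = 3   # spec-excluded or provably unreachable


@dataclass(frozen=True)
class BinCluster:
    name: str        # hypothetical cluster label
    block: str       # design block the bins belong to
    open_bins: int
    tier: Tier
    rationale: str   # why this tier; agreed by DV lead and RTL architect


def by_priority(clusters):
    """Sort so Tier 1 comes first, larger clusters first within a tier."""
    return sorted(clusters, key=lambda c: (c.tier, -c.open_bins))


# Illustrative data only.
clusters = [
    BinCluster("uart_parity_modes", "uart", 40, Tier.CLOSE_IF_TIME, "peripheral path"),
    BinCluster("ecc_retry_fsm", "memctl", 14, Tier.MUST_CLOSE, "error-handling FSM"),
    BinCluster("reserved_burst_types", "axi_if", 6, Tier.WAIVE_WITH_DOCS, "excluded by spec"),
]

for c in by_priority(clusters):
    print(f"Tier {int(c.tier)}  {c.block:8s} {c.name:22s} {c.open_bins:3d} open  ({c.rationale})")
```

The `rationale` field is the part worth arguing over in review: it is where a disagreement between the DV lead and the RTL architect becomes visible.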

The Directed-Test Decision

For Tier 1 bins that random simulation hasn't closed after a reasonable budget of simulation cycles, you're writing directed tests. The question is how to write them efficiently under time pressure.

The trap is writing directed tests that are too narrow — tests that hit exactly one uncovered bin and close it in isolation. Narrow directed tests are fast to write but create maintenance debt if the design changes, and they don't generalize to related bins. Better directed test design covers a cluster of semantically related bins with a single coherent test scenario. A test that exercises the memory controller's retry-on-ECC-error path with a constrained burst length range can close 12-15 related bins in one structured scenario instead of 12-15 narrow scripts.

Cluster identification — which bins are semantically related and can be addressed by one scenario — is where we've found Photoniq's ranked recommendations most useful. The model identifies not just which individual bins to target but which bins cluster together because they share a common stimulus precondition. In the ECC retry example, the model recognizes that the 14 open bins all depend on a specific 3-phase sequence: inject error → trigger retry FSM → observe retry completion with valid data. One structured test scenario, not 14.
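The clustering idea itself can be illustrated without any tooling. The sketch below is not Photoniq's method; it is a simple stand-in that assumes each open bin has been tagged by hand with the stimulus steps it depends on, and groups bins whose step sequences are identical. All bin names and step labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical open bins, each tagged by hand with the stimulus steps it needs.
open_bins = {
    "retry_burst_len_4":   ("inject_error", "trigger_retry_fsm", "observe_completion"),
    "retry_burst_len_8":   ("inject_error", "trigger_retry_fsm", "observe_completion"),
    "retry_burst_len_16":  ("inject_error", "trigger_retry_fsm", "observe_completion"),
    "wr_queue_full_stall": ("fill_write_queue", "issue_write"),
    "wr_queue_full_drain": ("fill_write_queue", "issue_write"),
    "refresh_during_idle": ("wait_refresh_interval",),
}


def cluster_by_precondition(bins):
    """Group bin names by their identical stimulus precondition sequence."""
    groups = defaultdict(list)
    for name, precondition in bins.items():
        groups[precondition].append(name)
    return dict(groups)


scenarios = cluster_by_precondition(open_bins)

# Largest clusters first: these are the best candidates for one structured test.
for precondition, names in sorted(scenarios.items(), key=lambda kv: -len(kv[1])):
    print(f"{len(names)} bins via {' -> '.join(precondition)}: {sorted(names)}")
```

Each key in `scenarios` corresponds to one candidate directed-test scenario, so six open bins here reduce to three tests.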

Simulation Resource Allocation Under Constraint

When you have a fixed simulation farm budget — say, 5,000 simulation-hours available over 3 weeks — how you allocate it across design blocks matters as much as which tests you run.

The common failure mode is proportional allocation: assign simulation hours to design blocks in proportion to their current open-bin count. This sounds fair but is wrong. A block with 200 open bins that are all Tier 2 peripheral-path bins should get fewer hours than a block with 30 open bins that are all Tier 1 control-path coverage.

A better allocation heuristic is expected coverage gain per simulation hour, weighted by bin tier. This requires estimating the simulation cost to close each bin cluster, which is a domain judgment call, but even a rough order-of-magnitude estimate (some clusters close with a 100-cycle directed test; others need 10,000-cycle random runs before the condition appears) produces a meaningfully better allocation than a proportional split.
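As a rough sketch of that heuristic, the code below funds clusters greedily in descending order of tier-weighted bins closed per estimated hour. The tier weights, the hour estimates, and the block names are assumptions chosen for illustration; the 200-bin and 30-bin figures and the 5,000-hour budget echo the example above.

```python
from dataclasses import dataclass

# Assumed weights: how much more a closed Tier 1 bin is worth than a Tier 2 bin.
TIER_WEIGHT = {1: 10.0, 2: 1.0}


@dataclass
class Cluster:
    block: str
    tier: int
    open_bins: int
    est_hours: float  # rough order-of-magnitude estimate to close the cluster

    @property
    def gain_per_hour(self):
        return TIER_WEIGHT[self.tier] * self.open_bins / self.est_hours


def allocate(clusters, budget_hours):
    """Greedy allocation: fund clusters in descending weighted gain per hour."""
    plan, remaining = {}, budget_hours
    for c in sorted(clusters, key=lambda c: c.gain_per_hour, reverse=True):
        granted = min(c.est_hours, remaining)
        if granted <= 0:
            break
        plan[c.block] = plan.get(c.block, 0.0) + granted
        remaining -= granted
    return plan


clusters = [
    Cluster("periph", tier=2, open_bins=200, est_hours=4000),  # hypothetical estimate
    Cluster("memctl", tier=1, open_bins=30, est_hours=3500),   # hypothetical estimate
]
plan = allocate(clusters, budget_hours=5000)
print(plan)
```

With these numbers the Tier 1 block is funded in full (3,500 hours) and the peripheral block receives the remaining 1,500, whereas a split proportional to open-bin count would have given the peripheral block roughly 4,350 hours.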

One concrete practice that helps: daily coverage snapshots with explicit delta tracking. The overall coverage percentage moves too slowly to be useful. What you want instead is a delta report showing which bins were closed in the last 24-hour regression run and which Tier 1 bins are still open. This makes progress visible and shows when a bin cluster is resisting closure longer than expected, which is itself a signal that either the test strategy is wrong or an underlying RTL bug is preventing closure.
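A delta report of this kind needs nothing more than set arithmetic over two snapshots of open bin names. The sketch below assumes the snapshots have already been exported from the coverage database as plain sets; the export step, the bin names, and the `tier_of` mapping are hypothetical.

```python
def coverage_delta(previous_open, current_open, tier_of):
    """Compare open-bin sets from two consecutive regression runs."""
    closed = sorted(previous_open - current_open)
    # Bins that reopened or appeared, for example after an RTL or covergroup change.
    newly_open = sorted(current_open - previous_open)
    tier1_open = sorted(b for b in current_open if tier_of.get(b) == 1)
    return {"closed": closed, "newly_open": newly_open, "tier1_still_open": tier1_open}


# Illustrative data only.
tier_of = {"retry_len_4": 1, "retry_len_8": 1, "uart_parity_odd": 2, "wr_queue_full": 1}
yesterday = {"retry_len_4", "retry_len_8", "uart_parity_odd"}
today = {"retry_len_8", "wr_queue_full"}

report = coverage_delta(yesterday, today, tier_of)
for section, bins in report.items():
    print(f"{section}: {bins}")
```

A bin that shows up in `tier1_still_open` day after day is the one to investigate first.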

Formal Verification as a Coverage Closure Accelerant

For specific bin clusters that are provably reachable but where constrained-random simulation isn't generating the required stimulus efficiently, bounded model checking can close the bin in a fraction of the simulation time. The BMC approach encodes the bin's coverage condition as a reachability property and lets the formal tool find the shortest stimulus trace that satisfies it.

We're not saying formal is always the right tool here — the setup cost for a formal environment is real, and if you don't already have one configured for the design, adding that setup under tape-out pressure is risky. But if you have an existing formal setup (common for protocol IP blocks), and you have open Tier 1 bins that random simulation has burned 2,000+ simulation hours trying to hit, the formal tool will very likely close those bins in hours rather than days.

The practical limit is state space: formal tools handle targeted property verification well but don't replace a broad coverage closure run. Use formal surgically on specific hard-to-reach bin clusters, not as a wholesale substitute for simulation-based closure.
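To make the reachability framing concrete, here is a toy illustration. It is not bounded model checking as a formal tool implements it (those tools unroll the design symbolically and hand the result to a SAT or SMT solver); it is an explicit-state breadth-first search over a small, hypothetical retry FSM, which is enough to show the idea of asking for the shortest stimulus trace that reaches a coverage condition within a bound.

```python
from collections import deque

# Toy retry FSM (hypothetical): state -> {stimulus: next_state}
TRANSITIONS = {
    "IDLE":       {"read": "READ", "nop": "IDLE"},
    "READ":       {"ecc_error": "RETRY", "data_ok": "IDLE"},
    "RETRY":      {"ecc_error": "RETRY", "data_ok": "RETRY_DONE"},
    "RETRY_DONE": {"nop": "IDLE"},
}


def shortest_trace(start, is_covered, bound):
    """Breadth-first search for the shortest stimulus sequence, of at most
    `bound` steps, that reaches a state satisfying `is_covered`."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, trace = queue.popleft()
        if is_covered(state):
            return trace
        if len(trace) == bound:
            continue  # do not explore beyond the bound
        for stimulus, nxt in TRANSITIONS[state].items():
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, trace + [stimulus]))
    return None  # coverage condition not reachable within the bound


trace = shortest_trace("IDLE", lambda s: s == "RETRY_DONE", bound=8)
print(trace)
```

A `None` result only says the condition was not reached within the bound; it is not evidence of unreachability, and it should not be used on its own to justify a waiver.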

Waiver Management Under Pressure

Coverage waivers under time pressure are a risk surface that deserves its own discipline. The danger isn't writing waivers — waiving unreachable bins is correct verification practice. The danger is writing waivers for bins that are actually reachable but inconvenient, and the distinction is hard to make clearly when everyone is operating on 5 hours of sleep.

Two practices that help maintain waiver integrity under schedule pressure:

First, require that every waiver include a brief description of why the bin is unreachable — not just that it is unreachable. "This combination is excluded by the design spec section 4.2.3" is a valid waiver rationale. "We don't have time to hit this" is not, and should not appear in any waiver document, even informally.

Second, tie waivers to specific RTL or testbench constraints that enforce the claimed unreachability. If you're claiming a bin is unreachable because the design constrains input A when signal B is asserted, that constraint should be expressible as an SVA property or a simulator-checked constraint that fires if the assumption is ever violated. Waivers without an enforcement mechanism are architectural assumptions that will eventually break silently.
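Both practices can be checked mechanically before a waiver is accepted. The sketch below assumes waivers are kept as simple records; the `Waiver` fields, the list of forbidden phrases, and the assertion name `a_no_A_when_B` are hypothetical, and the two rationale strings are the examples quoted above.

```python
from dataclasses import dataclass

# Assumed policy: phrases that signal "inconvenient" rather than "unreachable".
FORBIDDEN_PHRASES = ("don't have time", "no time", "not enough time")


@dataclass
class Waiver:
    bin_name: str
    rationale: str    # why the bin is unreachable
    enforced_by: str  # assertion or constraint that backs the claim


def waiver_problems(w):
    """Return the reasons this waiver should be rejected (empty list if none)."""
    problems = []
    text = w.rationale.strip().lower()
    if not text:
        problems.append("missing rationale")
    if any(phrase in text for phrase in FORBIDDEN_PHRASES):
        problems.append("rationale cites time pressure, not unreachability")
    if not w.enforced_by.strip():
        problems.append("no assertion or constraint enforces the claim")
    return problems


good = Waiver(
    "cross_A_B_11",
    "This combination is excluded by the design spec section 4.2.3",
    "a_no_A_when_B",  # hypothetical assertion name
)
bad = Waiver("cross_C_D_01", "We don't have time to hit this", "")

print(waiver_problems(good))
print(waiver_problems(bad))
```

A check like this cannot judge whether a rationale is true. It only ensures that every waiver names a reason and an enforcement point that a reviewer can then verify.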

What the Schedule Pressure Is Actually Telling You

When coverage closure is genuinely infeasible in the available time, the correct response is to surface that fact to the project owner with specific data — which bins are open, their tier classification, and the estimated simulation cost to close them — rather than adjusting the sign-off criteria implicitly or quietly letting the slip happen.

Eight-week coverage plans compressed into four weeks don't succeed by working faster. They succeed by making explicit triage decisions early, allocating simulation resources to Tier 1 bins before Tier 2, writing waiver rationales that will survive post-silicon scrutiny, and accepting that some Tier 2 and Tier 3 bins won't be closed in time and documenting that honestly. The chip gets taped out either way. The difference is whether the risk is visible or invisible.