
How Photoniq Ranks Test Vectors: The Score Behind the List

Hiroshi Watanabe · 9 min read

The output that users see from Photoniq is a ranked list of test scenarios with a score and a description of which coverage bins each scenario is predicted to advance. What users don't immediately see is how those scores are computed. This post is the explanation we'd want to read if we were evaluating the tool: what goes into a score, what the score actually means, what it doesn't mean, and how to use the ranking most effectively.

We're writing this because we've had enough conversations with DV engineers who treat our ranked list as a black-box oracle — "Photoniq said scenario A is score 0.87, so we'll run that first." That's a reasonable starting point, but understanding what the score captures helps you use the list better, particularly in cases where the top-ranked scenario may not be the right first choice for your specific timeline and resource constraints.

The Three Score Components

The Photoniq ranking score is a composite of three sub-scores, each representing a different dimension of value. These are computed independently and combined with learned weights that vary based on design characteristics. Here's what each represents:

1. Predicted Coverage Gain (PCG)

PCG is the core ML prediction: given the current coverage state from your UCDB file and the RTL structure of your design, how many open bins is this test scenario expected to close? This is computed by the RTL-level graph model using message-passing inference across the design's structural graph with coverage bin state as node features.

PCG is normalized per analysis run — a PCG of 1.0 means the scenario is predicted to close more open bins than any other candidate scenario in the current run. It's relative to the candidate set, not an absolute count. When we surface the raw predicted bin count (shown in the detail view as "estimated bin closure: ~14"), that number comes from the same inference, just denormalized.
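To make the relative nature of PCG concrete, here is a minimal sketch of per-run normalization. It assumes the normalization is a simple division by the best candidate's predicted bin count; the post does not state the exact formula, and the function name and scenario names are hypothetical.

```python
def normalize_pcg(predicted_bins):
    """Scale raw predicted bin-closure counts so the best candidate gets 1.0.

    Assumption: max-normalization. The real normalization may differ.
    """
    best = max(predicted_bins.values(), default=0.0)
    if best <= 0:
        # No candidate is predicted to close anything, so every PCG is 0.
        return {name: 0.0 for name in predicted_bins}
    return {name: count / best for name, count in predicted_bins.items()}


# Hypothetical raw predictions (estimated bin closure per scenario).
raw = {"scenario_a": 14.0, "scenario_b": 7.0, "scenario_c": 3.5}
pcg = normalize_pcg(raw)
```

Under this assumption, adding or removing a candidate changes every other candidate's PCG, which is why scores from different analysis runs should not be compared directly.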

PCG is most reliable when we have strong training evidence for the design class. For processor pipelines and memory controller blocks, PCG is reliable. For highly custom accelerator datapaths with unusual coding patterns, its accuracy degrades, and we note this in the confidence indicator on each recommendation.

2. Stimulus Efficiency (SE)

A scenario that closes 14 bins but requires a 50,000-cycle simulation trace to manifest the relevant stimulus is less efficient than a scenario that closes 12 bins with a 2,000-cycle directed test. Stimulus Efficiency is our estimate of coverage gain per simulation cycle, derived from two inputs: the predicted bin closure count and a simulation cost estimate for the scenario.
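The comparison above works out as follows. This is only the arithmetic on the two example scenarios, expressed as bins per 1,000 simulation cycles; the function name is hypothetical, and the actual SE sub-score also depends on how the cycle count itself is estimated.

```python
def bins_per_kilocycle(predicted_bins, estimated_cycles):
    """Predicted coverage gain per 1,000 simulation cycles."""
    if estimated_cycles <= 0:
        raise ValueError("estimated_cycles must be positive")
    return 1000.0 * predicted_bins / estimated_cycles


long_trace = bins_per_kilocycle(14, 50_000)  # the 14-bin, 50,000-cycle scenario
directed = bins_per_kilocycle(12, 2_000)     # the 12-bin, 2,000-cycle directed test

print(f"long trace: {long_trace:.2f} bins per 1k cycles")
print(f"directed:   {directed:.2f} bins per 1k cycles")
```

The long trace yields about 0.28 bins per 1,000 cycles against 6 for the directed test, so the directed test is roughly 21 times more efficient per cycle even though it closes two fewer bins.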

Simulation cost estimation is the less precise part of this sub-score. We estimate cost from: the interface width of the stimulus (wider interfaces require more cycles to sweep), whether the coverage event requires multi-cycle sequences (FSM traversal, handshake protocol sequences), and calibration data from previous analysis runs on similar design blocks. This is explicitly an estimate, not a measurement — we haven't run the simulation. The SE sub-score is most reliable when you've run prior Photoniq analyses on similar design blocks, because the calibration data accumulates.

3. Coverage Bin Importance (CBI)

Not all open bins are equal. An open bin on the error-recovery FSM is more important to close before tape-out than an open bin representing a peripheral data-width conversion path. CBI is our attempt to weight bins by estimated architectural importance.

CBI is derived from structural signals: proximity of the bin's RTL source location to primary outputs and architectural boundary crossings, depth of the logic cone feeding the bin's coverage event, and whether the bin is in a control-plane or data-plane module as inferred from structural analysis. We're not saying we have a complete semantic understanding of what the design does — we don't. CBI is a structural heuristic, not architectural judgment.

In practice, CBI has less influence on the final composite score than PCG or SE. We've found that for most designs, if you run the scenarios with the highest PCG scores, you disproportionately close the structurally important bins as well, because the two tend to correlate. CBI matters most for closely scored candidates where PCG and SE are nearly tied.

The Composite Score Calculation

The final score visible in the ranked list is a weighted sum of the three sub-scores. The weights are not fixed; they're calibrated per design class based on training data. For most designs:

  • PCG carries ~55–60% of the composite score weight
  • SE carries ~25–30%
  • CBI carries ~10–15%
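A minimal sketch of the weighted sum, assuming all three sub-scores are scaled to the range 0 to 1 and using weights picked from inside the ranges listed above. The specific weight values, class name, and scenario names are illustrative assumptions, not Photoniq's calibrated values.

```python
from dataclasses import dataclass

# Hypothetical weights chosen from inside the quoted ranges; they sum to 1.0.
WEIGHTS = {"pcg": 0.58, "se": 0.28, "cbi": 0.14}


@dataclass(frozen=True)
class Candidate:
    name: str
    pcg: float  # Predicted Coverage Gain, assumed in [0, 1]
    se: float   # Stimulus Efficiency, assumed in [0, 1]
    cbi: float  # Coverage Bin Importance, assumed in [0, 1]


def composite(c, weights=WEIGHTS):
    """Weighted sum of the three sub-scores."""
    return weights["pcg"] * c.pcg + weights["se"] * c.se + weights["cbi"] * c.cbi


def rank(candidates, weights=WEIGHTS):
    """Sort candidates by composite score, best first."""
    return sorted(candidates, key=lambda c: composite(c, weights), reverse=True)


candidates = [
    Candidate("fsm_error_recovery", pcg=0.80, se=0.60, cbi=0.90),
    Candidate("width_conversion", pcg=0.80, se=0.60, cbi=0.20),
    Candidate("burst_sweep", pcg=1.00, se=0.30, cbi=0.50),
]
ranked = rank(candidates)
for c in ranked:
    print(f"{c.name}: {composite(c):.3f}")
```

The first two candidates have identical PCG and SE, so CBI alone separates them, which matches the tie-breaking role described for CBI earlier.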

These weights shift when context signals suggest the calibration should change. If the UCDB analysis shows the project is near sign-off (overall coverage above 90%), SE weight increases because you're in efficiency-maximizing mode, not discovery mode. If the design is in an early verification phase with many open bins across multiple design blocks, PCG weight stays high because maximizing bin closure per test is more valuable than optimizing per-cycle cost.
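One way to picture the shift is a simple rule that moves weight from PCG to SE once overall coverage passes the 90% mark. This is a deliberately crude sketch: the real weights are calibrated from training data, and the base values, the size of the shift, and the hard threshold behavior here are all assumptions made for illustration.

```python
# Hypothetical base weights from inside the ranges quoted for most designs.
BASE_WEIGHTS = {"pcg": 0.58, "se": 0.28, "cbi": 0.14}
SIGN_OFF_THRESHOLD = 0.90  # "overall coverage above 90%"
SE_BOOST = 0.10            # assumed size of the shift; not a documented value


def weights_for(overall_coverage):
    """Return sub-score weights for a given overall coverage fraction."""
    if not 0.0 <= overall_coverage <= 1.0:
        raise ValueError("overall_coverage must be between 0 and 1")
    weights = dict(BASE_WEIGHTS)
    if overall_coverage > SIGN_OFF_THRESHOLD:
        # Near sign-off: favor per-cycle efficiency over raw bin closure.
        weights["se"] += SE_BOOST
        weights["pcg"] -= SE_BOOST
    return weights


early = weights_for(0.55)
late = weights_for(0.95)
```

Keeping the weights summing to 1.0 after the shift means composite scores stay on the same scale before and after the project crosses the threshold.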

What the Score Does Not Mean

We want to address two common misinterpretations directly.

The score is not a probability that the test will pass. A high-ranked test scenario may expose a bug — that's actually part of the point of coverage closure. If a scenario that closes 14 open bins also trips a simulation assertion, that's valuable information, not a false positive. Don't filter out test scenarios from the recommendation list because you expect them to fail. Expected-to-fail tests that close coverage bins and find bugs are exactly what the tool is identifying.

The ranking is not a strict execution order. The list is a priority queue, not a prescription. The top-ranked scenario is predicted to be the best use of your next simulation slot given current coverage state. But if you have team members with domain expertise on specific design blocks, a lower-ranked scenario targeting their block may be the right choice to run in parallel because they can write and debug it faster. The ranking is input to a resource allocation decision, not a replacement for that decision.

How Coverage State Updates Affect the Ranking

This is the part that surprises users most often: the ranking is dynamic. After you run a scenario from the list and re-upload the updated UCDB to Photoniq, the entire ranking recomputes. Scenarios that depended on bins now closed drop in rank; scenarios targeting the remaining open bins may rise or fall depending on how the coverage state shift affects the structural inference.

In a compressed coverage closure sprint, the pattern that works best is: run the top three ranked scenarios in parallel → upload the updated UCDB → recompute the ranking → run the next top three. The recomputation takes 40–120 seconds. If you're running scenarios against a live simulation farm, this fits naturally between regression launches. If you're running manually, the batch update is still worth it every few sessions.
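The run, upload, recompute loop can be sketched as below. The ranking and simulation steps are passed in as plain callables because the post does not describe a programmatic API; `toy_rank` and `toy_run` are stand-ins that model coverage as a set of closed bin names, and every name in the block is hypothetical.

```python
def closure_sprint(rank_candidates, run_scenarios, coverage, batch_size=3, max_rounds=10):
    """Repeat: rank against current coverage, run the top batch, update coverage."""
    history = []
    for _ in range(max_rounds):
        ranked = rank_candidates(coverage)
        if not ranked:
            break  # nothing left that is predicted to close an open bin
        batch = ranked[:batch_size]
        coverage = run_scenarios(batch, coverage)
        history.append(batch)
    return coverage, history


# Toy stand-ins: each scenario closes a fixed set of bins.
SCENARIO_BINS = {
    "s1": {"b1", "b2", "b3"},
    "s2": {"b3", "b4"},
    "s3": {"b5"},
    "s4": {"b1"},
    "s5": {"b6", "b7"},
}


def toy_rank(closed):
    """Rank scenarios by how many still-open bins they would close."""
    gains = {name: len(bins - closed) for name, bins in SCENARIO_BINS.items()}
    useful = [name for name, gain in gains.items() if gain > 0]
    return sorted(useful, key=lambda name: (-gains[name], name))


def toy_run(batch, closed):
    """Pretend to simulate the batch and return the updated set of closed bins."""
    newly_closed = set().union(*(SCENARIO_BINS[name] for name in batch))
    return closed | newly_closed


final_coverage, history = closure_sprint(toy_rank, toy_run, coverage=set())
print(history)
```

In this toy run, `s4` never gets scheduled: the first batch already closes its only bin, so it falls off the list when the ranking is recomputed.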

A specific pattern to watch for: after running a high-PCG scenario, sometimes a previously low-ranked scenario for the same module suddenly jumps to the top. This usually means the first scenario opened up a stimulus path that the model now predicts will feed into the remaining open bins more efficiently. It's not the model being inconsistent — it's responding correctly to a changed coverage state.

Confidence Indicators and When to Override

Each ranked recommendation includes a confidence indicator: High, Medium, or Low. This reflects the model's uncertainty about the PCG prediction for that specific scenario and design block combination. High confidence means the design block is well-represented in training data and the current coverage state is within the distribution the model has seen. Low confidence means the opposite — you should treat a Low-confidence recommendation as a starting point for discussion with a DV engineer who knows the block, not as a directive.

Overriding the ranking is always correct behavior when you have domain knowledge the model doesn't have. If the top-ranked scenario involves the AXI slave interface and you know that block was manually verified through directed testing last week, deprioritize it and move down the list. The model doesn't know about out-of-band verification work that didn't produce UCDB entries.
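If you track overrides in a script rather than by hand, the idea looks roughly like this. The record layout, function names, block names, and scenario names are all hypothetical; the sketch only shows moving recommendations for already-verified blocks to the back of the queue and flagging Low-confidence entries for a conversation with the block owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    scenario: str
    block: str
    score: float
    confidence: str  # "High", "Medium", or "Low"


def apply_overrides(ranked, verified_blocks):
    """Move recommendations for out-of-band verified blocks to the end.

    Relative order is preserved within both groups.
    """
    keep = [r for r in ranked if r.block not in verified_blocks]
    deferred = [r for r in ranked if r.block in verified_blocks]
    return keep + deferred


def needs_review(ranked):
    """Scenarios whose Low confidence calls for a DV engineer's input."""
    return [r.scenario for r in ranked if r.confidence == "Low"]


ranked = [
    Recommendation("axi_slave_backpressure", "axi_slave", 0.87, "High"),
    Recommendation("dma_descriptor_wrap", "dma_engine", 0.81, "Low"),
    Recommendation("arbiter_starvation", "arbiter", 0.74, "Medium"),
]
adjusted = apply_overrides(ranked, verified_blocks={"axi_slave"})
```

Deferring rather than deleting keeps the AXI scenario visible, which is useful if the directed tests turn out not to have covered the bins the model was targeting.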

The right mental model for the ranking is: "the model has seen a lot of RTL coverage patterns and is making its best structural inference about which scenarios will move the needle given what it can see in the UCDB and RTL graph." Your domain knowledge of what happened in the last two weeks of verification is not visible to the model. Combining both produces better decisions than either in isolation.