Coverage Model Explained

This document describes how Photoniq's AI coverage prediction model works, what data it was trained on, how to interpret confidence scores, and where the model's known limitations apply.

Overview

The Photoniq coverage model is a supervised prediction system. Given a set of uncovered RTL coverage bins and the structural properties of the RTL surrounding those bins, it predicts which test stimulus patterns are most likely to exercise the uncovered logic.

The model does not generate test code. It generates structured constraint descriptions — patterns of signal values and sequencing conditions — that a verification engineer or automated testbench framework can translate into executable test scenarios.
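
To make the idea of a structured constraint description concrete, here is a minimal Python sketch of what such a record could look like. The class names, field layout, and signal names (SignalConstraint, SequenceStep, ConstraintDescription, fifo_full, wr_en, burst_len) are hypothetical illustrations, not Photoniq's actual output schema; only confidence and coverage_bins_hit are names used elsewhere in this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SignalConstraint:
    """One signal restricted to a value range (hypothetical shape)."""
    signal: str
    low: int
    high: int  # low == high means a single fixed value


@dataclass
class SequenceStep:
    """Constraints that hold together for a number of cycles."""
    constraints: List[SignalConstraint]
    cycles: int = 1


@dataclass
class ConstraintDescription:
    """Ordered steps plus the bins the recommendation targets."""
    steps: List[SequenceStep]
    coverage_bins_hit: List[str]
    confidence: float

    def summary(self) -> str:
        """Render one human-readable line per sequencing step."""
        lines = []
        for i, step in enumerate(self.steps, start=1):
            parts = []
            for c in step.constraints:
                if c.low == c.high:
                    parts.append(f"{c.signal} == {c.low}")
                else:
                    parts.append(f"{c.signal} in [{c.low}:{c.high}]")
            lines.append(f"step {i} ({step.cycles} cycle(s)): " + " && ".join(parts))
        return "\n".join(lines)


# Illustrative instance with made-up signal and bin names.
example = ConstraintDescription(
    steps=[
        SequenceStep([SignalConstraint("fifo_full", 1, 1)], cycles=2),
        SequenceStep([SignalConstraint("wr_en", 1, 1),
                      SignalConstraint("burst_len", 8, 15)]),
    ],
    coverage_bins_hit=["fifo_ctrl.overflow_branch"],
    confidence=0.91,
)
print(example.summary())
```

A record like this carries no executable test code; an engineer or testbench framework would still translate each step into constraints in their own environment.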

Training data

The model was trained on a private corpus of paired RTL designs and coverage databases: for each design, we have both the UCDB file (which bins were covered, at what simulation time, under what stimulus) and the RTL source tree. Training labels are the stimulus parameters that produced coverage of each bin, extracted from simulation trace analysis.

The corpus spans multiple design types:

  • Pipelined processor cores (5-stage, out-of-order)
  • Memory controllers and DMA engines
  • Network interface controllers
  • Custom AI accelerator datapaths
  • Security and cryptography IP
  • Standard cell library verification environments

Total training set: approximately 2,200 design and coverage-database pairs. Held-out validation set: 180 designs not seen during training.

Model architecture

The model uses a two-stage approach:

  1. RTL structure encoder — A graph-based encoder processes the RTL's module hierarchy and data-flow graph. Each uncovered coverage bin is represented as a node with features: bin type, depth in module hierarchy, fanin cone size, and historical coverage rate in similar constructs from the training corpus.
  2. Stimulus predictor — A sequence model over the encoded RTL structure generates candidate stimulus constraint sets. For each candidate, a ranking model scores the predicted probability of closing the target bins given the constraint.

Both stages are optimized end-to-end against the training labels during learning. The architecture avoids applying large language models directly to RTL text; we found that they underperformed graph-based representations on the structurally specific task of coverage path inference.
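
As an illustration of the per-bin node features listed for the structure encoder, the sketch below packs the four documented features into a flat numeric vector. This is not the model's real encoding: the BIN_TYPES vocabulary, the one-hot layout, and the log scaling of fanin cone size are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List

# Assumed vocabulary; the actual set of bin types is not documented here.
BIN_TYPES = ["toggle", "fsm", "branch", "functional"]


@dataclass
class BinNode:
    """One uncovered coverage bin, described by the four documented features."""
    bin_type: str
    hierarchy_depth: int
    fanin_cone_size: int
    historical_coverage_rate: float  # fraction between 0.0 and 1.0

    def to_features(self) -> List[float]:
        """One-hot bin type followed by three numeric features."""
        if self.bin_type not in BIN_TYPES:
            raise ValueError(f"unknown bin type: {self.bin_type}")
        one_hot = [1.0 if self.bin_type == t else 0.0 for t in BIN_TYPES]
        return one_hot + [
            float(self.hierarchy_depth),
            math.log1p(self.fanin_cone_size),  # assumed scaling for large cones
            self.historical_coverage_rate,
        ]


node = BinNode("fsm", hierarchy_depth=3, fanin_cone_size=0,
               historical_coverage_rate=0.5)
print(node.to_features())
```

In a graph-based encoder, vectors like this would be attached to nodes of the data-flow graph rather than used on their own.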

Confidence scores

Each recommendation includes a confidence score from 0.0 to 1.0. This is a calibrated probability estimate: a score of 0.94 means the model assigns a ~94% probability that the recommendation, when implemented, will close at least one of the listed coverage_bins_hit.
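
One way to check the calibration claim on your own results is a reliability table: group implemented recommendations by score and compare the mean score in each group with the fraction that actually closed a bin. The helper below is a hypothetical sketch, not part of Photoniq; it assumes you have recorded, for each implemented recommendation, its confidence score and whether it closed at least one listed bin.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float, float, int]


def reliability_table(scores: Sequence[float],
                      closed: Sequence[bool],
                      n_buckets: int = 5) -> List[Row]:
    """Return (low, high, mean_score, observed_rate, count) per non-empty bucket."""
    if len(scores) != len(closed):
        raise ValueError("scores and closed must have the same length")
    buckets: List[List[Tuple[float, bool]]] = [[] for _ in range(n_buckets)]
    for score, did_close in zip(scores, closed):
        # A score of exactly 1.0 falls into the last bucket.
        idx = min(int(score * n_buckets), n_buckets - 1)
        buckets[idx].append((score, did_close))
    rows: List[Row] = []
    for i, items in enumerate(buckets):
        if not items:
            continue
        mean_score = sum(s for s, _ in items) / len(items)
        observed = sum(1 for _, c in items if c) / len(items)
        rows.append((i / n_buckets, (i + 1) / n_buckets,
                     mean_score, observed, len(items)))
    return rows


# Made-up outcomes for four implemented recommendations.
table = reliability_table([0.9, 0.95, 0.85, 0.5], [True, True, False, True])
for low, high, mean_score, observed, count in table:
    print(f"[{low:.1f}, {high:.1f}) mean={mean_score:.2f} "
          f"observed={observed:.2f} n={count}")
```

For well-calibrated scores, mean_score and observed_rate should be close in every bucket once each bucket holds enough samples; a handful of recommendations is too few to draw conclusions.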

Score interpretation:

  • 0.85 – 1.0 — High confidence. Implement first. Model has strong prior support from training data for this code pattern.
  • 0.65 – 0.84 — Moderate confidence. Worth implementing, but verify that the suggested constraints make architectural sense for your design.
  • 0.40 – 0.64 — Low confidence. Treat as a hypothesis. May point in the right direction even if the exact constraints don't close bins directly.
  • Below 0.40 — Not returned in standard output. Available with the min_confidence: 0.0 API option for exploratory use.
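
The score bands above can be applied on the client side with a few lines of Python. This is a sketch under assumptions: recommendations are taken to be dictionaries with a confidence key, the id field is hypothetical, and scores that fall between the listed bands (for example 0.845) are assigned to the lower band.

```python
from typing import Dict, List

DEFAULT_MIN_CONFIDENCE = 0.40  # scores below this are not returned by default


def confidence_tier(score: float) -> str:
    """Map a confidence score to the band described in this section."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if score >= 0.85:
        return "high"
    if score >= 0.65:
        return "moderate"
    if score >= 0.40:
        return "low"
    return "exploratory"


def filter_and_rank(recs: List[Dict],
                    min_confidence: float = DEFAULT_MIN_CONFIDENCE) -> List[Dict]:
    """Drop recommendations under the threshold and sort the rest, best first."""
    kept = [r for r in recs if r["confidence"] >= min_confidence]
    return sorted(kept, key=lambda r: r["confidence"], reverse=True)


# Made-up recommendations; "id" is a hypothetical field.
recs = [
    {"id": "r1", "confidence": 0.72},
    {"id": "r2", "confidence": 0.94},
    {"id": "r3", "confidence": 0.31},
]
for r in filter_and_rank(recs):
    print(r["id"], confidence_tier(r["confidence"]))
```

Passing min_confidence=0.0 keeps the exploratory entry as well, mirroring the API option mentioned in the last bullet.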

Accuracy metrics

On the held-out validation set of 180 designs:

  • Top-10 precision: ~92% — Among the top 10 recommendations, 92% close at least one previously uncovered bin when implemented.
  • Top-3 precision: ~88% — The top 3 recommendations close at least one bin in 88% of test cases.
  • Coverage gain per run: median 8.4 percentage points — Median improvement in overall coverage percentage when all top-10 recommendations are implemented.
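
The bullets above describe two slightly different quantities: a per-recommendation precision (the share of top-k recommendations that close a bin) and a per-design hit rate (the share of designs where at least one top-k recommendation closes a bin). The exact evaluation procedure is not specified here, so the following is only a sketch of how such figures could be computed from your own results; the function names and the input layout are assumptions.

```python
from statistics import median
from typing import List, Sequence


def precision_at_k(outcomes_per_design: Sequence[Sequence[bool]], k: int) -> float:
    """Share of top-k recommendations, pooled over designs, that closed a bin."""
    top = [hit for outcomes in outcomes_per_design for hit in outcomes[:k]]
    if not top:
        raise ValueError("no recommendations to evaluate")
    return sum(top) / len(top)


def hit_rate_at_k(outcomes_per_design: Sequence[Sequence[bool]], k: int) -> float:
    """Share of designs where at least one top-k recommendation closed a bin."""
    if not outcomes_per_design:
        raise ValueError("no designs to evaluate")
    hits = sum(1 for outcomes in outcomes_per_design if any(outcomes[:k]))
    return hits / len(outcomes_per_design)


def median_coverage_gain(before: List[float], after: List[float]) -> float:
    """Median per-design gain in overall coverage, in percentage points."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    return median(a - b for b, a in zip(before, after))


# Made-up outcomes: one list per design, ordered by rank, True = closed a bin.
outcomes = [[True, False, True], [False, False, True]]
print(precision_at_k(outcomes, 2))   # 1 of 4 pooled recommendations
print(hit_rate_at_k(outcomes, 2))    # 1 of 2 designs
print(median_coverage_gain([70.0, 80.0, 60.0], [78.0, 85.0, 70.0]))
```

Keeping the two measures separate matters when comparing against the published figures, because a pooled precision and a per-design hit rate can differ substantially on the same data.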

These figures are measured on held-out designs. Performance on your specific RTL will vary based on design complexity, coverage instrumentation quality, and how similar your architecture is to training data distributions.

Known limitations

The model performs less well in these scenarios:

  • Very small RTL (<5K lines) — Small designs have few structural features for the encoder to work with. Confidence scores tend to be lower.
  • Obfuscated or encrypted RTL — If your RTL is available only as a gate-level netlist or has been name-obfuscated, the structure encoder cannot produce meaningful features.
  • Proprietary functional coverage covergroups — Complex covergroup definitions with non-standard sampling expressions may not parse correctly. Toggle and FSM coverage have the best prediction accuracy.
  • Designs with sparse simulation coverage (<50%) — When the existing UCDB has few covered bins, the model has limited signal about which stimulus patterns have already been tried. Accuracy begins to degrade once input coverage falls below ~60%, so results are least reliable under 50%.
  • Memory-mapped register-heavy designs — Register access coverage gaps often require protocol-specific knowledge the model doesn't have. We recommend pairing Photoniq with protocol-aware constrained-random for register-heavy designs.
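
The limitations above can be turned into a quick pre-submission check. The helper below is a hypothetical sketch, not a Photoniq feature: the function name and parameters are invented for illustration, and only the numeric thresholds (5K lines, 50% and ~60% input coverage) come from this section. How you count RTL lines or decide that a design is register-heavy is left to you.

```python
from typing import List

MIN_RTL_LINES = 5_000          # "very small RTL" threshold from this section
SPARSE_COVERAGE_PCT = 50.0     # "very sparse simulation coverage"
DEGRADED_COVERAGE_PCT = 60.0   # accuracy degrades below roughly this level


def preflight_warnings(rtl_lines: int,
                       input_coverage_pct: float,
                       structure_visible: bool = True,
                       register_heavy: bool = False) -> List[str]:
    """Return human-readable warnings for the documented limitation scenarios."""
    warnings: List[str] = []
    if rtl_lines < MIN_RTL_LINES:
        warnings.append("small design: expect lower confidence scores")
    if not structure_visible:
        warnings.append("netlist-only or obfuscated RTL: "
                        "structure encoder cannot produce meaningful features")
    if input_coverage_pct < SPARSE_COVERAGE_PCT:
        warnings.append("input coverage below 50%: limited signal about prior stimulus")
    elif input_coverage_pct < DEGRADED_COVERAGE_PCT:
        warnings.append("input coverage below ~60%: accuracy may degrade")
    if register_heavy:
        warnings.append("register-heavy design: pair with protocol-aware "
                        "constrained-random")
    return warnings


for message in preflight_warnings(rtl_lines=3_000, input_coverage_pct=55.0):
    print("warning:", message)
```

An empty list means none of the documented scenarios were flagged by these inputs; it does not guarantee the accuracy figures quoted earlier.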