From Unit Tests to WCET: Building Test Suites That Prove Timing for Embedded Systems
Hands-on tutorial to extend unit tests with instrumentation and RocqStat-style analysis to produce WCET evidence for ISO 26262/DO-178.
When unit tests aren't enough: proving timing for safety-critical embedded systems
If your unit tests all pass but your system misses a deadline on the road or in the cockpit, the consequences are not just a bug report — they're a safety event. For embedded teams working to meet ISO 26262 or DO-178 requirements in 2026, functional correctness is necessary but not sufficient. You must produce evidence that code meets its timing constraints: WCET (Worst-Case Execution Time) needs to be known, justified, and traceable. This tutorial shows how to extend everyday test suites with timing hooks, instrumentation, and RocqStat-style statistical processing to produce certifiable timing evidence.
The 2026 context: why timing proof demands new workflows
Industry consolidation and tool evolution changed the game in late 2025 and early 2026. Most notably, Vector Informatik announced the acquisition of the RocqStat technology and team in January 2026, prioritizing integrated timing analysis inside established verification toolchains. This reflects two trends:
- Growing regulatory scrutiny and OEM demand for runtime timing evidence (not just static reasoning).
- Increased use of multicore and virtualization on safety platforms, which complicates WCET and forces hybrid approaches (static, measurement-based, statistical).
Practically, that means your lab must run deterministic timing tests, capture hardware traces, and feed them into statistical or formal tools. This tutorial walks a hands-on path you can adopt today with on-chip hardware counters plus RocqStat-style processing where available.
High-level strategy: combine instrumentation, measurement, and analysis
Deliverable: a traceable artifact that links requirements to tests, shows WCET bound with stated assumptions, and includes tool outputs and raw traces for audit.
- Define timing requirements and operational assumptions.
- Prepare a timing-aware test harness that isolates nondeterminism.
- Instrument code for cycle-accurate timestamps and hardware tracing.
- Run controlled measurement campaigns (single-core isolation, interference tests).
- Analyze results with statistical WCET tools or combine with static analysis.
- Produce certification artifacts: traceability matrix, reports, raw traces, configurations.
1) Define timing requirements and success criteria
Start by making timing assumptions explicit and testable. For each software requirement, map:
- Requirement ID → Worst-case deadline (ms/µs)
- Operational scenario (interrupts enabled, cache policy, power mode)
- Assumptions (fixed CPU frequency, disabled DVFS, isolation from other cores)
- Acceptance criteria (e.g., measured WCET < deadline / 2 for margin)
Traceability is mandatory for certification: every test case must point back to a requirement and list the timing measurement configuration used.
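One lightweight way to keep this mapping testable is to store it alongside the harness code so the acceptance check can be automated. The sketch below is illustrative only; the record fields, the requirement ID format, and the deadline/2 margin policy are assumptions, not mandated by ISO 26262 or DO-178C.

#include <stdint.h>
#include <stdbool.h>

/* Illustrative requirement record: one entry per timing requirement.
 * Field names and the requirement ID format are assumptions for this sketch. */
typedef struct {
    const char *req_id;          /* e.g. "SWR-TIM-042" (hypothetical ID) */
    uint32_t    deadline_us;     /* worst-case deadline from the requirement */
    uint32_t    margin_divisor;  /* acceptance: measured WCET < deadline / divisor */
    const char *scenario;        /* operational scenario the bound applies to */
    const char *assumptions;     /* frequency, DVFS, isolation, cache policy */
} timing_requirement;

static const timing_requirement timing_reqs[] = {
    { "SWR-TIM-042", 1000u, 2u,
      "interrupts masked, single core, caches warm",
      "400 MHz core clock, DVFS disabled, no DMA traffic" },
};

/* Acceptance check the harness can run after a measurement campaign. */
static bool timing_req_met(const timing_requirement *r, uint32_t measured_wcet_us)
{
    return measured_wcet_us < (r->deadline_us / r->margin_divisor);
}

Keeping this table in version control alongside the tests also gives auditors a single place where requirement IDs, deadlines, and assumptions are linked.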
2) Build a timing-aware test harness
Your existing unit and integration tests are the starting point. Extend them so they run in controlled conditions and emit precise timing markers:
- Pin the CPU frequency and disable DVFS in test setup. Record the exact CPU and bus frequencies.
- Run on a dedicated core when possible. For multicore targets, create isolation tests and interference scenarios.
- Establish a warm-up protocol: identical pre-test runs to reach cache/branch predictor steady state.
- Standardize the test harness to log configuration: clock, build ID, compiler flags, optimization level, RTOS version, interrupt masks.
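As a minimal sketch of that last point, a setup hook can write the configuration to the test log before any measurement. The log_line() helper, the frequency value, and the build-ID source below are assumptions; in practice they would come from your BSP and build system.

#include <stdint.h>

/* Hypothetical helpers: log_line() appends a key/value pair to a RAM log
 * that is flushed after the test; build_id is injected by the build system. */
void log_line(const char *key, const char *value);
extern const char *build_id;

void timing_harness_setup(void)
{
    /* Record the exact configuration the measurement depends on. */
    log_line("cpu_freq_hz",  "400000000");   /* assumed fixed clock, DVFS disabled */
    log_line("build_id",     build_id);
    log_line("opt_level",    "-O2");
    log_line("rtos_version", "X.Y.Z");       /* placeholder */
    log_line("irq_mask",     "all masked");
    /* Warm-up iterations are run by the campaign driver shown in step 4. */
}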
3) Instrumentation patterns that scale
Choose instrumentation that provides cycle-accurate or timestamped events without adding uncontrolled overhead. Common, proven techniques:
- Core cycle counters (ARM DWT->CYCCNT, RISC-V cycle CSR): minimal overhead and high resolution.
- High-precision dedicated hardware timers: useful when cycle counters are not trustworthy, for example under dynamic frequency scaling.
- ETM/ITM trace or CoreSight: instruction-level trace that is invaluable for forensic analysis of outliers and reconstructing execution paths.
- Instrumented entry/exit probes around functions under test, recording timestamp + test-id to a circular buffer or streaming endpoint.
- Lightweight software hooks that log to RAM (avoid blocking I/O) and flush at test end.
Example: ARM DWT cycle counter bootstrap code (C):
#include <stdint.h>
#define DWT_CONTROL (*(volatile uint32_t*)0xE0001000)
#define DWT_CYCCNT (*(volatile uint32_t*)0xE0001004)
#define DEMCR (*(volatile uint32_t*)0xE000EDFC)
void enable_cycle_counter(void) {
    DEMCR |= (1 << 24);   // TRCENA
    DWT_CYCCNT = 0;
    DWT_CONTROL |= 1;     // CYCCNTENA
}
uint32_t now_cycles(void) { return DWT_CYCCNT; }
Wrap the region under test:
uint32_t t0 = now_cycles();
run_under_test();
uint32_t t1 = now_cycles();
record_sample(test_id, t1 - t0);
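The record_sample() helper is left undefined above; one lightweight implementation, consistent with the RAM-logging hooks described earlier, is a fixed-size buffer that is flushed only after the test ends. A minimal sketch, with the buffer size and sample layout as assumptions:

#include <stdint.h>

#define SAMPLE_BUF_LEN 4096u   /* illustrative capacity; must cover one campaign */

typedef struct {
    uint16_t test_id;
    uint32_t cycles;
} timing_sample;

static timing_sample sample_buf[SAMPLE_BUF_LEN];
static volatile uint32_t sample_count;

/* Log to RAM only; no blocking I/O inside the measured region. */
void record_sample(uint16_t test_id, uint32_t cycles)
{
    if (sample_count < SAMPLE_BUF_LEN) {
        sample_buf[sample_count].test_id = test_id;
        sample_buf[sample_count].cycles  = cycles;
        sample_count++;
    }
    /* The overflow policy (wrap vs. drop) should be documented with the harness. */
}

/* Flushed over a non-intrusive channel (UART, debugger read) at test end. */
void flush_samples(void);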
4) Measurement campaigns — methodology
Good measurement design is the difference between spurious bounds and defensible evidence. Follow these principles:
- Repeatability — run thousands of repetitions per test case when possible. Record environment and seed for random inputs.
- Warm-up and steady-state — discard initial iterations until caches and predictors stabilize.
- Controlled interference — perform baseline single-core isolated runs, then run defined interference tests (other cores stressing memory, DMA, I/O) and document impact.
- Outlier handling — capture raw data, mark and justify removed samples (e.g., system maintenance interrupt). Do not silently discard samples.
- Stress tests — include pathological inputs and worst-case code paths (deep loop counts, maximum recursion, maximum message sizes).
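Putting the warm-up and repetition principles together, a campaign driver might look like the sketch below. It reuses now_cycles() and record_sample() from the earlier snippets; the helper names and the decision to keep every post-warm-up sample unmodified are assumptions of this sketch.

#include <stdint.h>

extern uint32_t now_cycles(void);                           /* DWT counter from step 3 */
extern void run_under_test(void);                           /* function exercised by the test case */
extern void record_sample(uint16_t test_id, uint32_t cycles);

void run_campaign(uint16_t test_id, uint32_t warmup, uint32_t repetitions)
{
    /* Warm-up: identical pre-runs to reach cache/predictor steady state, samples discarded. */
    for (uint32_t i = 0; i < warmup; ++i) {
        run_under_test();
    }
    /* Measured repetitions: every sample is recorded; outliers are marked later
     * during analysis, never dropped here. */
    for (uint32_t i = 0; i < repetitions; ++i) {
        uint32_t t0 = now_cycles();
        run_under_test();
        uint32_t t1 = now_cycles();
        record_sample(test_id, t1 - t0);   /* unsigned subtraction tolerates one counter wrap */
    }
}

Keeping the warm-up discard in the driver, rather than filtering afterwards, makes the recorded sample set match the documented steady-state assumption.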
5) Analysis: turning samples into WCET evidence
There are three mainstream analysis approaches — use one or combine them:
- Static WCET analysis uses control-flow and microarchitectural models to compute a safe upper bound independent of tests. It is sound but can be conservative, especially on complex processors and multicore platforms.
- Measurement-based WCET takes observed execution times and applies safety margins to produce a bound. It is practical but requires extensive coverage and justification of assumptions.
- Statistical WCET (RocqStat-style) applies statistical inference and extreme-value analysis to estimate a bound with a confidence level (e.g., 10^-9 probability of exceedance). This reduces conservatism while quantifying risk.
RocqStat and similar tools, recently integrated into major toolchains per the Jan 2026 Vector announcement, focus on rigorous statistical processing of measured traces. They add value by:
- Aggregating thousands of runs and modeling tail distributions.
- Providing confidence intervals and p-values for the estimated WCET.
- Generating reproducible reports and artifacts suitable for audit.
Example: a simple Python aggregator for cycle samples (illustrative):
import numpy as np
from scipy import stats

samples = np.loadtxt('samples.csv')   # one cycle-count sample per line
# basic stats
mean = np.mean(samples)
p95 = np.percentile(samples, 95)
# peaks-over-threshold: fit a generalized Pareto distribution to the excesses over the threshold
threshold = p95
tail = samples[samples > threshold] - threshold
tail_fraction = tail.size / samples.size
shape, _, scale = stats.genpareto.fit(tail, floc=0)
# convert the overall exceedance probability into the conditional probability for the tail model
p_exceed = 1e-9
p_cond = p_exceed / tail_fraction
quantile = threshold + stats.genpareto.ppf(1 - p_cond, shape, loc=0, scale=scale)
print('Estimated WCET (1e-9 exceedance):', quantile)
That script demonstrates the idea; certified tool chains embed robust variants of this analysis plus diagnostics.
6) Hybrid strategies and multicore considerations
Multicore systems introduce contention (caches, memory controllers, buses). Certification bodies expect conservative accounting for interference. Practical approaches:
- Prefer single-core execution for the critical task when feasible.
- When multicore is required, run interference tests with documented attacker patterns and derive separate interference WCET contributions.
- Use mixed static/measurement analysis: static analysis to bound shared resource effects plus measurements on isolated cores for local computation.
- Qualification of hypervisor or partitioning must be documented; show that cross-partition effects are bounded.
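One way to realize a documented attacker pattern is a memory-stressing loop pinned to a non-critical core while the critical task is measured. This is an illustrative sketch, not a vendor-defined benchmark; the buffer size and stride are assumptions and should be sized against the target's last-level cache and cache-line size.

#include <stdint.h>
#include <stddef.h>

#define ATTACKER_BUF_BYTES (4u * 1024u * 1024u)   /* assumed larger than the last-level cache */
#define ATTACKER_STRIDE    64u                    /* assumed cache-line size */

static volatile uint8_t attacker_buf[ATTACKER_BUF_BYTES];

/* Runs forever on a neighboring core to generate memory and bus contention. */
void memory_bandwidth_attacker(void)
{
    for (;;) {
        for (size_t i = 0; i < ATTACKER_BUF_BYTES; i += ATTACKER_STRIDE) {
            attacker_buf[i]++;   /* read-modify-write forces line fills and write-backs */
        }
    }
}

Document the attacker pattern (stride, footprint, core assignment) as part of the interference test configuration so the derived interference contribution is reproducible.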
7) Toolchain, tool qualification, and traceability for DO-178 / ISO 26262
Regulatory standards define expectations for tooling and artifacts:
- DO-178C: Tools whose output is used without independent verification, and which could insert an error or fail to detect one, require qualification per DO-330. Timing measurement tools whose outputs are used directly for certification credit must be qualified or used within a qualifiable workflow.
- ISO 26262: For ASIL-rated software, timing evidence must be supported by documented methods and tool validation traces.
- Document tool versions, configuration, calibration, and raw-data retention policies. Include revision control hashes for every binary and test script.
Produce a formal timing assurance package with:
- Requirements and assumptions matrix (with test case links).
- Test harness config and hardware lab setup photos/schemas.
- Raw logs, trace files, and tool outputs (e.g., statistical WCET reports).
- Analysis scripts and their inputs, with reproducible invocation steps.
- Rationale for outlier removal and environmental differences between lab and field.
"Vector's acquisition of RocqStat in Jan 2026 signals toolchain convergence: timing analysis will be integrated with test and verification workflows, making end-to-end evidence generation more straightforward for safety-certified teams."
8) Practical checklist: converting a unit test suite into a timing test suite
Use this checklist to evolve your current test suite into one that produces timing evidence:
- Audit tests: flag tests that execute critical code paths and create a timing test subset.
- Add instrumentation hooks at function entry/exit; store cycle counts in RAM buffers.
- Add pre-run warm-up iterations and post-run flushing steps in harness.
- Standardize run configuration: CPU freq, interrupt masks, RTOS tick, peripherals status.
- Create dedicated test rigs or hardware-in-the-loop (HiL) benches to ensure repeatability.
- Automate data collection and push to a centralized artifact repository with immutable IDs.
- Integrate statistical processing into CI: nightly measurement campaigns, gating only on pass/fail of timing thresholds.
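For the last item, a small gate step can compare the nightly statistical WCET estimate against the documented acceptance margin and fail the pipeline on violation. The sketch below is an assumption-laden illustration: the single-number file format, the microsecond units, and the deadline/2 policy are not prescribed by any standard or tool.

/* Hypothetical CI gate: reads one WCET estimate (microseconds) from a file
 * produced by the analysis step and compares it against the requirement deadline. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <wcet_estimate_file> <deadline_us>\n", argv[0]);
        return 2;
    }
    FILE *f = fopen(argv[1], "r");
    double wcet_us = 0.0;
    if (f == NULL || fscanf(f, "%lf", &wcet_us) != 1) {
        fprintf(stderr, "could not read WCET estimate\n");
        return 2;
    }
    fclose(f);
    double deadline_us = atof(argv[2]);
    double gate_us = deadline_us / 2.0;   /* documented acceptance margin (assumed deadline/2) */
    if (wcet_us < gate_us) {
        printf("PASS: WCET %.1f us < %.1f us\n", wcet_us, gate_us);
        return 0;
    }
    printf("FAIL: WCET %.1f us >= %.1f us\n", wcet_us, gate_us);
    return 1;
}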
9) Handling variability and documenting assumptions
Every timing claim must include the exact environment it depends on. Common sources of variability you must control and report:
- CPU and bus frequencies (explicitly record PLL settings).
- Peripheral and DMA activity (list disabled devices or simulated loads).
- RTOS behavior: tick rate, timer mode, scheduler preemption settings.
- Compiler optimizations and link-time options (LTO, inline heuristics) — store build artifacts.
- Hardware revisions and silicon errata that affect timing.
10) Case study sketch: one function, from unit test to certified timing claim
Scenario: a brake-controller function compute_lockstep() must complete within 1 ms on a Cortex-M55-based ECU under ASIL-D.
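As a sanity check on units (assuming, purely for illustration, a 400 MHz core clock recorded in the harness configuration), the 1 ms deadline corresponds to 400,000 cycles, so the deadline/2 acceptance margin from step 1 translates to a gate of roughly 200,000 measured cycles.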
- Identify code paths and inputs affecting loop counts. Create worst-case synthetic inputs covering max loop iterations.
- Instrument entry/exit and enable DWT cycle counter with warm-up runs.
- Run 10,000 repetitions on an isolated single core and observe the distribution of cycle counts. Fit the tail with a RocqStat-style extreme-value model to estimate WCET at a 1e-9 exceedance probability.
- Run interference tests with a memory-bandwidth attacker on neighboring core to measure interference contribution and add to the WCET bound.
- Complement with static analysis to show no hidden paths, and generate a final WCET bound with documented margins and assumptions.
- Collect artifacts: raw samples, EVT fit parameters, static analysis report, hardware configuration, and a traceability matrix linking compute_lockstep() → requirement → test cases → final timing bound.
For ASIL-D and DO-178 DAL A, include tool qualification evidence or follow a qualified workflow using a DO-330 compliant analyzer where necessary.
Advanced strategies and future predictions (2026+)
Looking forward, expect these trends through 2026 and beyond:
- Toolchain convergence: Vector's RocqStat integration will accelerate unified workflows where functional tests, static analysis, and statistical WCET share artifacts and traceability.
- CI for timing: Hardware-integrated CI that schedules nightly WCET campaigns on rented hardware benches will become common practice for large teams.
- Probabilistic safety arguments: Statistical WCET evidence paired with architectural mitigations (e.g., time partitions) will gain acceptance in some certification contexts when accompanied by conservative confidence targets and tool qualification.
- ML-assisted anomaly detection: Machine learning will assist in flagging suspicious timing outliers but will not replace deterministic proofs — ML outputs will be used as diagnostics rather than certified evidence.
- Increased focus on multicore interference models: vendors and toolmakers will publish standardized interference benchmarks to make comparisons and evidence portable.
Actionable takeaways
- Do not treat timing as an afterthought: embed timing markers and a timing harness in your unit test pipeline now.
- Combine measurement campaigns with statistical processing to get defensible WCET bounds — use RocqStat-style workflows where available.
- Document everything: environment, configuration, and assumptions are as important as measured numbers.
- For multicore systems, always include interference experiments and conservative resource models.
- Plan for tool qualification and retention of raw artifacts to satisfy DO-178 and ISO 26262 auditors.
Final checklist before submission to certification
- Raw trace files and measurement logs archived with immutable identifiers.
- WCET reports with stated confidence levels and fitting methodology (statistical or static).
- Traceability matrix linking requirements → tests → artifacts.
- Tool qualification evidence or a qualified workflow description (DO-330/ISO 26262 workflow).
- Hardware configuration, build artifacts, and test harness code committed to version control.
Call to action
Converting your unit tests into a timing-proof suite is a strategic investment: it reduces risk, shortens audit cycles, and prepares your product for the multicore, software-defined future. If you want a practical jumpstart, tunder.cloud can help architect a timing test pipeline, integrate RocqStat-style analytics into your CI, and produce a certification-ready timing assurance package tailored to ISO 26262 or DO-178. Contact our team for a lab assessment and hands-on workshop to integrate timing evidence into your development lifecycle.