
Hands‑on: measuring worst‑case execution time with RocqStat and sample embedded projects

Unknown

2026-02-05


Hands‑on tutorial: add RocqStat to a FreeRTOS project, collect traces, compute WCET, and fix timing violations with reproducible steps.

Hook — when a hard real‑time deadline becomes a late bug

If you've ever shipped an RTOS‑based feature only to discover a sporadic timing violation in the field, you know the cost: late fixes, safety reviews, and lost customer trust. The hard part isn't just measuring average latency — it's proving a safe worst‑case execution time (WCET) under realistic conditions. This hands‑on guide shows how to add RocqStat to a simple RTOS project, collect timing data on hardware, and interpret results to find and fix timing violations.

What you'll learn (quick)

How to instrument a FreeRTOS/Cortex‑M project to collect execution timelines
How to convert captured logs into a format RocqStat can analyze
How to run RocqStat for WCET estimation and interpret its statistical output
Concrete fixes for common timing violations and how to validate them
How to gate WCET checks in CI pipelines — and why Vector's 2026 acquisition of RocqStat matters

The state of timing analysis in 2026 (brief)

By 2026, timing safety is a first‑class engineering requirement for automotive, avionics, and industrial control systems. Toolchains have moved from pure static analysis to hybrid workflows that combine static proofs with measurement‑driven, statistical WCET estimation. In January 2026 Vector Informatik announced it acquired StatInf’s RocqStat tech and team — a signal that mainstream tool vendors want unified verification pipelines that include statistical timing analysis.

"Vector will integrate RocqStat into VectorCAST to unify timing analysis and software verification." — Automotive World, Jan 16, 2026

That integration trend matters: teams need tools that fit CI pipelines, report machine‑readable thresholds, and produce actionable hotspots developers can fix. For organizations that need strong operational traceability, storing each measurement together with its context (firmware build, board, configuration) keeps results auditable and thresholds reproducible.

Why a measurement‑driven WCET workflow?

Static WCET tools are strong when hardware and cache behavior can be modeled precisely. But embedded teams are increasingly using measurement to complement static proofs because:

Realistic inputs: measurements capture real code paths, I/O interactions, and platform surprises.
Faster iteration: fix → measure → verify loops are quicker than reconfiguring abstract models, and probe dumps can be piped straight into automated analysis.
Statistical guarantees: modern tools (like RocqStat) derive high‑confidence upper bounds from many samples using rigorous statistics, instead of trusting the single largest observation (the high‑water mark).

Sample project: FreeRTOS on Cortex‑M (STM32 example)

We'll use a minimal FreeRTOS project with two tasks: a periodic sensor processing task (TaskA) and a background maintenance task (TaskB). The goal: measure TaskA's WCET, find a timing violation, and fix it. The snippets below are intentionally compact and target a typical STM32 dev board. Application hooks such as sensor_acquire() and complex_filter() are placeholders you supply.

Task code (simplified)

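/* tasks.c */
#include "FreeRTOS.h"
#include "task.h"
#include "timing_probe.h"  // helper shown below

/* Application hooks: placeholders you implement for your board */
extern void sensor_acquire(void);
extern int  read_config_flag(void);
extern void complex_filter(void);
extern void flush_logs(void);

void TaskA(void *pv) {
    (void)pv;
    TickType_t last_wake = xTaskGetTickCount();
    for (;;) {
        probe_start("TaskA_work");
        // simulate variable processing path
        sensor_acquire();
        if (read_config_flag()) {
            complex_filter(); // occasionally expensive
        }
        probe_end("TaskA_work");
        // fixed 100 ms period; vTaskDelay() would add the work time to every cycle
        // (needs INCLUDE_vTaskDelayUntil set to 1 in FreeRTOSConfig.h)
        vTaskDelayUntil(&last_wake, pdMS_TO_TICKS(100));
    }
}

void TaskB(void *pv) {
    (void)pv;
    for (;;) {
        probe_start("TaskB_maintenance");
        // background housekeeping
        flush_logs();
        probe_end("TaskB_maintenance");
        probe_flush(); // low-priority task: a safe place to drain the probe buffer
        vTaskDelay(pdMS_TO_TICKS(500));
    }
}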

Timing probe: DWT cycle counter + circular buffer

On Cortex‑M3/M4/M7 cores, the DWT cycle counter (DWT->CYCCNT) is the lowest‑overhead clock for cycle‑accurate timing (Cortex‑M0/M0+ do not implement it). Use a tiny probe layer that records a timestamp and an event ID into RAM. At safe points, flush the buffer to the host via SEGGER RTT, ITM, or UART, or hand it to an automated uploader.

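/* timing_probe.h */
#ifndef TIMING_PROBE_H
#define TIMING_PROBE_H

#include <stdint.h>

void probe_init(void);             // enable the DWT cycle counter; call once at startup
void probe_start(const char *tag); // record a start event for `tag`
void probe_end(const char *tag);   // record the matching end event
void probe_flush(void);            // call from a safe, low-priority context

#endif /* TIMING_PROBE_H */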

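/* timing_probe.c (key parts) */
#include "timing_probe.h"
#include "stm32f4xx.h" // example device header: provides CMSIS DWT, CoreDebug, __disable_irq()

typedef struct { uint32_t ts; uint16_t tag_id; uint8_t ev; } probe_event_t;
#define PROBE_BUF_LEN 4096 // must be a power of two: the index is wrapped with a mask
static volatile probe_event_t probe_buf[PROBE_BUF_LEN];
static volatile uint32_t probe_idx = 0;

static void dwt_init(void) {
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; // enable the trace/DWT unit
    // (on some Cortex-M7 parts the DWT must also be unlocked via DWT->LAR)
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;            // start counting core cycles
}

void probe_init(void) {
    dwt_init();
}

static uint16_t tag_id_for(const char *tag) {
    // tiny hash; in production use a precomputed ID table
    uint16_t h = 0;
    while (*tag) h = (uint16_t)(h * 31u + (uint8_t)*tag++);
    return h ? h : 1; // a valid ID is never 0
}

static void probe_record(uint8_t ev, uint16_t tag_id, uint32_t ts) {
    // Tasks and ISRs can record concurrently, and probe_idx++ is a
    // read-modify-write, so reserve the slot with interrupts briefly masked.
    uint32_t primask = __get_PRIMASK();
    __disable_irq();
    uint32_t i = probe_idx++ & (PROBE_BUF_LEN - 1);
    probe_buf[i].ts = ts;
    probe_buf[i].tag_id = tag_id;
    probe_buf[i].ev = ev; // 1=start, 2=end (0 = slot never written)
    __set_PRIMASK(primask);
}

void probe_start(const char *tag) {
    uint16_t id = tag_id_for(tag);    // hash first...
    probe_record(1, id, DWT->CYCCNT); // ...so its cost is not inside the interval
}

void probe_end(const char *tag) {
    uint32_t ts = DWT->CYCCNT;        // read the clock before any probe work
    probe_record(2, tag_id_for(tag), ts);
}

void probe_flush(void) {
    // Transport-specific (RTT, ITM, UART); outline:
    // 1. snapshot probe_idx in a short critical section;
    // 2. emit a header with SystemCoreClock so cycles can be converted to time;
    // 3. send the entries recorded since the last flush. If more than
    //    PROBE_BUF_LEN events arrived in between, the oldest were overwritten.
}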

Notes:

Keep the probe code minimal; heavy logging skews results.
Use a circular buffer size that fits your RAM and expected run length.
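Record both start and end events so you can recover execution intervals offline. DWT->CYCCNT is only 32 bits wide and wraps (about every 25.6 s at 168 MHz), so compute durations modulo 2^32.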
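The probe stores a 16‑bit hash instead of the tag string, so the host needs the same hash to turn tag_id values back into names. The following is a minimal Python sketch. It assumes the target's tag_id_for() is exactly the multiply‑by‑31 hash shown above, so any change on the target must be mirrored here. The helper names are hypothetical.

```python
def tag_id_for(tag: str) -> int:
    """Mirror of the target's tag_id_for(): 16-bit multiply-by-31 hash, 0 mapped to 1."""
    h = 0
    for byte in tag.encode("utf-8"):  # C hashes the raw bytes of the string
        h = (h * 31 + byte) & 0xFFFF  # uint16_t arithmetic wraps modulo 2**16
    return h if h else 1


def build_tag_table(tags):
    """Map tag_id -> name, failing loudly if two tags hash to the same ID."""
    table = {}
    for tag in tags:
        tag_id = tag_id_for(tag)
        if tag_id in table and table[tag_id] != tag:
            raise ValueError(f"hash collision: {tag!r} and {table[tag_id]!r} -> {tag_id}")
        table[tag_id] = tag
    return table


if __name__ == "__main__":
    for tag_id, name in sorted(build_tag_table(["TaskA_work", "TaskB_maintenance"]).items()):
        print(tag_id, name)
```

Checking for collisions when the table is built matters because a 16‑bit hash can map two tags to the same ID; for example, "Aa" and "BB" collide.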

Collecting traces safely

Choose a transport that doesn't perturb timing. Recommended order:

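SEGGER RTT: minimal intrusion, and no blocking I/O when the up‑buffer runs in a non‑blocking mode.
ITM/SWO: low overhead but limited bandwidth.
UART with DMA: reliable, but can interfere if transmit buffers fill and the sender blocks.
Flash logs: no debugger needed, but flash programming can stall code fetch from the same bank, so write logs only outside measured regions and extract them afterwards.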

Flush the probe buffer from a low‑priority background task (or via a debug break) to avoid influencing task timing. For devices in the field, use a hardened, authenticated upload path, or an offline‑first edge host that aggregates logs and forwards them when connectivity is available. Record the system clock frequency in the log header so timestamps can be converted to microseconds later.
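The interval converter in the next section reads CSV, so the raw dump has to be decoded first. This sketch assumes a hypothetical dump layout:

- A little‑endian uint32 CPU frequency in Hz as the header.
- The probe_event_t records exactly as they sit in RAM, after the header.

On a little‑endian Cortex‑M with the usual ABI, that struct occupies 8 bytes: uint32 ts, uint16 tag_id, uint8 ev, and 1 padding byte. Confirm this with sizeof on your toolchain.

```python
import csv
import struct

HEADER = struct.Struct("<I")     # hypothetical header: CPU frequency in Hz
RECORD = struct.Struct("<IHBx")  # probe_event_t: ts, tag_id, ev, 1 pad byte (8 bytes)


def decode_dump(blob: bytes):
    """Return (cpu_hz, [(ts, tag_id, ev), ...]) from a raw probe dump."""
    (cpu_hz,) = HEADER.unpack_from(blob, 0)
    body = blob[HEADER.size:]
    if len(body) % RECORD.size:
        raise ValueError("truncated dump: body is not a whole number of records")
    events = [RECORD.unpack_from(body, off) for off in range(0, len(body), RECORD.size)]
    # Slots that were never written are zero-filled (ev == 0); keep start/end only
    return cpu_hz, [e for e in events if e[2] in (1, 2)]


def write_csv(events, path="probe_dump.csv"):
    """Write rows 'ts,tag_id,ev' (no header row) for the interval converter."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(events)
```

The sketch assumes the ring buffer did not wrap before the dump. If it did, you would also need probe_idx to rotate the records back into chronological order.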

Converting raw events to intervals

Measurement‑based analysis works on one execution time per invocation of the region. The exact input format depends on your RocqStat version; this tutorial assumes a CSV of intervals, so convert the start/end event stream first. Here's a simple Python utility that reconstructs durations per tag ID and writes them to CSV.

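#!/usr/bin/env python3
"""Rebuild per-invocation durations (in cycles) from the probe start/end event stream."""
import csv
from collections import defaultdict

CYCCNT_MASK = 0xFFFFFFFF  # DWT->CYCCNT is a 32-bit counter that wraps around

# probe_dump.csv: the target's binary dump already decoded to rows "ts,tag_id,ev"
# (no header row; ev 1 = start, 2 = end), in the order the events were recorded
with open('probe_dump.csv', newline='') as f:
    rows = [(int(r[0]), int(r[1]), int(r[2])) for r in csv.reader(f) if r]

open_events = defaultdict(list)  # tag_id -> stack of unmatched start timestamps
intervals = []
for ts, tag, ev in rows:
    if ev == 1:
        open_events[tag].append(ts)
    elif ev == 2 and open_events[tag]:
        start = open_events[tag].pop()  # pair with the most recent open start
        # unsigned 32-bit difference stays correct across one counter wrap
        intervals.append((tag, (ts - start) & CYCCNT_MASK))

# write intervals in cycles; convert to microseconds later using the CPU frequency
with open('intervals.csv', 'w', newline='') as out:
    w = csv.writer(out)
    w.writerow(['tag_id', 'cycles'])
    for tag, cycles in intervals:
        w.writerow([tag, cycles])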

Convert cycles to microseconds: microseconds = cycles / CPU_MHZ. For example, 840,000 cycles at 168 MHz is 5,000 µs (5 ms).
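The pseudo‑CLI in the next section calls a convert_cycles.py helper that the tutorial does not show. The following is one possible sketch of that helper. It assumes the script applies the formula above to intervals.csv and keeps the tag_id column. The script and its flags are illustrative and are not part of RocqStat.

```python
#!/usr/bin/env python3
"""Hypothetical convert_cycles.py: intervals.csv (tag_id,cycles) -> (tag_id,us)."""
import argparse
import csv
import sys


def cycles_to_us(cycles: int, cpu_mhz: float) -> float:
    """One CPU cycle at f MHz lasts 1/f microseconds."""
    return cycles / cpu_mhz


def convert(in_path: str, out_path: str, cpu_mhz: float) -> int:
    """Rewrite the cycles column as microseconds; return the number of rows."""
    count = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["tag_id", "us"])
        for row in csv.DictReader(src):
            writer.writerow([row["tag_id"], f"{cycles_to_us(int(row['cycles']), cpu_mhz):.3f}"])
            count += 1
    return count


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--cycles-file", required=True)
    parser.add_argument("--mhz", type=float, required=True)
    parser.add_argument("--out", required=True)
    args = parser.parse_args(argv)
    n = convert(args.cycles_file, args.out, args.mhz)
    print(f"converted {n} intervals at {args.mhz} MHz")


if __name__ == "__main__" and len(sys.argv) > 1:  # run only when given arguments
    main()
```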

Running RocqStat (measurement → statistical WCET)

RocqStat is built for measurement‑driven WCET: feed it many observed execution times and it computes a conservative upper bound with user‑specified confidence (for example 99.999%). The exact CLI depends on your RocqStat version; the workflow is:

Provide a dataset (CSV) of observed intervals for the code region you want to bound.
Configure the statistical model and required confidence level.
Run analysis to produce an upper bound, report on outliers, and highlight hotspots.

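Example (illustrative pseudo‑CLI; convert_cycles.py and the rocqstat flags are placeholders, so check your version's documentation for the real syntax):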

# Convert cycles->us then run rocqstat
$ python3 convert_cycles.py --cycles-file intervals.csv --mhz 168 --out intervals_us.csv
$ rocqstat analyze --input intervals_us.csv --tag 12345 --confidence 0.99999 --output rocq_report.json

Key outputs to expect:

Estimated WCET at the requested confidence level.
Statistical diagnostics: number of samples, tail behavior, and any assumptions violated.
Hotspot mapping: which tags or call paths show the largest variance or highest per‑call times.
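This tutorial does not describe RocqStat's statistical models. The sketch below is therefore only a textbook extreme‑value illustration of how a measurement‑based bound can sit above the largest observed sample. It works in three steps:

1. Group runs into blocks.
2. Fit a Gumbel distribution to the block maxima by the method of moments.
3. Solve for the execution time whose per‑run exceedance probability is p.

It assumes independent, identically distributed samples. Treat it as a teaching aid, not as RocqStat's algorithm or a certified bound.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_bound(samples, p_exceed, block_size=50):
    """Value exceeded with per-run probability ~p_exceed under a Gumbel block-maxima fit.

    Teaching sketch only: assumes i.i.d. samples and uses a method-of-moments
    fit instead of a validated extreme-value procedure.
    """
    n_blocks = len(samples) // block_size
    if n_blocks < 10:
        raise ValueError("need at least 10 full blocks of samples")
    maxima = [max(samples[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi  # Gumbel scale
    mu = statistics.fmean(maxima) - EULER_GAMMA * beta         # Gumbel location
    # A block maximum stays below x with probability (1 - p)^block_size, so solve
    # exp(-exp(-(x - mu) / beta)) = (1 - p)^block_size for x.
    return mu - beta * math.log(-block_size * math.log1p(-p_exceed))


if __name__ == "__main__":
    random.seed(1)
    # synthetic execution times in microseconds (stand-in for intervals_us.csv)
    times = [4000 + random.expovariate(1 / 150) for _ in range(20_000)]
    print(f"max observed : {max(times):.0f} us")
    print(f"bound p=1e-5 : {gumbel_bound(times, 1e-5):.0f} us")
```

Smaller exceedance probabilities push the bound further into the fitted tail. Correlated samples (for example, a cache warming up across runs) violate the i.i.d. assumption. This is one reason real tools report diagnostics alongside the number.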

Interpreting RocqStat results — what the numbers mean

RocqStat's report typically contains the computed upper bound and diagnostic data. Here's how to read it:

WCET (upper bound): the conservative execution time you must budget to meet your confidence requirement.
Sample count: more samples reduce uncertainty. With few samples, the bound may be overly conservative or statistically unreliable.
Tail diagnostics: indicates whether heavy‑tailed behavior (e.g., occasional cache misses) is present.
Outliers: single huge samples that dominate the bound. Investigate their cause (interrupt, DMA stall, flash or bus contention).
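A quick calculation shows why sample count matters, and why statistics are needed on top of raw measurement. Suppose a path occurs with probability p per invocation. The chance of seeing it at least once in n independent runs is 1 − (1 − p)^n, so observing it with confidence c needs n ≥ ln(1 − c) / ln(1 − p) runs. The following is a small helper, assuming independent runs:

```python
import math


def runs_needed(p_event: float, confidence: float) -> int:
    """Smallest n with 1 - (1 - p)^n >= confidence, assuming independent runs."""
    return math.ceil(math.log1p(-confidence) / math.log1p(-p_event))


if __name__ == "__main__":
    for p in (1e-3, 1e-5):
        print(f"p={p:g}: {runs_needed(p, 0.95):,} runs to see it once with 95% confidence")
```

At p = 1e‑5 and 95% confidence this comes to roughly 300,000 runs, over 8 hours of a 100 ms periodic task, just to observe the event once. Statistical tail modeling avoids relying on that.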

Common causes of timing violations and practical fixes

Once RocqStat flags a WCET that breaches your deadline, use its hotspots plus run‑time trace to find root causes. Common causes and fixes:

CPU preemption by higher priority tasks / ISRs:
- Fix: raise the priority of the critical task, shorten interrupt handlers, or offload work to deferred tasks. Use interrupt masking sparingly and only for short critical sections.
Cache and memory variability:
- Fix: use cache locking for critical code/data, place hot code in TCM/ITCM, or apply static WCET analysis for cache effects.
Unbounded loops / data dependent path explosion:
- Fix: add loop bounds, use profiling to find worst cases, and consider algorithmic changes (e.g., bounded ring buffers).
Blocking I/O or DMA stalls:
- Fix: use non‑blocking DMA, increase buffer sizes, move blocking work to lower priority tasks.
Instrumentation overhead skew:
- Fix: keep probes minimal and, when possible, use hardware counters (DWT) rather than printf or blocking IO in tight loops.
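You can also calibrate instrumentation overhead away, in two steps:

1. On the target, record many empty probe_start()/probe_end() pairs under a dedicated calibration tag, and take the minimum of those intervals as the fixed probe cost.
2. On the host, subtract that cost from every measured interval.

The sketch below assumes such a calibration tag exists (it is not part of the probe code above) and that the overhead is constant. Subtracting the minimum rather than the mean leaves the corrected intervals slightly on the high side, which keeps the result conservative. The sample numbers are made up.

```python
def probe_overhead(calibration_cycles):
    """Fixed probe cost: the smallest interval measured around an empty region."""
    if not calibration_cycles:
        raise ValueError("no calibration samples")
    return min(calibration_cycles)


def subtract_overhead(intervals, overhead):
    """Remove the fixed probe cost from each interval, never going below zero."""
    return [max(cycles - overhead, 0) for cycles in intervals]


if __name__ == "__main__":
    calibration = [14, 12, 12, 19, 12]  # made-up cycles from an empty probe pair
    measured = [8412, 9020, 15880]      # made-up TaskA_work intervals
    cost = probe_overhead(calibration)
    print(cost, subtract_overhead(measured, cost))
```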

Example: find and fix a timing violation

Scenario: TaskA has a periodic budget of 5 ms. RocqStat reports WCET(99.999%) = 7.8 ms and highlights spikes on the path where complex_filter() runs while an ISR triggers a long DMA handler.

Inspect trace: match spikes to ISR timestamps → you see a slow SPI DMA completion callback running at high priority.
Fix attempt 1: move DMA completion handling to a low‑priority task by having the ISR only set an event flag.
Re‑measure: new RocqStat WCET = 4.6 ms — success; margin restored.
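If the ISR is instrumented with its own probe tag, the "inspect trace" step can be partly automated. The sketch below is hypothetical and rests on two assumptions:

- You kept start timestamps alongside durations (the converter above writes only durations).
- The ISR's (start, end) windows use the same unwrapped cycle timebase.

It lists long task runs that overlap the ISR and how many cycles fell inside it.

```python
def spikes_with_isr_overlap(task_runs, isr_windows, threshold):
    """Return (start, duration, cycles_in_isr) for long task runs that overlap an ISR.

    task_runs:   list of (start, end) cycle timestamps for the task region
    isr_windows: list of (start, end) cycle timestamps for the ISR
    Timestamps are assumed already unwrapped (monotonic).
    """
    hits = []
    for t_start, t_end in task_runs:
        if t_end - t_start <= threshold:
            continue  # not a spike
        overlapping = [(s, e) for s, e in isr_windows if s < t_end and e > t_start]
        if overlapping:
            stolen = sum(min(e, t_end) - max(s, t_start) for s, e in overlapping)
            hits.append((t_start, t_end - t_start, stolen))
    return hits
```

A spike whose excess roughly matches the cycles spent in the ISR points at preemption rather than at the task's own code path.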

Key takeaways from this exercise:

Short ISRs and deferring work are low‑effort, high‑impact fixes.
Before/after statistical bounds provide an evidence trail for safety reviews and operational auditability.

CI integration: gating builds on WCET

To prevent regressions, integrate this flow in CI:

Nightly hardware runs that exercise realistic workloads and upload probe dumps to the CI server.
CI step transforms probe dumps into intervals and runs RocqStat with configured confidence. The tool emits a machine‑readable result (JSON) suitable for long‑term storage and trending in an auditability pipeline.
Fail the build if the WCET exceeds the configured threshold, and attach the RocqStat diagnostics to the build job for fast triage. This is the kind of operational guardrail SRE teams formalize in a playbook.
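The gate itself can be a small script that compares the analysed bound with a per‑configuration budget and returns a non‑zero exit code. The sketch makes two assumptions:

- The RocqStat JSON report contains the bound under a key it calls `wcet_us`. The real report layout is version‑specific and not documented here, so that key is a hypothetical placeholder.
- The configuration names, budgets, and 5% margin are illustrative values.

```python
import json
import sys

# Hypothetical per-configuration budgets in microseconds (TaskA's 5 ms budget)
BUDGETS_US = {"default": 5000.0, "filter_enabled": 5000.0}


def gate(wcet_us: float, budget_us: float, min_margin: float = 0.05) -> bool:
    """Pass only if the bound leaves at least `min_margin` (a fraction) of the budget free."""
    return wcet_us <= budget_us * (1.0 - min_margin)


def main(report_path: str, config: str) -> int:
    with open(report_path) as f:
        report = json.load(f)
    wcet_us = float(report["wcet_us"])  # placeholder key: adapt to your report format
    budget = BUDGETS_US[config]
    ok = gate(wcet_us, budget)
    print(f"{config}: WCET {wcet_us:.1f} us vs budget {budget:.1f} us -> {'PASS' if ok else 'FAIL'}")
    return 0 if ok else 1


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(main(sys.argv[1], sys.argv[2]))
    print("usage: wcet_gate.py REPORT.json CONFIG")
```

With the scenario's numbers, the original 7.8 ms bound fails and the fixed 4.6 ms bound passes. Requiring a margin rather than a bare "below budget" check leaves headroom for small measurement drift.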

In 2026, with Vector moving RocqStat into VectorCAST, expect tighter integration with existing unit and system test flows, so WCET checks can run alongside coverage and functional tests in a single toolchain. Teams adopting new toolchains should also plan how tool deployments and upgrades are managed.

Advanced strategies and future direction

As embedded systems become more dynamic (heterogeneous MCUs, multicore, and mixed‑criticality scheduling), WCET workflows evolve:

Hybrid analysis: combine static worst‑case path analysis with RocqStat's measurement of environmental effects.
Per‑configuration WCET: produce bounds per feature set or per scheduling configuration and automatically select applicable bounds in CI.
Regression detection: use statistical baselines rather than single thresholds, and flag significant distribution changes, not only absolute exceedance. When many devices report results, aggregating them close to the devices (at the edge) can shorten the time between capture and alerting.
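One way to flag a distribution change, rather than only a threshold breach, is a two‑sample Kolmogorov-Smirnov test between a baseline run and a new run. The sketch below is dependency‑free and uses the standard large‑sample critical value. It assumes independent samples, which consecutive timing measurements often are not, so treat a flag as a prompt to investigate rather than proof of a regression. If SciPy is available, scipy.stats.ks_2samp is the library equivalent.

```python
import bisect
import math


def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the gap can only change at observed values
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def distribution_changed(baseline, current, alpha=0.05):
    """True if the KS statistic exceeds the asymptotic critical value at level alpha."""
    n, m = len(baseline), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(baseline, current) > critical
```

Unlike a fixed threshold, this catches a shift in the bulk of the distribution even while the bound still fits the budget.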

Checklist: How to add RocqStat to your RTOS project (practical)

Instrument: add minimal probes (DWT) to start/end regions and map tag IDs to human‑readable names.
Collect: choose non‑blocking transport (RTT/ITM/DMA UART) and capture long runs with realistic inputs.
Convert: turn start/end streams into per‑invocation intervals (CSV) and record CPU frequency.
Analyze: run RocqStat with a required confidence level (e.g., 1e‑5 tail probability) and save JSON reports. Store results in an auditable system together with build and board metadata.
Triage: inspect top contributors, check for ISRs, cache misses, or blocking calls.
Fix: apply scheduler/ISR/design fixes, re‑measure, and iterate until the bound meets the deadline.
Automate: add the flow to CI and gate on approved WCET values; store historic reports for trend analysis under an explicit retention policy.

Actionable tips and gotchas

Sample size matters: collect thousands of invocations across varied inputs. Small sample sets inflate statistical uncertainty.
Warm caches: warm the system before recording to capture steady‑state behavior, but also measure cold‑cache runs (for example, the first invocation after boot), since they are often the worst case.
Separate concerns: only measure the region you care about. Measuring an entire system at once increases variance and reduces diagnostic clarity.
Document the environment: CPU freq, memory map, interrupts enabled, power modes — a change here invalidates past results.
Secure uploads: when devices are in the field, use hardened upload and authentication flows, and follow your organization's field‑security and password‑hygiene guidance.
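Documenting the environment is easier to enforce if every report is stored with a small metadata record and comparisons refuse to run when the records differ. The field names below are hypothetical, drawn from the list above (CPU frequency, memory map, enabled interrupts, power mode). They are not a RocqStat format.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MeasurementEnv:
    """Conditions a WCET result depends on; any change invalidates old results."""
    cpu_hz: int
    memory_map: str       # e.g. linker script name or hash
    irqs_enabled: tuple   # enabled interrupt names, sorted
    power_mode: str
    firmware_build: str


def env_differences(old: MeasurementEnv, new: MeasurementEnv) -> dict:
    """Return {field: (old, new)} for every field that changed."""
    a, b = asdict(old), asdict(new)
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


def assert_comparable(old: MeasurementEnv, new: MeasurementEnv) -> None:
    """Refuse to compare WCET results recorded under different conditions."""
    diff = env_differences(old, new)
    if diff:
        raise RuntimeError(f"environment changed, re-baseline before comparing: {diff}")
```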

Where to go next

Try this tutorial on a real board and run the analysis on a week's worth of operational logs. If you're already a VectorCAST user, watch for the RocqStat integration promised by Vector in 2026 — it will simplify combining functional tests with timing checks in a unified workflow. For teams operating hybrid capture and streaming setups, reviews like the NovaStream Clip field review and edge collaboration playbooks can help choose capture and aggregation tools.

Conclusion — why this matters now

In 2026 the industry expects evidence, not guesswork. Measurement‑driven WCET using tools like RocqStat provides defensible, statistically rigorous bounds that fit into modern CI/CD. Combining lightweight, hardware‑accurate probes with automated statistical analysis helps you find the true root causes of timing violations and validate fixes quickly.

Actionable takeaways

Start small: instrument one critical task and build your pipeline before scaling to the whole system.
Use hardware counters (DWT) to minimize probing overhead.
Collect sufficient, varied samples — then let RocqStat give you a confidence‑controlled upper bound.
Automate WCET checks in CI to catch regressions early.

Call to action

Ready to try this on your board? Clone the companion sample project (FreeRTOS + DWT probe + conversion scripts) from our repo, run the end‑to‑end flow, and share your RocqStat report. If you want a CI starter template or help mapping RocqStat results to safety documentation (ISO 26262/DO‑178C), reach out; we will publish an updated integration guide for VectorCAST+RocqStat as the 2026 integration lands.
