This project is a framework for property-based testing of hardware designs.
Put simply, property-based testing is a kind of testing in which the user specifies a number of properties the system must satisfy, and the computer generates the test cases itself.
Unlike formal verification, we do not exhaustively try every variant, which can take a very long time, and we do not shift the responsibility for creating tests onto a human (who will either forget to check an edge case, or will think up coverage points and implement them by hand, which we would like to avoid).
Since this framework uses Hypothesis as a generator, here is an example of its use in Python:
from hypothesis import given, strategies as st

@given(st.lists(st.integers() | st.floats()))
def test_sort_correct(lst):
    # lst is a random list of numbers
    assert my_sort(lst) == sorted(lst)

test_sort_correct()
This is a basic example from the documentation: here we test our own sorting implementation on int and float values.
You can read more about how values are generated in the Hypothesis documentation.
Since this is an early prototype, I will use Poetry to run it.
If you execute this command in the root of the cloned project:
poetry run python main.py counter ./example/counter.v
Then we can find two files in the ./gen directory: counter_interface.py and run_counter.py.
Let's talk about the first one. It contains a function that generates a testbench based on descriptions of the input sequences:
def drive_counter(clk_seq, rst_n_seq, count_seq):
...
And here is the module from which everything was generated:
module counter (
    input wire clk,
    input wire rst_n,
    output reg [7:0] count
);
    initial begin
        count = 0;
    end
    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) count <= 0;
        else count <= count + 1;
    end
endmodule
In addition to the input signals, the function accepts the expected values of the output signals. This is how you can use this function:
from gen.counter_interface import drive_counter
clk_seq = [0, 1, 0, 1]
rst_n_seq = [0, 1, 1, 1]
count_seq = [0, 1, 1, 2]
# Call the drive function
drive_counter(clk_seq, rst_n_seq, count_seq)
Such a call will create the following testbench:
// Auto-generated testbench for counter on 2025-06-15T00:47:04.259455
`timescale 1ns/1ps
module counter_tb;
    logic clk;
    logic rst_n;
    logic [7:0] count;
    integer cycle; // Test cycle counter

    // Instantiate DUT
    counter dut (.clk(clk), .rst_n(rst_n), .count(count));

    // VCD Dumping
    initial begin
        $dumpfile("gen/dump/counter_tb_20.vcd");
        $dumpvars(0, counter_tb);
    end

    initial begin
        cycle = 0; // Initialize counter
        clk = 1'b0;
        rst_n = 1'b0;
        if (count !== 8'd0) begin
            $error("Cycle %0d: count: expected=%0d, actual=%0d", cycle, 8'd0, count);
        end
        #1; // Wait for signals to propagate
        cycle = cycle + 1; // Next test cycle
        clk = 1'b1;
        rst_n = 1'b1;
        if (count !== 8'd1) begin
            $error("Cycle %0d: count: expected=%0d, actual=%0d", cycle, 8'd1, count);
        end
        #1; // Wait for signals to propagate
        cycle = cycle + 1; // Next test cycle
        clk = 1'b0;
        rst_n = 1'b1;
        if (count !== 8'd1) begin
            $error("Cycle %0d: count: expected=%0d, actual=%0d", cycle, 8'd1, count);
        end
        #1; // Wait for signals to propagate
        cycle = cycle + 1; // Next test cycle
        clk = 1'b1;
        rst_n = 1'b1;
        if (count !== 8'd2) begin
            $error("Cycle %0d: count: expected=%0d, actual=%0d", cycle, 8'd2, count);
        end
        #1; // Wait for signals to propagate
        cycle = cycle + 1; // Next test cycle
        $finish;
    end
endmodule
Right now it looks a bit sketchy, but that will change in the future.
In this particular case, we set the input and output values ourselves, but how do we set them automatically?
The idea was to give the user a convenient format for describing the input data, so let's look at the generated run_counter.py file:
# Auto-generated Hypothesis test for module counter
import os
import pytest
import hypothesis.strategies as st
from hypothesis import given, settings, HealthCheck
from counter_interface import drive_counter

# Hypothesis configuration
@settings(
    max_examples=20,
    deadline=None,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture]
)
@given(
    clk_seq=st.lists(st.booleans(), min_size=10, max_size=100),
    rst_n_seq=st.lists(st.booleans(), min_size=10, max_size=100),
    count_seq=st.just([0]*100)  # Dummy output
)
def test_counter(clk_seq, rst_n_seq, count_seq):
    # Convert boolean to 0/1 for clk
    clk_seq = [1 if x else 0 for x in clk_seq]
    # Convert boolean to 0/1 for rst_n
    rst_n_seq = [1 if x else 0 for x in rst_n_seq]
    # Ensure all sequences have same length
    num_cycles = min(len(seq) for seq in [clk_seq, rst_n_seq, count_seq])
    clk_seq = clk_seq[:num_cycles]
    rst_n_seq = rst_n_seq[:num_cycles]
    count_seq = count_seq[:num_cycles]
    # Generate testbench with sequences
    drive_counter(clk_seq, rst_n_seq, count_seq)

if __name__ == '__main__':
    pytest.main([__file__])
Until we specify concrete properties, the framework knows nothing about what data should go into the module, but this file provides a skeleton for further description: for example, there are useful limits on the number of tests (max_examples) and on the inputs (see the @given decorator).
Let's change test_counter so that it makes some sense:
@settings(
    max_examples=100,
    deadline=None,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.function_scoped_fixture]
)
@given(
    rst_n_seq=st.lists(st.integers(0, 1), min_size=10)
)
def test_counter(rst_n_seq):
    num_cycles = len(rst_n_seq)
    clk_seq = [(i % 2) for i in range(num_cycles)]
    count_seq = reference_counter(clk_seq, rst_n_seq)
    # Generate testbench with sequences
    drive_counter(clk_seq, rst_n_seq, count_seq)
I won't give the reference_counter implementation here; it can be found in ref_model/ref_counter.py (potentially, you could reuse reference models written in other languages if you write an adapter).
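For context, a software model of this counter might look roughly like the following sketch (an illustrative assumption on my part; the actual code in ref_model/ref_counter.py may differ):

def reference_counter(clk_seq, rst_n_seq, width=8):
    """Sketch of a reference model: asynchronous active-low reset,
    increment on every rising clock edge, wrap at 2**width."""
    count, prev_clk, expected = 0, 0, []
    for clk, rst_n in zip(clk_seq, rst_n_seq):
        if not rst_n:                # reset dominates
            count = 0
        elif clk and not prev_clk:   # rising edge of clk
            count = (count + 1) % (1 << width)
        prev_clk = clk
        expected.append(count)
    return expected

On the manual example above (clk 0,1,0,1 and rst_n 0,1,1,1) this sketch produces [0, 1, 1, 2], matching the count_seq we wrote by hand.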
Compared to the generated skeleton, the new implementation pins down how clk should behave, since clock generators usually behave exactly this way.
This test will create 100 testbenches that succeed. The reader is invited to find the bug in either the reference model or the hardware design for the example/multiplier_pipe.v module (the model is in ref_model/ref_multiplier.py).
I asked a neural network to generate a block with registers; it can now be found in example/csr.v.
The diagram below shows the transitions between the FSM states.
Of course, I was too lazy to write a reference model, so I limited myself to constraining the input data. This time I decided to split the test into a set of strategies, so that I could compose them later.
import os
import pytest
import hypothesis.strategies as st
from hypothesis import given, settings, HealthCheck
from csr_interface import drive_csr

# Constants from the Verilog module
CTRL_ADDR = 0x00
STATUS_ADDR = 0x04
CONFIG_ADDR = 0x08
DATA_IN_ADDR = 0x0C
DATA_OUT_ADDR = 0x10
VERSION_ADDR = 0xFC

START_BIT = 0
RESET_BIT = 1
IE_BIT = 2
DONE_BIT = 0
BUSY_BIT = 1
ERR_BIT = 2

@st.composite
def base_signals(draw, num_cycles):
    """Basic signals: clock and reset with 2% reset probability"""
    # Generate clock signal (deterministic)
    clk_seq = [i % 2 for i in range(num_cycles)]
    # Generate reset signal with:
    # - First 3 cycles: guaranteed reset (0)
    # - Subsequent cycles: 2% probability of reset (0), 98% normal (1)
    reset_n_seq = [0] * min(num_cycles, 3)  # Handle cases with <3 cycles
    # For remaining cycles
    if num_cycles > 3:
        reset_tail = draw(
            st.lists(
                st.integers(min_value=0, max_value=99),
                min_size=num_cycles - 3,
                max_size=num_cycles - 3
            )
        )
        reset_n_seq += [0 if x < 2 else 1 for x in reset_tail]
    return clk_seq, reset_n_seq

@st.composite
def config_strategy(draw, cycle):
    """Configuration write strategy"""
    config_value = draw(st.one_of(
        st.integers(1, 10),     # Short processing
        st.integers(200, 255)   # Long processing
    ))
    return {
        'cycle': cycle,
        'addr': CONFIG_ADDR,
        'wdata': config_value,
        'wr_en': 1,
        'rd_en': 0
    }, config_value

@st.composite
def start_strategy(draw, cycle):
    """Processing launch strategy"""
    return {
        'cycle': cycle,
        'addr': CTRL_ADDR,
        'wdata': 1 << START_BIT,
        'wr_en': 1,
        'rd_en': 0
    }

@st.composite
def status_strategy(draw, cycle):
    """Status read strategy"""
    return {
        'cycle': cycle,
        'addr': STATUS_ADDR,
        'wdata': 0,
        'wr_en': 0,
        'rd_en': 1
    }

@st.composite
def complete_strategy(draw, cycle, error_mode):
    """Processing completion strategy"""
    if error_mode:
        wdata = 1 << RESET_BIT
    else:
        wdata = 0
    return {
        'cycle': cycle,
        'addr': CTRL_ADDR,
        'wdata': wdata,
        'wr_en': 1,
        'rd_en': 0
    }

@st.composite
def random_operation(draw, cycle):
    """Random anti-sticking operation"""
    if draw(st.booleans()):
        return {
            'cycle': cycle,
            'addr': 0,
            'wdata': 0,
            'wr_en': 0,
            'rd_en': 0
        }
    else:
        # Read a random register
        addr = draw(st.sampled_from([
            STATUS_ADDR, DATA_OUT_ADDR, VERSION_ADDR
        ]))
        return {
            'cycle': cycle,
            'addr': addr,
            'wdata': 0,
            'wr_en': 0,
            'rd_en': 1
        }
Here we just describe what the signals should look like to send values correctly; the idea is that we could write strategies for standard protocols like AXI or CHI and build a standard library out of them.
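As a sketch of that idea, a library strategy for a single AXI-Lite write transaction might look like this (the field names and the addr_space parameter are my own illustrative assumptions, not part of the framework):

@st.composite
def axi_lite_write(draw, cycle, addr_space=0xFF):
    """Hypothetical library strategy: one AXI-Lite write transaction.
    Field names are illustrative and would need to match the DUT ports."""
    return {
        'cycle': cycle,
        'awaddr': draw(st.integers(0, addr_space)) & ~0x3,  # word-aligned address
        'wdata': draw(st.integers(0, 2**32 - 1)),
        'wstrb': draw(st.sampled_from([0xF, 0x3, 0x1])),    # common byte strobes
        'awvalid': 1,
        'wvalid': 1
    }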
Now it’s time for composition:
# Composite strategy
@st.composite
def fsm_sequence_strategy(draw):
    num_cycles = 100
    clk_seq, reset_n_seq = draw(base_signals(num_cycles))
    # A list to collect all transactions
    all_operations = []
    # 1. Configuration operation (after reset)
    config_cycle = 5
    config_op, config_value = draw(config_strategy(config_cycle))
    all_operations.append(config_op)
    # 2. Operation to start processing
    start_cycle = config_cycle + 2
    start_op = draw(start_strategy(start_cycle))
    all_operations.append(start_op)
    # 3. Defining the type of test
    error_mode = config_value > 200
    # 4. Reading the status
    status_cycle = start_cycle + 2
    status_op = draw(status_strategy(status_cycle))
    all_operations.append(status_op)
    # 5. Completion operation (if we have time)
    if not error_mode and (start_cycle + config_value + 5) < num_cycles:
        complete_cycle = start_cycle + config_value + 5
        complete_op = draw(complete_strategy(complete_cycle, error_mode))
        all_operations.append(complete_op)
    # 6. Filling the remaining cycles
    for cycle in range(num_cycles):
        if any(op['cycle'] == cycle for op in all_operations):
            continue
        # Add a random operation with a 30% probability
        if draw(st.floats(0, 1)) < 0.3:
            rand_op = draw(random_operation(cycle))
            all_operations.append(rand_op)
        else:
            # Empty cycle
            all_operations.append({
                'cycle': cycle,
                'addr': 0,
                'wdata': 0,
                'wr_en': 0,
                'rd_en': 0
            })
    all_operations.sort(key=lambda x: x['cycle'])
    # Collecting signal sequences
    addr_seq = [0] * num_cycles
    wdata_seq = [0] * num_cycles
    wr_en_seq = [0] * num_cycles
    rd_en_seq = [0] * num_cycles
    for op in all_operations:
        cycle = op['cycle']
        if cycle < num_cycles:
            addr_seq[cycle] = op['addr']
            wdata_seq[cycle] = op['wdata']
            wr_en_seq[cycle] = op['wr_en']
            rd_en_seq[cycle] = op['rd_en']
    return {
        'clk_seq': clk_seq,
        'reset_n_seq': reset_n_seq,
        'addr_seq': addr_seq,
        'wdata_seq': wdata_seq,
        'wr_en_seq': wr_en_seq,
        'rd_en_seq': rd_en_seq,
        'num_cycles': num_cycles
    }
@settings(
    max_examples=20,
    deadline=None,
    suppress_health_check=[HealthCheck.too_slow, HealthCheck.data_too_large]
)
@given(data=fsm_sequence_strategy())
def test_csr_fsm(data):
    clk_seq = data['clk_seq']
    reset_n_seq = data['reset_n_seq']
    addr_seq = data['addr_seq']
    wdata_seq = data['wdata_seq']
    wr_en_seq = data['wr_en_seq']
    rd_en_seq = data['rd_en_seq']
    drive_csr(
        clk_seq=clk_seq,
        reset_n_seq=reset_n_seq,
        addr_seq=addr_seq,
        wdata_seq=wdata_seq,
        wr_en_seq=wr_en_seq,
        rd_en_seq=rd_en_seq,
        rdata_seq=None,
        ready_seq=None
    )

if __name__ == '__main__':
    pytest.main([__file__])
Although we generated only 20 tests, we were able to walk through the block's states by composing a dedicated strategy for it.
What could be provided to the end user:
- A library of strategies for testing protocols and generating standard sequences. For example, clk generation could go into a library function, or a strategy could handle sending data over an AXI interface (see the sketch after this list).
- Support for X/Z comparisons.
- A way to free the user from writing whole signal sequences by hand.
- A --uvm flag for making the generated testbenches compatible with the UVM methodology.
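For instance, the clk generation used throughout the examples could be wrapped as a reusable library strategy (a hypothetical sketch; clk_strategy is not an existing framework function):

import hypothesis.strategies as st

def clk_strategy(num_cycles, start_level=0):
    """Hypothetical library helper: a deterministic alternating clock,
    wrapped in st.just() so it composes with draw() in other strategies."""
    return st.just([(i + start_level) % 2 for i in range(num_cycles)])

Inside a composite strategy this would be used as clk_seq = draw(clk_strategy(100)).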
The existing implementation could be better if we:
- Remove the duplication in generated testbenches.
- Refactor the core.py code so that it becomes easier to maintain (e.g. rewrite it with a templating tool).
Right now, svapy cannot replace existing industrial verification methods; however, assuming the framework's results can be embedded into the UVM methodology, svapy could greatly reduce the time spent reaching functional coverage.