Benchmarks
The benchmark suite is meant to support a narrow claim: ferro-ta is often faster on selected indicators, and the evidence is published in a reproducible form.
What is published
The authoritative benchmark workflow lives in benchmarks/:
- Cross-library speed suite: benchmarks/test_speed.py
- Cross-library accuracy suite: benchmarks/test_accuracy.py
- TA-Lib head-to-head script: benchmarks/bench_vs_talib.py
- Backtesting engine benchmark: benchmarks/bench_backtest.py
- Table generation from benchmark JSON: benchmarks/benchmark_table.py
- Perf-contract artifact bundle: benchmarks/run_perf_contract.py
Backtesting engine — competitor comparison
Measured on Apple M-series, Python 3.13, Rust 1.91, using an SMA(20/50) crossover strategy with 0.1% commission and 5 bps slippage. Median of 5 runs.
| Library | 1k bars | 10k bars | 100k bars | vs ferro-ta core (100k) |
|---|---|---|---|---|
| ferro-ta | 0.004 ms | 0.033 ms | 0.286 ms | - |
| ferro-ta | 0.004 ms | 0.037 ms | 0.332 ms | ~same |
| NumPy vectorized (manual) | 0.013 ms | 0.042 ms | 0.459 ms | 1.6× slower |
| vectorbt 0.28 | 1.32 ms | 1.31 ms | 2.90 ms | 10× slower |
| backtesting.py | 10.5 ms | 42.3 ms | 319.6 ms | 1,117× slower |
| backtrader 1.9 | 53.9 ms | 518 ms | n/a (skipped) | >15,000× slower |
Accuracy: ferro-ta positions and bar-returns are bit-exact against the NumPy reference implementation (max per-bar equity diff = 0.00e+00 with zero commission/slippage).
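For readers who want a feel for what a NumPy reference of this kind can look like, here is a minimal sketch of a vectorized SMA(20/50) crossover backtest. It is not the code in benchmarks/bench_backtest.py: the function names, the next-bar execution rule, and the way commission and slippage are charged on position changes are all assumptions made for illustration.

```python
import numpy as np


def sma(close: np.ndarray, window: int) -> np.ndarray:
    """Simple moving average; the first window-1 entries are NaN."""
    out = np.full(close.shape, np.nan)
    csum = np.cumsum(np.insert(close, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def sma_crossover_backtest(close, fast=20, slow=50,
                           commission=0.001, slippage=0.0005):
    """Long/flat SMA crossover. The cost model is an assumption, not ferro-ta's."""
    close = np.asarray(close, dtype=np.float64)
    fast_ma = sma(close, fast)
    slow_ma = sma(close, slow)

    # Long (1.0) when the fast SMA is above the slow SMA, flat (0.0) otherwise.
    # Comparisons against NaN are False, so warm-up bars stay flat.
    with np.errstate(invalid="ignore"):
        signal = (fast_ma > slow_ma).astype(np.float64)

    # Assumption: a signal on bar t is traded on bar t+1 (no look-ahead).
    position = np.zeros_like(signal)
    position[1:] = signal[:-1]

    # Simple per-bar returns of the underlying series.
    bar_ret = np.zeros_like(close)
    bar_ret[1:] = close[1:] / close[:-1] - 1.0

    # Assumption: commission and slippage are charged once per unit of
    # position change, as a fraction of equity.
    turnover = np.abs(np.diff(position, prepend=0.0))
    strat_ret = position * bar_ret - turnover * (commission + slippage)
    equity = np.cumprod(1.0 + strat_ret)
    return position, strat_ret, equity
```

With commission and slippage set to zero, the equity curve reduces to the compounded bar returns of the bars where the position is long, which is the setting in which a per-bar comparison against another implementation is easiest to reason about.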
Additional ferro-ta capabilities not present in the libraries above:
| Capability | ferro-ta result | NumPy baseline | Speedup |
|---|---|---|---|
| Monte Carlo 1,000 sims (100k bars) | 50 ms (parallel Rayon + LCG) | 612 ms (Python loop) | 12× |
| 23 performance metrics, single call (100k bars) | 2.8 ms | 0.36 ms (2 metrics only) | 0.12 ms / metric |
| Multi-asset 100 assets (100k bars) | 43 ms parallel / 88 ms serial | - | 2× parallel speedup |
| Walk-forward fold indices (100k bars) | 0.3 µs | - | - |
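The walk-forward row refers to generating fold boundaries, not to running a backtest on each fold. A rough sketch of what such index generation can look like in plain Python follows; the function name, the rolling-window scheme, and the half-open tuple layout are hypothetical and are not taken from ferro-ta's API.

```python
def walk_forward_folds(n_bars, train_size, test_size, step=None):
    """Return (train_start, train_end, test_start, test_end) tuples.

    Indices are half-open, so series[train_start:train_end] is the training
    window. Hypothetical helper: the rolling scheme is an assumption.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    # By default each fold advances by one test window, so test windows
    # tile the series without overlapping.
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")

    folds = []
    start = 0
    while start + train_size + test_size <= n_bars:
        train_end = start + train_size
        folds.append((start, train_end, train_end, train_end + test_size))
        start += step
    return folds
```

Only a handful of integers are produced per fold, which is why this step is cheap regardless of series length.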
Reproduce the backtest benchmark:
python benchmarks/bench_backtest.py --sizes 10000 100000 \
--json benchmarks/artifacts/latest/bench_backtest_results.json
Latest checked-in TA-Lib artifact
The current checked-in TA-Lib comparison artifact benchmarks contiguous
float64 arrays at 10k and 100k bars on an Apple M3 Max with 14 logical
cores, about 38.7 GB RAM, CPython 3.13.5, and Rust 1.91.1 using the
default release profile (lto = true, codegen-units = 1).
Summary from benchmarks/artifacts/latest/benchmark_vs_talib.json:
| Size | Rows | ferro-ta wins | Median speedup | TA-Lib wins or ties |
|---|---|---|---|---|
|  | 12 | 6 |  |  |
|  | 12 | 6 |  |  |
Examples from the 100k-bar run:
| Indicator | ferro-ta | TA-Lib | Speedup | Read |
|---|---|---|---|---|
|  |  |  |  | clear ferro-ta win |
|  |  |  |  | clear ferro-ta win |
|  |  |  |  | ferro-ta win |
|  |  |  |  | TA-Lib win |
|  |  |  |  | TA-Lib win |
|  |  |  |  | tie on this machine |
Methodology notes
- The head-to-head script uses the same synthetic OHLCV generator, the same parameters, and the same contiguous float64 array layout for both libraries.
- Reported speedup is TA-Lib median time / ferro-ta median time.
- The script uses 1 warmup run and 7 measured runs per case, and now records the full per-run timing samples, not just one selected number.
- Published JSON artifacts include machine/runtime metadata, git metadata, Rust toolchain and build-profile metadata, per-run variance statistics, and Python-tracked peak allocation snapshots.
- Allocation snapshots are based on tracemalloc and capture Python-tracked allocations only; they are not full native RSS profiles.
- If your workload uses non-contiguous arrays, different dtypes, or different batch sizes, benchmark that exact workload. Those factors can materially change the result.
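The measurement scheme in these notes (warmup, several timed runs, a ratio of medians, and a tracemalloc peak) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the contents of benchmarks/bench_vs_talib.py; the helper names are hypothetical.

```python
import statistics
import time
import tracemalloc


def measure(fn, *args, warmup=1, runs=7):
    """Return all per-run wall-clock samples in seconds, after warmup calls."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return samples


def speedup(baseline_samples, candidate_samples):
    """Baseline median / candidate median; above 1.0 the candidate is faster."""
    return statistics.median(baseline_samples) / statistics.median(candidate_samples)


def peak_python_alloc(fn, *args):
    """Peak bytes of Python-tracked allocations during one call.

    tracemalloc only sees allocations routed through Python's allocator,
    so this is not a native RSS figure.
    """
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```

To compare two implementations on your own data, pass each callable and the same input array to `measure`, then feed both sample lists to `speedup`. Keeping every sample, instead of a single number, is what makes variance statistics possible afterwards.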
Reproduce the TA-Lib comparison
pip install ta-lib
python benchmarks/bench_vs_talib.py --sizes 10000 100000 --json benchmark_vs_talib.json
The JSON output is the main artifact to review when publishing performance claims.
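A quick way to get oriented in the artifact before reading it in detail is to list its top-level structure. The schema is not described here, so the sketch below assumes nothing about key names and only reports each top-level key with the type of its value; both helper names are hypothetical.

```python
import json
from pathlib import Path


def summarize(data):
    """Map each top-level key to the type name of its value."""
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Fall back to describing the root when the document is not an object.
    return {"<root>": type(data).__name__}


def load_and_summarize(path):
    """Parse a JSON file and summarize its top-level structure."""
    return summarize(json.loads(Path(path).read_text(encoding="utf-8")))


# Example call, assuming the file was produced by the command above:
# print(load_and_summarize("benchmark_vs_talib.json"))
```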
Cross-library suite
Run the broader speed suite on 100,000 bars:
uv run pytest benchmarks/test_speed.py --benchmark-only --benchmark-json=benchmarks/results.json -v
Selected throughput examples from the checked-in table:
| Indicator | Throughput |
|---|---|
|  | 1.9 G bars/s |
|  | 454 M bars/s |
|  | 444 M bars/s |
|  | 259 M bars/s |
|  | 145 M bars/s |
|  | 70 M bars/s |
|  | 104 M bars/s |
|  | 33 M bars/s |
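Throughput figures like these are usually the series length divided by the time of one call. Assuming that definition (it is not spelled out in this table), the conversion in both directions is simple arithmetic; the helper names below are hypothetical.

```python
def throughput_bars_per_s(n_bars, seconds_per_call):
    """Bars processed per second for one call over n_bars bars."""
    return n_bars / seconds_per_call


def seconds_per_call(bars_per_s, n_bars):
    """Inverse conversion: time of one call implied by a throughput figure."""
    return n_bars / bars_per_s


def format_throughput(bars_per_s):
    """Render a throughput using the G/M units seen in the table."""
    if bars_per_s >= 1e9:
        return f"{bars_per_s / 1e9:.1f} G bars/s"
    if bars_per_s >= 1e6:
        return f"{bars_per_s / 1e6:.0f} M bars/s"
    return f"{bars_per_s:.0f} bars/s"
```

Under that assumption, 1.9 G bars/s on a 100,000-bar series corresponds to roughly 53 µs per call, and 33 M bars/s to roughly 3 ms per call.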
Perf-contract artifacts
Use the perf-contract runner when you want a compact, machine-readable artifact bundle for single-series latency, batch throughput, streaming throughput, and hotspot attribution:
uv run python benchmarks/run_perf_contract.py --output-dir benchmarks/artifacts/latest
See benchmarks/README.md for the detailed benchmark playbook and the
checked-in comparison tables.