Benchmarks
==========

The benchmark suite is meant to support a narrow claim: ferro-ta is often
faster on selected indicators, and the evidence is published in a
reproducible form.

What is published
-----------------

The authoritative benchmark workflow lives in ``benchmarks/``:

- Cross-library speed suite: ``benchmarks/test_speed.py``
- Cross-library accuracy suite: ``benchmarks/test_accuracy.py``
- TA-Lib head-to-head script: ``benchmarks/bench_vs_talib.py``
- Backtesting engine benchmark: ``benchmarks/bench_backtest.py``
- Table generation from benchmark JSON: ``benchmarks/benchmark_table.py``
- Perf-contract artifact bundle: ``benchmarks/run_perf_contract.py``

Backtesting engine: competitor comparison
-----------------------------------------

Measured on Apple M-series, Python 3.13, Rust 1.91, using an SMA(20/50)
crossover strategy with 0.1% commission and 5 bps slippage. Each figure is
the median of 5 runs.

.. list-table:: Speed vs backtesting libraries (signal → equity curve)
   :header-rows: 1

   * - Library
     - 1k bars
     - 10k bars
     - 100k bars
     - vs ferro-ta core (100k)
   * - **ferro-ta** ``backtest_core``
     - 0.004 ms
     - 0.033 ms
     - 0.286 ms
     - baseline
   * - **ferro-ta** ``backtest_ohlcv_core``
     - 0.004 ms
     - 0.037 ms
     - 0.332 ms
     - ~same
   * - NumPy vectorized (manual)
     - 0.013 ms
     - 0.042 ms
     - 0.459 ms
     - 1.6× slower
   * - vectorbt 0.28
     - 1.32 ms
     - 1.31 ms
     - 2.90 ms
     - **10× slower**
   * - backtesting.py
     - 10.5 ms
     - 42.3 ms
     - 319.6 ms
     - **1,117× slower**
   * - backtrader 1.9
     - 53.9 ms
     - 518 ms
     - n/a (skipped)
     - **>15,000× slower** (at 10k bars)

Accuracy: ferro-ta positions and bar-returns are **bit-exact** against the
NumPy reference implementation (max per-bar equity diff = 0.00e+00 with zero
commission/slippage).

Additional ferro-ta capabilities not present in the libraries above:

.. list-table::
   :header-rows: 1

   * - Capability
     - ferro-ta result
     - NumPy baseline
     - Speedup
   * - Monte Carlo 1,000 sims (100k bars)
     - 50 ms (parallel Rayon + LCG)
     - 612 ms (Python loop)
     - **12×**
   * - 23 performance metrics, single call (100k bars)
     - 2.8 ms
     - 0.36 ms (2 metrics only)
     - 0.12 ms / metric
   * - Multi-asset 100 assets (100k bars)
     - 43 ms parallel / 88 ms serial
     - n/a
     - 2× parallel speedup
   * - Walk-forward fold indices (100k bars)
     - 0.3 µs
     - n/a
     - n/a

Reproduce the backtest benchmark:

.. code-block:: bash

   python benchmarks/bench_backtest.py --sizes 10000 100000 \
     --json benchmarks/artifacts/latest/bench_backtest_results.json

Latest checked-in TA-Lib artifact
---------------------------------

The current checked-in TA-Lib comparison artifact benchmarks contiguous
``float64`` arrays at 10k and 100k bars on an ``Apple M3 Max`` with 14
logical cores, about 38.7 GB RAM, ``CPython 3.13.5``, and ``Rust 1.91.1``
using the default release profile (``lto = true``, ``codegen-units = 1``).

Summary from ``benchmarks/artifacts/latest/benchmark_vs_talib.json``:

.. list-table::
   :header-rows: 1

   * - Size
     - Rows
     - ferro-ta wins
     - Median speedup
     - TA-Lib wins or ties
   * - ``10,000``
     - 12
     - 6
     - ``1.0850x``
     - ``EMA``, ``RSI``, ``ATR``, ``STOCH``, ``ADX``, ``OBV``
   * - ``100,000``
     - 12
     - 6
     - ``1.0784x``
     - ``EMA``, ``RSI``, ``ATR``, ``STOCH``, ``ADX``, ``OBV``

Examples from the 100k-bar run:

.. list-table::
   :header-rows: 1

   * - Indicator
     - ferro-ta
     - TA-Lib
     - Speedup
     - Read
   * - ``SMA``
     - ``0.0985 ms``
     - ``0.2241 ms``
     - ``2.2751x``
     - clear ferro-ta win
   * - ``BBANDS``
     - ``0.2122 ms``
     - ``0.4966 ms``
     - ``2.3402x``
     - clear ferro-ta win
   * - ``MACD``
     - ``0.5152 ms``
     - ``0.7111 ms``
     - ``1.3801x``
     - ferro-ta win
   * - ``STOCH``
     - ``1.7064 ms``
     - ``0.7603 ms``
     - ``0.4455x``
     - TA-Lib win
   * - ``ADX``
     - ``0.7910 ms``
     - ``0.5769 ms``
     - ``0.7294x``
     - TA-Lib win
   * - ``ATR``
     - ``0.5087 ms``
     - ``0.5147 ms``
     - ``1.0118x``
     - tie on this machine

Methodology notes
-----------------

- The head-to-head script uses the same synthetic OHLCV generator, the same
  parameters, and the same contiguous ``float64`` array layout for both
  libraries.
- Reported speedup is ``TA-Lib median time / ferro-ta median time``, so a
  value above ``1.0x`` means ferro-ta was faster.
- The script uses 1 warmup run and 7 measured runs per case, and records the
  full per-run timing samples rather than a single selected number.
- Published JSON artifacts include machine/runtime metadata, git metadata,
  Rust toolchain and build-profile metadata, per-run variance statistics, and
  Python-tracked peak allocation snapshots.
- Allocation snapshots are based on ``tracemalloc`` and capture
  Python-tracked allocations only; they are not full native RSS profiles.
- If your workload uses non-contiguous arrays, different dtypes, or different
  batch sizes, benchmark that exact workload. Those factors can materially
  change the result.

Reproduce the TA-Lib comparison
-------------------------------

.. code-block:: bash

   pip install ta-lib
   python benchmarks/bench_vs_talib.py --sizes 10000 100000 --json benchmark_vs_talib.json

The JSON output is the main artifact to review when publishing performance
claims.

Cross-library suite
-------------------

Run the broader speed suite on 100,000 bars:

.. code-block:: bash

   uv run pytest benchmarks/test_speed.py --benchmark-only --benchmark-json=benchmarks/results.json -v

Selected throughput examples from the checked-in table:

.. list-table::
   :header-rows: 1

   * - Indicator
     - Throughput
   * - ``ADD``
     - 1.9 G bars/s
   * - ``CDLENGULFING``
     - 454 M bars/s
   * - ``EMA``
     - 444 M bars/s
   * - ``SMA``
     - 259 M bars/s
   * - ``RSI``
     - 145 M bars/s
   * - ``ATR``
     - 70 M bars/s
   * - ``MACD``
     - 104 M bars/s
   * - ``STOCH``
     - 33 M bars/s

Perf-contract artifacts
-----------------------

Use the perf-contract runner when you want a compact, machine-readable
artifact bundle for single-series latency, batch throughput, streaming
throughput, and hotspot attribution:

.. code-block:: bash

   uv run python benchmarks/run_perf_contract.py --output-dir benchmarks/artifacts/latest

See ``benchmarks/README.md`` for the detailed benchmark playbook and the
checked-in comparison tables.
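
For readers who want to sanity-check a published ratio by hand, the timing
protocol described under Methodology notes (1 warmup run, 7 measured runs,
speedup as a ratio of medians) can be sketched roughly as follows. This is
an illustrative sketch only, not the code in ``benchmarks/bench_vs_talib.py``;
the helper names ``time_runs`` and ``median_speedup`` are hypothetical.

.. code-block:: python

   import statistics
   import time
   from typing import Callable, Sequence


   def time_runs(fn: Callable[[], object], warmup: int = 1, runs: int = 7) -> list[float]:
       """Return per-run wall-clock times in milliseconds (hypothetical helper)."""
       # Warmup calls are executed but not recorded.
       for _ in range(warmup):
           fn()
       samples = []
       for _ in range(runs):
           start = time.perf_counter()
           fn()
           samples.append((time.perf_counter() - start) * 1e3)
       return samples


   def median_speedup(baseline_ms: Sequence[float], candidate_ms: Sequence[float]) -> float:
       """Baseline median / candidate median; above 1.0 means the candidate is faster."""
       return statistics.median(baseline_ms) / statistics.median(candidate_ms)


   if __name__ == "__main__":
       # Stand-in workloads; a real comparison would call the two libraries
       # on identical contiguous float64 input arrays.
       slow = time_runs(lambda: sum(range(200_000)))
       fast = time_runs(lambda: sum(range(100_000)))
       print(f"speedup: {median_speedup(slow, fast):.4f}x")

Feeding the SMA medians from the 100k-bar table into ``median_speedup``
(0.2241 ms for TA-Lib, 0.0985 ms for ferro-ta) gives about 2.275, which
matches the published ``2.2751x``.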