time_series_base: A Rust Time Series Engine

Time series data sits at the center of most of what we build. Market data, quantum circuit telemetry, model training logs — they all share the same fundamental shape: ordered observations over time, with queries that care about windows, similarity, and fast sequential reads. We kept reinventing the same storage and query patterns across projects, so we consolidated everything into time_series_base: a 7-crate Rust workspace purpose-built for time series workloads.

Workspace Layout

The seven crates each own a clear boundary:

ts-common — shared types, error handling, and configuration. Every other crate depends on this.
ts-store — the hybrid data lake. A SQL index (SQLite for local, Postgres for deployed) tracks metadata and partition boundaries, while the actual time series data lives in partitioned flat files on disk or object storage.
ts-format — custom text and binary serialization formats optimized for columnar time series data, plus CSV and JSON adapters for interoperability.
ts-tss — the Temporal Similarity Search engine. This is the most specialized crate and the one that motivated the whole project.
ts-quant — quantitative analysis primitives: rolling statistics, return calculations, drawdown analysis, and feature extraction.
ts-gateway — a REST and gRPC gateway that exposes the storage and search engines as network services.
ts-cli — the command-line interface for ingestion, querying, and pipeline orchestration.
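To make the ts-quant entry above more concrete, here is a minimal sketch of two of the primitives it names: return calculation and drawdown analysis. The function names and signatures are assumptions for illustration, not the crate's actual API, and the code assumes strictly positive prices.

```rust
/// Simple period-over-period returns: r_t = p_t / p_{t-1} - 1.
/// Hypothetical helper; yields one value fewer than the input length.
pub fn simple_returns(prices: &[f64]) -> Vec<f64> {
    prices.windows(2).map(|w| w[1] / w[0] - 1.0).collect()
}

/// Maximum drawdown as a fraction of the running peak.
/// Returns 0.0 for an empty or never-declining series.
pub fn max_drawdown(prices: &[f64]) -> f64 {
    let mut peak = f64::NEG_INFINITY;
    let mut worst = 0.0_f64;
    for &p in prices {
        // Track the highest price seen so far.
        if p > peak {
            peak = p;
        }
        // Fractional decline from that peak at this observation.
        let drawdown = (peak - p) / peak;
        if drawdown > worst {
            worst = drawdown;
        }
    }
    worst
}
```

For the prices 100, 120, 90, 110, 80 the running peak is 120 and the lowest later price is 80, so the maximum drawdown is 40/120, or one third.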

Temporal Similarity Search

The TSS engine supports two modes. Vector-based similarity search embeds fixed-length time series windows into a vector space (using signature features or learned embeddings) and indexes them for approximate nearest-neighbor retrieval. Sliding-window search slides a query across the stored series and scores every offset with a distance or similarity measure: DTW, Euclidean distance, or normalized correlation. The first mode is fast and scales to large datasets, at the cost of approximate results; the second is exact but scans every window, which makes it better suited to short, precise pattern matching.
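The sliding-window mode can be sketched in a few lines. The version below uses plain Euclidean distance and a brute-force scan; the function names are hypothetical, and the real engine may normalize windows, prune candidates, or use DTW instead.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn sq_euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Brute-force sliding-window search (illustrative, not the ts-tss API).
/// Returns the offset of the closest window and its Euclidean distance,
/// or None if the query is empty or longer than the series.
pub fn best_match(series: &[f64], query: &[f64]) -> Option<(usize, f64)> {
    if query.is_empty() || query.len() > series.len() {
        return None;
    }
    series
        .windows(query.len())
        .map(|w| sq_euclidean(w, query))
        .enumerate()
        // total_cmp gives a total order on f64, so NaN cannot cause a panic here.
        .min_by(|a, b| a.1.total_cmp(&b.1))
        // Take the square root once, on the winner only.
        .map(|(offset, d2)| (offset, d2.sqrt()))
}
```

Searching the series 0, 1, 2, 3, 2, 1, 0 for the query 2, 3, 2 returns offset 2 with distance 0. The scan costs one distance evaluation per offset, which is why this mode suits short patterns better than large-scale retrieval.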

We use TSS extensively in the finance_lab pipeline for finding historical analogues to current market conditions and for validating that synthetic data preserves the similarity structure of real data.

The Hybrid Data Lake

Pure SQL databases struggle with the sequential-scan patterns that time series analysis demands. Pure file-based storage lacks queryability. We split the difference: the SQL index stores partition keys, time ranges, schema metadata, and user-defined tags. The actual data lives in partitioned flat files — one file per (series, time-partition) pair — in the custom binary format from ts-format.
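As a rough illustration of the split, the sketch below models one index row and the overlap lookup that maps a time range onto files. It is an in-memory stand-in for the SQL index; the struct, its field names, and the half-open range convention are all assumptions rather than the ts-store schema.

```rust
use std::path::PathBuf;

/// One row of a hypothetical partition index.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionEntry {
    pub series_id: String,
    pub start_ts: i64, // inclusive, seconds since the Unix epoch
    pub end_ts: i64,   // exclusive
    pub path: PathBuf, // where the partition's flat file lives
}

/// Return the partitions of `series_id` that overlap [from_ts, to_ts),
/// ordered by start time so the files can be read sequentially.
pub fn partitions_for<'a>(
    index: &'a [PartitionEntry],
    series_id: &str,
    from_ts: i64,
    to_ts: i64,
) -> Vec<&'a PartitionEntry> {
    let mut hits: Vec<&PartitionEntry> = index
        .iter()
        // Two half-open intervals overlap when each starts before the other ends.
        .filter(|e| e.series_id == series_id && e.start_ts < to_ts && e.end_ts > from_ts)
        .collect();
    hits.sort_by_key(|e| e.start_ts);
    hits
}
```

In a SQL-backed index the same predicate would presumably live in a WHERE clause; the point is that only metadata is consulted before any data file is opened.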

This gives us fast metadata queries ("which series cover 2024 Q3 with daily frequency?") backed by SQL, and fast sequential reads over the data itself backed by memory-mapped file I/O. Partition boundaries are aligned to natural time intervals, so most queries touch a small number of files.
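Aligning partitions to natural intervals means a timestamp can be mapped to its partition with simple arithmetic. The sketch below assumes fixed-width partitions measured in seconds (for example, one day); the names are hypothetical, and calendar-based partitions such as months would need date handling instead.

```rust
/// Width of a daily partition in seconds (an assumed partitioning scheme).
pub const DAY_SECS: i64 = 86_400;

/// Start of the partition that contains `ts`, for partitions `width` seconds wide.
/// rem_euclid keeps the result correct for timestamps before the epoch.
/// `width` must be positive.
pub fn partition_start(ts: i64, width: i64) -> i64 {
    ts - ts.rem_euclid(width)
}

/// Start timestamps of every partition touched by the query range [from_ts, to_ts).
pub fn partitions_touched(from_ts: i64, to_ts: i64, width: i64) -> Vec<i64> {
    if to_ts <= from_ts {
        return Vec::new();
    }
    let first = partition_start(from_ts, width);
    // to_ts is exclusive, so the last touched instant is to_ts - 1.
    let last = partition_start(to_ts - 1, width);
    (first..=last).step_by(width as usize).collect()
}
```

With daily partitions, a query over [80000, 180000) touches the three partitions starting at 0, 86400, and 172800, while a query covering exactly one day touches a single file.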

Frozen Python Tools

Some analysis tools predate the Rust rewrite. Rather than porting them immediately, we wrapped four Python CLI tools — regime-cluster, sig-regime, sig-compute, and market-basket — via subprocess calls from ts-cli. They run in a frozen conda environment pinned to known versions. This is not elegant, but it let us ship the Rust workspace without blocking on a full rewrite of mature Python code. The plan is to port the critical paths to Rust over the coming quarters and keep the Python tools as validation references.
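A subprocess wrapper of this kind might look like the following sketch. It assumes the tools are launched through `conda run -n <env>`, which is one common way to execute a command inside a named conda environment; the function names are hypothetical, and the actual invocation in ts-cli and the tools' arguments may differ.

```rust
use std::io;
use std::process::Command;

/// Build (but do not run) a command that executes `tool` inside the
/// conda environment `env_name`. Both names are supplied by the caller.
pub fn frozen_tool_command(env_name: &str, tool: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new("conda");
    cmd.args(["run", "-n", env_name, tool]).args(args);
    cmd
}

/// Run the tool, returning its stdout on success and an error that
/// carries stderr when the process exits with a non-zero status.
pub fn run_frozen_tool(env_name: &str, tool: &str, args: &[&str]) -> io::Result<String> {
    let output = frozen_tool_command(env_name, tool, args).output()?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{tool} failed ({}): {}", output.status, stderr.trim()),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Separating command construction from execution keeps the argument layout testable without conda installed, and surfacing stderr in the error makes failures in the Python side visible to the Rust caller.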

MCP Server for LLM Agents

The ts-gateway crate includes an MCP (Model Context Protocol) server that exposes time series query and search capabilities to LLM agents. An agent can discover available series, run similarity searches, fetch windowed data, and trigger analysis pipelines — all through structured MCP tool calls. We use this internally to let Claude-based agents reason over financial data without writing custom integration code for each workflow.
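On the server side, each tool call arrives as a tool name plus a set of arguments and has to be mapped onto a typed request before it reaches the storage or search engine. The sketch below shows that dispatch step only. The tool names, argument keys, and the simplification of arguments to a string map are all assumptions for illustration; MCP itself carries arguments as a JSON object, and the gateway's real tool list is not shown here.

```rust
use std::collections::HashMap;

/// Typed requests for two hypothetical tools.
#[derive(Debug, PartialEq)]
pub enum ToolCall {
    ListSeries,
    FetchWindow { series_id: String, from_ts: i64, to_ts: i64 },
}

/// Look up a required argument, reporting which key is missing.
fn required<'a>(args: &'a HashMap<String, String>, key: &str) -> Result<&'a str, String> {
    args.get(key)
        .map(String::as_str)
        .ok_or_else(|| format!("missing argument: {key}"))
}

/// Look up a required argument and parse it as an integer timestamp.
fn required_i64(args: &HashMap<String, String>, key: &str) -> Result<i64, String> {
    required(args, key)?
        .parse::<i64>()
        .map_err(|e| format!("bad {key}: {e}"))
}

/// Validate a tool name and its arguments, producing a typed request.
pub fn parse_tool_call(name: &str, args: &HashMap<String, String>) -> Result<ToolCall, String> {
    match name {
        "list_series" => Ok(ToolCall::ListSeries),
        "fetch_window" => Ok(ToolCall::FetchWindow {
            series_id: required(args, "series_id")?.to_string(),
            from_ts: required_i64(args, "from_ts")?,
            to_ts: required_i64(args, "to_ts")?,
        }),
        other => Err(format!("unknown tool: {other}")),
    }
}
```

Rejecting unknown tools and malformed arguments at this boundary means an agent gets a descriptive error back instead of a failure deeper in the query path.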

Why Rust

Time series engines are I/O-bound and allocation-sensitive. Rust's zero-cost abstractions let us write high-level iterator chains that compile down to tight loops, and because iterator adapters are lazy, the hot path can avoid heap allocation entirely. Memory safety eliminates an entire class of bugs that plague C/C++ data engines, and wherever a schema is encoded in Rust types, the type system catches mismatches at compile time.
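As a small illustration of the iterator point, the sketch below computes a rolling mean lazily over a borrowed slice. The function name is hypothetical and this is not taken from the codebase; it recomputes each window's sum for clarity rather than maintaining a running total.

```rust
/// Lazily yields the mean of every length-`window` slice of `data`.
/// Nothing is allocated: the iterator borrows `data` and produces
/// one f64 at a time. Panics if `window` is zero.
pub fn rolling_mean(data: &[f64], window: usize) -> impl Iterator<Item = f64> + '_ {
    assert!(window > 0, "window must be non-zero");
    data.windows(window)
        .map(move |w| w.iter().sum::<f64>() / window as f64)
}

/// Example consumer: the largest rolling mean, computed in a single pass
/// without materializing the intermediate series.
pub fn max_rolling_mean(data: &[f64], window: usize) -> Option<f64> {
    rolling_mean(data, window).fold(None, |best, m| match best {
        Some(b) if b >= m => Some(b),
        _ => Some(m),
    })
}
```

For the input 1, 2, 3, 4, 5 with a window of 3 the iterator yields 2, 3, and 4. A caller that needs the values stored can still collect them into a Vec, but a consumer like max_rolling_mean never has to.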