How We Built a Real-Time Fraud Detection System That Processes 12K Transactions/Second
The technical story behind VaultChain: event-driven architecture with Go, PostgreSQL partitioning, ML model serving, and how we hit <50ms latency at scale.
When a fintech client asked us to build a fraud detection system, the requirement was clear: every transaction must be scored before it clears. At 12,000 transactions per second during peak hours, that means sub-50ms end-to-end latency with zero dropped events. Here is how we built it.
The architecture has three layers. The ingestion layer receives transaction events via a Go service fronted by an AWS ALB. We chose Go over Node.js because goroutines gave us 10x better concurrency for I/O-bound work at this scale. Each event is immediately written to a Kafka topic partitioned by merchant ID, giving us ordering guarantees per merchant while allowing parallel processing across merchants.
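The ordering guarantee comes from key-based partitioning: every event for a given merchant hashes to the same partition. A minimal sketch of that mapping, assuming a standard hash-mod partitioner (the function name and hash choice here are illustrative, not our production code):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a merchant ID to a Kafka partition the way a
// hash-based partitioner would: hash the key, mod by partition count.
// All events for one merchant land on one partition, preserving order
// per merchant while spreading merchants across partitions.
func partitionFor(merchantID string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(merchantID))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	// The same merchant always hits the same partition.
	fmt.Println(partitionFor("merchant-4821", 12) == partitionFor("merchant-4821", 12)) // true
}
```

Because the mapping is deterministic, consumers can rebuild per-merchant state without cross-partition coordination.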
The scoring layer runs a two-stage pipeline. Stage one applies deterministic rules: velocity checks (the same card used five times within 60 seconds), geographic impossibility (a transaction in Istanbul, then one in London 10 minutes later), and amount anomalies. These rules alone catch 40% of fraud with near-zero latency because they only require in-memory lookups against a Redis cluster.
Stage two runs the ML model. We serve a LightGBM model via a custom Go service, not a Python Flask wrapper. The model was trained on 18 months of labeled transaction data (2.3M samples, 0.8% fraud rate). Feature engineering was the hard part: we compute 47 features per transaction including rolling averages, time-of-day patterns, merchant category risk scores, and device fingerprint similarity. Model inference takes 8-12ms per transaction.
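Serving a gradient-boosted model from Go is feasible because inference reduces to walking each tree to a leaf and summing the leaf values. A toy sketch of that traversal (the node struct and values here are hypothetical, not the LightGBM file format or our production serving code):

```go
package main

import "fmt"

// node is a single split in a decision tree. Leaf nodes have left == nil.
type node struct {
	feature     int     // index into the feature vector
	threshold   float64 // go left when features[feature] < threshold
	value       float64 // leaf output (meaningful only on leaves)
	left, right *node
}

// predict walks each tree to a leaf and sums the leaf values; boosted
// ensembles like LightGBM produce the raw score as the sum of per-tree
// outputs (a sigmoid then maps it to a probability for binary tasks).
func predict(trees []*node, features []float64) float64 {
	score := 0.0
	for _, t := range trees {
		for t.left != nil {
			if features[t.feature] < t.threshold {
				t = t.left
			} else {
				t = t.right
			}
		}
		score += t.value
	}
	return score
}

func main() {
	// A toy two-leaf tree splitting on feature 0 (say, transaction amount).
	tree := &node{feature: 0, threshold: 500.0,
		left:  &node{value: -0.8}, // low amount: lower raw fraud score
		right: &node{value: 1.2},  // high amount: higher raw fraud score
	}
	fmt.Println(predict([]*node{tree}, []float64{750.0})) // 1.2
}
```

No Python interpreter in the hot path means no GIL, no serialization hop, and one fewer service to keep under the latency budget.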
PostgreSQL handles the persistence layer, but not in the way most people use it. We partition the transactions table by date (monthly) with sub-partitions by risk score range. This keeps hot queries fast: investigating recent high-risk transactions hits a tiny partition instead of scanning terabytes. We also use BRIN indexes on timestamp columns, which are 100x smaller than B-tree indexes for time-series data.
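The partition layout above can be expressed with PostgreSQL's declarative partitioning. A sketch of the DDL, with hypothetical table names, columns, and range boundaries:

```sql
-- Illustrative schema only: names and boundaries are hypothetical.
CREATE TABLE transactions (
    id          bigint      NOT NULL,
    merchant_id text        NOT NULL,
    amount      numeric     NOT NULL,
    risk_score  real        NOT NULL,
    created_at  timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One monthly partition, sub-partitioned by risk score range.
CREATE TABLE transactions_2024_01 PARTITION OF transactions
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    PARTITION BY RANGE (risk_score);

CREATE TABLE transactions_2024_01_high PARTITION OF transactions_2024_01
    FOR VALUES FROM (0.8) TO (MAXVALUE);
CREATE TABLE transactions_2024_01_low PARTITION OF transactions_2024_01
    FOR VALUES FROM (MINVALUE) TO (0.8);

-- BRIN indexes stay tiny on append-only, time-ordered data because they
-- store one summary per block range instead of one entry per row.
CREATE INDEX ON transactions_2024_01_low USING brin (created_at);
```

A query for recent high-risk transactions then prunes to a single small sub-partition at plan time instead of touching the full history.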
The key engineering decision was making the system eventually consistent rather than strongly consistent. The transaction is approved or flagged within 50ms based on the real-time score. But the full audit trail, merchant risk profile update, and compliance report generation happen asynchronously via Kafka consumers. This separation is what makes the latency target achievable.
Results after 3 months: $2.1M in fraud prevented, 99.97% uptime, p95 latency of 38ms, and no increase in false-positive rate compared to the client's previous rule-based system. The ML model catches fraud patterns that static rules miss, like slow-drip card testing, where small transactions probe card validity over days before a large charge.
The lesson: real-time systems at scale are not about choosing the fastest language or database. They are about designing the right data flow, separating hot path from cold path, and making deliberate trade-offs between consistency and latency. Every millisecond in the hot path was earned through profiling and measurement, not guessing.
Let's discuss your project
15 minutes, no commitment.