Maksym Tkach, Chief Technology Officer at Frogo, sits down with SBC News ahead of the SBC Summit in Lisbon to discuss the company’s breakthrough innovations in fraud prevention. From solving the hardest latency challenges in real-time detection to building a scalable, cloud-native architecture with Go and microservices, he shares how Frogo became the trusted partner for high-risk industries like fintech and iGaming.
What’s the biggest technical challenge you solved while building Frogo’s real-time fraud detection engine?
That’s a great question. The single biggest challenge was resolving the fundamental conflict between detection complexity and real-time performance. Our clients, especially in fintech and iGaming, need to evaluate dozens, sometimes hundreds of data points and rules in a single transaction. But they also have strict SLAs, often under 500 milliseconds, because any added latency directly costs them money in lost conversions or rejected transactions. Doing both seemed impossible.
Our breakthrough came when we broke the problem down into two parts: rule orchestration and data retrieval.
For rule orchestration, we found that standard sequential rule engines were too slow. Many rules could be run in parallel, but some depended on the output of others. To solve this, my team and I designed a proprietary dependency graph execution engine we call ‘LTee’. It stands for Layered Tree. Before processing a transaction, LTee analyzes the entire rule set and organizes it into layers. Layer 1 contains all independent rules that can be executed in parallel immediately. Layer 2 contains rules that depend on Layer 1’s results, and so on. This model allows us to achieve maximum parallelization while respecting data dependencies. In the worst case, it runs sequentially, but for most transactions it dramatically reduces execution time.
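The layered idea can be sketched in a few dozen lines of Go. This is a minimal illustration, not Frogo’s actual LTee implementation: the rule names, the boolean result type, and the layering heuristic are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Rule is a hypothetical fraud rule: it reads results of earlier layers
// and emits its own verdict.
type Rule struct {
	Name      string
	DependsOn []string
	Eval      func(prior map[string]bool) bool
}

// buildLayers groups rules so each layer depends only on earlier layers
// (Kahn-style topological layering).
func buildLayers(rules []Rule) [][]Rule {
	pending := make(map[string]Rule, len(rules))
	for _, r := range rules {
		pending[r.Name] = r
	}
	done := map[string]bool{}
	var layers [][]Rule
	for len(pending) > 0 {
		var layer []Rule
		for _, r := range pending {
			ready := true
			for _, d := range r.DependsOn {
				if !done[d] {
					ready = false
					break
				}
			}
			if ready {
				layer = append(layer, r)
			}
		}
		if len(layer) == 0 {
			break // cyclic dependency; bail out
		}
		for _, r := range layer { // mark only after the full layer is collected
			delete(pending, r.Name)
			done[r.Name] = true
		}
		layers = append(layers, layer)
	}
	return layers
}

// runLayers executes each layer's rules in parallel, then moves on.
func runLayers(layers [][]Rule) map[string]bool {
	results := map[string]bool{}
	var mu sync.Mutex
	for _, layer := range layers {
		snapshot := make(map[string]bool, len(results)) // read-only view of earlier layers
		for k, v := range results {
			snapshot[k] = v
		}
		var wg sync.WaitGroup
		for _, r := range layer {
			wg.Add(1)
			go func(r Rule) {
				defer wg.Done()
				v := r.Eval(snapshot)
				mu.Lock()
				results[r.Name] = v
				mu.Unlock()
			}(r)
		}
		wg.Wait()
	}
	return results
}

func main() {
	rules := []Rule{
		{Name: "velocity", Eval: func(map[string]bool) bool { return true }},
		{Name: "geo", Eval: func(map[string]bool) bool { return false }},
		{Name: "combined", DependsOn: []string{"velocity", "geo"},
			Eval: func(r map[string]bool) bool { return r["velocity"] && !r["geo"] }},
	}
	layers := buildLayers(rules)
	out := runLayers(layers)
	fmt.Println(len(layers), out["combined"]) // prints "2 true"
}
```

The three rules collapse into two layers: ‘velocity’ and ‘geo’ run concurrently, then ‘combined’ consumes their results.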
For data retrieval, the bottleneck was calculating complex aggregations on the fly, for example, ‘What’s the 95th percentile of this user’s transaction amount over the last 7 days?’. Querying a traditional database for this during a transaction is a non-starter for our latency budget. So, we made a strategic trade-off. We use Aerospike as a specialized time-series aggregation store. We pre-calculate and store metrics like standard deviation, percentiles and averages across various time windows and granularities. It consumes more storage, but it turns a computationally expensive query into a simple, sub-millisecond key-value lookup. For high-cardinality counts, like ‘how many distinct devices has this user logged in from?’, we use a proprietary approximation algorithm to get near-instant, memory-efficient estimates.
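The compute-on-write, lookup-on-read trade-off can be sketched with a simple in-memory stand-in for the key-value store. The key shape (user, metric, window) and the metric names are assumptions for illustration; the real system stores these in Aerospike.

```go
package main

import "fmt"

// AggKey identifies one pre-computed metric: user + metric + time window.
type AggKey struct {
	User   string
	Metric string // e.g. "txn_amount_p95" (illustrative name)
	Window string // e.g. "7d"
}

// AggStore is a toy in-memory stand-in for a key-value store like Aerospike.
type AggStore struct{ data map[AggKey]float64 }

func NewAggStore() *AggStore { return &AggStore{data: map[AggKey]float64{}} }

// Put is called by an offline/streaming job that pre-computes aggregates
// as transactions are ingested, not on the hot path.
func (s *AggStore) Put(k AggKey, v float64) { s.data[k] = v }

// Get turns an expensive analytical query into a constant-time lookup,
// which is what keeps the hot path inside the latency budget.
func (s *AggStore) Get(k AggKey) (float64, bool) {
	v, ok := s.data[k]
	return v, ok
}

func main() {
	store := NewAggStore()
	store.Put(AggKey{"user42", "txn_amount_p95", "7d"}, 180.50)
	if v, ok := store.Get(AggKey{"user42", "txn_amount_p95", "7d"}); ok {
		fmt.Println(v) // prints "180.5"
	}
}
```

The storage cost grows with the number of (metric, window) combinations maintained per user, which is exactly the trade-off described above.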
By combining these solutions, we turned our engine from a sequential process into a highly parallelized system fueled by pre-computed data. The result was a reduction in our p99 latency from over 5000ms to under 300ms, even with complex rule sets. This didn’t just meet our SLAs; it became our core competitive advantage and allowed us to win major clients who had previously found other systems to be too slow.
Why did you choose Go and a microservice architecture and how has it shaped Frogo’s scalability?
This was a foundational decision driven by our need for performance, reliability, and developer velocity.
For the language, we chose Go for three specific reasons:
Exceptional Concurrency for Our Workload: Fraud detection is an I/O-bound problem. For a single API call, we might need to fetch user data from Aerospike, check a device fingerprint, and query a 3rd party service simultaneously. Go’s goroutines are lightweight and perfect for this. We can fire off thousands of concurrent operations efficiently, which is key to keeping our latency low. A language with heavier threads, like Java, would have been far less resource-efficient for our specific use case.
Performance and Simplicity: We adhere to the KISS principle. Go is simple to read and write, which speeds up onboarding for new engineers and reduces bugs. Critically, it compiles to a small, single binary with performance approaching that of C++, giving us raw speed without C++’s complexity and memory-safety pitfalls.
A Perfect Fit for the Cloud-Native Era: The small, self-contained binaries are trivial to containerize with Docker, leading to fast build times, small image sizes, and quick startup times for our pods in Kubernetes.
For the architecture, we went with microservices to achieve true scalability and resilience:
Targeted, Independent Scaling: Our data ingestion service might see a spike during a client’s marketing campaign, while our machine learning inference service scales with overall transaction volume. We don’t have to scale a giant monolith; we scale only the specific component that’s under load.
Resilience: A bug or failure in our asynchronous reporting service has zero impact on our real-time transaction processing. This fault isolation is non-negotiable for a critical system like ours.
So, how has this combination shaped our scalability? It’s been a force multiplier. Go gives us highly efficient, concurrent services that are cheap to run. The microservice architecture lets us deploy, manage, and scale these services precisely where needed. This means we’ve built a system that scales not just on a technical performance level, but also on an organizational and financial level.
How do you handle massive traffic spikes without sacrificing detection accuracy or system availability?
That’s the core challenge in our business. We’ve built a multi-layered, defense-in-depth strategy to ensure we maintain both availability and accuracy, because our clients’ trust depends on it.
First, our architecture is designed to absorb shocks. We have a single, unified API endpoint that handles interactions based on the client’s needs. A client can choose to hold the connection for a synchronous response or disconnect immediately for an asynchronous webhook. Internally, regardless of the client’s choice, every request is immediately published to a NATS streaming queue. This acts as our system’s shock absorber, completely decoupling traffic ingestion from the actual processing.
Second, we’ve engineered a two-level elastic scaling model.
Level 1: In-Application Scaling. This is one of our key innovations. We developed a proprietary autoscaling library built directly into our Go services. This library monitors our NATS consumers for any processing delays or backlogs (e.g., blocked operations). If it detects a queue forming, it automatically increases the size of the goroutine worker pool within the running pod. This allows a single pod to dynamically use more CPU to process messages more intensively, giving us a near-instant response to a burst without the latency of waiting for a new pod to start.
Level 2: Infrastructure Scaling. This intense, dynamic CPU usage from our worker pools then becomes a perfect signal for the Kubernetes Horizontal Pod Autoscaler. As our services work harder and their CPU utilization climbs, Kubernetes seamlessly scales the number of pods horizontally.
Third, we have a plan for graceful degradation. If a truly unprecedented event occurs, our system can automatically enter a high-load state, temporarily disabling a few of the most computationally expensive ML models that have the least impact on overall accuracy. This sheds significant load while keeping the core fraud detection rules running at full speed.
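A naive version of that shedding policy could rank optional models by cost relative to accuracy contribution and disable the worst offenders first. The model names, cost units, and greedy heuristic here are all illustrative assumptions, not Frogo’s actual policy.

```go
package main

import "fmt"

// Model is a hypothetical optional scoring component with a relative CPU
// cost and an estimated accuracy contribution; values are illustrative.
type Model struct {
	Name     string
	Cost     float64
	Impact   float64
	Disabled bool
}

// shedLoad greedily disables the models with the worst cost-to-impact
// ratio until the projected load fits the budget. Core fraud rules are
// not in this list and are never touched.
func shedLoad(models []Model, load, budget float64) []Model {
	for load > budget {
		worst := -1
		for i, m := range models {
			if m.Disabled {
				continue
			}
			if worst == -1 || m.Cost/m.Impact > models[worst].Cost/models[worst].Impact {
				worst = i
			}
		}
		if worst == -1 {
			break // nothing left to shed
		}
		models[worst].Disabled = true
		load -= models[worst].Cost
	}
	return models
}

func main() {
	models := []Model{
		{Name: "graph_collusion", Cost: 40, Impact: 9},
		{Name: "deep_sequence", Cost: 30, Impact: 2},
		{Name: "gbm_baseline", Cost: 10, Impact: 8},
	}
	out := shedLoad(models, 95, 70) // current load 95, budget 70
	for _, m := range out {
		fmt.Println(m.Name, m.Disabled)
	}
	// prints: graph_collusion false / deep_sequence true / gbm_baseline false
}
```

Only ‘deep_sequence’ is dropped here: it is expensive but contributes least to accuracy, which mirrors the trade-off described above.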
Finally, we validate this strategy relentlessly. We conduct quarterly large-scale load tests and “game days” where we simulate client traffic spikes and even datacenter failures. This gives us hard data on our breaking points and proves that our two-level scaling model works as designed.
You’re using tools like Aerospike, AWS Neptune, and Elasticsearch. How do they work together in Frogo’s environment?
That’s a great question, as the synergy between these tools is central to our platform’s power. We see them as serving three distinct needs: real-time speed, deep relational insight, and historical analysis.
Let me walk you through the lifecycle of a single transaction:
The Hot Path (Aerospike): When a transaction request first hits our engine, our latency budget is in the low double-digit milliseconds. The first and most critical data enrichment happens against Aerospike. This is our high-speed store for aggregated, time-series data.
The Deep Path (Neptune): If the transaction is flagged as suspicious, we might query AWS Neptune. This path has higher latency, but it allows us to uncover sophisticated fraud like multi-account collusion by analyzing graph relationships.
The Audit & Investigation Path (Elasticsearch): Regardless of the outcome, every transaction is logged into Elasticsearch. This is our system of record, enabling fraud analysts to perform complex, multi-field searches in seconds.
The real magic happens in our analyst dashboard, which brings together the speed of Elasticsearch and the relational graph context of Neptune, offering both in a single, unified view.
In short: Aerospike gives us speed, Neptune gives us depth, and Elasticsearch gives us memory and analytical power.
What makes Frogo’s fraud prevention tech stand out, especially for high-risk industries like iGaming and fintech?
For industries like iGaming and fintech, fraud is a constantly evolving adversary. A generic, one-size-fits-all solution is doomed to fail. We built Frogo to be different, and our advantage comes down to three key differentiators:
Hybrid Intelligence Engine: We combine ultra-fast rule-based decisioning with advanced ML models and graph analysis. This hybrid approach gives clients both speed at scale and deep intelligence where it matters most.
Radical Adaptability: Our flexible policy engine allows clients to react to new fraud threats in hours, not weeks. Analysts can enable/disable rules and models dynamically, tailoring the system to evolving risks.
Partnership Model: Our analytics and ML teams work directly with clients to co-develop strategies and custom models. We don’t just sell technology—we integrate expertise into their operations.
This unique combination allows our clients not just to fight fraud, but to stay ahead of it.
Frogo has built its reputation on solving the hardest technical challenges in fraud detection: balancing speed, accuracy and scalability. Its hybrid intelligence model, elastic scaling and seamless use of modern data technologies make it a trusted partner for fintech and iGaming companies worldwide.
Join Frogo at Booth #D183, SBC Summit, Lisbon, September 16–18, to discover how they’re shaping the future of fraud prevention.