Teaching an Algorithm to Filter False Flags
Majid Mumtaz, CIA, ACA, FCCA, GRCA | Feb 20, 2026

The data scientist looked at me skeptically. "You want the algorithm to flag 'unusual' transactions," he said, "but can you define unusual?" I couldn't, not in the mathematical terms he needed. That conversation, four years ago at a high-volume cloud kitchen operator — a company offering shared commercial kitchen spaces to fulfill online delivery orders — marked the beginning of my education in anomaly detection. I'd imagined deploying machine learning would be straightforward: Feed the system historical data, let it learn patterns, watch it catch fraud. Reality? Humbler and far more iterative.
Our first model flagged 47% of all transactions as anomalous. Operations leadership nearly pulled the plug after week one. What went wrong? We'd trained the algorithm on six months of data that included a major promotional campaign, a delivery zone expansion, and COVID-19 lockdown shifts. The model learned chaos, not normalcy. Our "suspicious" transactions were mostly Friday dinner rushes and weekend breakfast orders — peak business, not fraud.
That failure taught me the first principle of anomaly detection: The algorithm is only as good as your definition of normal. We started over. This time, we segmented data by day of week, time of day, location, and customer tenure before training. We excluded promotional periods and the first month after zone expansions. Dozens of micro-patterns instead of one universal "normal." The second model's false positive rate dropped to 12%. Still too high, but it was progress.
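To make the segmentation idea concrete, here is a minimal Python sketch of per-segment baselines. It is not our actual pipeline: the column names, buckets, and threshold are illustrative assumptions, and promotional periods are assumed to be filtered out before this step.

```python
# A minimal sketch of per-segment baselines, assuming a pandas DataFrame with
# hypothetical columns: order_value, ordered_at (datetime), location_id,
# and customer_tenure_days. Promotional periods are assumed filtered out upstream.
import numpy as np
import pandas as pd


def flag_anomalies_by_segment(orders: pd.DataFrame, z_threshold: float = 3.0) -> pd.DataFrame:
    df = orders.copy()

    # Build the segmentation: day of week, coarse daypart, location, customer tenure.
    df["day_of_week"] = df["ordered_at"].dt.dayofweek
    df["daypart"] = pd.cut(
        df["ordered_at"].dt.hour,
        bins=[0, 6, 11, 15, 20, 24],
        labels=["night", "breakfast", "lunch", "afternoon", "dinner"],
        right=False,
    )
    df["tenure_bucket"] = pd.cut(
        df["customer_tenure_days"],
        bins=[0, 30, 180, np.inf],
        labels=["new", "established", "long_term"],
        include_lowest=True,
    )

    # Each segment gets its own "normal": a robust z-score of order value
    # against that segment's median and median absolute deviation.
    segment_cols = ["day_of_week", "daypart", "location_id", "tenure_bucket"]
    grouped = df.groupby(segment_cols, observed=True)["order_value"]
    median = grouped.transform("median")
    mad = grouped.transform(lambda s: (s - s.median()).abs().median())
    df["robust_z"] = (df["order_value"] - median) / (1.4826 * mad.replace(0, np.nan))
    df["flagged"] = df["robust_z"].abs() > z_threshold
    return df
```

The point is the grouping: each combination of day, daypart, location, and tenure gets its own definition of normal, so a Friday dinner rush is judged against other Friday dinner rushes rather than against a quiet Tuesday afternoon.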
Then came the harder problem: teaching operations to trust it. When the system flagged a restaurant manager's cousin who was ordering meals for up to five different addresses using three different payment methods, operations pushed back. "He's a regular customer," they said. "We know him." They were right about him being regular; he was also regularly exploiting discount codes meant for new customers. But proving that required showing the pattern across weeks, demonstrating the financial impact, and, importantly, acknowledging when the algorithm was wrong.
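For readers wondering what such a check can look like in practice, here is a hypothetical rule sketch for new-customer code abuse. The column names and thresholds are assumptions for illustration, not the rule we actually ran.

```python
# A hypothetical rule sketch; the column names (customer_id, delivery_address,
# payment_method, promo_code, is_new_customer_code, order_week) and the
# thresholds are assumptions for illustration.
import pandas as pd


def new_customer_code_abuse(
    orders: pd.DataFrame,
    min_addresses: int = 3,
    min_payment_methods: int = 2,
    min_weeks: int = 2,
) -> pd.DataFrame:
    # Only look at orders redeeming codes intended for first-time customers.
    promo = orders[orders["is_new_customer_code"]]

    # Summarize how each account uses those codes over time.
    summary = promo.groupby("customer_id").agg(
        addresses=("delivery_address", "nunique"),
        payment_methods=("payment_method", "nunique"),
        weeks_active=("order_week", "nunique"),
        redemptions=("promo_code", "count"),
    )

    # Flag accounts that look like one person rotating addresses and cards.
    mask = (
        (summary["addresses"] >= min_addresses)
        & (summary["payment_methods"] >= min_payment_methods)
        & (summary["weeks_active"] >= min_weeks)
    )
    return summary[mask].sort_values("redemptions", ascending=False)
```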
I learned to present anomalies not as accusations but as questions. Instead of "This is fraud," we framed alerts as "This pattern deviates from this customer's history in these specific ways; can you help us understand why?" That shift in language changed everything. Operations moved from defensive to collaborative. They started adding context: "That's a corporate catering order" or "That customer called about a delivery issue, so we comped three orders." Each piece of feedback became training data.
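To show how that wording can be produced automatically, here is a rough sketch; the baseline fields and the thresholds are invented for illustration.

```python
# A sketch of question-style alert wording; the baseline fields and the 2x
# order-value threshold are invented for illustration.
from dataclasses import dataclass


@dataclass
class CustomerBaseline:
    median_order_value: float
    usual_addresses: int
    usual_payment_methods: int


def describe_deviation(
    order_value: float,
    addresses_used: int,
    payment_methods_used: int,
    baseline: CustomerBaseline,
) -> str:
    deviations = []
    if order_value > 2 * baseline.median_order_value:
        deviations.append(
            f"an order value of {order_value:.0f} against a typical {baseline.median_order_value:.0f}"
        )
    if addresses_used > baseline.usual_addresses:
        deviations.append(
            f"{addresses_used} delivery addresses against the usual {baseline.usual_addresses}"
        )
    if payment_methods_used > baseline.usual_payment_methods:
        deviations.append(
            f"{payment_methods_used} payment methods against the usual {baseline.usual_payment_methods}"
        )
    if not deviations:
        return "No notable deviation from this customer's history."
    return (
        "This pattern deviates from this customer's history in these specific ways: "
        + "; ".join(deviations)
        + ". Can you help us understand why?"
    )
```

The same finding, framed as a question rather than a verdict, reads very differently to the person who has to respond to it.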
The iteration process became our methodology. Every week, we reviewed the previous week's alerts in a 30-minute session with operations and finance. We tracked three metrics: true positives (genuine issues), false positives (normal business flagged incorrectly), and false negatives (issues we missed). When false positives spiked, we'd refine the model's parameters. When we missed something obvious in retrospect, we added new detection rules. The algorithm learned, but more importantly, so did we — about our business patterns, our blind spots, and the edge cases that no amount of historical data could predict.
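A simple weekly scorecard along those lines might look like the sketch below; the outcome labels and the way misses are counted are assumptions rather than a description of our actual tooling.

```python
# A minimal sketch of a weekly review scorecard; the outcome labels and the
# idea of counting separately reported misses as false negatives are assumptions.
from collections import Counter


def weekly_scorecard(reviewed_alerts: list[dict], missed_issues: int) -> dict:
    # Each reviewed alert carries an outcome assigned in the review session:
    # "true_positive" (genuine issue) or "false_positive" (normal business).
    counts = Counter(alert["outcome"] for alert in reviewed_alerts)
    tp = counts.get("true_positive", 0)
    fp = counts.get("false_positive", 0)
    fn = missed_issues  # issues surfaced outside the alerts (false negatives)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": round(precision, 2),
        "recall": round(recall, 2),
    }


# Example week: 40 alerts reviewed, 5 genuine, plus 2 issues finance found later.
alerts = [{"outcome": "true_positive"}] * 5 + [{"outcome": "false_positive"}] * 35
print(weekly_scorecard(alerts, missed_issues=2))
```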
Six months in, we hit our stride. The false positive rate stabilized at 3%. Operations trusted the daily anomaly report enough to build it into their workflow. We caught refund fraud we'd never suspected, identified cloud kitchen locations with inventory discrepancies, and flagged customers exploiting referral bonuses through fake accounts. But the real win wasn't the fraud we caught; it was the conversations the algorithm enabled. "Why did this pattern emerge?" became a routine question that revealed operational issues, system bugs, and process gaps, alongside actual fraud.
Here's what I wish I'd known at the start: Anomaly detection isn't about building a perfect model. It's about building a learning system that combines algorithmic pattern recognition with human judgment. Your algorithm will never understand that the regional manager's bulk order before a team event is normal, or that a sudden spike in orders from a specific neighborhood might be a local festival. Your job isn't to teach it everything; it's to teach it enough that its questions become useful.
If you're considering anomaly detection, start with these questions: What does normal look like in your context — not theoretically, but with your data's seasonality, outliers, and quirks? Who'll respond to the alerts, and do they have the context to interpret them? What feedback loop will you create to improve the model over time? And most critically: Are you prepared to be wrong, often, before you're right?
The algorithm I work with today still makes mistakes. But so do I. The difference is, we're learning together.
The views and opinions expressed in this blog are those of the author and do not necessarily reflect the official policy or position of The Institute of Internal Auditors (The IIA). The IIA does not guarantee the accuracy or originality of the content, nor should it be considered professional advice or authoritative guidance. The content is provided for informational purposes only.