
Anomaly Detection

Goldman Sachs, 2018

Risk managing an investment bank involves stressing every position on its books against hundreds of thousands of hypothetical market scenarios and picking the worst case. Working that far in the tail of the distribution demands extreme scrutiny, as there is little to no room for error.

The Anomaly Detection system lets users run financial, mathematical, existence, and other assertions on a dataset to determine the validity of each generated data point. The validated dataset is then fit for use in Risk Management.

Stakeholders

Risk Management

~100 people

Responsible for validating and interpreting data to make risk management decisions.

Risk Technology

~30 people

Responsible for creating products and frameworks to support risk management functions.

Core Team

5 people

Consisted of 3 risk managers & 2 technologists.


My Role

Product lead - Risk management

Data modeling

Functional Design and Prototyping

User acceptance testing

User needs

Data

  • Ability to handle large datasets, often tens of GB in size

  • Ability to store and use parameterized queries to fetch new versions of data

  • Ability to use new datasets in a plug-and-play experience

  • Ability to schedule data gathering to run on the firm's servers

Review Tools

  • Ability to run any rule from a rule repository on any compatible dataset

  • Ability to add rules to the Repository with minimal code (see the sketch after this list)

  • Dynamic viewers with features such as aggregation and lazy loading, allowing users to analyze results without throttling their local systems
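
To make the "minimal code" requirement concrete, here is a sketch in Python of what a registered rule could look like; the production system was built in a proprietary language, so every name below is illustrative:

    # Hypothetical sketch: a rule is a named predicate over a data row,
    # registered into the shared repository with a one-line decorator.
    RULE_REPOSITORY = {}

    def rule(name):
        """Register a validation rule under `name` with minimal boilerplate."""
        def register(fn):
            RULE_REPOSITORY[name] = fn
            return fn
        return register

    @rule("pv_not_null")
    def pv_not_null(row):
        # Existence assertion: every position must carry a present value.
        return row.get("pv") is not None

    @rule("stress_within_bounds")
    def stress_within_bounds(row):
        # Financial assertion: stressed PnL should not exceed the notional.
        return abs(row["stressed_pnl"]) <= abs(row["notional"])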

Orchestration

  • Ability to determine and trigger next steps on anomaly detection – handle, intervene, report

  • Prioritization of tasks in the queue based on different levels of urgency (see the sketch after this list)

  • Ability to manually edit the task queue
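
A plausible shape for such a queue, sketched with Python's standard heapq module; the priorities, task names, and the reprioritize helper are all assumptions for illustration:

    import heapq
    import itertools

    class TaskQueue:
        """Priority queue of anomaly-handling tasks; lower value runs first."""
        def __init__(self):
            self._heap = []
            self._counter = itertools.count()  # tie-breaker: insertion order

        def push(self, task, priority):
            heapq.heappush(self._heap, (priority, next(self._counter), task))

        def pop(self):
            return heapq.heappop(self._heap)[-1]

        def reprioritize(self, task, new_priority):
            # Manual edit: drop the old entry and re-insert at the new priority.
            self._heap = [e for e in self._heap if e[-1] != task]
            heapq.heapify(self._heap)
            self.push(task, new_priority)

    queue = TaskQueue()
    queue.push("report: stale market data", priority=2)
    queue.push("intervene: failed stress run", priority=0)  # most urgent
    assert queue.pop() == "intervene: failed stress run"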

Governance

  • Maintaining an audit trail throughout the process

  • Data versioning

  • Enforce sign-off by the appropriate designated signatory

  • Summary view for divisional management

Solution

Because the product had to be future-ready and scalable, we designed the system in a layered architecture. This choice allowed us to build data pipelines separately from the business logic. Furthermore, once the communication protocols between layers were defined, we could add and edit layers to incorporate complex processes.
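
As a sketch of how small such a protocol can be, here is a hypothetical Python rendering; the real system was written in a proprietary language, and the Layer interface below is an assumption:

    from typing import Any, Protocol

    class Layer(Protocol):
        """Minimal inter-layer contract: each layer consumes and emits a payload.
        Once every layer honors this, new layers can be spliced into the
        pipeline without touching the business logic around them."""
        def process(self, payload: Any) -> Any: ...

    def run_pipeline(layers: list[Layer], payload: Any) -> Any:
        for layer in layers:
            payload = layer.process(payload)
        return payload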

Connections Repository

Directory of authorized connections to various systems in the firm.

Data Layer

Contained parameterized queries to various systems to gather the required data.
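
A sketch of what a stored, parameterized query might look like in Python; the table, source, and field names are invented for illustration:

    from dataclasses import dataclass

    @dataclass
    class StoredQuery:
        name: str
        source: str    # key into the connections repository
        template: str  # query text with named placeholders

        def bind(self, **params) -> str:
            # A real driver would bind parameters safely; string
            # formatting keeps the sketch short.
            return self.template.format(**params)

    positions = StoredQuery(
        name="eod_positions",
        source="risk_warehouse",
        template="SELECT * FROM positions WHERE cob_date = '{cob_date}'",
    )
    print(positions.bind(cob_date="2018-06-29"))  # fetch a new data version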

Transform Layer

Transforms were defined to ensure that the data was in a form conducive to further processing.
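
In pandas terms, a transform is simply a declared, reusable function from raw to rule-ready data; the column names below are assumptions:

    import pandas as pd

    def normalize_positions(raw: pd.DataFrame) -> pd.DataFrame:
        """Illustrative transform: standardize names and types so every
        downstream rule sees the same shape of data."""
        out = raw.rename(columns=str.lower)
        out["cob_date"] = pd.to_datetime(out["cob_date"])
        out["pv"] = out["pv"].astype(float)
        return out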

Rule Repository

An open-source repository of all the rules written in the system. These rules were available to every user to reduce duplication of effort.

Rule Engine

The heart of the system: it picks up a rule, runs it on a dataset, interprets the result, and passes the report to the orchestrator.
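
The engine's core loop is small; a minimal sketch under the same illustrative assumptions as the rule repository above (the orchestrator's dispatch hook is hypothetical):

    def run_rules(dataset, rule_names, repository, orchestrator):
        """Illustrative rule engine: run each named rule over the dataset,
        collect failures as anomalies, and hand the report onward."""
        report = {}
        for name in rule_names:
            rule_fn = repository[name]  # pick up the rule
            failures = [row for row in dataset if not rule_fn(row)]
            report[name] = {"checked": len(dataset), "anomalies": failures}
        orchestrator.dispatch(report)   # pass the report to the orchestrator
        return report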

Orchestrator

A trigger framework based on conditional constructs with connections to each layer.
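
A sketch of what those conditional constructs could look like, mapping each rule's result onto the handle / intervene / report next steps mentioned earlier; the threshold is an invented example:

    def dispatch(report, intervene_at=100):
        """Illustrative trigger logic: choose a next step per rule based on
        how many anomalies it found."""
        actions = {}
        for rule_name, result in report.items():
            n = len(result["anomalies"])
            if n == 0:
                actions[rule_name] = "handle"     # clean: auto-close the check
            elif n < intervene_at:
                actions[rule_name] = "report"     # log for the daily review
            else:
                actions[rule_name] = "intervene"  # escalate to a risk manager
        return actions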

Data representation

An interactive datacube with the ability to view, aggregate, pivot, filter, and graph data.
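
The same verbs expressed with pandas, as a rough stand-in for the interactive viewer; the data is fabricated for illustration:

    import pandas as pd

    results = pd.DataFrame({
        "desk": ["rates", "rates", "credit", "credit"],
        "rule": ["pv_not_null", "stress_within_bounds"] * 2,
        "anomalies": [0, 3, 1, 7],
    })

    flagged = results[results["anomalies"] > 0]  # filter
    cube = flagged.pivot_table(index="desk", columns="rule",
                               values="anomalies", aggfunc="sum")  # pivot + aggregate
    print(cube)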

UI

Given that the primary users of this system were a business team, we designed and implemented visual interfaces for creating data connections, rules, transforms, and orchestrations, along with the ability to push them as code to production systems.

Summarization and Reporting were forked into a different project given their widespread application outside of the realm of anomaly detection.

Impact

2 weeks

We reduced the time to regulatory deliverables from 6 weeks to 2 weeks using this product.

⬇ effort

New regulatory requirements, which had typically required dedicated processing and resourcing, were now onboarded with minimal effort.

🔎 Bugs

Using a system-driven approach allowed us to find previously unknown bugs in both models and infrastructure. Fixing these bugs not only improved the accuracy of our work but also reduced the capital charge paid by the firm.

📉 Risk

By having a software system verify the data, we eliminated the operational risk of humans scrutinizing millions of data points on spreadsheets.
Over a hundred different spreadsheets were deprecated once their processes were memorialized in code.

Learnings

  • Maintaining the balance between speed, memory, and cost in large systems.

  • The power of modular, open-source systems, which grow more capable as more users contribute to them.

  • Acknowledging the gap between technical users and system developers.


Quirky facts

  • Outlier detection, part of the rules repository when the system was released, marked the first time machine learning algorithms were implemented in the Risk Division

  • The first regulatory submission made using this system handled over 70GB of data.

  • The proprietary language used to build the system was single-threaded. We wrote code to simulate simple multi-threading so that task prioritization could be supported in time for release (see the sketch below)
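
A comparable trick in Python is cooperative scheduling with generators: each task yields control after a unit of work, and a single-threaded scheduler always resumes the highest-priority task. This is only a sketch of the idea; the actual implementation was in the proprietary language:

    import heapq

    def task(name, steps):
        """A cooperatively scheduled task: yields after each unit of work."""
        for i in range(steps):
            print(f"{name}: step {i + 1}/{steps}")
            yield  # hand control back to the scheduler

    def run(tasks):
        """Single-threaded scheduler: always advance the highest-priority task."""
        heap = [(priority, i, t) for i, (priority, t) in enumerate(tasks)]
        heapq.heapify(heap)
        while heap:
            priority, i, t = heapq.heappop(heap)
            try:
                next(t)  # run one unit of work
                heapq.heappush(heap, (priority, i, t))  # requeue if unfinished
            except StopIteration:
                pass

    run([(0, task("urgent-regulatory-run", 2)), (1, task("nightly-batch", 2))])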
