Anomaly Detection
Goldman Sachs, 2018
Risk managing an investment bank involves stressing every position on its books against hundreds of thousands of hypothetical market scenarios and picking the worst case. Working that far out in the tail of the distribution demands extreme scrutiny, as there is little to no room for error.
The Anomaly Detection system enables users to run financial, mathematical, existence, and other assertions on a dataset to determine the validity of each generated data point. The validated dataset is then fit for use in Risk Management.
Stakeholders
Risk Management
~100 people
Responsible for validating and interpreting data to make risk management decisions.
Risk Technology
~30 people
Responsible for creating products and frameworks to support risk management functions.
Core Team
5 people
Consisted of 3 risk managers & 2 technologists.
My Role
Product lead - Risk management
Data modeling
Functional Design and Prototyping
User acceptance testing
User needs
Data
Ability to handle large datasets, often tens of GB in size
Ability to store and use parameterized queries to fetch new versions of data
Ability to use new datasets in a plug-and-play manner.
Ability to schedule data gathering to run on the firm's servers
Review Tools
Ability to run any rule from a rule repository on any compatible dataset
Ability to add rules to the repository with minimal code
Dynamic viewers with features such as aggregation and lazy loading, allowing users to analyze results without overloading their local machines
Orchestration
Ability to determine and trigger next steps on anomaly detection – handle, intervene, report
Prioritization of tasks in the queue based on different levels of urgency
Ability to manually edit the task queue
Governance
Maintaining an audit trail throughout the process
Data versioning
Enforce sign-off by the appropriate designated approver
Summary view for divisional management
Solution
Because the product had to be future-ready and scalable, we designed the system as a layered architecture. This choice allowed us to build data pipelines separately from the business logic, and once the communication protocols between layers were defined, we could add or edit layers to incorporate more complex processes.
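A minimal sketch of that inter-layer contract, written in Python for illustration (the actual system used a proprietary in-house language; every name below is ours, not the firm's). Each layer consumes and produces the same envelope, so layers can be added or swapped without touching their neighbours.

    from abc import ABC, abstractmethod
    from dataclasses import dataclass, field

    @dataclass
    class Envelope:
        payload: object                                  # the dataset at this stage
        audit_trail: list = field(default_factory=list)  # governance requirement

    class Layer(ABC):
        @abstractmethod
        def process(self, envelope: Envelope) -> Envelope:
            """Transform the payload and append an audit record."""

    class Pipeline:
        def __init__(self, layers: list):
            self.layers = layers                  # ordered stack of Layer objects

        def run(self, envelope: Envelope) -> Envelope:
            for layer in self.layers:             # data flows layer to layer
                envelope = layer.process(envelope)
            return envelope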
Connections Repository
Directory of authorized connections to various systems in the firm.
Data Layer
Contained parameterized queries to the various source systems, used to gather the required data.
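As an illustration of how a saved query might be re-run with fresh parameters (the connection name, table, and columns below are invented):

    # Hypothetical stored queries, resolved against named, authorized
    # connections from the Connections Repository. Execution is stubbed.
    from string import Template

    CONNECTIONS = {"risk_db": "dsn://risk-prod"}  # stub for the directory

    QUERIES = {
        # Saved once; re-run with new parameters to fetch each data version.
        "scenario_pnl": Template(
            "SELECT book, scenario, pnl FROM scenario_results "
            "WHERE cob_date = '$cob_date' AND run_id = '$run_id'"
        ),
    }

    def fetch(query_name: str, **params) -> str:
        """Resolve a saved query with fresh parameters."""
        sql = QUERIES[query_name].substitute(**params)
        return sql  # a real version would execute this over CONNECTIONS["risk_db"]

    print(fetch("scenario_pnl", cob_date="2018-06-29", run_id="42"))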
Transform Layer
Transforms were defined to ensure that incoming data was conducive to further processing.
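A hypothetical transform (column names invented), coercing a raw extract into a shape every downstream rule can rely on:

    import pandas as pd

    def normalise(raw: pd.DataFrame) -> pd.DataFrame:
        """Standardize column names and types before rules run."""
        out = raw.copy()
        out.columns = [c.strip().lower() for c in out.columns]   # uniform names
        out["pnl"] = pd.to_numeric(out["pnl"], errors="coerce")  # enforce numeric P&L
        return out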
Rule Repository
An open-source repository of all the rules written in the system. These rules were available to every user, reducing duplication of effort.
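A sketch of what "minimal code" could mean in practice: a new rule is a single decorated function that registers itself in the shared repository (the decorator and both rules are illustrative, not the firm's actual API):

    RULES = {}

    def rule(name):
        """Register a function in the shared rule repository."""
        def register(fn):
            RULES[name] = fn
            return fn
        return register

    @rule("no_missing_pnl")
    def no_missing_pnl(df):
        # Existence assertion: every data point must have a P&L value.
        return df["pnl"].notna()

    @rule("pnl_within_bounds")
    def pnl_within_bounds(df, limit=1e9):
        # Mathematical assertion: flag implausibly large values.
        return df["pnl"].abs() <= limit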
Rule Engine
The heart of the system: it picked up a rule to run on a dataset, interpreted the result, and passed the report to the orchestrator.
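Building on the RULES registry sketched above, the engine loop might amount to this (the report fields are our invention):

    def run_rule(name, df, orchestrator, **kwargs):
        """Run one rule over a dataset and hand the report downstream."""
        passed = RULES[name](df, **kwargs)   # boolean result per row
        report = {
            "rule": name,
            "total": len(df),
            "anomalies": int((~passed).sum()),
            "failing_rows": df.index[~passed].tolist(),
        }
        orchestrator(report)                 # next steps decided there
        return report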
Orchestrator
A trigger framework based on conditional constructs with connections to each layer.
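A hypothetical version of those conditional constructs, deciding between handle, intervene, and report, and queuing the result by urgency (thresholds and names are invented):

    import heapq

    task_queue = []   # (priority, seq, action, rule); lower number = more urgent
    _seq = 0          # tie-breaker so equal priorities stay in arrival order

    def orchestrate(report):
        global _seq
        rate = report["anomalies"] / max(report["total"], 1)
        if report["anomalies"] == 0:
            action, priority = "report", 2     # clean run: just record it
        elif rate < 0.01:
            action, priority = "handle", 1     # small, known issues: auto-correct
        else:
            action, priority = "intervene", 0  # material anomalies: alert a risk manager
        heapq.heappush(task_queue, (priority, _seq, action, report["rule"]))
        _seq += 1

Because the queue is an ordinary data structure, manual edits such as reordering or removing tasks come essentially for free.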
Data representation
An interactive datacube with the ability to view, aggregate, pivot, filter, and graph data.
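The datacube itself was an interactive viewer; as a rough analogy, pandas can reproduce its aggregate-and-pivot behaviour (data and column names invented):

    import pandas as pd

    df = pd.DataFrame({
        "desk":     ["rates", "rates", "credit", "credit"],
        "scenario": ["s1", "s2", "s1", "s2"],
        "pnl":      [-1.2, 3.4, -0.7, 0.9],
    })

    # Pivot: scenarios across the top, desks down the side, P&L summed.
    cube = df.pivot_table(index="desk", columns="scenario",
                          values="pnl", aggfunc="sum")
    print(cube)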
UI
Given that the primary users of this system were a business team, we designed and implemented visual interfaces to create data connections, rules, transforms, and orchestrations, in addition to the ability to push them as code to production systems.
Summarization and Reporting were forked into a different project given their widespread application outside of the realm of anomaly detection.
Impact
2 weeks
We reduced the time to regulatory deliverables from 6 weeks to 2 weeks using this product.
⬇ effort
New regulatory requirements which typically required dedicated processing and resourcing were now onboarded with minimal effort.
🔎 👾
Using a system-driven approach allowed us to find previously unknown bugs in both models and infrastructure. Fixing these bugs not only improved the accuracy of our work but also reduced the capital charge paid by the firm.
📉 Risk
By having a software system verify the data, we eliminated the operational risk of humans scrutinizing millions of data points in spreadsheets.
Over a hundred different spreadsheets were deprecated when processes were memorialized in code.
Learnings
Maintaining the balance between speed, memory, and cost in large systems.
The power of modular, open-source systems, whose collective intelligence compounds as users contribute to them.
Acknowledging the gap between technical users and system developers.
Quirky facts
Outlier detection, which was part of the rules repository when the system was released, was the first time Machine Learning algorithms were implemented in the Risk Division.
The first regulatory submission made using this system handled over 70GB of data.
The proprietary language used to build the system was single-threaded. To support prioritization and release on time, we wrote code to simulate simple multi-threading; a sketch of the idea follows.
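In Python for illustration (the tasks here are invented): slice every task into small steps and let a priority queue pick which task advances next, so urgent work pre-empts the rest even on one thread.

    import heapq

    def task(name, steps):
        """A unit of work sliced into resumable steps (a generator)."""
        for i in range(steps):
            yield f"{name}: step {i + 1}/{steps}"

    def run(tasks):
        """tasks: list of (priority, generator); lower number = more urgent."""
        queue = [(p, i, t) for i, (p, t) in enumerate(tasks)]
        heapq.heapify(queue)
        while queue:
            priority, i, t = heapq.heappop(queue)
            try:
                print(next(t))                           # advance one slice of work
                heapq.heappush(queue, (priority, i, t))  # requeue unfinished task
            except StopIteration:
                pass                                     # task done; drop it

    run([(0, task("regulatory run", 3)), (1, task("ad-hoc check", 2))])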