
and general definitions and fraud/AML applications

Frankfurt, April 2019
Group Financial Crime
Eric WAGNER

Agenda

1 „difference between artificial intelligence and machine learning?“
2 Current progress?
3 Different types of machine learning
4 approach
5 examples
6 Erste Group applications and AML synergies
7 challenges
8 conclusio

“By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.” Eliezer Yudkowsky1

“Nobody phrases it this way, but I think that artificial intelligence is almost a humanities discipline. It's really an attempt to understand human intelligence and human cognition.” Sebastian Thrun2

1 US researcher and author. Decision theory and long-term social and philosophical impacts of artificial intelligence
2 German computer scientist and robotics specialist. Former professor for artificial intelligence at Stanford University and vice president of .

Unfortunately no common definition, but converging opinions and papers

EXEMPLARY

General Artificial Intelligence

„nothing less than build a machine, a robot
• pass through its childhood,
• learns languages like a child,
• gains knowledge about the world by observing it with its own organs and
• ultimately contemplates about the whole human knowledge and intellectual world“
Joseph Weizenbaum, MIT AI Laboratory

„Turing Test: a test person tries to identify if the unknown counterpart is human or a machine based on interactions [originally via keyboard and screen only] …“
Alan Turing, 1950

Specific Artificial Intelligence

„Humanity“
• empathy / mood
• opinion (having / reasoning)
• culture / faith / love

„visual nature“
• “human look”
• Body movements and -functions
• Mimic/gesture

„interaction“
• perception (5+ senses)
• intervention
• context (spatial / logic)

„problem solving“
• recognize/deviate problem (situation)
• problem resolution (general and specific)
• (Machine) Learning

ca. 20 years to go for comprehensive artificial intelligence, but in specific areas „superhuman“ performance has already been achieved.

Estimation current status AI/ML

[Chart: estimated AI/ML performance per capability, on a scale from no performance to „average“ human performance; labels: general, specific]

„Humanity“
• empathy / mood
• opinion (having / reasoning)
• culture / faith / love

„visual nature“
• “human look”
• Body movements and -functions
• Mimic/gesture

„interaction“
• perception (5+ senses)
• intervention
• context (spatial / logic)

„problem solving“
• recognize/deviate problem (situation)
• problem resolution (general and specific)
• (Machine) Learning (further focus)

In general Machine Learning can be separated in three categories …

Supervised Learning
Types
• regression (cont. output)
• classification (discrete output)
Characteristics
• labeled training data
• direct feedback
• predict result
Examples
• recognition: handwriting, speech and pictures
• recommendation: spam filter, online ads, recommender systems
• analysis: brain signals, genes, share prices, weather, …

Unsupervised Learning
Types
• clustering
• outlier/anomaly detection
Characteristics
• no labeled training data
• no feedback
• find the “hidden” structure
Examples
• system diagnosis
• security-/event detection
• analysis: social network, astronomy
• market segmentation

Reinforcement Learning
Types
• status optimization by rewards
Characteristics
• decision process
• delayed feedback via “reward system”
• agent evaluates (new) status and initiates actions to further optimize own status
Examples
• autonomous cars, robots, elevators, etc.
• (computer-)games (AlphaGo)
• marketing strategy optimization

… which can be realized either by statistical machine learning methods (can be re-calculated) or neural networks (cannot be re-calculated).

EXEMPLARY

Statistical machine learning

Supervised Learning
• Linear/Ridge Regression1
• K-Nearest Neighbor (regression/classification)
• Decision Trees2 (regression/classification)
• Logistic Regression3
• SVM – Support Vector Machine (binary classification)
• Naive Bayes (document-/text classification)

Unsupervised Learning
• LVQ - Linear Vector Quantization (videocodecs: Apple Quicktime and audiocodecs: DTS, )
• K-Means (fast, but heuristic for market segmentation, computer vision, geo statistics, agriculture)
• HBOS - Histogram Based Outlier Scoring (fraud, structural defects)
• One-Class SVM

Neural networks

Supervised Learning (learning via „back propagation“)
• NN (, RBM, )
• Deep NN (recurrent/LSTM, convolutional, generative adversarial)
• DeepQ/Hierarchical Reinforcement NN

Unsupervised Learning (Clustering)
• Kohonen SOM (Self-Organizing Map)
• Neural Gas

Reinforcement Learning
• rarely used as an autonomous learning method,
• but as a specific evaluation algorithm for learning successes/failures
• within supervised/unsupervised learning methods

1 avoidance of overfitting: recognize connections, separation of signal and noise incl. error estimation
2 Decision Trees, incl. Random Forest, Monte-Carlo Decision Tree etc. can be either regression (numerical output) or classification (discrete output)
3 binary problems and analysis of context probability with multiple features: market research, impact analysis

Numerous successful Machine Learning implementations (realtime and faster/better than human) … (1/2)

Autonomous vehicles (NVIDIA DRIVE PX)
• Object recognition and classification
• Computer model generation of environment for object recognition

Picture recognition and description as well as context queries
• Object recognition and description
• Natural language question analysis and -answering incl. context/relation

https://www.youtube.com/embed/0rc4RqYLtEU?end=125&fs=0&modestbranding=1&showinfo=0&autoplay=1
https://www.youtube.com/embed/PjH_1hEoIDs?end=100&fs=0&modestbranding=1&showinfo=0&autoplay=1

Numerous successful Machine Learning implementations (realtime and faster/better than human) … (2/2)

Autonomous learning of games and simulation of intuitive moves

Autonomous learning of computer games (2014) via Deep Q-Network (DQN) + Reinforcement Learning
https://www.youtube.com/embed/n_-xKr3vF3M?fs=0&modestbranding=1&showinfo=0&autoplay=1

Simulation of „intuitive“ moves (AlphaGo 2016)
Monte Carlo Tree Search and for each Branch/Leaf Evaluation … 4 CNNs (3 Policy + 1 Value CNN)

Medical Diagnostics
leads to Access to Care and Diagnostic Accuracy
IEEE Spectrum

… and advanced Machine Learning based approaches for fraudulent use (1/5)

Spoofing / Disguising / Phishing everything

Spoofcard.com:
• Disguise Caller ID
• Disguise voice (sound as man or woman) and add background sounds
• Call straight to voicemail
• Send spoof texts
https://www.spoofcard.com

SNAP_R | Black Hat | blackhat.com
SNAP_R (Social Network Automated Phishing with Reconnaissance): in an experiment, this AI got more users to click malicious links than a human competitor. It
• studied how Twitter users behave,
• then designed and implemented its own phishing bait.
• The results of the experiment showed that the artificial hacker was able to compose and distribute more phishing tweets, and with a more substantial conversion rate.

Authentication Factor | Description | Key Vulnerabilities

Ownership
Security Key | A compact device that contains a secure IC chip which leverages public-key infrastructure | Can be hacked/manipulated

Inherence
Fingerprint Scanning | Compares fingerprint on record with new scans captured optically or electrically | Usage of -Fingerprint
Vein Scanning | Compares veins on record with new scans captured optically | Can be „stolen“ and forged/faked
3D Facial Recognition | Compares 3D characteristics of a face on record with new scans captured optically | 3D-print (mostly Android and Win-Hello, iOS less vulnerable)

… and advanced Machine Learning based approaches for fraudulent use (2/5)

Fake Sound based on written Text: Tacotron 2 - Generating Human-like Speech from Text
… and even trained to sound like an existing human (VoCo/Lyrebird.ai)
https://google.github.io/tacotron/publications/tacotron2/index.html
https://arxiv.org/abs/1711.10433

Generate human-like speech from text using neural networks trained using only speech examples and corresponding text transcripts. A sequence of features (i.e. an 80-dimensional audio spectrogram with frames computed every 12.5 milliseconds) encoding the audio captures not only
• pronunciation of words, but also
• various subtleties of human speech, including volume, speed and intonation.

DeepMind's WaveNet (Parallel WaveNet: Fast High-Fidelity ), deployed already in Google Assistant

Fake Video based on Audio (in Realtime): Adapting Lip-Sync from Audio
https://grail.cs.washington.edu/projects/AudioToObama/

Input: Given audio of President Barack Obama
Output: Synthesize a photorealistic video of Obama speaking with accurate lip sync, composited into a target video clip.
Training: Trained on many hours of his weekly address footage,
• a learns the mapping from raw audio features to mouth shapes.
• Given the mouth shape at each time instant, high quality mouth texture was synthesized and
• composited with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track.
https://www.youtube.com/embed/9Yq67CjDqvw?start=45&end=62&autoplay=1

… and advanced Machine Learning based approaches for fraudulent use (3/5)

Fake Video based on swapped face

(NOT Realtime):
https://www.fakeapp.org/
https://hackernoon.com/exploring-deepfakes-20c9947c22d9

(IN Realtime): FaceSwap
various implementations, but not fully convincing ... yet
http://faceswaplive.com/
https://github.com/arturoc/FaceSubstitution
https://vimeo.com/29348533

Fake Video based on target picture (in Realtime): Face2Face Re-Enactment in RealTime
http://niessnerlab.org/projects/thies2018headon.html
https://www.youtube.com/embed/7Dg49wv2c_g?start=10&end=40&autoplay=1

Simulate another person's speech, face, facial movements/expressions, and mannerisms in real time.

With technology like this widely available, can any media source be trusted at 'face value' anymore? Especially with such large databases of facial and biometric features ...

… and advanced Machine Learning based approaches for fraudulent use (4/5)

Image and (realtime) Video manipulation has become boundless …

https://www.youtube.com/watch?v=G06dEcZ-QTg
https://www.kaggle.com/ewagner/tl-gan-demo-501dc6/edit
https://github.com/JoYoungjoo/SC-FEGAN
https://arxiv.org/abs/1902.06838

… and advanced Machine Learning based approaches for fraudulent use (5/5)

Fake Video ID-Card identification (in Realtime): Realtime Video Manipulation

German BSI created a fake ID with standard photo editing software on a standard office computer and printed it on an office printer.

Then, with a standard graphics card and camera, the video capture stream can be captured and manipulated in realtime to show hologram security features, including disturbance with a finger.

Create (photorealistic) Avatar in realtime (later with realistic speech synthesized based on Text2Speech)

Meet Digital Mike Seymour (Collaborative Experiment)
https://www.roadtovr.com/siggraph-2017-meetmike-sets-impressive-new-bar-for-real-time-virtual-human-visuals-in-vr/
Consortium includes Epic Games, The Wikihuman Project, 3Lateral, Cubic Motion, Tencent, Disney Research Zurich and Loom.AI

Rachel / Soul Machines (+IBM for Emotion Detection)
https://www.soulmachines.com/

Siren (Vicon / Epic Games)
http://www.cgrecord.net/2018/03/real-time-digital-character-siren-using.html

When dealing with remote KYC currently, the barriers for Identity manipulation are even lower, even full automation via photorealistic Avatars and Google Duplex possible – mitigation measures are necessary

EXEMPLARY

ID Document
• Synthetic Identity: Buy ID, driver license or passport in Darknet; print your own ID
• Identity Takeover: Buy ID, driver license or passport in Darknet; print your own ID based
• Major issues for successful mitigations: Limited NFC chip usage; limited access to public authority databases (only Slovakia, Bulgaria, Portugal and Norway)

Fingerprint
• Synthetic Identity: Use own fingerprints or master-fingerprint
• Identity Takeover: Take photo or fingerprints from touched objects or use master-fingerprint; use own fingerprints on own smartphone
• Major issues for successful mitigations: No real solution for clear identification available on consumer devices, i.e. separate fingerprint devices necessary

Facial Recognition
• Synthetic Identity: Use any picture from real person or generate a synthetic one
• Identity Takeover: Take any picture/video-still from real person
• Major issues for successful mitigations: Insufficient TP/TN ratios in comparison with ID picture; limited Liveness tests/checks

Video/Voice
• Synthetic Identity: Generate synthetic voice; use voice sample and manipulate
• Identity Takeover: Take a voice sample and use speech2text and text2speech for authentication
• Major issues for successful mitigations: Insufficient TP/TN ratios in comparison with voice sample

Video/Face+Body
• Synthetic Identity: See facial recognition
• Identity Takeover: Take a picture and use motion capture or face re-enactment to generate manipulated live stream
• Major issues for successful mitigations: Insufficient TP/TN ratios in comparison with ID picture / selfie; insufficient TP/TN ratios for manipulation detection

In order to improve remote KYC,
• the European Commission introduced 3 Expert Groups (Financial Services, Healthcare and Education)
• to define necessary enhancements for the eIDAS standard and
• to avoid insecure biometric id systems (i.e. Synthetic Identities and Identity Takeovers)

Overview on eIDAS/remote KYC components which shall be defined until Q3/2019 to combat Financial Crime …

KYC Attributes
• Which attributes are required (minimum) for each Use Case

Use Cases for Financial Services, i.e. Account Opening/Payment, Lending, Investment

Document Types, i.e. Type 1-4 documents, including the respective “freshness”
• Which document types are eligible for which LoA

Obligors, i.e. Private Individuals, Legal Entities and Connectors pi2pi + le2pi
• Which obligors are considered

Levels of Assurance, i.e. Low, Substantial and High
• How is each level of assurance for each KYC attribute defined

Technical & Functional Requirements, incl. authentication mechanisms
• What are the minimum technical and functional requirements to current/future solutions

Verifiers, i.e. Obligor, RITP and Issuer
• Which verifiers have which level of assurance for each KYC attribute

Full KYC, i.e. Customer Identification and Customer Due Diligence (standard and enhanced)
• What is the KYC extent of the eIDAS/rKYC framework

Liability Framework
• What are the liabilities for each verifier and obligor in which LoA

... which should be accompanied with further approaches to detect fraud. (1/4)

Outlier/Anomaly detection

payments:
• Identification of fraudulent transactions in realtime
• Definition of user specific transaction profiles based on 30+ connection-, payments- and derived attributes (=features)
• Deviations are scored based on the distance towards each profile and final score calculated based on weighted sum

Machine Learning:
• HBOS (Histogram Based Outlier Scoring)
• Random Forest/Genetic Algorithms (optimization of weights)

[Figure labels: Dimensions (Features), Histogram Bin Ranges; Selection, Recombination, Mutation]
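The scoring idea described for payments (per-customer profile, per-feature deviation, weighted sum) can be sketched as follows. This is a minimal illustration of histogram-based outlier scoring, not the production implementation: the function names, the equal-width binning, the density floor and the two-feature customer history are all assumptions made for the example.

```python
import math

def fit_histograms(rows, n_bins=5):
    """Build one equal-width histogram per feature from past transactions."""
    histograms = []
    for values in zip(*rows):  # iterate feature by feature
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # constant feature: use a dummy width
        counts = [0] * n_bins
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        peak = max(counts)
        heights = [c / peak for c in counts]  # tallest bin gets height 1.0
        histograms.append((lo, hi, width, heights))
    return histograms

def hbos_score(x, histograms, weights=None, floor=1e-3):
    """Weighted sum of log(1 / bin height); higher means more unusual."""
    weights = weights or [1.0] * len(histograms)
    score = 0.0
    for value, (lo, hi, width, heights), w in zip(x, histograms, weights):
        if lo <= value <= hi:
            density = heights[min(int((value - lo) / width), len(heights) - 1)]
        else:
            density = 0.0  # value never seen in this customer's profile
        score += w * math.log(1.0 / max(density, floor))
    return score

# Hypothetical history of one customer: (amount in EUR, hour of day)
history = [(20.0, 9), (25.0, 10), (30.0, 12), (35.0, 11), (40.0, 14),
           (22.0, 9), (28.0, 13), (33.0, 10), (60.0, 17), (27.0, 12)]
profile = fit_histograms(history)

typical_score = hbos_score((26.0, 10), profile)    # falls into the tallest bins
outlier_score = hbos_score((5000.0, 3), profile)   # outside the profile on both features
```

The `weights` argument is where an optimizer such as a genetic algorithm could plug in: it would search for the per-feature weights that best separate known fraud cases from normal transactions.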

Face recognition and analysis

customer behavior:
• Identification of lies (visual)
• „Pinocchio effect“
• „micro expressions“

Machine Learning:
• Convolutional Neural Network
• Feature Bagging
• Computer Vision + 3D Face Reconstruction

... which should be accompanied with further approaches to detect fraud. (2/4)

Speech recognition and analysis

Customer behavior:
• Identification of lies (acoustic)
• voice stress analysis (micro tremor)
• heavy breathing (fast heartbeat/blood flow)
• higher pitch (tightened vocal cords)
• frequent hawking (salivation decreases)
• repetition of words or phrases

Machine Learning:
• neural networks (LSTM, Naive Bayes, etc.)
• Computer linguistics (speech recognition, tokenization, morphologic analysis, syntactic analysis)

currently insufficient Machine Learning:
• Computer linguistics (semantic analysis, dialog- and discourse analysis)1

Detection of feelings and sentiments

Customer behavior2:
• Detection of fraudulent or disappointed/angry employees/customers via Email-/Communication analysis.
• Search for specific and/or negative keywords, f.ex. related to “fraud triangle” (pressure, opportunity, rationalization) like “ignored”, “exhausted”, “reckless”, “trouble”, “set aside”, “biased”, …

Machine Learning:
• sentiment analysis, SentiCircle, k-Means etc.

[Figure labels: SentiCircle‘s Feature Vectors (Geometry, Density, Dispersion); K-means / SVM / …; Semantic Sentiment Patterns]

1 Examples
• Pre-/overanswering of questions (being prepared)
• detailed deviations in repetitive stories
• language (pauses, short answers, monotonous voice, avoidance of pronouns (“I”), generalizations and exaggerations, negated repetition of questions)
• distancing expression ("the car" instead of "my car")
• Rejecting whole statements if only one detail is wrong, admitting only minor (criminal) acts
2 Source ACFE

... which should be accompanied with further approaches to detect fraud. (3/4)

visualizations of conspicuities

transaction/behavior:
• Identification of fraudulent behavior

Machine Learning:
• SOM, PCA, t-SNE

fraudulent trading
• analysis of FDAX trading data
• visualization:
• size (=number) and proximity of cubes (=similarity)
• gradient (characteristics of learned parameters, such as number of trades, number of contracts, min-size, max-size, avg-size, high-price, low-price and value-weighted avg-price)
• highlighted red area („hidden iceberg“): orders at short intervals and in small denominations

credit fraud
• analysis of loan portfolio
• presentation: (Cluster/Outlier)
• size (=number) and proximity of cubes (=similarity)
• gradient (characteristic of the learned parameters - all important customer and credit data)
• highlighted yellow area: High lending volumes with long maturities to clients with bad credit history, no money in the account, no collateral and guarantor (even in one branch and by the branch manager)

source: Fraud Management In Kreditinstituten, 2013

... which should be accompanied with further approaches to detect fraud. (4/4)

DATA BREACHES (increasing)
• Human (Business / Private): Data Breaches with lateral movements between Business / Private via credential stuffing for ATO (as humans tend to re-use email address and password)

FRAUD: Perpetrator initiates trx. (decreasing)
• Human (Business): Less Hacking; More internal Fraud
• Human (Private): Trend towards mobile: Phishing via download App malware; SS7-Protocol flaw remains

SCAM: Customer is tricked into trx. (increasing)
• Human (Business): BEC – Spear Phishing; Email Interception
• Human (Private): Move from Phishing to Vishing (f.ex. Microsoft Support Scam)

Authentication
• SCA is not tamper-proof (neither software1- nor hardware-based)
• Customers are not aware of upcoming need (PSD2) / don’t see necessity3

Transportation
• Unencrypted MFA still in use
• TLS 1.3, GDPR and PSD2/QWACs enforce end2end encryption

Enhanced Machine Learning to detect
• simple, single SPAM/Phishing classifications, potentially enhanced with Rules or (better) generate synthetic training cases
• but also Business Context and Plausibility during (email-) conversations
• enhanced NER and sentiment analysis
• as well as conversation/transaction status flow combined with internal system usage (log-files)
• Introduce ML for (un)encrypted network traffic surveillance
• Introduce ML which can distinguish between Fraud and Scam

1 Even Google Authenticator App or Erste sID App can be misused as the hacker gets the Code from the victim via phone/malware
2 UK banks with Non-SCA: Co-operative Bank, Clydesdale and Yorkshire Bank, Lloyds Bank (and sisters Bank of Scotland and Halifax), Metro Bank, NatWest and RBS, Santander and TSB, Source: Finextra 31.1.2019
3 20-35% of customers in major EU member states are not aware of upcoming need for more authentication during transactions (PSD2) and 50-75% think there are already enough or too many security checks, Source: FICO Blog

Erste Group takes first steps away from rule-based systems towards statistical Machine Learning

what was implemented?

payment transaction fraud
• status quo: simple rule-based system, mainly based on amount thresholds, which are manually investigated
• since 11/2017:
• HBOS – Histogram Based Outlier Scoring (unsupervised, univariate, statistical ML)
• parameter weight optimization via genetic algorithm
• since Q1/2018: further ML optimizations, also multivariate, unsupervised approaches (also together with FinTechs)

Other (non-fraud) implementations
• ChatBots1
• Predictive Maintenance: IT system failure prediction
• Predictive Analytics: predict customer needs

how is success being measured?

„Confusion Matrix“
| IS Fraud (P) | NO Fraud (N)
Fraud is predicted | # TruePositives (correct) | # FalsePositives (false alarms)
No Fraud is predicted | # FalseNegatives (overlooked) | # TrueNegatives (correct)

„ROC“ curve (Receiver-Operating-Characteristic) (AUC=0.944)
[Figure: ROC curve, True Positive Rate (TPR) plotted against False Positive Rate (FPR)]

“ ~78% of all frauds can be found at costs of ~9% false alarms preventing ~99% of potential financial losses”
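To make the confusion matrix and ROC measures concrete, the sketch below derives TPR, FPR and the area under the ROC curve from labeled scores. The ten labels and scores are invented for illustration and are not Erste Group data; the function names are likewise assumptions.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, FN, TN when every score >= threshold raises an alarm."""
    tp = fp = fn = tn = 0
    for is_fraud, score in zip(labels, scores):
        alarm = score >= threshold
        if alarm and is_fraud:
            tp += 1
        elif alarm:
            fp += 1
        elif is_fraud:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def rates(tp, fp, fn, tn):
    """True Positive Rate = TP / (TP + FN), False Positive Rate = FP / (FP + TN)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

def roc_auc(labels, scores):
    """Sweep all thresholds, collect (FPR, TPR) points, integrate by trapezoids."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tpr, fpr = rates(*confusion_counts(labels, scores, t))
        points.append((fpr, tpr))
    points.append((1.0, 1.0))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical example: 3 fraud cases (1) and 7 legitimate transactions (0)
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

counts = confusion_counts(labels, scores, threshold=0.5)
tpr, fpr = rates(*counts)
auc = roc_auc(labels, scores)
```

Moving the threshold trades detected fraud against false alarms; each threshold is one point on the ROC curve, and a statement of the form "x% of frauds at y% false alarms" corresponds to reading off a single such point.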

1 NLP (natural language processing) and neural networks

Since AML is mostly a follow-up to Fraud, there are potential synergies with ML / Fraud Detection

Possible synergy effect 1 - Replacement of rule-based abnormalities in AML by ML approach

Initial analysis has shown that
• Those AML rules that target behavioral problems (up to 75% of all rules) could be replaced by ML approaches (such as HBOS) and
• the alarms resulting from these rules (up to 75% of all AML alarms) could be reduced by up to 50% (avoidance of alarm repetitions, inadequate whitelisting, etc.)

Ex-Ante fraud detection (ML): calculation of fraud probability (Score)
→ Fraud Score →
Ex-Post AML Monitoring (Rule-Based): relativization / avoidance of rules that aim at unusual behavior

Possible synergy effect 2 - Optimization and addition of rule-based systems through ML approaches

Further optimization approaches by ML methods are
• Identification of optimized rules for control system through multivariate clustering approaches
• Further clustering of the alarms generated by the control system and analysis / execution of the clusters and no longer of individual alarms

clustering of data → AML Monitoring → clustering of alerts
• clustering of data: derivation of optimized rules based on analysis of data clusters
• AML Monitoring: based on previously and continuously optimized rules
• clustering of alerts: aggregation of several thousand alarms on approximately 100+ clusters; investigation logic based on analysis of alarm clusters

expected results:
• up to - 60% alarm reduction
• up to +40% TruePositives
• up to - 50% cost
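The alert clustering step can be illustrated with a small k-means sketch. Everything here is an assumption for the sake of the example: the two normalized alert features, the eight sample alerts, the choice of k-means itself and the deterministic farthest-first initialization (used so the result is reproducible).

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first(points, k):
    """Deterministic seeding: start with the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations: assign to nearest centroid, then recompute means."""
    centroids = farthest_first(points, k)
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Hypothetical alerts: (normalized amount, normalized alert frequency of the customer)
alerts = [(0.10, 0.90), (0.15, 0.85), (0.12, 0.95),
          (0.90, 0.10), (0.85, 0.15), (0.95, 0.05),
          (0.50, 0.50), (0.52, 0.48)]

assignment, centroids = kmeans(alerts, k=3)
clusters_to_review = len(set(assignment))  # investigators look at clusters, not single alerts
```

In this toy case eight alerts collapse into three groups to investigate; the slide's expectation of several thousand alarms aggregating into roughly 100+ clusters follows the same principle at scale.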

[Figure: Classic, rule-based AML systems (Norkom, Actimize, Tonbeller) → Optimized AML rules → Generated AML alerts]

Abnormal Activity with Tax Haven Countries
• PEP 21%, Offshore 15%, AmountThrshld 8%, …
• Severity – High
• Action – Investigate within the day

New Customers Deposit Activity
• Country 17%, AvgDailyAmt 10%, Beneficiary 5%, …
• Severity – Low
• Action – Send to monthly random sampling

Conclusio: In order to have a realistic chance to substantially fight Financial Crime, all risk/mitigation vectors have to be addressed and continuously enhanced (1/2)

Impediment Area: Know-How Legacy
Impediment Details
• Too many lawyers and insufficient IT/DataScientists on both sides: NCAs and Obliged Entities
Proposed Mitigation
• Encouragement from EU Commission to both NCAs and Obliged Entities to start building up respective know-how in Financial Crime area
• Obliged Entities/NCAs building know-how and experience in Data and AI/ML by prototypes/pilots (fraud, CRM, etc.) and sandboxes

Impediment Area: Culture Legacy
Impediment Details
• 10-15 year old rule-based thinking
• Obliged Entities fear local NCAs who do not comprehend new approaches and demand numerous alerts as „good measure“ of AML appropriateness (quantity over quality)
Proposed Mitigation
• Encouragement from EU Commission (similar to DNB in 2017) to declare Innovative Technologies/Approaches as best practice in Financial Crime (Transaction Monitoring, etc.)
• Obliged Entities/NCAs start dialogue to build mutual trust and define joint approaches

Impediment Area: Legal Legacy
Impediment Details
• Banking Secrecy
• GDPR
• NCA interpretations of AMLDs
• Limited AMLD within banking group
Proposed Mitigation
• Enhance AMLD (5th or 6th) to specifically regulate more, controlled Personal Data access and exchange with Public Authorities and Financial Services to fight Financial Crime (even w/o explicit customer consent)1)

1) similar to 4th AMLD, but enhancement for exchange between banking groups and between banking group and public authorities

Conclusio: In order to have a realistic chance to substantially fight Financial Crime, all risk/mitigation vectors have to be addressed and continuously enhanced (2/2)

Impediment Area: Decentralization
Impediment Details
• Financial Crime acts cross-border and cross-institution, whereas Anti-Financial Crime cooperation is limited to law enforcement (Interpol, Europol) and (now beginning) NCAs
Proposed Mitigation
• Evaluate EU centralized AML/Sanction Utility, which gathers all necessary static/transaction data and applies advanced analytics to detect Financial Crime

Impediment Area: Heterogeneous Identity Management
Impediment Details
• 200+ authentication solution providers in Europe
• Non national standards in Identity attributes and Level of Assurance
• eIDAS currently focused on (differing) national government services with handful of attributes
Proposed Mitigation
• Enhance EU „eIDAS 2.0“ as mandatory standard for Private Sector (Financial Institutions, Health, Education, etc.) covering full KYC (CI, CDD/EDD) and mitigating human and document authentication risks in a tamper-resistant, privacy by design approach (self-sovereign identity)

which challenges are the most relevant in practice?

„Over-/Underfitting“
proper scope of training data and procedures, so that, for example, a neural network correctly classifies / computes not only the trained patterns, but also new data.

difficult identification of suitable ML method sets
recognize which (mostly) combination of Statistical Machine Learning Method and / or Neural Networks is the most appropriate for a given problem (Data Science)

problem of dimensionality
computing / learning volume increases exponentially with number of features (attributes) → dimensionality reduction can lead to relevant knowledge / information losses.

usability and availability of training data
finding the "right" training data (IBM Healthcare needed to ask nurses to collect additional data to get a usable training set)

dominance of supervised, distinctive learning
supervised, discriminating learning (mostly classifier) can be strongly parallelized (e.g. via GPUs of gaming PCs); unsupervised, generative learning (key for general Artificial Intelligence) on, for example, probabilistic, graphical models is difficult to train because it cannot be parallelized

„Overselling“ of successes
spectacular / "superhuman" successes are usually based on a lot of "special tuning" of the training data and ML structures

availability of Data Science / ML resources and regulator/management acceptance
insufficient know-how / budget for sustainable and synergetic use. Regulator/management mostly not familiar with basics and possibilities and are therefore critical towards further developments / optimizations

International successful implementations show that the development of special know-how and infrastructures becomes mandatory (also in the fraud area).
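The "problem of dimensionality" row above can be made tangible with a little arithmetic. The sketch assumes, purely for illustration, that each feature is discretized into the same number of bins; the function names and the sample sizes are hypothetical.

```python
def grid_cells(bins_per_feature, n_features):
    """Number of distinct cells when every feature is split into the same number of bins."""
    return bins_per_feature ** n_features

def samples_per_cell(n_samples, bins_per_feature, n_features):
    """Average number of training samples available per cell."""
    return n_samples / grid_cells(bins_per_feature, n_features)

# With 10 bins per feature the number of cells grows by a factor of 10 per added feature
cells_3 = grid_cells(10, 3)
cells_30 = grid_cells(10, 30)

# One million hypothetical transactions spread over 6 features already
# leave only one sample per cell on average
density_6 = samples_per_cell(1_000_000, 10, 6)
```

This is why a model that treats every feature combination jointly runs out of data quickly, and why either dimensionality reduction or per-feature (univariate) methods are attractive despite the information they give up.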

In 20 years:
• networked computers organize people's lives and take a number of decisions from them.
• machines will even be able to speak human language.

“Artificial intelligence will reach human levels by around 2029. Follow that out further to, say, 2045, we will have multiplied the intelligence, the human biological machine intelligence of our civilization a billion-fold.” Ray Kurzweil1

recommended next steps:
• build-up of Data Science and AI/ML know-how
• start of prototypes / pilots in non-regulatory environment (fraud, CRM, etc.)
• parallel start of a dialogue with the regulators and management in order to gradually build understanding for AI / ML

next level AI company (techcompany + neuralnetwork ≠ ai company):
• strategic data acquisition
• unified datalake
• pervasive automation
• new job descriptions2

1 US inventor, futurist and author. Head of technical development at Google.
2 ai product managers: not describing traditional products, but providing data samples and machine learning objectives to engineers