Semantic Analytics
Oct 2016 – NorthField Asia Research Seminar, Sydney
Dr Alex Johnston, Director of Client Technology
THOMSON REUTERS: WHO ARE WE?
Thomson Reuters is the world’s leading source of intelligent information for businesses and professionals. We are powered by the world’s most trusted news organization.
Recently Sold
F&R has now enjoyed 7 consecutive quarters of positive growth, driven by investments in content, service and a new platform. Achieved 30% EBITDA margin target in 4Q15, an improvement of more than 400 basis points since 2013.
FINANCIAL & RISK: WHAT WE DO
We serve more than 40,000 customers and 400,000 end-users in more than 150 countries:
Driving Performance / Enabling Connectivity / Managing Risk & Regulation
• 2 million news stories per year
• 5,000+ investment firms and hedge funds supported world-wide
• $250 billion in bond trading supported daily
• $420 billion+ in FX trading per day
• 40,000+ regulatory alerts supplied to the world’s banks per year
• 2 million+ individuals and entities that can pose a potential risk to the international business community are tracked daily
• 11 million+ Messaging interactions daily
• 2.5 million~ price updates distributed per second to the financial markets
OUR CUSTOMERS VOICE: FINDING INFORMATION IS PAIN
“I spend over 10% of my time on Google looking for information that others may not have.” Analyst, Large UK Hedge fund
“Data points get lost in translation. Amazon analyst puts it in, but Best-Buy analyst does not get it” Director, $4B US L/S equity fund
“They all still do it manually” Equities, Market data team, $20B+ hedge fund
“I spend a lot of my time reading research reports, gathering economic data, getting international company filings” Senior Analyst – $8B US Value Fund
“Fundamental guys are still working with rudimentary tools compared to quants” Head, Large US Broker Principal Investing
“The problem has gotten worse with more data and more information” Director, Research/Tech, $10B Multi-strat

Activity
• Data – Assemble: 20%
• Data – Synthesize: 15%
• External Meetings: 10%
• Data – Interpret: 20%
• Financial Modeling: 15%
• Communicate Findings: 20%

Pain
Information overload
1. Increasingly difficult to keep up with available information to develop better insight or ideas
2. Cannot effectively surface company management views and intent through traditional methods/sources
Understanding relationships
3. Inability to better understand the relationships of a company with customers and suppliers
4. Inability to predict events or track catalysts that impact companies and industries
Integrating information
5. Cannot link and integrate internal research/data to external research and data sources
6. Unable to improve insights into a company or industry by mining new sources of unstructured data
Source: Customer meetings, TR internal analyst survey
THOMSON REUTERS CONTENT COVERAGE
NEWS & COMMENTARY REFERENCE DATA SPECIALIZED DATA RISK & COMPLIANCE INTELLECTUAL . Commentary . Index Constituents and Weightings . Commodities Fundamentals . Know Your Client (KYC) PROPERTY . Global and Domestic News . Industry Classifications . Deals & Transactions . Operational Risk Management . Intellectual Property . Newsletters . Security Identifiers Intelligence . Regulatory Risk Management . 
Copyrights . Significant Developments . Terms and Conditions . Mutual Fund Data (Lipper) LEGAL DISPUTES . Patents/Applications . Video . Trademarks . Quantitative Analytics and . Commodities Research & Forecasts . Arbitration COMPANY DATA Models . Administrative Case Law VALUE CHAIN DATA . MACRO-ECONOMIC . Broker Research . Private Equity Data . Jury Verdicts . Suppliers . Business Classifications DATA . Tax Case Law . Distributors . Credit (CDS) RISK & REGULATORY . Court Dockets . Network of relationships . Company News . Type, relevance and . Country Data . Official California Code of . Court Filings . Competitors Regulations characteristics of . Economic Indicators and Polls . Corporate Actions LAWS & REGULATIONS relationships . Industrial Activity . KYC Org ID . Debt & Syndicated Loans . Bills (Legislation) . People Screening . Entity Risk (Corporate Structures) . Regulatory Intelligence . Court Rules MARKET DATA & PRICING . ESG Data (Ranking and Ratings) . Risk Screening . Financial Regulations . Equities . Estimates . Science Regulations . Commodities & Energy . Events & Transcripts . Tax Regulations SCIENTIFIC DATA . Derivatives & Options . Fundamentals . Statutes . Fixed Income . M&A . Biomarkers . Treaties Authority . Foreign Exchange . Officers & Directors . Chemistry . FX and Interest Rate Polls . Ownership & Bond Holdings . Clinical Trials . Futures . Private Company Data . Disease Reports . Global Aggregates . Shareholder Activism Intelligence . Drug Experimental Results . Indexes and Benchmarks . StarMine® Scores . Drug Reports •People . Loan Pricing . Transactions . Drugs / Compounds . Valuation . Genomics •Organization . Zoological Records •...
TRIPLE (subject, predicate, object): Google, is succeeded by, Alphabet
SEMANTIC CONCEPTS
Example Entities
Account, Acquisition, Anniversary, Asset, Business Activity, City, Company, Continent, CorporateAction, Country, Currency, Document, Event, Editor, EmailAddress, EntertainmentAwardEvent, Facility, FaxNumber, Film, Fund, Industry, Holiday, IndustryTerm, Instrument, Journalist, LipperClassification, Location, MarketIndex, MedicalCondition, MedicalTreatment, Movie, MusicAlbum, MusicGroup, NaturalFeature, OperatingSystem, Organization, Person, Pharmaceutical Drug, PhoneNumber, PoliticalEvent, Position, Product, Project, ProgrammingLanguage, ProvinceOrState, PublishedMedium, Quote, RadioProgram, RadioStation, Region, Sentiment, SportsEvent, SportsGame, SportsLeague, Technology, Transaction, TVShow, TVStation, URL, WorldCheck
Example Relationships
Acquisition, Alliance, AnalystEarningsEstimate, AnalystRecommendation, ArmedAttack, ArmsPurchaseSale, Arrest, Bankruptcy, BonusSharesIssuance, BusinessRelation, Buybacks, CandidatePosition, CompanyAccountingChange, CompanyAffiliates, CompanyCompetitor, CompanyCustomer, CompanyEarningsAnnouncement, CompanyEarningsGuidance, CompanyEmployeesNumber, CompanyExpansion, CompanyForceMajeure, CompanyFounded, CompanyInvestigation, CompanyInvestment, CompanyLaborIssues, CompanyLayoffs, CompanyLegalIssues, CompanyListingChange, CompanyLocation, CompanyMeeting, CompanyNameChange, CompanyProduct, CompanyReorganization, CompanyRestatement, CompanyTechnology, CompanyTicker, CompanyUsingProduct, ConferenceCall, ContactDetails, Conviction, CreditRating, Deal, DebtFinancing, DelayedFiling, DiplomaticRelations, Dividend, EmploymentChange, EmploymentRelation, EnvironmentalIssue, EquityFinancing, Extinction, FamilyRelation, FDAPhase, IndicesChanges, Indictment, IPO, JointVenture, ManMadeDisaster, Merger, MilitaryAction, MovieRelease, MusicAlbumRelease, NaturalDisaster, PatentFiling, PatentIssuance, PersonAttributes, PersonCareer, PersonCommunication, PersonEducation, PersonEmailAddress, PersonLocation, PersonParty, PersonRelation, PersonTravel, PoliticalEndorsement, PoliticalRelationship, PollsResult, ProductIssues, ProductRecall, ProductRelease, Quotation, SecondaryIssuance, StockSplit, Trial, VotingResult
BIG OPEN LINKED DATA – RELATIONSHIPS & SEMANTICS
TR Framework / TR Capability / Issues/Benefits

TR Framework: Analytics
TR Capability: Stitching ‘The Graph’
Issue: 4. Inability to predict events or track catalysts that impact companies and industries
Benefits:
• Explore Relationships
• Analyse Risk Impacts

TR Framework: Tag
TR Capability: Intelligent Tagging
Issue: 6. Unable to improve insights into a company or industry by mining new sources of unstructured data
Benefits:
• Extract Concepts with AI
• Tag & Locate Semantically

TR Framework: Identify
TR Capability: PermID & Semantic Web*
Issue: 5. Cannot link and integrate internal research/data to external research and data sources
Benefits:
• Reduce Technology Cost
• Correlate Data easier
INFORMATION OVERLOAD: SEARCH IS BROKEN
According to a recent study by IDC, “The High Cost of Not Finding Information,” the average knowledge worker spends up to 2.5 hours per day searching for or gathering information or data. This includes searches, email queries and other related tasks that all result in a massive amount of time spent trying to find information that already exists. This equates to approximately 400 or so hours per employee, per year searching or gathering information. Using these numbers, we can calculate that a firm such as Goldman Sachs, with approximately 32,000 employees, earning on average $105,000/employee, would be spending approximately $646 million per year on enterprise search.
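The $646 million figure can be reproduced from the numbers quoted above if a working year of 2,080 paid hours (40 hours × 52 weeks) is assumed. That denominator is an assumption made for this sketch; the slide does not state it. The function name is likewise illustrative.

```python
# Inputs quoted in the paragraph above.
EMPLOYEES = 32_000
AVG_SALARY_USD = 105_000
HOURS_SEARCHING_PER_YEAR = 400

# Assumption (not stated in the source): 40 hours/week * 52 weeks.
WORK_HOURS_PER_YEAR = 40 * 52


def annual_search_cost(employees, avg_salary, hours_searching,
                       work_hours=WORK_HOURS_PER_YEAR):
    """Payroll cost of the share of paid time spent searching."""
    share_of_time = hours_searching / work_hours
    return employees * avg_salary * share_of_time


cost = annual_search_cost(EMPLOYEES, AVG_SALARY_USD, HOURS_SEARCHING_PER_YEAR)
share = HOURS_SEARCHING_PER_YEAR / WORK_HOURS_PER_YEAR
print(f"{share:.1%} of paid time, about ${cost / 1e6:.0f} million per year")
# prints: 19.2% of paid time, about $646 million per year
```

Under that assumption, 400 hours is roughly a fifth of each employee's paid time, which is where the scale of the estimate comes from.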
What is the relationship between Bill Gates and Warren Buffett?
LINKED DATA: DISSECT RELATIONSHIPS
Graph finds the signal in the noise
Source:
• TR DataFusion
• TR DataLake
• 4 Steps (30%)
THE GRAPH – DISSECTING RISKS
Event (News) Impact on Portfolio
Risk (Slavery) in the Supply Chain
Songlka > Wal Mart
entities
Danaher > Volkswagen
Distance (bacon number) between entities and types of relationship (supplier, parent company, location…) bring meaning and insight to information
THE GRAPH – EXPLORING INVESTMENT IDEAS
M&A - Liquidity Events
Stemcells and Microbot (Pvt company) reveal reverse merger
Historical Officer Desmond O’Connell may have liquidity event: of interest to Pvt Banking
Desmond also has investments in Abiomed & Serologicals
THE GRAPH – EXPLORING INVESTMENT IDEAS
Investment Exploration
Research indicates Infineon disfavors acquisition of International Rectifier - may shut down operation, affecting Intel who it supplies
Infineon also supplies Microsoft who just announced Layoffs
NXP Semiconductor is an alternative investment to Intel, without the same supply issues
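As a rough illustration of how triples and the "bacon number" fit together, the sketch below stores a few (subject, predicate, object) triples paraphrased from the slides and counts the hops between two entities with a breadth-first search. The predicate wording, the reading that Infineon supplies Intel, and the function names are assumptions made for this example; this is not the Thomson Reuters graph or its API.

```python
from collections import deque

# Triples paraphrased from the slides; predicate names are illustrative.
TRIPLES = [
    ("Google", "is succeeded by", "Alphabet"),
    ("Infineon", "considers acquiring", "International Rectifier"),
    ("Infineon", "supplies", "Intel"),
    ("Infineon", "supplies", "Microsoft"),
]


def build_index(triples):
    """Map each entity to (neighbour, triple) pairs, ignoring direction."""
    index = {}
    for triple in triples:
        subject, _predicate, obj = triple
        index.setdefault(subject, []).append((obj, triple))
        index.setdefault(obj, []).append((subject, triple))
    return index


def shortest_path(triples, start, goal):
    """Breadth-first search; returns the triples linking start to goal,
    or None when the two entities are not connected."""
    index = build_index(triples)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        entity, hops = queue.popleft()
        if entity == goal:
            return hops
        for neighbour, triple in index.get(entity, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + [triple]))
    return None


path = shortest_path(TRIPLES, "International Rectifier", "Intel")
print("distance:", len(path))  # prints: distance: 2
for subject, predicate, obj in path:
    print(f"  {subject} --{predicate}--> {obj}")
```

The returned triples carry the relationship types along the path, so the distance and the kind of link (supplier, acquisition, and so on) can be read together.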
CONNECTING GRAPHS AND CONTENT
Relationship Managers Investment Advisors Risk Managers Public Relations
Thomson Reuters Graph Panama Papers
Internal Analytics
Thematic Investing
8 billion relationships
SCORING GRAPH: QUANTIFYING RELATIONSHIPS
NEWS IMPACT ON PORTFOLIO
Powered by THOMSON REUTERS LABS

Relationship                 Weight
Tier 1 supplier              0.80
Subsidiary                   0.90
Competitor                   0.60
Customer                     0.20
…
Tier 2 supplier              0.50
Subsidiary of T1 supplier    0.75
Customer of T1 supplier      0.10
Supplier of competitor       0.15

[Diagram: News / Risk Event, Path (i), Portfolio Company; labels: Length (Path i), 0.8]
Algorithms traverse edges to find most relevant paths per use case. Path strength is a new metric
What Does it mean?

Tools that combine and link content
• Use PermID, ontology & tools to merge discrete datasets e.g. news event with price
• Symbology and tooling support merging structured and unstructured content from any source

Relationship & Pattern Identification
• Visually traverse a semantic web of concepts to identify relationships between concepts (companies, industries, people, prices)
• Better: Use AI to crawl the semantic web and uncover relationships
• Graph algorithms can identify relationships for other algorithms to verify

Algorithm Variables & Metrics
• New Variable Types? News Impact, Supply Chain Risk, Officer Risk, Publicity Exposure/Impact
• New metrics? Semantic Distance, ‘The Bacon Number’, Impact / Penetration
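The slides give relationship weights but do not say how path strength is calculated. One plausible reading, sketched below, multiplies the weights along a path and keeps the strongest path between a news event and a portfolio company. The multiplication rule, the company names and the function names are all assumptions made for illustration; only the weights come from the scoring slide.

```python
# Relationship weights as listed on the scoring slide.
WEIGHTS = {
    "tier1_supplier": 0.80,
    "subsidiary": 0.90,
    "competitor": 0.60,
    "customer": 0.20,
    "tier2_supplier": 0.50,
}

# Hypothetical edges (source, target, relationship); names are placeholders.
EDGES = [
    ("EventCo", "SupplierA", "subsidiary"),
    ("SupplierA", "PortfolioCo", "tier1_supplier"),
    ("EventCo", "RivalB", "customer"),
    ("RivalB", "PortfolioCo", "competitor"),
]


def build_graph(edges):
    """Adjacency list of (neighbour, weight) pairs."""
    graph = {}
    for source, target, relationship in edges:
        graph.setdefault(source, []).append((target, WEIGHTS[relationship]))
    return graph


def strongest_path(graph, start, goal):
    """Explore every cycle-free path and return (strength, path) for the
    one with the highest product of weights (an assumed definition)."""
    best = (0.0, [])
    stack = [(start, [start], 1.0)]
    while stack:
        node, path, strength = stack.pop()
        if node == goal:
            if strength > best[0]:
                best = (strength, path)
            continue
        for neighbour, weight in graph.get(node, []):
            if neighbour not in path:  # avoid revisiting nodes
                stack.append((neighbour, path + [neighbour], strength * weight))
    return best


strength, path = strongest_path(build_graph(EDGES), "EventCo", "PortfolioCo")
print(f"{' > '.join(path)}: {strength:.2f}")
# prints: EventCo > SupplierA > PortfolioCo: 0.72
```

With a product rule, every extra hop can only weaken a path, so a short chain of strong relationships outranks a longer or weaker one. Exhaustive enumeration is only practical for small graphs; it is used here to keep the sketch short.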
USE CASES ARE BROAD-BASED AND DIVERSE
Graph tech is rapidly moving into professional scenarios
• “25% of enterprises will use graph db by 2017” - Forrester
• “Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions.” – Gartner
“Don’t just give me what I asked for – tell me what I need to know.”