Dependency Management 2.0 – a Semantic Web Enabled Approach

Total Page:16

File Type:pdf, Size:1020Kb

Dependency Management 2.0 – a Semantic Web Enabled Approach Dependency Management 2.0 – A Semantic Web Enabled Approach Ellis Emmanuel Eghan A Thesis In the Department of Computer Science and Software Engineering Presented in Partial Fulfillment of the Requirements For the Degree of Doctor of Philosophy (Computer Science) at Concordia University Montreal, Quebec, Canada July 2019 © Ellis Emmanuel Eghan, 2019 CONCORDIA UNIVERSITY SCHOOL OF GRADUATE STUDIES This is to certify that the thesis prepared By: Ellis Emmanuel Eghan Entitled: Dependency Management 2.0 – A Semantic Web Enabled Approach and submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science) complies with the regulations of this University and meets the accepted standards with respect to originality and quality. Signed by the final examining committee: _______________________________________________________ Chair Dr. Govind Gopakumar _______________________________________________________ External Examiner Dr. Giuliano Antoniol _______________________________________________________ Examiner Dr. Ferhat Khendek _______________________________________________________ Examiner Dr. Dhrubajyoti Goswami _______________________________________________________ Examiner Dr. Nikolaos Tsantalis _______________________________________________________ Supervisor Dr. Juergen Rilling Approved by ________________________________________________________________ Dr. Leila Kosseim, Graduate Program Director September 3, 2019 ___________________________________________________ Dr. Amir Asif, Dean Gina Cody School of Engineering and Computer Science Abstract Dependency Management 2.0 – A Semantic Web Enabled Approach Ellis Emmanuel Eghan, Ph.D. Concordia University, 2019 Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces are commonly used by software developers to reduce development cost and complexity by reusing code developed by third-parties or published by the open source community. However, these application programming interfaces have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is the need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration. The thesis introduces five major contributions: (1) We present a formal Software Build System Ontology – SBSON, which captures concepts and properties for software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services forming the basis for a more flexibility API dependency analysis compared to traditional proprietary analysis approaches. (2) We conducted a user survey which involved 53 open source developers to allow us to gain insights on how actual developers manage API breaking changes. (3) We introduced a novel approach which integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impacts of breaking changes. (4) A Security Vulnerability Analysis Framework iii (SV-AF) is introduced, which integrates builds system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM). OntTAM is an integration of our build, source code, vulnerability and license ontologies which supports a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. Several case studies are presented to illustrate the applicability and flexibility of our modelling approach, demonstrating that our knowledge modeling approach can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other existing heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis. iv To Betty, Michael, and Isabel v ACKNOWLEDGEMENTS First and foremost, all praises to God for blessing, protecting, and guiding me throughout my studies. I could never have accomplished this without my faith. A heartfelt thanks goes to my supervisor, Dr. Juergen Rilling, for having given me such an opportunity. Dr. Rilling did more than just advising: his continuous support, patience, motivation, and immense knowledge pushed me beyond my boundaries and shaped me, both as a person and a researcher. Throughout this journey, Dr. Rilling was always available to openly discuss new research ideas, ask challenging questions, and therefore elevate this work to the best it can be. His unique personality as a supervisor and friend is the main reason behind the success of this research. I also extend my thanks to members of the examining committee including Dr. Nikolaos Tsantalis, Dr. Ferhat Khendek, Dr. Dhrubajyoti Goswami, and Dr. Giuliano Antoniol (External), whose work and valuable feedback expanded my horizon and helped me discover new research opportunities that significantly improved my research. My gratitude also goes to Ambient Software Evolution Group members at Concordia University: Dr. Sultan Alqahtani, Dr. Rabe Abdalkareem, SayedHassan Khatoonabadi, Chris Forbes, Parisa Moslehi, and Yasaman Sarlati. A special thanks to two unique friends and colleagues, Dr. Sultan Alqahtani and Parisa Moslehi, who were there during the past years to discuss and share research ideas and challenges. Our discussions led to new published work and more research opportunities. Above all, I thank all the members of my family. First and foremost, I want to thank my parents, Victor and Joyce Eghan, who generously and wholeheartedly gave me their unconditional love and endless support throughout these years. I am grateful for your trust and confidence in me and for giving me the freedom to pursue my dreams. Last, but definitely most prominently, I thank my wife, Betty, for her unwavering love and encouragement during the pursuit of my studies. Thank you for always believing in me, and for reminding me to endure during the tough times. I consider myself lucky to have encountered you. I love you. vi Table of Contents List of Figures ............................................................................................................................... xi List of Tables .............................................................................................................................. xiv List of Acronyms ......................................................................................................................... xv 1 Introduction ........................................................................................................................... 1 1.1 Our Thesis ..................................................................................................................................... 2 1.2 Summary of Research Contributions ............................................................................................ 3 1.3 Related Publications ...................................................................................................................... 5 1.4 Thesis Organization ...................................................................................................................... 5 2 Motivation .............................................................................................................................. 7 3 Background and Related Work ......................................................................................... 12 3.1 The Semantic Web in a Nutshell ................................................................................................. 12 3.2 Ontologies in Software Engineering ........................................................................................... 15 3.3 Mining Software Repositories (MSR) ........................................................................................ 17 3.4 Build Systems and Dependency Management ............................................................................ 19 3.5 Chapter Summary ....................................................................................................................... 23 4 A Unified Ontology-based Modeling Approach for Software Build and Dependency Repositories ................................................................................................................................. 24 4.1 Introduction ................................................................................................................................. 24 4.2 Software Build System ONtology (SBSON): Knowledge Modeling and Engineering .............
Recommended publications
  • Identifying Challenges for OSS Vulnerability Scanners - a Study & Test Suite
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. X, NO. X, JUNE 202X 1 Identifying Challenges for OSS Vulnerability Scanners - A Study & Test Suite Andreas Dann∗, Henrik Platey, Ben Hermannz, Serena Elisa Pontay, Eric Boddenx Abstract—The use of vulnerable open-source dependencies is a known problem in today’s software development. Several vulnerability scanners to detect known-vulnerable dependencies appeared in the last decade, however, there exists no case study investigating the impact of development practices, e.g., forking, patching, re-bundling, on their performance. This paper studies (i) types of modifications that may affect vulnerable open-source dependencies and (ii) their impact on the performance of vulnerability scanners. Through an empirical study on 7,024 Java projects developed at SAP, we identified four types of modifications: re-compilation, re-bundling, metadata-removal and re-packaging. In particular, we found that more than 87% (56%, resp.) of the vulnerable Java classes considered occur in Maven Central in re-bundled (re-packaged, resp.) form. We assessed the impact of these modifications on the performance of the open-source vulnerability scanners OWASP Dependency-Check (OWASP) and Eclipse Steady, GitHub Security Alerts, and three commercial scanners. The results show that none of the scanners is able to handle all the types of modifications identified. Finally, we present Achilles, a novel test suite with 2,505 test cases that allow replicating the modifications on open-source dependencies. Index Terms—Security maintenance, Open-Source Software, Tools, Security Vulnerabilities. F 1 INTRODUCTION and recall. However, developers and distributors frequently HE use of open-source software (OSS) is an established fork, patch, re-compile, re-bundle, or re-package existing T practice in software development, even for industrial OSS [2], [3], [4], [8].
    [Show full text]
  • View Whitepaper
    INFRAREPORT Top M&A Trends in Infrastructure Software EXECUTIVE SUMMARY 4 1 EVOLUTION OF CLOUD INFRASTRUCTURE 7 1.1 Size of the Prize 7 1.2 The Evolution of the Infrastructure (Public) Cloud Market and Technology 7 1.2.1 Original 2006 Public Cloud - Hardware as a Service 8 1.2.2 2016 - 2010 - Platform as a Service 9 1.2.3 2016 - 2019 - Containers as a Service 10 1.2.4 Container Orchestration 11 1.2.5 Standardization of Container Orchestration 11 1.2.6 Hybrid Cloud & Multi-Cloud 12 1.2.7 Edge Computing and 5G 12 1.2.8 APIs, Cloud Components and AI 13 1.2.9 Service Mesh 14 1.2.10 Serverless 15 1.2.11 Zero Code 15 1.2.12 Cloud as a Service 16 2 STATE OF THE MARKET 18 2.1 Investment Trend Summary -Summary of Funding Activity in Cloud Infrastructure 18 3 MARKET FOCUS – TRENDS & COMPANIES 20 3.1 Cloud Providers Provide Enhanced Security, Including AI/ML and Zero Trust Security 20 3.2 Cloud Management and Cost Containment Becomes a Challenge for Customers 21 3.3 The Container Market is Just Starting to Heat Up 23 3.4 Kubernetes 24 3.5 APIs Have Become the Dominant Information Sharing Paradigm 27 3.6 DevOps is the Answer to Increasing Competition From Emerging Digital Disruptors. 30 3.7 Serverless 32 3.8 Zero Code 38 3.9 Hybrid, Multi and Edge Clouds 43 4 LARGE PUBLIC/PRIVATE ACQUIRERS 57 4.1 Amazon Web Services | Private Company Profile 57 4.2 Cloudera (NYS: CLDR) | Public Company Profile 59 4.3 Hortonworks | Private Company Profile 61 Infrastructure Software Report l Woodside Capital Partners l Confidential l October 2020 Page | 2 INFRAREPORT
    [Show full text]
  • UC Santa Cruz UC Santa Cruz Electronic Theses and Dissertations
    UC Santa Cruz UC Santa Cruz Electronic Theses and Dissertations Title Learning Structured and Causal Probabilistic Models for Computational Science Permalink https://escholarship.org/uc/item/0xf1t2zr Author Sridhar, Dhanya Publication Date 2018 License https://creativecommons.org/licenses/by-nc/4.0/ 4.0 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA SANTA CRUZ LEARNING STRUCTURED AND CAUSAL PROBABILISTIC MODELS FOR COMPUTATIONAL SCIENCE A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in COMPUTER SCIENCE by Dhanya Sridhar September 2018 The Dissertation of Dhanya Sridhar is approved: Lise Getoor, Chair Marilyn Walker Kristian Kersting Lori Kletzer Vice Provost and Dean of Graduate Studies Copyright © by Dhanya Sridhar 2018 Table of Contents List of Figures vi List of Tables viii Abstract xii Dedication xiv Acknowledgments xv 1 Introduction 1 1.1 Challenges in Computational Science . .3 1.2 Structured Probabilistic Approaches . .6 1.3 Contributions . 10 2 Preliminaries 13 2.1 Probabilistic Graphical Models . 13 2.1.1 Markov Random Fields . 14 2.1.2 Bayesian Networks . 15 2.1.3 Inference and Learning . 17 2.2 Statistical Relational Learning . 19 2.3 Hinge-loss Markov Random Fields and Probabilistic Soft Logic . 21 3 Modeling Online Debates 24 3.1 Debate Stance Classification . 25 3.2 Online Debate Forums . 29 3.3 Related Work . 31 3.4 Modeling Debate Stance . 34 3.5 PSL Models . 38 3.6 Cost-Penalized Learning . 39 3.7 Experimental Results . 41 iii 3.7.1 Evaluating Modeling Choices .
    [Show full text]
  • Automated Knowledge Base Extension Using Open Information
    AUTOMATED KNOWLEDGE BASE EXTENSION USING OPEN INFORMATION by arnab kumar dutta durgapur, india Inauguraldissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik der Universität Mannheim Mannheim, Deutschland, 2015 dekan: Prof. Dr. Heinz Jürgen Müller, Universität Mannheim, Deutschland referent: Prof. Dr. Heiner Stuckenschmidt, Universität Mannheim, Deutschland korreferent: Ao. Prof. Dr. Fabian Suchanek, Télécom ParisTech University, France Tag der mündlichen Prüfung: 4 Februar 2016 Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle. As with all matters of the heart, you’ll know when you find it. And, like any great relationship, it just gets better and better as the years roll on. So keep looking until you find it. Don’t settle. - Steve Jobs (1955 - 2011) ABSTRACT Open Information Extractions (OIE) (like Nell,Reverb) frameworks provide us with domain independent facts in natural language forms containing knowledge from varied sources. Extraction mechanisms for structured knowledge bases (KB) (like DBpedia,Yago) often fail to retrieve such facts due to its resource specific extraction schemes. Hence, the structured KBs can extend themselves by augment- ing their coverage with the facts discovered by OIE systems. This possibility mo- tivates us to integrate these two genres of extractions into one interactive frame- work. In this work, we present a complete, ontology independent, generalized architecture for achieving this integration.
    [Show full text]
  • A BZ Media Publication
    SDT333 cover_Layout 1 12/22/16 2:26 PM Page 1 A BZ Media Publication JANUARY 2017 • ISSUE NO. 333 • $9.95 • www.sdtimes.com SDT333 Full Page Ads_Layout 1 12/21/16 9:15 AM Page 2 Data Quality Made Easy. Your Data, Your Way. NAME @ Melissa Data provides the full spectrum of data Our data quality solutions are available quality to ensure you have data you can trust. on-premises and in the Cloud – fast, easy to use, and powerful developer tools, We profile, standardize, verify, match and integrations and plugins for the Microsoft enrich global People Data – name, address, and Oracle Product Ecosystems. email & phone. Start Your Free Trial www.MelissaData.com/sd-times Germany India www.MelissaData.de www.MelissaData.in United Kingdom Australia www.MelissaData.co.uk www.MelissaData.com.au www.MelissaData.com 1-800-MELISSA SDT333 page 3_Layout 1 12/21/16 1:45 PM Page 3 Contents ISSUE 333 • JANUARY 2017 NEWS FEATURES 6 News Watch 2016: The Year of Artificial Intelligence 18 2017 trends in software development 20 Top retailers are open to hacks 23 Android 7.1: What you can expect 24 HashiCorp and the state of automation page 10 26 Notes from Node.js Interactive 29 The Software Testing World Cup chronicles Digital transformation is essential to the future of business 33 CollabNet enters DevOps arena 39 Red Hat powers the API economy page 34 COLUMNS 48 GUEST VIEW by Alan Ho Move fast while avoiding Diffusing the monolith time bomb automated testing pitfalls 49 ANALYST VIEW by Al Hilwa Immigration, progress and politics 50 INDUSTRY WATCH by David Rubinstein 2017: The future starts now page 41 Software Development Times (ISSN 1528-1965) is published 12 times per year by BZ Media LLC, 225 Broadhollow Road, Suite 211, Melville, NY 11747.
    [Show full text]
  • Unifying Logical and Statistical AI with Markov Logic
    review articles DOI:10.1145/3241978 Markov logic can be used as a general framework for joining logical and statistical AI. BY PEDRO DOMINGOS AND DANIEL LOWD Unifying Logical and Statistical AI with Markov Logic To handle the complexity and uncer- FOR MANY YEARS, the two dominant paradigms in tainty present in most real-world prob- artificial intelligence (AI) have been logical AI and lems, we need AI that is both logical and statistical AI. Logical AI uses first-order logic and statistical, integrating first-order logic related representa tions to capture complex and graphical models. One or the other relationships and knowledge about the world. key insights However, logic-based approaches are often too brittle ˽ Intelligent systems must be able to handle the complexity and uncertainty of the real to handle the uncertainty and noise pres ent in many world. Markov logic enables this by unifying first-order logic and probabilistic graphical applications. Statistical AI uses probabilistic models into a single representation. Many representations such as probabilistic graphical deep architectures are instances of Markov logic. models to capture uncertainty. However, graphical ˽ A extensive suite of learning and models only represent distributions over inference algorithms for Markov logic has been developed, along with open source propositional universes and must be customized to implementations like Alchemy. ˽ Markov logic has been applied to natural handle relational domains. As a result, expressing language understanding, information complex concepts and relationships in graphical models extraction and integration, robotics, social network analysis, computational is often difficult and labor-intensive. biology, and many other areas. 74 COMMUNICATIONS OF THE ACM | JULY 2019 | VOL.
    [Show full text]
  • Human in the Loop Enrichment of Product Graphs with Probabilistic Soft Logic
    Human in the Loop Enrichment of Product Graphs with Probabilistic Soft Logic Marouene Sfar Gandoura Zografoula Vagena Nikolaos Vasiloglou [email protected] [email protected] [email protected] RelationalAI RelationalAI RelationalAI ABSTRACT on the internet in the form of product manuals, product reviews, Product graphs have emerged as a powerful tool for online retailers blogs, social media sites, etc. to enhance product semantic search, catalog navigation, and rec- To tackle data completeness issues and to also take advantage ommendations. Their versatility stems from the fact that they can of the abundance of relevant information on the web, many retail- uniformly store and represent different relationships between prod- ers [4, 13] are currently looking into creating product graphs, i.e. ucts, their attributes, concepts or abstractions etc, in an actionable knowledge graphs that capture all the relevant information about form. Such information may come from many, heterogeneous, dis- each of their products and related entities. parate, and mostly unstructured data sources, rendering the product graph creation task a major undertaking. Our work complements existing efforts on product graph creation, by enabling field experts to directly control the graph completion process. We focus on the subtask of enriching product graphs with product attributes and we employ statistical relational learning coupled with a novel human in the loop enhanced inference workflow based on Probabilistic Soft Logic (PSL), to reliably predict product-attribute relationships. Our preliminary experiments demonstrate the viability, practicality and effectiveness of our approach and its competitiveness comparing with alternative methods. As a by-product, our method generates probabilistic fact validity labels from an originally unlabeled data- base that can subsequently be leveraged by other graph completion methods.
    [Show full text]
  • Hinge-Loss Markov Random Fields and Probabilistic Soft Logic
    Journal of Machine Learning Research 18 (2017) 1-67 Submitted 12/15; Revised 12/16; Published 10/17 Hinge-Loss Markov Random Fields and Probabilistic Soft Logic Stephen H. Bach [email protected] Computer Science Department Stanford University Stanford, CA 94305, USA Matthias Broecheler [email protected] DataStax Bert Huang [email protected] Computer Science Department Virginia Tech Blacksburg, VA 24061, USA Lise Getoor [email protected] Computer Science Department University of California, Santa Cruz Santa Cruz, CA 95064, USA Editor: Luc De Raedt Abstract A fundamental challenge in developing high-impact machine learning technologies is bal- ancing the need to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural lan- guage. In this paper, we introduce two new formalisms for modeling structured data, and show that they can both capture rich structure and scale to big data. The first, hinge- loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. We unite three approaches from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three lead to the same inference objective. We then define HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define using a syntax based on first-order logic. We introduce an algorithm for inferring most-probable variable assignments (MAP inference) that is much more scalable than general-purpose convex optimization methods, because it uses message passing to take advantage of sparse dependency structures.
    [Show full text]
  • Technology Report 1H 2018
    16 70,000 45 160,000 40 14 60,000 140,000 35 12 120,000 50,000 TEV ($mm) TEV ($mm) 30 10 100,000 40,000 25 8 80,000 30,000 20 6 60,000 15 20,000 Transaction Count Transaction Count 4 40,000 10 10,000 2 5 20,000 0 0 0 0 Date TEV Target Acquirer(s) Deal Overview Announced (mm) BMC Software is a global leader in digital enterprise software solutions. The acquisition of 5/29/18 BMC continues KKR's long record of supporting B2B software companies, such as its current $8,300 portfolio of Epicor and Calabrio. Microsoft agreed to acquire GitHub in a push to empower developers, boost enterprise use of 6/4/18 GitHub, and continue to build out its developer tools. More than 28 million developers use 7,500 GitHub as a community to build, collaborate and share ideas. Athenahealth provides cloud-based business services and mobile applications for medical groups and health systems. Elliott has made an activist takeover offer, believing that it could 5/7/18 6,900 help provide the operational change necessary for athenahealth to fundamentally change the Healthcare IT industry. Partners Group will take a majority stake in software development firm GlobalLogic. 5/21/18 GlobalLogic helps clients construct innovative digital products that enhance customer 2,000 engagement. Web.com is a global provider of full range internet services and online marketing tools. SIRIS 6/21/18 Capital wants to acquire Web.com believing that it can add the strategic and operational 1,885 expertise to help grow Web.com.
    [Show full text]
  • ESIP Software Guidelines: Bibliography and Resources
    ESIP Software Guidelines: Bibliography and Resources References “26.4. Unittest — Unit Testing Framework — Python 3.5.2 Documentation.” Accessed November 23, 2016. https://docs.python.org/3.5/library/unittest.html. “Accessibility - W3C.” Accessed June 15, 2016. https://www.w3.org/standards/webdesign/accessibility#wai. “Agile Project Management.” Accessed October 20, 2016. https://www.pivotaltracker.com/. “A JavaScript Library for Building User Interfaces - React.” Accessed October 26, 2016. https://facebook.github.io/react/. Alter, George, George C Banks, Denny Borsboom, Sara D Bowman, Steven J Breckler, Stuart Buck, Chris Chambers, et al. Transparency and Openness Promotion (TOP) Guidelines. Open Science Framework, 2016. osf.io/9f6gx. “Apache Subversion.” Accessed October 19, 2016. https://subversion.apache.org/. “API Blueprint | API Blueprint.” Accessed October 20, 2016. https://apiblueprint.org/. Atlassian. “Bitbucket | The Git Solution for Professional Teams.” Bitbucket. Accessed October 20, 2016. https://bitbucket.org/. “Backbone.js.” Accessed October 26, 2016. http://backbonejs.org/. “Best Practice Library | Section508.gov.” Accessed May 24, 2016. http://section508.gov/content/learn/best-practice-library. “Bootstrap · The World’s Most Popular Mobile-First and Responsive Front-End Framework.” Accessed October 20, 2016. http://getbootstrap.com/. Brutlag, Jake. “Speed Matters.” Google Research Blog, June 23, 2009. https://research.googleblog.com/2009/06/speed-matters.html. Burger, Matthias, Klaus Juenemann, and Thomas Koenig. RUnit: R Unit Test Framework (version 0.4.31), 2015. https://cran.r-project.org/web/packages/RUnit/index.html. Burgess, Annie. “2015 AIST Evaluations Overview.” Federation of Earth Science Information Partners, 2016. http://testbed.esipfed.org/sites/default/files/2015_AIST_Evaluations_Overview.pdf. Car, Nicholas. “Data Reuse Fitness Assessment Using Provenance.” Denver, CO, 2016.
    [Show full text]
  • Probabilistic Soft Logic Dr
    Probabilistic Soft Logic Dr. Fayyaz ul Amir Afsar Minhas PIEAS Biomedical Informatics Research Lab Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan http://faculty.pieas.edu.pk/fayyaz/ Adapted and copied from: Probabilistic Soft Logic: A New Framework for Statistical Relational Learning https://pdfs.semanticscholar.org/presentation/2cf1/22badd9491bbd9c1272e20c69cf9bca0af15.pdf Probabilistic Soft Logic: A Scalable Probabilistic Programming Language for Modeling Richly Structured Domains Golnoosh Farnadi, Lise Getoor LINQS Group, UCSC Hinge-Loss Markov Random Fields and Probabilistic Soft Logic (JMLR 2017) Probabilistic soft lofic: A short Introduction CIS 622: Machine Learning in Bioinformatics PIEAS Biomedical Informatics Research Lab IID • Typical Machine learning assumes that examples are IID CIS 622: Machine Learning in Bioinformatics PIEAS Biomedical Informatics Research Lab 2 IID in real-world • A lot of real-world data are not IID • Data points often have relationships amongst them or model relationships between entities – Structured Data – Network Data – Heterogeneous networks • Examples – Social networks (FB) – Document Nets (Wikipedia) – Biological networks – NLP Data – Video Data • Most machine learning is “model independent” – Modeling of causal (cause-effect) relationships • Judea Pearl “Theoretical Impediments to Machine Learning with Seven Sparks from the Causal Revolution” • Judea Pearl “The Book of Why” – How can we model these relationships?
    [Show full text]
  • Collective Spammer Detection in Evolving Multi-Relational Social Networks
    See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/278968259 Collective Spammer Detection in Evolving Multi-Relational Social Networks Conference Paper · August 2015 DOI: 10.1145/2783258.2788606 CITATIONS READS 17 282 4 authors, including: Shobeir Fakhraei Lise Getoor University of Southern California University of Maryland, College Park 16 PUBLICATIONS 169 CITATIONS 238 PUBLICATIONS 9,126 CITATIONS SEE PROFILE SEE PROFILE Some of the authors of this publication are also working on these related projects: Medical Decision Making for individual patients View project All content following this page was uploaded by Shobeir Fakhraei on 26 June 2015. The user has requested enhancement of the downloaded file. Collective Spammer Detection in Evolving Multi-Relational Social Networks Shobeir Fakhraei James Foulds Madhusudana Shashanka ∗ y University of Maryland University of California Santa Cruz, CA, USA if(we) Inc. College Park, MD, USA San Francisco, CA, USA [email protected] [email protected] [email protected] Lise Getoor University of California Santa Cruz, CA, USA [email protected] ABSTRACT Keywords Detecting unsolicited content and the spammers who create Social Networks, Spam, Social Spam, Collective Classifica- it is a long-standing challenge that affects all of us on a daily tion, Graph Mining, Multi-relational Networks, Heteroge- basis. The recent growth of richly-structured social net- neous Networks, Sequence Mining, Tree-Augmented Naive works has provided new challenges and opportunities in the Bayes, k-grams, Hinge-loss Markov Random Fields (HL- spam detection landscape. Motivated by the Tagged.com1 MRFs), Probabilistic Soft Logic (PSL), Graphlab. social network, we develop methods to identify spammers in evolving multi-relational social networks.
    [Show full text]