Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data
Total pages: 16. File type: PDF. Size: 1020 KB.
Recommended publications
- Lyon-Nist241assmoct9.Pdf
NIST Special Publication 500-241, Information Technology: A Quick-Reference List of Organizations and Standards for Digital Rights Management. Gordon E. Lyon, Convergent Information Systems Division, Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899-8951. October 2002. U.S. Department of Commerce, Donald L. Evans, Secretary; Technology Administration, Phillip J. Bond, Under Secretary of Commerce for Technology; National Institute of Standards and Technology, Arden L. Bement, Jr., Director.
Reports on Information Technology: The Information Technology Laboratory (ITL) at the National Institute of Standards and Technology (NIST) stimulates U.S. economic growth and industrial competitiveness through technical leadership and collaborative research in critical infrastructure technology, including tests, test methods, reference data, and forward-looking standards, to advance the development and productive use of information technology. To overcome barriers to usability, scalability, interoperability, and security in information systems and networks, ITL programs focus on a broad range of networking, security, and advanced information technologies, as well as the mathematical, statistical, and computational sciences. This Special Publication 500-series reports on ITL's research in tests and test methods for information technology, and its collaborative activities with industry, government, and academic organizations.
Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.
- Approved DITA 2.0 Proposals
DITA Technical Committee, DITA 2.0 proposals (DITA TC work product). Table of contents:
1 Overview
2 DITA 2.0: Stage two proposals
2.1 Stage two: #08 <include> element
2.2 Stage two: #15 Relax specialization rules
2.3 Stage two: #17 Make @outputclass universal
2.4 Stage two: #18 Make audience, platform, product, otherprops into specializations
2.5 Stage two: #27 Multimedia domain
2.6 Stage two: #29 Update bookmap
2.7 Stage two: #36 Remove deprecated elements and attributes
2.8 Stage two: #46 Remove @xtrf and @xtrc
2.9 Stage two: #73 Remove delayed conref domain
- A 3-Tier Application for Topic Map Technology
Proceedings of the 6th WSEAS International Conference on Applied Computer Science, Tenerife, Canary Islands, Spain, December 16-18, 2006. A 3-tier Application for Topic Map Technology. A. Papastergiou, G. Grammatikopoulos (*), A. Hatzigaidas, G. Tryfon; Department of Electronics, (*) Department of Esthetics, Alexander Technological Educational Institute of Thessaloniki, EPEAEK II, Sindos, 57400, Thessaloniki, Macedonia, Greece.
Abstract: This paper presents in detail the design and construction of a 3-tier application system for Topic Map technology. The proposed system employs an Enterprise Topic Map Server (ETMS) that manages topic maps (TM) and communicates with client applications and a persistent database. Yellow Web Services have been customized to handle communication and the exchange of TMs. A client application aimed at TM developers supports editing and visualization of TMs, along with the definition of a TM schema and the processing of rules and queries. Another key aspect of the paper is to analyze how different technologies are combined to build such a system.
Key-words: Topic map, Server, Web services, Client, 3-tier, TM schema, Retrieval.
1 Introduction: The Topic Map model describes a mechanism for representing knowledge about the structure of information and organizing it into topics [1,19]. Topics have occurrences and associations that define relationships between the topics, thus creating a semantic network over the information [8,18]. The definition of a TM schema and the processing of rules and queries are also supported. The next sections present in detail the overall architecture of the proposed system and the technologies used to build it.
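The excerpt above describes a three-tier architecture: client applications, a topic map server, and a persistent database. As a rough sketch of the tiering idea only, and not the ETMS or the Yellow Web Services customization described in the paper, the following Python example wires a hypothetical storage tier, service tier, and client call together:

```python
import sqlite3

# Data tier: persists topics as (id, name) rows. The schema is hypothetical,
# not the database used by the ETMS described in the paper.
class TopicStore:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS topics (id TEXT PRIMARY KEY, name TEXT)")

    def save(self, topic_id, name):
        self.db.execute("INSERT OR REPLACE INTO topics VALUES (?, ?)", (topic_id, name))
        self.db.commit()

    def load(self, topic_id):
        row = self.db.execute(
            "SELECT id, name FROM topics WHERE id = ?", (topic_id,)).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}

# Middle tier: the "server" that client applications talk to; in a real
# deployment this logic would sit behind a web-service interface.
class TopicMapService:
    def __init__(self, store):
        self.store = store

    def add_topic(self, topic_id, name):
        self.store.save(topic_id, name)

    def get_topic(self, topic_id):
        return self.store.load(topic_id)

# Presentation tier: a client (e.g. a topic map editor) calling the service.
if __name__ == "__main__":
    service = TopicMapService(TopicStore())
    service.add_topic("t-apple", "Apple")
    print(service.get_topic("t-apple"))  # {'id': 't-apple', 'name': 'Apple'}
```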
- Mapping Topic Maps to Common Logic
MAPPING TOPIC MAPS TO COMMON LOGIC. Tamás Demián; Advisor: András Pataricza.
I. Introduction: This work is a case study of the mapping of a particular formal language, Topic Map [1] (TM), to Common Logic [2] (CL). CL was intended to be a uniform platform ensuring seamless syntactic and semantic integration of knowledge represented in different formal languages. CL is based on first-order logic (FOL) with a precise model-theoretic semantics. The exact target language is the Common Logic Interchange Format (CLIF), the most common dialect of CL. Both CL and TM are ISO standards, and their metamodels are included in the Object Definition Metamodel [3] (ODM). ODM was intended to serve as the foundation of Model Driven Architecture (MDA), offering a formal basis for representation, management, interoperability, and semantics. The paper evaluates the use of CL as a fusion platform using TM as an example.
II. The Topic Maps: TM is a technology for modelling knowledge and connecting this structured knowledge to relevant information sources. A central operation in TMs is merging, which aims to eliminate redundant TM constructs. TopicMapConstruct is the top-level abstract class in the TM metamodel (Fig. 1). The ReifiableConstruct, TypeAble and ScopeAble classes, detailed later, are also abstract. The remaining classes are pairwise disjoint.
Figure 1: The class hierarchy and the relation and attribute names of the TM metamodel.
TopicMap is a set of topics and associations. Topic is a symbol used within a TM to represent exactly one subject, in order to allow statements to be made about that subject.
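To give a flavour of what a TM-to-CL mapping produces, here is a small illustrative sketch that serializes a toy topic map into CLIF-style first-order sentences. The dictionary layout and the predicate names (topic, name, association, plays-role) are assumptions made for this example, not the mapping defined in the paper or in the TM standard:

```python
# Toy topic map: two topics and one binary association. Both the structure
# and the emitted predicate names are illustrative assumptions.
topic_map = {
    "topics": [
        {"id": "puccini", "name": "Puccini"},
        {"id": "tosca", "name": "Tosca"},
    ],
    "associations": [
        {"id": "a1", "type": "composed-by",
         "roles": [("work", "tosca"), ("composer", "puccini")]},
    ],
}

def to_clif(tm):
    """Emit one CLIF-style sentence per topic map construct."""
    sentences = []
    for t in tm["topics"]:
        sentences.append(f"(topic {t['id']})")
        sentences.append(f"(name {t['id']} \"{t['name']}\")")
    for a in tm["associations"]:
        sentences.append(f"(association {a['id']} {a['type']})")
        for role, player in a["roles"]:
            sentences.append(f"(plays-role {a['id']} {role} {player})")
    return "\n".join(sentences)

print(to_clif(topic_map))
# (topic puccini)
# (name puccini "Puccini")
# ... followed by the association and its role sentences
```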
- Naiad: A Timely Dataflow System
Naiad: A Timely Dataflow System. Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, Martín Abadi. Microsoft Research Silicon Valley. {derekmur,mcsherry,risaacs,misard,pbar,abadi}@microsoft.com
Abstract: Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, applications that require all three have relied on multiple platforms, at the expense of efficiency, maintainability, and simplicity. Naiad resolves the complexities of combining these features in one framework. A new computational model, timely dataflow, underlies Naiad and captures opportunities for parallelism across a wide class of algorithms. This model enriches dataflow computation with timestamps that represent logical points in the computation and provide the basis for an efficient, lightweight coordination mechanism. We show that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.
Figure 1: A Naiad application that supports real-time queries on continually updated data. The dashed rectangle represents iterative processing that incrementally updates as new data arrive.
... requirements: the application performs iterative processing on a real-time data stream, and supports interactive queries on a fresh, consistent view of the results. However, no existing system satisfies all three requirements ...
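The abstract's central idea, timestamps that mark logical points in the computation and support a lightweight coordination mechanism, can be sketched minimally as follows. This is only an illustration of structured (epoch plus loop counter) timestamps and a could-result-in check, not Naiad's actual implementation:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Timestamp:
    """A logical point in the computation: an input epoch plus one counter
    per nested loop the record has entered (illustrative only)."""
    epoch: int
    loop_counters: Tuple[int, ...] = ()

    def could_result_in(self, other: "Timestamp") -> bool:
        # A record at `self` can still influence work at `other` only if it
        # is no later in the epoch and in every loop counter.
        if self.epoch > other.epoch:
            return False
        n = max(len(self.loop_counters), len(other.loop_counters))
        a = self.loop_counters + (0,) * (n - len(self.loop_counters))
        b = other.loop_counters + (0,) * (n - len(other.loop_counters))
        return all(x <= y for x, y in zip(a, b))

def frontier_allows_notification(frontier, ts):
    """A vertex may be told that `ts` is complete once no timestamp still in
    flight (the frontier) could result in `ts`."""
    return not any(f.could_result_in(ts) for f in frontier)

# Example: work at epoch 2 inside a loop can be notified, because all
# in-flight timestamps belong to later epochs.
in_flight = [Timestamp(3, (0,)), Timestamp(4)]
print(frontier_allows_notification(in_flight, Timestamp(2, (5,))))  # True
```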
- Semantic Mapping in ROS (Xavier Gallart del Burgo)
Semantic Mapping in ROS. Xavier Gallart del Burgo. Master's Thesis at CVAP/CAS, The Royal Institute of Technology, Stockholm, Sweden. Supervisor: Dr. Andrzej Pronobis. Examiner: Prof. Danica Kragic. August, 2013.
Abstract: In the last few years robots have become more popular in our daily lives. We can see them guiding people in museums, helping surgeons in hospitals and autonomously cleaning houses. With the aim of enabling robots to cooperate with humans and to perform human-like tasks, we need to provide them with the capability of understanding human environments and representing the extracted knowledge in such a way that humans can interpret it. Semantic mapping can be defined as the process of building a representation of the environment, incorporating semantic knowledge obtained from sensory information. Semantic properties can be extracted from various sources such as objects, the topology of the environment, the size and shape of rooms, and room appearance. This thesis proposes an implementation of semantic mapping for mobile robots which is integrated in a framework called the Robot Operating System (ROS). The system extracts spatial properties like rooms, objects and topological information and combines them with common-sense knowledge in a probabilistic framework which is capable of inferring room categories. The system is tested in simulations and in real-world scenarios, and the results show how the system explores an unknown environment, creates an accurate map, detects objects, infers room categories and represents the results in a map where each room is labelled according to its functionality.
Acknowledgements: First and foremost, I would like to express my gratitude to my supervisor, Andrzej Pronobis, for giving me the opportunity to develop my thesis in one of my passions, the robots.
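The abstract describes combining extracted spatial properties with common-sense knowledge in a probabilistic framework that infers room categories. Below is a toy naive-Bayes sketch of that idea; the room categories, object likelihoods and priors are invented for illustration and are not taken from the thesis:

```python
import math

# Hypothetical common-sense knowledge: P(object observed | room category).
likelihoods = {
    "kitchen":     {"fridge": 0.8,  "book": 0.1, "bed": 0.01},
    "living_room": {"fridge": 0.05, "book": 0.6, "bed": 0.05},
    "bedroom":     {"fridge": 0.01, "book": 0.3, "bed": 0.9},
}
priors = {"kitchen": 1 / 3, "living_room": 1 / 3, "bedroom": 1 / 3}

def infer_room(observed_objects):
    """Posterior over room categories, assuming detections are independent
    given the room (naive Bayes)."""
    log_post = {}
    for room, prior in priors.items():
        lp = math.log(prior)
        for obj in observed_objects:
            lp += math.log(likelihoods[room].get(obj, 1e-3))
        log_post[room] = lp
    # Normalise the log scores back to probabilities.
    z = max(log_post.values())
    unnorm = {r: math.exp(v - z) for r, v in log_post.items()}
    total = sum(unnorm.values())
    return {r: v / total for r, v in unnorm.items()}

print(infer_room(["fridge"]))        # kitchen dominates
print(infer_room(["book", "bed"]))   # bedroom dominates
```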
- Topic Map Aided Publishing
A Case Study of Assembly Media Archive: Topic Map Aided Publishing. Grip Studios Interactive, Aki Kivelä, 2.9.2004.
Assembly'04 Media Archive: a WWW publishing platform that publishes images, videos and related metadata in real time. Integrated into Assembly's [1] publishing environment. Uses Topic Maps [2] to store knowledge. Supports heterogeneous data sources. Short development time, short life span. [1] Assembly'04, http://www.assembly.org/ [2] Steve Pepper, The TAO of Topic Maps, finding the way in the age of infoglut, http://www.gca.org/papers/xmleurope2000/pdf/s11-01.pdf
Publishing Environment of the Assembly Media Archive (diagram: video production team, image production team, VideoStore, Media Archive, Elaine topic map, clients).
Video Production Process: the process and services were designed by the Assembly'04 crew (diagram: FTP, HTTP server, HTTP, WWW form, XML).
Image Production Process: uploading images and metadata to the Media Archive (diagram: JPEG, XTM (Topic Maps), TCP/IP, administration application).
Merging Elaine Media Archive video and image knowledge: merging Topic Maps from heterogeneous sources (diagram: video XML, XSL [3], video XTM topic map, image XTM topic maps, administration application ontology XTM topic map, merge [4], merged XTM topic map, Media Archive XTM topic map). [3] XSL Style Sheets, http://www.w3c.org/Style/XSL/ [4] Merging Topic Maps, http://www.topicmaps.org/xtm/index.html#desc-merging
Topic Maps: a map of information resources; a collection of topics, associations between topics, and related information resources (occurrences). Steve Pepper.
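The merging step these slides rely on can be illustrated with a minimal sketch: topics that share a subject identifier collapse into one, and their names are unioned. The dictionary representation is an invented simplification of XTM, not the Assembly system's tooling:

```python
# Each topic map is a list of topics; a topic carries subject identifiers
# and names. This layout is an illustrative simplification of XTM.
tm_video = [
    {"subject_ids": {"http://example.org/subject/demo"}, "names": {"Demo video"}},
]
tm_image = [
    {"subject_ids": {"http://example.org/subject/demo"}, "names": {"Demo screenshot"}},
    {"subject_ids": {"http://example.org/subject/party"}, "names": {"Party photo"}},
]

def merge(*topic_maps):
    """Union of topic maps: topics sharing a subject identifier collapse
    into a single topic whose identifiers and names are unioned."""
    merged = []  # list of {"subject_ids": set, "names": set}
    for tm in topic_maps:
        for topic in tm:
            target = next((m for m in merged
                           if m["subject_ids"] & topic["subject_ids"]), None)
            if target is None:
                merged.append({"subject_ids": set(topic["subject_ids"]),
                               "names": set(topic["names"])})
            else:
                target["subject_ids"] |= topic["subject_ids"]
                target["names"] |= topic["names"]
    return merged

for topic in merge(tm_video, tm_image):
    print(sorted(topic["subject_ids"]), sorted(topic["names"]))
# The two "demo" topics are merged; three input topics become two.
```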
- Published Subjects: Introduction and Basic Requirements
Published Subjects: Introduction and Basic Requirements. OASIS Published Subjects Technical Committee Recommendation, 2003-06-24.
Document identifier: pubsubj-pt1-1.01-cs. Location: http://www.oasis-open.org/committees/documents.php?wg_abbrev=tm-pubsubj. Editor: Steve Pepper <[email protected]>. Contributors: Bernard Vatant (TC Chair), Suellen Stringer-Hye (TC Secretary), James David Mason (TC Liaison with ISO/IEC JTC1/SC34), Thomas Bandholtz, Vivian Bliss, Patrick Durusau, Peter Flynn, Eric Freese, Lars Marius Garshol, Kim Sung Hyuk, Motomu Naito, Eamonn Neylon, Mary Nishikawa, Michael Priestley, H. Holger Rath, Don Smith.
Abstract: This document provides an introduction to Published Subjects and basic requirements and recommendations for publishers of Published Subjects. Status: Committee Specification. Copyright © 2003 The Organization for the Advancement of Structured Information Standards [OASIS].
Table of Contents:
1. Introduction
2. A Gentle Introduction to Published Subjects (2.1 Subjects and Topics; 2.2 The Identification of Subjects; 2.3 The Addressability of Subjects; 2.4 Subject Indicators and Subject Identifiers: 2.4.1 Subject Identification for Humans: Subject Indicators, 2.4.2 Subject Identification for Computers: Subject Identifiers, 2.4.3 Distinguishing between Subject Addresses and Subject Identifiers, 2.4.4 Example: Identifying the Subject "Apple"; 2.5 Published Subjects: 2.5.1 Shortcomings of the above scenario, 2.5.2 Publishers in the loop, 2.5.3 Example: A Published Subject for "Apple"; 2.6 The Adoption of PSIs)
3. Requirements and Recommendations for PSIs (3.1 Requirements for PSIs: 3.1.1 A PSID must be a URI, 3.1.2 A PSID must resolve to a PSI, 3.1.3 A PSI must explicitly state its PSID; 3.2 Recommendations for PSIs: 3.2.1 A PSI should provide human-readable metadata, 3.2.2 A PSI may provide machine-readable metadata, 3.2.3 Human-readable and machine-readable metadata should be consistent but need not be equivalent, 3.2.4 A PSI should indicate its intended use as a PSI, 3.2.5 A PSI should identify its publisher)
4. ...
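The requirements listed under section 3.1 lend themselves to a simple mechanical check. The sketch below validates a hypothetical PSI record against them; the field names and example values are assumptions made for this illustration, not part of the OASIS recommendation:

```python
from urllib.parse import urlparse

# A PSI modelled as a plain record. The fields and values are hypothetical;
# they only mirror the requirements quoted in the table of contents above.
psi = {
    "psid": "http://psi.example.org/apple",                    # subject identifier
    "resolved_document": "<html>...Apple, the fruit...</html>",  # subject indicator
    "stated_psids": ["http://psi.example.org/apple"],
    "publisher": "Example PSI Registry",                        # recommended metadata
}

def check_psi(record):
    problems = []
    # 3.1.1 A PSID must be a URI.
    parts = urlparse(record["psid"])
    if not (parts.scheme and parts.netloc):
        problems.append("PSID is not an absolute URI")
    # 3.1.2 A PSID must resolve to a PSI (here: some human-readable document).
    if not record.get("resolved_document"):
        problems.append("PSID does not resolve to an indicator document")
    # 3.1.3 A PSI must explicitly state its PSID.
    if record["psid"] not in record.get("stated_psids", []):
        problems.append("PSI does not state its own PSID")
    # 3.2.5 (recommendation) A PSI should identify its publisher.
    if not record.get("publisher"):
        problems.append("warning: no publisher identified")
    return problems or ["OK"]

print(check_psi(psi))  # ['OK']
```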
- XML Metadata Standards and Topic Maps
XML Metadata Standards and Topic Maps. Erik Wilde, 16.7.2001.
Outline: what is XML? (a syntax, not a data model!); what is the data model behind XML? (the XML Information Set: basically, trees...); what can be described with XML? (describing the content syntactically with schemas, and describing the content abstractly with metadata); XML metadata is outside of XML documents; ISO Topic Maps as a "schema language" for metadata.
Extensible Markup Language: standardized by the W3C in February 1998; a subset (aka profile) of SGML (ISO 8879); coming from a document world, where data are documents, defined in syntax, with no abstract data model. This causes problems in many real-world scenarios: how to compare XML documents (attribute order, white space, namespace prefixes, ...); how to search for data within documents (query languages operate on abstract data models); and often data are not documents.
Why XML at all? Because it's simple (easily understandable, human-readable); because of the available tools (it's easy to find free XML software); because of improved interoperability (all others do it, and it's easy to interface with other XML applications); because it's versatile (the data model behind XML is very versatile).
XML Information Set: several XML applications need a data model, e.g. style sheets for XML (CSS, XSL), interfaces to programming languages (DOM), XML transformation languages (XSLT), XML ...
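The slides' point that XML is a syntax whose useful data model is a tree (the Information Set) can be illustrated with the Python standard library parser; the sample documents below are invented and differ only in attribute order and whitespace:

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in attribute order and whitespace.
doc_a = '<book id="42" lang="en"><title>Topic Maps</title></book>'
doc_b = '<book lang="en" id="42" ><title>Topic Maps</title></book>'

def as_tree(xml_text):
    """Reduce a document to a simple tree value: (tag, attrs, children, text)."""
    def walk(elem):
        return (elem.tag, dict(elem.attrib),
                [walk(child) for child in elem],
                (elem.text or "").strip())
    return walk(ET.fromstring(xml_text))

# Byte-for-byte the documents differ...
print(doc_a == doc_b)                    # False
# ...but on the tree data model they carry the same information.
print(as_tree(doc_a) == as_tree(doc_b))  # True
```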
- Increasing the Evidentiary Value of Oral Histories for Use in Digital Libraries
DESIGNING A GRIOTTE FOR THE GLOBAL VILLAGE: INCREASING THE EVIDENTIARY VALUE OF ORAL HISTORIES FOR USE IN DIGITAL LIBRARIES. A Dissertation by Rhonda Thayer Dunn, submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of Doctor of Philosophy, August 2011. Major Subject: Computer Science. Approved by: Chair of Committee, John J. Leggett; Committee Members, Frank M. Shipman III, Anat Geva, Walter D. Kamphoefner; Head of Department, Valerie Taylor.
Abstract: Designing a Griotte for the Global Village: Increasing the Evidentiary Value of Oral Histories for Use in Digital Libraries. (August 2011) Rhonda Thayer Dunn, B.S., Xavier University of Louisiana; M.C.S., Howard University; M.U.P., Texas A&M University. Chair of Advisory Committee: Dr. John J. Leggett.
A griotte in West African culture is a female professional storyteller, responsible for preserving a tribe's history and genealogy by relaying its folklore in oral and musical recitations. Similarly, Griotte is an interdisciplinary project that seeks to foster collaboration between tradition bearers, subject experts, and computer specialists in an effort to build high quality digital oral history collections. To accomplish this objective, this project preserves the primary strength of oral history, namely its ability to disclose "our" intangible culture, and addresses its primary criticism, namely its dubious reliability due to reliance on human memory and integrity.
- ML Cheatsheet Documentation
ML Cheatsheet Documentation. Team. Sep 02, 2021.
Contents: 1 Linear Regression; 2 Gradient Descent; 3 Logistic Regression; 4 Glossary; 5 Calculus; 6 Linear Algebra; 7 Probability; 8 Statistics; 9 Notation; 10 Concepts; 11 Forwardpropagation; 12 Backpropagation; 13 Activation Functions; 14 Layers; 15 Loss Functions; 16 Optimizers; 17 Regularization; 18 Architectures; 19 Classification Algorithms; 20 Clustering Algorithms; 21 Regression Algorithms; 22 Reinforcement Learning; 23 Datasets; 24 Libraries; 25 Papers; 26 Other Content; 27 Contribute.
Brief visual explanations of machine learning concepts with diagrams, code examples and links to resources for learning more. Warning: This document is under early stage development. If you find errors, please raise an issue or contribute a better definition!
Chapter 1: Linear Regression. Sections: Introduction; Simple regression (Making predictions, Cost function, Gradient descent, Training, Model evaluation, Summary); Multivariable regression (Growing complexity, Normalization, Making predictions, Initialize weights, Cost function, Gradient descent, Simplifying with matrices, Bias term, Model evaluation).
1.1 Introduction: Linear Regression is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It's used to predict values within a continuous range (e.g. sales, price) rather than trying to classify them into categories (e.g. cat, dog). There are two main types. Simple regression: simple linear regression uses traditional slope-intercept form, where m and b are the variables our algorithm will try to "learn" to produce the most accurate predictions.
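To make the excerpt concrete: slope-intercept form is y = m·x + b, and gradient descent adjusts m and b to minimize the mean squared error. Below is a small self-contained sketch in the spirit of the cheatsheet; the toy data and learning rate are made up for this example:

```python
# Simple linear regression: learn m (slope) and b (intercept) for y = m*x + b
# by gradient descent on mean squared error. Toy data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

def predict(m, b, x):
    return m * x + b

def cost(m, b):
    """Mean squared error over the training set."""
    return sum((predict(m, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def step(m, b, lr=0.01):
    """One gradient-descent update of m and b."""
    n = len(xs)
    dm = sum(2 * (predict(m, b, x) - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (predict(m, b, x) - y) for x, y in zip(xs, ys)) / n
    return m - lr * dm, b - lr * db

m, b = 0.0, 0.0
for _ in range(5000):
    m, b = step(m, b)
print(round(m, 2), round(b, 2), round(cost(m, b), 4))  # approximately 2.0 1.0 0.0
```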
- 1.2 Le Zhang, Microsoft
Objective: "Taking recommendation technology to the masses." Helping researchers and developers to quickly select, prototype, demonstrate, and productionize a recommender system; accelerating enterprise-grade development and deployment of a recommender system into production. Key takeaways of the talk: a systematic overview of recommendation technology from a pragmatic perspective; best practices (with example code) in developing recommender systems; state-of-the-art academic research in recommendation algorithms.
Outline: recommendation systems in modern business (10 min); recommendation algorithms and implementations (20 min); end-to-end example of building a scalable recommender (10 min); Q&A (5 min).
Recommendation systems in modern business: "35% of what consumers purchase on Amazon and 75% of what they watch on Netflix come from recommendations algorithms" (McKinsey & Co).
Challenges: limited resources (there is limited reference and guidance for building a recommender system at scale to support enterprise-grade scenarios); fragmented solutions (off-the-shelf packages, tools and modules are very fragmented, not scalable, and not well compatible with each other); fast-growing area (new algorithms sprout every day, and not many people have the expertise to implement and deploy a recommender using the state-of-the-art algorithms).
Microsoft/Recommenders: a collaborative development effort of Microsoft Cloud & AI data scientists, Microsoft Research researchers, academia researchers, etc. GitHub URL: https://github.com/Microsoft/Recommenders. Contents: utilities (modular functions for model creation, data manipulation, evaluation, etc.); algorithms (SVD, SAR, ALS, NCF, Wide&Deep, xDeepFM, DKN, etc.); notebooks (HOW-TO examples for end-to-end recommender building). Highlights: 3700+ stars on GitHub; featured in YC Hacker News, O'Reilly Data Newsletter, GitHub weekly trending list, etc.
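As a generic illustration of the collaborative-filtering idea behind several of the listed algorithms, and explicitly not the Microsoft/Recommenders API, here is a tiny item-based recommender over a made-up ratings matrix:

```python
import numpy as np

# Rows = users, columns = items; 0 means "not rated". The data is invented.
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 4],
    [1, 0, 4, 5, 5],
], dtype=float)
items = ["item_a", "item_b", "item_c", "item_d", "item_e"]

def item_similarity(r):
    """Cosine similarity between item columns."""
    norms = np.linalg.norm(r, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    normalized = r / norms
    return normalized.T @ normalized

def recommend(user_idx, k=2):
    """Score unrated items by a similarity-weighted sum of the user's ratings."""
    sim = item_similarity(ratings)
    user = ratings[user_idx]
    scores = sim @ user
    scores[user > 0] = -np.inf  # do not re-recommend already rated items
    top = np.argsort(scores)[::-1][:k]
    return [(items[i], round(float(scores[i]), 2)) for i in top]

# User 0 rated item_a/item_b highly, so items co-rated with those score higher.
print(recommend(0))
```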