Linked Web Apis Dataset Web Apis Meet Linked Data
Total Page:16
File Type:pdf, Size:1020Kb
Semantic Web 1 (2015) 1–5 1 IOS Press Linked Web APIs Dataset Web APIs meet Linked Data Editor(s): Rinke Hoekstra, Universiteit van Amsterdam, The Netherlands Solicited review(s): Christoph Lange, University of Bonn, Germany; Enrico Daga, KMI, The Open University, UK; Tobias Kuhn VU Amsterdam, The Netherlands Milan Dojchinovski and Tomas Vitvar Web Intelligence Research Group, Faculty of Information Technology, Czech Technical University in Prague E-mail: {milan.dojchinovski,tomas.vitvar}@fit.cvut.cz Abstract. Web APIs enjoy a significant increase in popularity and usage in the last decade. They have become the core technology for exposing functionalities and data. Nevertheless, due to the lack of semantic Web API descriptions their discovery, sharing, integration, and assessment of their quality and consumption is limited. In this paper, we present the Linked Web APIs dataset, an RDF dataset with semantic descriptions about Web APIs. It provides semantic descriptions for 11,339 Web APIs, 7,415 mashups and 7,717 developer profiles, which make it the largest available dataset from the Web APIs domain. The dataset captures the provenance, temporal, technical, functional, and non-functional aspects. In addition, we describe the Linked Web APIs Ontology, a minimal model which builds on top of several well-known ontologies. The dataset has been interlinked and published according to the Linked Data principles. Finally, we describe several possible usage scenarios for the dataset and show its potential. Keywords: Web APIs, Linked Data, Web services, Linked Web APIs, ontology 1. Introduction ery and selection of APIs, on the other hand the Web API providers can execute queries to get better insight Web APIs have become the first-class citizens and analysis results from the Web APIs ecosystem. on the Web and the core functionality of any Web In order to achieve these goals, we have developed application. Targeting the developer audience, they the Linked Web APIs dataset. It provides information lower the entry barriers for accessing valuable en- about Web APIs, mashups which utilize Web APIs terprise data and functionalities. Back in late 2008, in compositions, and mashup developers. The primary ProgrammableWeb.com1, the largest Web API and source for the dataset is the ProgrammableWeb.com mashup directory, reported only 1,000 Web APIs. This directory, which acts as central repository for Web count increased to 5,000 APIs in Feb 2014 and over APIs descriptions. The dataset re-uses several well- 13,000 APIs in June 2015. There are several benefits of known ontologies developed by the Semantic Web having these Web API descriptions provided as Linked community. In order to conform to the Linked Data Data. The Web API descriptions are contextualized, principles2, we have also linked the dataset with four they can be referenced, re-used and combined. The Web APIs data is linked, therefore API consumers can effectively discover new Web APIs. Last but not least, on the one hand the developers can benefit from so- phisticated discovery and selection queries for discov- 1http://www.programmableweb.com 2http://www.w3.org/DesignIssues/LinkedData.html 1570-0844/15/$27.50 c 2015 – IOS Press and the authors. All rights reserved 2 M. Dojchinovski et al. / Linked Web APIs Dataset central LOD datasets: DBpedia3, Freebase4, Linked- which describes a developer, we extracted its user- GeoData5 and GeoNames6. name, homepage and short bio. The city and country The remainder of this paper is structured as follows. of residence, its given and family name and the gender Section 2 describes the source of information and how were extracted only if these information were publicly the data was collected. Section 3 describes the devel- available. oped ontology for modelling relevant Web APIs infor- We also captured the relationships between the Web mation. The creation of the Linked Web APIs dataset APIs, mashups and developers. In other words, for and its technical details are described in Section 4. The each mashup we extracted the list of Web APIs which approach for interlinking the dataset with other LOD were used by the mashup and also the list of mashups datasets is described in Section 5. Section 6 discusses created by each developer. The dataset also captures the quality of the ontology and the dataset. Section 7 the temporal aspects - the creation time of the Web presents selected use cases and the results from a sur- APIs, mashups and the time a user registered his pro- vey on the potential and the usefulness of the dataset. file. Section 8 discusses related vocabularies and potential In order to collect the data, we have implemented data sources. Finally, Section 9 concludes the paper. a script which systematically browses and parses rel- evant pages. The parsing mechanism has been imple- mented using the jsoup Java HTML parser8. For the 2. The Data Source crawler, we used proper etiquette and we configured the crawl delay to one page every four seconds. In our work, we have considered ProgrammableWeb as a primary source of information for creating the dataset. It adopts characteristics of a social Web plat- 3. The Ontology form where Web API providers can publish and share information about offered Web APIs and consequently The Linked Web APIs ontology9 is a minimal model increase their visibility. The API directory also allows that captures the most relevant information which are developers to find appropriate APIs for their projects, related to Web APIs and mashups. The ontology builds or they can learn from showcases of existing mashup on top of existing and well established ontologies and applications. appropriately extends them. The selection of the on- The implemented knowledge extraction process tologies was driven by the following four crucial re- consists of four steps: (1) parsing and extraction of quirements: valuable information from pages describing Web APIs, – Provenance: It is important to keep information mashups and developers, (2) pre-processing, cleanup about Who (developers) created What (mashups) and consolidation of information, (3) linking with and How (using which APIs). In addition, infor- LOD resources, and (4) lifting in RDF and publishing mation about APIs a provider provides need to be the data as Linked Data. captured as well. An example of a Web page which describes a Web – Functional and Non-functional Properties: What API is the one which describes the Twitter API7. For functionalities a Web API or mashup offers is each Web API we extracted its title, short summary more than important, as well as their usage lim- describing its functionalities, assigned tags and cate- its and fees, supported security and authentication gories, technical information, such as supported for- mechanisms. mats and protocols, as well as non-functional proper- – Technical Properties: Information about the sup- ties such as its homepage, usage limits, usage fees, se- ported protocols, formats and the Web API end- curity, etc. Similarly, for each mashup we extracted point location is as important, as it allows a Web its title, short free-text description of its functionali- API consumer to search only for APIs with pre- ties, assigned tags and the homepage. From each page ferred technical capabilities. 3http://dbpedia.org/ 4https://www.freebase.com/ 5http://linkedgeodata.org/ 8 6http://www.geonames.org/ http://jsoup.org/ 7http://www.programmableweb.com/api/twitter 9http://linked-web-apis.fit.cvut.cz/ns/core/index.html M. Dojchinovski et al. / Linked Web APIs Dataset 3 lwapis:assignedCategory Category Tag lwapis:assignedCategory lwapis:assignedTag rdfs:label ; rdfs:label ; dcterms:title . dcterms:title . DataFormat Mashup WebAPI rdfs:subClassOf wl:NonFunctionalParameter ; rdfs:subClassOf prov:Entity ; rdfs:label ; rdfs:label ; rdfs:subClassOf msm:Service , dcterms:title . dcterms: title ; lwapis:assignedTag prov:Entity ; dcterms:description ; rdfs:label ; lwapis:supportedFormat prov:generatedAtTime ; dcterms: title ; dcterms:modified ; dcterms:description ; lwapis:rating . prov:generatedAtTime ; Protocol prov:wasGeneratedBy dcterms:modified ; lwapis:endpoint ; rdfs:subClassOf wl:NonFunctionalParameter ; lwapis:rating ; rdfs:label ; lwapis:usageFees ; dcterms:title . Activity lwapis:usageLimits ; lwapis:usedAPI lwapis:requiresDeveloperKey ; a prov:Activity ; lwapis:supportedProtocol prov:wasGeneratedBy lwapis:requiresAccount ; dcterms: title . lwapis:requiresSSL . prov:wasAssociatedWith Agent @prefix lwapis: <http://linked-web-apis.fit.cvut.cz/ns/core#> . @prefix dcterms: <http://purl.org/dc/terms/> . a prov: Agent, foaf:Agent ; prov:wasAttributedTo @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . rdfs:label ; dcterms:creator @prefix foaf: <http://xmlns.com/foaf/0.1/> . dcterms:title ; dcterms:publisher prov:wasAttributedTo @prefix prov: <http://www.w3.org/ns/prov#> . foaf:name . dcterms:creator @prefix msm: <http://iserve.kmi.open.ac.uk/ns/msm#> dcterms:publisher @prefix wl: <http://www.wsmo.org/ns/wsmo-lite#> Fig. 1. The Linked Web APIs Ontology. – Temporal Information: When a mashup or Web dra:ApiDocumentation class from the Hydra11 vocab- API was created is a valuable information as well. ulary. We introduce the lso:usedAPI property which For example, to analyze the recent trends in the refines the semantics of the prov:used property so it API ecosystem or to discover the most recent Web can be used to explicitly identify usage of a Web API APIs or mashups. in a mashup creation. The temporal information about the creation time of a mashup or Web API is expressed Figure 1 shows the overall Linked Web APIs on- using the prov:generatedAtTime property. It represents tology. The ontology contains three central classes: the time when the API documentation has been gener- lso:WebAPI – describe Web APIs, lso:Mashup – de- ated withing the snapshot. Since this is the first official scribe mashup compositions, which utilize one or more snapshot of the dataset, the value of this property in- Web API, and lso:Agent – represent all kinds of en- dicates the first version of each resource. Any changes tities involved in the creation and/or consumption of in the following snapshots will be captured with the Web APIs and mashups.