F# Data: Accessing Structured Data Made Easy

Tomas Petricek
University of Cambridge
[email protected]
2015/4/13

1. Introduction

Programming may not be the new literacy, but it is finding its way into many areas of modern society. For example, making sense of large amounts of data that are increasingly made available through open government data initiatives¹ is almost impossible without some programming skills. In media, data journalism reflects this development. Data journalists still write articles and focus on stories, but programming is at the core of their work.

Improving support for data access in programming languages can make understanding data simpler, more usable and reproducible:

• Simpler. Many programming languages treat data as foreign entities that have to be parsed or processed. Instead, data should be treated as first-class entities fully integrated with the rest of the programming language.
• Usable. Modern developer tools make coding easier with auto-complete and early error checking. Unfortunately, these typically rely on static types which, in turn, make programming with standard untyped data harder.
• Reproducible. Data journalists often use a wide range of tools (including Excel, scripts and other ad-hoc tools). This makes it hard to reproduce the analysis and detect errors when the input data changes.

The presented work reconciles the simplicity of data access in dynamically-typed programming languages with the usability and reproducibility provided by statically-typed languages. More specifically, we develop F# type providers for accessing data in structured data formats such as CSV, XML and JSON, which are frequently used by open government data initiatives as well as other web-based data sources.

2. Motivation: Accessing structured data

Despite numerous schematization efforts, most data on the web is available without an explicit schema. At best, the documentation provides a number of typical requests and sample responses. For simplicity, we demonstrate the problem using the OpenWeatherMap service, which can be used to get the current weather for a given city². The page documents the URL parameters and shows one sample JSON response to illustrate the response structure.

Statically-typed. In a statically typed functional language like F#, we could use a library for working with HTTP and parsing JSON to call the service and read the temperature. Here, the parsing library returns a value of a JsonValue data type and we use pattern matching to extract the value we need³:

    let data = Http.Request("http://weather.org/?q=Prague")
    match JsonValue.Parse data with
    | Record(root) ->
        match Map.find "main" root with
        | Record(main) ->
            match Map.find "temp" main with
            | Number(num) -> printfn "Lovely %f degrees!" num
            | _ -> failwith "Incorrect format"
        | _ -> failwith "Incorrect format"
    | _ -> failwith "Incorrect format"

The pattern matching assumes that the response has a particular format (as described in the documentation). The root node must be a record with a "main" field, which has to be another record containing a "temp" field with a numerical value. When the format is incorrect, the data access simply fails with an exception.

The code is complicated because the data is parsed into a fully general data structure that we then have to process. The code does not benefit from the generality of the data structure; quite the opposite.

Dynamically-typed. Doing the same in JavaScript is shorter and simpler (not surprisingly, as JSON was designed after a subset of JavaScript). Using jQuery to perform the request, we can write:

    jQuery.ajax("http://weather.org/?q=Prague", function(data) {
      var obj = JSON.parse(data);
      write("Lovely ", obj.main.temp, " degrees!");
    });

Although the code is shorter, writing it is not easier than writing the original statically-typed version. Even though some JavaScript editors provide auto-completion, they will fail to help us here, because they have no knowledge of the shape of the object obj. The author therefore has to open the documentation and guess the available fields from the provided sample.

Type providers. This paper presents the F# Data library that implements type providers for accessing structured data formats such as XML, JSON and CSV. Using the JSON type provider, we can write code with the same functionality in three lines, but with full editor support including auto-complete on the obj object:

    type W = JsonProvider<"http://weather.org/?q=Prague">
    let obj = W.GetSample()
    printfn "Lovely %f degrees!" obj.main.temp

On the first line, JsonProvider<"..."> invokes a type provider at compile-time with the URL as a sample. The type provider infers the structure of the response from the sample and provides a type that has a statically known property main, returning an object with a property temp that provides the temperature as a number.

This gives us the best of both worlds: the simplicity of dynamic typing with the usability, safety and associated tooling common in statically-typed languages.

¹ In the US (http://data.gov) and in the UK (http://data.gov.uk).
² See "Current weather data": http://openweathermap.org/current
³ We abbreviate the full URL: http://api.openweathermap.org/data/2.5/weather?q=Prague&units=metric

3. Background: Type providers

This paper presents a collection of type providers for integrating structured data into the F# programming language. As outlined in the previous example, our key technical contribution is the algorithm that infers an appropriate type from an example document and the type providers that expose the type. In this section, we give a brief overview of the type provider mechanism and of related approaches to integrating data into programming languages.

3.1 How type providers work

Documents in the JSON format consist of several possible kinds of values. The OpenWeatherMap example in the introduction used only (nested) records and a numerical value. To demonstrate other aspects, we look at a more complex example that also involves collections and strings:

    [ { "name": "Jan", "age": 25 },
      { "name": "Alexander", "age": 3.5 },
      { "name": "Tomas" } ]

Say we want to print the names of people in the list with an age if it is available. Assuming people.json contains the above sample and data is a string value that contains another data set in the same format, we can use JsonProvider as follows:

    type People = JsonProvider<"people.json">
    let items = People.Parse(data)
    for item in items do
      printf "%s " item.name
      Option.iter (printf "(%f)") item.age

In contrast to the earlier example, this example uses a local file people.json as a representative sample for the type inference, but then processes data (available at run-time) from another source.

Type providers. The notation JsonProvider<"people.json"> on the first line passes a static parameter to the type provider. Static parameters are resolved at compile-time, so the file name has to be a constant. The provider analyzes the sample and generates a type that we name People. In F# editors, the type provider is also executed at development-time and so the same provided types are used in code completion.

The JsonProvider uses a type inference algorithm discussed below and infers the following types from the sample:

    type Entity =
      member name : string
      member age : option<decimal>

    type People =
      member GetSample : unit -> Entity[]
      member Parse : string -> Entity[]

After type erasure, the loop shown earlier corresponds to the following code:

    let items = asArray(JsonValue.Parse(data))
    for item in items do
      printf "%s " (asString (getProp "name" item))
      Option.iter (printf "(%f)")
        (Option.map asFloat (tryGetProp "age" item))

The generated type Entity is erased to a type JsonValue, which represents any JSON value and is returned by the Parse method. The remaining properties are erased to calls to various operations of the type provider runtime such as asArray, getProp or asFloat that attempt to convert a JSON value into the required structure (and produce a run-time exception if this is not possible).

The (hidden) type erasure process turns the static provided types into code that we might write without type providers. In particular, checked member names become unchecked strings. A type provider cannot remove all possibilities for a failure: an exception still occurs if the input does not have the right format. However, it simplifies writing code and removes most errors when a representative sample is provided.

3.2 Type systems and data integration

The F# Data library connects two lines of research that have been previously disconnected. The first is extending the type systems of programming languages and the second is inferring the structure of real-world data sources.

The type provider mechanism has been introduced in F# [17, 18] and used in areas such as semantic web [14]. The library presented in this paper is the most widely used library of type providers and it is also novel in that it shows the programming language theory behind a concrete type provider.

Extending the type systems. A number of systems integrate external data formats into a programming language. Those include XML [8, 16] and databases [4]. In both of these, the system either requires the user to explicitly define the schema (using the host language) or it has an ad-hoc extension that reads the schema (e.g. from a database). LINQ [10] is more general, but relies on code generation when importing the schema.

The work that is most similar to F# Data is the XML and SQL integration in Cω [11]. It extends C# with types capable of representing structured data formats, but it does not infer the types from samples and it modifies the C# language (rather than using a general purpose embedding mechanism).

Aside from type providers, a number of other advanced type system features could be used to tackle the problem discussed in this paper. The Ur [2] language has a rich system for working with records; meta-programming [15], [5] and multi-stage programming [19] could be used to generate code for the provided types.
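
To make the type erasure described in Section 3.1 more concrete, the following is a minimal, self-contained sketch of the kind of runtime operations that erased code relies on. It is not the F# Data runtime. The JsonValue type below, its case names (JRecord, JArray, JNumber, JString) and the bodies of asArray, asString, asFloat, getProp and tryGetProp are assumptions made for illustration; only the helper names are taken from the text. The sample is built by hand rather than parsed, so that no JSON parser is required.

```fsharp
// Hypothetical, simplified model of a JSON value (an assumption for this
// sketch, not the JsonValue type of the F# Data library).
type JsonValue =
    | JRecord of Map<string, JsonValue>
    | JArray of JsonValue list
    | JNumber of float
    | JString of string

// Conversion helpers named after the operations mentioned in the text.
// Each one throws an exception when the value has an unexpected shape.
let asArray json =
    match json with
    | JArray items -> items
    | _ -> failwith "Expected an array"

let asString json =
    match json with
    | JString s -> s
    | _ -> failwith "Expected a string"

let asFloat json =
    match json with
    | JNumber n -> n
    | _ -> failwith "Expected a number"

// Optional lookup: None if the value is not a record or the field is absent.
let tryGetProp name json =
    match json with
    | JRecord fields -> Map.tryFind name fields
    | _ -> None

// Required lookup: throws if the field is missing.
let getProp name json =
    match tryGetProp name json with
    | Some value -> value
    | None -> failwithf "Missing field '%s'" name

// The people sample from Section 3.1, constructed by hand.
let people =
    JArray [
        JRecord (Map.ofList [ "name", JString "Jan"; "age", JNumber 25.0 ])
        JRecord (Map.ofList [ "name", JString "Alexander"; "age", JNumber 3.5 ])
        JRecord (Map.ofList [ "name", JString "Tomas" ])
    ]

// Field names are plain, unchecked strings here, as in erased code.
let describe item =
    let name = asString (getProp "name" item)
    match Option.map asFloat (tryGetProp "age" item) with
    | Some age -> sprintf "%s (%.1f)" name age
    | None -> name

let descriptions = people |> asArray |> List.map describe
descriptions |> List.iter (printfn "%s")
```

In this sketch a misspelled field name such as "nmae" would only be detected at run-time, when getProp throws. This is the class of error that the statically checked item.name access in the provided type rules out.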
