Introduction

Part 1 of this series looked at how devices in the ‘Internet of Things’ can sense their surroundings and make sensor measurements available over the web in formats such as CSV, XML and JSON. The same formats are used to publish data from a variety of other sources. This article gives a few examples of these other sources.

Data From The BBC

The British Broadcasting Corporation is a public body established by Royal Charter. As such it has a number of obligations, among them the expectation that it will deliver to the public the benefit of emerging communication technologies and services. Part and parcel of this is a commitment to linked data - which, in practice, means that the BBC is aiming to provide machine-readable data on its radio and TV programmes via the web.

The gateway to all of this will be an API (Application Programming Interface) called BBC Nitro. Currently (April 2015), registering to use Nitro is possible only for BBC employees - despite the BBC Developer Portal stating that the service would be opened up in 2014! Nevertheless, it is possible to get some idea of what using BBC data might be like by accessing the corporation’s earlier experiments with open data, some of which (as of April 2015) still appear to be available.

Two examples that I’ve found are an XML data feed giving the schedule for Radio 1 in England and a JSON data feed giving upcoming Sci-Fi programmes on TV. Try both of these out now. The screenshot below shows a portion of XML data from the first of them, as displayed by the Chrome browser.

Screenshot of XML data representing the BBC Radio 1 schedule

Another interesting BBC data feed is this one, which breaks down radioplay by artist across the BBC’s radio stations:

http://www.bbc.co.uk/programmes/music/artists/charts.json

(The same data can be retrieved as XML simply by replacing json with xml in the URL above.)
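Because the two formats differ only in the final extension of the URL, switching between them is easy to automate. The following is a minimal sketch in Python; the function name `with_format` is my own, and it assumes that the format appears as the file extension at the end of the URL path, as it does in the address above.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def with_format(url, fmt):
    """Return url with its path extension replaced by fmt (e.g. 'xml')."""
    parts = urlsplit(url)
    # Split '/a/b/charts.json' into '/a/b/charts' and '.json'
    stem, _old_extension = posixpath.splitext(parts.path)
    return urlunsplit(parts._replace(path=f"{stem}.{fmt}"))


CHARTS_JSON = "http://www.bbc.co.uk/programmes/music/artists/charts.json"
CHARTS_XML = with_format(CHARTS_JSON, "xml")
print(CHARTS_XML)
```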

Earth & Environment Data

The Met Office

The Met Office is the UK’s national weather forecasting service. It is currently beta-testing a service called DataPoint, which it describes thus:

DataPoint is a way of accessing freely available Met Office data feeds in a format that is suitable for application developers. It is aimed at professionals, the scientific community and student or amateur developers, in fact anyone looking to re-use Met Office data within their own innovative applications.

DataPoint offers a wide range of useful meteorological data. For example, it provides a five-day forecast of temperature, wind speed & direction, precipitation and other variables for specific locations in the UK, either as a visualisation like that shown below or as raw data in XML or JSON formats.

Screenshot of a Met Office weather forecast

DataPoint also provides map layers in the PNG image format for both weather forecasts and actual observations. Forecast layers show cloud cover, rainfall, temperature and pressure as isobars. Observation layers show rainfall, lightning storms, and satellite images in the visible and IR regions of the spectrum. Layer retrieval is a two-stage process in which you must first request details of all the available layers, in either XML or JSON format. This information can then be used to construct the specific URL of the desired layer. The example shown below is a composite of a visible-spectrum satellite image with layers showing forecast rainfall and pressure.
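The two-stage process might be sketched in Python as follows. This is only an illustration of the idea: the structure of the `capabilities` dictionary, the URL template, the timestamps and the API key placeholder are all invented for the example and are not DataPoint's actual response format, which should be taken from the DataPoint documentation. In a real program, stage one would be an HTTP request rather than a hard-coded dictionary.

```python
# Stage 1 (simulated): details of the available layers, as they might look
# after parsing a JSON response. This structure is hypothetical.
capabilities = {
    "url_template": (
        "http://example.org/layer/{name}/{image_format}"
        "?time={time}&key={key}"
    ),
    "layers": [
        {"name": "Rainfall", "image_format": "png",
         "times": ["2015-04-01T09:00:00", "2015-04-01T12:00:00"]},
        {"name": "Pressure", "image_format": "png",
         "times": ["2015-04-01T09:00:00"]},
    ],
}


def layer_url(capabilities, name, api_key, time_index=0):
    """Stage 2: build the URL of one layer image from the stage 1 details."""
    for layer in capabilities["layers"]:
        if layer["name"] == name:
            return capabilities["url_template"].format(
                name=layer["name"],
                image_format=layer["image_format"],
                time=layer["times"][time_index],
                key=api_key,
            )
    raise KeyError(f"no layer named {name!r}")


url = layer_url(capabilities, "Rainfall", "YOUR-API-KEY", time_index=1)
print(url)
```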

Map overlays from Met Office DataPoint

One important thing to note about DataPoint is that users must register with the service in order to obtain an API key. This is a unique string of characters that identifies you as a legitimate user of the service. It is used for authentication purposes and to track your usage of the service. All requests made to DataPoint must include your API key.
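In practice, an API key is often passed as a query parameter on each request. The sketch below shows the general idea using Python's standard library. The endpoint, the parameter names `key` and `res`, and the placeholder key are assumptions made for illustration, so check the DataPoint documentation for the exact form it expects.

```python
from urllib.parse import urlencode


def authenticated_url(base_url, api_key, **params):
    """Append query parameters, including the API key, to base_url."""
    query = dict(params, key=api_key)  # 'key' is an assumed parameter name
    return f"{base_url}?{urlencode(query)}"


url = authenticated_url(
    "http://example.org/forecast/json/12345",  # hypothetical endpoint
    "YOUR-API-KEY",                            # placeholder, not a real key
    res="3hourly",                             # hypothetical extra parameter
)
print(url)
```

Keeping the key in one place like this also makes it easier to load it from a configuration file or environment variable rather than embedding it in published code.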

API keys are actually a fairly common requirement for use of web services. ThingSpeak, discussed in Part 1, requires one. Many services will provide an API key for free but will limit the number of times that you can invoke the service free of charge; for example, the weather forecasting service forecast.io will allow you to make up to 1,000 API calls per day for free but will charge you $0.0001 per call thereafter.
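The arithmetic behind a pricing scheme like this is easy to check. The function below is a small Python sketch using the forecast.io figures quoted above (1,000 free calls per day, then $0.0001 per call); the function name and the use of `Decimal` to avoid floating-point rounding are my own choices.

```python
from decimal import Decimal

FREE_CALLS_PER_DAY = 1000
PRICE_PER_EXTRA_CALL = Decimal("0.0001")  # US dollars


def daily_cost(calls):
    """Return the charge in dollars for a number of calls made in one day."""
    chargeable = max(0, calls - FREE_CALLS_PER_DAY)
    return chargeable * PRICE_PER_EXTRA_CALL


print(daily_cost(800))    # within the free allowance
print(daily_cost(11000))  # 10,000 chargeable calls
```

An application polling once a minute makes 1,440 calls per day, so it would exceed the free allowance by 440 calls.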

USGS Earthquake Hazards Program

The United States Geological Survey’s Earthquake Hazards Program is one of my favourite data source examples. Their website provides comprehensive real-time feeds of seismological data in a variety of different formats, as the screenshot below illustrates.

Screenshot of USGS earthquake data feeds

The Spreadsheet Format link on this page takes you to another page containing various CSV data feeds. The Atom Syndication and QuakeML links are for two different XML-based formats, the former being for consumption by feed readers and the latter for professional geoscientists. The GeoJSON Summary link in the For Developers panel on the left is for a JSON-based format called GeoJSON.

For each format, feeds are grouped by time, covering the past hour, past day, past 7 days and past 30 days. In each of these groups there is an ‘all earthquakes’ feed plus separate feeds for different levels of severity, covering ‘significant’ earthquakes and those with magnitudes of 4.5 or more, 2.5 or more, and 1.0 or more.

Screenshot of links for earthquake data feeds

The quantity of data that you obtain from these feeds will very much depend on which one you choose; for example, the feed for significant earthquakes occurring in the past hour will be empty most of the time, whereas the feed for all earthquakes from the last 30 days will typically give you many thousands of events each time that you access it.
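Since every feed is identified by a severity level and a time period, feed addresses can be generated rather than copied by hand. The Python sketch below assumes the URL pattern that the USGS GeoJSON summary feeds appear to follow; that pattern is not given in the text above, so verify it against the USGS feed pages before relying on it. The function and constant names are my own.

```python
# Assumed base address and naming pattern for the USGS summary feeds
FEED_BASE = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary"

SEVERITIES = {"significant", "4.5", "2.5", "1.0", "all"}
PERIODS = {"hour", "day", "week", "month"}  # past hour, day, 7 days, 30 days


def feed_url(severity, period, fmt="geojson"):
    """Build the address of one feed, e.g. feed_url('4.5', 'week')."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if period not in PERIODS:
        raise ValueError(f"unknown period: {period!r}")
    return f"{FEED_BASE}/{severity}_{period}.{fmt}"


print(feed_url("significant", "hour"))  # usually a very small feed
print(feed_url("all", "month"))         # usually a very large feed
```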

A GeoJSON feed is a list of seismic events, each of which is represented as shown below. A glossary explains what the various data fields mean. (A few of them have been omitted in the interests of clarity.) This particular example is for the magnitude 7.4 quake that occurred near the Solomon Islands on 13 April 2014.

{
  "type": "Feature",
  "properties": {
    "mag": 7.4,
    "place": "111km S of Kirakira, Solomon Islands",
    "time": 1397392578710,
    "updated": 1397421536312,
    "tz": 660,
    "felt": null,
    "cdi": null,
    "mmi": 7.51,
    "alert": "green",
    "status": "reviewed",
    "tsunami": 1,
    "sig": 842,
    "net": "us",
    "nst": null,
    "dmin": 2.89,
    "rms": 1.06,
    "gap": 17,
    "magType": "mww",
    "type": "earthquake",
    "title": "M 7.4 - 111km S of Kirakira, Solomon Islands"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [162.0692, -11.451, 35]
  },
  "id": "usc000piqj"
}
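Extracting information from an event like this one takes only a few lines of code. The sketch below uses Python's standard `json` module on a cut-down copy of the example above. It relies on two conventions: GeoJSON lists coordinates as longitude then latitude (the third value here being depth in kilometres), and the `time` field counts milliseconds since 1 January 1970 UTC. The function name `summarise` is my own.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# A cut-down copy of the event shown above
feature_text = """
{
  "type": "Feature",
  "properties": {
    "mag": 7.4,
    "place": "111km S of Kirakira, Solomon Islands",
    "time": 1397392578710
  },
  "geometry": {"type": "Point", "coordinates": [162.0692, -11.451, 35]},
  "id": "usc000piqj"
}
"""


def summarise(feature):
    """Pick out the most commonly needed fields from one GeoJSON feature."""
    props = feature["properties"]
    # GeoJSON order is longitude, latitude, then (here) depth in km
    lon, lat, depth_km = feature["geometry"]["coordinates"]
    return {
        "id": feature["id"],
        "magnitude": props["mag"],
        "place": props["place"],
        # Timestamps are in milliseconds since the Unix epoch
        "when": EPOCH + timedelta(milliseconds=props["time"]),
        "latitude": lat,
        "longitude": lon,
        "depth_km": depth_km,
    }


quake = summarise(json.loads(feature_text))
print(quake["when"].isoformat(), quake["magnitude"], quake["place"])
```

Adding a `timedelta` to a fixed epoch, rather than dividing the timestamp by 1,000 and converting a floating-point number, keeps the millisecond part exact.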

Open Data Initiatives

Data.gov.uk is at the heart of the UK government’s Transparency agenda and currently (April 2015) makes over 24,000 datasets available to the public. You can search for a dataset by keyword or conduct geographic searches based on postcode, latitude & longitude or a rectangular region dragged out on a map. You can also drill down via menus that classify datasets by licence, theme or format. CSV and XML are well represented, but there are comparatively few JSON datasets currently available. Note that ‘open’ does not necessarily imply ‘easily machine-readable’; some of the datasets are provided only as Excel spreadsheets, PDF files or Microsoft Word documents, for example - formats that can be much harder to process using software.

The screenshot below shows the most popular health-related CSV datasets available from the site. You can view this page yourself by visiting

http://data.gov.uk/data/search?theme-primary=Health&res_format=CSV

Screenshot of health-related CSV datasets on data.gov.uk
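Search addresses like the one above can also be built programmatically, which is handy if you want to compare what is available across several themes or formats. The Python sketch below reconstructs that URL from its two query parameters using the standard library. The function name is my own, and it uses only the `theme-primary` and `res_format` parameters that appear in the address above.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://data.gov.uk/data/search"


def dataset_search_url(theme, res_format):
    """Build a data.gov.uk search address for a theme and file format."""
    query = urlencode({"theme-primary": theme, "res_format": res_format})
    return f"{SEARCH_URL}?{query}"


print(dataset_search_url("Health", "CSV"))
```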

Independent organisations also exist to champion the development and use of open data, a key player being the Open Data Institute (ODI). In addition to promoting innovation at a national level, the ODI operates a number of regional nodes to support and coordinate activity locally, a good example being ODI Leeds.

Another example of how open data is having an impact at the local level is the Leeds Data Mill, promoted as “a place for organisations to share their open data to change the way we live, work and play in the city”. Leeds Data Mill’s small but growing collection includes datasets as varied as locations and number of available spaces in council car parks, footfall data for eight locations in the city centre, details of roadworks (via Leeds City Council’s Live Roadworks API) and records of pest control callouts.

Screenshot of pest control datasets available at Leeds Data Mill


