Several interesting applications were submitted to the two competitions held within the LinkedUp project. Apart from critically reviewing these applications for their ingenuity and usefulness, we also analysed them from an ethical perspective: the linked-data applications were scrutinised for their copyright and privacy compliance. In this blog post we describe our findings primarily from a copyright perspective, focusing on an important and often neglected topic called ‘attribution’. Our analysis showed that developers paid little or no attention to attribution, not necessarily deliberately but more out of a lack of awareness of the subject. We first explain what we mean by attribution, then describe its components and its usefulness to developers, and finally highlight what developers can and should do to attribute their sources properly.
The Linked Data applications were submitted as either Web or mobile software systems, and all of them made use of one or several external data sources. These data sources were of multiple media types: for example, images of art works from museums, videos of course lectures, map data and medical data from public databases. However, making use of such multimedia external data sources requires careful consideration of the legal constraints attached to each source. Before we discuss specific issues, we must understand the legal concepts that underpin them.
Attribution – Author’s right to be credited
Copyright is a form of protection provided to the authors of ‘original works of authorship’ including literary, dramatic, musical, artistic, and certain other intellectual works, both published and unpublished (see The United States Patent and Trademark Office, General Information Concerning Patents). Attribution is ‘the act of establishing a particular person as the creator of a work of art’. An attribution statement identifies the name of the creator (among other details), acknowledges the source ‘appropriately’ and is attached to the software application that uses the work. When attribution statements can be identified and accessed easily, it is more likely that others will want to reuse the data source.
Attribution is very similar to how research papers are cited, where authors/creators are given their due credit, especially when their creation involves time and resources. Thus, attribution is not just an author’s right but it is also the ‘right’ thing to do; software engineers should acknowledge their data sources properly.
Several popular licences which cover open data, such as the Open Government Licence (OGL), Open Data Commons (ODC) and later versions of Creative Commons (CC), have attribution as one of their key conditions. Although their attribution formats are not identical, they contain similar information fields. Bespoke licences may or may not require attribution, and this information will be attached under the ‘terms and conditions’ or ‘copyright’ notice relating to the data source.
In the LinkedUp project competitions, we found that the software systems made use of data of various types: text, photos, audio, videos and databases. This is important because the format of attribution depends on the type of media. For example, a database may be attributed differently from a photo. In addition to varying according to the media type, attribution also varies according to the licence attached to the data source. For example, the attribution format of CC differs from that of OGL.
We now look at data fields that are relevant for proper attribution. Here, we combine guidance taken from CC and OGL to produce a uniform list of attribution fields.
Title – What is the name of the material? If the data source has a title, this should be included in the attribution statement. If a title is not provided, there is no obligation to fill this field.
Author – Who owns the material? This field captures the name of the author or authors of the data source. Sometimes, the author/licensor may want you to give credit to some other entity, like a company or pseudonym. For public sector data released under OGL in the UK, this field refers to the department/institution which produced the data. In some exceptional cases, the licensor or author may not want to be attributed at all. In any case, the attribution requirements specified by the author should be met.
Source – Where can I find it? Provide access details for the data source, so others can also use it. This is usually a URL or hyperlink to where the data resides.
Year – When was it published? This is the year in which the data source was published. It is particularly important when attributing data sources from public sector organisations in the UK.
License – How can I use it? Make note of the type of license that is attached to the use of the data source, along with any additional information included by the author/licensor. It is also recommended to provide a link to the full text of the license. If a data source comes with any copyright notices, then they should also be attached. Here, a notice refers to a disclaimer of warranties or a notice of previous modifications, which may be quite important to potential users of the data source. If you have modified the data source yourself, record those modifications and state them in the attribution.
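The fields above can be kept together as one record per data source, which makes it easy to spot what is still missing before an application is published. The following is only a minimal sketch: the class name, the field names and the choice of which fields count as "recommended" are our own assumptions for illustration and are not prescribed by CC or OGL.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttributionRecord:
    """Attribution fields for one data source (hypothetical schema)."""
    author: str                      # person, department or pseudonym to credit
    source_url: str                  # where the data resides
    license_name: str                # e.g. "CC BY 2.0" or "OGL v2.0"
    title: Optional[str] = None      # optional: not every source has a title
    year: Optional[int] = None       # year of publication
    license_url: Optional[str] = None
    notices: List[str] = field(default_factory=list)  # copyright notices, disclaimers
    modifications: Optional[str] = None               # description of your own changes

    def missing_fields(self) -> List[str]:
        """Return the names of recommended fields that are still empty."""
        recommended = ["author", "source_url", "license_name", "license_url"]
        return [name for name in recommended if not getattr(self, name)]


# All values below are placeholders, not a real data source.
record = AttributionRecord(
    title="Example dataset",
    author="Example Author",
    source_url="https://example.org/dataset",
    license_name="CC BY 4.0",
)
print(record.missing_fields())  # ['license_url']
```

A record like this can be filled in when a data source is first added to a project, which is usually when the licence details are easiest to find.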
Having described the most important attribution fields, we now look at how they are used to make an attribution statement. While the content of attribution statements generally does not vary, their arrangement and format can vary depending on the media and the licence used. In this section we provide examples of attribution formats from the CC, OGL and ODC licensing schemes, which cover most open data sources.
CC uses a straightforward attribution statement. In an abstracted form, it has the following format:
“<Title with source URL>” by <Author, linked to profile page> is licensed under <license type linked to license deed>
An example of this attribution for a photo is shown below.
“Creative Commons 10th Birthday Celebration San Francisco” by tvol is licensed under CC BY 2.0
This is a proper attribution because it has the following attribution fields:
Title: “Creative Commons 10th Birthday Celebration San Francisco”
Author: “tvol” – linked to his profile page
Source: “Creative Commons 10th Birthday Celebration San Francisco” – linked to original Flickr page
License: “CC BY 2.0” – linked to license deed
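For a Web application, the CC format above could be generated with a small helper such as the one sketched here. This is an illustration only: the function names are hypothetical, and the photo and profile URLs are placeholders rather than the real Flickr addresses.

```python
from html import escape


def link(text: str, url: str) -> str:
    """Return an HTML anchor with both the text and the URL escaped."""
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


def cc_attribution_html(title, source_url, author, author_url,
                        license_name, license_url):
    """Build '"<Title>" by <Author> is licensed under <License>' with links."""
    return '"{}" by {} is licensed under {}'.format(
        link(title, source_url),          # title linked to the original source
        link(author, author_url),         # author linked to the profile page
        link(license_name, license_url),  # licence linked to the licence deed
    )


statement = cc_attribution_html(
    title="Creative Commons 10th Birthday Celebration San Francisco",
    source_url="https://example.org/photo",       # placeholder URL
    author="tvol",
    author_url="https://example.org/people/tvol",  # placeholder URL
    license_name="CC BY 2.0",
    license_url="https://creativecommons.org/licenses/by/2.0/",
)
print(statement)
```

Escaping the text and the URLs matters because titles and author names come from external sources and may contain characters such as `&` or `<`.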
Modified or derived data
It may sometimes be necessary to modify the original work to create new derivatives. In such cases, the nature of the modifications should be explicitly stated when making an attribution. A suggested format for attributions covering derived or modified work would be:
This work, “<Title of modified work with source URL>”, is a derivative of “<Title of original work with source URL>” by <Author of original work, linked to profile page>, used under <original license type linked to license deed>. “<Title of modified work with source URL>” is licensed under <license type linked to license deed> by <Author of modified work linked to profile page>
Given below is an example of an attribution statement for a derived work of the earlier example:
This work, "90fied", is a derivative of "Creative Commons 10th Birthday Celebration San Francisco" by tvol, used under CC BY. "90fied" is licensed under CC BY by Alice and Bob.
Note that this attribution contains fields for both the original work and the derived work.
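The suggested format for derived work can be expressed as a simple text template. The sketch below uses a hypothetical function name and produces plain text only; in a real application the titles, authors and licences would also be linked as described in the format above.

```python
def derivative_attribution(new_title, new_author, new_license,
                           orig_title, orig_author, orig_license):
    """Build a plain-text attribution covering an original and a derived work."""
    return (
        'This work, "{new_title}", is a derivative of "{orig_title}" '
        'by {orig_author}, used under {orig_license}. '
        '"{new_title}" is licensed under {new_license} by {new_author}.'
    ).format(
        new_title=new_title, new_author=new_author, new_license=new_license,
        orig_title=orig_title, orig_author=orig_author, orig_license=orig_license,
    )


print(derivative_attribution(
    new_title="90fied",
    new_author="Alice and Bob",
    new_license="CC BY",
    orig_title="Creative Commons 10th Birthday Celebration San Francisco",
    orig_author="tvol",
    orig_license="CC BY",
))
```

With these inputs the function reproduces the example statement for "90fied" given earlier.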
Attribution for multiple sources
When an application uses multiple data sources which are licensed under heterogeneous licensing schemes, each part of the application must individually attribute the original work and its associated license. For example, if a software system contains two sub-systems, 1 and 2, which use two separate data sources, then each sub-system carries its own attribution statement for its data source and the associated license.
If, however, a single software system uses multiple data sources, authored by the same entity and covered by the same license, then the format could be much simpler, as shown below:
This <title of work> uses:
<title of source -1> which is licensed under <license type linked to license deed> by <Author linked to profile page>
<title of source -2> which is licensed under <license type linked to license deed> by <Author linked to profile page>
...
<title of source -n> which is licensed under <license type linked to license deed> by <Author linked to profile page>
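A combined statement of this kind is easy to generate from a list of sources. The sketch below is one possible way to do it; the function name, the dictionary keys and the example sources are all hypothetical, and the output is plain text without the links.

```python
def multi_source_attribution(work_title, sources):
    """Build one attribution block listing every source on its own line.

    `sources` is a list of dicts with the keys 'title', 'license' and 'author'.
    """
    lines = ["This {} uses:".format(work_title)]
    for source in sources:
        lines.append(
            "{title} which is licensed under {license} by {author}".format(**source)
        )
    return "\n".join(lines)


# Placeholder sources for illustration only.
example_sources = [
    {"title": "Dataset A", "license": "CC BY 4.0", "author": "Example Org"},
    {"title": "Dataset B", "license": "CC BY 4.0", "author": "Example Org"},
]
print(multi_source_attribution("MyApp", example_sources))
```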
Datasets from public sector bodies in the UK
In the UK, thousands of public sector datasets have been released under the Open Government Licence (OGL). OGL is an open licensing model and tool for public sector bodies (in the UK) to license the re-use of their data easily. Use of datasets under the OGL is free and allows data to be used and re-used for commercial and/or non-commercial purposes. When using this data, an attribution statement using the following format must be attached:
<Title>, <author department/organisation>, <year of publication>, <applicable copyright or database right notice>. This information is licensed under the terms of the Open Government Licence [http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2].
If your application uses multiple sources licensed under the OGL and it is impractical to attribute them individually, then the following attribution statement can be used:
Contains public sector information licensed under the Open Government Licence v2.0.
However, OGL recommends maintaining a record or list of sources and attributions in another file or location, if it is not practical to include these prominently within your product.
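The two OGL statements above can be produced by a helper that falls back to the generic wording when there are too many sources to list. This is a sketch under our own assumptions: the function names are hypothetical, and the threshold for what counts as "impractical" (`max_individual`) is an arbitrary illustrative value, not something the OGL defines.

```python
OGL_URL = "http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2"
OGL_FALLBACK = ("Contains public sector information licensed under "
                "the Open Government Licence v2.0.")


def ogl_attribution(title, organisation, year, rights_notice):
    """Build the individual OGL attribution statement for one dataset."""
    return ("{}, {}, {}, {}. This information is licensed under the terms "
            "of the Open Government Licence [{}].").format(
                title, organisation, year, rights_notice, OGL_URL)


def ogl_statements(sources, max_individual=3):
    """Return individual statements, or the generic one if there are too many.

    `sources` is a list of (title, organisation, year, rights_notice) tuples.
    `max_individual` is an assumed cut-off chosen only for this example.
    """
    if len(sources) > max_individual:
        return [OGL_FALLBACK]
    return [ogl_attribution(*source) for source in sources]


# Placeholder dataset details for illustration only.
example = [("Example Dataset", "Example Department", 2014, "© Crown copyright")]
for line in ogl_statements(example):
    print(line)
```

Even when the generic statement is shown, the full list of sources can still be written to a separate file, in line with the OGL recommendation above.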
Attribution within media
Google automatically generates such attribution information for its mapping service, and users of the service are discouraged from disabling this feature. Others, such as Foursquare, recommend a similar approach when using their data in mobile applications, and it might be a good practice to adopt.