Posted by: Ian | December 14, 2011

Introducing the new APIs

It's been a long time coming (OK, I’ve been distracted by other things too), but the new APIs, which use a new dataset, are nearly ready.

The new calls return far more data, and in a consistent way!

The new dataset is a better merging of OpenDOAR and ROAR (it updates from those “Authoritative” sources weekly), and adds in records from the UK Access Management Federation (harvested daily) and the webometrics list of 12,000 universities (http://www.webometrics.info/ – harvested on an ad-hoc basis).

The OA Organisation Identification Service (as we are now starting to call it) is now predominantly a list of [academic] organisations, with details of the networks and repositories associated with them. It is no longer a list of repositories and their organisations (as ROAR & OpenDOAR are).

How big is it?

How do 14,000 organisations, 2,700 repositories, and 6,700 networks grab you? There are 17,00 URLs and 33,000 names for these objects… it's big, and growing bigger all the time!

If you can find me more good sources of Repositories or Academic Organisations, I’ll see about including them too!

What data is returned?


When you get data on an organisation, you get:

org_id     The ID for the organisation (can be used in other API calls)
lat        The latitude held for the organisation
long       The longitude held for the organisation
identites  A list of names (and URLs) for the organisation (see below for details)

Data is also pulled in from the identities data. The following are taken from the first identity record:

org_name
org_npri
org_acronym
org_npref
org_iri

… and these are taken from the first matching (else the first non-matching) URL for the first identity:

org_url
org_upri
org_checked_good
org_date_checked
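
As a rough illustration of how these summary fields relate to the identity and URL records described further down, here is a minimal Python sketch. It is not the service's own code. It assumes the response has already been decoded into plain dicts and lists, and the mapping from identity keys to summary fields (pri to org_npri, live to org_checked_good, date to org_date_checked, and so on) is my reading of the field names rather than something stated above. The function names `first_url` and `flatten` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def first_url(identity: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first matching URL object, else the first non-matching one."""
    urls = identity.get("urls", {})
    for key in ("matching", "non_matching"):
        candidates = urls.get(key) or []
        if candidates:
            return candidates[0]
    return None


def flatten(prefix: str, identities: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the <prefix>_* summary fields from the first identity record."""
    flat: Dict[str, Any] = {}
    if not identities:
        return flat
    first = identities[0]

    # Assumed mapping: identity key -> summary field suffix.
    name_fields = (("name", "name"), ("pri", "npri"), ("acronym", "acronym"),
                   ("npref", "npref"), ("iri", "iri"))
    for source_key, suffix in name_fields:
        if source_key in first:
            flat[f"{prefix}_{suffix}"] = first[source_key]

    # Assumed mapping: url object key -> summary field suffix.
    url = first_url(first)
    if url is not None:
        url_fields = (("url", "url"), ("pri", "upri"),
                      ("live", "checked_good"), ("date", "date_checked"))
        for source_key, suffix in url_fields:
            if source_key in url:
                flat[f"{prefix}_{suffix}"] = url[source_key]
    return flat
```

Calling `flatten("org", identities)` on an organisation's identity list would give the org_* keys; the same helper with "repo" or "net" as the prefix would cover the other two object types, with networks simply having no URL fields.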

When you get data on a repository, you get:

repo_id          The ID for the repository (can be used in other API calls)
lat              The latitude held for the repository
long             The longitude held for the repository
postaddress      The address the repository is located at
countrycode      The country the repository is in
oaibaseurl       The URL for OAI harvesting
softwarename     What software it uses (EPrints, DSpace, flubber, etc)
softwareversion  What version of the software
description      The main description for the repository
comment          A list of additional comments for the repository
types            A list of repository types that apply to the repository (institutional, data, etc)
content          A list of content types the repository accepts (pre-prints, data, etc)
external_ids     A list of external IDs [OpenDOAR_123, etc]
language         A list of languages used in the repository interface
sword            A list of servicedocument locations for the repository
identites        A list of names (and URLs) for the repository (see below for details)

… the following are taken from the first identity record:

repo_name
repo_npri
repo_acronym
repo_npref
repo_iri

… and these are taken from the first matching (else the first non-matching) URL for the first identity:

repo_url
repo_upri
repo_checked_good
repo_date_checked

When you get data on a network, you get:

net_id     The ID for the network (can be used in other API calls)
inetnum    The IP range for the network (123.234.0.0-123.234.63.255)
dec_lower  The first IP number of the range (123.234.0.0, from above)
dec_upper  The last IP number of the range (123.234.63.255, from above)
identites  A list of name(s) for the network (see below for details) – there are no URLs, obviously

… the following are taken from the first identity record:

net_name
net_npri
net_acronym
net_npref
net_iri
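
To show how the inetnum range might be used on the client side, here is a minimal Python sketch that splits the range into its two ends and tests whether an address falls inside it. It uses only the standard `ipaddress` module and assumes IPv4 ranges in the "lower-upper" form shown above. The function names are hypothetical, and whether dec_lower and dec_upper arrive as dotted strings or as plain integers is not something this post settles, so the sketch works from inetnum alone.

```python
import ipaddress
from typing import Tuple


def parse_inetnum(inetnum: str) -> Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]:
    """Split 'a.b.c.d-e.f.g.h' into its lower and upper addresses."""
    lower_text, upper_text = inetnum.split("-")
    lower = ipaddress.IPv4Address(lower_text.strip())
    upper = ipaddress.IPv4Address(upper_text.strip())
    if lower > upper:
        raise ValueError(f"range is reversed: {inetnum}")
    return lower, upper


def in_range(address: str, inetnum: str) -> bool:
    """True if the address lies within the inclusive inetnum range."""
    lower, upper = parse_inetnum(inetnum)
    return lower <= ipaddress.IPv4Address(address) <= upper


if __name__ == "__main__":
    example = "123.234.0.0-123.234.63.255"
    low, high = parse_inetnum(example)
    # int() gives the plain-integer form of each end of the range.
    print(int(low), int(high))
    print(in_range("123.234.10.7", example))
```

Comparing `IPv4Address` objects (or their integer forms) avoids the string-comparison trap where "123.234.9.0" sorts after "123.234.10.0".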

identities

Each entry in the array is a name for the object, with whichever name is defined as “Primary” at the start of the list.

Each identity object contains the following keys (if they exist in the database):

name     The name of the object (‘Poppleton University’, ‘Plink-Plonk Repository’, etc)
acronym  Any acronym the object may be known by (‘PU’, ‘PPR’, etc)
npref    A true/false flag that indicates which is the preferred term. If the flag is absent, read it as true, not false: there is no statement that the name is not the preferred term.
pri      A true/false flag that indicates if the name is marked as Primary. Again, this flag is not always defined, as there may be only one option, or there may be no definite name that is the primary name.
iri      The Open Linked-Data IRI to get the linked-data record
nid      The database ID for the name
urls     A sub-element containing URL data for the object, as associated with the particular name.
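
The "absent means true" rule for npref is easy to get wrong in client code, so here is a small Python sketch of one way to handle it. It assumes the identities list has been decoded into dicts with real booleans for the flags (the actual encoding of true/false may differ), and the helper names `is_preferred` and `display_name` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def is_preferred(identity: Dict[str, Any]) -> bool:
    """A missing npref key counts as preferred, as described above."""
    return bool(identity.get("npref", True))


def display_name(identities: List[Dict[str, Any]]) -> Optional[str]:
    """Pick the first preferred name, keeping the list order (Primary first)."""
    for identity in identities:
        if "name" in identity and is_preferred(identity):
            return identity["name"]
    # Nothing is marked as preferred: fall back to the first name present.
    for identity in identities:
        if "name" in identity:
            return identity["name"]
    return None
```

Because the list already arrives with the Primary name at the front, walking it in order means a Primary name wins whenever it is also preferred.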

urls

In the database, there is an association between names and URLs. This is to enable objects to have multi-lingual names, and appropriate URLs for each language (e.g. Ukrainian, Russian, and English).

The urls element contains two keys, “matching” and “non_matching”, both of which are lists of url objects:

'urls' => {
            'matching'     => [
                                {....},
                                {....}
                              ],
            'non_matching' => [
                                {....},
                                {....}
                              ]
          }

If a URL is flagged as Primary, it is placed at the front of the appropriate list.
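
If you merge or filter these lists yourself and want to keep the same Primary-first ordering, a stable sort is enough. The Python sketch below is only an illustration of that idea, not how the service builds its lists; it assumes each url object is a dict whose pri flag is a real boolean when present, and `primary_first` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def primary_first(url_objects: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a new list with Primary URLs first, otherwise in original order."""
    # sorted() is stable, and False sorts before True, so URLs with pri set
    # move to the front while everything else keeps its relative position.
    return sorted(url_objects, key=lambda u: not u.get("pri", False))
```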

Within each url object, the following data is returned:

url   The actual URL
pri   Whether the URL is marked as a primary one
live  A true/false flag to indicate if the URL returns a [non-error] web page
date  The date that the URL was last checked. Note that no history is kept of the alive/not-alive checking. Hosts that are flagged as alive are re-checked weekly; hosts that are not flagged as alive are checked daily.
uid   The database ID for the URL
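
The re-check rule (weekly for live hosts, daily for the rest) can be expressed in a couple of lines. This Python sketch is an assumption-laden illustration: it takes the last-checked date as a `datetime.date` (the format the API actually returns for date is not given here), and `next_check` is a hypothetical name.

```python
from datetime import date, timedelta


def next_check(last_checked: date, live: bool) -> date:
    """Live hosts are re-checked after a week, others after a day."""
    interval = timedelta(days=7) if live else timedelta(days=1)
    return last_checked + interval
```

For example, `next_check(date(2011, 12, 14), live=True)` gives 21 December 2011, while the same call with `live=False` gives 15 December 2011.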

Comprehensive enough? Want more? Speak to me…
