Junction API

The Open Access Repository Junction “Discovery” tool has a suite of three APIs for interacting with the data:

  1. /api
  2. cgi/list/type, cgi/list/content, cgi/list/country, cgi/list/lang, cgi/list/org, cgi/list/net
  3. cgi/get_orgs, cgi/get_repos, cgi/get_nets

All the calls interact with the same dataset, and all calls can return data in a number of formats. This returned format is defined with the “format” parameter:

format

  • The format the data is returned in: Currently xml, text or json
  • [default is json]

Note: if using a machine-to-machine call (such as Perl’s LWP::UserAgent) you can also define the format in an “Accepts” header:

HTTP GET /api
Accepts: application/xml, q=1.0

where the “Accepts” types are:

application/xml
application/json
plain/text
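As a minimal sketch (not the project's own client code), the two ways of selecting a format could be prepared in Python as below. The base URL comes from the examples later in this post; the function names are hypothetical, and the header name and value are copied from the text above rather than verified against the server, so the header name is left as an argument you can change.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://opendepot.org/api"  # taken from the examples in this post

# Media types as listed above, keyed by the value of the "format" parameter.
MEDIA_TYPES = {
    "xml": "application/xml",
    "json": "application/json",
    "text": "plain/text",
}


def request_with_format_param(fmt="json"):
    """Select the format with the "format" query parameter."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt}")
    return Request(f"{BASE_URL}?{urlencode({'format': fmt})}")


def request_with_header(fmt="json", header_name="Accepts"):
    """Select the format with a request header instead (name as given above)."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt}")
    # Header value follows the "application/xml, q=1.0" example above.
    return Request(BASE_URL, headers={header_name: f"{MEDIA_TYPES[fmt]}, q=1.0"})
```

Either request object can then be passed to `urllib.request.urlopen` to make the actual call.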

The scripts

/api

This is the primary point of interaction for the dataset. /api is a single entry point that can respond in a number of ways, but it will always try to find the repositories that relate to organisations.

/api can be given a specific locus to deduce repositories, be it an IP address or an ID code to specify the organisation, or it will deduce a locus based on the calling client. The script can be asked to restrict the returned list by repository type or accepted content.

/cgi/list/*

This suite of functions lists what OA-RJ knows for particular categories.

/cgi/get_*

A suite of three calls developed for internal AJAX functions within opendepot.org, but made available for general use.

The scripts in detail

/api

The api works by finding all the organisations within a locus, and then the repositories that match those organisations.

  • If an org_id is given, then we simply return the data for that organisation (or those organisations, if multiple org parameters are given.)
  • If the search is based on IP, then multiple organisations can be associated with the network range, each of which can have multiple appropriate repositories.
  • If neither is defined, then api will do an IP-based search, based on the IP of the client calling the script.

use:

http://opendepot.org/api?format=xml&ip=130.128/13&type=2&content=1&nest=1

Parameters

ip – The IP(s) to find repositories for.

  • If unspecified, the IP is set to that of the system making the call.
  • Can have multiple ip parameters (ip=152.78.118.51&ip=129.215.169.88)
  • Can have CIDR ranges such as 129.215/16 (note: it will tell you if you get the CIDR wrong)
  • Does not do ranges (129.215.169.0-129.215.169.128)

Note that IP searching is….. complicated:

  1. First try to find a network that contains [any of] the IP[s] given.
  2. If that fails, find any networks that are contained by the IP given (basically for CIDR ranges)
  3. Finally, look for networks that overlap the upper or lower bounds of the IP range given.
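The three steps above can be sketched in Python as follows. This is an illustration of the described order of matching, not the actual OA-RJ implementation: the in-memory list of networks, the second (made-up) network and all function names are assumptions, and it handles a single ip value rather than several.

```python
import ipaddress


def ip_range(first, last):
    """Return an inclusive (low, high) pair of integers for a network range."""
    return int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last))


def parse_query(text):
    """Turn '129.215.169.88' or a short CIDR such as '129.215/16' into (low, high)."""
    addr, _, prefix = text.partition("/")
    octets = addr.split(".")
    octets += ["0"] * (4 - len(octets))  # pad '129.215' to '129.215.0.0'
    # Raises ValueError for malformed input such as '18.1.1.1.1'.
    net = ipaddress.ip_network(".".join(octets) + "/" + (prefix or "32"), strict=False)
    return int(net.network_address), int(net.broadcast_address)


def find_networks(query, known):
    """known is a list of (name, low, high) tuples."""
    q_low, q_high = parse_query(query)
    # 1. Networks that contain the query.
    hits = [n for n in known if n[1] <= q_low and q_high <= n[2]]
    if hits:
        return hits
    # 2. Networks that are contained by the query (CIDR ranges).
    hits = [n for n in known if q_low <= n[1] and n[2] <= q_high]
    if hits:
        return hits
    # 3. Networks that overlap the lower or upper bound of the query.
    return [n for n in known if n[1] <= q_low <= n[2] or n[1] <= q_high <= n[2]]


KNOWN = [
    ("Edinburgh University", *ip_range("129.215.0.0", "129.215.255.255")),
    ("Hypothetical network", *ip_range("10.0.0.128", "10.0.1.127")),  # made up
]
```

With this data, `find_networks("129.215.169.88", KNOWN)` is resolved by step 1, `find_networks("129/8", KNOWN)` by step 2, and `find_networks("10.0.0/24", KNOWN)` only by step 3.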

org – The organisation to find repositories for.

  • The value needs to be the OA-RJ code for that organisation (which can be found using the cgi/list/org call)

NOTE: IP addresses and org_id searches are AND’d, not OR’d

  • a search for ip=129.215/16&org=16410 will produce 0 (zero) results: Org_id 16410 is MIT, so you are asking for any MIT repositories associated with the Edinburgh University network.
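One way to picture the AND behaviour is as a set intersection. The sketch below is only an illustration of that idea, with a hypothetical function name; the organisation IDs for the Edinburgh network are the ones that appear in the nested example later in this post.

```python
def select_orgs(orgs_on_network=None, requested_orgs=None):
    """Combine an IP-derived set of org IDs with explicitly requested ones.

    None means "this restriction was not supplied".
    """
    if orgs_on_network is None:
        return set(requested_orgs or ())
    if requested_orgs is None:
        return set(orgs_on_network)
    # Both supplied: an organisation must satisfy both conditions (AND).
    return set(orgs_on_network) & set(requested_orgs)


edinburgh_network_orgs = {"16480", "17100"}
print(select_orgs(edinburgh_network_orgs, {"16410"}))  # empty: MIT is not on that network
```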

type

  • The type of repository.
  • For more details on type see the documentation below on /cgi/list/type
  • Note: Our experience is that everything is in types 2 to 5, and other repository types are not represented.

content

  • The type of content the repository takes in.
  • For more details on content see the documentation below on /cgi/list/content
  • Note: Our experience is that no repositories are registered as restricting their content type to preprints only (code 2) or postprints only (code 3)

nest

  • Whether the data comes back nested by network & organisation, or is flattened down to a list of repositories, with the organisation & network data merged in.
  • [Default is flat format]
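As a rough sketch, the parameters above can be assembled into a query string like this. The helper name and its argument names are hypothetical; only the parameter names (format, ip, org, type, content, nest) and the base URL come from this post.

```python
from urllib.parse import urlencode

BASE_URL = "http://opendepot.org/api"  # taken from the example above


def build_api_url(ips=(), orgs=(), repo_type=None, content=None, nest=False, fmt=None):
    """Build an /api URL; ip and org may each be given several times."""
    params = []
    if fmt is not None:
        params.append(("format", fmt))
    params.extend(("ip", ip) for ip in ips)
    params.extend(("org", org) for org in orgs)
    if repo_type is not None:
        params.append(("type", repo_type))
    if content is not None:
        params.append(("content", content))
    if nest:
        params.append(("nest", 1))
    query = urlencode(params)  # percent-encodes the "/" in CIDR ranges
    return f"{BASE_URL}?{query}" if query else BASE_URL


print(build_api_url(ips=["130.128/13"], repo_type=2, content=1, nest=True, fmt="xml"))
```

Calling the helper with no arguments yields the bare /api URL, which leaves the service to deduce the locus from the calling client.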

Nested versus flat data

Given that the script first finds the organisations for a locus, and then the repositories that match the organisations, the initial data is actually hierarchical – with repositories nested within organisations.

The data can be returned in this nested format, however the default is to reformat the data and to return a list of repositories, each of which contains data stating the organisation & network association for that repository.
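The relationship between the two shapes can be sketched as a small flattening function. This is an assumption about how the merge could be done on the client side, not the server's code; the key names (orgs, repos, count and the sample fields) follow the examples and DTDs further down this post.

```python
def flatten(nets):
    """Turn nested nets -> orgs -> repos data into a flat list of repositories."""
    repos = []
    for net in nets:
        net_fields = {k: v for k, v in net.items() if k != "orgs"}
        for org in net.get("orgs", []):
            org_fields = {k: v for k, v in org.items() if k not in ("repos", "count")}
            for repo in org.get("repos", []):
                # Each repository carries its organisation and network details.
                repos.append({**repo, **org_fields, **net_fields})
    return repos


nested = [{
    "whois_id": "5918",
    "whois_name": "Edinburgh University",
    "orgs": [{
        "org_id": "16480",
        "org_name": "University of Edinburgh",
        "repos": [{"repo_id": "12066", "acronym": "ERA"}],
    }],
}]
print(flatten(nested))
```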

Data returns.

Data is returned in JSON, text, or XML format; the underlying structure is the same in each case.

Core format

The core format for a successful search is:

JSON:

{
  "to": "http://opendepot.org/api?nested=1",
  "status": "ok",
  "message": {
    "nets": [
      ……
    ]
  }
}

XML:

<opt>
  <to>http://opendepot.org/api?nested=1</to>
  <status>ok</status>
  <message>
    <nets>
      …….
    </nets>
  </message>
</opt>

The core format for an un-successful search is:

JSON:

{
  "to": "http://opendepot.org/api?ip=18.1.1.1.1",
  "status": "fail",
  "message": {
    "repos": [],
    "warn": "Invalid IP range (18.1.1.1.1)"
  }
}

XML:

<opt>
  <message>
    <warn>Invalid IP range (18.1.1.1.1)……</warn>
  </message>
  <status>fail</status>
  <to>http://opendepot.org/api?format=xml&ip=18.1.1.1.1</to>
</opt>

JSON:

{
  "to": "http://opendepot.org/api?format=xml&ip=129.215.169%2f16",
  "status": "fail",
  "message": {
    "repos": [],
    "warn": "Unable to deduce IP range (129.215.169/16)"
  }
}

XML:

<opt>
  <message>
    <warn>Unable to deduce IP range (129.215.169/16)</warn>
  </message>
  <status>fail</status>
  <to>http://opendepot.org/api?format=xml&ip=129.215.169%2f16</to>
</opt>

A successful search, with zero results will return:

JSON:

{
  "to": "http://opendepot.org/api?ip=130%2F13",
  "status": "ok",
  "message": {
    "repos": [],
    "warn": "No results Found"
  }
}

XML:

<opt>
  <message>
    <warn>No results Found</warn>
  </message>
  <status>ok</status>
  <to>http://opendepot.org/api?format=xml&ip=130%2F13</to>
</opt>
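A client could handle these three outcomes along the following lines. This is a minimal sketch for the JSON form of a flat response only; the exception class and function name are hypothetical, and the warning key is looked up case-insensitively purely as a precaution.

```python
import json


class JunctionError(Exception):
    """Raised when the response has a status other than "ok"."""


def repos_from_response(body):
    """Return the list of repositories from a flat-format JSON response."""
    doc = json.loads(body)
    message = doc.get("message") or {}
    # Find the warning text regardless of how the key is capitalised.
    warn = next((v for k, v in message.items() if k.lower() == "warn"), None)
    if doc.get("status") != "ok":
        raise JunctionError(warn or "request failed")
    # A successful search with zero results still has status "ok".
    return message.get("repos", [])


failed = '{"to": "http://opendepot.org/api?ip=18.1.1.1.1", "status": "fail", "message": {"repos": [], "warn": "Invalid IP range (18.1.1.1.1)"}}'
try:
    repos_from_response(failed)
except JunctionError as err:
    print("search failed:", err)
```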

Results

Flat format

JSON:

…..
"message": {
  "repos": [
    { ….. },
    {
      "whois_id": "5918",
      "org_name": "University of Edinburgh",
      "content": [
        "Research papers (pre- and postprints)",
        "Conference and workshop papers",
        "Unpublished reports and working papers",
        "Books chapters and sections",
        "Datasets"
      ],
      "url": "http://www.era.lib.ed.ac.uk/",
      "repo_id": "12066",
      "remarks": "Partners: SHERPA",
      "acronym": "ERA",
      "language": [
        "English"
      ],
      "name": "Edinburgh Research Archive",
      "org_url": "http://www.ed.ac.uk",
      "description": "University repository across all disciplines. Registered users can set up email alerts to notify them of newly added relevant content.",
      "inetnum": "129.215.0.0 – 129.215.255.255",
      "org_id": "16480",
      "comment": "ERA is a digital repository of research produced at The University of Edinburgh. Here we present a selection of our best research including full-text digital Theses and Dissertations, book chapters, working papers, technical reports, journal pre-prints and peer-reviewed journal reprints.",
      "namepreferred": 1,
      "whois_name": null
    },
    { …… }
  ]
}
}

XML:

<message>
<repos>
</repos>
<repos>
<name>Edinburgh Research Archive</name>
<acronym>ERA</acronym>
<comment>ERA is a digital repository of research produced at …….</comment>
<content>Research papers (pre- and postprints)</content>
<content>Conference and workshop papers</content>
<content>Unpublished reports and working papers</content>
<content>Books   chapters and sections</content>
<content>Datasets</content>
<description>University repository across all disciplines…..</description>
<inetnum>129.215.0.0 – 129.215.255.255</inetnum>
<language>English</language>
<namepreferred>1</namepreferred>
<org_id>16480</org_id>
<org_name>University of Edinburgh</org_name>
<org_url>http://www.ed.ac.uk</org_url>
<remarks>Partners: SHERPA</remarks>
<repo_id>12066</repo_id>
<url>http://www.era.lib.ed.ac.uk/</url>
<whois_id>5918</whois_id>
<whois_name></whois_name>
</repos>
<repos>
</repos>
</message>

Nested format

JSON:

"message": {
  "nets": [
    {
      "orgs": [
        {
          "org_id": "16480",
          "repos": [
            {
              "language": [
                "English"
              ],
              "acronym": "ERA",
              "name": "Edinburgh Research Archive",
              "content": [
                "Research papers (pre- and postprints)",
                "Conference and workshop papers",
                "Unpublished reports and working papers",
                "Books   chapters and sections",
                "Datasets"
              ],
              "description": "University repository across all disciplines……",
              "comment": "ERA is a digital repository of research ……",
              "url": "http://www.era.lib.ed.ac.uk/",
              "repo_id": "12066",
              "namepreferred": 1,
              "remarks": "Partners: SHERPA"
            },
            {}
          ],
          "org_name": "University of Edinburgh",
          "org_url": "http://www.ed.ac.uk"
        }
      ],
      "whois_id": "5918",
      "inetnum": "129.215.0.0 – 129.215.255.255",
      "whois_name": "Edinburgh University"
    }
  ]
}

XML:

<message>
<nets>
<inetnum>129.215.0.0 – 129.215.255.255</inetnum>
<orgs>
<org_id>17100</org_id>
<org_name>Science and Technology Facilities Council</org_name>
<org_url>http://www.ngs.ac.uk/</org_url>
<repos>
</repos>
</orgs>
<orgs>
<org_id>16480</org_id>
<org_name>University of Edinburgh</org_name>
<org_url>http://www.ed.ac.uk</org_url>
<repos>
<name>Edinburgh Research Archive</name>
<acronym>ERA</acronym>
<comment>ERA is a digital repository of research…..</comment>
<content>Research papers (pre- and postprints)</content>
<content>Conference and workshop papers</content>
<content>Unpublished reports and working papers</content>
<content>Books   chapters and sections</content>
<content>Datasets</content>
<description>University repository across all disciplines…..</description>
<language>English</language>
<namepreferred>1</namepreferred>
<remarks>Partners: SHERPA</remarks>
<repo_id>12066</repo_id>
<url>http://www.era.lib.ed.ac.uk/</url>
</repos>
<repos>
</repos>
</orgs>
<whois_id>5918</whois_id>
<whois_name>Edinburgh University</whois_name>
</nets>
</message>

DTD

For “flat” data:

<!ELEMENT opt (to, status, message) >
<!ELEMENT message ([repos|warn]) >
<!ELEMENT repos (repo_id, name?, acronym?, namepreferred?, description?, remarks?,
                 comment?, url?, sword?, content*, types*, language*, whois_id, inetnum?,
                 whois_name?, org_id, org_name?, org_url? ) >

<!ELEMENT status  ('ok'|'fail') >

<!ELEMENT inetnum  (inet-inet) >

<!ELEMENT to       ("url") >
<!ELEMENT org_url  ("url") >
<!ELEMENT url      ("url") >
<!ELEMENT sword    ("url") >

<!ELEMENT inet     ("IP Number") >

<!ELEMENT count    ("digit") >
<!ELEMENT whois_id ("digit") >
<!ELEMENT org_id   ("digit") >
<!ELEMENT repo_id  ("digit") >

<!ELEMENT warn        (#PCDATA) >
<!ELEMENT whois_name  (#PCDATA) >
<!ELEMENT org_name    (#PCDATA) >
<!ELEMENT name        (#PCDATA) >
<!ELEMENT acronym     (#PCDATA) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT remarks     (#PCDATA) >
<!ELEMENT comment     (#PCDATA) >
<!ELEMENT content     (#PCDATA) >
<!ELEMENT language    (#PCDATA) >
<!ELEMENT types       (#PCDATA) >
<!ELEMENT namepreferred : [1|0] >

For nested data:

<!ELEMENT opt (to, status, message) >
<!ELEMENT message ([nets|warn]) >
<!ELEMENT nets (whois_id, inetnum?, whois_name?, orgs?) >
<!ELEMENT orgs (org_id, org_name?, org_url?, repos?, count?) >
<!ELEMENT repos (repo_id, name?, acronym?, namepreferred?, description?, remarks?,
                 comment?, url?, sword?, content*, types*, language*) >

<!ELEMENT status  ('ok'|'fail') >

<!ELEMENT inetnum  (inet-inet) >

<!ELEMENT to       ("url") >
<!ELEMENT org_url  ("url") >
<!ELEMENT url      ("url") >
<!ELEMENT sword    ("url") >

<!ELEMENT inet     ("IP Number") >

<!ELEMENT count    ("digit") >
<!ELEMENT whois_id ("digit") >
<!ELEMENT org_id   ("digit") >
<!ELEMENT repo_id  ("digit") >

<!ELEMENT warn        (#PCDATA) >
<!ELEMENT whois_name  (#PCDATA) >
<!ELEMENT org_name    (#PCDATA) >
<!ELEMENT name        (#PCDATA) >
<!ELEMENT acronym     (#PCDATA) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT remarks     (#PCDATA) >
<!ELEMENT comment     (#PCDATA) >
<!ELEMENT content     (#PCDATA) >
<!ELEMENT language    (#PCDATA) >
<!ELEMENT types       (#PCDATA) >
<!ELEMENT namepreferred : [1|0] >

/cgi/list/*

Each of the six calls in this suite returns all the items in its category, as known to the database.

If a parameter “full” is supplied, with a value, then the list will return not just the items in that category, but also all repositories that are associated with each item (i.e. all repositories noted as having a language of “fr” will be listed under the French item).

/cgi/list/type

This lists the type (or classification) of repository.

Code Meaning
1 Undetermined – Repositories whose type has not yet been assessed
2 Institutional (Institutional or departmental repositories)
3 Disciplinary (Cross-institutional subject repositories)
4 Aggregating (Archives aggregating data from several subsidiary repositories)
5 Governmental (Repositories for governmental data)
6 Subject (Research Cross-Institutional)
7 Journal (e-Journal/Publication)
8 Thesis
9 Database (Database/A&I Index)
10 Learning (Learning and Teaching Objects)
11 Other
12 Demonstration

When a repository type is needed by /api, it is the code number you need.
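For client code, the table above could be captured as an enumeration so that the numeric code is never typed by hand. This is only a convenience sketch; the member names and the helper are hypothetical, while the numbers and meanings are those listed above.

```python
from enum import IntEnum
from urllib.parse import urlencode


class RepositoryType(IntEnum):
    UNDETERMINED = 1
    INSTITUTIONAL = 2
    DISCIPLINARY = 3
    AGGREGATING = 4
    GOVERNMENTAL = 5
    SUBJECT = 6
    JOURNAL = 7
    THESIS = 8
    DATABASE = 9
    LEARNING = 10
    OTHER = 11
    DEMONSTRATION = 12


def type_query(repo_type):
    """Return the "type=N" query fragment; rejects codes outside the table."""
    return urlencode([("type", int(RepositoryType(repo_type)))])


print(type_query(RepositoryType.INSTITUTIONAL))
```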

/cgi/list/content

This lists the type of content that repositories accept

Code Meaning
1 Research papers (pre- and postprints)
2 Research papers (preprints only)
3 Research papers (postprints only)
4 Bibliographic references
5 Conference and workshop papers
6 Theses and dissertations
7 Unpublished reports and working papers
8 Books & chapters and sections
9 Datasets
10 Learning Objects
11 Multimedia and audio-visual materials
12 Software
13 Patents
14 Other special item types

When a content type is needed by /api, it is the code number you need.

cgi/list/country

This lists all the countries known

cgi/list/lang

This lists all the languages known

cgi/list/org

Lists all the known organisations.

When an “org” parameter is passed to the /api, it is the code number you need.

cgi/list/net

Lists all the known networks.

Note that this call is slightly different from the other 5, in that the “full” parameter will return a list of networks, and the orgs that are associated with them (not repositories!)
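A small sketch of how the six list calls might be addressed from Python follows. The host is assumed to be the same opendepot.org used in the /api examples, and the helper name is hypothetical; the category names, the “full” parameter and the “format” parameter come from this post.

```python
from urllib.parse import urlencode

LIST_BASE = "http://opendepot.org/cgi/list"  # assumed: same host as the /api examples
LIST_CATEGORIES = ("type", "content", "country", "lang", "org", "net")


def build_list_url(category, full=False, fmt=None):
    """Build a /cgi/list/* URL, optionally asking for the "full" listing."""
    if category not in LIST_CATEGORIES:
        raise ValueError(f"unknown list category: {category}")
    params = []
    if full:
        params.append(("full", 1))  # any value switches the full listing on
    if fmt is not None:
        params.append(("format", fmt))
    url = f"{LIST_BASE}/{category}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


print(build_list_url("lang", full=True, fmt="json"))
```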

/cgi/get_*

This suite of three calls was originally built to serve AJAX calls within the OpenDepot.org interface.

The three calls use the same set of parameters:

  • q is the mandatory query field – the text that the search is based on
  • field is an optional parameter, and can be used to restrict which field you wish to search in
  • format defines how the data is to be returned:
    • prototype is the format used by EPrints, and other Scriptaculous-like systems
    • json is the base jQuery format
    • text is a screen-readable text format that can be parsed by some means
    • [json is the default]

get_orgs & get_repos

This lists all organisations or repositories (as appropriate) that match the query in either the name or the url field

cgi/get_orgs?q=ed.ac
cgi/get_orgs?q=ed.ac&field=url
cgi/get_orgs?q=edinb
cgi/get_orgs?q=edinb&field=name
cgi/get_repos?q=ed.ac
cgi/get_repos?q=ed.ac&field=url
cgi/get_repos?q=edinb&format=text
cgi/get_repos?q=edinb&field=name&format=xml

get_nets

This lists all networks that match the query in either the name or the ip field

cgi/get_nets?q=129.215
cgi/get_nets?q=129.215&field=ip
cgi/get_nets?q=edinb
cgi/get_nets?q=edinb&field=name
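The example calls above can be generated with a helper like the one below. It is a sketch under two assumptions: the host is the same opendepot.org used for /api, and the searchable fields are exactly the ones named in the descriptions above. The function and constant names are hypothetical.

```python
from urllib.parse import urlencode

CGI_BASE = "http://opendepot.org/cgi"  # assumed: same host as the /api examples

# Fields each call can be restricted to, as described above.
SEARCH_FIELDS = {
    "get_orgs": ("name", "url"),
    "get_repos": ("name", "url"),
    "get_nets": ("name", "ip"),
}


def build_get_url(call, q, field=None, fmt=None):
    """Build a /cgi/get_* URL; q is mandatory, field and format are optional."""
    if call not in SEARCH_FIELDS:
        raise ValueError(f"unknown call: {call}")
    if not q:
        raise ValueError("q is mandatory")
    if field is not None and field not in SEARCH_FIELDS[call]:
        raise ValueError(f"{call} cannot search on field {field!r}")
    params = [("q", q)]
    if field is not None:
        params.append(("field", field))
    if fmt is not None:
        params.append(("format", fmt))
    return f"{CGI_BASE}/{call}?{urlencode(params)}"


print(build_get_url("get_nets", "129.215", field="ip"))
```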

Posted by: Theo | February 17, 2011