The Open Access Repository Junction “Discovery” tool has a suite of three APIs for interacting with the data:
- /api
- cgi/list/type, cgi/list/content, cgi/list/country, cgi/list/lang, cgi/list/org, cgi/list/net
- cgi/get_orgs, cgi/get_repos, cgi/get_nets
All calls interact with the same dataset, and all can return data in several formats. The return format is set with the “format” parameter:
format
- The format the data is returned in: Currently xml, text or json
- [default is json]
Note: if using a machine-to-machine call (such as Perl’s LWP::UserAgent) you can also define the format in an “Accept” header:
HTTP GET /api
Accept: application/xml;q=1.0
where the “Accept” types are:
application/xml
application/json
plain/text
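The two ways of choosing a format can be sketched in Python with the standard library alone. This is a minimal illustration, not the service's own client: the helper name `build_request` is hypothetical, the host is taken from the example URL later in this document, and the header name used is the standard HTTP `Accept`. Nothing is sent over the network; the function only builds the request object.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://opendepot.org/api"  # host taken from the example URL in this document

# Only the two types whose MIME names are unambiguous are mapped here.
ACCEPT_TYPES = {"xml": "application/xml", "json": "application/json"}


def build_request(fmt="json", use_header=False):
    """Return an unsent Request that asks for the given format (hypothetical helper)."""
    if use_header:
        # Header route: the URL stays bare and the format travels in Accept.
        return Request(API_URL, headers={"Accept": ACCEPT_TYPES[fmt]})
    if fmt not in ("xml", "text", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    # Parameter route: the format is part of the query string.
    return Request(API_URL + "?" + urlencode({"format": fmt}))
```

Passing the result to `urllib.request.urlopen` would perform the call; the text format is only reachable through the parameter route in this sketch.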
The scripts
/api
This is the primary point of interaction for the dataset. /api is a single endpoint that can respond in a number of ways, but it always tries to find the repositories that relate to organisations.
/api can be given a specific locus to deduce repositories, be it an IP address or an ID code to specify the organisation, or it will deduce a locus based on the calling client. The script can be asked to restrict the returned list by repository type or accepted content.
/cgi/list/*
This suite of functions lists what OA-RJ knows for particular categories.
/cgi/get_*
A suite of three calls developed for internal AJAX functions within opendepot.org, but made available for general use.
The scripts in detail
/api
The api works by finding all the organisations within a locus, and then the repositories that match those organisations.
- If an org_id is given, then we simply return the data for that organisation (or those organisations, if multiple org parameters are given).
- If the search is based on IP, then multiple organisations can be associated with the network range, each of which can have multiple appropriate repositories.
- If neither is defined, then api will do an IP-based search, based on the IP of the client calling the script.
use:
http://opendepot.org/api?format=xml&ip=130.128/13&type=2&content=1&nest=1
Parameters
ip – The IP(s) to find repositories for.
- If unspecified, the IP is set to that of the system making the call.
- Can have multiple ip parameters (ip=152.78.118.51&ip=129.215.169.88)
- Can have CIDR ranges, such as 129.215/16 (note: it will tell you if you get the CIDR wrong)
- Does not do ranges (129.215.169.0-129.215.169.128)
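As a rough client-side check of the three rules above, the following Python sketch accepts single addresses and CIDR blocks and rejects dash ranges. The function name `parse_ip_param` is hypothetical, and padding the shorthand 129.215/16 with zero octets is an assumption about how the service reads it; only IPv4 is covered.

```python
import ipaddress


def parse_ip_param(value):
    """Validate one ip parameter value and return it as an IPv4 network."""
    if "-" in value:
        raise ValueError("dash ranges are not supported; use CIDR notation")
    address, _, prefix = value.partition("/")
    octets = address.split(".")
    # Assumption: shorthand such as 129.215/16 means 129.215.0.0/16.
    octets += ["0"] * (4 - len(octets))
    text = ".".join(octets) + ("/" + prefix if prefix else "")
    # strict=True raises ValueError when host bits are set, i.e. a wrong CIDR.
    return ipaddress.ip_network(text, strict=True)
```

A bare address comes back as a /32 network, so callers can treat every accepted value the same way.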
Note that IP searching is complicated. The lookup proceeds in three steps:
- First try to find a network that contains [any of] the IP[s] given.
- If that fails, find any networks that are contained by the IP given (basically for CIDR ranges)
- Finally, look for networks that overlap the upper or lower bounds of the IP range given.
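The three steps can be sketched over inclusive integer address bounds, which is how whois-style inetnum ranges such as “129.215.0.0 – 129.215.255.255” are naturally compared. This is an illustration of the described order, not the service's actual code: the names `inetnum`, `query_bounds` and `find_networks` are hypothetical, and the query must be a single IP or a full four-octet CIDR block.

```python
import ipaddress


def inetnum(start, end):
    """Inclusive integer bounds for a start/end address pair."""
    return int(ipaddress.ip_address(start)), int(ipaddress.ip_address(end))


def query_bounds(value):
    """Inclusive integer bounds for a single IP or a CIDR block."""
    net = ipaddress.ip_network(value)
    return int(net.network_address), int(net.broadcast_address)


def find_networks(query, known):
    """Try each step in turn; return the names matched by the first step that finds any."""
    lo, hi = query_bounds(query)
    steps = (
        lambda a, b: a <= lo and hi <= b,  # 1: the network contains the query
        lambda a, b: lo <= a and b <= hi,  # 2: the query contains the network
        lambda a, b: a <= hi and lo <= b,  # 3: the network overlaps a bound of the query
    )
    for matches in steps:
        found = [name for name, (a, b) in known.items() if matches(a, b)]
        if found:
            return found
    return []
```

Later steps only run when earlier ones find nothing, so a query that sits inside a known network never falls through to the looser overlap test.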
org – The organisation to find repositories for.
- The value needs to be the OA-RJ code for that organisation (which can be found using the cgi/list/org call)
NOTE: IP addresses and org_id searches are AND’d, not OR’d
- a search for ip=129.215/16&org=16410 will produce 0 (zero) results: Org_id 16410 is MIT, so you are asking for any MIT repositories associated with the Edinburgh University network.
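One way to picture the AND behaviour is as a set intersection between the organisations found for the IP and the organisations asked for by code. The sketch below is only an illustration with a hypothetical function name; org code 16480 for the University of Edinburgh is taken from the nested example later in this document, and 16410 is the MIT code mentioned above.

```python
def select_orgs(orgs_for_ip, requested_orgs):
    """Keep only organisations that satisfy both the ip and the org filter."""
    return set(orgs_for_ip) & set(requested_orgs)


# ip=129.215/16 resolves to Edinburgh (16480); org=16410 asks for MIT.
edinburgh_only = {"16480"}
print(select_orgs(edinburgh_only, {"16410"}))
```

The intersection is empty because no organisation is in both sets, which is why the combined query returns zero results.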
type
- The type of repository.
- For more details on type see the documentation below on /cgi/list/type
- Note: Our experience is that everything is in types 2 to 5, and other repository types are not represented.
content
- The type of content the repository takes in.
- For more details on content see the documentation below on /cgi/list/content
- Note: Our experience is that no repositories are registered as restricting their content type to preprints only (code 2) or postprints only (code 3).
nest
- Whether the data comes back nested by network & organisation, or is flattened down to a list of repositories, with the organisation & network data merged in.
- [Default is flat format]
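The parameters above combine into a query string. A minimal Python sketch of a URL builder follows; the function name `api_url` and its argument names are hypothetical, and the host is the one used in the example URL above. Repeated ip and org values are emitted as repeated parameters.

```python
from urllib.parse import urlencode


def api_url(ips=(), orgs=(), repo_type=None, content=None, nest=False,
            fmt=None, base="http://opendepot.org/api"):
    """Build an /api URL from the documented parameters."""
    params = []
    if fmt:
        params.append(("format", fmt))
    params += [("ip", ip) for ip in ips]        # one ip= per address or CIDR block
    params += [("org", org) for org in orgs]    # one org= per OA-RJ organisation code
    if repo_type is not None:
        params.append(("type", repo_type))
    if content is not None:
        params.append(("content", content))
    if nest:
        params.append(("nest", 1))              # omit for the default flat format
    # safe="/" keeps CIDR values such as 130.128/13 readable in the URL.
    query = urlencode(params, safe="/")
    return base + ("?" + query if query else "")
```

Calling `api_url(ips=["130.128/13"], repo_type=2, content=1, nest=True, fmt="xml")` reproduces the example URL given earlier.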
Nested versus flat data
Given that the script first finds the organisations for a locus, and then the repositories that match the organisations, the initial data is actually hierarchical – with repositories nested within organisations.
The data can be returned in this nested format; however, the default is to reformat the data and return a list of repositories, each of which contains data stating the organisation & network association for that repository.
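The relationship between the two shapes can be sketched as a small flattening routine. This is an assumption about how the merge could be done, not the service's own code; the function name `flatten` is hypothetical, and the key names (nets, orgs, repos and the fields inside them) follow the nested example later in this document.

```python
def flatten(nets):
    """Turn nets -> orgs -> repos into a flat list of repository records."""
    flat = []
    for net in nets:
        # Network-level fields, minus the nested organisations.
        net_fields = {k: v for k, v in net.items() if k != "orgs"}
        for org in net.get("orgs", []):
            # Organisation-level fields, minus the nested repositories.
            org_fields = {k: v for k, v in org.items() if k != "repos"}
            for repo in org.get("repos", []):
                # Each repository carries a copy of its org and network data.
                flat.append({**repo, **org_fields, **net_fields})
    return flat
```

A repository that belongs to several organisations or networks appears once per association in the flat list.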
Data returns
Data is returned in JSON, text, or XML format; the underlying structure is consistent across all three.
Core format
The core format for a successful search is:
JSON | XML |
---|---|
|
|
The core format for an unsuccessful search is:
JSON | XML |
---|---|
|
|
|
|
A successful search with zero results will return:
JSON | XML |
---|---|
|
|
Results
Flat format
JSON | XML |
---|---|
|
|
Nested format
JSON:
    "message": {
      "nets": [
        {
          "orgs": [
            {
              "org_id": "16480",
              "repos": [
                {
                  "language": [ "English" ],
                  "acronym": "ERA",
                  "name": "Edinburgh Research Archive",
                  "content": [
                    "Research papers (pre- and postprints)",
                    "Conference and workshop papers",
                    "Unpublished reports and working papers",
                    "Books chapters and sections",
                    "Datasets"
                  ],
                  "description": "University repository across all disciplines……",
                  "comment": "ERA is a digital repository of research ……",
                  "url": "http://www.era.lib.ed.ac.uk/",
                  "repo_id": "12066",
                  "namepreferred": 1,
                  "remarks": "Partners: SHERPA"
                },
                {},
              ],
              "org_name": "University of Edinburgh",
              "org_url": "http://www.ed.ac.uk"
            }
          ],
          "whois_id": "5918",
          "inetnum": "129.215.0.0 – 129.215.255.255",
          "whois_name": "Edinburgh University"
        }
      ]
    }
DTD
For “flat” data:
<!ELEMENT opt (to, status, message) >
<!ELEMENT message ([repos|warn]) >
<!ELEMENT repos (repo_id, name?, acronym?, namepreferred?, description?, remarks?, comment?, url?, sword?, content*, types*, language*, whois_id, inetnum?, whois_name?, org_id, org_name?, org_url? ) >
<!ELEMENT status ('ok'|'fail') >
<!ELEMENT inetnum (inet-inet) >
<!ELEMENT to ("url") >
<!ELEMENT org_url ("url") >
<!ELEMENT url ("url") >
<!ELEMENT sword ("url") >
<!ELEMENT inet ("IP Number") >
<!ELEMENT count ("digit") >
<!ELEMENT whois_id ("digit") >
<!ELEMENT org_id ("digit") >
<!ELEMENT repo_id ("digit") >
<!ELEMENT warn (#PCDATA) >
<!ELEMENT whois_name (#PCDATA) >
<!ELEMENT org_name (#PCDATA) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT acronym (#PCDATA) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT remarks (#PCDATA) >
<!ELEMENT comment (#PCDATA) >
<!ELEMENT content (#PCDATA) >
<!ELEMENT language (#PCDATA) >
<!ELEMENT types (#PCDATA) >
<!ELEMENT namepreferred : [1|0] >
For nested data:
<!ELEMENT opt (to, status, message) >
<!ELEMENT message ([nets|warn]) >
<!ELEMENT nets (whois_id, inetnum?, whois_name?, orgs?) >
<!ELEMENT orgs (org_id, org_name?, org_url?, repos?, count?) >
<!ELEMENT repos (repo_id, name?, acronym?, namepreferred?, description?, remarks?, comment?, url?, sword?, content*, types*, language*) >
<!ELEMENT status ('ok'|'fail') >
<!ELEMENT inetnum (inet-inet) >
<!ELEMENT to ("url") >
<!ELEMENT org_url ("url") >
<!ELEMENT url ("url") >
<!ELEMENT sword ("url") >
<!ELEMENT inet ("IP Number") >
<!ELEMENT count ("digit") >
<!ELEMENT whois_id ("digit") >
<!ELEMENT org_id ("digit") >
<!ELEMENT repo_id ("digit") >
<!ELEMENT warn (#PCDATA) >
<!ELEMENT whois_name (#PCDATA) >
<!ELEMENT org_name (#PCDATA) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT acronym (#PCDATA) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT remarks (#PCDATA) >
<!ELEMENT comment (#PCDATA) >
<!ELEMENT content (#PCDATA) >
<!ELEMENT language (#PCDATA) >
<!ELEMENT types (#PCDATA) >
<!ELEMENT namepreferred : [1|0] >
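Read together, the two definitions describe an envelope of to, status and message, where status is 'ok' or 'fail' and message holds either results or a warn text. A client could unwrap a JSON response along those lines. The sketch below assumes the JSON keys mirror the element names, which the nested JSON example suggests but this document does not state outright; the function name `unwrap` is hypothetical.

```python
import json


def unwrap(body, key="repos"):
    """Return the result list from a JSON response, or raise if the search failed."""
    data = json.loads(body)
    message = data.get("message")
    if data.get("status") != "ok":
        # Assumption: a failed search carries its explanation under "warn".
        warn = message.get("warn") if isinstance(message, dict) else message
        raise RuntimeError(warn or "search failed")
    if not isinstance(message, dict):
        return []
    # key is "repos" for flat data and "nets" for nested data.
    return message.get(key, [])
```

Treating a missing result list as empty keeps a successful search with zero results from looking like an error.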
/cgi/list/*
This suite of six calls returns all items of that type, as known to the database.
If a “full” parameter is supplied with any value, the list returns not just the items of that type but also all repositories associated with each item (for example, all repositories noted as having a language of “fr” will be listed under the French item).
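A small Python sketch of building these list URLs follows. The function name `list_url` is hypothetical, serving the calls from the opendepot.org host is an assumption carried over from the /api example, and full=1 is an arbitrary choice since any value switches the full listing on.

```python
from urllib.parse import urlencode

CATEGORIES = ("type", "content", "country", "lang", "org", "net")


def list_url(category, full=False, fmt=None, base="http://opendepot.org"):
    """Build a /cgi/list/* URL for one of the six categories."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    params = {}
    if fmt:
        params["format"] = fmt
    if full:
        params["full"] = 1  # any value works; 1 is just a convention here
    query = urlencode(params)
    return f"{base}/cgi/list/{category}" + (f"?{query}" if query else "")
```

For instance, `list_url("lang", full=True)` asks for every language together with the repositories recorded against it.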
/cgi/list/type
This lists the type (or classification) of repository.
Code | Meaning |
---|---|
1 | Undetermined – Repositories whose type has not yet been assessed |
2 | Institutional (Institutional or departmental repositories) |
3 | Disciplinary (Cross-institutional subject repositories) |
4 | Aggregating (Archives aggregating data from several subsidiary repositories) |
5 | Governmental (Repositories for governmental data) |
6 | Subject (Research Cross-Institutional) |
7 | Journal (e-Journal/Publication) |
8 | Thesis |
9 | Database (Database/A&I Index) |
10 | Learning (Learning and Teaching Objects) |
11 | Other |
12 | Demonstration |
When a repository type is needed by /api, it is the code number you need.
/cgi/list/content
This lists the type of content that repositories accept
Code | Meaning |
---|---|
1 | Research papers (pre- and postprints) |
2 | Research papers (preprints only) |
3 | Research papers (postprints only) |
4 | Bibliographic references |
5 | Conference and workshop papers |
6 | Theses and dissertations |
7 | Unpublished reports and working papers |
8 | Books & chapters and sections |
9 | Datasets |
10 | Learning Objects |
11 | Multimedia and audio-visual materials |
12 | Software |
13 | Patents |
14 | Other special item types |
When a content type is needed by /api, it is the code number you need.
/cgi/list/country
This lists all the known countries.
/cgi/list/lang
This lists all the known languages.
/cgi/list/org
Lists all the known organisations.
When an “org” parameter is passed to the /api, it is the code number you need.
/cgi/list/net
Lists all the known networks.
Note that this call is slightly different from the other five, in that the “full” parameter will return a list of networks and the orgs that are associated with them (not repositories!).
/cgi/get_*
This suite of three calls was originally built to serve AJAX calls within the OpenDepot.org interface.
The three calls use the same set of parameters:
- q is the mandatory query field: the text that the search is based on
- field is an optional parameter, and can be used to restrict which field you wish to search in
- format defines how the data is to be returned:
  - prototype is the format used by EPrints, and other Scriptaculous-like systems
  - json is the base jQuery format
  - text is a screen-readable text format that can be parsed by some means
  - [json is the default]
get_orgs & get_repos
This lists all organisations or repositories (as appropriate) that match the query in either the name or url field.
cgi/get_orgs?q=ed.ac
cgi/get_orgs?q=ed.ac&field=url
cgi/get_orgs?q=edinb
cgi/get_orgs?q=edinb&field=name
cgi/get_repos?q=ed.ac
cgi/get_repos?q=ed.ac&field=url
cgi/get_repos?q=edinb&format=text
cgi/get_repos?q=edinb&field=name&format=xml
get_nets
This lists all networks that match the query in either the name or ip field.
cgi/get_nets?q=129.215
cgi/get_nets?q=129.215&field=ip
cgi/get_nets?q=edinb
cgi/get_nets?q=edinb&field=name
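The example calls above follow a regular pattern, which a short Python helper can capture. The function name `get_url` is hypothetical and the opendepot.org host is an assumption; the permitted field values per call come from the descriptions above, and the format value is passed through unchecked.

```python
from urllib.parse import urlencode

# Searchable fields for each call, as described above.
FIELDS = {
    "get_orgs": ("name", "url"),
    "get_repos": ("name", "url"),
    "get_nets": ("name", "ip"),
}


def get_url(call, q, field=None, fmt=None, base="http://opendepot.org"):
    """Build a /cgi/get_* URL; q is mandatory, field and format are optional."""
    if call not in FIELDS:
        raise ValueError(f"unknown call: {call}")
    if not q:
        raise ValueError("q is mandatory")
    if field is not None and field not in FIELDS[call]:
        raise ValueError(f"{call} cannot search field {field!r}")
    params = {"q": q}
    if field is not None:
        params["field"] = field
    if fmt is not None:
        params["format"] = fmt
    return f"{base}/cgi/{call}?{urlencode(params)}"
```

Rejecting an unknown field on the client side is a design choice of this sketch; the service's own behaviour for such a request is not described here.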