The Open Access Repository Junction “Discovery” tool has a suite of three APIs for interacting with the data:
- /api
- cgi/list/type, cgi/list/content, cgi/list/country, cgi/list/lang, cgi/list/org, cgi/list/net
- cgi/get_orgs, cgi/get_repos, cgi/get_nets
All the calls interact with the same dataset, and all calls can return data in a number of formats. This returned format is defined with the “format” parameter:
- The format the data is returned in: Currently xml, text or json
- [default is json]
Note – if using a machine-to-machine call (such as Perl’s LWP::UserAgent) you can also define the format in an “Accepts” header:
HTTP GET /api
Accepts: application/xml, q=1.0
where the “Accepts” types are:
/api is the primary point of interaction for the dataset. It is a single entry point that can respond in a number of ways; however, it will always try to find the repositories that relate to organisations.
/api can be given a specific locus to deduce repositories, be it an IP address or an ID code to specify the organisation, or it will deduce a locus based on the calling client. The script can be asked to restrict the returned list by repository type or accepted content.
The cgi/list/* suite of functions lists what OA-RJ knows for particular categories.
The cgi/get_* calls are a suite of three calls developed for internal AJAX functions within opendepot.org, but made available for general use.
The scripts in detail
The API works by finding all the organisations within a locus, and then the repositories that match those organisations.
- If an org_id is given, then we simply return the data for that organisation (or those organisations, if multiple org parameters are given.)
- If the search is based on IP, then multiple organisations can be associated with the network range, each of which can have multiple appropriate repositories.
- If neither is defined, then /api will do an IP-based search, based on the IP of the client calling the script.
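The three rules above can be sketched as a small helper that decides which locus a request ends up using. This is only an illustration of the described behaviour, not the service's own code; the function name `resolve_locus` and the shape of its return value are invented for the example.

```python
def resolve_locus(org_ids=(), ips=(), client_ip=None):
    """Pick the search locus: explicit org ids and/or IPs, else the caller's IP."""
    locus = {}
    if org_ids:
        locus["org"] = list(org_ids)
    if ips:
        locus["ip"] = list(ips)
    if not locus:
        # Neither parameter was supplied: fall back to the calling client's IP.
        if client_ip is None:
            raise ValueError("no org, no ip and no client address available")
        locus["ip"] = [client_ip]
    return locus


print(resolve_locus(org_ids=["16410"]))
print(resolve_locus(client_ip="192.0.2.7"))
```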
ip – The IP(s) to find repositories for.
- If unspecified, the IP is set to that of the system making the call.
- Can have multiple ip parameters (
- Can have CIDR ranges
129.215/16 (note: it will tell you if you get the CIDR wrong)
- Does not do ranges (
Note that IP searching is… complicated:
- First try to find a network that contains [any of] the IP[s] given.
- If that fails, find any networks that are contained by the IP given (basically for CIDR ranges)
- Finally, look for networks that overlap the upper or lower bounds of the IP range given.
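The three-step fallback above can be sketched with Python's standard `ipaddress` module. This is a guess at the logic from the description, not the service's actual implementation: the network names and address ranges are made up, networks are modelled as inclusive start and end integers, and the query must be written in full dotted form (the module does not accept the `129.215/16` shorthand).

```python
import ipaddress


def span(first, last):
    """Turn two dotted-quad strings into an inclusive (start, end) integer pair."""
    return int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last))


def query_span(text):
    """Accept a single address or a full CIDR block such as '10.2.0.0/16'."""
    net = ipaddress.ip_network(text, strict=False)
    return int(net.network_address), int(net.broadcast_address)


def find_networks(query, known):
    lo, hi = query_span(query)
    # Step 1: networks that contain the whole query.
    hits = [n for n, (s, e) in known.items() if s <= lo and hi <= e]
    if hits:
        return hits
    # Step 2: networks that lie entirely inside the query (CIDR searches).
    hits = [n for n, (s, e) in known.items() if lo <= s and e <= hi]
    if hits:
        return hits
    # Step 3: networks that overlap the lower or upper bound of the query.
    return [n for n, (s, e) in known.items() if s <= lo <= e or s <= hi <= e]


# Made-up networks for illustration only.
KNOWN = {
    "campus": span("10.1.0.0", "10.1.255.255"),
    "lab": span("10.2.0.0", "10.2.0.127"),
    "odd": span("10.3.0.100", "10.3.1.50"),
}

print(find_networks("10.1.2.3", KNOWN))
print(find_networks("10.2.0.0/16", KNOWN))
print(find_networks("10.3.0.0/24", KNOWN))
```

Each step only runs when the previous one found nothing, which mirrors the "first", "if that fails" and "finally" wording of the list.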
org – The organisation to find repositories for.
- The value needs to be the OA-RJ code for that organisation (which can be found using the cgi/list/org call)
NOTE: IP addresses and org_id searches are AND’d, not OR’d
- a search for ip=129.215/16&org=16410 will produce 0 (zero) results: Org_id 16410 is MIT, so you are asking for any MIT repositories associated with the Edinburgh University network.
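Since both `ip` and `org` may be repeated and the two filters are AND'd, a query string for /api could be assembled as in the sketch below. The helper name `api_query` is invented for the example, and the code only builds the string; it makes no request.

```python
from urllib.parse import urlencode


def api_query(ips=(), orgs=()):
    """Build a query string with one ip= or org= pair per value."""
    pairs = [("ip", ip) for ip in ips] + [("org", org) for org in orgs]
    # safe="/" keeps CIDR notation readable instead of encoding it as %2F.
    return urlencode(pairs, safe="/")


# The zero-result example from the text: Edinburgh network AND the MIT org code.
print(api_query(ips=["129.215/16"], orgs=["16410"]))
```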
- The type of repository.
- For more details on type see the documentation below on /cgi/list/type
- Note: Our experience is that everything is in types 2 to 5, and other repository types are not represented.
- The type of content the repository takes in.
- For more details on content see the documentation below on /cgi/list/content
- Note: Our experience is that no repositories are registered as restricting their content to preprints only (code 2) or postprints only (code 3)
- Whether the data comes back nested by network & organisation, or is flattened down to a list of repositories, with the organisation & network data merged in.
- [Default is flat format]
Nested versus flat data
Given that the script first finds the organisations for a locus, and then the repositories that match the organisations, the initial data is actually hierarchical – with repositories nested within organisations.
The data can be returned in this nested format; however, the default is to reformat the data and return a list of repositories, each of which contains data stating the organisation & network association for that repository.
Data is returned in JSON, text, or XML format; the underlying structure is the same in each.
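The relationship between the two shapes can be sketched as a flattening step. The key names (`orgs`, `repos`, `whois_name`, `org_name` and so on) follow the element names listed later in this document, but the exact JSON layout is an assumption, and the ID values in the sample are placeholders.

```python
def flatten(nets):
    """Merge network and organisation fields into each repository record."""
    flat = []
    for net in nets:
        net_info = {k: v for k, v in net.items() if k != "orgs"}
        for org in net.get("orgs", []):
            org_info = {k: v for k, v in org.items() if k not in ("repos", "count")}
            for repo in org.get("repos", []):
                flat.append({**repo, **org_info, **net_info})
    return flat


# Assumed nested shape: networks contain organisations, which contain repositories.
nested = [{
    "whois_id": "1",
    "whois_name": "Edinburgh University",
    "orgs": [{
        "org_id": "2",
        "org_name": "University of Edinburgh",
        "repos": [{"repo_id": "3", "name": "Edinburgh Research Archive"}],
    }],
}]

for record in flatten(nested):
    print(record["name"], "|", record["org_name"], "|", record["whois_name"])
```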
The core format for a successful search is:
The core format for an unsuccessful search is:
A successful search with zero results will return:
"name": "Edinburgh Research Archive",
"Research papers (pre- and postprints)",
"Conference and workshop papers",
"Unpublished reports and working papers",
"Books chapters and sections",
"description": "University repository across all disciplines……",
"comment": "ERA is a digital repository of research ……",
"remarks": "Partners: SHERPA"
"org_name": "University of Edinburgh",
"inetnum": "184.108.40.206 – 220.127.116.11",
"whois_name": "Edinburgh University"
For “flat” data:
<!ELEMENT opt (to, status, message) >
<!ELEMENT message ([repos|warn]) >
<!ELEMENT repos (repo_id, name?, acronym?, namepreferred?, description?, remarks?, comment?, url?, sword?, content*, types*, language*, whois_id, inetnum?, whois_name?, org_id, org_name?, org_url? ) >
<!ELEMENT status ('ok'|'fail') >
<!ELEMENT inetnum (inet-inet) >
<!ELEMENT to ("url") >
<!ELEMENT org_url ("url") >
<!ELEMENT url ("url") >
<!ELEMENT sword ("url") >
<!ELEMENT inet ("IP Number") >
<!ELEMENT count ("digit") >
<!ELEMENT whois_id ("digit") >
<!ELEMENT org_id ("digit") >
<!ELEMENT repo_id ("digit") >
<!ELEMENT warn (#PCDATA) >
<!ELEMENT whois_name (#PCDATA) >
<!ELEMENT org_name (#PCDATA) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT acronym (#PCDATA) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT remarks (#PCDATA) >
<!ELEMENT comment (#PCDATA) >
<!ELEMENT content (#PCDATA) >
<!ELEMENT language (#PCDATA) >
<!ELEMENT types (#PCDATA) >
<!ELEMENT namepreferred : [1|0] >
For nested data:
<!ELEMENT opt (to, status, message) >
<!ELEMENT message ([nets|warn]) >
<!ELEMENT nets (whois_id, inetnum?, whois_name?, orgs?) >
<!ELEMENT orgs (org_id, org_name?, org_url?, repos?, count?) >
<!ELEMENT repos (repo_id, name?, acronym?, namepreferred?, description?, remarks?, comment?, url?, sword?, content*, types*, language*) >
<!ELEMENT status ('ok'|'fail') >
<!ELEMENT inetnum (inet-inet) >
<!ELEMENT to ("url") >
<!ELEMENT org_url ("url") >
<!ELEMENT url ("url") >
<!ELEMENT sword ("url") >
<!ELEMENT inet ("IP Number") >
<!ELEMENT count ("digit") >
<!ELEMENT whois_id ("digit") >
<!ELEMENT org_id ("digit") >
<!ELEMENT repo_id ("digit") >
<!ELEMENT warn (#PCDATA) >
<!ELEMENT whois_name (#PCDATA) >
<!ELEMENT org_name (#PCDATA) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT acronym (#PCDATA) >
<!ELEMENT description (#PCDATA) >
<!ELEMENT remarks (#PCDATA) >
<!ELEMENT comment (#PCDATA) >
<!ELEMENT content (#PCDATA) >
<!ELEMENT language (#PCDATA) >
<!ELEMENT types (#PCDATA) >
<!ELEMENT namepreferred : [1|0] >
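As a rough illustration of reading a flat XML response, the sketch below parses a document laid out as child elements in the way the element lists above suggest. Both sample documents are hand-written for this example (including the placeholder host and ID values), and the real service may serialise some fields differently, for instance as attributes, so treat the element paths as assumptions.

```python
import xml.etree.ElementTree as ET

MULTI_VALUED = ("content", "types", "language")  # marked with * in the lists above


def parse_response(xml_text):
    """Return a list of repository dicts, or raise if status is not 'ok'."""
    root = ET.fromstring(xml_text.strip())
    message = root.find("message")
    if root.findtext("status") != "ok":
        warn = message.findtext("warn") if message is not None else None
        raise RuntimeError(warn or "request failed")
    repos = []
    for repo in (message.findall("repos") if message is not None else []):
        record = {}
        for child in repo:
            if child.tag in MULTI_VALUED:
                record.setdefault(child.tag, []).append(child.text)
            else:
                record[child.tag] = child.text
        repos.append(record)
    return repos


# Hand-written samples; host name and IDs are placeholders.
OK_SAMPLE = """
<opt>
  <to>https://oarj.example.org/api</to>
  <status>ok</status>
  <message>
    <repos>
      <repo_id>3</repo_id>
      <name>Edinburgh Research Archive</name>
      <content>Conference and workshop papers</content>
      <content>Unpublished reports and working papers</content>
      <whois_id>1</whois_id>
      <org_id>2</org_id>
      <org_name>University of Edinburgh</org_name>
    </repos>
  </message>
</opt>
"""

FAIL_SAMPLE = """
<opt>
  <to>https://oarj.example.org/api</to>
  <status>fail</status>
  <message><warn>example warning text</warn></message>
</opt>
"""

for repo in parse_response(OK_SAMPLE):
    print(repo["name"], repo["content"])
```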
This suite of six calls returns all items of that type, as known to the database.
If a parameter “full” is supplied, with a value, then the list will return not just the items of that type, but also all repositories that are associated with that data item (i.e. all repositories noted as having a language of “fr” will be listed under the French item).
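A request URL for one of the six list calls might be built as follows. The host `https://oarj.example.org` and the helper `list_url` are placeholders invented for this sketch, and `full=1` is just one possible value, since the text only says the parameter needs a value.

```python
from urllib.parse import urlencode

BASE_URL = "https://oarj.example.org"  # placeholder host, not the real service
LIST_CATEGORIES = ("type", "content", "country", "lang", "org", "net")


def list_url(category, full=False, fmt="json"):
    """Build a cgi/list URL, optionally asking for the associated items too."""
    if category not in LIST_CATEGORIES:
        raise ValueError(f"unknown list category: {category!r}")
    params = {"format": fmt}
    if full:
        params["full"] = "1"  # any value switches the fuller listing on
    return f"{BASE_URL}/cgi/list/{category}?{urlencode(params)}"


print(list_url("lang", full=True))
```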
This lists the type (or classification) of repository.
| Code | Repository type |
|---|---|
| 1 | Undetermined – Repositories whose type has not yet been assessed |
| 2 | Institutional (Institutional or departmental repositories) |
| 3 | Disciplinary (Cross-institutional subject repositories) |
| 4 | Aggregating (Archives aggregating data from several subsidiary repositories) |
| 5 | Governmental (Repositories for governmental data) |
| 6 | Subject (Research Cross-Institutional) |
| 9 | Database (Database/A&I Index) |
| 10 | Learning (Learning and Teaching Objects) |
When a repository type is needed by /api, it is the code number you need.
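Because /api wants the numeric code rather than the label, a client may find it handy to keep a small lookup built from the codes listed above. The dictionary and the helper `type_code` are illustrative only; a live client would more sensibly fetch the current list from cgi/list/type.

```python
# Codes and short labels copied from the list of repository types above.
REPOSITORY_TYPES = {
    1: "Undetermined",
    2: "Institutional",
    3: "Disciplinary",
    4: "Aggregating",
    5: "Governmental",
    6: "Subject",
    9: "Database",
    10: "Learning",
}


def type_code(label):
    """Return the numeric code for a type label, ignoring case and spacing."""
    wanted = label.strip().lower()
    for code, name in REPOSITORY_TYPES.items():
        if name.lower() == wanted:
            return code
    raise KeyError(label)


print(type_code("Institutional"))
```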
This lists the type of content that repositories accept.
| Code | Content type |
|---|---|
| 1 | Research papers (pre- and postprints) |
| 2 | Research papers (preprints only) |
| 3 | Research papers (postprints only) |
| 5 | Conference and workshop papers |
| 6 | Theses and dissertations |
| 7 | Unpublished reports and working papers |
| 8 | Books & chapters and sections |
| 11 | Multimedia and audio-visual materials |
| 14 | Other special item types |
When a content type is needed by /api, it is the code number you need.
This lists all the countries known.
This lists all the languages known.
Lists all the known organisations.
When an “org” parameter is passed to the /api, it is the code number you need.
Lists all the known networks.
Note that this call is slightly different from the other 5, in that the “full” parameter will return a list of networks, and the orgs that are associated with them (not repositories!)
This suite of three calls was originally built to serve AJAX calls within the OpenDepot.org interface.
The three calls use the same set of parameters:
- q is the mandatory query field – the text that the search is based on
- field is an optional parameter, and can be used to restrict which field you wish to search in
- format defines how the data is to be returned:
- prototype is the format used by EPrints, and other Scriptaculous-like systems
- json is the base jQuery format
- text is a screen-readable text format that can be parsed by some means
- [json is the default]
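The shared parameters can be put together as in the sketch below, which only constructs the URL. The host `https://oarj.example.org` and the helper `get_url` are placeholders made up for the example.

```python
from urllib.parse import urlencode

BASE_URL = "https://oarj.example.org"  # placeholder host, not the real service
GET_CALLS = ("get_orgs", "get_repos", "get_nets")
GET_FORMATS = ("prototype", "json", "text")


def get_url(call, q, field=None, fmt="json"):
    """Build a cgi/get_* URL; q is mandatory, field is optional."""
    if call not in GET_CALLS:
        raise ValueError(f"unknown call: {call!r}")
    if not q:
        raise ValueError("q is mandatory")
    if fmt not in GET_FORMATS:
        raise ValueError(f"format must be one of {GET_FORMATS}")
    params = {"q": q, "format": fmt}
    if field:
        params["field"] = field
    return f"{BASE_URL}/cgi/{call}?{urlencode(params)}"


print(get_url("get_orgs", "edinburgh", field="name"))
```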
get_orgs & get_repos
This lists all organisations or repositories (as appropriate) that match the query in either the name or url field.
This lists all networks that match the query in either the name or ip field.