Posted by: Theo | February 17, 2011

RSP Winter School presentation

The OARJ project recently had the pleasure to be invited to the RSP Winter School 2011 (#rspws11) to give a talk on some of our initial findings.

During the talk we covered a number of areas including repository interoperability, the SWORD protocol, common deposit use cases, a broker model for deposit, and how to match authors to institutions.

We had very positive feedback from the delegates – we were described as THE talk of the conference by Gaz Johnston and @williamjnixon tweeted “OA-RJ has the potential to be a “killer app” for deposit with case study work with Nature Publishing Group and UKPMC”.

The slides from our talk looked like this:

Quick Slide Commentary

1. The elevator pitch: OARJ is assisting deposit from different sources into multiple repository locations

2. Overview of talk

3. The problem: Repositories were initially designed as silos. Information (in this analogy grain) is packaged up in different ways (sacks) and deposited into repositories (the silo). People find the information and visit the silo to use it (feeding time).

4. Sometimes you need to pass information (grain) between silos, so some clever people developed the SWORD protocol, which allows transfer from one silo to another. Continuing the simple analogy, SWORD is like a truck transporting grain from one silo to another. For the delivery to be successful the driver needs to have a manifest which the silo foreman can accept (agreed packaging standards), and also a parking permit (authorisation – username/passwords).
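To make the analogy concrete, here is a minimal sketch of what a SWORD 1.3 package deposit carries: the packaging URI is the "manifest" and the credentials are the "parking permit". The packaging URI, username and password shown are illustrative placeholders, not a real endpoint's values.

```python
import base64

def build_sword_headers(filename, packaging, user, password):
    """Build the HTTP headers for a SWORD 1.3 package deposit.

    The packaging URI and the credentials are exactly the two things the
    'driver' and 'silo foreman' must agree on in advance."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {
        "Content-Type": "application/zip",
        "X-Packaging": packaging,                   # agreed packaging standard
        "Content-Disposition": f"filename={filename}",
        "Authorization": f"Basic {token}",          # the parking permit
    }

headers = build_sword_headers(
    "article.zip",
    "http://purl.org/net/sword-types/METSDSpaceSIP",  # example packaging URI
    "depositor", "secret",                            # placeholder credentials
)
# The package would then be POSTed with these headers to the
# repository's SWORD collection URI.
```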

5. Simple deposit from one silo to another is great, but we all know the world is a lot more complex. There are different types of silo, different types of grain, etc. We are going to look at two deposit use cases to see some of these complexities in action.

6. Deposit use case one: the multi-authored journal article. A journal article is written by a team of researchers from three different institutions. Once published it needs to be deposited in three separate repositories. Problems: each copy may or may not be identical (different versions may be deposited), the metadata describing the object may or may not be the same (inconsistent citations), and there is duplication of effort (assuming the papers are even deposited in the first place).

7. Deposit use case two: mandated open access. In a complication of the first use case, the authors are required by their research funder (for example the Wellcome Trust) to deposit a copy in a subject repository (e.g. UKPMC). They can do this either by paying the publisher (thousands of pounds) to do it on their behalf, or by self-archiving a copy via the manuscript submission system (if the publisher allows it). Once the researchers have cleared this hurdle they are also required, by their university mandates, to deposit a copy in their local institutional repositories. Duplication of effort again for all the researchers involved.

8. A solution: the broker model. A broker sitting between content providers and receivers could simplify things by giving authors one consistent deposit process. It would also mediate the adoption of SWORD by giving content providers and receivers one place to agree packaging formats and credentials.
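The broker's job can be sketched very simply: one incoming package, fanned out to every target repository, with each target carrying the packaging format and credentials agreed with that repository. The repository URLs and packaging names below are hypothetical, and a real broker would of course make the SWORD deposits rather than just queue them.

```python
def broker_deposit(package, targets):
    """Fan a single incoming package out to every target repository.

    Each target records the packaging format agreed with that repository,
    so depositors only ever have to talk to the broker."""
    receipts = []
    for target in targets:
        receipts.append({
            "repository": target["repository"],
            "packaging": target["packaging"],
            "status": "queued",  # a real broker would POST via SWORD here
        })
    return receipts

# Hypothetical targets for a three-institution paper (one shown per line):
targets = [
    {"repository": "https://ir.example-a.ac.uk/sword", "packaging": "mets"},
    {"repository": "https://ir.example-b.ac.uk/sword", "packaging": "epdcx"},
]
receipts = broker_deposit({"title": "Example article"}, targets)
```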

9. This slide shows how we could adopt the broker model for the first use case scenario. The journal publisher uses the broker service to send materials to all associated repositories.

10. Working with Nature Publishing Group we have identified the workflow needed to pass content from the publisher's in-house systems through the broker to repositories.

11. The table gives an indication of how many papers could be passed through the broker in a six-month demonstrator period.

12. This slide shows how we could adopt the broker model for the second use case scenario. The subject repository UKPMC could identify new content and pass it to the broker for deposit in institutional repositories.

13. The second main problem is finding a way to associate authors with institutions, and institutions with repositories, so that deposit can occur. In the absence of a usable dataset to match authors to institutions we have to find proxies for this – via IP address, or by extracting the data from postal or email addresses. The institutions can then be matched to repositories using data from existing registries, e.g. ROAR or OpenDOAR.
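The email-address proxy can be sketched in a few lines: take the domain from an author's email and strip subdomains until it matches a known institutional domain. The domain map here is a single hypothetical entry; in practice it would be built from registry data.

```python
def institution_from_email(email, domain_map):
    """Guess an author's institution from their email domain.

    This is a proxy for the missing authors->institutions dataset:
    strip subdomains one at a time until a known institutional
    domain matches, e.g. inf.ed.ac.uk -> ed.ac.uk."""
    domain = email.rsplit("@", 1)[-1].lower()
    parts = domain.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in domain_map:
            return domain_map[candidate]
    return None  # no institutional match; fall back to other proxies

domain_map = {"ed.ac.uk": "University of Edinburgh"}  # hypothetical entry
match = institution_from_email("a.author@inf.ed.ac.uk", domain_map)
```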

14. The junction database takes data from a number of existing sources (in green) and normalises it for use, giving each institution an orgID. Future sources from which data could be extracted are shown in red. Once in the database we can then match orgIDs to IRs.
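A toy version of that normalisation step might look like this: records from different registries are grouped under one orgID per institution, keyed here by a crude normalised name. The matching rule and sample records are illustrative only; the real junction database assigns its own identifiers from richer source data.

```python
from collections import defaultdict

def build_junction(records):
    """Group registry records under one orgID per institution.

    Normalisation here is deliberately crude (lower-case, expand a common
    abbreviation); it stands in for whatever matching the junction
    database actually performs."""
    index = {}
    junction = defaultdict(lambda: {"sources": [], "repositories": set()})
    next_id = 1
    for rec in records:
        key = rec["name"].lower().replace("univ.", "university").strip()
        if key not in index:
            index[key] = f"org{next_id:04d}"
            next_id += 1
        org_id = index[key]
        junction[org_id]["sources"].append(rec["source"])
        junction[org_id]["repositories"].add(rec["repo"])
    return dict(junction)

# Two registry entries for the same institution collapse to one orgID:
records = [
    {"source": "ROAR", "name": "Univ. of Edinburgh",
     "repo": "https://ir.example.ac.uk"},          # hypothetical repository
    {"source": "OpenDOAR", "name": "University of Edinburgh",
     "repo": "https://ir.example.ac.uk"},
]
junction = build_junction(records)
```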

15. API – the information is available via a machine-to-machine (m2m) interface for third parties to use. Visit this blog post for more info.

16. The next steps. We have shown proof of concept locally on our test servers. Next we aim to provide a demonstrator service for a limited number of institutions, to show that it can work in the wild. If successful we will aim to expand the broker service.

17. FIN and exit to lunch!

