Posted by: Theo | January 15, 2010

Designing a workflow to pass Journal Articles directly from the Publisher to Institutional Repositories.

One of the major use cases we are testing with the OA-RJ project is how publishers and institutional repositories can work more closely together (we’ve previously blogged about this here).

This new post describes how we envision the Authors’ post-print versions of journal articles being passed directly from the Publishers to participating Institutional Repositories.

In line with the standard six-month embargo adopted by most publishers for post-prints, we have developed a three-stage process, shown in the diagram below:

Stage One (Steps 1 to 6 in red)

After the peer review process, the final manuscript is accepted by the journal editors and enters the publishing workflow (1). Eligible manuscripts are flagged for submission to the Institutional Repository at the point of publication. At this point the article is assigned a manuscript number and packaged in a zip file with descriptive metadata in XML, based on the NLM DTD 2.3 format (2). The package is sent to the OA-RJ broker by FTP (3). The broker receives and unpacks the package (4), then uses this data to identify the correct repository location for each package (5). An initial confirmation is sent to the repository managers to inform them that the broker holds content belonging to them (6).
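
As an illustration of steps 4 and 5, here is a minimal Python sketch of how a broker might unpack a package and choose a destination. It is not the OA-RJ implementation. The package layout (one XML file plus the manuscript files), the use of `article-id` with `pub-id-type="manuscript"` for the manuscript number, the matching on author affiliation text, and the `REPOSITORIES` table with its URL are all assumptions made for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical lookup table: institution name -> repository deposit URL.
REPOSITORIES = {
    "University of Example": "https://repository.example.ac.uk/sword/deposit",
}


def read_package(zip_bytes):
    """Unpack a publisher package held in memory.

    Returns the parsed metadata root element and the names of the other files.
    Assumes the package holds exactly one .xml metadata file.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as package:
        names = package.namelist()
        xml_name = next(name for name in names if name.endswith(".xml"))
        root = ET.fromstring(package.read(xml_name))
        files = [name for name in names if name != xml_name]
    return root, files


def manuscript_number(root):
    """Return the manuscript number, or None if the metadata has none."""
    for article_id in root.iter("article-id"):
        if article_id.get("pub-id-type") == "manuscript":
            return (article_id.text or "").strip()
    return None


def target_repositories(root):
    """Return deposit URLs for every known institution named in an <aff> element."""
    targets = []
    for aff in root.iter("aff"):
        text = "".join(aff.itertext())
        for institution, url in REPOSITORIES.items():
            if institution in text and url not in targets:
                targets.append(url)
    return targets
```

Matching on affiliation strings is the simplest thing that could work; a real broker would probably need something more robust, since institution names are written in many different ways.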

Stage Two (Steps 7 to 12 in yellow)

Only when the publishing process is complete will the article be assigned a full metadata record (7). To create a citable object, we are particularly interested in receiving an updated record that includes the DOI, journal volume, issue and page numbers. The publisher will create a second package containing this updated record, together with the manuscript number so that it can be matched to the existing pre-publication record (8). The updated metadata record is sent to the OA-RJ broker via FTP (9) and, on arrival, is unpacked, matched and merged with the existing record using the manuscript number (10). The release date is set by adding the post-print embargo duration, in this case six months, to the publication date (11).

Full notification and advance metadata will be sent to repositories to allow them to use this data even though the full text file is restricted (12).
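
Steps 10 and 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are plain dictionaries keyed by manuscript number, the field names (`manuscript_number`, `publication_date`, `release_date`) are hypothetical, and when the target month is shorter than the publication day allows, the date is clamped to the last day of that month (a choice made here, not one taken from the project).

```python
import calendar
from datetime import date

EMBARGO_MONTHS = 6  # the post-print embargo used in this workflow


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the end of the target month when necessary,
    for example 31 August plus six months gives 28 (or 29) February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def merge_update(records, update):
    """Merge a post-publication update into the stored pre-publication record.

    Records are matched on manuscript number; fields in the update win.
    The release date is the publication date plus the embargo period.
    """
    number = update["manuscript_number"]
    if number not in records:
        raise KeyError(f"no pre-publication record for {number}")
    merged = {**records[number], **update}
    merged["release_date"] = add_months(merged["publication_date"], EMBARGO_MONTHS)
    records[number] = merged
    return merged
```

With these assumptions, an article published on 15 January 2010 would get a release date of 15 July 2010.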

Stage Three (Step 13 in green)

When the restriction period expires, the broker will be able to transfer the files to the designated repository/ies (13). Initially we will use SWORD for this purpose, but we also want to investigate using RSS feeds so that institutions can pull content from the broker instead.

Once the content arrives at the nominated Institutional Repository it can enter into the local cataloguing workflow.
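
The check that triggers step 13 could look something like the Python sketch below. It only selects the records whose embargo has expired and stops short of the transfer itself, which would go through SWORD. The dictionary layout and the `release_date` and `transferred` field names are hypothetical.

```python
from datetime import date


def due_for_release(records, today):
    """Return manuscript numbers whose embargo has expired and which have not
    yet been transferred, in sorted order.

    Records without a release date (stage two not yet complete) are skipped.
    """
    return sorted(
        number
        for number, record in records.items()
        if record.get("release_date") is not None
        and record["release_date"] <= today
        and not record.get("transferred", False)
    )
```

A broker might run a check like this once a day and hand each returned manuscript number to whatever performs the deposit.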

This is harder than it looks!

To conclude, the whole process of passing content between Publishers and Repositories is simple in theory. In reality, however, building the necessary technological and administrative links between organisations is very hard to co-ordinate. We believe the workflow described here is a step in the right direction towards our vision of seamless interoperability between different parts of the current scholarly communications chain. The project team would be really interested in what you think of the proposed workflow: you can comment directly below or send us an email if you prefer.

