Posted by: Theo | January 21, 2011

Broker progress update and next steps

It’s been a while since the last update so we thought we’d let you know how the technical development work has been progressing.

Depositing content from the broker into repositories

Despite initially following a few paths that led to dead ends or complications, the Broker dispatch side of things is coming together nicely.

The main issue we faced was deciding which packaging format to use when passing content to our partner test repositories. Our initial plan was to use METSDSpaceSIP, which can be imported by both EPrints and DSpace; however, we found certain flaws, which we think we have now overcome.

We have now internally tested passing content packages from the broker to our local EPrints and DSpace test repositories.

The test EPrints repository uses import modules that we wrote ourselves; specifically, the importer needed to understand extensions to the standard epdcx used in METSDSpaceSIP, and to handle files in subdirectories.

Minor changes to the DSpace config were needed for the SWORD transfer to work – we’ll document these in a separate blog post.

When exported via our bespoke broker admin interface (actually an EPrints repo) it takes about four seconds for the Broker to transfer content via the following sequence of events:

  1. broker creates .zip package;
  2. broker transfers .zip via SWORD;
  3. IR ingests package;
  4. IR replies to broker;
  5. broker writes summary page indicating results.
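
As a rough illustration of steps 1 and 2, the sketch below builds a .zip package in memory and prepares (but does not send) a SWORD deposit request. It is not the Broker's actual code: the function names, the example URL and the idea of a mets.xml manifest at the root of the package are assumptions made for illustration, as are the Content-Type and X-Packaging headers, which follow our reading of SWORD 1.3.

```python
import io
import zipfile
import urllib.request
from typing import Mapping

# Packaging identifier commonly used for METSDSpaceSIP in SWORD 1.3 (assumed here).
PACKAGING = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_package(mets_xml: str, files: Mapping[str, bytes]) -> bytes:
    """Step 1: zip a manifest plus payload files; paths may include subdirectories."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mets.xml", mets_xml)  # hypothetical manifest name and location
        for path, data in files.items():
            zf.writestr(path, data)
    return buf.getvalue()


def build_deposit_request(collection_url: str, package: bytes) -> urllib.request.Request:
    """Step 2: prepare the HTTP POST for the deposit (nothing is sent here)."""
    return urllib.request.Request(
        collection_url,
        data=package,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            "X-Packaging": PACKAGING,
        },
    )


if __name__ == "__main__":
    pkg = build_package("<mets/>", {"docs/paper.pdf": b"%PDF-"})
    # Hypothetical collection URL; a real one comes from the target repository.
    req = build_deposit_request("http://repository.example.org/sword/deposit/1", pkg)
    print(req.get_method(), req.full_url, len(pkg), "bytes")
```

Sending the request (for example with urllib.request.urlopen) and reading the repository's reply would cover steps 3 to 5; authentication is left out of this sketch.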

Next steps

Now that we have shown a basic transfer from the broker to our own local repositories (EPrints and DSpace), we need to test this on external servers. In the next few days/weeks we will be contacting our project partners with a test package to try to deposit in their own test environment. Once we are happy this works, the next step will be to work together to configure the target repositories to accept a SWORD transfer.

Future developments – additions to the basic transfer

There is further work that we will need to look at to address certain complications:

  1. Given that the Broker is transferring on behalf of a third party, adding in provenance metadata (i.e. where the OA-RJ Broker got the data from) is important;
  2. We would also like to add some institutional data for the authors that we have (initially just their institutional affiliation);
  3. Embargo Metadata needs to be added. We’ve not worked this out yet, however EPrints applies Embargo at “document” level rather than “file” level. I think this is something that needs to be clarified before too much work is done;
  4. We would like to explore the idea of including files by reference, rather than physically including them in the .zip file. This has a couple of advantages:
     • When some other repository takes responsibility for the records, the Broker can send pass-by-reference values for that repository, and delete the binaries from the Broker.
     • The package created by the broker is smaller without the binaries included, so it should be quicker both to generate and to transfer. It also means that when the target repositories come to collect the data, that FETCH operation is balanced with all other transfers, and can be safely delayed until the server load reduces. Given that the Broker can be transferring hundreds of packages a day (one for each IR for each author for each deposit, from some larger publishers and funders as well as a variety of other clients), the server load could be quite high at times.
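
To make the by-reference idea in item 4 more concrete, here is a minimal sketch of how a file entry in a METS manifest might point either at a path inside the .zip or at a URL that the target repository fetches later. We have not settled on a design, so this is only an assumption about one possible shape: the helper names and example URL are hypothetical, and the sketch relies on the METS FLocat element with LOCTYPE="URL" and an xlink:href attribute.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def file_element(file_id: str, mimetype: str, href: str) -> ET.Element:
    """Build a METS <file> entry whose FLocat points at href.

    href is either a relative path inside the package (file included)
    or an absolute URL (file passed by reference).
    """
    f = ET.Element(f"{{{METS}}}file", {"ID": file_id, "MIMETYPE": mimetype})
    ET.SubElement(
        f,
        f"{{{METS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK}}}href": href},
    )
    return f


def is_by_reference(href: str) -> bool:
    """Treat absolute http(s) URLs as references to fetch later (an assumption)."""
    return href.startswith(("http://", "https://"))


if __name__ == "__main__":
    included = file_element("file-1", "application/pdf", "docs/paper.pdf")
    # Hypothetical URL on the Broker from which the repository would fetch the binary.
    referenced = file_element(
        "file-2", "application/pdf", "https://broker.example.org/files/2/paper.pdf"
    )
    for el in (included, referenced):
        print(ET.tostring(el, encoding="unicode"))
```

With this shape, only the files whose href is a relative path would need to be written into the .zip; the importer on the repository side would have to decide when to fetch the rest.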

Any comments on these future developments would be appreciated.
