Posted by: Theo | April 9, 2010

Multiple Deposit meeting (summary notes)

Here is a summary of the main topics of conversation covered during the session.

SWORD endpoints

The OARJ project has so far concentrated on identifying repositories, but how do we identify their SWORD endpoints? This data is currently missing from the existing registries; at the last SWORD meeting it was also suggested that a registry should include details of the relevant package formats. Until such a registry exists, the broker will have to determine this information itself.
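As a rough illustration of how a broker might work this out for itself, the sketch below reads a SWORD service document (an AtomPub service document with SWORD extensions) and lists the collections it advertises along with the packaging each one accepts. The sample document, the repository URL and the function name are made up for illustration; the namespaces are those I believe SWORD 1.3 uses, so check them against the profile before relying on this.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed from the AtomPub and SWORD 1.3 specifications.
APP_NS = "http://www.w3.org/2007/app"
ATOM_NS = "http://www.w3.org/2005/Atom"
SWORD_NS = "http://purl.org/net/sword/"

# Hypothetical service document; a real one would be fetched from the repository.
SAMPLE_SERVICE_DOCUMENT = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom"
         xmlns:sword="http://purl.org/net/sword/">
  <workspace>
    <atom:title>Example Repository</atom:title>
    <collection href="http://repository.example.org/sword/deposit/articles">
      <atom:title>Articles</atom:title>
      <accept>application/zip</accept>
      <sword:acceptPackaging>http://purl.org/net/sword-types/METSDSpaceSIP</sword:acceptPackaging>
    </collection>
  </workspace>
</service>
"""


def list_deposit_targets(service_document_xml):
    """Return one dict per advertised collection: deposit URL, title, accepted packaging."""
    root = ET.fromstring(service_document_xml)
    targets = []
    for collection in root.iter("{%s}collection" % APP_NS):
        packaging = [
            el.text.strip()
            for el in collection.findall("{%s}acceptPackaging" % SWORD_NS)
            if el.text
        ]
        targets.append({
            "href": collection.get("href"),
            "title": collection.findtext("{%s}title" % ATOM_NS, default=""),
            "packaging": packaging,
        })
    return targets


for target in list_deposit_targets(SAMPLE_SERVICE_DOCUMENT):
    print(target["href"], target["packaging"])
```

A registry, if one appears, would simply replace this lookup; the broker would still need the same two facts per target: where to deposit and which packages are accepted.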

Notification Pingback

Is fire and forget good enough? Do the providers of information care about notification? Other stakeholders almost certainly will do – Research Funders will want to know about compliance rates, Publishers will care about where the content has been passed to, Deposit clients will want a public URI returned to the tool.

From a web UI design point of view, end users will want assurance that a deposit has succeeded, so a live response may be worth considering. Similarly, a ‘daisy chain’ approach could propagate the initial public-facing URI to subsequent repositories.
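One possible reading of the ‘daisy chain’ idea, sketched in Python: deposit into each repository in turn, and hand every later repository the public URIs returned by the earlier ones. The `deposit` callable and the repository names are hypothetical stand-ins; nothing here reflects how the broker is actually built.

```python
def daisy_chain_deposit(item, repositories, deposit):
    """Deposit `item` into each repository in order.

    `deposit(repository, item, known_uris)` is a hypothetical callable that
    returns the public URI assigned by that repository, or None if it has none.
    """
    known_uris = []   # public URIs collected so far, earliest first
    results = {}
    for repository in repositories:
        # Pass a copy so a deposit function cannot alter the shared list.
        uri = deposit(repository, item, list(known_uris))
        results[repository] = uri
        if uri is not None:
            known_uris.append(uri)
    return results


def fake_deposit(repository, item, known_uris):
    """Stand-in for a real deposit call; just reports what it was told."""
    print("%s receives %r, already available at: %s" % (repository, item, known_uris))
    return "http://%s.example.org/item/1" % repository


daisy_chain_deposit("article.zip", ["repo-a", "repo-b", "repo-c"], fake_deposit)
```

The results could also be returned to the deposit client as the live response, since the first entry is the initial public-facing URI.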

Publisher -> IR use case

Getting data out of publisher systems is non-trivial: publishers use a multitude of DTDs depending on their in-house systems. The general question was asked: is there value in recreating the PEER project? The PEER project has already agreed a packaging standard with publishers (NLM DTD), which the OARJ project should adopt. However, the OARJ Broker goes beyond the PEER project in that it accepts content from multiple sources in a variety of formats, deposits into a variable number of targets, and keeps track of transfers.

Cataloguing problem

Some Institutional Repository services cannot allow non-vetted deposits due to insurance concerns. Libraries are still working on embedding the repository as a core service; most representatives round the table are giving responsibility to cataloguing teams. Could libraries cope with a sudden influx of data?

Authorization problem

From the point of view of a number of stakeholders, it would be beneficial for repositories to sign up to receive content from the broker service. The broker could then negotiate the deposit process (i.e. specify SWORD endpoints) and find out the appropriate contacts if things go wrong. From a publisher’s point of view, the repositories could indicate they are a trusted source by signing up to a list of guidelines.

Deduplication problem

The deduplication problem was raised repeatedly in a number of contexts. Although it is generally a good problem to have (i.e. a lot of content is passing through the broker), providing a definitive solution is out of scope for the current project. For the time being this will have to be a problem for repositories to solve when receiving content.

It became apparent during the discussion that the Broker system should by default be told where to deposit (i.e. controlled rather than dynamic), whereas the separate Junction service advises where to deposit. There is still a desire for the broker to look up targets and deposit to multiple repositories without asking the submitter (although this may be phase 2).
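To make the ‘controlled’ model concrete, here is a minimal Python sketch, assuming the submitter names every target explicitly and the broker holds the content until all of them have accepted it. The class names, the `send` callable and the status values are invented for illustration and are not the OARJ design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Transfer:
    target: str
    status: str = "pending"            # pending | delivered | failed
    public_uri: Optional[str] = None


@dataclass
class HeldItem:
    metadata: dict
    payload: Optional[bytes]
    transfers: Dict[str, Transfer] = field(default_factory=dict)

    def fully_delivered(self) -> bool:
        return all(t.status == "delivered" for t in self.transfers.values())


class Broker:
    def __init__(self, send: Callable[[str, dict, bytes], str]):
        # Hypothetical transport: send(target, metadata, payload) returns the
        # public URI and raises an exception if the deposit fails.
        self._send = send
        self.items: List[HeldItem] = []

    def submit(self, metadata: dict, payload: bytes, targets: List[str]) -> HeldItem:
        # Controlled rather than dynamic: the broker does not choose targets.
        if not targets:
            raise ValueError("the submitter must name at least one target")
        item = HeldItem(metadata, payload, {t: Transfer(t) for t in targets})
        self.items.append(item)
        return item

    def deliver(self, item: HeldItem) -> bool:
        """Try every outstanding transfer; return True once all are delivered."""
        for transfer in item.transfers.values():
            if transfer.status == "delivered":
                continue
            try:
                transfer.public_uri = self._send(transfer.target, item.metadata, item.payload)
                transfer.status = "delivered"
            except Exception:
                transfer.status = "failed"   # item stays held; retry later
        if item.fully_delivered():
            item.payload = None              # release content, keep metadata and URIs
        return item.fully_delivered()
```

Calling `deliver` again retries only the transfers that have not yet succeeded, so the broker's record doubles as the log of what went where. A phase 2 lookup would only change how `targets` is produced, not this bookkeeping.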

Main ‘take-home’ points

  1. Acknowledge and use outcomes from the PEER project rather than recreate work.
  2. Deposit of full text in multiple places is okay.
  3. Duplication is a way of life on the web.
  4. Broker deposit should be a controlled process: hold onto the item until delivered. The metadata record (and public URI if possible) should be retained.
  5. Pingback only happens when the IR gets back to the broker, which could then act as a hub.
  6. SameAs.org was suggested as a good way to handle multiple URI targets.
  7. IRs register with broker to create trust fabric.
  8. Depositors and deposit tools also register with broker.
  9. Embargo information should be passed on so that repositories can deal with it.
  10. Use of established packaging formats is preferable (e.g. DSpace METS).
  11. An AJAX lookup for organisational names would be desirable.

Attendees:

Alison Henning (Wellcome Trust)
Ben O’Steen (University of Oxford)
Dale Heenan (ESRC)
David Flanders (JISC)
Grace Baynes (Nature Publishing Group)
Graham Triggs (BioMed Central)
Ian Stuart (EDINA)
James Farnhill (JISC)
Jodie Double (University of Leeds)
John Salter (University of Leeds)
Julie Allinson (University of York)
Pablo de Castro (Carlos III University Madrid)
Richard Jones (Symplectic)
Theo Andrew (EDINA)
William Nixon (University of Glasgow)

Responses

  1. Offline comment from James Farnhill:

    A good summary. The only point I would pick up on is under Authorisation. What was being discussed here was developing a trust fabric so that the broker and other entities could trust each other. This would be useful as publishers, say, could submit content to a repository through the broker using their own credentials that were trusted by the broker. It’s a useful model to explore, even at a basic level, as there is a great deal of scope to expand this at a later date and it gives an example of n-tier in action for a specific use case, adding further weight to supporting this in more general developments under AAA at JISC.

  2. Good stuff guys. A good meeting. Now crunch time, make it happen Ian – good luck!

