Posted by: Ian | January 16, 2013

SWORD 1.3 vs SWORD 2

What’s the difference, and how do they compare?

In summary

SWORD 1.3 is a one-off package deposit system: the record is wrapped up in some agreed package format and dropped into the repository. SWORD 1.3 uses an HTTP header (X-Packaging) to declare what that package format is, and each repository uses that header to determine how to unpack the record. Every deposit creates a new record.

SWORD 2.0 is a CRUD-based (Create, Read, Update, Delete) system, where the emphasis is on being able to manage existing records as well as creating new ones. SWORD 2 uses the URL to identify the record being manipulated, and the mime-type of the object being presented to know what to do with it.

In detail (EPrints-specific)

This is, perforce, EPrints-specific, as I am an EPrints user with no experience coding in DSpace, Fedora, etc.

SWORD 1.3

With a SWORD 1.3 system, one defines a mapping between the X-Packaging header URI and the importer package to handle it:

  $c->{sword}->{supported_packages}->{"http://opendepot.org/broker/1.0"} =
  {
    name => "Open Access Repository Junction Broker",
    plugin => "Sword::Import::Broker_OARJ",
    qvalue => "0.6"
  };

The importer routine then has some internal logic to ensure it only tries to manage records of the right type (XML files, Word Documents, Spreadsheets, Zip files, etc).

In the case of archive files, it is customary to also indicate the routine that unpacks the file. For example, the same importer could manage .zip, .tar, and .tgz files, which are all variations on a bundled collection of files and which between them use the following mime-types:

application/x-gtar
application/x-tar
application/x-gtar-compressed
application/zip

Therefore our importer would have code like this:

   # mime-types this importer is willing to handle
   our %SUPPORTED_MIME_TYPES = (
     "application/zip"               => 1,
     "application/tar"               => 1,
     "application/x-gtar"            => 1,
     "application/x-gtar-compressed" => 1,
   );

   # which unpack plugin to use for each mime-type
   our %UNPACK_MIME_TYPES = (
     "application/zip"               => "Sword::Unpack::MyNewZip",
     "application/tar"               => "Sword::Unpack::MyNewTar",
     "application/x-gtar"            => "Sword::Unpack::MyNewTar",
     "application/x-gtar-compressed" => "Sword::Unpack::MyNewTar",
   );

So, a basic SWORD 1.3 deposit is a simple POST request to a defined URL, with a set of headers to manage the deposit, and the record as the body of the request:

  curl -X POST \
       -i \
       -u username:password \
       --data-binary "@myFile.zip" \
       -H 'X-Packaging: http://opendepot.org/broker/1.0' \
       -H 'Content-Type: application/zip'  \
       http://my.repo.url/sword-path/collection

This will deposit the binary file myFile.zip into the collection point in the repository, using the importer registered for the package URI http://opendepot.org/broker/1.0.
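When scripting deposits like this, the Content-Type header has to be one the importer is prepared to accept. As a rough sketch only: the helper names archive_mime_type and sword13_deposit, the CURL, SWORD_USER and SWORD_URL variables, and the mapping from file extension to mime-type are all my own assumptions for illustration; the mime-types themselves are the keys of the importer's %SUPPORTED_MIME_TYPES shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: pick a Content-Type from the archive's file extension.
# The extension-to-type mapping is an assumption, not something EPrints defines.
archive_mime_type() {
    case "$1" in
        *.zip)          echo "application/zip" ;;
        *.tar)          echo "application/tar" ;;
        *.tgz|*.tar.gz) echo "application/x-gtar-compressed" ;;
        *)              echo "unsupported archive type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper around the curl command above.
# Set CURL=echo to print the command line instead of sending it.
sword13_deposit() {
    file=$1
    mime=$(archive_mime_type "$file") || return 1
    ${CURL:-curl} -X POST -i \
        -u "${SWORD_USER:-username:password}" \
        --data-binary "@$file" \
        -H 'X-Packaging: http://opendepot.org/broker/1.0' \
        -H "Content-Type: $mime" \
        "${SWORD_URL:-http://my.repo.url/sword-path/collection}"
}
```

Running CURL=echo sword13_deposit myFile.zip prints the command that would be sent, which is a cheap way to check the headers before touching a live repository.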

SWORD 2.0

This is much vaguer, as I’ve not really got a good working example of a SWORD 2 sequence available (the Broker doesn’t do CRUD).

With SWORD 2, the idea is to be able to update existing records, piecemeal:

  • Create a blank record
  • Add some basic metadata (title, authors, etc)
  • Add the rough-draft file
  • Add the post-review article
  • Delete the rough-draft file
  • Add the abstract
  • Add the publication metadata (journal, issue, pages, etc)
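The steps above can be sketched as a sequence of HTTP requests. This is a dry run that only prints a plan and sends nothing. It borrows the URL shapes from the curl examples in this post; the record id 123 and the per-file URL placeholder are assumptions, since in practice both would come from the server's responses, and the function names are made up for illustration.

```shell
#!/bin/sh
# Dry run only: prints the request each step would make; nothing is sent.
BASE="http://my.repo.url"
RECORD="$BASE/id/eprints/123"        # assumed; in practice taken from the create response
DRAFT_URL="<rough-draft-file-url>"   # placeholder; the server supplies per-file URLs

# Print one planned request: method, target URL, and what it is for.
step() { printf '%s %s  # %s\n' "$1" "$2" "$3"; }

sword2_plan() {
    step POST   "$BASE/id/content" "create the record"
    step POST   "$RECORD"          "add basic metadata (title, authors)"
    step POST   "$RECORD"          "add the rough-draft file"
    step POST   "$RECORD"          "add the post-review article"
    step DELETE "$DRAFT_URL"       "delete the rough-draft file"
    step POST   "$RECORD"          "add the abstract"
    step POST   "$RECORD"          "add the publication metadata"
}

sword2_plan
```

The point to notice is that only the first request goes to a collection-level URL; everything after that is addressed to the record (or to one of its files).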

With SWORD 2, the routine used to process a request is chosen from the mime-type given in the request headers.

Within each importer, the new function (the constructor) declares which mime-types it accepts:

sub new {
 my ( $class, %params ) = @_;
 my $self = $class->SUPER::new(%params);
 $self->{name} = "Import RJBroker SWORD (2.0) deposits";

 $self->{visible}   = "all";
 $self->{advertise} = 1;
 $self->{produce}   = [qw( list/eprint dataobj/eprint )];
 $self->{accept}    = [qw( application/vnd.broker.xml )];  # mime-types this importer claims
 $self->{actions}   = [qw( unpack )];
 return $self;
}

So, to create a new record, one posts a file with no record id:

  curl -X POST -i \
       -u username:password \
      --data-binary "@MyData.xml" \
       -H 'Content-Type: application/vnd.broker.xml' \
      http://my.repo.url/id/content

This will find the importer that claims to understand ‘application/vnd.broker.xml’, and use it to create a new record. The server response will include URLs for updating the record.

To add a file to a known record:

  curl -X POST -i \
       -u username:password \
      --data-binary "@MyOtherFile.pdf" \
      http://my.repo.url/id/eprints/123

This will use the default application/octet-stream importer, and add the file MyOtherFile.pdf to the record with the id 123.

To add more metadata:

  curl -X POST -i \
       -u username:password \
      --data-binary "@MyData.xml" \
       -H 'Content-Type: application/vnd.broker.xml' \
      http://my.repo.url/id/eprints/123

This will find the importer that claims to understand ‘application/vnd.broker.xml’, and use that code to add the metadata to the record with the id 123.

Note: there is a difference between PUT and POST:

  • POST adds content to any existing data. Where a field already exists, the action is determined by the importer.
  • PUT deletes the existing data and adds the new information: it replaces the whole record.
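To make the contrast concrete, here is a minimal sketch in which the two requests differ only in the HTTP method. The function name sword2_send and the CURL and SWORD_USER variables are my own assumptions for illustration; the URL and mime-type are the ones used in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: send a metadata file to an existing record.
# Usage: sword2_send METHOD FILE URL
# Set CURL=echo to print the command line instead of sending it.
sword2_send() {
    method=$1 file=$2 url=$3
    case "$method" in
        POST|PUT) ;;
        *) echo "method must be POST or PUT" >&2; return 1 ;;
    esac
    ${CURL:-curl} -X "$method" -i \
        -u "${SWORD_USER:-username:password}" \
        --data-binary "@$file" \
        -H 'Content-Type: application/vnd.broker.xml' \
        "$url"
}

# POST: add MyData.xml to record 123 (the importer decides what to do on clashes).
#   sword2_send POST MyData.xml http://my.repo.url/id/eprints/123
# PUT: replace record 123 with the contents of MyData.xml.
#   sword2_send PUT MyData.xml http://my.repo.url/id/eprints/123
```

Since PUT is destructive, it is worth running it with CURL=echo first to check which record URL is about to be replaced.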

Summary

  • SWORD 1.3 uses the X-Packaging header to determine which importer routine to use, and the importer uses the mime-type to confirm suitability.
  • SWORD 2 uses the mime-type to determine which importer routine to use.
  • The URLs for making deposits are different.