May 25th workshop – institutional repositories and ORCID
At the end of May, some of the UK institutions that are implementing ORCID met at an event organised by Jisc for members of the Jisc ORCID consortium. Notes from the event are available online; in this blog post I have written up the notes I made at the sessions on institutional repositories and eprints in a more digestible format. The first session in which I took notes aimed to discuss institutional repositories in general and their use of ORCID. In practice, many of those present were using the eprints.org repository software as their institutional repository. This was reflected in the discussion, which was weighted towards eprints.org users, and in turn in these notes. The session had two rounds to allow participants to move between sessions; several participants stayed for both rounds, and I have amalgamated the observations from both halves. Finally, comments and actions from the eprints.org-specific session are also included in this write-up. A summary of the event and a requirements synthesis are available as separate blog posts.
Institutions shared a range of examples of how far they have got with getting repository software to work with ORCID identifiers. Among those using eprints.org software, the state of play varied along a spectrum:
- Actively developing an eprints.org solution (the Open University)
- Users are being directed to sign up to ORCID, but the process is not considered to be well integrated with the repository instance. For example even when ORCIDs were being recorded within eprints.org, this was described as having been achieved in “an inelegant way”.
- Undecided about the way forward due to perceived barriers or unanswered questions. Uncertainties included whether ORCIDs can be recorded in eprints, being unsure if eprints is the right way to go, and needing to be clear about the business case to justify investment in developing an eprints-based solution.
It was also noted that concerns and possible ways forward are influenced by whether an instance is hosted by a third party or hosted, controlled and managed within the institution. One example is that any external support brought in to install plugins and configure the system to deal with ORCID must operate within existing service level agreements for hosted instances.
Amongst those using other repository software, two dspace users were considering their strategies: one of the institutions had a CRIS whilst the other institution did not. One of the representatives of an institution using dspace was uncertain if ORCID was already implemented in their instance. It was later suggested that approaching a company that provides support for dspace (Amaya) would be one way of reaching a group of dspace users. There was also an Equella user: ORCIDs were collected via email submission; however other potential infrastructure is being explored.
Oxford University presented an interesting case study. Their work on ORCID was undertaken within an existing identity management working group, with ORCID being integrated into the University’s single sign-on system. The identity system interacts with ORCID and adds the ORCID to the user’s identity profile. This approach was particularly appropriate for a large institution like Oxford, where colleges may be using various other resource management systems operating separately. The implication for repositories within this environment is that the repository can connect to the identity management system and retrieve the user’s ORCID. The option to link an ORCID to the identity profile is offered to users alongside other Oxford help pages: they can register for an ORCID or affiliate the institution with their ORCID ID. One limitation is that any system that does not already integrate with single sign-on (such as their Symplectic instance) presents some difficulties: the user has to be presented with two separate sign-on processes, which should be avoided. In terms of linking up with a repository system, the ORCIDs can be pushed from Symplectic to Fedora. Note: The Oxford scoping study was published in July.
A different institution that is pushing ORCIDs to its repository via Symplectic is the University of Leeds. However, the way ORCIDs are handled in the eprints repository is not yet standardised community-wide, so the transfer from Symplectic is not considered standardised either, which limits the benefits that can be achieved from pushing ORCIDs in this workflow pattern.
Some concerns raised when deciding a strategy for repository integration with ORCID included:
- Investment in a technical solution may not be feasible for a smaller institution.
- However, gains can be made where a single system is in use across a whole institution; that could justify investment.
- On ORCID visibility, there is still a lack of consensus on how ORCIDs are displayed (e.g. within citations) in user-facing interfaces. One position is that the ORCID in its numeric form should operate only at the machine interface level; for human consumption there should only be links (e.g. on names) that click through to other ORCID-linked resources. Some concern was expressed that these links move users away from interacting with institutional repositories; others countered that researcher-centric navigation is more closely aligned with the idea that the ORCID represents the researcher, with the institutional affiliation being of secondary importance.
- Identity management across the institution may need to be considered. Identity managed through interaction with ORCID (in the style that the RIOXX plugin performs) may present some clashes with identity managed through systems like LDAP.
- Some institutions use mediated deposit. A workflow in which the researcher who owns the ORCID record does not initiate the transfer of information from the repository to ORCID needs to be supported. The ‘trusted individual’ feature of ORCID might come in useful here.
- There are dependencies between the interactions of the repository with ORCID and those made by other systems within the institution. Workflows will be influenced by how registration, push and pull functions are distributed across different systems.
- Consensus is required on how ORCIDs are exposed in metadata, e.g. through OAI-PMH. E-theses require an author ID to be exposed. There are dependencies (at least in some software) between how ORCIDs are stored internally and how they can then be exported.
- When compiling requirements for the repository with respect to interactions with the ORCID registry and with other institutional systems, the positioning of the repository in the institution’s overall strategy is important; for some, the repository will be central to the strategy, for others only part of the picture.
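To make the visibility point above concrete, here is a minimal sketch of keeping the identifier at the machine level while showing only a linked name to readers. The check-digit calculation is the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers; the function names and the HTML layout are my own assumptions for illustration, not something any repository software was reported to do at the event.

```python
import html
import re

# 16 characters in four hyphen-separated groups; the last may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the check character of an identifier."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def author_link(name: str, orcid: str) -> str:
    """Hypothetical helper: show the name, keep the identifier in the link."""
    if not is_valid_orcid(orcid):
        return html.escape(name)  # fall back to plain text
    return f'<a href="https://orcid.org/{orcid}">{html.escape(name)}</a>'


# 0000-0002-1825-0097 is the sample identifier ORCID uses in its documentation.
print(author_link("Josiah Carberry", "0000-0002-1825-0097"))
```

Validating before linking means a mistyped identifier degrades to a plain name rather than a broken link.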
When thinking about workflows between the repository and an ORCID record, attendees considered how these may impact the quality of the information held in the record. Duplication of information in the ORCID record should be avoided, and it was suggested that data that is held elsewhere and does not originate from the repository (e.g. if it is also available in Scopus) should not be re-submitted to the ORCID record. Where the repository is the main focus of depositing research outputs (e.g. where there is no CRIS), the workflow should not redirect users to sign in to ORCID away from the repository. Single sign-on could help implement a workflow that avoids moving away from the repository as the main interface, with data moving from the repository into ORCID. Where the data in the repository is considered to be highly curated, a workflow which moves data from the repository into ORCID is also important. That leaves open the question of how to use ORCID as a source of information to complete records held by the repository. One consequence of importing records from ORCID into the repository could be an expanding scope; for example, widening out from a publication-centred repository to include other types of output recorded in ORCID. Another aspect of scope is that repositories are often tied to outputs produced by a researcher while at the institution, whereas ORCID records span a researcher’s outputs across affiliations, which implies that repositories may need to be selective about what they import. A bib format for export from ORCID would enhance workflow and options for importing into repositories.
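The selectivity described above could be sketched roughly as follows. Everything here is hypothetical: the shape of the work records, the field names and the DOIs are made up for illustration, and a real workflow would need to settle how to treat outputs with no DOI or no date.

```python
def select_for_import(orcid_works, repo_dois, start_year, end_year=None):
    """Pick works dated within the affiliation period that the repository
    does not already hold (matched on DOI, case-insensitively)."""
    known = {doi.strip().lower() for doi in repo_dois}
    selected = []
    for work in orcid_works:
        year = work.get("year")
        # Skip works with no date or dated before the researcher arrived.
        if year is None or year < start_year:
            continue
        # Skip works dated after the researcher left, if an end is given.
        if end_year is not None and year > end_year:
            continue
        doi = (work.get("doi") or "").strip().lower()
        # Avoid duplicating a record the repository already holds.
        if doi and doi in known:
            continue
        selected.append(work)
    return selected


# Made-up records standing in for works listed on an ORCID record.
sample_works = [
    {"title": "Paper A", "doi": "10.1234/a", "year": 2012},
    {"title": "Paper B", "doi": "10.1234/B", "year": 2015},
    {"title": "Dataset C", "doi": None, "year": 2016},
    {"title": "Paper D", "doi": "10.1234/d", "year": 2016},
]

to_import = select_for_import(sample_works, ["10.1234/b"], start_year=2014)
print([work["title"] for work in to_import])
```

In this sketch, works without a DOI are passed through for a human to check, since they cannot be matched automatically; that is one possible choice rather than a recommendation from the session.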
A discussion of the features for a plug-in for eprints.org mentioned the following as requirements:
- it needs to take into account those who have already adapted their eprints.org software
- the workflow needs to be configurable e.g. pushing data to ORCID must be optional
- authentication through the repository should be optional
- a feature to discover users who are already registered with ORCID is desirable; it would be very valuable to be able to get this overview, although it is up to the individual to record their institutional affiliation
- functionality (similar to that in the RIOXX plug-in) to perform name look-up against the ORCID API is desirable
- listing the types of interaction with internal systems that will be needed can help clarify functionality
- some future drivers both internal and external may develop e.g. interactions with internal grant or finance systems may become more important if requirements from funders and bodies like HESA become mandatory.
Overall there was agreement between eprints.org users (at least 20 were present) that they should work together as a group and look for common solutions. There was strong agreement that a community-agreed foundation was ideal, so that locally developed solutions don’t break with upgrades and new versions of the repository software. They also expressed a desire not to reinvent the wheel and instead to build on existing solutions. Even those with resource to fund their own technical development expressed a preference to invest this in a community effort rather than contribute to yet another piecemeal solution and take a direction away from the mainstream. It was felt that a timeframe, and reassurance that a community plan was in the making, would help make the business case to invest resources in this effort. There was recognition that the lack of a defined process to get community consensus on eprints.org software development could be a barrier; equally, a plug-in could become a de facto standard if it became widely used. Hosting services were identified as an important stakeholder, since the user base for the plug-in would be boosted if it was offered for hosted instances. Working with ORCID so that the plug-in is recognised as an ‘integration-ready’ product is also critical.
One starting point is the software developed by Peter West (a Tier 2 ORCID client). This is now the basis of the development being undertaken by the Open University. Although the code is released openly, it is not available as a plug-in because the associated support for installing and configuring it is not currently available. A demo of the software and its functionality was given; in summary, it is used with the ORCID member API to allow creation of IDs, claiming of affiliation, pushing and pulling of data, and more complex queries against ORCID.
The Tier 1 plug-in that is available through the eprints bazaar works with the ORCID public API; given an ORCID ID, it allows the repository to pull data from ORCID where a DOI is present. Although a number of downloads are recorded, it is not known how widely the plug-in is used.
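As a rough illustration of the "pull where a DOI is present" idea, the sketch below picks DOIs out of a works listing that has already been fetched and parsed as JSON. It is not the plug-in's code. The nested key names are an assumption modelled on the JSON I understand the ORCID public API to return for a record's works, and they should be checked against the API version in use; the sample data is made up.

```python
from typing import Any


def extract_dois(works: dict[str, Any]) -> list[str]:
    """Collect the distinct DOIs found in a parsed works listing.

    Assumed layout: works["group"][i]["external-ids"]["external-id"][j]
    with "external-id-type" and "external-id-value" keys.
    """
    dois: list[str] = []
    for group in works.get("group", []):
        ids = (group.get("external-ids") or {}).get("external-id", [])
        for ext in ids:
            if (ext.get("external-id-type") or "").lower() != "doi":
                continue  # ignore ISBNs, handles and other identifier types
            value = (ext.get("external-id-value") or "").strip().lower()
            if value and value not in dois:
                dois.append(value)
    return dois


# Made-up listing in the assumed layout; the second work has no DOI.
sample_listing = {
    "group": [
        {"external-ids": {"external-id": [
            {"external-id-type": "doi", "external-id-value": "10.1234/Example.1"},
        ]}},
        {"external-ids": {"external-id": [
            {"external-id-type": "isbn", "external-id-value": "9780000000002"},
        ]}},
        {"external-ids": None},
    ]
}

print(extract_dois(sample_listing))
```

Works without a DOI are simply skipped here, which mirrors the limitation described above: only outputs that carry a DOI can be pulled this way.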
The eprints wiki was identified as the focus for sharing community know-how. Attendees were invited to participate in putting together use-cases and workflows to define the functionality of the plug-in. One standard plug-in used by several in the community would offer the advantage that there would be more developers to support it. Meaningful dialogue with the core eprints Southampton team who manage the governance of the eprints software is also needed; this was also particularly important for hosted instances. Those present were reminded that the eprints users mailing list is a practical place for discussions; ORCID is of interest internationally, and this list attracts an international audience of eprints.org software users who may be able to contribute.
There was agreement that the proposed plug-in could have core functionality (a basic set of functions) with configurable options (layered requirements). However, participants also considered that the many variations in the systems that repositories may need to interact with (despite common categories of systems that might be identified) present complexity that could be challenging.
Overall, some clear outcomes from the sessions were recorded:
- A recommendation that Jisc should be the point of focus for organising the developments for eprints, capturing requirements and needs, and driving things forward (similar to RIOXX developments)
- An action for all eprints.org users to join the eprints user group mailing list; promotion through the ORCID-UK list would take place (Lizz Jennings from the University of Bath agreed to help with this).
Since the meeting, progress has already been made, with activity on the eprints wiki site. A resource by the Jisc support team documents the current capabilities of eprints.org software (some details of which were described and shared during the event session) and links to the documentation and code-sharing platforms from which the plug-ins can be obtained. Community-agreed requirements for a plug-in are documented on the eprints wiki and a survey is under way.