Multi-tenant repositories seem to be the topic of the day; we’re hearing them mentioned everywhere. Fortunately, we’ve been running multi-tenant services for years, so we have lots of experience to share. If you’d like a demo, just let us know.
Let’s go through the three models of multi-tenancy, what we can show you today, and an overview of the underlying technology that elegantly provides multi-tenancy within Haplo.
With legacy repository software, institutions often have to use multiple repositories to handle different kinds of outputs. The most common is a pair of repositories to handle traditional outputs and research data. This is because the metadata, workflows, and rules are quite different, and legacy repositories don’t offer sufficient flexibility. This is not a great experience for researchers who need to deposit items, and it misses an opportunity to showcase all of a researcher’s outputs in a single place.
Haplo Repository has a single repository containing all types of outputs, with a single interface for researchers to deposit items and a guided UI to help them select the right type. Each kind of output has its own metadata, workflow, and rules, and is published in a single public portal with all outputs, along with a full web profile of that researcher’s academic background.
While the first level of multi-tenancy covers the needs of the vast majority of institutions, sometimes a group of institutions wants to use a shared repository to pool resources and expertise. This is hard to do in legacy repository software, because the code is full of assumptions and has a rigid model of permissions and user roles.
Haplo Repository can store items from multiple institutions. The flexible permissions system gives each user a view that only contains items from their institution, and the ingest workflow routes new items to the institution’s metadata team. Users can belong to multiple institutions, or have an oversight role of the entire shared repository so they can assist users from multiple institutions. Finally, reporting delivers insights across the shared repository, or for just the user’s institution.
Each institution has its own public portal, hosted on a URL on its institutional domain (web address) with its own branding and customisation, and can opt in to also publish in a shared repository public interface. This is especially useful for consortia of universities, or collaborating research institutes, who need individual identities but also want to present their work as a whole.
Legacy repository software is expensive to host because it’s hard to share resources. You have to run a separate database and a separate application server for each tenant on its own physical server or virtual machine. This is wasteful of resources and cumbersome to administer, so there are no economies of scale.
To be practical, hosted software has to be truly multi-tenant, where a single database and application server can host multiple clients with complete isolation between them. This is the model that all modern service providers use, especially large providers like Google, Amazon and Microsoft.
Haplo Repository is a true multi-tenant server. A single instance can run hundreds of repository applications, and each of those is totally isolated with an independent customised configuration. This allows us to host advanced repository systems cost-effectively and streamline our infrastructure administration.
Sometimes you’ll need a central repository which contains the metadata of all the items, for managing and reporting across a set of repositories. As well as standard protocols, Haplo supports efficient inter-repository messaging to sync metadata records between otherwise isolated repositories.
The second model is the most interesting from a technology perspective, so we’ve put together a demonstration multi-tenant repository. This:
We’re particularly proud that, although this is a radically different model to our normal single-institution repository system, the multi-tenancy is implemented by a thin configuration and user interface layer on top of the standard repository system.
If you’d like to see this demonstration repository, we’d be delighted to show it to you.
You don’t just benefit from multi-tenancy with Haplo Repository, you gain all the other advantages of using a modern repository system:
One of the many lovely things about working with academic institutions is that their repository administrators and metadata teams love to talk about the technicalities of repository software, and we love to talk about the system we’ve built. So here’s an overview of the technical details!
Each of the different multi-tenancy models is easy to implement with Haplo Repository, and they can be combined to deliver your perfect repository.
Storing multiple kinds of item in a single repository follows naturally from the Haplo data model and from the way the system is implemented as many co-operating plugins.
Each item in the repository has a well-defined type, which defines its overall behaviour and metadata fields. The ‘type’ can be thought of as a metadata template with associated rules and behaviours. For example, research data has very different publication and access request workflows, because it is often sensitive data which needs to be prepared before it can be shared.
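To make the idea of a type as a "metadata template with associated rules" concrete, here is a minimal sketch in Python. It is not Haplo's schema language or API: the `ItemType` class, the field names, and the workflow names are all hypothetical, chosen only to show how two kinds of output can carry different required metadata and different workflows inside one repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemType:
    """Hypothetical stand-in for a repository item type."""
    name: str
    required_fields: tuple[str, ...]   # metadata every item of this type needs
    publication_workflow: str          # name of the workflow applied on deposit


# Two illustrative types; the field and workflow names are invented.
JOURNAL_ARTICLE = ItemType(
    name="Journal article",
    required_fields=("title", "authors", "journal"),
    publication_workflow="standard-ingest",
)
RESEARCH_DATA = ItemType(
    name="Research data",
    required_fields=("title", "authors", "data_licence"),
    publication_workflow="data-preparation-and-review",
)


def missing_fields(item_type: ItemType, metadata: dict[str, object]) -> list[str]:
    """Return the required fields that are absent or empty in the metadata."""
    return [f for f in item_type.required_fields if not metadata.get(f)]
```

The same deposit form could call `missing_fields` with whichever type the researcher selected, so the rules follow the type rather than the repository.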
On top of the core item and metadata handling, the repository functionality is implemented by a large number of plugins: a typical Haplo application may be implemented as over 100 plugins. These plugins are carefully written to provide either functionality or policy. So, you might have an ingest workflow with an implementation of the functionality, but you’d add an additional policy plugin which controlled the fine details of how it behaved. This allows you to implement any rules you need, so the behaviour of the repository can change depending on what type of item it is, the metadata of the item, or perhaps the research discipline of the authors.
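The split between functionality and policy can be sketched as follows. This is an illustration in Python, not the Haplo plugin API: `IngestWorkflow`, `add_policy`, and the example policy are hypothetical names. The workflow owns the mechanics of routing an item, while small policy functions decide the details.

```python
from typing import Callable, Optional

# A policy answers one question: who should review this item?
# Returning None means "no opinion, ask the next policy".
Policy = Callable[[dict], Optional[str]]


class IngestWorkflow:
    """Functionality: routes a deposited item to a reviewer."""

    def __init__(self, default_reviewer: str) -> None:
        self.default_reviewer = default_reviewer
        self._policies: list[Policy] = []

    def add_policy(self, policy: Policy) -> None:
        """Register a policy; policies are consulted in registration order."""
        self._policies.append(policy)

    def reviewer_for(self, item: dict) -> str:
        # The first policy with an opinion wins; otherwise use the default.
        for policy in self._policies:
            decision = policy(item)
            if decision is not None:
                return decision
        return self.default_reviewer


def research_data_policy(item: dict) -> Optional[str]:
    """Policy: research data goes to a (hypothetical) data steward role."""
    return "data-steward" if item.get("type") == "research-data" else None


workflow = IngestWorkflow(default_reviewer="metadata-team")
workflow.add_policy(research_data_policy)
```

Changing the rules then means adding or swapping a policy function, leaving the workflow implementation untouched.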
The shared multi-tenant repository is the most interesting multi-tenancy model, which really shows off the capability of the Haplo platform.
The underlying platform separates the institutions, while allowing them to share common information, such as records about publishers, organisations, institutions, and so on, increasing efficiency and reducing errors through a single source of information. A fine-grained permissions system enforces isolation, and the user interface is designed around providing different subsets of information for different users.
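One way to picture this kind of permission-enforced isolation is as labels on records that are compared with the institutions a user belongs to. The sketch below is a simplified Python model under that assumption, not Haplo's permissions implementation; `Record`, `User`, the `SHARED` label, and the `oversight` flag are hypothetical.

```python
from dataclasses import dataclass

SHARED = "shared"  # hypothetical label for records every institution may read


@dataclass(frozen=True)
class Record:
    title: str
    labels: frozenset[str]   # institution labels, or SHARED


@dataclass(frozen=True)
class User:
    name: str
    institutions: frozenset[str]
    oversight: bool = False  # may read across the whole shared repository


def can_read(user: User, record: Record) -> bool:
    """A record is readable if it is shared, or labelled with one of the
    user's institutions, or the user has the oversight role."""
    if user.oversight or SHARED in record.labels:
        return True
    return bool(user.institutions & record.labels)


def visible(user: User, records: list[Record]) -> list[Record]:
    """The subset of records that makes up this user's view."""
    return [r for r in records if can_read(user, r)]
```

A user in one institution sees that institution's items plus shared records such as publishers; a user in several institutions, or with oversight, sees correspondingly more.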
The platform provides several simple building blocks that combine elegantly to implement this multi-tenant shared repository:
All of this labelling and permission handling is completely automatic, and even administrator users do not need to think about labelling:
Public repository interfaces can be customised for each institution, along with a public shared repository where institutions can opt in:
Metadata schemas can be shared or configured for an institution:
The Haplo application server is a single Java process that can serve many isolated applications. Every day, we work on development servers that are hosting hundreds of independent applications on a single Haplo application server.
The database uses PostgreSQL ‘schemas’, a feature that allows each tenant to have strong isolation with their own set of database tables. When serving a request, the platform uses the hostname to choose the right tenant, selects the right database schema, then serves the request.
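The request path described above (hostname, then tenant, then schema) can be sketched in a few lines of Python. The hostnames and schema names are invented, and the use of `SET search_path` is an assumption: it is a common way to select a PostgreSQL schema for a connection, but the text does not say which mechanism Haplo itself uses.

```python
# Hypothetical hostname-to-schema mapping; real deployments would load this
# from configuration rather than hard-code it.
TENANT_SCHEMAS = {
    "repository.example.ac.uk": "tenant_1001",
    "research.example.org": "tenant_1002",
}


class UnknownTenant(Exception):
    """Raised when no tenant is registered for the requested hostname."""


def normalise_host(host_header: str) -> str:
    """Lower-case the Host header and drop any port.
    (Simplified: IPv6 literals are not handled.)"""
    return host_header.strip().lower().split(":", 1)[0]


def quote_identifier(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def search_path_statement(host_header: str) -> str:
    """Build the statement that points a connection at the tenant's schema."""
    host = normalise_host(host_header)
    try:
        schema = TENANT_SCHEMAS[host]
    except KeyError:
        raise UnknownTenant(host) from None
    return f"SET search_path TO {quote_identifier(schema)}"
```

Once the search path is set, unqualified table names in later queries resolve to that tenant's own tables, which is what gives each tenant a separate set of tables in one database.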
Within the Haplo platform’s server code, there’s no global state, so each tenant has its own completely separate configuration, caches, and set of plugins, and is isolated and resource-controlled to ensure no tenant can affect any other. This is not an easy thing to implement, and requires great discipline and careful selection and use of software libraries, so it has to be designed in from the very start.
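The "no global state" discipline can be illustrated with a small Python sketch. It is not taken from the Haplo server code; `TenantContext` and `Server` are hypothetical. The point is that configuration and caches hang off a per-tenant object that is looked up for each request, rather than living in module-level variables that every tenant would share.

```python
class TenantContext:
    """Everything one tenant needs, owned by that tenant alone."""

    def __init__(self, hostname: str, config: dict) -> None:
        self.hostname = hostname
        self.config = dict(config)           # private copy, never shared
        self.cache: dict[str, object] = {}   # per-tenant cache


class Server:
    """One process serving many tenants, each through its own context."""

    def __init__(self) -> None:
        self._tenants: dict[str, TenantContext] = {}

    def add_tenant(self, hostname: str, config: dict) -> None:
        self._tenants[hostname] = TenantContext(hostname, config)

    def tenant(self, hostname: str) -> TenantContext:
        return self._tenants[hostname]

    def handle(self, hostname: str, key: str) -> object:
        """Serve a request using only the state of the matching tenant."""
        ctx = self.tenant(hostname)
        if key not in ctx.cache:
            ctx.cache[key] = f"{ctx.config['name']}:{key}"
        return ctx.cache[key]
```

Because `handle` never touches anything outside the selected context, a cached value for one tenant cannot leak into a response for another.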
While these repositories are independent, sometimes they’ll need to communicate. There are two options: standard web protocols over HTTP, or messages sent between repositories using an internal message bus or an external messaging system such as Amazon Kinesis.
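As a rough picture of message-based metadata sync between otherwise isolated repositories, the following Python sketch uses an in-memory queue as a stand-in for a message bus. The class names, message shape, and `sync` method are hypothetical and do not describe Haplo's messaging API or Amazon Kinesis.

```python
from collections import deque


class MessageBus:
    """In-memory stand-in for an internal message bus."""

    def __init__(self) -> None:
        self._queues: dict[str, deque] = {}

    def send(self, to: str, message: dict) -> None:
        self._queues.setdefault(to, deque()).append(message)

    def receive_all(self, name: str) -> list[dict]:
        """Remove and return every message waiting for the named repository."""
        return list(self._queues.pop(name, deque()))


class Repository:
    """A repository that shares nothing with others except messages."""

    def __init__(self, name: str, bus: MessageBus) -> None:
        self.name = name
        self.bus = bus
        self.records: dict[str, dict] = {}

    def publish(self, record_id: str, metadata: dict, central: str) -> None:
        """Store a record locally and send a copy of its metadata onwards."""
        self.records[record_id] = metadata
        self.bus.send(central, {"source": self.name, "id": record_id,
                                "metadata": dict(metadata)})

    def sync(self) -> int:
        """Apply pending messages; return how many records were received."""
        messages = self.bus.receive_all(self.name)
        for m in messages:
            self.records[f"{m['source']}/{m['id']}"] = m["metadata"]
        return len(messages)
```

A central repository built this way holds only copies of metadata keyed by source, so the originating repositories stay independent while reporting can run across all of them.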
Haplo has been used to deliver multi-tenant information systems, in production, for over 10 years. It’s a mature, secure, and well tested platform.
We’re always keen to talk about repositories, from the underlying technology, to how an excellent user experience can increase deposit rates and accuracy, to the advantages of a repository as a central part of the institution’s CRIS.
Please get in touch for a demo. (Even if you don’t need advanced multi-tenant features.)