How to move towards wiki interoperability

Letting go of specific software or interfaces, I see a way forward through defining the functionalities of wikis and the information structures that support them.

Wiki interoperability continues to be complex, tricky and troublesome.[1] What is it, exactly? I'll try to outline something that makes sense below. But even more essential than the what, and going hand in hand with it, is the why: what purposes would wiki interoperability serve?

I've looked at this before in terms of living knowledge commons. Back in February I was thinking about Fedwiki interoperability, in the context of the question of how one can transfer a page (or more) from one wiki platform to a different one. That probably remains the easiest case to imagine. I have a page about something in one wiki (say this one in Wikipedia) and I want to put it somewhere else. To move one page to another MediaWiki installation (e.g. the P2P Foundation Wiki) you have to export the page – MediaWiki exports to a specific XML format – and then use the import tool to add it to the other installation. Whether it works depends on several things, including whether the importing installation has (or can import) the templates used in the exported page. Templates are a common feature in MediaWiki, but they are not standard across other wiki systems. One perhaps surprising feature is that you can ask for past versions to be included, in which case each version will be given in full in the exported file. That can make for a very large file, even for a small page!
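
To make the point about full-text revisions concrete, here is a minimal Python sketch that reads a MediaWiki-style export and reports how much text each page carries across its revisions. The sample XML is a hand-written, heavily trimmed imitation of an export file, and the namespace URI in it is an assumption (it varies with the MediaWiki version), so the code reads the namespace off the root element rather than hard-coding it.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed imitation of a MediaWiki export. Real files carry
# more fields (site information, contributors, timestamps and so on).
SAMPLE_EXPORT = """\
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Commons</title>
    <revision><id>1</id><text>The commons is shared.</text></revision>
    <revision><id>2</id><text>The commons is shared. {{Stub}}</text></revision>
  </page>
</mediawiki>
"""


def summarise_export(xml_text):
    """Return {title: (revision_count, total_characters_of_text)}."""
    root = ET.fromstring(xml_text)
    # The root tag looks like "{namespace-uri}mediawiki"; reuse its namespace.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    summary = {}
    for page in root.iter(f"{ns}page"):
        title = page.findtext(f"{ns}title", default="")
        texts = [rev.findtext(f"{ns}text", default="") or ""
                 for rev in page.iter(f"{ns}revision")]
        summary[title] = (len(texts), sum(len(t) for t in texts))
    return summary


print(summarise_export(SAMPLE_EXPORT))
```

Because every revision repeats the whole page text, the character total grows roughly with page size times number of revisions, which is why a small page with a long history can produce a large file.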

If you want to move a MediaWiki page to a different wiki system using MediaWiki export, you will need to look around for suitable software to do the job, and even if you find something, there is no guarantee that anything of your original page beyond plain text will appear in your new wiki. Alternatively, you can simply save the page displayed in your browser as HTML, but then you will lose a lot of the structure of the MediaWiki version. If you were then to try what is commonly called a ‘round trip’, attempting to put the information back into MediaWiki, you would find that substantial information had been lost. In short, moving pages from one wiki to another is difficult, may be time-consuming, and will almost always lose at least some information.

But how important is that? Does it really matter? Why not simply link across between different pages on different wikis? That is clearly an option, but what are the consequences?

  1. In a coordinated but distributed living knowledge commons, it is unlikely that all the information will stay in one installation or on one server all the time. New fields and new organising schemes are likely to appear, and when a new field opens up, the people working in it might want to take over responsibility for information currently held in old pages. And what if two groups have already started wikis, or other information resources, about a particular area? As they grow and become aware of each other, they might want to negotiate and divide responsibility for particular areas between them, so that each can put more time and care into a subset of pages. Of course, you could use that as a prompt for a rewrite of those pages, but forcing a rewrite seems likely to deter people from getting round to it, and thus risks fossilising the information.
  2. If you just link to other wikis, as you would link to any other external web page, no current wiki system will automatically record a link back on the page you link to. It's one of the good things about wiki systems in general that they do normally keep track of ‘what links here’ in some way. This is also vital for categories or tags. So, most likely, internal links, categories and tags will be limited to one installation. This touches on a wider view of interoperability than just export and import – the question of crafting APIs for synchronous meshing of different wiki systems so that the functionality is spread across the system – and this opens up many more questions and challenges.
  3. Different wiki user communities may have different abilities, and different cultures. If information about a certain topic is stuck in one place, that may not suit a different community that wants to take over curation of those pages. Different interfaces suit different people, and you can't expect the wiki software you started with to be equally usable to new people. So you may well want to have the same or similar information displayed in different systems.
  4. Similarly, different wiki systems are easier or harder to use. On Wikipedia, for example, editing plain text is easy enough, but getting into proper referencing and citation is much harder. Thus, you may want to move pages from a harder system to an easier one. Otherwise, the people who don't like the more complex interface may simply not bother to curate the pages on the system with the harder interface.
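
The ‘what links here’ limitation in point 2 can be illustrated with a small Python sketch. It assumes, hypothetically, that each wiki can hand over a table of its pages with their outgoing links, and that pages are named with a `wiki:Title` convention invented here for illustration. No existing wiki API is implied.

```python
from collections import defaultdict

# Hypothetical data: each wiki reports its pages and their outgoing links.
# The "wiki:Title" naming scheme is invented for this sketch.
WIKI_A = {"a:Commons": ["a:Governance", "b:Peer production"]}
WIKI_B = {"b:Peer production": ["a:Commons"], "b:Licences": ["b:Peer production"]}


def backlinks(*wikis):
    """Build a 'what links here' index from one or more link tables."""
    index = defaultdict(set)
    for wiki in wikis:
        for source, targets in wiki.items():
            for target in targets:
                index[target].add(source)
    return {target: sorted(sources) for target, sources in index.items()}


# Each wiki on its own only sees links that originate locally ...
local_b = backlinks(WIKI_B)
# ... while a shared index also sees links arriving from the other wiki.
shared = backlinks(WIKI_A, WIKI_B)

print(local_b.get("b:Peer production"))
print(shared.get("b:Peer production"))
```

In this toy data, wiki B alone knows of only one page linking to "Peer production", while the shared index also finds the link from wiki A. Keeping such an index current across live installations is the harder, synchronous problem mentioned above.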

The root cause of this lack of interoperability – or, to take the most obvious case, the lack of portability of pages from one system to another – is that their models of how exactly a page is put together are different. It's not about the lightweight markup languages they use, as Pandoc does a pretty good job of translating between these, but about the structure of the page itself. Wikipedia makes extensive use of templates: there are thousands of them, all translating to HTML in their own way; and each MediaWiki instance can have a separate set. It's easy enough to translate templates into wikitext or HTML, but how do you translate HTML back into template form? Fedwiki has its own logical structure, which is fine for what it does, but effectively blocks the meaningful export or import of structure to other wiki systems.
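
Why the reverse translation is hard can be shown with a toy Python example. The two template names and their HTML renderings below are invented for illustration and do not correspond to real MediaWiki templates. The point is only that expansion can be many-to-one, so the rendered HTML no longer says which template produced it.

```python
# Invented templates for illustration; not real MediaWiki templates.
TEMPLATES = {
    "Note": '<div class="box">{0}</div>',
    "Warning": '<div class="box">{0}</div>',  # happens to render identically
}


def expand(name, argument):
    """Render a toy template call such as {{Note|text}} to HTML."""
    return TEMPLATES[name].format(argument)


def candidates(html, argument):
    """List every template that could have produced this HTML."""
    return sorted(name for name in TEMPLATES if expand(name, argument) == html)


html = expand("Warning", "Check the licence")
print(html)
print(candidates(html, "Check the licence"))
```

Going forwards is a simple lookup, but going backwards yields two equally plausible candidates, and the importing wiki has no way to tell which one the author meant.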

Then, let's think about the revision history and the talk pages. Talk pages are structured essentially like ordinary pages, and could be represented simply as another kind of page. Revision history adds complexity. I haven't looked in any detail into how, or how much, history different wiki systems store, but I guess there are differences. Any differences will complicate an export and import process, and are likely to mean some loss of information.

To enable portability of pages from one wiki system to another, ideally each system would export and import in the same format as all the others. That doesn't mean that each system has to hold exactly the same information. It does mean that where they hold similar information, it should be possible to represent it in an interoperable way. That says little about the internal operations of any one system. It says little about the user interface. But it does mean that at some deeper level, the models of a wiki page need to be compatible. And the way I think about that, it means that we need a basic agreed ontology underlying all the systems – a catalogue, if you like, of all the different kinds of things that exist in a wiki system, and how they are related. One of the reasons JavaScript works across browsers is that, for normal web pages, browsers share the same DOM – Document Object Model – so JavaScript operations in different browsers can have the same effect. (Imagine having to write browser-specific scripts …)
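
As a rough illustration of what a shared page model might look like, here is a sketch of a tiny interchange structure in Python. Every class and field name is an assumption made for the example, not part of any existing or proposed standard, and JSON is used only because it is convenient; a real ontology would need to come out of agreement between wiki developers.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Revision:
    author: str
    timestamp: str  # ISO 8601 string, kept simple for the sketch
    body: str       # markup text of the page at this revision


@dataclass
class Page:
    title: str
    markup: str                                     # e.g. "mediawiki"
    links: list = field(default_factory=list)       # titles linked to
    categories: list = field(default_factory=list)
    revisions: list = field(default_factory=list)   # optional history


def export_page(page):
    """Serialise a page to JSON, the hypothetical interchange format."""
    return json.dumps(asdict(page), sort_keys=True)


def import_page(text):
    """Rebuild a Page from the JSON produced by export_page."""
    data = json.loads(text)
    data["revisions"] = [Revision(**r) for r in data.get("revisions", [])]
    return Page(**data)


# Made-up sample page used to check that a round trip loses nothing.
original = Page(
    title="Commons",
    markup="mediawiki",
    links=["Governance"],
    categories=["Economics"],
    revisions=[Revision("alice", "2021-02-01T10:00:00Z", "The commons is shared.")],
)
round_tripped = import_page(export_page(original))
print(round_tripped == original)
```

A system that stores no history could simply leave `revisions` empty, which fits the idea that systems need not hold the same information, only represent shared information in the same way.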

As I've seen so often in real life, solving a technical challenge is usually the easier part of solving a real-life problem. The bigger question is, how do people coordinate to set up the (socio-technical) working system in a way in which everyone is motivated? It's this question that I'd like to invite focus on most of all here – though the answers are not at all clear. Here are the steps I envisage, very open to improvement (and please do get in touch if you have better ideas).

  1. Invite a reasonably diverse group of open source wiki developers, including both established wikis and ones in development.
  2. Ask representative leaders or developers of each wiki system which functionality or features they think they are doing, or could do, better than other wikis. Allow features to be included which are not yet implemented.
  3. Tidy up the list of features through dialogue, and agree a non-overlapping set of features that everyone can relate to, however important or unimportant they reckon those features to be.[2]
  4. Ask people to rate the relative merits of each of the wiki systems that they are not involved with on each of the agreed features, with particular attention to the better ones.
  5. For each feature, arrange teams drawn from the two or three top rated wiki systems to agree an interchange format, either that they could implement now, or that they are willing to work towards.
  6. Submit all these proposals to all teams, for suggestions for improvement. Accept improvements that more teams are willing to implement.
  7. Using this output, get a team to write up a clear proposal.
  8. Iterate this through a process of implementation and improvement – this iterative modification of the standard and the code is a tried and tested way of achieving workable interoperability.
  9. Collectively choose standards bodies to work with to formalise this.
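
Steps 4 and 5 amount to a small selection procedure, sketched below in Python. The wiki names, feature names and the 1 to 5 rating scale are all invented for illustration, and averaging is only one possible way of combining ratings.

```python
from statistics import mean

# Invented ratings: RATINGS[feature][wiki] = scores given by people
# who are NOT involved with that wiki (step 4).
RATINGS = {
    "backlinks": {"WikiA": [5, 4], "WikiB": [3, 3], "WikiC": [4, 4]},
    "history":   {"WikiA": [2, 3], "WikiB": [5, 5], "WikiC": [4, 3]},
}


def top_systems(ratings, team_size=2):
    """For each feature, pick the best-rated wikis to draft a format (step 5)."""
    teams = {}
    for feature, by_wiki in ratings.items():
        # Sort by mean score, highest first; ties are broken by name.
        ranked = sorted(by_wiki, key=lambda w: (-mean(by_wiki[w]), w))
        teams[feature] = ranked[:team_size]
    return teams


print(top_systems(RATINGS))
```

With this made-up data, different wikis end up leading on different features, which is the intended effect: each interchange format is drafted by the systems judged by their peers to handle that feature best.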

That's the idea in outline. The idea could be run past any funding body with an interest, and any funder would have first choice of the body to publish the standard. But the non-negotiable basis is that the resulting standard would be truly open, with no fees either to consult or implement the standard. It would probably be a good idea to set up a foundation for ongoing governance, either an existing related one or a new one for the purpose.

Finding funding for all of this, or motivating people to do it as volunteers, is something that I am sure others are better than me at!


NOTES:

1: And it has been troublesome for many years. A paper was published in 2006, “Towards a Wiki Interchange Format”, but recent correspondence with the author suggests that his initiative sadly did not result in any substantial progress. We are in no better position now than we were then. My proposals here take a different approach to previous ones.

2: My own ideas for which features may be important are set out in what I call “Some ideas for requirements for a wiki supporting a living knowledge commons” (as a cryptpad, in case anyone wants to help improve them). This is just my best guess at present, and I expect it will not be the same as the list of features that comes out of the process suggested above.


Topics: Interoperability; Wiki software


If you have any remarks on any of my posts, please send me e-mail, saying what you want me to do with your remarks. Are they private to you and me, or would you be happy for me to quote you (I will always attribute your words unless you ask me not to), and add your response (or parts of it) to the post it's about?
Creative Commons Licence