For some time now, Communardo has offered a service for migrating data from existing legacy systems into Atlassian Confluence. The migration uses the Communardo product Content Import Plugin (CIP). The plugin is available with a free evaluation license and can be downloaded from the Atlassian Plugin Exchange. A full license can be ordered in the online shop, so customers can also perform migrations on their own without using the migration service.
This article describes some of our experiences with actual migrations. It is aimed at customers who want to perform a migration on their own as well as customers who want to use our migration service.
How Migrations are Performed
Migrations with the CIP are performed in two steps. First, the data is exported from the source system to an XML transport format. Second, this XML is imported into Confluence with the CIP.
The transport format covers all content types (spaces, pages, blog posts, attachments) together with their metadata (author, creation date, modification date). The page hierarchy and internal links are supported as well.
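To make the export step more tangible, here is a minimal sketch of how exported pages could be written to a nested XML document. The element and attribute names (`import`, `space`, `page`, `body`, `attachment`) and the function `build_transport_xml` are purely illustrative assumptions and not the actual CIP transport schema. The sketch also assumes that page titles are unique within a space and that the hierarchy contains no cycles.

```python
import xml.etree.ElementTree as ET


def build_transport_xml(space_key, pages):
    """Build a HYPOTHETICAL transport document (not the real CIP schema).

    Each page is a dict with title, parent (title or None), author,
    created, modified, body and an optional list of attachment names.
    """
    root = ET.Element("import")
    space = ET.SubElement(root, "space", key=space_key)

    # Group pages by their parent title so the hierarchy can be nested.
    by_parent = {}
    for page in pages:
        by_parent.setdefault(page.get("parent"), []).append(page)

    def add_children(xml_parent, parent_title):
        for page in by_parent.get(parent_title, []):
            node = ET.SubElement(
                xml_parent,
                "page",
                title=page["title"],
                author=page["author"],
                created=page["created"],
                modified=page["modified"],
            )
            ET.SubElement(node, "body").text = page["body"]
            for filename in page.get("attachments", []):
                ET.SubElement(node, "attachment", filename=filename)
            # Child pages become nested <page> elements.
            add_children(node, page["title"])

    add_children(space, None)
    return ET.tostring(root, encoding="unicode")
```

Nesting child pages inside their parent keeps the page hierarchy explicit in the file, so an importer would not have to reconstruct it from separate parent references.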
The Customer's Expectations vs. Feasibility
Our experience shows that customers usually have high expectations of a migration. The most common ones are:
- completeness of migrated data
- same layout after the migration
- keeping functionality
The completeness of the migrated data is ensured by the transport format. It covers all pages and their contents, so these can be migrated into Confluence.
The circles in the diagram stand for the feature sets and layout capabilities of the different markup formats. HTML has the largest feature set and the most variants. Confluence wiki code can express many of HTML's features, but it also offers some functionality that HTML does not have (e.g. macros). The same applies to MediaWiki code.
Looking at this diagram, it becomes clear that MediaWiki code cannot be migrated to Confluence wiki code without loss, because the two do not share exactly the same feature set. MediaWiki has some concepts that cannot be expressed in Confluence wiki code (e.g. merged table cells). Likewise, plain HTML has some features that are not possible in Confluence (e.g. definition lists).
In some cases it is possible to find alternative markup that gives a similar result in Confluence. Definition lists, for example, can be replaced with nested unordered lists. For nested tables, however, there is currently no counterpart in Confluence.
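The definition list replacement can be sketched as follows. This is a minimal illustration using Python's standard `html.parser`, not the way the CIP or Confluence performs the conversion. It assumes that terms map to first-level bullets (`* `) and definitions to second-level bullets (`** `) in Confluence wiki markup, and it ignores nested definition lists and inline formatting. The names `DefinitionListConverter` and `definition_list_to_wiki` are made up for this example.

```python
from html.parser import HTMLParser


class DefinitionListConverter(HTMLParser):
    """Turn <dt>/<dd> into wiki bullets: '* term' and '** definition'."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._prefix = None  # bullet prefix of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            # HTML allows omitting </dt> and </dd>, so close any open item.
            self.flush_item()
            self._prefix = "* " if tag == "dt" else "** "

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self.flush_item()

    def handle_data(self, data):
        if self._prefix is not None:
            self._text.append(data)

    def flush_item(self):
        if self._prefix is not None:
            # Collapse line breaks and repeated spaces inside the item.
            text = " ".join("".join(self._text).split())
            if text:
                self.lines.append(self._prefix + text)
        self._prefix = None
        self._text = []


def definition_list_to_wiki(html):
    parser = DefinitionListConverter()
    parser.feed(html)
    parser.close()
    parser.flush_item()
    return "\n".join(parser.lines)
```

The result keeps the term and definition text and their pairing, but the visual distinction between the two is reduced to list indentation.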
Another problem is the styling of certain elements. A heading can look very different after a migration from MediaWiki to Confluence: its content is migrated completely, but its appearance changes. Formatting rules defined in CSS cannot be migrated. To get a similar-looking result, one has to define new styles for Confluence or develop a theme plugin.
Examples of Other Migration Challenges
The next paragraphs describe some problems we came across in our most recent migration projects. They should give you a feel for the technical limitations of such projects.
Until a few years ago it was quite common to use tables for website layouts. These are tables without visible borders. For website authors they were an easy way to build layouts with elements positioned side by side, for example a navigation bar next to the content.
Unfortunately, Confluence has no concept of borderless tables, so such a layout table cannot be migrated without differences in layout. The lack of support for nested tables aggravates this problem.
Tables with Merged Cells
In HTML it is easy to merge table cells horizontally (colspan) or vertically (rowspan). This is often used for table headings that span more than one table row.
Since Confluence has no markup for spanned table cells, these cannot be migrated without losses in layout or even content.
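Before a migration it can help to find out how many pages are affected by these table problems. The following sketch counts merged cells and nested tables in an HTML fragment using Python's standard `html.parser`. It is an illustrative pre-migration check under our own assumptions, not part of the CIP, and the names `TableAudit` and `audit_tables` are hypothetical.

```python
from html.parser import HTMLParser


class TableAudit(HTMLParser):
    """Count table constructs that have no Confluence counterpart."""

    def __init__(self):
        super().__init__()
        self.merged_cells = 0
        self.nested_tables = 0
        self._depth = 0  # number of currently open <table> elements

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth > 1:
                self.nested_tables += 1
        elif tag in ("td", "th"):
            attributes = dict(attrs)
            for name in ("colspan", "rowspan"):
                value = (attributes.get(name) or "1").strip()
                # A span greater than 1 means the cell is merged.
                if value.isdigit() and int(value) > 1:
                    self.merged_cells += 1
                    break

    def handle_endtag(self, tag):
        if tag == "table" and self._depth > 0:
            self._depth -= 1


def audit_tables(html):
    audit = TableAudit()
    audit.feed(html)
    audit.close()
    return {
        "merged_cells": audit.merged_cells,
        "nested_tables": audit.nested_tables,
    }
```

Running such a check over all exported pages gives a rough list of pages that will need manual rework after the import.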
If you try to migrate hand-crafted HTML you run into real trouble. Many content management systems use WYSIWYG (what you see is what you get) editors for creating or editing content. These editors do not show the source HTML directly but the formatted result as it will appear on the website later. Their big advantage for a migration is that they generate uniform HTML.
For such content it is possible to recognize recurring markup patterns and to transform them into Confluence code. If the page markup was hand-crafted, no such patterns exist, so it is extremely hard to define transformation rules. As a result you have to accept severe formatting losses.
Often it is quite hard to migrate internal links. Sometimes the links of the source system are broken, so it is not possible to find the correct link target. Sometimes it is necessary to choose from multiple possibilities and take the "best-fitting link target".
Sometimes the migration changes the structure of the data or adds generated content. This is the case when the migration should generate overview pages that did not exist in the source system or that were generated by the system. This greatly increases the complexity of finding the targets of internal links.
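One way to approach the "best-fitting link target" problem is fuzzy matching of link targets against the titles of the migrated pages. The sketch below uses `difflib.get_close_matches` from the Python standard library. The function `resolve_link`, the normalization rules and the cutoff of 0.6 are assumptions chosen for illustration and do not describe how the CIP resolves links.

```python
import difflib


def resolve_link(target, page_titles, cutoff=0.6):
    """Return (title, status); status is 'exact', 'guessed' or 'broken'."""

    def normalize(text):
        # Assumed normalization: ignore case, outer spaces and underscores.
        return text.strip().lower().replace("_", " ")

    normalized = {normalize(title): title for title in page_titles}
    key = normalize(target)

    if key in normalized:
        return normalized[key], "exact"

    # Fall back to the most similar title above the similarity cutoff.
    candidates = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    if candidates:
        return normalized[candidates[0]], "guessed"

    return None, "broken"
```

Links with the status `guessed` or `broken` are best written to a report so that they can be reviewed manually after the import.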
Confluence offers the following possibilities to structure information:
- Spaces with pages and blog posts
- Child pages of pages (tree structure)
- Attachments on pages or blog posts
If the source system contains structures that go beyond Confluence's structural capabilities (e.g. multilingual content), this information cannot be migrated directly into Confluence. Such a migration is only possible with third-party plugins (e.g. the Communardo SubSpace Plugin for space hierarchies).
Unpredictable Behavior of the Internal Wiki-Code Converter of Confluence
The transformation of HTML to Confluence wiki code is performed by Confluence's internal wiki code converter. This component has problems transforming HTML that was not generated by Confluence itself. Even the smallest differences in the input (like line breaks or differently nested HTML elements) may result in completely different wiki code. Especially when migrating hand-crafted HTML, migration losses are unavoidable.
With the Content Import Plugin, content can be migrated into Atlassian Confluence almost without loss. The formatting of this content, however, has to be adjusted to Confluence and in most cases differs from the formatting shown by the source system. Furthermore, it is not possible to migrate concepts or functionality that existed in the source system but do not exist in a similar form in Confluence (e.g. definition lists, nested tables, …).