Data Migration - Reconciliation Testing


This blog looks at data migration reconciliation (aka data migration audit) testing.  This is an essential form of testing that our clients should, and usually do, insist upon as part of user acceptance testing.  The question they want answering is “Did all the stuff I wanted moving from source to target get moved”.  And a jolly reasonable question it is too.

Now first of all let me introduce the concept of Unit of Migration (UoM).   I can't claim authorship of this concept.  I came across it when working with the guys from the now defunct Celona.  The UoM covers the challenge of the different views the business side and the technical side bring to data.  Our business colleagues may be interested in Customers or Products when we, as technologists, see records and tables. As we know, a single Customer from a business perspective may comprise dozens of tables from a technical perspective.  It may even (as in Microsoft Dynamics) be made up of more than one entity which in turn are made up of multiple tables.  The UoM is this amorphous thing that is the focus for the business.  Because the reconciliation we are talking about here is a business facing criteria, it is important that we manage and report on it at a UoM level.  The sales director wants to know which of her customers have failed to migrate and what we can do about it.  She is not interested in the figures for numbers of records.

We also need to understand UoM for fallout management.  If a record somewhere in the matrix of records that constitutes a customer fails to migrate is it significant enough to fail the whole customer? 

But fallout management is a subject for another blog (if I ever get round to writing it) for now let’s stick with Testing.

How we check migration reconciliation is, of course domain dependent.  What is satisfactory for a marketing CRM would not be satisfactory for our friends in finance.  For the first a simple count may be sufficient.  For the latter only a complete trial balance would satisfy.

Simple counts may not be all that simple however.  We have to be able to show that the number of UoM that commenced the journey equal the number of UoM that ended the trip less any that we know fell foul of validation, de-duplication etc.  Those that did drop out en route need to be accounted for, identified and made available for possible remediation. 

Explaining changes in numbers and proving them are two different things.  Within PDMv2 we start our discussions about reconciliation at the very outset of our data migration activity via the System Retirement Plans which are our principle business engagement mechanism.  It is important that we start this conversation early not least because retrofitting the intermediary snapshots we need to explain, say de-duplication, may be impossible post build.  Significant rework will be needed if they are not catered for in advance.

The next level of complexity in reconciliation occurs when we are asked to perform what is sometimes called horizontal validation.  This is where we count not just the number of UoM occurrences before and after the migration but also some meaningful values within them to make sure that our transformation, data enrichment, consolidations and exclusions have not altered a fundamental aspect of the UoM.  For example we may be asked to reconcile the value of outstanding purchase orders before and after movement.  We may be asked for totals of customer by type.

At the most complex there is the major reconciliation of whole structures – like the trial balance in accounts.  Similar structural reconciliation can occur in other domains – like the connectivity paths in linear assets like railway lines or network cables.

Structural reconciliation is also specific not just to a single domain but also to a particular technology, which is even more reason to consult widely and early, leveraging the experience of your implementation partners as well as your local business domain experts.

This data structural reconciliation, however leads us into another side to UoM reconciliation, not just answering the question “Have all my customers been migrated” but also “Have all my customer credit limit details been aligned with the right customer?  Are the correct delivery addresses assigned to each customer?”  In other words is the right data in the right place?

We are now straying into some murky waters because, undoubtedly, the act of transformation – maybe consolidation of our discount codes as part of our migration tidy-up – could lead us to apply the wrong codes to some of our accounts if we have bugs in our ETL.  On the other hand, it is also risks us confusing testing the migration with testing the solution.  This I discussed in a previous blog.  We may have consolidated and mapped as per the data spec we were given by the designers of the new system but the edge cases that real data throws up were not anticipated in that design or the design itself is flawed.

Isolating where the error lies in these cases is difficult but also essential to ascertain for both technical and commercial reasons.  But this is the subject of a later blog in this series and first we need to look at the other testing that needs to be completed as part of the package of Data Migration tests. 

I want to open up this discussion so please comment in the usual way.  And a reminder that we have set aside a session of the Data Migration Matters event (DMM8, 2nd June) specifically to discuss Data Migration Testing.  Please come along and let’s see if we can’t build up best practice in this area.

Johny Morris