Data Migration - Moving the Mail

With DMM9 less than 5 weeks away, here I introduce a sponsor who is new for this year.

Quadrotech are a company who specialise in the migration of email, especially within the Windows environment.  They have a special talent for extracting data from almost unextractable 3rd party email archive systems.  Now this is something I have never had to perform, but I know from experience just how hard it is to extract anything from archives that have been designed primarily for capture, retention and e-discovery, so I praise them for their persistence.

They are also having a busy time of it right now, what with Microsoft Exchange 2013, 2016 and Office 365 adoption, and of course all this on top of the classic archive migration scenarios for M&A and platform transformations.  Maybe we should employ their services here at iergo.  We have moved to Windows 10 from a heterogeneous collection of XP, Windows 7 and even Windows Vista machines.  The undoubted commercial logic of a subscription model also led us to move over to Office 365.  However we haven’t managed to get Outlook 365 to work properly on Windows 10, so we flipped back to an earlier version until we get another window of opportunity.

Like all niche players, it’s not until you pause to think about them that you realise just how vital they can be.  For some organisations retention of years of emails is a regulatory requirement, and in mergers and de-mergers the transfer of email archives is one of those items that should be, but may not be, on the checklist.  And, as we have found, the regular churn of Microsoft through its version cycle means we are always either facing or recovering from their latest initiative.  Microsoft themselves have recognised the help Quadrotech offer with a Partner of the Year Finalist award in their messaging category.

Come see them at DMM9, where they will be making use of our try-before-you-buy Workshop room to demo their offerings alongside Experian Data Quality and Vision Solutions.

And with 5 weeks remaining you may think there’s plenty of time, but there is a limit on numbers and we don’t want to disappoint anyone.  Check out what’s on offer at Twitter – @johnymorris

Testing Data Migration Step 2

In the last blog we looked at two major confusions that bedevil data migration testing – confusing building for quality with testing for defects and confusing data design issues with data migration faults. 

I have set aside a session at the next Data Migration Matters event (DMM8, 2nd June, London) exclusively to discuss the issue of Testing Data Migrations.  Check out the timetable at:

I hope to see as many of you there as possible.  Let’s see if we can’t get some consensus around testing.

Now back to the blog.

This blog was meant to look at reconciliation.  However I have had a number of questions regarding Data Design, so, reacting in an agile fashion, I’m going to take a moment to look at this.  For ease of understanding I will set this problem up as a plain vanilla ERP implementation using a waterfall approach, with a supplier or systems integrator delivering a Commercial Off The Shelf (COTS) package into a manufacturing or service delivery company client.

The supplier is responsible for understanding the operations of the COTS package on the one hand, and for analysing the operations of the client on the other, and then for bringing the two together in a perfect handshake.  Part of this fit has to be the data design.  Or does it?  Understanding the confusion surrounding this goes some way, I believe, to understanding the confusion around testing.

So what is “Data Design”?

Well what we are talking about here is the way the data structures built into a COTS package are used to support the data structures the client needs to carry out their business. 

Let’s take something simple – Lead to Cash (L2C).  Commercial organisations exist to sell stuff, at a profit.  So they all need a process of getting a lead and turning it into a sale and then delivering the product and collecting the cash.  This L2C process is fundamental to capitalism.

Without dwelling on all the detail, this journey from lead to cash involves the establishment of certain master or framework data items.  We have customers (both actual and potential), and we have products, both physical and logical (as in my case, where I sell data migration consultancy).  If we concentrate on the products, all ERP packages will have some kind of Product-Master structure (please go with me on this one – I'm not going to dwell on the difference between a parts master and a product master).  Therefore all implementations of the lead to cash process will need a Product-Master established that is suitable for their physical and/or logical products.

But these Masters will not be the same in a house builder and a medical supplies manufacturer.

But who gets to design it?

So who is responsible for designing the use of the COTS Product-Master for our phantom client?  We need both the domain knowledge of the client and the COTS package knowledge of the supplier but it is always best practice to have one lead.  Are we target led or source led?

Well to my mind it must be the supplier.  They are the ones who know the target best and know how the Product-Master is related to accounts, product lifecycle management, supplier management etc. etc. within the target application.  The art of fitting the client’s business requirement to the structures on offer in the COTS Package requires knowledge of the COTS Package and the implication of design choices and this expertise is what the supplier brings to the party. 

Only the client has knowledge of the client product set that is the other half of the mix.  It is my contention that it should be the responsibility of the supplier to have the analysis skills that will reveal this knowledge in such a way that they can then perform the data design and deliver an operating platform that will enhance the activities of the client.  We need to bear in mind that most organisations do not replace the systems that support their businesses very often.  Therefore there is no reason why they should have developed the skills to analyse and articulate their business processes in a format ideally suited to a third party’s implementation requirements.  The supplier on the other hand has regular need for these items to be developed so it makes sense for them to have cultivated the skills needed to unearth these processes and the ancillary data.  They should have made these skills part of their own product set.

This is rarely contested but the level of detail in the data design often is.  It’s all very well placing the onus for data design of Product-Master on the shoulders of the supplier, but what about the detailed definition of items like the format of part numbers or the breakdown of a product set into discrete deliverables? 

There are two reasons for still saying that detailed Data Design belongs with the supplier.  Firstly, although the majority of the Data Design will stay as is – if you are a car manufacturer before the migration you will be one at the end, so you will have models and versions of models and so on – there are some data items that are only present because that is the way the client has traditionally done things.  It is for the supplier, if they are to add value, to challenge these vestigial elements and replace them with ones that will take advantage of the new system’s capabilities.  The risk otherwise is that the new system becomes a poor reproduction of the old, which undermines the value proposition of making a change in the first place.

The second reason is linked to the first.  In modern highly integrated COTS packages the setting of some values has impacts across the application.  It takes knowledgeable experts to understand the implication of something as apparently simple as the parts numbering system and its relationship to the part breakdown structure.  This means we need critically engaged target system experts to facilitate the result.

So to optimise our investment and to avoid cock-ups it should be the COTS package experts who are driving the bus.

However all too often I find, on arrival at an in-flight project, that the subtle difference between creating a general structure for your Product-Master and taking this down to the level of detail that can actually run the business is a point of misunderstanding between the supplier and the client.  Often then, because it is the Data Migration team who best know the legacy data, and because the issue seems to be one to do with data, the task of providing this metadata falls on them.  This is wrong.

As an aside, if the main supplier will not commit to delivering the detailed data design but is waiting for the client to produce it, and given that the client may not have the skills to produce it, then the client should look to sub-contract that element out.

Whoever is performing the data design, it still remains the case that this is not a data migration task.  If you are moving house, the removal men expect to be told where the bedrooms, lounge and kitchen in your new residence are.  (OK, so they may be able to make a reasonable guess as to which room is the kitchen, but you get my point).  They are not expecting to decide for you which cupboards you wish to use, and certainly not to have to architect the dwelling.  This should be the same with us lifters and shifters of data.  Tell us where to stick the stuff and we will organise things so that is where it gets stuck.

If all of this has been a little dense then please allow me to recapitulate:

  • Data Migrations are always part of bigger programmes
  • There is usually an incumbent supplier or systems integrator contracted to implement the new Commercial Off The Shelf package
  • Data Design and Data Migration are not the same
  • Data Design is the alignment of the client metadata with the data structures available on the target system
  • Best practice is for the supplier to be responsible for detailed Data Design as well as detailed process design
  • If the supplier is not going to perform this task then, unless you have the skills in-house, get assistance from a supplier who can provide this service
  • In any case Data Design should be seen as a separate task from data migration and planned into the project from the beginning
  • Data migration is the finding, extraction, transformation and loading of data of the appropriate quality in the right place at the right time.  It is not responsible for defining where and what the right place is
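To make the recap concrete, here is a minimal sketch (in Python, with entirely invented field names and transforms – no real package is being described) of that division of labour: data design produces the specification of where things go, data migration merely executes it, and anything the design has not decided is surfaced as a question rather than guessed at.

```python
# A hypothetical sketch: data design produces a specification,
# data migration merely executes it. All names are invented.

DESIGN_SPEC = {
    # source field     -> (target field, transform)
    "part_no":          ("product_code", str.strip),
    "description":      ("product_name", str.title),
    "unit_price_gbp":   ("list_price", float),
}

def migrate_row(source_row: dict) -> dict:
    """Move one record according to the design spec.

    Source fields the spec does not mention are a data design
    question, not a migration fault - so we surface them rather
    than guess where they should go."""
    unmapped = set(source_row) - set(DESIGN_SPEC)
    if unmapped:
        raise ValueError(f"No design decision for fields: {sorted(unmapped)}")
    target_row = {}
    for src_field, (tgt_field, transform) in DESIGN_SPEC.items():
        target_row[tgt_field] = transform(source_row[src_field])
    return target_row

row = {"part_no": " A-100 ", "description": "brake pad", "unit_price_gbp": "12.50"}
print(migrate_row(row))
# → {'product_code': 'A-100', 'product_name': 'Brake Pad', 'list_price': 12.5}
```

The point of the `ValueError` is the point of the whole recap: the migration team raise the question of the unmapped field; the data design, owned elsewhere, answers it.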

Back to reconciliation

First of all then, what is data migration reconciliation (AKA data migration audit)?  Well, put simply, it answers the business side question “How will I know that everything I had in my old system, that I wanted moved to the new system, made it across?”.   It does not completely answer the kindred question “....and how do I know it landed in the right place?” because that involves both data migration issues (did we move the data according to the specification) and target design issues (did the programme perform the data design correctly to match business processes so that the locations to move the data into were available with appropriate behaviours).
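To illustrate at its very simplest, here is a hedged little sketch (invented record structures, not any particular tool) of reconciliation as a comparison of record counts and a control total on either side of the move.

```python
# A sketch of data migration reconciliation: compare record counts
# and a control total on both sides of the move. The record
# structures are invented stand-ins for real source/target extracts.

def reconcile(source_rows, target_rows, amount_field="balance"):
    """Answer 'did everything I wanted moved make it across?'
    at the level of counts and a financial control total."""
    report = {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "source_total": round(sum(r[amount_field] for r in source_rows), 2),
        "target_total": round(sum(r[amount_field] for r in target_rows), 2),
    }
    report["reconciled"] = (
        report["source_count"] == report["target_count"]
        and report["source_total"] == report["target_total"]
    )
    return report

src = [{"id": 1, "balance": 100.10}, {"id": 2, "balance": 50.25}]
tgt = [{"id": 1, "balance": 100.10}]   # one record failed to make it across
print(reconcile(src, tgt))
```

Note that matching counts and totals answers only the “did it make it across” question; whether each record landed in the right place still needs field-level comparison against the data design specification, which is exactly the distinction drawn above.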

Next time out we really will look at these two linked questions.

Testing Data Migrations Step 1

Reminder Data Migration Matters 8 early bird tickets on sale until 17/4/15.

With the next Data Migration Matters event imminent, I intend to run this discussion up to the date of the event, and we have set aside a session in DMM8 to discuss the vexed question of Data Migration Testing, for which these blogs are the precursors.  So join the discussion on-line and then come along to DMM8 to make yourself heard - literally.

If there is one topic that generates more online chat than any other in the Data Migration space, it is the one about Testing Data Migrations.  Check out the various forums and you will see what I mean.  I am going to argue that at bottom this is due to a confusion about what the Data Migration project is about and therefore how to test that it has been successful.

However before I go any further with this let me make it plain that I am not a test analyst.  I have the utmost respect for their craft and I do not want to invade their space.  So anything I say here is not intended to be a lecture to far more skilled hands than mine in this area.  It relates specifically to the perceived issues of testing Data Migrations rather than testing in general.

It is also true that testing, just like every other aspect of IT it seems, has its own tribes.  And I certainly do not want to get involved in the internecine particulars of disputes on which, as I say, I am not really qualified to opine.  So if I talk about Test Scenarios or Test Scripts or test cases, please accept them as the words of an informed bystander, not with the very particular meanings that one school or another will ascribe to them.

This being the first blog of a series, I want to lay down some fundamentals.  Step 1, therefore, and the subject of this blog, is that you can’t test quality into a product.  It does not matter if that product is a motor car, a fine meal or a data migration.

Design and build quality in.  Test defects out.

A lil' more testing and I'm sure I'll be green


This may sound like a distinction without a difference, but think about it.  If you wanted to build a motor car that was green and economical, one that managed 100 plus miles per gallon (35km to the litre for our metric friends), you would not start with a Cadillac Eldorado and try to test the MPG into it (if that is one of your quality exit criteria – the Eldorado is a perfect example of 1950s American flamboyant self-confidence and needs no improving).

Yet this is often precisely the puzzle we are trying to solve in our data migration testing.  We are looking at the issue of quality from the wrong end of the project timeline.

But why is this?  Well, in part modern procurement processes are causing an issue with data design.  I have written about this before in other contexts, but it is an issue that will continue to bedevil both procurers and suppliers of new enterprise apps until the purchasers amend their buying practices and suppliers react accordingly.

In brief, the move to fixed price contracts for new system delivery, and the premium on time to market and price, has meant that suppliers have been forced to move data design down the timeline.  Depending on whom you employ, and the predilections of the buyer, we typically have a cascade of Discovery -> High Level Design -> Low Level Design -> Build -> Test -> Deliver as our phases.  The Low Level Design phase is a misnomer.  In the struggle to win business and keep timelines and costs down, the supplier is constrained to move the detailed work on what particular fields, including custom fields, will be used for, and therefore the precise details of their go live start up values, into the build phase.  Of course we in the Data Migration end of things need these exact details to perform our data migration.  By the time the Build is complete we get a cascade of data requirements with user acceptance testing, bulk load testing etc. looming.

The temptation to tacitly assume that we can sort it all out in the testing is just plain wrong.  The “Throw the data at the target and see what sticks” approach to data migration is sadly making a reappearance in the hurly burly of modern implementations.

On less well ordered projects this cascade can also be incoherent and contradictory and this is when the second confusion emerges....

Test the data migration not the data design

When time is pushed and the detailed data design arrives way down the timeline, it is easy to confuse genuine data migration defects (wrongly selected data, badly transformed data, incomplete data sets etc.) with data design defects (different format requirements in different parts of the solution, incorrectly designed functionality etc.).  On a well-run complex programme, roles and responsibilities for outcomes are understood in advance.  In a later blog I will show some techniques for managing this, but in essence data migration testing should be about what it says on the label – testing the data migration, not the quality of the solution.

This is also true of the gaps in the solution that we in the data migration team are often the first to be aware of.  Having analysed the source data we may see that a particular object – say a working at height risk assessment – contains ten fields in the source but only six in the target.  How this is best managed I will also cover later, but for now I make the statement that it is a design issue, not a data migration issue.
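By way of illustration only (the field names below are invented, echoing the ten-to-six example above), a simple gap check makes the point: source fields with no target home get logged as design issues to escalate, not quietly dropped by the migration team.

```python
# A hypothetical gap check: compare the fields an object carries in
# the source with the slots available in the target. Field names are
# invented. Unmapped fields are raised as design issues, because
# deciding their fate is data design work, not data migration work.

SOURCE_FIELDS = {"assessor", "date", "site", "ladder_type", "height_m",
                 "harness_used", "weather", "permit_no", "notes", "sign_off"}
TARGET_FIELDS = {"assessor", "date", "site", "height_m", "permit_no", "sign_off"}

def field_gaps(source, target):
    """Return source fields with no target home - a design issue list."""
    return sorted(source - target)

for field in field_gaps(SOURCE_FIELDS, TARGET_FIELDS):
    print(f"DESIGN ISSUE: no target home for '{field}'")
```

The migration team produce the list; the programme’s design authority decides whether each orphaned field is extended into the target, archived, or consciously discarded.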

And one final, if not confusion then point of distinction – I prefer to separate data migration testing from data migration reconciliation (or as it is sometimes called data migration auditing). 

It may be that they answer the same question (did all the things I wanted moved from the source end up in the right place in the target) but because they typically require different techniques both in requirements analysis and in the migration build, it is less confusing if we separate them.

The next blog will be on reconciliation testing followed by end to end, user acceptance, functional, mock load and soak tests.  Finally, when we are agreed that we have the set of data migration test types we will look at how to manage the two confusions above.

I look forward to hearing from you either over the normal media or in person at DMM8

Data Migration Pirates and Brides

This week: news on a link-up between X88 and Experian, and a moan about pimping the search engines.

First let me boast at this point that the launch of this web site went extremely well.  It was well received, with a few minor quibbles.  A few folks said that the navigation menus were difficult to read in their semi-opaque state, and this has been corrected.  Amazing what looks cool in the design studio but doesn't work so well in the real world.

There were also requests for a forum type page to be set up.  We are investigating the feasibility of this.  I'm for it personally; my only concern is that a lot of traffic (at the moment) is being directed via LinkedIn groups, and they have the same forum functionality.  Would I be causing myself more confusion having parallel threads running in two places?  Personally I think it worth the risk, but let's see, eh?


Onto the subject of today's diatribe.  Having put up this website and therefore reviewed the Google search results for Practical Data Migration, I find that there are imposters (or at least impressionist wannabes) who are responding to the Practical Data Migration call without training, accreditation or compliance.  Caveat Emptor, as the Romans would have said (roughly translates as "Buyer Beware").  Genuine PDMv2 practitioners will be proudly bearing their PDMv2 Accreditation badges on their site, not masquerading behind a paid-for Google page placement.

Perhaps I should be flattered that PDM is something worth pretending to, I suppose.

I am refraining from naming names today but if these miscreants do not mend the error of their ways I may be forced to act. You have been warned.

Happier news: my friends at X88, responsible for the fabulous Pandora software, have finally come clean about their intentions and entered into nuptials with Experian.   For those in the data management fraternity here in the UK, you may be more familiar with the QAS brand than Experian, because QAS's data quality and enhancement tools are pretty well ubiquitous in the name and address space.  I certainly rarely go onto a site in the UK where they don't feature.

Of course if your work takes you closer to the sales process then you will know Experian for its credit rating services rather than the QAS brand - but it is the data quality aspects we will reflect on here.

As the two newlyweds bed in, I have to congratulate the happy couple on what looks, from those who know them, like possibly a marriage made in heaven.  On the one hand the blushing young bride Pandora, in the way of youth everywhere, challenging accepted norms with youthful dynamism.  On the other, the more mature Experian, who already have a substantial presence in many industry verticals.

It seems that the offspring of this union are already out of the nursery and making their own way in the world.  A surprisingly short gestation period, one might say, but possibly a testament to the fecundity between the betrothed.  It propagates the benefit of the name and address awareness and data enhancement products in the Experian genome within the heterogeneous profiling and data quality abilities of Pandora, and builds on the pre-marital relationship they enjoyed in collaboration over the QAS/Experian Data Quality tool.  Perhaps not surprising then that Experian Pandora (as they have named their progeny, which may not be startlingly original but is at least explicable) emerged fully formed from the coming together of the two parties – perhaps more Aphrodite than Pandora, but who am I to extend this into a classicist discussion?

However, putting aside the marital humour for a moment and looking at this from a commercial and technical perspective, I can see this being a great success.  Experian is, as I said, ubiquitous and trusted at the highest levels within the finance departments of most FTSE 100 corporations.  Enterprises are betting their level of bad debt, and therefore profitability, on the basis of Experian's knowledge of the customer.  It is also one of the few credit rating agencies to come out of the 2008 debacle with an unimpaired reputation.  However, just as it is trusted, so it is a little stolid.  X88 on the other hand is the new kid on the block, bringing disruptive technology to market.  Combining the strengths of both of them – the trustworthiness of one brand and the brilliance of the other, the deep knowledge of data subjects (both people and other legal entities) of Experian and the technical innovation of Pandora – could create exactly the right cocktail for a fizzing success.  There is also the possibility of the contrary, with the bureaucracy of an established institution stifling the fleet-of-foot brilliance of the upstart, to the detriment of both.

However this blog wishes the newlyweds well and will be keeping a watching brief.  Normally I would at this point give you the URL to follow up this blog, but this romance has been of such a whirlwind nature that, checking the two companies' websites, I seek in vain any reference to it.  I expect the blushing bride and proud groom to make public their private joy at any moment.

Johny Morris





Data Migration - What if it all goes wrong

This blog explains why we always need a fall back (or even a fall forward) plan.  Just in case.  How good is yours?

First of all, welcome to this, the first blog of the new site for Practical Data Migration.  I'm keen to hear your feedback, either publicly via comments attached to this blog or, if you prefer to castigate us in private, via the Ask Johny feature to contact me more discreetly.

And bookmark this site - we'll be adding to it regularly.

I would also like to thank my friends at the BCS for hosting Johny's Data Migration blog all these years.  They've been a great support  but with the new website up and running it's time to say good bye to the old and hello to the new.

Given that this is a new start I thought we should celebrate it by looking at the worst that can happen.  

After long months of preparation, the big day comes along.  Then disaster strikes.  Now this can be for no predictable reason.  Of course if we have done our preparation correctly the data will be ready, the technical processes will have been tested.  The business processes will also have been briefed out.  Still things can go unexpectedly wrong, and then we need a fall back.  One that has been thought out in advance and, if not necessarily tested (as the cautionary tale that follows shows, testing is not always possible), at least seen to be feasible.

Unfortunately it was one of my ex-clients (Network Rail) who presented us with the Christmas gift of the almost perfect example of how not to manage a situation where a transformation project has not run according to plan.

For those not familiar with the structure of the rail industry here in the UK, Network Rail is a not-for-profit company that owns the railways, but they neither maintain the infrastructure nor do they run the trains.  There are a number of franchises, geographically and route based, let out to independent Train Operating Companies (or TOCs in the parlance of the trade).  The engineering is carried out by infrastructure companies (or Infra Tecs).  So although Network Rail can commission rail work, it has no direct control over its delivery.  This is a situation familiar, I think, to most of us working in IT transformation delivery.  The client commissions but someone else delivers, often with a number of sub-contractors for specialist elements.

Mostly in the private sector these disasters occur behind closed doors and the public is none the wiser.  Just occasionally the whole mess becomes public.  Then a besieged CEO has to stand up in front of the cameras and deal with the fallout.  Please step up Network Rail's managing director Robin Gisby, to explain on national television how overrunning engineering work left thousands of passengers without trains on the Saturday between Christmas and New Year.  His next public appearance may well be before a select committee of the Houses of Parliament, where he will receive a further grilling.

Passenger concourse, King's Cross

The fault, it seems, was in the breakdown of some machinery during a multi-million pound maintenance activity outside London's King's Cross Station, followed by over-optimistic planning for the safety commissioning of the new system (and let's be honest, no one wants that skimped on).  For those not familiar with the weird topography of the major rail routes into London, there are three major stations within a mile of one another that serve as termini for all destinations north of London, plus one station a few miles away in the City.  Each station however serves local trains as well as long distance, and there is overlapping provision and re-use of the same track.  This is a bequest from the profit hunting Victorian railway entrepreneurs who competed for routes, as opposed to most other nations' more planned approach to transport provision.

All a bit of a messy inheritance but the effective closure of King's Cross and signal problems on the lines out of Euston plus issues elsewhere on the system created mayhem in that busy period between Christmas and New Year.  

So to re-cap.  A complex legacy, a major transformation project, tight implementation deadlines, all delivered through a complex network of suppliers.  Sound familiar?

Finsbury Park station

The lock out

And then it starts to go wrong.  

Anyone who has been on one of our courses will know just how much we stress a fall back plan which puts you in a situation that allows business as usual with minimum customer impact.  When the changes you have made cannot simply be rolled back and the old regime reinstated, we may have to fall forward.  Railway engineering work tends to be in this category, but so do some IT system changes.  Ideally the fall back plan should be tested, but this is often not possible for technical, financial and programme reasons.  In fact fall back plans are rarely tested.  They do have to be tenable, however.  Now I have no inside information on this one, but the Network Rail fall back plan seems to have either been made up on the fly or possibly on the back of an envelope.  Up the line from King's Cross is Finsbury Park.  This is a busy suburban station in its own right, with overground and underground lines intersecting.  But it only has five lines and six platforms.  King's Cross has 12 platforms.  Suburban trains are typically 3 carriages long; inter-city trains are typically 11.  So the plan to halt all the incoming trains at Finsbury Park and get customers to schlep up there with their suitcases, push chairs, aged relatives, children, wheelchairs etc. was clearly flawed.  The result, as should have been anticipated, was that Finsbury Park was mobbed, to the extent that incoming trains could not disgorge their passengers because of the crush, and the station had to be repeatedly closed for safety reasons.  When passengers did hit the platform the confusion was so great that getting on the right train was a lottery, and this caused knock-on problems up the line.

The lessons here are obvious.  Firstly, never go into a change programme without a workable fall back strategy.  Secondly, you can delegate the task but you can't delegate responsibility.  Finally, if it does all go wrong, get out in front of the cameras and apologise quickly.  Mea culpa, not excuses.

In the words of the Duke of Wellington - there but for the grace of god go I.  Things can always go wrong.  Most of our migrations are amenable to a graceful fall back to existing systems but in the 24/7 world we now work in that may mean applying updates that have been backed up in anticipation of a go live to the legacy as opposed to the target as planned. Sometimes we can't fall back to the legacy.  Be prepared!

From next week I will be running a series of blogs on the fraught question of testing data migration projects. This is a topic that  is often raised and I'm going to throw my thoughts into the public domain and see if we can't get a debate going.  I look forward to hearing from you.

Johny Morris


Big Data, Big Hats

This blog was first published in 2012 and is one of the most widely quoted of all my efforts. As you will see it was a lot of fun to write - I hope it is as much fun to read.  Some of the links probably no longer work but I don't think that detracts from its main message.

Is Big Data just another Marketing wheeze and does it suffer from the same semantic issues that have bedevilled other MI/BI/DW oversells of the past 20 years?

Read More

Data Migration – The Agile Experience

This blog was first published on the BCS website in February 2013.  In the nearly two years that have elapsed since it was written our thinking has developed and a white paper encapsulating this will be published shortly

Agile is definitely the development method in ascendancy at the moment.  Seems wherever I go, or any clients I speak to, they are either doing Agile or wanting to be Agile.......

Read More