Data Migration – The Agile Experience

This blog was first published on the BCS website in February 2013.  In the nearly two years that have elapsed since it was written, our thinking has developed, and a white paper encapsulating it will be published shortly.

Agile is definitely the development method in ascendancy at the moment.  It seems that wherever I go, whichever clients I speak to, they are either doing Agile or wanting to be Agile.  The old cynic in me, of course, sees this as the classic IT bandwagon.  There is a problem: developments bogged down in analysis paralysis or taking years to deliver value.  There is a solution: shorter, iterative cycles, delivered by smaller groups, producing tangible benefits to the business earlier.  Then along come the mavericks, the chancers and the bandwagon followers.  Anything and everything gets branded Agile.  There are multiple flavours of Agile.

Everyone does Agile but everyone, it seems to me, is doing something different.

The “seems to me” is important.  I am not pretending to present a scientific study, just my impressions of the sites where I have worked that have claimed some degree of Agile credential.  To give the whole thing the benefit of the doubt I am going to be even more selective and exclude projects where the degree of dysfunction suggests something more like “Dancing With Two Left Feet” than Agile.  I am happy to be corrected by anyone out there but be warned: as previously stated, my curmudgeonly status makes me suspicious of the perfect solution that exists just over the hill, just out of sight.  Real, attestable examples only, please.  I have heard of too many of these Eldorados in the past, associated with too many initiatives that have all had their day and faded into well-justified obscurity, to be easily gulled.  You will have to convince me that you have a solution to some in-built issues with Agile and Data Migration – or better solutions than mine, in any case.

So first the good points about Agile, and they are legion.  In the best-run establishments, Agile delivers real value, real quick.  With two-week sprints, enhancements are hitting the users regularly.  Maybe not every two weeks (writing and testing software may simply not be deliverable in that time frame) but quickly enough to astound us old-timers who have known two-year development cycles.  Heck, we’ve known twelve-month-plus delivery cycles after the software is written and before it goes live, for testing and configuration.  Tangible changes, of value, every four weeks are enough to maintain the interest of our colleagues in the real business.

The rapid implementation and constant delivery help remove the estrangement of IT from the Business.  We simply get to see a lot of each other, regularly.  All those soft, interpersonal factors that so aid delivery are maintained.

But I am not here to sing the praises of Agile; there is acreage enough of comment doing that.  I’m just going to focus on how we fit our Data Migration work within the Agile approach.

The first issue I have is the metronome of Agile.  Work chopped into an unremitting fortnightly (or weekly, three-weekly, etc.) rhythm is great for galvanizing project teams who are, after all, completely orientated to project delivery, but out there in the real world of the business there are other beats going on.  There are monthly cycles, weekly cycles, quarterly and annual cycles; there are indeterminate sales cycles, often dictated by agents outside of the enterprise; there are incoming disturbances caused by events.  All of these impact the ability of the business to engage with the team.  A friend of mine was given the role of Product Owner for an Anglo-Dutch-American concern.  Her day job meant that she was away at conferences around the world at least six times a year, and of course each conference meant an additional workload of preparation and follow-up.  She also had regular meetings on either side of the Atlantic and the North Sea.  The time she could give to the scrum was therefore uneven.  This intermittent lack of availability, although extreme, is not unusual.  It is probably even more galling for the IT team to be able to see their chief contact across the office but be told that she is not available than it is to know she is in Acapulco.

The second issue, when it comes to my neck of the woods, is one of duration.  Many data quality issues can only be solved in the business, by the business.  But, again, there is a question of time.  If you are going to find and correct the contact details of your top 500 customers then you might expect it to take two weeks.  That would be fifty customers a day.  However, if you are answering the phone, taking orders and resolving production issues at the same time, expect that to drop to ten customers a day.  Suddenly we have gone from a two-week to a ten-week turnaround.  If resolution involves contacting customers and then awaiting their response, all predictions are inexact.  We have no control over their response time.
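The arithmetic above can be made explicit.  A minimal back-of-the-envelope sketch, using the figures from the example (500 customers, 50 versus 10 a day) and assuming a five-day working week; the function name is my own invention, not a PDM term:

```python
def turnaround_weeks(records: int, records_per_day: int,
                     working_days_per_week: int = 5) -> float:
    """Elapsed weeks to work through `records` at a given daily rate."""
    days = records / records_per_day
    return days / working_days_per_week

# Dedicated effort: fifty customers a day.
full_time = turnaround_weeks(500, 50)   # 2.0 weeks

# Fitted around the day job: ten customers a day.
part_time = turnaround_weeks(500, 10)   # 10.0 weeks

print(f"Dedicated: {full_time:g} weeks; alongside the day job: {part_time:g} weeks")
```

The point is not the sum itself but that the elapsed time is dominated by the availability of business people, not by the size of the task.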

Duration is also a problem when it comes to preliminary investigation: the initial analysis of the legacy data stores can equally challenge time estimates.  Within PDMv2 we manage this by time boxing and prioritizing via the DQR process.  However, on a large project with more than one scrum operating at once, many of them requiring legacy data, and given the plethora of existing legacy systems, it is beyond the capacity of any one scrum to synthesise the most appropriate data out of hundreds (sometimes thousands) of systems.

From my observation, the solution to this in practice – and here I am talking about well-performing teams in well-performing environments – is either to plough on regardless, grabbing data from the most obvious source and forcing it to fit, or to push responsibility back onto the Product Owner, as the user representative, to get the user data.  The first solution does all the damage that ten years of PDM has been trying to avoid – rushing into ill-considered decisions because of time and resource constraints.  The second relies on the SEP (Someone Else’s Problem) principle, which creates a huge gulf between the Business and the Technologists.  Why should my friend, the international marketer, have any knowledge of data profiling, data discovery and the myriad other tools and techniques you need to get the best data?  Shouldn’t she expect the IT spods to do this for her?  The dreaded responsibility gap starts to loom.

So what is the solution?

Well, the easiest answer, from a PDM perspective, is to treat the scrum (or scrums) as a classic example of the DMZ.  Within the Demilitarized Zone the technical delivery proceeds.  Data issues are fed out to the DQR process.  Metadata knowledge is fed in from Landscape Analysis and from Gap Analysis & Mapping.  User requirements of the migration (like migration audit or data lineage) are fed in from System Retirement Plans.  There is sufficient similarity in philosophy and practice between the operation of PDM and all the Agile brands (Scrum, Kanban etc.) to allow the two processes to sit side by side.  This is an approach that works.
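The in-and-out flows described above can be pictured as a handful of named channels around the scrum.  This is purely an illustrative sketch of that shape – the class and method names are my own, not PDMv2 terminology – assuming three channels: issues out to DQR, metadata in from the analysis stages, and requirements in from System Retirement Plans:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ScrumDMZ:
    """Hypothetical model: the scrum works inside the zone; defined
    channels carry work across its boundary."""
    data_quality_issues: deque = field(default_factory=deque)     # fed OUT to the DQR process
    metadata_knowledge: deque = field(default_factory=deque)      # fed IN from Landscape Analysis / Gap Analysis & Mapping
    migration_requirements: deque = field(default_factory=deque)  # fed IN from System Retirement Plans

    def raise_issue(self, issue: str) -> None:
        """The scrum hands a data issue out to the DQR process."""
        self.data_quality_issues.append(issue)

    def receive_metadata(self, finding: str) -> None:
        """Analysis work outside the scrum feeds metadata knowledge in."""
        self.metadata_knowledge.append(finding)

dmz = ScrumDMZ()
dmz.raise_issue("Duplicate customer records in legacy CRM")
dmz.receive_metadata("Legacy CRM: customer ID is not unique across regions")
```

The value of the picture is that the scrum never has to leave its fortnightly rhythm: everything with a business-side or legacy-side duration crosses the boundary through a channel that has its own clock.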

More interesting is the prospect of taking the Data Migration components and trying to encapsulate them in a scrum or scrums of their own.  I’m not sure, given the proximity to the business and the rhythm issues outlined above, how this would work, but it would be exciting to try.  Anybody out there who has – please let me know how you got on.

I realize that in the above I have played fast and loose, confounding the various flavours of Agile.  Firstly, in my defence, I think my main points stand whatever flavour of Agile you encounter; secondly, my experience on the ground has been that although each client is honestly endeavouring to stick to a single approach, what I find is more syncretistic than single-minded.

I look forward to your comments – and look out for the white paper to be published shortly.  It takes the points on the use of the DMZ and a dedicated data migration scrum and expands on them, based on practical case-study experience.

Johny Morris