Data Migration, Part 4/4, Automation

Data Migration Testing revolves around testing the integrity of Data, content and the quality of data between Source and Destination DB based on the given mapping rule.

Broadly, any data Migration has the following:

– Table level mapping

– Element level mapping (column level)

– New elements in the New DB

– Ignore few old elements from the Legacy DB

– Not to be converted data from Legacy to the New system.

– Count Check

Table Level mapping defines how the tables from the old DB should be mapped to the new DB. Based on the mapping rules and how the new system is designed, the table level mapping could be one to many or many to one or one to one.

Element Level mapping defines how each column in the legacy system is mapped to the columns in the new system and this could be also be one to many, manty to one or one to one.

New elements in the new DB are the columns which is very specific to the New DB and not existing in the Legacy

Ignore elements from the Legacy system are the columns which are being exempted from the mapping rule and never brought to the new system.

Not to be converted data are the records from the Legacy DB which are being exempted from getting converted to the new DB for whatsoever reason.

Count Check This is one of the most important validation check to ensure that the source count is in sync with the target count check based on the given mapping rule.

Essentially every automated/manual testing for Data migration projects focuses on the above validation points.

The very fact that we deal with millions and millions of data, manual testing of data migration project can be next to impossible and the situation demands  an automated process and everytime there is a new extraction of data and any changes to the code, we should be able to trigger the automated scripts to check the sanctity and integrity of the data.

This could be achieved with a bunch of SQL scripts which compares the Source data to the Target data based on the mapping rule against each source and target tables.

A simple methodology without using any expensive off the shelf testing tool is to use Excel Macros.

– Get the macros in Excel to connect to Source and Target DB.

– Store the SQL scripts which  validates each table and element level mapping in the macros

– Store the result of the execution back to the Excel which basically confirms if the mapping passed/failed.

The Pros:

1. User-friendly and doesn’t need anyone to learn and understand the complexities of some off -the shelf expensive tools.

2. All the sql scripts being used as part of the test execution are stored at one place and are quite easy for reference.

3. Any mapping rule changes would mean a change in the SQL script and it’s easy to go and make the SQL changes to the corresponding script.

4. It’s very likely that the testing environment (source and destination DB) could change with every execution depending on its availability and the same could be configured in the Excel as a drop down.

5. As the actual execution happens against the DB, validating millions of data is not a challenge and the excel row limit doesnt kick in as only the results of the execution gets stored back to the Excel.

6. Its quite easy to filter the failed test cases and focus on the issues and re-run the same once the issues are fixed.

7. We can even configure the macros to run for specific set of tables, failed test cases rather than going for the entire test suite based on the regression cycle needed for a specific run thereby saving time.and effort.

8. As its excel we could get some fancy reports once the run is complete without depending on some expensive tools and its reporting services.

9. We could even integrate this with the unit test cases of Development team and run it as and when they complete their coding for each tables thereby delivering the QA much matured and bug free code.

10. On an average, 2000 test cases with as much SQL queries takes about the 30 mins to execute and report depending on the load on the DB at a given point of time which signifies the amount of time getting saved as against executing as much test cases manually.

11. All this without having some expensive data migration testing tools thereby saving a lot of cost to the project.


The major factors which should drive QA automation for any project and not particularly migration project is a cost-benefit analysis. It makes sense to go for expensive off-the shelf automation tools if and only if the returns justify the investment. As long as if we can achieve even 80 percent of the efficiency by any in-hand existing inexpensive tool, we should always shoot for the same and get the best out of it.

As for Data Migration project, the biggest challenge is always the sheer quantity of data to be tested and getting a quality output. Most of the times, the very databases where the data gets loaded with a smart combination of excel macros could achieve the desired results.