Creating secure test data to test systems

Tamer Salman
Editor’s note: This article is authored by Tamer Salman, senior researcher of Security & Quality Technologies at IBM Research – Haifa.

How does an insurance company, financial institution, healthcare organization, or government body get personal and confidential data to develop and test new software? The challenges of managing personal and confidential data are substantial, especially under increasingly stringent data privacy regulations. Some data is private and confidential, some has been redesigned or transformed, and some may not exist at all. Typically, project leaders or database administrators set up a separate environment for development and testing. The big challenge is how to populate it with data.

With expertise in constraint satisfaction and automatic test generation, IBM researchers in Haifa developed the Data Fabrication Platform (DFP), a solution that efficiently creates high-quality test data while eliminating potential data security and privacy concerns. The platform is already helping a large insurance company revamp its processes around test data.

Generating masses of personal (but fabricated) data

For most organizations, generating the mass of data needed involves in-house scripting, simple data-generation techniques, manual insertions and updates, and a great deal of masking and data scrubbing. Even after the test data is ready, requirements can change during development, rendering the data useless and forcing some processes to be repeated. The result is a tedious, costly, and time-consuming process that doesn't necessarily deliver results.

To accommodate distributed and outsourced development and testing, our client needed test data that would not be susceptible to leaks or breaches of security and privacy. They also needed the ability to transform and evolve the data as business needs changed. DFP does this by allowing rules to be shared and migrated. It also minimizes test-data generation effort by eliminating security and privacy concerns, and supports early development and regression testing.

Data rules

The logic of what’s needed in these secure, confidential instances can be described with rules that define the relationships between different database columns, the resources used to populate new columns, or transformations from archived data. Companies enter these rules into DFP and receive the data they need as output. The platform consumes the rules and generates the requested data, which can be inserted automatically into the target databases or exported in any of a variety of formats, such as XML, CSV, and DML files.
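To make the idea of column rules concrete, here is a minimal sketch in Python. The rule names and the insurance-policy columns are illustrative assumptions, not the DFP rule language: each rule constrains a single column's range or a relationship between two columns, and the fabricated rows are serialized to CSV, one of the output formats mentioned above.

```python
import csv
import io
import random

def fabricate_policy_rows(n, seed=0):
    """Generate n fabricated policy rows satisfying simple, hypothetical rules:
    - age is between 18 and 99
    - start_year is between 2000 and 2020
    - end_year is strictly after start_year (a cross-column relationship)
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        age = rng.randint(18, 99)
        start_year = rng.randint(2000, 2020)
        end_year = rng.randint(start_year + 1, 2025)
        rows.append({"policy_id": f"P{i:05d}", "age": age,
                     "start_year": start_year, "end_year": end_year})
    return rows

def to_csv(rows):
    """Serialize fabricated rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["policy_id", "age", "start_year", "end_year"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because every row is generated from the rules rather than copied from production, the output contains no real customer data, yet it still respects the relationships the application expects.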

At the heart of DFP lies a powerful Constraint Satisfaction Problem (CSP) solver, also developed in Haifa. A CSP typically involves so many possibilities that straightforward algorithms cannot solve it within an acceptable amount of time. A form of artificial intelligence, IBM's CSP solver tackles these complex problems using its ability to arrive at many more valid solutions than traditional optimization approaches. The solver accelerates data generation and helps eliminate errors by producing only data that is valid for the specific requirements.
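IBM's solver is far more sophisticated, but the core idea of constraint satisfaction can be sketched with a tiny backtracking search. The function below, and the example variables and constraints it is applied to, are illustrative assumptions: it assigns values to variables one at a time, checks every constraint whose variables are all assigned, and backtracks on violations, so any record it returns is valid by construction.

```python
def solve_csp(variables, domains, constraints, assignment=None):
    """Tiny backtracking CSP solver (illustrative, not the DFP solver).

    variables:   list of variable names
    domains:     dict mapping each variable to an iterable of candidate values
    constraints: list of (vars_tuple, predicate) pairs; a constraint is
                 checked once all of its variables have been assigned
    Returns one satisfying assignment as a dict, or None if none exists.
    """
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        ok = all(pred(*(assignment[v] for v in vs))
                 for vs, pred in constraints
                 if all(v in assignment for v in vs))
        if ok:
            result = solve_csp(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]  # backtrack
    return None

# Hypothetical record: a policyholder's age and a policy period,
# constrained so the holder is an adult and the policy ends after it starts.
variables = ["age", "start", "end"]
domains = {"age": range(16, 100),
           "start": range(2000, 2026),
           "end": range(2000, 2026)}
constraints = [
    (("age",), lambda a: a >= 18),
    (("start", "end"), lambda s, e: e > s),
]
record = solve_csp(variables, domains, constraints)
```

A production solver adds constraint propagation, smart variable ordering, and randomization so that repeated runs yield diverse, realistic records rather than the first satisfying assignment, but the guarantee is the same: only data consistent with the stated rules is ever emitted.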

In summary, the IBM Data Fabrication Platform is an easy-to-use technology that allows rule sharing and migration, minimizes test-data generation effort, eliminates security and privacy concerns, and makes it easier for companies to outsource development and testing.
