Monday, November 16, 2020

Not just during Covid-19: Time for data masking

Facial data mask

If you follow common sense or health agency / government instructions, you probably wear at least a basic face mask when mingling with other people. Mostly, this is to protect others from your droplets, which may carry a virus-laden payload. If the mask is more advanced, it is capable of filtering out stuff that may harm you.

Similar to face masks, data masks have been around for a long time. Data masks and face masks share the same purpose: preventing leakage of damaging "tiny bits". In the database system Db2, data masking is already built in. You only need to apply data masking by defining masks (styling and sewing your own mask) and enabling them (putting it on). Here is how.

Data masking

The term data masking covers (pun intended) the various methods to hide all or parts of the original data and provide modified data instead. Thus, data masking is essential to privacy and data protection. Another term often used is data obfuscation, reflecting how some data masking techniques work.

You may have noticed data masking in action. On payment receipts, parts of your card number are replaced by X or *. In letters or on the phone, references are made to IDs by their first or last few digits without giving away the full value. You may have seen declassified reports with parts blacked out or deleted. All these are data masking techniques. Here are methods applied to the same input:

Data masking applied
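
A few of these techniques can also be sketched directly in SQL. The query below is only an illustration, not taken from the figure: the card number is a made-up input, and the column aliases are hypothetical names chosen for readability. It assumes Db2 and a value in the fixed format of four dash-separated groups of four digits.

```sql
-- Illustrative only: the input value and all aliases are assumptions.
SELECT
  card_no                                          AS original,
  -- Partial masking: keep the last four digits, replace the rest
  'XXXX-XXXX-XXXX-' || SUBSTR(card_no, 16, 4)      AS partial_mask,
  -- Full redaction: same length, no information left
  REPEAT('*', LENGTH(card_no))                     AS full_redaction,
  -- Nulling: the value is withheld entirely
  CAST(NULL AS VARCHAR(19))                        AS nulled
FROM (VALUES ('1234-5678-9012-3456')) AS t (card_no);
```

For the sample input, the partial mask would read XXXX-XXXX-XXXX-3456. Which variant is appropriate depends on how much of the original value the reader legitimately needs.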

Data masking can be applied statically or handled dynamically, depending on who is accessing the data. Masking can be performed by tools outside the data source or by the system itself. Db2 has masking capabilities built in as part of Row and Column Access Control (RCAC).

Db2 Row and Column Access Control

One of the many data security features in Db2 is Row and Column Access Control (RCAC), sometimes called fine-grained access control (FGAC). It lets you filter out rows based on access permissions and guard column data based on masks. A mask can be defined with a CREATE MASK statement, which roughly follows this structure:

CREATE MASK maskName ON tableIdentifier
FOR COLUMN columnIdentifier
RETURN dataMaskingExpression
ENABLE;

The dataMaskingExpression can range from a simple expression, like always returning the last few digits of a number, to a complex CASE expression. The latter can be used to mask, transform or obfuscate the original data depending on the user, group or role accessing it (see the docs for examples). Thus, certain privileged roles are able to read the actual data while others might see only a fraction or nothing at all.
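
As a minimal sketch of such a CASE-based mask, consider the following. The table CUSTOMER, the column CARD_NO, the role AUDITOR and the group SUPPORT are all hypothetical names invented for this example. VERIFY_ROLE_FOR_USER and VERIFY_GROUP_FOR_USER are Db2 scalar functions. The statements have not been run against a live system, so treat them as a starting point.

```sql
-- Hypothetical table, used only for illustration
CREATE TABLE customer (
  id      INTEGER NOT NULL PRIMARY KEY,
  name    VARCHAR(100),
  card_no CHAR(19)            -- assumed format: 1234-5678-9012-3456
);

-- Hypothetical mask: the role and group names are assumptions
CREATE MASK card_no_mask ON customer
FOR COLUMN card_no
RETURN
  CASE
    -- Members of the AUDITOR role see the actual value
    WHEN VERIFY_ROLE_FOR_USER(SESSION_USER, 'AUDITOR') = 1
      THEN card_no
    -- Members of the SUPPORT group see only the last four digits
    WHEN VERIFY_GROUP_FOR_USER(SESSION_USER, 'SUPPORT') = 1
      THEN 'XXXX-XXXX-XXXX-' || SUBSTR(card_no, 16, 4)
    -- Everybody else sees nothing at all
    ELSE NULL
  END
ENABLE;

-- The mask only takes effect once column access control
-- is activated for the table
ALTER TABLE customer ACTIVATE COLUMN ACCESS CONTROL;
```

Note the final ALTER TABLE statement: defining and enabling a mask is not sufficient on its own, because the table itself must have column access control activated before Db2 applies the mask to queries.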

Conclusions

Face masks and data masks have commonalities. Both protect against leaking harmful particles. Face masks should be worn during the Covid-19 pandemic, while data masks are a common tool to enhance data security and to protect data.

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.