Decreasing Data Precision to Achieve Increased Sample Sizes: A bold, wise move from Statistics Canada

Statistics Canada recently announced a trial of Random Tabular Adjustment (RTA) in its publicly released data files (see the academic paper by Mark Stinner). RTA is a fancy name for a fairly simple process that will give external researchers access to more data about more people.

What is RTA?

Let’s say my income last year was $47 362.83. With RTA, my income would be adjusted by adding a random number to it. If you’re familiar with Excel spreadsheets, think of it as generating a random number between a lower and an upper bound and recoding my income as 47362.83 + RANDBETWEEN(-100,100). In other words, my income would be reported as somewhere between 47262.83 and 47462.83.
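For readers who prefer code to spreadsheets, the same idea can be sketched in Python. This is only an illustration of the simplified description above, not Statistics Canada's actual procedure; the function name `adjust_income` and the default bound of 100 are assumptions made for the example.

```python
import random

def adjust_income(income, bound=100):
    """Return the income plus a random whole-dollar offset in [-bound, bound].

    Mirrors the spreadsheet formula income + RANDBETWEEN(-bound, bound).
    The name and the default bound are illustrative assumptions.
    """
    offset = random.randint(-bound, bound)  # both endpoints are included
    return round(income + offset, 2)

reported = adjust_income(47362.83)
print(reported)  # some value between 47262.83 and 47462.83
```

Each run produces a different reported value, but it never strays more than 100 dollars from the true income.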

If everyone’s income in that datafile is treated in exactly the same way, by adding a random number between -100 and +100 to it, the overall summary numbers remain essentially the same. Because the adjustments are as likely to be positive as negative, they largely cancel out across a large file: the average barely moves, the median barely moves, and the standard deviation grows by only a negligible amount. Every generalization arising from the original set of incomes holds for the adjusted set of incomes. Nothing changes except for an inconsequential feature of the specific numbers.
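A quick simulation makes it easy to check how little the summary numbers move. The sketch below uses made-up incomes (10,000 draws from a log-normal distribution, with parameters chosen purely for illustration) and the same ±100 adjustment described above; none of the figures come from Statistics Canada data.

```python
import random
import statistics

rng = random.Random(2017)  # fixed seed so the sketch is repeatable

# Hypothetical incomes: made-up data, not real survey values
incomes = [round(rng.lognormvariate(10.7, 0.5), 2) for _ in range(10_000)]

# Apply the same kind of adjustment to every record
adjusted = [x + rng.randint(-100, 100) for x in incomes]

summaries = [
    ("mean", statistics.mean),
    ("median", statistics.median),
    ("std dev", statistics.stdev),
]
for name, fn in summaries:
    before, after = fn(incomes), fn(adjusted)
    print(f"{name:8s} original={before:12.2f} adjusted={after:12.2f} "
          f"difference={after - before:8.2f}")
```

With incomes in the tens of thousands of dollars, the differences printed in the last column are tiny relative to the values themselves. Strictly speaking the summaries are very close rather than identical, which is what matters for analysis.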

Why use RTA?

Some numbers are very telling, easily searchable, and findable in a database. If you happen to know my income, you could easily find me in a database and subsequently read every other variable associated with my income, perhaps my health care details or how much debt I owe to various sources. Perhaps only one household in the area has 12 children. If the dataset said 12 children, everyone would know it was THAT household. Their precise income could be revealed along with any personal information. Their personal privacy would be lost.

Historically, StatsCan decided that data that would lead to the loss of personal privacy would not be shown. Thus, maybe the entire income variable wouldn’t be included in the dataset. Or, that household of 12 children wouldn’t be included. The loss of these details would mean that learnings about larger households would be lost. We wouldn’t have the opportunity to better understand what types of social and community services a larger household needs or uses. We wouldn’t know if they’re too far away to access libraries, physical fitness facilities, or voting facilities, or if they would benefit from more programming for infants or for teenagers. With RTA, the number of children in the family might be reported as 8, 9, 10, or 11: still large enough to identify them as having many children, but not precise enough to pinpoint the exact household.

By implementing RTA, we can ensure their privacy is retained and therefore include their information in publicly released datasets.

What’s next for Statistics Canada?

Statistics Canada is trialing this procedure with the Survey of Innovation and Business Strategy 2017. If all goes well, and they can confirm that privacy is maintained and the adjusted data remains useful, they will continue to test it on further datasets.

What’s next for researchers?

If you’re not completely convinced about the use of RTA even though statistics experts at the highest level in Canada have agreed to test it out, think about it like this. ALL market research data already incorporates something like RTA. We think about it in the form of random error or margin of error or confidence intervals. We know that every data point in our datasets is an estimate that has been affected by sampling error or non-sampling error. People make subjective estimates about their dynamic opinions and their ill-informed purchases and attempt to quantify subjective opinions with precise numbers. Researchers create awkwardly worded questions, mishear responses in interviews, and make data entry mistakes. Our data already incorporate random (hopefully!) adjustment. RTA is no different. In fact, it’s likely a more truly random form of adjustment.

How can researchers use RTA?

Perhaps qualitative researchers who conduct individual interviews and specialize in small-sample market and consumer research have the most to gain from RTA. When revealing personally identifiable information (PII) could be a problem in the creation of reports, think about whether RTA could alleviate the problem.
