WASHINGTON – Steps taken by the Census Bureau to protect individual responses may muddy cancer research, housing policy, transportation planning, legislative map-drawing and health care policy, researchers have warned the agency.
The problems stem from a new policy – differential privacy – that adds “noise” to census data to help prevent outside attackers from identifying individuals in the published data. However, the agency's latest test of the policy produced what researchers called absurd outcomes: households with 90 people and graveyards populated with the living. Such results could skew a count used to redistribute political power and $1.5 trillion in federal spending nationwide.
Some researchers at a national statistics conference last week in Washington, D.C., called for the Census Bureau to delay formally rolling out the new policy until more research can be done, citing its uneven impact on communities and systemic bias against urban areas. The results of the current algorithm could throw off state budget distributions, health care planning, emergency preparedness and dozens of other areas, argued Nevada state demographer Jeff Hardcastle.
“You can make a bad decision with good data, but it is much harder to make a good decision with bad data,” Hardcastle said.
Researchers have been applying the differential privacy algorithm to a test dataset drawn from the 2010 census, systematically adding noise under a specific “privacy budget” – a numerical value that governs how much the data is changed before public release.
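The mechanics behind that tradeoff can be sketched in a few lines. The example below is an illustrative Laplace-mechanism sketch, not the Census Bureau's actual TopDown algorithm; the counts and privacy-budget (epsilon) values are invented:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_laplace_noise(count, epsilon, sensitivity=1.0):
    # Laplace mechanism: noise scale = sensitivity / epsilon, so a smaller
    # privacy budget (epsilon) means more noise and stronger privacy.
    return count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

true_count = 1200  # hypothetical census-block population
strong_privacy = add_laplace_noise(true_count, epsilon=0.1)   # very noisy
weak_privacy = add_laplace_noise(true_count, epsilon=10.0)    # close to truth
```

Because the noise is drawn from a continuous distribution, raw noisy counts can be negative or fractional; forcing them back into plausible whole numbers is one of the post-processing steps where oddities like overpacked households can emerge.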
The policy change came after Census Bureau research found that an attacker could isolate almost half of respondents using public data, allowing their records to be cross-referenced with commercially available medical data, purchasing information and the like. The agency adopted the new method in response and wants feedback by next summer on how useful the data remains.
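The attack the bureau is guarding against is a linkage, or reidentification, attack. The toy sketch below uses invented records and field names; it shows only the general idea of joining anonymous records to a named commercial file on shared quasi-identifiers such as age, sex and census block:

```python
# All data here is fabricated for illustration.
census_rows = [
    {"age": 34, "sex": "F", "block": "1001", "race": "Asian"},
    {"age": 34, "sex": "F", "block": "1002", "race": "White"},
    {"age": 61, "sex": "M", "block": "1001", "race": "Black"},
]
commercial_rows = [
    {"name": "J. Doe", "age": 34, "sex": "F", "block": "1001"},
]

def link(census, commercial):
    """Return (name, census record) pairs whose quasi-identifiers match uniquely."""
    matches = []
    for c in commercial:
        hits = [r for r in census
                if (r["age"], r["sex"], r["block"]) == (c["age"], c["sex"], c["block"])]
        if len(hits) == 1:  # a unique match reidentifies the person
            matches.append((c["name"], hits[0]))
    return matches
```

Here the single commercial record matches exactly one census record, attaching a name to the attributes the respondent gave the census. Adding noise to published counts is meant to make such unique matches unreliable.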
Researchers and policymakers have pushed back on the current iteration, though, saying it would throw off state tax disbursements in Tennessee, community planning in Alaska and emergency planning in New York. Joe Salvo, director of the population division for New York City's department of planning, said the agency needs to fine-tune the results so they are still useful.
“Differential privacy is here to stay. Essentially, we are going to have to recognize that and we are going to work for an optimal solution in the noise-privacy continuum we are working with,” Salvo said.
The changes brought on by differential privacy particularly seem to throw off counts for small, minority communities.
Native American communities have been especially affected. During a Census Bureau advisory committee meeting last month, Nicole Borromeo of the Alaska Federation of Natives called on the agency to consider Native Americans and Alaska Natives as a separate case when looking at differential privacy.
Researchers such as Nicholas Nagle of the University of Tennessee said the changes can have a range of unintended effects, including on states such as his that distribute state tax receipts based on population. His research shows that towns would gain or lose revenue, randomly, based on the changes resulting from differential privacy.
“We have never assumed the number of people in a place was something worth protecting,” Nagle said. “It has always been assumed to be public information. We're proud of it. We advertise it on road signs.”
A coalition of state and federal data users sent a letter earlier this month to the Census Bureau calling for more clarity on the policy and raising concerns about the role data scientists played in the process.
“We are particularly concerned that insufficient analysis has been conducted regarding how (differential privacy) will affect the Census data used for informing policy and allocating public and private funds,” the letter said.
Researchers at last week's conference said the threat of reidentification is real.
“I'm terrified of what is going on,” said Danah Boyd of Microsoft Research.
She said the issue for researchers is not just which data points a respondent might consider public information – their race or age, for example – but which parts an attacker might use to cross-reference with commercial data about purchases, medical history and other private information. That could then be used to hurt the same vulnerable communities that demographers, social scientists and others want to help.
“We are talking about voting rights and we are talking about deportation. We are talking about access to services and we are talking about getting kicked out of housing,” Boyd said.
Census Bureau officials acknowledged several problems with the current iteration of the policy and said some – such as overpacked households – may be fixed. Other issues, said Census Bureau official Matthew Spence, may turn into a policy tradeoff.
Taking some suggestions, such as holding population counts exactly fixed at the smallest level of geography, may make other problems worse because “as we add invariants it adds to noisier results elsewhere,” Spence said.
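Spence's point about invariants can be seen in a toy example. This is a hypothetical sketch, not the bureau's actual post-processing: when a state total is published exactly as an invariant, the noise added to individual blocks cannot be removed, only redistributed among them:

```python
import numpy as np

rng = np.random.default_rng(7)

true_blocks = np.array([100.0, 250.0, 75.0, 575.0])  # invented block populations
state_total = true_blocks.sum()                       # invariant: published exactly

# Add privacy-protecting noise to each block count.
noisy = true_blocks + rng.laplace(0.0, 5.0, size=true_blocks.size)

# Enforce the invariant by spreading the discrepancy across all blocks:
# the state total is now exact, but every block absorbs extra adjustment.
adjusted = noisy + (state_total - noisy.sum()) / noisy.size
```

The total comes out exact, but the per-block figures now carry both their own noise and a share of everyone else's correction, which is why fixing more quantities pushes error into the remaining ones.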