OTREC research examines ways to anonymize location data
Posted on July 7, 2014
A new OTREC report explores an innovative technique for making household travel data more widely available without compromising individual privacy.
Public agencies spend vast amounts of money collecting information in household travel surveys.
In the report, Wider Dissemination of Household Travel Survey Data Using Geographical Perturbation Methods, lead investigator Kelly Clifton of Portland State University examines ways to make that information more accessible to planners and other professionals.
Survey respondents are guaranteed anonymity in exchange for their participation. In addition to asking which modes individuals use to get around, surveys record where respondents live and work, their household sizes and demographic information.
Detailed geospatial referencing of the home, work and other travel destinations is common practice.
Such data can be of enormous use to planning professionals, but its dissemination must be balanced with the need to keep locations confidential.
To protect this confidentiality, data are often aggregated to a geographic level such as census tracts or transportation analysis zones (TAZs) before being publicly shared.
This limits the utility of the information. Details are lost with data aggregation. For example, walking trips can be affected to a large degree by the built environment. If all pedestrian trips are aggregated up to a larger zone, then questions about how they were affected by the built environment cannot be answered.
To allow more precise data to be more widely distributed without sacrificing participants’ anonymity, Clifton took a deep look into other geographical masking methods.
With the help of graduate student researcher Steven Gehrke, Clifton reviewed various methods of geo-masking, also known as geographical perturbation.
Their goal was to develop a conceptual framework to guide geographical perturbation efforts.
After looking into several methods, they tested one of the more promising by applying it to household survey data for the Portland, Ore. metropolitan region.
Through this test, the researchers aimed to quantify disclosure risk and data utility, and so improve the understanding of the tradeoff between them.
The method they chose for empirical testing is known as the donut masking technique.
In this technique, as shown in the image below, a “donut” is defined around each protected point.
The inner ring of the donut, centered on the protected location, bounds the anonymity zone: public records will not place the point anywhere within that circle.
The boundary of the donut’s outer ring is defined by the data custodian; in urban contexts, it usually corresponds to an accessible walking distance. In more rural contexts, where population density is lower, the outer ring may need to extend farther.
Each data point is then randomly relocated on the map so that it falls somewhere within its donut, between the inner and outer rings.
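The displacement step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the report: the function name donut_mask, the sample coordinates and the 100 and 400 meter radii are all hypothetical, the coordinates are assumed to be in a projected system measured in meters, and spreading points evenly by area within the donut is an assumption about how the random placement is done.

```python
import math
import random


def donut_mask(x, y, inner_radius, outer_radius, rng=random):
    """Return a displaced copy of (x, y) lying between the two radii.

    Hypothetical helper: coordinates and radii are assumed to share
    the same projected units (for example, meters).
    """
    if not 0 <= inner_radius < outer_radius:
        raise ValueError("need 0 <= inner_radius < outer_radius")
    # Pick a random direction around the true location.
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Sample the squared distance uniformly so that masked points are
    # spread evenly by area across the donut (an assumption here).
    distance = math.sqrt(rng.uniform(inner_radius ** 2, outer_radius ** 2))
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


# Made-up household locations and radii, for illustration only.
rng = random.Random(42)
homes = [(1000.0, 2000.0), (1500.0, 2500.0)]
masked = [donut_mask(x, y, 100.0, 400.0, rng) for x, y in homes]
```

Every masked point ends up at least the inner radius and at most the outer radius away from its true location. In practice the two radii would be chosen by the data custodian, following guidance such as that in the final report.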
When the material becomes public, planners and other professionals will have more spatially precise information, without having access to the true location of an individual’s home or workplace.
To test the donut masking technique, Clifton's team used it to explore the connections between 4,824 households and five measures of the built environment in the Portland, Ore. metro region.
They analyzed 25 scenarios and, for each, assessed the vulnerability of a sampled household to identity disclosure.
They found that data custodians using this method must be aware of a sensitive "tipping point" between disclosure risk and data utility. Guidelines for calculating the position of the inner and outer rings, and discussion of other factors to consider, can be found in the final report.
Visit the project page for more details.