Wednesday, May 5, 2010

Online privacy can't be evaluated on a human scale.

We need to stop thinking of privacy in human terms. The constant surveillance that happens online is not like the East German Stasi in the movie Das Leben der Anderen: hiding upstairs with headphones on. That image of online surveillance misleads us, because it puts the data collection in human terms. Most of us probably realize that the people at Google (which hosts this blog) could be reading this post as I type, even before I submit. While that is kind of freaky, it doesn’t really worry us, because we know that they aren’t wasting their time doing that. (If I’m wrong, Google Interns, say hi in the comments!)

So by thinking about privacy violations on a human scale, we convince ourselves that even though the capability exists to track us, our privacy is only potentially violated. For our privacy to actually be violated, someone (Google, Facebook, or the FBI) would have to specifically notice us. And our individual activity on the web is not merely a needle in a haystack; it is one needle in the world’s largest pile of needles. Face it, we think to ourselves, we’re just not that interesting.

But this is simply not the case. As we know, information about who we are and what we do drives the advertising-based Web economy. With virtually unlimited and virtually free storage, it doesn’t make economic sense not to collect any piece of information that can be collected. As a recent NYT article showed, retailers and marketers want to know exactly how their marketing campaigns are working. With web coupons, the article states, “a retailer could know that Amy Smith printed a 15 percent-off coupon after searching for appliance discounts at on Friday at 1:30 p.m. and redeemed it later that afternoon at the store.”

Notice how that sentence is written: a retailer could know. This again implies a human retailer looking at the data. Really, if this data has been collected, it is already 'known' in the sense that it will be used in data analysis. And the consequences of being known in this database sense are very different from the consequences of having human retailers and marketers know our behavior.

Rather than a person knowing discrete facts, the database allows your data to be carefully analyzed as part of the aggregate. And when you analyze such a huge pot of data, you start finding odd correlations.

At the risk of sounding like someone from the tinfoil-hat crowd, let me take two examples that show how real harm can result from data that is, objectively, harmless.

The first example is the case of Maka Mini Mart in South Seattle, a small store serving the local East African immigrant community. In early 2002, in the wake of the 9/11 terrorist attacks, Muslim-run businesses were under scrutiny for ties to terrorist organizations, and based on the records of the electronic debit cards that replaced paper food stamps, Maka Mini Mart looked suspicious. According to this Seattle P-I article, the suspicious transactions included large purchases made minutes apart and transactions for even-dollar amounts, which are unusual for food purchases. As a result, the USDA ‘permanently disqualified’ the store from the food-stamp program, immediately eliminating virtually the entire revenue of the store.

Eventually, the USDA reversed its decision against this and a few other Somali markets in Seattle. The unusual, even-dollar purchases, it turns out, were a result of the community's practice of buying meat by the dollar, rather than by the pound. (Isn’t it interesting that it is more typical to buy meat by the pound but gas by the dollar? I’ve never heard anyone say “5 gallons on pump 3.”) The multiple large purchases were because people in the community tended to go shopping together, and buy in quantity.

These are all perfectly appropriate behaviors, and wouldn’t raise the suspicions of any human watching, but when computer analysis looked for suspicious anomalies, suddenly this business was practically shut down.
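To make that concrete, here is a minimal sketch of the kind of rule an automated screen might apply. To be clear, this is not the USDA's actual method, which I don't know; the thresholds, the function name `flag_transactions`, and the sample data are all made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, invented for this illustration only.
LARGE_AMOUNT = 100.00
CLOSE_TOGETHER = timedelta(minutes=5)


def flag_transactions(transactions):
    """Return (index, reason) pairs for transactions a naive rule finds odd.

    transactions: list of (timestamp, amount) tuples, sorted by timestamp.
    """
    flags = []
    for i, (when, amount) in enumerate(transactions):
        # Rule 1: an even-dollar total is unusual for by-the-pound groceries.
        if round(amount * 100) % 100 == 0:
            flags.append((i, "even-dollar amount"))
        # Rule 2: a large purchase within minutes of the previous large one.
        if i > 0:
            prev_when, prev_amount = transactions[i - 1]
            if (amount >= LARGE_AMOUNT and prev_amount >= LARGE_AMOUNT
                    and when - prev_when <= CLOSE_TOGETHER):
                flags.append((i, "large purchases minutes apart"))
    return flags


# Made-up sample data: meat bought by the dollar, then two neighbors
# shopping together and buying in quantity.
sample = [
    (datetime(2002, 1, 15, 10, 0), 20.00),
    (datetime(2002, 1, 15, 10, 30), 13.47),
    (datetime(2002, 1, 15, 11, 0), 150.00),
    (datetime(2002, 1, 15, 11, 3), 180.00),
]

for index, reason in flag_transactions(sample):
    print(index, reason)
```

Nothing in these rules knows why a purchase looks the way it does. Three of the four sample transactions get flagged, and each one has the same innocent explanation as the real purchases at Maka Mini Mart.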

Take a second, more hypothetical example.
A lot of employers, particularly in the public sector, do background checks before hiring someone. Imagine a not too distant future in which a company includes aggregated online history in its background checks. The company won’t tell your employer specifically what you do online (that would be an invasion of privacy that no one would stand for), but it has an automated system that looks for patterns and can generate something like a credit score. Just as a credit score is used to predict how likely you are to pay back your debts, your background score might predict how likely you are to embezzle, have a drug problem, get into fights, etc.
As I said, when you analyze data, you can find strange correlations. Let’s say a correlation is found between pedophilia and being a Star Trek fan.

Even though, statistically speaking, that correlation doesn’t mean Star Trek fans are likely to be pedophiles (it means only that pedophiles are likely to be Star Trek fans), it is easy to imagine that subtle distinction getting missed (or ignored in the name of being extra thorough) in the algorithms that generate these scores.
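A little arithmetic shows how far apart those two statements can be. This is only a sketch using Bayes' rule; all three input numbers are invented for illustration and none of them is a real statistic. The names `posterior`, `p_trait`, `p_fan_given_trait`, and `p_fan` are my own.

```python
def posterior(p_trait, p_fan_given_trait, p_fan):
    """Bayes' rule: P(trait | fan) = P(fan | trait) * P(trait) / P(fan)."""
    return p_fan_given_trait * p_trait / p_fan


# Invented numbers, purely hypothetical.
p_trait = 0.001           # 1 in 1,000 people has the rare trait
p_fan_given_trait = 0.60  # 60% of people with the trait are fans
p_fan = 0.20              # 20% of the whole population are fans

p_trait_given_fan = posterior(p_trait, p_fan_given_trait, p_fan)

print(f"P(fan | trait) = {p_fan_given_trait:.1%}")
print(f"P(trait | fan) = {p_trait_given_fan:.1%}")
```

With these made-up inputs, a majority of people with the trait are fans, yet only about 0.3 percent of fans have the trait. A scoring algorithm that confuses the first number with the second would be off by a factor of two hundred.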

So, if you spend a lot of time browsing Star Trek forums, your background score might show that there is just some small chance you might be a pedophile. If you were hiring someone for a position working with children, what would you do?

No reasonable person would deny someone a job working with children solely because they like Star Trek, but in this hypothetical situation, no human would see that data in context. Maybe listening to a certain type of music will be found to correlate with posting hate speech in blog comments. Buying spinach on sale with a loyalty card might correlate to some other undesirable behavior.

Tons of individually insignificant pieces of data are being collected, stored, and analyzed. Which bits of data are truly harmless and which wind up having consequences remains to be seen. We don’t know how data being gathered now will be used in two, three, or ten years. What we do know is that it isn’t prying human eyes we need to worry about. Whatever privacy implications there are, they aren’t on a human scale.
