I found Martin Eberhard, co-founder and former CEO of Tesla Motors, in the pages of 2600.
Nursing the best dark brew I’ve ever had, I moved from a great article on free global phone calls to another on the language of gang signs, ultimately landing on a column signed not with a pseudonym but with a real name: Martin Eberhard, co-founder of Tesla Motors.
The subject? Engineering a “patriot hack” to protect privacy online. This, I remember thinking, should be interesting…
It was so interesting, in fact, that I reached out to Martin after my bear-rich Pacific Northwest roadtrip and asked for permission to reprint his article here. He graciously agreed.
This article is broken up into four sections, which I titled:
The Patriot Hack – From China’s Firewall to Lockpicking (15%)
The Political and Technical Landscape (60%)
Strategies to Protect Your Privacy (10%)
The “Haystack” Call to Action (15%)
If you want a quick read and aren’t interested in the political or legal aspects, just skip the second section.
I hope you find this as thought-provoking — and practical — as I did.
The Patriot Hack – From China’s Firewall to Lockpicking
How long can the regime control what people are allowed to know, without the people caring enough to object? On current evidence, for quite a while.
So concludes James Fallows’ article titled “Penetrating the Great Firewall” in the March ’08 issue of The Atlantic. The Chinese firewall is a crude but effective system that looks at every single Internet connection in the country, and decides whether or not the user may proceed, based on policies set by the government. If a Chinese citizen looks too hard for information about, say, Tibetan independence, the Tiananmen Square massacre, or Falun Gong, not only might her search be blocked, she is also inviting a visit from the police.
An outrageous invasion of privacy, isn’t it?
Reading Fallows’ article immediately made me think about how to get around the Chinese firewall, and made me wonder how many people there already have. I guess it’s the hacker instinct in me – I go straight from being outraged about the invasion of privacy to wondering how I might hack it if I had to.
I figured out how ordinary locks worked sometime in junior high school, and soon thereafter, I figured out how to pick these locks, how to make keys for them without fancy locksmith machines, and how to re-key locks my way. Not long after, I discovered computers, which definitely were not personal in those days. I got kicked out of my 10th grade computer programming (Fortran) class for allegedly loading something into the school district’s mainframe that brought the whole thing down. (No comment.) In those days, such security systems were challenges – picking the lock was an end in itself.
As I grew up, I channeled this energy into getting a decent engineering degree, then into becoming an entrepreneur. I guess you could say that Tesla Motors was my first try at hacking the global energy system.
The Political and Technical Landscape
Meanwhile we are busily transforming the “Land of the Free” into a high-tech surveillance society of our own. In the name of preventing terrorism in this post-9/11 world, we have come to accept the Patriot Act, video cameras watching us along highways and intersections, more video cameras in other public places, invasive airport screening, scrutinized financial transactions, widespread wiretaps, surveillance of our online activities, efforts to create national identity cards, face recognition equipment at sporting events, and lots more.
Alarmingly, we give up our privacy not just to protect ourselves from terrorists, but also for mundane convenience: “preference” information gathered by online retailers, credit card usage data, ubiquitous RFID tags embedded in consumer goods, “club” discount cards at supermarkets, deep personal information posted at social networking sites and then sold to marketers, open wireless networks, etc.
In this article I focus on the ocean of data collected about us by search engine companies.
We know that search engine companies collect and save massive amounts of information about our searches, but then again, search engines are so useful and convenient. They ostensibly use this information to tune the advertising that we get to see. We also know that many sites sell the data they collect to others. Who knows to what other ends these data are put? Some, such as Google, say as a matter of policy that they will not be evil.
Unfortunately, your privacy is not a right that is clearly or specifically called out in the US Constitution. Some specific aspects of your privacy are protected, such as the privacy of your beliefs (in the 1st Amendment), privacy of your home against demands that it be used to house soldiers (in the 3rd Amendment), privacy of you and your possessions against unreasonable searches (in the 4th Amendment), and perhaps most importantly the 5th Amendment’s privilege against self-incrimination, which provides some protection for the privacy of your personal information.
Since about 1923, the US Supreme Court has interpreted the “liberty” guarantee of the 14th Amendment as guaranteeing an increasingly broad right to privacy, and this interpretation is the basis of most privacy protection outside those specifically listed. But the future of this constitutional privacy protection remains an open question. In our current Supreme Court, the so-called “originalists,” like Justices Scalia and Thomas, are not inclined to protect your privacy beyond the protections plainly and specifically guaranteed in the Bill of Rights. (Former Supreme Court nominee Robert Bork has derided the right of privacy as “a loose cannon in the law.” Good thing he never made it onto the Court!)
Beyond constitutional protection, your privacy and your sensitive or personal information are protected somewhat by a patchwork of statutes on a per-industry basis. The Privacy Act of 1974 prevents the unauthorized disclosure of your personal information that is held by the federal government. The Fair Credit Reporting Act protects information about you that has been gathered by credit reporting agencies. The Children’s Online Privacy Protection Act restricts what information about your children (under age 13) can be collected by web sites. The Sarbanes-Oxley Act, HIPAA and GLBA each contain some protection for some of your personal or confidential information. Some state laws also provide protection.
Since privacy is not specifically protected in the Constitution, there will continue to be a battle between those of us who want our privacy protected and those who want to invade it – often our own government, certainly businesses that aggregate and sell our eyeballs, and worst of all, cooperation between the two.
Let’s not forget most of the phone companies’ gleeful cooperation with the US government’s widespread warrantless wiretap program. You can bet that every service provider company – search engine companies included – is paying close attention to the immunity that Congress is right now granting to these phone companies for their illegal participation in this wiretapping program. [Note from Tim: I did a post on the practical implications of this and FISA here.]
What will happen when the government asks your favorite search engine company to divulge what you and I have searched for? This has happened already. So far, Google has resisted, but AOL and others did not. The World Privacy Forum notes:
“In 2006, AOL released about 20 million search queries of over 500,000 of its users. Those queries were put on the web. Reporters for the New York Times were able to identify a user from the search queries; others have also been able to identify users. In 2005, the U.S. Department of Justice subpoenaed Google, Yahoo, MSN, and AOL for tens of millions of users’ search queries. Google successfully fought the request, and was able to limit its disclosure, but it is unknown how much data other companies may have turned over.”
Although Ask.com has subsequently announced that they will delete your searches after 18 months, Google has not.
To get an idea of how long Google is interested in your data, consider that a Google cookie on your machine expires in the year 2038! [Note from Tim: this appears to have been reduced but someone with better detective skills should comment.] So the Google search you made 3 years ago for, say, “file sharing music” could come back to haunt you 3 years from now when some new, even more odious version of the Digital Millennium Copyright Act (DMCA) comes into law.
Can even Google forever be trusted not to be evil? To what new ends will they put all that data about us? Anyway, doesn’t it creep you out knowing that they are saving and analyzing every search you have ever made?
And now, with Google’s acquisition of Doubleclick, they will be able to correlate your searches with the rest of your web browsing – and maybe make it more painful to block cookies from Doubleclick and Google.
Strategies to Protect Your Privacy
To get an idea about what websites, including search engines, already know about you, check out this site: http://ipid.shat.net/. Spooky.
I use an Ironkey when I can, and there are both free sites and pay sites that can make your surfing anonymous. But some websites don’t work well with these tools. [From Tim: I cannot wait to test Pandora -- one of my favorite sites -- overseas using some of the proxy sites.]
The World Privacy Forum suggests several strategies to help protect your privacy while using search engines:
• Do not accept search engine cookies. If you already have some on your computer, delete them.
• Do not sign up for email at the same search engine where you regularly search.
• Mix it up. Use a variety of search engines.
• Watch what you search for.
• Read your news on one search engine, have your email on another, and use a handful of other separate search engines for Web research.
• Vary the physical location you search from.
• If you surf using a cable modem, or a static (unchanging) Internet connection, ask your service provider to give you a new IP address.
• Be aware that your online purchases can be correlated to your search activity at some search engines.
The “Haystack” Call to Action
Unfortunately, these search strategies are cumbersome and not especially effective.
We certainly cannot count on the government to respect or help to protect our privacy. And I would rather not have to trust Google and Ask.com to protect my privacy.
What we need is a simple tool that requires little of our attention, and provides pretty good privacy – something as simple to use as a browser plug-in.
This is an opportunity for a little constructive hacking, and browsers that allow plug-ins provide the perfect opportunity. What I am proposing is a simple plug-in for the Firefox browser (and any other browser that supports plug-ins) that will bury your searches in noise. Let’s call this plug-in “Haystack.” [There are step-by-step tutorials for how to create Firefox plug-ins]
Here is how it works: Haystack generates a relatively low level background of random searches across a variety of search engines whenever your computer and your network connection are not too busy. The goal is to generate hundreds to thousands of random (hay) searches for every real search you do, such that your searches are a small needle in the haystack of these automatically-generated searches.
Search engines generally run analytic software that constantly looks for attacks – denial of service attacks, bogus click-throughs to pump up somebody’s advertising costs, etc. Since the goal of Haystack is to protect our privacy, not to bring any search engine down, it must be written in such a way that, from the search engine’s point of view, it looks like you are just manually searching.
• Search engine variety: through a setup option, you can select which search engines Haystack uses, matching the ones you normally use yourself.
• Frequency: I think one search every 15 seconds on average is about right, though the interval should be random, varying from say 5 seconds to about 5 minutes. If your machine is on for 10 hours per day, this will generate 2,400 “hay” searches per day. Remember, the goal is to look as much like a lot of human-generated searches as possible, not to jam up the search engine.
• Search terms: this needs to be very broad, random, and always changing. I suggest seeding the program with a search word list, and then pulling new search terms from the search results themselves, as well as occasionally from the text on the front pages of news sites like cnn.com. The searches must include a spectrum of provocative terms, so that any such search that you might do will not stand out.
• Search complexity: like search terms, broad and random. Search for single words, as well as several words at a time, and even with excluded words.
• Computer usage: Ideally, Haystack should not initiate searches when either your computer is very busy or your network connection is very busy. Since the actual search results are not valuable, Haystack should even abort an initiated search by closing the connection to the search engine if CPU usage suddenly increases.
• User controls:
o On/off radio button
o Check boxes to enable one or more search engine sites
o Slider for search frequency (2 seconds to 10 minutes?)
o Button to clear search engine cookies and private data
o Button to get latest version
• Output: Haystack should not bother the user with an open tab; the search results should be silently loaded and discarded (after gleaning a new search term or two from the data). A small icon on the toolbar indicating that Haystack is running should be good enough, perhaps also indicating the ratio of Haystack searches to your own searches.
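To make the frequency and search-term ideas above a bit more concrete, here is a minimal sketch of the timing and query logic only. It is not an implementation of Haystack: it makes no network requests, and Python is used purely for brevity (a real browser plug-in would be written in JavaScript). All function and constant names are hypothetical. Two things in it are assumptions, not part of the proposal: the shifted exponential distribution, chosen because it gives a 15-second average while staying between 5 seconds and 5 minutes, and the leading "-" for excluded words, which many search engines accept.

```python
import random
import re

# Timing targets taken from the "Frequency" bullet above.
MIN_INTERVAL_S = 5.0     # shortest gap between hay searches
MAX_INTERVAL_S = 300.0   # longest gap (5 minutes)
MEAN_INTERVAL_S = 15.0   # desired average gap


def next_interval(rng):
    """Return the number of seconds to wait before the next hay search.

    Assumption: a shifted exponential (minimum plus an exponential tail),
    capped at the maximum. The cap is hit so rarely that the average
    stays very close to MEAN_INTERVAL_S.
    """
    tail_mean = MEAN_INTERVAL_S - MIN_INTERVAL_S
    delay = MIN_INTERVAL_S + rng.expovariate(1.0 / tail_mean)
    return min(delay, MAX_INTERVAL_S)


def hay_searches_per_day(hours_on):
    """Expected number of hay searches for a machine on `hours_on` hours."""
    return int(round(hours_on * 3600 / MEAN_INTERVAL_S))


def make_query(words, rng):
    """Build one hay query: one to three words, sometimes with an exclusion.

    Assumption: a leading "-" marks an excluded word.
    """
    n_terms = rng.randint(1, min(3, len(words)))
    terms = rng.sample(words, n_terms)
    leftovers = [w for w in words if w not in terms]
    if leftovers and rng.random() < 0.2:
        terms.append("-" + rng.choice(leftovers))
    return " ".join(terms)


def harvest_terms(text, words, rng, max_new=2):
    """Glean up to `max_new` unseen words from result text into `words`."""
    found = re.findall(r"[a-z]{4,}", text.lower())
    unseen = sorted(set(found) - set(words))
    picked = rng.sample(unseen, min(max_new, len(unseen)))
    words.extend(picked)
    return picked
```

With these targets, `hay_searches_per_day(10)` gives the 2,400 searches per day mentioned in the frequency bullet. A real plug-in would call `next_interval` to schedule each search, `make_query` to build it, and `harvest_terms` on the returned page before discarding it; checking CPU and network load, as the computer-usage bullet asks, is left out of this sketch.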
If you and I both run Haystack, then the “information” search engines collect from our searches is mostly noise. Perfect. But think what happens if millions of us run Haystack… It does throw a monkey wrench into their lovely data collection machinery, doesn’t it?
Such is the cost of asserting our right to privacy.
So why am I writing this? Simple: I am a hardware hacker. My software abilities are limited to some really tight assembly language code. I am also spending most of my time planning my next big hack into the world of oil consumption, perhaps the subject of a future article.
Although I care a lot about privacy and recognize its defense as a patriotic act, I am not the one to write Haystack.
[Postscript: Readers have suggested several good tools that do most of what Haystack is designed to do. Read the comments for all the goodies, but here are two excellent picks: Scroogle (anonymizes Google searches) and TrackMeNot (noise-producing Firefox plug-in).]