Political data gathered on more than 198 million US citizens was exposed this month after a marketing firm contracted by the Republican National Committee stored internal documents on a publicly accessible Amazon server.
The data leak contains a wealth of personal information on roughly 61 percent of the US population. Along with home addresses, birthdates, and phone numbers, the records include advanced sentiment analyses used by political groups to predict where individual voters fall on hot-button issues such as gun ownership, stem cell research, and the right to abortion, as well as suspected religious affiliation and ethnicity. The data was amassed from a variety of sources, from the banned subreddit r/fatpeoplehate to American Crossroads, the super PAC co-founded by former White House strategist Karl Rove.
Deep Root Analytics, a conservative data firm that identifies audiences for political ads, confirmed ownership of the data to Gizmodo on Friday.
UpGuard cyber risk analyst Chris Vickery discovered Deep Root's data online last week. More than a terabyte was stored on the cloud server without the protection of a password and could be accessed by anyone who found the URL. Many of the files did not originate at Deep Root, but are instead the aggregate of outside data firms and Republican super PACs, shedding light onto the increasingly advanced data ecosystem that helped propel President Donald Trump's slim margins in key swing states.
Although files possessed by Deep Root would be typical in any campaign, Republican or Democratic, experts say its exposure in a single open database raises significant privacy concerns. "This is valuable for people who have nefarious purposes," Joseph Lorenzo Hall, the chief technologist at the Center for Democracy and Technology, said of the data.
The RNC paid Deep Root $983,000 last year, according to Federal Election Commission reports, but its server contained records from a variety of other conservative sources paid millions more, including The Data Trust (also known as GOP Data Trust), the Republican party's primary voter file provider. Data Trust received over $6.7 million from the RNC during the 2016 cycle, according to OpenSecrets.org, and its president, Johnny DeStefano, now serves as Trump's director of presidential personnel.
The Koch brothers' political group Americans for Prosperity, which had a data-swapping agreement with Data Trust during the 2016 election cycle, contributed heavily to the exposed files, as did the market research firm TargetPoint, whose co-founder previously served as director of Mitt Romney's strategy team. (The Koch brothers also subsidized a data company known as i360, which began exchanging voter files with Data Trust in 2014.) Furthermore, the files provided by Rove's American Crossroads contain strategic voter data used to target, among others, disaffected Democrats and undecideds in Nevada, New Hampshire, Ohio, and other key battleground states.
Deep Root further obtained hundreds of files (at least) from The Kantar Group, a leading media and market research company with offices in New York, Beijing, Moscow, and more than a hundred other cities on six continents. Each file offers rich details about political ads (estimated cost, audience demographics, reach, and more) by and about figures and groups spanning the political spectrum. There are files on the Democratic Senatorial Campaign Committee, Planned Parenthood, and the American Civil Liberties Union, as well as files on every 2016 presidential candidate, Republicans included.
What's more, the Kantar files each contain video links to related political ads stored on Kantar's servers.
Spreadsheets acquired from TargetPoint, which partnered with Deep Root and GOP Data Trust during the 2016 election, include the home addresses, birthdates, and party affiliations of nearly 200 million registered voters in the 2008 and 2012 presidential elections, as well as some 2016 voters. TargetPoint's data seeks to resolve questions about where individual voters stand on dozens of political issues. For example: Is the voter eco-friendly? Do they favor lowering taxes? Do they believe the Democrats should stand up to Trump? Do they agree with Trump's "America First" economic stance? "Pharmaceutical companies do great damage: Agree or Disagree?"
The details of voters' likely preferences for issues like stem cell research and gun control were likely drawn from a variety of sources, according to a Democratic strategist who spoke with Gizmodo.
"Data like that would be a combination of polling data, real-world data from door-knocking and phone-calling and other canvassing activities, coupled with modeling using the data we already have to extrapolate what the voters we don't know about would think," the strategist said. "The campaigns that do it right combine all the available data together to make the most robust model for every single voter in the target universe."
In a statement, Deep Root founder Alex Lundry told Gizmodo, "We take full responsibility for this situation." He said the data included proprietary information as well as publicly available voter data provided by state government officials. "Since this event has come to our attention, we have updated the access settings and put protocols in place to prevent further access," Lundry said.
Deep Root's data was exposed after the company updated its security settings on June 1, Lundry said. Deep Root has retained Stroz Friedberg, a cybersecurity and digital forensics firm, to investigate. "Based on the information we have gathered thus far, we do not believe that our systems have been hacked," Lundry added.
So far, Deep Root doesn't believe its proprietary data was accessed by any malicious third parties during the 12 days that the data was exposed on the open web.
Deep Root's server was discovered by UpGuard's Vickery on the night of June 12 as he was searching for data publicly accessible on Amazon's cloud service. He used the same process last month to detect sensitive files tied to a US Defense Department project and exposed by an employee of a top defense contractor.
This is not the first leak of voter files uncovered by Vickery, who told Gizmodo that he was alarmed over how the data was apparently being used; some states, for instance, prohibit the commercial use of voter records. Moreover, it was not immediately clear to whom the data belonged. "It was decided that law enforcement should be contacted before attempting any contact with the entity responsible," said Vickery, who reported that the server was secured two days later on June 14.
Deep Root's data sheds light onto the increasingly sophisticated data operation that has fed recent Republican campaigns and lays bare the intricate network of political organizations, PACs, and analysis firms that trade in bulk voter data. In an email to Gizmodo, Deep Root said that its voter models are used to enhance the understanding of TV viewership for political ad buyers. "The data accessed was not built for or used by any specific client," Lundry said. "It is our proprietary analysis to help inform local television ad buying."
However, the presence of data on the server from several political organizations, including TargetPoint and Data Trust, suggests that it was used for Republican political campaigns. Deep Root also works primarily with GOP customers (although similar vendors, such as NationBuilder, service the Democrats as well).
Deep Root is one of three data firms hired by the Republican National Committee in the run-up to the 2016 presidential election. Founded by Lundry, a data scientist on the Jeb Bush and Mitt Romney campaigns, the firm was one of three analytics teams that worked on the Trump campaign following the party's national convention in the summer of 2016.
Lundry's work brought him into Trump's campaign war room, according to a post-election AdAge article that charted the GOP's 2016 data efforts. Deep Root was hand-picked by the RNC's then-chief of staff, Katie Walsh, in September of last year and joined two other data shops, TargetPoint Consulting and Causeway Solutions, in the effort to win Trump the presidency.
Walsh, who now works for the nonprofit America First Policies after a brief stint in the White House, oversaw Trump's data operation in partnership with Brad Parscale, Trump's digital director. (Parscale did not respond to a request for comment before press time. Attempts to reach Walsh for comment were also unsuccessful.) Walsh and Parscale focused their efforts on three categories of voters, AdAge reports: voters who might be predisposed to support Trump, Republican voters who were uncertain about Trump, and voters that were leaning toward Hillary Clinton but could be persuaded by Trump's message of changing up government-as-usual.
To appeal to the three crucial categories, it appears that Trump's team relied on voter data provided by Data Trust. Complete voter rolls for 2008 and 2012, as well as partial 2016 voter rolls for Florida and Ohio, apparently compiled by Data Trust, are contained in the dataset exposed by Deep Root.
Data Trust acquires voter rolls from state officials and then standardizes the voter data to create a clean, manageable record of all registered US voters, a source familiar with the firm's operations told Gizmodo. Voter data itself is public record and therefore not particularly sensitive, the source added, but the tools Data Trust uses to standardize that data are considered proprietary. That data is then provided to political clients, including analytics firms like Deep Root. While Data Trust requires its clients to protect the data, it has to take clients at their word that industry-standard encryption and security protocols are in place.
TargetPoint and Causeway, the two firms employed by the RNC in addition to Deep Root, apparently layered their own analytics atop the information provided by Data Trust. TargetPoint conducted thousands of surveys per week in 22 states, according to AdAge, gauging voter sentiment on a variety of topics. While Causeway helped manage the data, Deep Root used it to perfect its TV advertising targets, producing voter turnout estimates by county and using that intelligence to target its ad buys.
A source with years of experience working on political campaign data operations told Gizmodo that the data exposed by Deep Root appeared to be customized for the RNC and had apparently been used to create models for turnout and voter preferences. Metadata in the files suggested that the database wasn't Deep Root's working copy, but rather a post-election version of its data, the source said, adding that it was somewhat surprising the files hadn't been discarded.
Because the data from the 2008 and 2012 elections is outdated (the source compared it to the kind of address and phone data one could find on a lousy internet lookup site), it's not very valuable. Even the 2016 data is quickly becoming stale. "This is a proprietary dataset based on a mix of public records, data from commercial providers, and a variety of predictive models of uncertain provenance and quality," the source said, adding: "Undoubtedly it took millions of dollars to produce."
Although basic voter information is public record, Deep Root's dataset contains a swirl of proprietary information from the RNC's data firms. Many of the filenames indicate they potentially contain market research on Democratic candidates and the independent expenditure committees that support them. (Up to two terabytes of data contained on the server was protected by permission settings.)
One exposed folder is labeled "Exxon-Mobile" [sic] and contains spreadsheets apparently used to predict which voters support the oil and gas industry. Divided by state, the files include the voters' names and addresses, along with a unique RNC identification number assigned to every US citizen registered to vote. Each row indicates where voters likely fall on issues of interest to ExxonMobil, the country's biggest natural gas producer.
The data evaluates, for example, whether or not a specific voter believes drilling for fossil fuels is vital to US security. It also predicts if the voter thinks the US should be moving away from fossil-fuel use. The ExxonMobil national score document alone contains data on 182,746,897 Americans spread across 19 fields.
Some of the data included in Deep Root's dataset veers into downright bizarre territory. A folder titled simply "reddit" houses 170 GB of data apparently scraped from several subreddits, including the controversial r/fatpeoplehate that was home to a community of people who posted pictures of people and mocked them for their weight before it was banned from Reddit's platform in 2015. Other subreddits that appear to have been scraped by Deep Root or a partner organization focused on more benign topics, like mountain biking and the Spanish language.
The Reddit data could've been used as training data for an artificial intelligence algorithm focused on natural language processing, or it might have been harvested as part of an effort to match up Reddit users with their voter registration records. During the 2012 election cycle, Barack Obama's campaign data team relied on information gleaned from Facebook profiles and matched profiles to voter records.
During the 2016 election season, Reddit played host to a legion of Trump supporters who gathered in subreddits like r/The_Donald to comb through leaked Democratic National Committee emails and craft pro-Trump memes. Trump himself participated in an Ask Me Anything session on r/The_Donald during his campaign.
Given how active some Trump supporters are on Reddit (r/The_Donald currently boasts more than 430,000 members), it makes sense that Trump's data team might be interested in analyzing data from the site.
A FiveThirtyEight analysis that looked at where r/The_Donald members spend their time when they're not talking politics might shed some light onto why Deep Root collected r/fatpeoplehate data. FiveThirtyEight found that, when Redditors weren't commenting in political subreddits, they most often frequented r/fatpeoplehate.
It's possible that Deep Root intended to use data from r/fatpeoplehate to build a more comprehensive profile of Trump voters. (Lundry declined to comment beyond his initial statement on any of the information included in the Deep Root dataset.)
However, FiveThirtyEight's investigation doesn't account for Deep Root's collection of data from mountain-biking and Spanish-speaking subreddits that weren't as popular with r/The_Donald members, and data from these subreddits that are not so closely linked to Trump's diehard supporters might be more useful for his campaign's goal of pursuing swing voters.
"My guess is that they were scraping Reddit posts to match to the voter file as another input for individual modeling," a source familiar with campaign data operations told Gizmodo. "Given the number of random forums, my guess is they started with a list of accounts to scrape from, rather than scraping from all forums then trying to match from there (in which case you'd start with the political ones)."
Matching voter records with Reddit usernames would be complicated and any large-scale effort would likely result in many inaccuracies, the source said. However, campaigns have attempted to match voter files with social media profiles in the past. Such an effort by Deep Root wouldn't be entirely surprising, and would likely yield rich data on the small portion of users it was able to match with their voter profiles, the source explained.
The Deep Root incident represents the largest known leak of Americans' voter records, outstripping past exposures by several million records. Five voter-file leaks over the past 18 months exposed between 350,000 and 191 million files, some of which paired voter data (name, race, gender, birthdate, address, phone number, party affiliation, etc.) with email accounts, social media profiles, and records of gun ownership.
Campaigns and the data analysis firms they employ are a particularly weak point for data exposure, security experts say. Corporations that don't properly secure customer data can face significant financial repercussions; just ask Target or Yahoo. But because campaigns are short-term operations, there's not much incentive for them to take data security seriously, and valuable data is often left out to rust after an election.
"Campaigns are very narrowly focused. They are shoestring operations, even presidential campaigns. So they don't think of this as an asset they need to protect," the Center for Democracy and Technology's Hall told Gizmodo.
Even though voter rolls are public record and are easy to access (Ohio, for instance, makes its voter rolls available to download online), their exposure can still be harmful.
Voter registration records include ZIP codes, birthdates, and other personal information that have been crucial in research efforts to re-identify anonymous medical data. Latanya Sweeney, a professor of government and technology at Harvard University, famously used voter data to re-identify Massachusetts Governor William Weld from information in anonymous hospital discharge records.
Because of the personal information they contain, voter registration databases can also be useful in identity theft schemes.
Even though exposure of Deep Root's data has the potential to harm voters, it's exactly the kind of data that campaigns lust after and will spend millions of dollars to obtain. Campaigns are motivated to accumulate as much deeply personal information about voters as possible, so they can spend their ad dollars in the right swing districts where they're likely to sway the greatest number of voters. But voter data rapidly goes stale and campaigns close up shop quickly, so data is seen as disposable and often isn't well-protected.
"I can think of no avenues for punishing political data breaches or otherwise properly aligning the incentives. I worry that if there's no way to punish campaigns for leaking this stuff, it's going to continue to happen until something bad happens," Hall said. The data left behind by campaigns can pose a lingering security issue, he added. "None of these motherfuckers were ever Boy Scouts or Girl Scouts, they don't pack out what they pack in."
Here is the original post:
GOP Data Firm Accidentally Leaks Personal Details of Nearly 200 Million American Voters - Gizmodo