This is Atlantic Intelligence, an eight-week series in which The Atlantics leading thinkers on AI will help you understand the complexity and opportunities of this groundbreaking technology. Sign up here.
The bedrock of the AI revolution is the internet, or more specifically, the ever-expanding bounty of data that the web makes available to train algorithms. ChatGPT, Midjourney, and other generative-AI models learn by detecting patterns in massive amounts of text, images, and videos scraped from the internet. The process entails hoovering up huge quantities of books, art, memes, and, inevitably, the troves of racist, sexist, and illicit material distributed across the web.
Earlier this week, Stanford researchers found a particularly alarming example of that toxicity: The largest publicly available image data set used to train AIs, LAION-5B, reportedly contains more than 1,000 images depicting the sexual abuse of children, out of more than 5 billion in total. A spokesperson for the data sets creator, the nonprofit Large-scale Artificial Intelligence Open Network, told me in a written statement that it has a zero tolerance policy for illegal content and has temporarily halted the distribution of LAION-5B while it evaluates the reports findings, although this and earlier versions of the data set have already trained prominent AI models.
Because they are free to download, the LAION data sets have been a key resource for start-ups and academics developing AI. Its notable that researchers have the ability to peer into these data sets to find such awful material at all: Theres no way to know what content is harbored in similar but proprietary data sets from OpenAI, Google, Meta, and other tech companies. One of those researchers is Abeba Birhane, who has been scrutinizing the LAION data sets since the first versions release, in 2021. Within six weeks, Birhane, a senior fellow at Mozilla who was then studying at University College Dublin, published a paper detailing her findings of sexist, pornographic, and explicit rape imagery in the data. Im really not surprised that they found child-sexual-abuse material in the newest data set, Birhane, who studies algorithmic justice, told me yesterday.
Birhane and I discussed where the problematic content in giant data sets comes from, the dangers it presents, and why the work of detecting this material grows more challenging by the day. Read our conversation, edited for length and clarity, below.
Matteo Wong, assistant editor
More Challenging By the Day
Matteo Wong: In 2021, you studied the LAION data set, which contained 400 million captioned images, and found evidence of sexual violence and other harmful material. What motivated that work?
Abeba Birhane: Because data sets are getting bigger and bigger, 400 million image-and-text pairs is no longer large. But two years ago, it was advertised as the biggest open-source multimodal data set. When I saw it being announced, I was very curious, and I took a peek. The more I looked into the data set, the more I saw really disturbing stuff.
We found there was a lot of misogyny. For example, any benign word that is remotely related to womanhood, like mama, auntie, beautifulwhen you queried the data set with those types of terms, it returned a huge proportion of pornography. We also found images of rape, which was really emotionally heavy and intense work, because we were looking at images that are really disturbing. Alongside that audit, we also put forward a lot of questions about what the data-curation community and larger machine-learning community should do about it. We also later found that, as the size of the LAION data sets increased, so did hateful content. By implication, so does any problematic content.
Wong: This week, the biggest LAION data set was removed because of the finding that it contains child-sexual-abuse material. In the context of your earlier research, how do you view this finding?
Birhane: It did not surprise us. These are the issues that we have been highlighting since the first release of the data set. We need a lot more work on data-set auditing, so when I saw the Stanford report, its a welcome addition to a body of work that has been investigating these issues.
Wong: Research by yourself and others has continuously found some really abhorrent and often illegal material in these data sets. This may seem obvious, but why is that dangerous?
Birhane: Data sets are the backbone of any machine-learning system. AI didnt come into vogue over the past 20 years only because of new theories or new methods. AI became ubiquitous mainly because of the internet, because that allowed for mass harvesting of large-scale data sets. If your data contains illegal stuff or problematic representation, then your model will necessarily inherit those issues, and your model output will reflect these problematic representations.
But if we take another step back, to some extent its also disappointing to see data sets like the LAION data set being removed. For example, the LAION data set came into existence because the creators wanted to replicate data sets inside big corporationsfor example, what data sets used in OpenAI might look like.
Wong: Does this research suggest that tech companies, if theyre using similar methods to collect their data sets, might harbor similar problems?
Birhane: Its very, very likely, given the findings of previous research. Scale comes at the cost of quality.
Wong: Youve written about research you couldnt do on these giant data sets because of the resources necessary. Does scale also come at the cost of auditability? That is, does it become less possible to understand whats inside these data sets as they become larger?
Birhane: There is a huge asymmetry in terms of resource allocation, where its much easier to build stuff but a lot more taxing in terms of intellectual labor, emotional labor, and computational resources when it comes to cleaning up whats already been assembled. If you look at the history of data-set creation and curation, say 15 to 20 years ago, the data sets were much smaller scale, but there was a lot of human attention that went into detoxifying them. But now, all that human attention to data sets has really disappeared, because these days a lot of that data sourcing has been automated. That makes it cost-effective if you want to build a data set, but the reverse side is that, because data sets are much larger now, they require a lot of resources, including computational resources, and its much more difficult to detoxify them and investigate them.
Wong: Data sets are getting bigger and harder to audit, but more and more people are using AI built on that data. What kind of support would you want to see for your work going forward?
Birhane: I would like to see a push for open-sourcing data setsnot just model architectures, but data itself. As horrible as open-source data sets are, if we dont know how horrible they are, we cant make them better.
Related:
P.S.
Struggling to find your travel-information and gift-receipt emails during the holidays? Youre not alone. Designing an algorithm to search your inbox is paradoxically much harder than making one to search the entire internet. My colleague Caroline Mimbs Nyce explored why in a recent article.
Matteo
Original post:
Building AI safely is getting harder and harder - The Atlantic
- Shell to use new AI technology in deep sea oil exploration - Reuters [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Tom Hanks: I could appear in movies after death with AI technology - BBC [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Why C3.ai, Palantir, and Other AI Stocks Soared This Week - The Motley Fool [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- How to do the AI Webtoon filter going viral on TikTok - Dexerto [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- AI poses risk to humanity, according to majority of Americans in new poll - Ars Technica [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- New AI tool predicts Parkinson's disease with 96% accuracy -- 15 ... - Study Finds [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- AI is in a 'baby bubble.' Here's what could burst it. - Markets Insider [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Rise of the machines: how long before AI steals my job? - Mexico News Daily [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Amazon is focusing on using A.I. to get stuff delivered to you faster - CNBC [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Beijing calls on cloud providers to support AI firms - TechCrunch [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- AI at warp speed: disruption, innovation, and whats at stake - Economic Times [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- How a family is using AI to plan a trip around the world - Business Insider [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Prompt Injection: An AI-Targeted Attack - Hackaday [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- WHO calls for safe and ethical AI for health - World Health Organization [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Azeem on AI: Where Will the Jobs Come from After AI? - HBR.org Daily [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- AI runs amok in 1st trailer for director Gareth Edwards' 'The Creator ... - Space.com [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- From railroads to AI: Why new tech is often demonised - The Indian Express [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- How Generative AI Changes Organizational Culture - HBR.org Daily [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Google plans to use new A.I. models for ads and to help YouTube creators, sources say - CNBC [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- A.I. and sharing economy: UBER, DASH can boost profits investing ... - CNBC [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- A Wharton professor says AI is like an 'intern' who 'lies a little bit' to make their bosses happy - Yahoo Finance [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- CNET Published AI-Generated Stories. Then Its Staff Pushed Back - WIRED [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- AI-Driven Robots Have Started Changing Tires In The U.S. In Half The Time As Humans - CarScoops [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Elections in UK and US at risk from AI-driven disinformation, say experts - The Guardian [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Here's What AI Thinks an Illinoisan Looks Like And Apparently, Real Illinoisans Agree - NBC Chicago [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- We Put Google's New AI Writing Assistant to the Test - WIRED [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- 'Heart wrenching': AI expert details dangers of deepfakes and tools to detect manipulated content - Fox News [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- From Amazon to Wendy's, how 4 companies plan to incorporate AIand how you may interact with it - CNBC [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- Meta Made Its AI Tech Open-Source. Rivals Say Its a Risky Decision. - The New York Times [Last Updated On: May 21st, 2023] [Originally Added On: May 21st, 2023]
- For chemists, the AI revolution has yet to happen - Nature.com [Last Updated On: May 23rd, 2023] [Originally Added On: May 23rd, 2023]
- G7 calls for adoption of international technical standards for AI - Reuters [Last Updated On: May 23rd, 2023] [Originally Added On: May 23rd, 2023]
- Bloomsbury admits using AI-generated artwork for Sarah J Maas novel - The Guardian [Last Updated On: May 23rd, 2023] [Originally Added On: May 23rd, 2023]
- New AI research lets you click and drag images to manipulate them ... - The Verge [Last Updated On: May 23rd, 2023] [Originally Added On: May 23rd, 2023]
- France makes high-profile push to be the A.I. hub of Europe setting up challenge to U.S., China - CNBC [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- German tabloid Bild cuts 200 jobs and says some roles will be replaced by AI - The Guardian [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- How Christopher Nolan Learned to Stop Worrying and Love AI - WIRED [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- OpenAI plans app store for AI software, The Information reports - Reuters.com [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- A.I. could remove all human touchpoints in supply chains. Heres what that means - CNBC [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Cision Announces Code of Ethics for AI Development and Support ... - PR Newswire [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- AI Stock Price Prediction: Is C3.ai Really Worth $16? - InvestorPlace [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Is Applied Digital (APLD) Stock the Next Big AI Play? - InvestorPlace [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- 2 Cloud Stocks to Ride the AI Opportunity - The Motley Fool [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Digital health funding this week: Outbound AI, Aledade, Dexcare - Modern Healthcare [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- The AI Tool That Beat Out Top Wall Street Analysts - InvestorPlace [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Replacing news editors with AI is a worry for misinformation, bias ... - The Conversation [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- In new AI hype frenzy, tech is applying the label to everything now - Axios [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- How AI like ChatGPT could be used to spark a pandemic - Vox.com [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- 70% of Companies Will Use AI by 2030 -- These 2 Stocks Have a ... - The Motley Fool [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Why C3.ai Stock Crashed by 10% on Friday - The Motley Fool [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- YouTube integrates AI-powered dubbing tool - TechCrunch [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- AINsight: Now Everywhere, Can AI Improve Aviation Safety? - Aviation International News [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- What is 'ethical AI' and how can companies achieve it? - The Ohio State University News [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- US to launch working group on generative AI, address its risks - Reuters.com [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Amazon Wants to Teach Its Cloud Customers About AI, and It's Yet ... - The Motley Fool [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- HIMSSCast: When AI is involved in decision making, how does man ... - Healthcare IT News [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- How AI could transform the legal industry for the better - Marketplace [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Neuroscience, Artificial Intelligence, and Our Fears: A Journey of ... - Neuroscience News [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Why SoundHound AI Stock Was Making So Much Noise This Week - The Motley Fool [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Advertisers should beware being too creative with AI - Financial Times [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- 3 Top AI Stocks to Buy Right Now - The Motley Fool [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- 1 AI Stock That Could Take You to Easy Street -- and 1 That Could ... - The Motley Fool [Last Updated On: June 23rd, 2023] [Originally Added On: June 23rd, 2023]
- Generative AI To Wearable Plant Sensors: New Report Lists Top 10 Emerging Tech Of 2023 - NDTV [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- Researchers use AI to help save a woodpecker species in decline - MPR News [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- OceanGate fires a whistleblower, hackers threaten to leak Reddit data, and Marvel embraces AI art - TechCrunch [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- As AI Spreads, Experts Predict the Best and Worst Changes in ... - Pew Research Center [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- 9 AI-powered tools for empowering CFOs unveiled at Health Magazine round table - Gulf News [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- Translating Japanese, finding rap rhymes: How these young Toronto-area workers are using AI - Toronto Star [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- Artificial Intelligence in Asset Management Market to grow by USD 10,373.18 million from 2022 to 2027, Growing adoption of cloud-based artificial... [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- 5 Stocks Well-Positioned to Reap Rewards of AI: Morgan Stanley - Business Insider [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- ChatGPT-maker OpenAI planning to launch marketplace for AI applications - Business Today [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- AI watch: from Wimbledon to job losses in journalism - The Guardian [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- AWS is investing $100 million in generative A.I. center in race to keep up with Microsoft and Google - CNBC [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- Bets on A.I. and innovation help this tech-focused T. Rowe Price ... - CNBC [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- Generation AI: It is Indias time to play chief disruptor | Mint - Mint [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- The Next Token of Progress: 4 Unlocks on the Generative AI Horizon - Andreessen Horowitz [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- MongoDB Embraces AI & Reduces Developer Friction With New Features - Forbes [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- Why smart AI regulation is vital for innovation and US leadership - TechCrunch [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- WEDNESDAY: West Seattle facilitator hosting 'civic conversation ... - West Seattle Blog [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- A.I. has a discrimination problem. In banking, the consequences can be severe - CNBC [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]
- AI Consciousness: An Exploration of Possibility, Theoretical ... - Unite.AI [Last Updated On: June 26th, 2023] [Originally Added On: June 26th, 2023]