Every week it seems the world is stunned by another advance in artificial intelligence, including text-to-image generators like DALL-E and the latest chatbot, GPT-4.
What makes these tools impressive is the enormous amount of data they're trained on, specifically the millions of images and words on the internet.
But the process of machine learning relies on a lot of human data labelers.
Marketplace's Meghan McCarty Carino spoke to Sarah T. Roberts, a professor of information studies and director of the Center for Critical Internet Inquiry at UCLA, about how this work is often overlooked. The following is an edited transcript of their conversation.
Sarah T. Roberts: In the case of something like ChatGPT and the engines that it's using, it's really going out and pretty much data mining massive portions of the internet. Now, we all know that the internet is filled with the best information and the greatest stuff all the time, right? So what's required for something like that is to have human beings, with their special ability of discernment and good judgment, and sometimes visceral reactions to material, and, in the case of ChatGPT, to cull material out: material that users, or more importantly, companies, would not want inside of their products as a potential output. And so that means these data labelers, much like content moderators, spend their days working on some of the worst stuff that we can imagine. And in this case, they're trying to build models to cull that out automatically. But it always starts and ends with human engagement.
Meghan McCarty Carino: What do we know about the people who are doing this really key work of data labelling?
Roberts: So taking a page from the content moderation industry, much of this work is outsourced to third-party companies that provide large labor pools. Often these data labelers are at great remove from where we might imagine the work of engineering these products goes on. They might be in other parts of the world. There was a great article by Billy Perrigo in Time magazine in January of 2023 about a place in Kenya that was doing data labeling. It was a really hard, upsetting job, and folks were being paid at most $2 an hour to be confronted with that material. Unfortunately, this is an industry that is reliant upon human intervention and human discernment but, once again, takes it for granted, pays very little and puts people in harm's way.
McCarty Carino: Right, very similar to, you know, what we've learned about content moderation, which, as you said, happens in a similar sort of outsourced way, where these people are sort of the front lines of everything that we don't want showing up in our end product, and it runs through these workers.
Roberts: Yeah, that's right. And for years, I've been listening to industry figures and other pundits tell me that my concern about the welfare of content moderation workers was appropriate, but it was finite, and that in just a few years, AI technologies would be such that we could eliminate that work. In fact, what's happening is just the opposite. We are expanding, greatly expanding at an exponential pace, the number of people who are doing work like this. I think of data labeling, frankly, as content moderation before the fact, both in practice and in the material conditions of the work.
McCarty Carino: When we think about how these technologies are often described, or characterized by the companies that put them out or, you know, in the press, I mean, what is important to keep in mind as we think about this type of labor and its relationship to those products?
Roberts: I think what we have to remember is that AI is artificial intelligence, leaning heavily on the "artificial." And what it's doing at best is imitating human discernment, thinking and processes, but it is only as good as the material that goes into it. You know, there's an old adage in programming, "garbage in, garbage out," that goes maybe even more so for applications like these AI tools that we've been discussing. Emily Bender and her colleagues wrote a great paper about "stochastic parrots," which is how she and her colleagues describe what ChatGPT is actually doing. And for those who aren't familiar with that term, basically, what she's saying is that you can use ChatGPT, it's incredible, I've used it as well. But you have to keep in mind that what you're seeing as its output is at best mimicking humans, in the same way that a parrot might copy our pattern of speech, a series of words or phrases, even our inflection, but really has no cognitive ability to necessarily understand what those things mean.
And in fact, I would give a parrot a better chance of having that kind of cognition than I would a machine. So in a way, I've been thinking about ChatGPT and other tools like it really as vanity machines. Just as an example, I asked it to generate an annotated bibliography for me the other day in my own field. I picked something that I thought I would have some expertise in, in order to evaluate the output. And it gave me about 10 answers. The first one it gave me was something I would have chosen as well, a book by a colleague. Perfect response. And then it started producing a bunch of new papers and books in my area of study that I'd never heard of. And I really thought, "Wow, have I really been underwater that much during COVID? Like, all this stuff is coming out and I'm missing it?" Turns out, those were fake citations: fake authors, fake books on legitimate presses, fake papers using legitimate journal titles, even with page numbers given. Imagine if I hadn't had the expertise to know that those were bogus. That's just one example of the way that this stochastic parrot, or this mimicry, might reproduce. And, of course, to be fair, I didn't ask it to give me real citations or truthful information. It gave me its best guess at what an annotated bibliography would look like in my field. But none of it was real.
McCarty Carino: What gets lost when tools like this are thought of as these sort of genius technological achievements without considering all of the human labor that went into them?
Roberts: They could have really chosen any model. They could have decided on, you know, any of an infinite number of possibilities for how to set up that work and how to treat those workers. And I think it says something about tech companies. The actual intelligence that they are mining, the very essence of what makes these tools appear to have this human element (in other words, the humans being mimicked, who work on the labeling, on the moderation, on these inputs), is erased from the process. And I think the erasure of the humanity that goes into these tools is to all of our detriment, if for no other reason than that we can't really fully appreciate the labor that goes into creating them, or the limits of the tools and how they should be applied.
A report from Grand View Research valued the global data collection and labeling market at more than $2.2 billion in 2022. It's a huge sector.
And it's important to understand that it's not just this new generative AI that requires this kind of work. For example, my colleague Jennifer Pak reported a couple of years ago on a data labeling center in China that contracts with big companies like Baidu and Alibaba.
One of the workers Jennifer spoke to said he was making twice the average salary in his local province, roughly $11,000 a year, plus commission.
The operation had workers labeling street data for an autonomous vehicle project, basically: "That's a bike, that's a pedestrian, that's a baby stroller."
The same type of labor is used to label faces to train facial recognition software or to help robot vacuums navigate their way around your home.
Earlier this year, we spoke to MIT Technology Review reporter Eileen Guo about her story on how sensitive personal images taken by robot vacuums inside people's homes ended up online.
It's a winding path, but it runs through a group of outsourced data labelers in Venezuela that iRobot contracted.
More here:
The human labor behind AI chatbots and other smart tools - Marketplace