AI's big players all flunked a major transparency assessment of their LLMs

Hello and welcome to Eye on AI. This week was a big one for AI research, and we're going to start by diving into perhaps the most comprehensive attempt yet to interrogate the transparency of leading LLMs.

The Stanford Institute for Human-Centered AI released its Foundation Model Transparency Index, which rates major foundation model developers on their transparency. Driven by the fact that public transparency around these models is plummeting just as their societal impact is skyrocketing, the researchers evaluated 100 different indicators of transparency covering how a company builds a foundation model, how that model works, and how it's actually used. They focused on 10 major foundation model developers: OpenAI, Anthropic, Google, Meta, Amazon, Inflection, AI21 Labs, Cohere, Hugging Face, and Stability, and designated a single flagship model from each developer for evaluation.
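
For readers who want a concrete picture of how an indicator-based index like this gets aggregated, here is a minimal sketch in Python, assuming each indicator is scored pass/fail and the overall score is simply the number of indicators satisfied. The indicator names and domain grouping below are hypothetical placeholders, not the researchers' actual rubric.

```python
# Minimal sketch of aggregating a transparency index from pass/fail indicators.
# The indicator names and domain labels are illustrative placeholders, not the
# actual FMTI rubric; the real index uses 100 indicators per developer.

from dataclasses import dataclass

@dataclass
class Indicator:
    name: str
    domain: str      # e.g., "upstream", "model", "downstream" (assumed grouping)
    satisfied: bool  # did the developer publicly disclose this?

def score(indicators: list[Indicator]) -> dict:
    """Return the overall score and a per-domain breakdown."""
    total = sum(i.satisfied for i in indicators)
    by_domain: dict[str, list[Indicator]] = {}
    for ind in indicators:
        by_domain.setdefault(ind.domain, []).append(ind)
    breakdown = {
        domain: f"{sum(i.satisfied for i in inds)}/{len(inds)}"
        for domain, inds in by_domain.items()
    }
    return {"overall": f"{total}/{len(indicators)}", "by_domain": breakdown}

# Hypothetical example: three of the indicators for one developer.
example = [
    Indicator("training data sources disclosed", "upstream", False),
    Indicator("model architecture described", "model", True),
    Indicator("downstream usage policy published", "downstream", True),
]
print(score(example))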

Eye on AI talked with one of the researchers behind the index to get a deeper understanding of how the companies responded to the findings, what it all means about the state of AI, and the team's plans for the index going forward. But first, let's get into the results. To sum it up, everyone failed.

Meta (evaluated for Llama 2) topped the rankings with an unimpressive score of 54 out of 100. Hugging Face (BLOOMZ) came in right behind with 53 but scored a notable 0% in both the risk and mitigations categories. OpenAI (GPT-4) scored a 48, Stability (Stable Diffusion 2) scored a 47, Google (PaLM 2) scored a 40, and Anthropic (Claude 2) scored a 36. Cohere (Command), AI21 Labs (Jurassic-2), and Inflection (Inflection-1) spanned the mid-30s to low 20s, and Amazon (Titan Text) scored a strikingly low 12, though it's worth noting its model is still in private preview and hasn't yet been released for general availability.

"We anticipated that companies would be opaque, and that played out with the top score of 54 and the average of a mere 37/100," Rishi Bommasani, CRFM Society Lead at Stanford HAI, told Eye on AI. "What we didn't expect was how opaque companies would be on critical areas: Companies disclose even less than we expected about data and compute, almost nothing about labor practices, and almost nothing about the downstream impact of their models."

The researchers contacted all of the companies to give them a chance to respond after drafting their initial ratings. And while Bommasani said they promised to keep those communications private and wouldn't elaborate on specifics, such as how Amazon responded to its strikingly low score, he said all 10 companies engaged in correspondence. Eight of the 10 (all but AI21 Labs and Google) contested specific scores, arguing that their scores should be 8.75 points higher on average; in the end, their scores were adjusted by 1.25 points on average.

The results say a lot about the current state of AI. And no, it wasn't always like this.

"The successes of the 2010s with deep learning came about through significant transparency and the open sharing of datasets, models, and code," Bommasani said. "In the 2020s, we have seen that change: Many top labs don't release models, even more don't release datasets, and sometimes we don't even have papers written about widely deployed models. This is a familiar feeling of societal impact skyrocketing while transparency is plummeting."

He pointed to social media as another example of this shift, noting how that technology has become increasingly opaque over time even as it has grown more powerful in our lives. "AI looks to be headed down the same path, which we are hoping to countervail," he said.

AI has quickly gone from specialized researchers tinkering to the tech industry's next (and perhaps biggest ever) opportunity to capture both revenue and world-altering power. It could easily create new behemoths and topple current ones. The off-to-the-races feeling has been intensely palpable ever since OpenAI released ChatGPT almost a year ago, and tech companies have repeatedly shown us they'll prioritize their market competitiveness and shareholder value above privacy, safety, and other ethical considerations. There aren't any requirements to be transparent, so why would they be? As Bommasani said, we've seen this play out before.

While this is the first publication of the FMTI, it definitely won't be the last. The researchers plan to conduct the analysis on a repeated basis, and they hope to have the resources to operate on a quicker cadence than the annual turnaround typical of such indices, to better mirror the frenetic pace of AI.

Programming note: Gain vital insights on how the most powerful and far-reaching technology of our time is changing businesses, transforming society, and impacting our future. Join us in San Francisco on Dec. 11-12 for Fortune's third annual Brainstorm A.I. conference. Confirmed speakers include such A.I. luminaries as Salesforce AI CEO Clara Shih, IBM's Christina Montgomery, Quizlet CEO Lex Bayer, and more. Apply to attend today!

And with that, here's the rest of this week's AI news.

Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com

Hugging Face confirms users in China are unable to access its platform. That's according to Semafor. Chinese users have been complaining of issues connecting to the AI startup's popular open-source platform since May, and it's been fully unavailable in China since at least Sept. 12. It's not exactly clear what prompted action against the company, but the Chinese government routinely blocks access to websites it disapproves of. It could also be related to local regulations regarding foreign AI companies that recently went into effect.

Canva unveils suite of AI tools for the classroom. Just two weeks after Canva introduced an extensive suite of AI-powered tools and capabilities, the online design platform announced a set of AI-powered design tools targeted specifically at teachers and students. The tools will live in the company's Canva for Education platform and include a writing assistant, translation capabilities, alt text suggestions, Magic Grab, and the ability to animate designs with one click.

Apple abruptly cancels Jon Stewart's show over tensions stemming from his interest in covering AI and China. That's according to the New York Times. The third season of The Problem With Jon Stewart was already in production and set to begin filming soon before Stewart was (literally) canceled. The details of the dispute over covering AI and China are not clear, but Apple's deep ties with China have come under increased scrutiny lately as tensions with the country rise and the U.S. takes action to limit the transfer of AI technologies between the U.S. and China. The company is also starting to move some of its supply chain out of China.

China proposes a global initiative for AI governance. The Cyberspace Administration of China (CAC) announced the Global AI Governance Initiative, calling out the urgency of managing the transition to AI and outlining a series of principles and actions around the need for laws, ethical guidelines, personal security, data security, geopolitical cooperation, and an emphasis on a people-centered approach to AI, according to The Center for AI and Digital Policy newsletter (Update 5.40). The document emphasizes the dual nature of AI as a technology that can both drive progress and introduce unpredictable risks and complicated challenges.

Eric Schmidt and Mustafa Suleyman call for an international panel on AI safety. The former Google CEO and the DeepMind/Inflection AI cofounder published their call to action in the Financial Times. Arguing that lawmakers still lack a basic understanding of AI, they write that calls to "just regulate" are "as loud, and as simplistic, as calls to simply press on." They propose an independent, expert-led body inspired by the Intergovernmental Panel on Climate Change (IPCC), which is mandated to provide policymakers with regular assessments of the scientific basis of climate change, its impacts and future risks, and options for adaptation and mitigation.

Polling the people. Anthropic this past week published the results of an experiment around what it calls constitutional AI, a method for designing AI models so they're guided by a list of high-level principles. The company polled around 1,000 American adults about what sort of principles they think would be important for an AI model to abide by and then trained a smaller version of Claude based on their suggestions. It then compared the resulting model to Claude, which was trained on a constitution written by Anthropic employees.

Overall, the results showed about a 50% overlap in concepts and values between the two constitutions. The model trained on the people's constitution focused more on objectivity, impartiality, and promoting desired behaviors rather than laying out behaviors to avoid. The people also came up with some principles that were missing from Anthropic's version, such as "Choose the response that is most understanding of, adaptable, accessible, and flexible to people with disabilities." The model created with the people's constitution was also slightly less biased than the commercially available version, though the models performed similarly overall.
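
The roughly 50% overlap figure raises the question of how you would even measure agreement between two constitutions. Here is a minimal sketch of one way to do it, assuming each constitution has first been reduced to a set of concept tags; the tags below are hypothetical examples, not Anthropic's actual method.

```python
# Rough sketch of measuring conceptual overlap between two constitutions,
# assuming each has been boiled down to a set of concept tags by a separate
# labeling step. The tags below are hypothetical, not Anthropic's real data.

def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared concepts divided by all concepts in either set."""
    return len(a & b) / len(a | b)

public_constitution = {"objectivity", "impartiality", "accessibility",
                       "promote desired behavior", "support people with disabilities"}
anthropic_constitution = {"objectivity", "impartiality", "avoid harm",
                          "avoid toxicity", "promote desired behavior"}

print(f"Concept overlap: {overlap(public_constitution, anthropic_constitution):.0%}")
```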

It's also important to take note of Anthropic's methodology. While the company said it sought a representative sample across age, gender, income, and geography, one factor noticeably missing is race. This is especially concerning given that evidence has repeatedly shown people of color are adversely affected by racial bias and accuracy issues in AI models.

How Sam Altman got it wrong on a key part of AI: Creativity has been easier for AI than people thought – Rachyl Jones

OpenAI's winning streak falters with reported failure of Arrakis project – David Meyer

Nvidia thought it found a way around U.S. export bans of AI chips to China; now Biden is closing the loophole and investors aren't happy – Christiaan Hetzner

Sick of meetings? Microsoft's new AI assistant will go in your place – Chloe Taylor

Why boomers are catching up with AI faster than Gen Zers, according to Microsoft's modern work lead – Jared Spataro

How AI can help the shipping industry cut carbon emissions – Megan Arnold

Billionaire AI investor Vinod Khosla's advice to college students: Get as broad an education as possible – Jeff John Roberts

Would you let Meta read your mind? The tech giant perhaps most synonymous with invading user privacy announced it reached an important milestone in its pursuit of using AI to visualize human thought.

Using a noninvasive neuroimaging technique called magnetoencephalography (MEG), Meta AI researchers showcased a system capable of decoding "the unfolding of visual representations in the brain with an unprecedented temporal resolution." In other words, the system can analyze a person's brain activity and then reconstruct visuals depicting what their brain is seeing and processing. While the system only reached accuracy levels of 70% in its highest-performing test cases, the researchers note in their paper that this is seven times better than existing models.

The fact that the AI announcements coming out of tech companies in a single week range from "animate text with one click" to "decode and reconstruct human thought" shows how incredibly wide-reaching and powerful this technology is. It's hard to imagine there's a corner of society and humanity it won't touch.

