LLMs cant self-correct in reasoning tasks, DeepMind study finds – TechTalks

Image generated with Bing Image Creator

This article is part of ourcoverage of the latest inAI research.

Scientists are inventing various strategies to enhance the accuracy and reasoning abilities of large language models (LLM) such as retrieval augmentation and chain-of-thought reasoning.

Among these, self-correctiona technique where an LLM refines its own responseshas gained significant traction, demonstrating efficacy across numerous applications. However, the mechanics behind its success remain elusive.

A recent study conducted by Google DeepMind in collaboration with the University of Illinois at Urbana-Champaign reveals that LLMs often falter when self-correcting their responses without external feedback. In fact, the study suggests that self-correction can sometimes impair the performance of these models, challenging the prevailing understanding of this popular technique.

Self-correction is predicated on the idea that LLMs can assess the accuracy of their outputs and refine their responses. For instance, an LLM might initially fail a math problem but correct its answer after reviewing its own output and reasoning.

Several studies have observed this process, also known as self-critique, self-refine, or self-improve.

However, the effectiveness of self-correction is not universal across all tasks. The paper from DeepMind and University of Illinois reveals that the success of self-correction is largely contingent on the nature of the task at hand. In reasoning tasks, self-correction techniques typically succeed only when they can leverage external sources, such as human feedback, an external tool like a calculator or code executor, or a knowledge base.

The researchers underscore the fact that high-quality feedback is not always accessible in many applications. This makes it crucial to understand the inherent capabilities of LLMs and to discern how much of the self-correction can be attributed to the models internal knowledge. They introduce the concept of intrinsic self-correction, which refers to a scenario where the model attempts to correct its initial responses based solely on its built-in capabilities, without any external feedback.

The researchers put self-correction to the test on several benchmarks that measure model performance in solving math word problems, answering multiple-choice questions, and tackling question-answering problems that require reasoning. They employed a three-step process for self-correction. First, they prompt the model for an answer. Next, they prompt it to review its previous response. Finally, they prompt it a third time to answer the original question based on its self-generated feedback.

Their findings reveal that self-correction works effectively when the models have access to the ground-truth labels included in the benchmark datasets. This is because the algorithm can accurately determine when to halt the reasoning process and avoid changing the answer when it is already correct. As the researchers state, These results use ground-truth labels to prevent the model from altering a correct answer to an incorrect one. However, determining how to prevent such mischanges is, in fact, the key to ensuring the success of self-correction.

However, this assumption does not reflect real-world scenarios, where access to the ground truth is not always available. If the ground truth were readily accessible, there would be no need to employ a machine learning model to predict it. The researchers demonstrate that when they remove the labels from the self-correction process, the performance of the models begins to decline significantly.

Interestingly, the models often produce the correct answer initially, but switch to an incorrect response after self-correction. For instance, in GPT-3.5-Turbo (the model used in the free version of ChatGPT), the performance dropped by almost half on the CommonSenseQA question-answering dataset when self-correction was applied. GPT-4 also exhibited a performance drop, albeit by a smaller margin.

According to the researchers, if the model is well-aligned and paired with a thoughtfully designed initial prompt, the initial response should already be optimal given the conditions of the prompt and the specific decoding algorithm. In this case, introducing feedback can be viewed as adding an additional prompt, potentially skewing the models response away from the optimal prompt. In an intrinsic self-correction setting, on the reasoning tasks, this supplementary prompt may not offer any extra advantage for answering the question. In fact, it might even bias the model away from producing an optimal response to the initial prompt, resulting in a decrease in performance, the researchers write.

Self-correction is also prevalent in multi-agent LLM applications. In these scenarios, multiple instances of an LLM, such as ChatGPT, are given different instructions to perform distinct roles in a multi-sided debate. For instance, one agent might be tasked with generating code, while another is instructed to review the code for errors.

In these applications, self-correction is implemented by instructing agents to critique each others responses. However, the researchers found that this multi-agent critique does not lead to any form of improvement through debate. Instead, it results in a form of self-consistency, where the different agents generate multiple responses and then engage in a form of majority voting to select an answer.

Rather than labeling the multi-agent debate as a form of debate or critique, it is more appropriate to perceive it as a means to achieve consistency across multiple model generations, the researchers write.

While self-correction may not enhance reasoning, the researchers found that it can be effective in tasks such as modifying the style of the LLMs output or making the response safer. They refer to these tasks as post-hoc prompting, where the prompting is applied after the responses have been generated. They write, Scenarios in which self-correction enhances model responses occur when it can provide valuable instruction or feedback that pre-hoc prompting cannot.

Another key finding of the paper is that the improvement attributed to self-correction in certain tasks may be due to an inadequately crafted initial instruction that is outperformed by a carefully constructed feedback prompt. In such cases, incorporating the feedback into the initial instruction, referred to as the pre-hoc prompt, can yield better results and reduce inference costs. The researchers state, It is meaningless to employ a well-crafted post-hoc prompt to guide the model in self-correcting a response generated through a poorly constructed pre-hoc prompt. For a fair comparison, equal effort should be invested in both pre-hoc and post-hoc prompting.

The researchers conclude by urging the community to approach the concept of self-correction with skepticism and to apply it judiciously.

It is imperative for researchers and practitioners to approach the concept of self-correction with a discerning perspective, acknowledging its potential and recognizing its boundaries, the researchers write. By doing so, we can better equip this technique to address the limitations of LLMs, steering their evolution towards enhanced accuracy and reliability.

Read more from the original source:
LLMs cant self-correct in reasoning tasks, DeepMind study finds - TechTalks

Working at DeepMind | Glassdoor [Last Updated On: September 8th, 2019] [Originally Added On: September 8th, 2019]
DeepMind Q&A Dataset - New York University [Last Updated On: October 6th, 2019] [Originally Added On: October 6th, 2019]
Google absorbs DeepMind healthcare unit 10 months after ... [Last Updated On: October 7th, 2019] [Originally Added On: October 7th, 2019]
deep mind Mathematics, Machine Learning & Computer Science [Last Updated On: November 1st, 2019] [Originally Added On: November 1st, 2019]
Health strategies of Google, Amazon, Apple, and Microsoft - Business Insider [Last Updated On: November 21st, 2019] [Originally Added On: November 21st, 2019]
To Understand The Future of AI, Study Its Past - Forbes [Last Updated On: November 21st, 2019] [Originally Added On: November 21st, 2019]
Tremor patients can be relieved of the shakes for THREE YEARS after having ultrasound waves - Herald Publicist [Last Updated On: November 21st, 2019] [Originally Added On: November 21st, 2019]
The San Francisco Gay Mens Chorus Toured the Deep South - SF Weekly [Last Updated On: November 21st, 2019] [Originally Added On: November 21st, 2019]
The Universe Speaks in Numbers: The deep relationship between math and physics - The Huntington News [Last Updated On: November 21st, 2019] [Originally Added On: November 21st, 2019]
MINI John Cooper Works GP is a two-seater hot hatch that shouts its 306 HP - SlashGear [Last Updated On: November 21st, 2019] [Originally Added On: November 21st, 2019]
How To Face An Anxiety Provoking Situation Like A Champion - Forbes [Last Updated On: November 21st, 2019] [Originally Added On: November 21st, 2019]
The Most Iconic Tech Innovations of the 2010s - PCMag [Last Updated On: November 24th, 2019] [Originally Added On: November 24th, 2019]
Why tech companies need to hire philosophers - Quartz [Last Updated On: November 24th, 2019] [Originally Added On: November 24th, 2019]
Living on Purpose: Being thankful is a state of mind - Chattanooga Times Free Press [Last Updated On: November 24th, 2019] [Originally Added On: November 24th, 2019]
EDITORIAL: West explosion victims out of sight and clearly out of mind - Waco Tribune-Herald [Last Updated On: November 24th, 2019] [Originally Added On: November 24th, 2019]
Do you need to sit still to be mindful? - The Sydney Morning Herald [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
Listen To Two Neck Deep B-Sides, Beautiful Madness And Worth It - Kerrang! [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
Worlds Last Male Northern White Rhino Brought Back To Life Using AI - International Business Times [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
Eat, drink, and be merryonly if you keep in mind these food safety tips - Williamsburg Yorktown Daily [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
The alarming trip that changed Jeremy Clarksons mind on climate change - The Week UK [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
Actionable Insights on Artificial Intelligence in Law Market with Future Growth Prospects by 2026 | AIBrain, Amazon, Anki, CloudMinds, Deepmind,... [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
Searching for the Ghost Orchids of the Everglades - Discover Magazine [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
Parkinsons tremors could be treated with SOUNDWAVES, claim scientists - Herald Publicist [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
Golden State Warriors still have prolonged success in mind - Blue Man Hoop [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
3 Gratitude Habits You Can Adopt Over The Thanksgiving Holiday For Deeper Connection And Joy - Forbes [Last Updated On: November 26th, 2019] [Originally Added On: November 26th, 2019]
The minds that built AI and the writer who adored them. - Mash Viral [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
Parkinson's Patients are Mysteriously Losing the Ability to Swim After Treatment - Discover Magazine [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
Hannah Fry, the woman making maths cool | Times2 - The Times [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
Meditate with Urmila: Find balance of body, mind and breath - Gulf News [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
We have some important food safety tips to keep in mind while cooking this Thanksgiving - WQOW TV News 18 [Last Updated On: November 28th, 2019] [Originally Added On: November 28th, 2019]
Being thankful is a state of mind | Opinion - Athens Daily Review [Last Updated On: December 2nd, 2019] [Originally Added On: December 2nd, 2019]
Can Synthetic Biology Inspire The Next Wave of AI? - SynBioBeta [Last Updated On: December 2nd, 2019] [Originally Added On: December 2nd, 2019]
LIVING ON PURPOSE: Being thankful is a state of mind - Times Tribune of Corbin [Last Updated On: December 2nd, 2019] [Originally Added On: December 2nd, 2019]
AI Hardware Summit Europe launches in Munich, Germany on 10-11 March 2020, the ecosystem event for AI hardware acceleration in Europe - Yahoo Finance [Last Updated On: December 5th, 2019] [Originally Added On: December 5th, 2019]
Of course Facebook and Google want to solve social problems. Theyre hungry for our data - The Guardian [Last Updated On: December 5th, 2019] [Originally Added On: December 5th, 2019]
Larry, Sergey, and the Mixed Legacy of Google-Turned-Alphabet - WIRED [Last Updated On: December 6th, 2019] [Originally Added On: December 6th, 2019]
AI Index 2019 assesses global AI research, investment, and impact - VentureBeat [Last Updated On: December 11th, 2019] [Originally Added On: December 11th, 2019]
For the Holidays, the Gift of Self-Care - The New York Times [Last Updated On: December 11th, 2019] [Originally Added On: December 11th, 2019]
Stopping a Mars mission from messing with the mind - Axios [Last Updated On: December 11th, 2019] [Originally Added On: December 11th, 2019]
Feldman: Impeachment articles are 'high crimes' Founders had in mind | TheHill - The Hill [Last Updated On: December 11th, 2019] [Originally Added On: December 11th, 2019]
Opinion | Frankenstein monsters will not be taking our jobs anytime soon - Livemint [Last Updated On: December 11th, 2019] [Originally Added On: December 11th, 2019]
DeepMind co-founder moves to Google as the AI lab positions itself for the future - The Verge [Last Updated On: December 11th, 2019] [Originally Added On: December 11th, 2019]
Google Isn't Looking To Revolutionize Health Care, It Just Wants To Improve On The Status Quo - Newsweek [Last Updated On: December 12th, 2019] [Originally Added On: December 12th, 2019]
Artificial Intelligence Job Demand Could Live Up to Hype - Dice Insights [Last Updated On: December 12th, 2019] [Originally Added On: December 12th, 2019]
What Are Normalising Flows And Why Should We Care - Analytics India Magazine [Last Updated On: December 15th, 2019] [Originally Added On: December 15th, 2019]
Terence Crawford has next foe in mind after impressive knockout win - New York Post [Last Updated On: December 15th, 2019] [Originally Added On: December 15th, 2019]
DeepMind proposes novel way to train safe reinforcement learning AI - VentureBeat [Last Updated On: December 15th, 2019] [Originally Added On: December 15th, 2019]
Winning the War Against Thinking - So you've emptied your brain. Now what? - Chabad.org [Last Updated On: December 16th, 2019] [Originally Added On: December 16th, 2019]
'Echo Chamber' as Author of the 'Hive Mind' - Ricochet.com [Last Updated On: December 16th, 2019] [Originally Added On: December 16th, 2019]
Lindsey Graham: 'I Have Made Up My Mind' to Exonerate Trump and 'Don't Need Any Witnesses' WATCH - Towleroad [Last Updated On: December 16th, 2019] [Originally Added On: December 16th, 2019]
Blockchain in Healthcare Market to 2027 By Top Leading Players: iSolve LLC, Healthcoin, Deepmind Health, IBM Corporation, Microsoft Corporation,... [Last Updated On: December 16th, 2019] [Originally Added On: December 16th, 2019]
In sight but out of mind - The Hindu [Last Updated On: December 16th, 2019] [Originally Added On: December 16th, 2019]
The Case for Limitlessness Has Its Limits: Review of Limitless Mind by Joe Boaler - Education Next - EducationNext [Last Updated On: December 16th, 2019] [Originally Added On: December 16th, 2019]
The Top 10 Diners In Deep East Texas, According To Yelp - ksfa860.com [Last Updated On: December 16th, 2019] [Originally Added On: December 16th, 2019]
3 breathing exercises to reduce stress, anxiety and a racing mind - Irish Examiner [Last Updated On: December 16th, 2019] [Originally Added On: December 16th, 2019]
DeepMind exec Andrew Eland leaves to launch startup - Sifted [Last Updated On: December 16th, 2019] [Originally Added On: December 16th, 2019]
The Top 10 Diners In Deep East Texas, According To Yelp - kicks105.com [Last Updated On: December 17th, 2019] [Originally Added On: December 17th, 2019]
Mind the Performance Gap New Future Purchasing Category Management Report Out Now - Spend Matters [Last Updated On: December 17th, 2019] [Originally Added On: December 17th, 2019]
Madison singles and deep cuts that stood out in 2019 - tonemadison.com [Last Updated On: December 19th, 2019] [Originally Added On: December 19th, 2019]
Hilde Lee: Latkes bring an ancient miracle to mind on first night of Hanukkah - The Daily Progress [Last Updated On: December 19th, 2019] [Originally Added On: December 19th, 2019]
Political Cornflakes: Trump responds to impeachment with complaints about the 'deep state' and toilet flushing - Salt Lake Tribune [Last Updated On: December 19th, 2019] [Originally Added On: December 19th, 2019]
Google CEO Sundar Pichai Is the Most Expensive Tech CEO to Keep Around - Observer [Last Updated On: December 24th, 2019] [Originally Added On: December 24th, 2019]
Christmas Lectures presenter Dr Hannah Fry on pigeons, AI and the awesome power of maths - inews [Last Updated On: December 24th, 2019] [Originally Added On: December 24th, 2019]
The ultimate guitar tuning guide: expand your mind with these advanced tuning techniques - Guitar World [Last Updated On: December 24th, 2019] [Originally Added On: December 24th, 2019]
Inside The Political Mind Of Jerry Brown - Radio Ink [Last Updated On: December 24th, 2019] [Originally Added On: December 24th, 2019]
Elon Musk Fact-Checked His Own Wikipedia Page and Requested Edits Including the Fact He Does 'Zero Investing' - Entrepreneur [Last Updated On: December 24th, 2019] [Originally Added On: December 24th, 2019]
The 9 Best Blobs of 2019 - Livescience.com [Last Updated On: December 24th, 2019] [Originally Added On: December 24th, 2019]
AI from Google is helping identify animals deep in the rainforest - Euronews [Last Updated On: December 24th, 2019] [Originally Added On: December 24th, 2019]
Want to dive into the lucrative world of deep learning? Take this $29 class. - Mashable [Last Updated On: December 24th, 2019] [Originally Added On: December 24th, 2019]
Re: Your Account Is Overdrawn - Thrive Global [Last Updated On: December 27th, 2019] [Originally Added On: December 27th, 2019]
Review: In the Vale is full of characters who linger long in the mind - Nation.Cymru [Last Updated On: December 27th, 2019] [Originally Added On: December 27th, 2019]
10 Gifts That Cater to Your Loved One's Basic Senses - Wide Open Country [Last Updated On: December 27th, 2019] [Originally Added On: December 27th, 2019]
The Most Mind-Boggling Scientific Discoveries Of 2019 Include The First Image Of A Black Hole, A Giant Squid Sighting, And An Exoplanet With Water... [Last Updated On: December 27th, 2019] [Originally Added On: December 27th, 2019]
DeepMind's new AI can spot breast cancer just as well as your doctor - Wired.co.uk [Last Updated On: January 1st, 2020] [Originally Added On: January 1st, 2020]
Why the algorithms assisting medics is good for health services (Includes interview) - Digital Journal [Last Updated On: January 4th, 2020] [Originally Added On: January 4th, 2020]
2020: The Rise of AI in the Enterprise - IT World Canada [Last Updated On: January 4th, 2020] [Originally Added On: January 4th, 2020]
An instant 2nd opinion: Google's DeepMind AI bests doctors at breast cancer screening - FierceBiotech [Last Updated On: January 4th, 2020] [Originally Added On: January 4th, 2020]
Google's DeepMind AI outperforms doctors in identifying breast cancer from X-ray images - Business Insider UK [Last Updated On: January 4th, 2020] [Originally Added On: January 4th, 2020]
New AI toolkit from the World Economic Forum is promising because it's free - The National [Last Updated On: January 20th, 2020] [Originally Added On: January 20th, 2020]
AKA Wants to Help People Break Bad Habits and Create New Positive Ones - Hospitality Net [Last Updated On: January 20th, 2020] [Originally Added On: January 20th, 2020]

Cloud Hosting

LLMs cant self-correct in reasoning tasks, DeepMind study finds – TechTalks

Recent Posts

Categories

Archives

Media Sites

Pages

Site admin