
Japan Enhances AI Sovereignty With Advanced ABCI 3.0 Supercomputer – NVIDIA Blog

Enhancing Japan's AI sovereignty and strengthening its research and development capabilities, Japan's National Institute of Advanced Industrial Science and Technology (AIST) will integrate thousands of NVIDIA H200 Tensor Core GPUs into its AI Bridging Cloud Infrastructure 3.0 supercomputer (ABCI 3.0). The HPE Cray XD system will feature NVIDIA Quantum-2 InfiniBand networking for superior performance and scalability.

ABCI 3.0 is the latest iteration of Japan's large-scale Open AI Computing Infrastructure designed to advance AI R&D. This collaboration underlines Japan's commitment to advancing its AI capabilities and fortifying its technological independence.

"In August 2018, we launched ABCI, the world's first large-scale open AI computing infrastructure," said AIST Executive Officer Yoshio Tanaka. "Building on our experience over the past several years managing ABCI, we're now upgrading to ABCI 3.0. In collaboration with NVIDIA and HPE, we aim to develop ABCI 3.0 into a computing infrastructure that will advance further research and development capabilities for generative AI in Japan."

"As generative AI prepares to catalyze global change, it's crucial to rapidly cultivate research and development capabilities within Japan," said AIST Solutions Co. Producer and Head of ABCI Operations Hirotaka Ogawa. "I'm confident that this major upgrade of ABCI in our collaboration with NVIDIA and HPE will enhance ABCI's leadership in domestic industry and academia, propelling Japan towards global competitiveness in AI development and serving as the bedrock for future innovation."

ABCI 3.0 is constructed and operated by AIST, its business subsidiary, AIST Solutions, and its system integrator, Hewlett Packard Enterprise (HPE).

The ABCI 3.0 project follows support from Japan's Ministry of Economy, Trade and Industry, known as METI, for strengthening its computing resources through the Economic Security Fund and is part of a broader $1 billion initiative by METI that includes both ABCI efforts and investments in cloud AI computing.

NVIDIA is closely collaborating with METI on research and education following a visit last year by company founder and CEO, Jensen Huang, who met with political and business leaders, including Japanese Prime Minister Fumio Kishida, to discuss the future of AI.

Huang pledged to collaborate on research, particularly in generative AI, robotics and quantum computing, to invest in AI startups and provide product support, training and education on AI.

During his visit, Huang emphasized that AI factories (next-generation data centers designed to handle the most computationally intensive AI tasks) are crucial for turning vast amounts of data into intelligence.

"The AI factory will become the bedrock of modern economies across the world," Huang said during a meeting with the Japanese press in December.

With its ultra-high-density data center and energy-efficient design, ABCI provides a robust infrastructure for developing AI and big data applications.

The system is expected to come online by the end of this year and offer state-of-the-art AI research and development resources. It will be housed in Kashiwa, near Tokyo.

The facility will offer:

NVIDIA technology forms the backbone of this initiative, with hundreds of nodes each equipped with 8 NVLink-connected H200 GPUs providing unprecedented computational performance and efficiency.

NVIDIA H200 is the first GPU to offer over 140 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s). The H200's larger and faster memory accelerates generative AI and LLMs, while advancing scientific computing for HPC workloads with better energy efficiency and lower total cost of ownership.

The integration of advanced NVIDIA Quantum-2 InfiniBand with In-Network computing (where networking devices perform computations on data, offloading the work from the CPU) ensures efficient, high-speed, low-latency communication, crucial for handling intensive AI workloads and vast datasets.

ABCI boasts world-class computing and data processing power, serving as a platform to accelerate joint AI R&D with industries, academia and governments.

METI's substantial investment is a testament to Japan's strategic vision to enhance AI development capabilities and accelerate the use of generative AI.

By subsidizing AI supercomputer development, Japan aims to reduce the time and costs of developing next-generation AI technologies, positioning itself as a leader in the global AI landscape.


Google says Gemini AI is making its robots smarter – The Verge

Google is training its robots with Gemini AI so they can get better at navigation and completing tasks. The DeepMind robotics team explained in a new research paper how using Gemini 1.5 Pro's long context window (which dictates how much information an AI model can process) allows users to more easily interact with its RT-2 robots using natural language instructions.

This works by filming a video tour of a designated area, such as a home or office space, with researchers using Gemini 1.5 Pro to make the robot watch the video to learn about the environment. The robot can then undertake commands based on what it has observed using verbal and/or image outputs, such as guiding users to a power outlet after being shown a phone and asked "where can I charge this?" DeepMind says its Gemini-powered robot had a 90 percent success rate across over 50 user instructions that were given in a 9,000-plus-square-foot operating area.

Researchers also found preliminary evidence that Gemini 1.5 Pro enabled its droids to plan how to fulfill instructions beyond just navigation. For example, when a user with lots of Coke cans on their desk asks the droid if their favorite drink is available, the team said Gemini knows that the robot should navigate to the fridge, inspect if there are Cokes, and then return to the user to report the result. DeepMind says it plans to investigate these results further.

The video demonstrations provided by Google are impressive, though the obvious cuts after the droid acknowledges each request hide that it takes between 10 and 30 seconds to process these instructions, according to the research paper. It may take some time before we're sharing our homes with more advanced environment-mapping robots, but at least these ones might be able to find our missing keys or wallets.


Exclusive: Intuit is laying off 1,800 employees as AI leads to a strategic shift – Fortune

Intuit will tell approximately 1,800 of its global employees, 10% of its workforce, that they will be leaving the company. But leadership says the move isn't to cut costs.

Sasan Goodarzi, CEO of the Fortune 500 company, which offers products like QuickBooks, Credit Karma, and TurboTax, wrote an internal email to employees, seen by Fortune, announcing the "very difficult decisions my leadership team and I have made."

Goodarzi explains that Intuit's transformation journey, including the departure of the 1,800 employees, is part of its strategy to increase investments in priority focus areas of AI and generative AI, such as its GenAI-powered financial assistant called Intuit Assist, and reimagining its products from traditional workflows to AI-native experiences. The strategy also focuses on money movement, mid-market expansion for small businesses, and international growth.

"We do not do layoffs to cut costs, and that remains true in this case," Goodarzi writes. Intuit plans to hire approximately 1,800 new people with strategic functional skill sets (primarily in engineering, product, and customer-facing roles such as sales, customer success, and marketing) and expects its overall headcount to grow in its fiscal year 2025, which begins Aug. 1.

Of the employees who will depart Intuit, 1,050 are not meeting expectations based on a formal performance management process. The company believes they will be more successful outside of Intuit, Goodarzi writes. In addition, Intuit is reducing the number of executives (directors, SVPs, and EVPs) by approximately 10%, expanding certain executive roles and responsibilities.

Intuit is also consolidating 80 tech roles to sites where it is growing technology teams, including Atlanta, Bangalore, New York, Tel Aviv, and Toronto. The company is closing two sites in Edmonton and Boise that have over 250 employees, with a certain number of employees relocating to other sites within Intuit or leaving the company. Intuit is also eliminating more than 300 roles across the company to streamline work and reallocate resources toward key growth areas, according to the email.

All departing U.S. employees will receive a package that includes a minimum of 16 weeks of pay, plus two additional weeks for every year of service. They will have 60 days before they leave the company, with a last day of Sept. 9. Employees outside the U.S. will receive similar support, taking into account local requirements.

"This timing allows everyone leaving to reach their July vesting date for restricted stock units and the July 31 eligibility date for annual IPI bonuses," Goodarzi writes. Those not on an IPI plan will be able to reach the eligibility date for July or Q4 incentives. It's the most generous severance package Intuit has ever offered, according to the company.
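
As a rough illustration of the U.S. severance terms reported above (a minimum of 16 weeks of pay, plus two additional weeks for every year of service), the arithmetic can be sketched in Python. The function name and the assumption that only whole completed years count are illustrative and are not taken from Intuit's email.

```python
def severance_weeks(years_of_service: int,
                    base_weeks: int = 16,
                    weeks_per_year: int = 2) -> int:
    """Weeks of severance pay under the reported terms.

    Assumption (not from the article): only whole, completed
    years of service are counted.
    """
    if years_of_service < 0:
        raise ValueError("years_of_service must be non-negative")
    return base_weeks + weeks_per_year * years_of_service


# A hypothetical employee with 5 years of service:
print(severance_weeks(5))  # 16 + 2 * 5 = 26
```

Under these assumptions, a new hire would receive the 16-week minimum and each additional year adds two weeks.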

Intuit is in a position of strength, according to Goodarzi. The company earned $14.4 billion in revenue in its fiscal year 2023, moving up 24 spots on the Fortune 500. For the period ending April 30, Intuit reported revenue of $6.7 billion, up 12%.


Investors are underestimating the AI capabilities of this big-cap tech stock, analysts say – CNBC

Investors are underappreciating Amazon's artificial intelligence potential, according to some analysts.

"While consensus believes that AMZN lags behind its mega-cap peers in AI capabilities, AMZN highlighted that it has launched over two times the number of AI services than MSFT and GOOGL combined since 2023," wrote BMO Capital Markets' Brian Pitz. "We believe investors are underestimating AMZN's positioning to win its fair share of AI workloads and believe its platform approach with Bedrock will be a winning strategy over the long term." The analyst has an outperform rating on Amazon and a price target of $220, which signals 10% upside from Wednesday's close.

These tools and initiatives should support ongoing and accelerating revenue growth through 2024 and more than 20% growth in 2025, according to JPMorgan analyst Doug Anmuth. He called the stock a best idea, viewing AWS as offering a "best-in-class" array of large language models and foundation models tailored to developer needs. Anmuth has an overweight rating on the stock and a $240 price target, which implies upside of 20%.

The commentary comes on the heels of the e-commerce giant's Amazon Web Services summit in New York, where the company highlighted the ways customers are implementing AI workloads and showcased its rapidly growing AI platform known as Bedrock. Amazon shares have rallied about 30% year to date.

Citi's Ronald Josey also reaffirmed his confidence in AWS estimates for the second quarter and onward following the event. The analyst also believes demand is gradually moving toward companies offering multiple specialized models. He retained a $245 price target on shares and called the stock a top Internet pick. The price target implies 23% upside from Wednesday's close.
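
The relationship between a price target and its quoted upside can be checked with a short calculation. The sketch below, in Python, backs out the closing price implied by each analyst's figures, assuming upside is defined as (target - close) / close and that all three percentages are measured against the same close; the function name is illustrative.

```python
def implied_close(price_target: float, upside_pct: float) -> float:
    """Closing price implied by a price target and its quoted upside.

    Assumes upside_pct = (price_target - close) / close * 100.
    """
    return price_target / (1 + upside_pct / 100)


# (target in dollars, quoted upside in percent), as reported in the article
targets = {
    "BMO Capital Markets": (220, 10),
    "JPMorgan": (240, 20),
    "Citi": (245, 23),
}

for firm, (target, upside) in targets.items():
    print(f"{firm}: implied close of about ${implied_close(target, upside):.2f}")
```

Under that definition, all three targets imply a close of roughly $199 to $200, which suggests the quoted (and presumably rounded) percentages are mutually consistent.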


Intuit's AI gamble: Mass layoff of 1,800 paired with hiring spree – Ars Technica

On Wednesday, Intuit CEO Sasan Goodarzi announced in a letter to the company that it would be laying off 1,800 employees (about 10 percent of its workforce of around 18,000) while simultaneously planning to hire the same number of new workers as part of a major restructuring effort purportedly focused on AI.

"As I've shared many times, the era of AI is one of the most significant technology shifts of our lifetime," wrote Goodarzi in a blog post on Intuit's website. "This is truly an extraordinary time: AI is igniting global innovation at an incredible pace, transforming every industry and company in ways that were unimaginable just a few years ago. Companies that aren't prepared to take advantage of this AI revolution will fall behind and, over time, will no longer exist."

The CEO says Intuit is in a position of strength and that the layoffs are not cost-cutting related, but they allow the company to "allocate additional investments to our most critical areas to support our customers and drive growth." With new hires, the company expects its overall headcount to grow in its 2025 fiscal year.

Intuit's layoffs (which collectively qualify as a "mass layoff" under the WARN Act) hit various departments within the company, including closing Intuit's offices in Edmonton, Canada, and Boise, Idaho, affecting over 250 employees. Approximately 1,050 employees will be laid off because they're "not meeting expectations," according to Goodarzi's letter. Intuit has also eliminated more than 300 roles across the company to "streamline" operations and shift resources toward AI, and the company plans to consolidate 80 tech roles to "sites where we are strategically growing our technology teams and capabilities," such as Atlanta, Bangalore, New York, Tel Aviv, and Toronto.

In turn, the company plans to accelerate investments in its AI-powered financial assistant, Intuit Assist, which provides AI-generated financial recommendations. The company also plans to hire new talent in engineering, product development, data science, and customer-facing roles, with a particular emphasis on AI expertise.

Despite Goodarzi's heavily AI-focused message, the restructuring at Intuit reveals a more complex picture. A closer look at the layoffs shows that many of the 1,800 job cuts stem from performance-based departures (such as the aforementioned 1,050). The restructuring also includes a 10 percent reduction in executive positions at the director level and above ("To continue increasing our velocity of decision making," Goodarzi says).

These numbers suggest that the reorganization may also serve as an opportunity for Intuit to trim its workforce of underperforming staff, using the AI hype cycle as a compelling backdrop for a broader house-cleaning effort.

But as far as CEOs are concerned, it's always a good time to talk about how they're embracing the latest, hottest thing in technology: "With the introduction of GenAI," Goodarzi wrote, "we are now delivering even more compelling customer experiences, increasing monetization potential, and driving efficiencies in how the work gets done within Intuit. But it's just the beginning of the AI revolution."


Alma co-founder had such a bad immigration experience she founded a legal AI startup to fix it – TechCrunch

When Aizada Marat moved from New York to California in 2018 with her husband, KODIF co-founder and CEO Chyngyz Dzhumanazarov, she needed to sort out her immigration status. That's when everything started going badly.

The Kyrgyzstan-born, Harvard-educated attorney came to the U.S. when she was 17 for an exchange year with FLEX (Future Leaders Exchange), sponsored by the U.S. State Department.

After graduating from Harvard, Marat moved to London because of immigration issues. Now she was coming out to California with Dzhumanazarov, who had been admitted to Stanford Business School, and to take a job offer at leading law firm Cooley.

But she didn't realize that immigration lawyers can be very buyer-beware. Through a Google search she found a lawyer in Palo Alto to help her with her visa. That turned out to be a bad move. Marat said the lawyer gave her wrong advice about when she could file authorization to work in California. That mistake caused her to not be able to work for more than a year. She also could not leave the country.

"I'm a lawyer, so I listen to what lawyers say," Marat told TechCrunch. "Unfortunately, listening to them was devastating because months later, I was still unable to work. I had a job offer from Cooley."

Marat did end up getting to work at Cooley for three years. And she went back to that immigration law firm and showed them the mistake they made with her. It also ignited an entrepreneurial fire in her.

After she left Cooley to work at McKinsey as a management consultant, Marat kept coming back to that horrible immigration experience. So much so that she started thinking about why immigration legal services were of poorer quality given the long and complicated immigration process.

She learned that immigration law is super fragmented, meaning that 10% of the market is owned by one law firm while the other 90% is shared among over 20,000 law firms.

"Very few big law firms have immigration services today because it is mainly serving individuals, and those are small checks," Marat said. "That's why, to get a talent visa green card, the majority of the time, people can self-petition. They don't even need an employer. Cooley, in my case, wouldn't really sponsor visas, so I had to sort it out myself."

And when she thought about what to do about it, Marat set out to start her own company developing software to sell to immigration attorneys. The goal was to help them deliver better services, so what happened to Marat wouldn't happen again.

After four or five months of selling that software to five immigration law firms, Marat and her team made the decision to provide immigration services directly. In October 2023, they launched Alma, an AI-powered legal tech startup that she started with other immigrants, including former Uber engineering manager Shuo Chen and former Step product manager Assel Tuleubayeva.

The startup aims to simplify the visa process for technologists, founders and researchers by providing personal legal advisors, helping to speed up document processing and digitally organizing the entire process. And like other companies working in this area, including Migrun, Boundless and Lawfully, Alma wants to fast-track international talent into America's tech ecosystem, Marat said.

Marat says Alma differs from some competitors by leveraging proprietary technology to provide high-quality services faster and by employing its own immigration attorney.

"Immigrants deserve high-quality services because so much depends on the immigration attorney that you find," Marat said. "All the repetitive and mundane things that lawyers hate, we can automate so that lawyers actually focus on all clients and provide a really good strategy to get higher approval rates."

Helping to move the company forward is $5.1 million in combined seed and pre-seed funding that Alma recently raised. The company is backed by Bling Capital, Forerunner, Village Global, NFX, Conviction, MVP, NEA and Silkroad Innovation Hub. Much of the funding will go toward new hires for product and technology development.


China’s homegrown OS fires back at AI PCs: openKylin gets AI assistant, text-to-image generation, and local LLM support – Tom’s Hardware

'AI' and 'AI PC' are, of course, two big buzzwords these days and not only in the U.S. and Europe, but also in China, where openKylin just released what it's calling an operating system for AI PCs. OpenKylin is an open source OS based on Linux and maintained by the OpenKylin community, which is backed by a number of Chinese companies including Hygon and Phytium. Clearly, Chinese PC makers are interested in getting in on the AI PC craze but Windows remains China's most-used OS.

The new version of openKylin launched Sunday, and is "deeply" integrated with AI, featuring support for on-device large language models (LLMs), an AI-assistant, and text-to-image generation, according to a report by the South China Morning Post.

OpenKylin wants to get in on the AI PC trend, which has been driven by hope that AI applications will reinvigorate demand for PCs. The company says that AI integration is designed to boost "productivity and user experience" for those using domestic operating systems. AI PCs are generally equipped with advanced processors capable of running generative AI tasks locally instead of having to rely on cloud processing. It's unclear how the OS is supposed to accelerate AI workloads, and also whether it can take advantage of a CPU's built-in AI capabilities.

Also, while Microsoft's Copilot+-badged PCs require neural processing units capable of handling at least 40 trillion operations per second (TOPS), it's not clear whether openKylin has any performance requirements for CPUs or NPUs.

PC maker Lenovo sees China as a unique market for AI PCs due to its data-localization requirements, so openKylin might be correct in assuming it can get big in China with AI capabilities. On Monday, China's state-run newspaper, Science and Technology Daily, described the AI-enhanced version of openKylin as "secure, stable, and controllable." This is because, unlike Windows, openKylin is developed entirely in China by 3,876 developers from 271 different companies.

OpenKylin resembles Microsoft Windows, and is part of China's effort to decrease its dependence on foreign operating systems (and American technology in general). This initiative has been ongoing for a while, and was given an extra boost by the ongoing trade war between the U.S. and China.

Despite these efforts, home-grown operating systems have struggled to gain significant traction and Windows remains the dominant operating system in China with nearly 80% of the market as of June 2024, according to StatCounter.


AI-powered Regard nabs $61M to find missed illness, boost hospital revenue – TechCrunch

People in tech often say that "data is the new oil." That phrase, coined by British mathematician Clive Humby, of course implies that data is valuable.

Data about a person's health can also provide meaningful insights and improve outcomes, but only 3% of patient data is currently used by physicians, according to the World Economic Forum. Although doctors know they can glean useful information from patient data, they don't have the time to review every detail in the medical record.

Regard, a digital health startup founded in 2017, wants to help physicians save time and increase the accuracy of diagnosis by analyzing patients' health data using AI.

Regard announced on Thursday that it raised a $61 million Series B round led by Oak HC/FT, with participation from Cedars-Sinai Health Ventures and existing investors TenOneTen, Calibrate Ventures and Techstars. The company is now valued at $350 million, according to a person familiar with the matter.

The company's software mines thousands of data points in a medical chart and presents data in a way that allows doctors to detect health conditions more easily.

"Doctors use our product because we help them make sure that nothing important is missed in the data," Eli Ben-Joseph, Regard's co-founder and CEO, told TechCrunch. "Every single doctor we work with has a story about: 'I used your product and I found something that changed the way I treat this patient.'"

Ben-Joseph said that Regard has helped one general physician catch atrial fibrillation (irregular heartbeat) that the cardiologist did not notice. "She now feels it's irresponsible not to use our product," he said.

But doctors are not the only ones who find Regard valuable. Hospital financial administrators are also big fans, according to Ben-Joseph. Regard's ability to identify new conditions creates new billing opportunities for medical systems.

The company grew its revenue by 4.5 times in 2023 and is on a path to do a similar amount of growth this year, Ben-Joseph said. The company expects to reach profitability within the next two years.

Such fast growth has investors excited.

"We absolutely fell in love with [Regard] because it has a direct impact on physician productivity, burnout, proper coding and clinical outcomes," said Nancy Brown, general partner at Oak HC/FT.

Brown, who has over 30 years of experience as an operator and investor in healthcare technology, has always dreamt that a computer would provide insights from patient information. "That dream has been foiled [over the years] by the lack of tech," she said. That's why when she met Ben-Joseph at a healthcare conference earlier this year, she instantly recognized that Regard is the technology she has been dreaming about.

Since launching its product in 2021, Regard has signed up a number of large healthcare systems, including Banner Health (one of Arizona's largest health providers), Virginia-based Sentara Healthcare, New York's Montefiore Medical Center and Cedars-Sinai Medical Center in Los Angeles. Some details of its latest funding round were previously reported by Business Insider.

The company's competitors include Engage One, a product developed by multinational conglomerate 3M, and startup Pieces, according to Ben-Joseph.

Brown has no doubts that Regard is the leader in the space. "They are a beautiful scaling company with great margins, and they are delivering a solid ROI for their clients," Brown said.


OpenAI and Arianna Huffington are working together on an AI health coach – The Verge

AI leaders are increasingly optimistic about the technology's potential in the health sector, especially when it comes to personalized bots that can comprehend and address individual health concerns.

OpenAI and Arianna Huffington are now jointly funding the development of an AI health coach through Thrive AI Health. In a Time magazine op-ed, OpenAI CEO Sam Altman and Huffington stated that the bot will be trained on "the best peer-reviewed science" alongside "the personal biometric, lab, and other medical data you've chosen to share with it."

The company tapped DeCarlos Love, a former Google executive who previously worked on Fitbit and other wearables, to be CEO. Thrive AI Health also established research partnerships with several academic institutions and medical centers like Stanford Medicine, the Rockefeller Neuroscience Institute at West Virginia University, and the Alice L. Walton School of Medicine. (The Alice L. Walton Foundation is also a strategic investor in Thrive AI Health.)

AI-powered health coaches have become a popular fad: Fitbit is working on an AI chatbot coach, and Whoop added a ChatGPT-powered coach to give users more insight into their health metrics. In San Francisco, health data obsession is a staple. You won't go far without seeing someone wearing an Oura Ring or bragging about their sleep data from their Eight Sleep mattress.

Thrive AI Health's goal is to provide powerful insights to those who otherwise wouldn't have access, like a single mother looking for quick meal ideas for her gluten-free child or an immunocompromised person in need of instant advice in between doctor's appointments. Personally, I'd use it to ask about every unusual headache, rather than relying on WebMD's often alarming diagnoses.

But one doesn't have to think hard to come up with reasons to be cautious: sharing your health data with anyone other than a primary care doctor could result in a leak of that information. Then there's the potential for the bot to provide dangerous or even fatal misinformation, as well as the risk that quality care could be reduced to quick and flawed responses without human oversight.

The bot is still in its early stages, adopting an Atomic Habits approach. Its goal is to gently encourage small changes in five key areas of your life: sleep, nutrition, fitness, stress management, and social connection. By making minor adjustments, such as suggesting a 10-minute walk after picking up your child from school, Thrive AI Health aims to positively impact people with chronic conditions like heart disease. It doesn't claim to be ready to provide real diagnosis like a doctor would but instead aims to guide users into a healthier lifestyle.

"AI is already greatly accelerating the rate of scientific progress in medicine, offering breakthroughs in drug development, diagnoses, and increasing the rate of scientific progress around diseases like cancer," the op-ed read.

Advancing the medical system with AI could be tremendously beneficial for society, provided it actually works. While a bot that tells you to get more sleep isn't exactly on par with AI miracle cures, there has been some promising AI progress in the health sector, such as a study suggesting that a radiologist supported by a specialized AI tool can detect breast cancer from mammogram images as accurately as two radiologists. There are also AI-designed drugs currently in clinical trials, like one to treat fibrosis, and a team of MIT researchers used AI in 2020 to discover an antibiotic capable of killing E. coli.

For Altman and Huffington, the challenge will be building trust for a product that handles some of your most private information while navigating the limits of AI's power.


Can you do better than top-level AI models on these basic vision tests? – Ars Technica


In the provocatively titled pre-print paper "Vision language models are blind" (which has a PDF version that includes a dark sunglasses emoji in the title), researchers from Auburn University and the University of Alberta create eight simple visual acuity tests with objectively correct answers. These range from identifying how often two colored lines intersect to identifying which letter in a long word has been circled to counting how many nested shapes exist in an image (representative examples and results can be viewed on the research team's webpage).

If you can solve these kinds of puzzles, you may have better visual reasoning than state-of-the-art AIs.

The puzzles on the right are like something out of Highlights magazine.

A representative sample shows AI models failing at a task that most human children would find trivial.

Crucially, these tests are generated by custom code and don't rely on pre-existing images or tests that could be found on the public Internet, thereby "minimiz[ing] the chance that VLMs can solve by memorization," according to the researchers. The tests also "require minimal to zero world knowledge" beyond basic 2D shapes, making it difficult for the answer to be inferred from "textual question and choices alone" (which has been identified as an issue for some other visual AI benchmarks).
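
To make the idea of code-generated tests concrete, here is a minimal sketch in Python of how a task such as counting the rows and columns of a blank grid could be produced programmatically, with the correct answer known by construction. This is not the researchers' code; the function names, the SVG output format, and the size range are all assumptions made for illustration.

```python
import random


def make_grid_svg(rows: int, cols: int, cell: int = 40) -> str:
    """Return an SVG string that draws a blank rows x cols grid."""
    width, height = cols * cell, rows * cell
    lines = []
    # A grid with `rows` rows needs rows + 1 horizontal lines.
    for r in range(rows + 1):
        y = r * cell
        lines.append(
            f'<line x1="0" y1="{y}" x2="{width}" y2="{y}" stroke="black"/>'
        )
    # Likewise, `cols` columns need cols + 1 vertical lines.
    for c in range(cols + 1):
        x = c * cell
        lines.append(
            f'<line x1="{x}" y1="0" x2="{x}" y2="{height}" stroke="black"/>'
        )
    body = "\n".join(lines)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">\n{body}\n</svg>'
    )


def make_grid_test(rng: random.Random, min_size: int = 3, max_size: int = 9) -> dict:
    """Build one test item: an image, a question, and the ground truth."""
    rows = rng.randint(min_size, max_size)
    cols = rng.randint(min_size, max_size)
    return {
        "image_svg": make_grid_svg(rows, cols),
        "question": "How many rows and columns are in this grid?",
        "answer": {"rows": rows, "cols": cols},
    }


# Generate a small batch of fresh, never-before-published test items.
rng = random.Random(42)
batch = [make_grid_test(rng) for _ in range(5)]
```

Because each image is drawn from random parameters at generation time, it cannot have been memorized from the public Internet, and a model's answer can be scored automatically against the stored ground truth.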

After running multiple tests across four different visual models (GPT-4o, Gemini-1.5 Pro, Sonnet-3, and Sonnet-3.5), the researchers found all four fell well short of the 100 percent accuracy you might expect for such simple visual analysis tasks (and which most sighted humans would have little trouble achieving). But the size of the AI underperformance varied greatly depending on the specific task. When asked to count the number of rows and columns in a blank grid, for instance, the best-performing model only gave an accurate answer less than 60 percent of the time. On the other hand, Gemini-1.5 Pro hit nearly 93 percent accuracy in identifying circled letters, approaching human-level performance.

For some reason, the models tend to incorrectly guess the "o" is circled a lot more often than all the other letters in this test.

The models performed perfectly in counting five interlocking circles, a pattern they might be familiar with from common images of the Olympic rings.

Do you have an easier time counting columns than rows in a grid? If so, you probably aren't an AI.

Even small changes to the tasks could also lead to huge changes in results. While all four tested models were able to correctly identify five overlapping hollow circles, the accuracy across all models dropped to well below 50 percent when six to nine circles were involved. The researchers hypothesize that this "suggests that VLMs are biased towards the well-known Olympic logo, which has 5 circles." In other cases, models occasionally hallucinated nonsensical answers, such as guessing "9," "n", or "" as the circled letter in the word "Subdermatoglyphic."

These gaps in VLM capabilities could come down to the inability of these systems to generalize beyond the kinds of content they are explicitly trained on. Yet when the researchers tried fine-tuning a model using specific images drawn from one of their tasks (the "are two circles touching?" test), that model showed only modest improvement, from 17 percent accuracy up to around 37 percent. "The loss values for all these experiments were very close to zero, indicating that the model overfits the training set but fails to generalize," the researchers write.

The researchers propose that the VLM capability gap may be related to the so-called "late fusion" of vision encoders onto pre-trained large language models. An "early fusion" training approach that integrates visual encoding alongside language training could lead to better results on these low-level tasks, the researchers suggest (without providing any sort of analysis of this question).
