Skip to main content

Microsoft introduces AI model that can understand image content, pass IQ tests

posted onMarch 2, 2023
by l33tdawg
Arstechnica
Credit: Arstechnica

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ tests, and understand natural language instructions. The researchers believe multimodal AI—which integrates different modes of input such as text, audio, images, and video—is a key step to building artificial general intelligence (AGI) that can perform general tasks at the level of a human.

"Being a basic part of intelligence, multimodal perception is a necessity to achieve artificial general intelligence, in terms of knowledge acquisition and grounding to the real world," the researchers write in their academic paper, "Language Is Not All You Need: Aligning Perception with Language Models."

Visual examples from the Kosmos-1 paper show the model analyzing images and answering questions about them, reading text from an image, writing captions for images, and taking a visual IQ test with 22–26 percent accuracy (more on that below).
 

Source

Tags

Industry News

You May Also Like

Recent News

Friday, November 29th

Tuesday, November 19th

Friday, November 8th

Friday, November 1st

Tuesday, July 9th

Wednesday, July 3rd

Friday, June 28th

Thursday, June 27th

Thursday, June 13th

Wednesday, June 12th

Tuesday, June 11th