May 28, 2024

xAI presented Grok-1.5V, its first multimodal model.

Elon Musk’s company xAI presented a new model of the chatbot Grok, capable of processing requests in various formats.

The presentation took place a few weeks after the release of the previous version.

“Grok-1.5V competes with existing multimodal models in several areas: from interdisciplinary reasoning to understanding scientific diagrams, graphs, and screenshots,” the company's blog post says.

In the press release, the developers provided several examples demonstrating the new capabilities of the chatbot:

  • converting a sketch of a block diagram into Python code;
  • generating a bedtime story from a child’s drawing;
  • explaining memes;
  • converting a table into CSV format.
Example of converting a sketch of a diagram into Python code. Data: xAI.

After testing it against competitors such as GPT-4V, Claude 3 Sonnet, Claude 3 Opus, and Gemini Pro 1.5, xAI claims that its multimodal model leads on many benchmarks.

Comparison of AI models. Data: xAI.

The company representatives emphasized that Grok-1.5V outperforms its competitors in the RealWorldQA benchmark – a new metric created to evaluate spatial understanding of the real world.

Examples of RealWorldQA tasks. Data: xAI.

The benchmark consists of more than 700 images, each accompanied by a question and an easily verifiable answer. xAI released RealWorldQA under a Creative Commons license.
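
A benchmark built from images paired with questions and answers, like the one described above, is typically scored by comparing a model's answers with the reference answers. Below is a minimal Python sketch, assuming exact-match scoring after light normalization. The `Item` fields, file names, and sample data are hypothetical and are not taken from xAI's release or its evaluation procedure.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark entry: an image plus a question and its reference answer."""
    image: str     # hypothetical image file name
    question: str
    answer: str    # reference answer


def normalize(text: str) -> str:
    """Lowercase, trim, drop trailing periods, and collapse whitespace."""
    return " ".join(text.strip().lower().rstrip(".").split())


def accuracy(items: list[Item], predictions: list[str]) -> float:
    """Share of predictions that match the reference answer exactly."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(item.answer)
        for item, pred in zip(items, predictions)
    )
    return correct / len(items)


# Hypothetical sample data, for illustration only.
items = [
    Item("img_001.jpg", "Which object is closer to the camera? A. car B. bike", "A"),
    Item("img_002.jpg", "Is the traffic light green?", "Yes"),
    Item("img_003.jpg", "Which way does the arrow point?", "Left"),
]
predictions = ["a", "yes.", "right"]

print(f"accuracy: {accuracy(items, predictions):.2f}")
```

Exact matching works only when answers are short and unambiguous, which is why a benchmark of this kind benefits from easily verifiable reference answers.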

Grok-1.5V was announced less than a month after xAI open-sourced the code of its Grok-1 model.

According to the developers, “significant” updates will be made to the chatbot’s capabilities for understanding and generating multimodal signals in the coming months.

Early testers and current users will have access to Grok-1.5V soon.

As a reminder, in December 2023 xAI notified the SEC of plans to raise $1 billion through a private sale of equity securities.
