Grok Vision - First Multimodal Model from xAI 🔥

+ Transcribe videos and audio in seconds

Welcome to another edition of Horizon AI!

Elon Musk’s xAI has introduced its first multimodal model with improved ‘understanding of the physical world’. Meanwhile, OpenAI is opening its first Asian office in Tokyo, Japan, as the company aims to expand its global presence.

Let's jump into it!

Read Time: 3.5 min

Here's what's new today in Horizon AI

  • A Chart to Kickstart Your Day 📈: What do Americans use Generative AI products/services for?

  • Elon Musk's xAI Previews Grok-1.5V, Its First Multimodal AI Model 🔥

  • OpenAI Launches Japanese-Optimized GPT-4 Turbo

  • AI Tutorial: Transcribe videos and audio in seconds using Whisper Web

  • AI Tools to check out

  • The Latest in AI and Tech 💡

A chart to kickstart your day

What do Americans use Generative AI products/services for?

  • Searching for information is the most common use case for AI. Beyond that, there don't seem to be any consensus use cases.

AI News

XAI

Elon Musk's xAI Previews Grok-1.5V, Its First Multimodal AI Model 🔥

Source: xAI

Elon Musk's xAI has taken a significant step forward with the introduction of its first multimodal model, Grok-1.5V. It not only comprehends text but can also process and understand a variety of visual inputs, including documents, diagrams, charts, screenshots, and photographs.

Details:

  • xAI claims that Grok-1.5V is competitive with today's best multimodal models in several areas, from multidisciplinary reasoning to understanding documents, scientific diagrams, graphics, screenshots, and photos.

  • The model performs particularly well on the new RealWorldQA benchmark, which measures spatial understanding of the real world, surpassing GPT-4, Claude 3, and Gemini.

  • The company provided examples demonstrating Grok-1.5V's proficiency in processing and comprehending various forms of visual information. In one of them, Grok explains a meme.

Source: xAI

Grok-1.5V will soon be available to early testers and existing Grok users.

OPENAI

OpenAI Launches Japanese-Optimized GPT-4 Turbo

Source: OpenAI

OpenAI is opening its first office in Asia, located in Tokyo, Japan. Alongside this geographical expansion, the company is also launching a GPT-4 model specifically optimized for the Japanese language.

Details:

  • OpenAI aims to work closely with the Japanese government, local companies, and research institutions to develop secure AI tools that cater to Japan's unique needs and requirements.

  • Tadao Nagasaki will serve as the president of OpenAI Japan, leading the company's efforts in the region.

  • As an initial step, OpenAI is providing Japanese companies with a custom GPT-4 model that offers improved performance in translating and summarizing Japanese text, is cost-effective, and operates up to 3x faster than its predecessor.

Currently, OpenAI is granting early access to the custom GPT-4 model to select local businesses, with access expected to gradually expand through the OpenAI API in the coming months.

AI Tutorial

Transcribe videos and audio in seconds using Whisper Web

You can do this without signing up, installing, or paying anything. Just follow these easy steps:

  1. Go to Whisper Web

  2. Select the source

     • From URL: Copy and paste the URL of your video

     • From file: Upload a file from your device

     • Record: Record audio using your microphone and transcribe in real time

  3. Click on ‘Transcribe audio’

  4. Check the text transcript and export it to a TXT file or a JSON file.
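If you export to JSON, a few lines of Python can turn the file into a readable, timestamped transcript. The sketch below is only an illustration: it assumes the export is a list of chunks, each with a `timestamp` pair (start and end, in seconds) and a `text` field. Check your own file first, since the real layout may differ. The sample data and the helper names `format_timestamp` and `chunks_to_text` are hypothetical.

```python
import json


def format_timestamp(seconds):
    """Format a number of seconds as MM:SS, dropping fractions of a second."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def chunks_to_text(chunks):
    """Turn a list of transcript chunks into one timestamped line per chunk."""
    lines = []
    for chunk in chunks:
        start = chunk["timestamp"][0]  # assumed layout: [start, end] in seconds
        text = chunk["text"].strip()
        if text:  # skip empty chunks
            lines.append(f"[{format_timestamp(start)}] {text}")
    return "\n".join(lines)


# Hypothetical sample standing in for the contents of an exported JSON file.
sample_json = """
[
  {"timestamp": [0.0, 4.2], "text": " Welcome to the show."},
  {"timestamp": [4.2, 75.4], "text": " Today we talk about AI."},
  {"timestamp": [75.4, 80.0], "text": " Thanks for listening."}
]
"""

transcript = chunks_to_text(json.loads(sample_json))
print(transcript)
```

To use it on a real export, replace `sample_json` with the contents of your file, for example by reading it with `open(...)` and passing the result to `json.load`.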

AI Tools to check out

🗣 LangAI: Leverage AI to become fluent in a new language.

🌐 Wonders: AI search engine that helps you learn and answer questions directly from credible scientific sources.

🚀 Persana: Supercharge your prospecting with AI.

🩺 Sequel: AI longevity assistant designed to provide a comprehensive view of your well-being.

🤖 Laterbase: Save, search and chat with your bookmarks using AI.

The latest in AI and Tech

OpenAI's latest GPT-4 model has once again taken the top spot in the Chatbot Arena, surpassing Anthropic's Claude 3 Opus. The updated GPT-4 shows superior performance in areas like coding and reasoning; however, Claude 3 Opus still maintains an edge when handling longer inputs exceeding 500 tokens.

In his annual letter to shareholders, Amazon CEO Andy Jassy said generative AI could be the biggest technological change since cloud computing, and perhaps even since the internet.

Ideogram's AI image generator has been updated to version 1.0, introducing several enhancements. The updates include improved text rendering, higher-quality photorealism, and new features like image description, negative prompts, and greater control over rendering quality and speed.

Meta is testing an AI-powered search bar that allows users to chat with Meta's AI assistant and discover content. Search queries trigger a conversation with the AI in Instagram's direct message area, enabling users to ask questions or leverage pre-built prompts.

A recent report revealed that Adobe's AI image generator Firefly included AI-generated images from competitors like Midjourney in its training data. Adobe said a relatively small amount – about 5% – of the images used to train its AI tool was generated by other AI platforms.

That’s a wrap!

Thanks for sticking with us to the end! Let’s stay connected on LinkedIn and Twitter.

We'd love to hear your thoughts on today's email!

Your feedback helps us improve our content

Not subscribed yet? Sign up here and send it to a colleague or friend!

See you in our next edition!

Gina 👩🏻‍💻