Kyutai’s Moshi AI vs. ChatGPT: New AI Chatbot with Real-Time Voice Interaction and Offline Capabilities

French AI company Kyutai has released a new chatbot that understands the tone of the user's voice, handles interruptions, and responds faster than ChatGPT's now-delayed GPT-4o "Advanced Voice Mode".

Kyutai Labs launched its Moshi AI with voice recognition and multiple speaking styles


On 3 July, Kyutai Labs launched its new AI-powered chatbot, Moshi AI, which can respond verbally in real time. The French AI company said that the chatbot's entire audio language model was developed in-house. It is currently available to all users at no cost, with each conversation limited to five minutes.

Built on a 7B-parameter large language model (LLM) called Helium, Moshi AI can speak in various accents and in 70 different emotional and speaking styles. It can handle two audio streams simultaneously, which means the AI can listen and talk at the same time.

Moshi integrates text and audio training and is optimized for multiple backends (CUDA, Metal, CPU), with support for 4-bit and 8-bit quantization.
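To give a rough sense of why 4-bit and 8-bit quantization matter for running a 7B-parameter model locally, the sketch below estimates the memory needed for the weights alone. It assumes "7B" means exactly 7 billion parameters and uses a 16-bit baseline for comparison; both are illustrative assumptions, not figures published by Kyutai. Real memory use is higher because of activations, caches, and quantization metadata.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "7B" is taken as exactly 7 billion parameters.
PARAMS_7B = 7e9

# 16-bit is included only as a hypothetical unquantized baseline.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS_7B, bits):.1f} GB")
```

Under these assumptions, 8-bit weights need about half the memory of 16-bit weights and 4-bit weights about a quarter, which is what makes running on a laptop-class device plausible.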

The company also said that it developed this AI with a team of eight people in six months. At the unveiling event in Paris, Kyutai Labs said that Moshi is not just an AI assistant but a prototype that can be used to develop various tools for different use cases.

Moshi AI's response time is just 200 milliseconds, which makes it faster than GPT-4o's Advanced Voice Mode, whose response time typically ranges from 232 to 320 milliseconds. Moshi AI is named after "moshi moshi", the Japanese way of answering phone calls.
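As a quick check on how large that gap is, the sketch below compares the latency figures quoted in this article. The function and constant names are hypothetical, and the numbers are the reported ones, not independent measurements.

```python
# Latency figures as quoted in this article (milliseconds).
MOSHI_MS = 200
GPT4O_RANGE_MS = (232, 320)


def percent_lower(baseline_ms: float, candidate_ms: float) -> float:
    """How much lower the candidate latency is, as a percentage of the baseline."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100


for baseline in GPT4O_RANGE_MS:
    saved = baseline - MOSHI_MS
    pct = percent_lower(baseline, MOSHI_MS)
    print(f"vs {baseline} ms: {saved} ms lower ({pct:.1f}%)")
```

On these figures, Moshi's quoted latency is 32 to 120 milliseconds lower than the GPT-4o range.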

The company mentioned that it even collaborated with professional voice artists to enhance the AI's voice quality. Moshi AI was trained on 100,000 synthetic dialogues generated with text-to-speech technology.

Kyutai Labs is also working on an AI-powered audio identification, watermarking, and signature tracking system, and these features are expected to be integrated with Moshi in the future. Even though Moshi is not a direct competitor to ChatGPT, it is a big step towards open-source models that can run offline.

Kyutai chief Patrick Pérez said that Moshi can revolutionize human-machine communication, noting that Moshi thinks while it talks. The company also plans to release the full model, including the inference codebase, the audio codec, the 7B model, and the optimized stack.

The company was founded in November 2023 with €300 million in funding from various investors, including French billionaire Xavier Niel. The start-up's main aim is to contribute to open research in AI and foster its ecosystem.

It also challenges major companies like OpenAI, which recently faced criticism for delaying releases over safety concerns. In particular, OpenAI is withholding the release of its video generation platform Sora, the voice mode feature of GPT-4o, and Voice Engine.

Potential Applications of Moshi

  • It can be used as a virtual assistant that provides conversational help with various tasks.
  • Companies can use Moshi AI to enhance their customer service.
  • Content creators can use it to generate audio content with different emotions and styles.
  • It can help visually impaired individuals, since voice-based interaction makes the internet more accessible to them.

Moshi AI Features

  • It offers natural, responsive conversations in real time.
  • It has a simple and user-friendly interface.
  • With low latency, the AI responds very quickly.
  • It thinks while it speaks.
  • Users can access it on a laptop or mobile device.
  • The AI can understand the user's tone, respond in an emotive voice, and speak in different styles and voice modulations.
  • It is open-source, so anyone can download and install it locally and run it on an unconnected device.
  • Users can check and adjust the loudness of their voice as they speak, and the AI's responses are displayed in separate text boxes.
  • Users can also see technical details such as latency and missed calls near the top of the display.
  • It is also connected to the internet, so it can fetch answers to queries that need web access.
  • Currently, the chatbot doesn't accept text prompts, so users interact with the AI by voice only.
  • It supports multiple backends (CUDA, Metal, CPU).
  • Watermarking to identify AI-generated audio [development in progress].

Note: A few users have reported response lag exceeding 10-15 seconds, which may be due to server load. Some users also complained that verbal prompts did not register even after raising the volume meter to three-fourths.


The company has made the chatbot publicly available for free; users can access it here. To try it, users enter their email address and join the queue.