Voice cloning, a fast-growing application of AI, has been making waves in the startup ecosystem, with companies such as ElevenLabs already raising funds to develop proprietary algorithms and software for creating voice clones. Now a new option has emerged: OpenVoice, a collaboration between researchers at the Massachusetts Institute of Technology (MIT), Tsinghua University in Beijing, China, and the Canadian AI startup MyShell. OpenVoice distinguishes itself by offering nearly instantaneous, open-source voice cloning with unusually granular controls, which sets it apart from other platforms in the field.
Harnessing OpenVoice: A User-Friendly Approach
VentureBeat tested OpenVoice on Hugging Face and found its capabilities impressive. Users can generate convincing voice clones within seconds from arbitrary speech. Unlike other platforms, OpenVoice does not require users to read a specific script. Instead, users speak freely for a few seconds, and the model produces a voice clone almost immediately. Users can also choose among styles such as cheerful, sad, friendly, and angry, which gives them a range of emotional tones.

The OpenVoice team’s scientific paper describes how the platform was built from two distinct AI models: a text-to-speech (TTS) model and a tone colour converter. The TTS model controls style parameters and languages and was trained on 30,000 sentences from English, Chinese, and Japanese speakers. The tone colour converter was trained on 300,000 audio samples from a wide range of speakers. By converting human speech into phonemes and representing them as vector embeddings, OpenVoice reproduces a user’s voice and can alter its tone colour or emotional expression.
MyShell, the driving force behind OpenVoice, shared more about the platform’s capabilities. OpenVoice offers granular control over tone, emotion, accent, rhythm, pauses, and intonation, all from a short audio clip. The company stressed its commitment to benefiting the wider research community and described OpenVoice as the first step in a broader initiative. Zengyi Qin, a lead researcher affiliated with MIT and MyShell, summed up MyShell’s vision as “AI for All” and outlined plans to provide grants, datasets, and computing power to support open-source research in the future.
The Simple Elegance Behind OpenVoice

The OpenVoice team describes its approach as conceptually simple yet effective. The platform enables flexible, instant voice cloning with precise control over style, emotion, and accent, and it can be adapted to any language. Lead researcher Zengyi Qin highlighted the team’s goal of building the most flexible instant voice cloning model, something that had proved difficult in the past.
By decoupling a complex task into manageable subtasks, OpenVoice achieves what had seemed too difficult to solve as a whole, giving users a high degree of flexibility and control.
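To make the decoupling idea concrete, here is a toy sketch, not OpenVoice’s actual method. It assumes that speech can be represented as a list of per-frame feature vectors, that a base TTS stage has already applied style, language, and rhythm, and that a speaker’s “tone colour” can be approximated by the mean of those vectors. The real tone colour converter is a trained neural model. The names `tone_colour_embedding`, `convert_tone`, `base_frames`, and `reference_frames` are hypothetical and exist only for this illustration.

```python
from typing import List

Vector = List[float]


def tone_colour_embedding(frames: List[Vector]) -> Vector:
    """Hypothetical toy speaker embedding: the mean of a clip's frame vectors."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def convert_tone(frames: List[Vector], source: Vector, target: Vector) -> List[Vector]:
    """Hypothetical toy converter: swap the speaker component, keep the style.

    Every frame is shifted by (target - source). The frame-to-frame changes,
    which stand in for rhythm, pauses and intonation in this toy model,
    are therefore left untouched.
    """
    shift = [t - s for s, t in zip(source, target)]
    return [[x + d for x, d in zip(frame, shift)] for frame in frames]


# Hypothetical output of the base TTS stage (style and language already applied)
base_frames = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
# Hypothetical features from a few seconds of the user's reference recording
reference_frames = [[10.0, 0.0], [12.0, 2.0]]

source = tone_colour_embedding(base_frames)      # [2.0, 3.0]
target = tone_colour_embedding(reference_frames)  # [11.0, 1.0]
cloned = convert_tone(base_frames, source, target)
print(cloned)  # [[10.0, 0.0], [12.0, 0.0], [11.0, 3.0]]
```

The two stages never need to know about each other. The first decides *what* is said and *how*, and the second only decides *who* appears to say it. This separation is the property the OpenVoice team credits for the system’s flexibility.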
Other Notable Mentions: ElevenLabs Elevating Voice Cloning to Unprecedented Realism
OpenVoice is not the only project making headlines in AI voice cloning. As mentioned earlier, ElevenLabs has built a proprietary AI model known for replicating human intonation and inflexions with high fidelity, adjusting its delivery dynamically based on contextual cues. The company positions it as the most advanced and realistic voice cloning AI available and says it can clone any voice from just a few minutes of audio. ElevenLabs has also recently announced significant funding from prominent investors in the AI and media industries, which it intends to use to develop its voice cloning technology further.
ElevenLabs plans to use the new funds to expand its team, scale its platform, and introduce new products and features. One of these is Instant Voice Cloning, which lets users clone their voices from minimal audio input and generate speech in 29 supported languages. Another is Professional Voice Cloning, which aims to produce clones virtually indistinguishable from real voices for professional uses such as videos, audiobooks, podcasts, and video games.
ElevenLabs is also working on Text to Speech, a service that converts text into natural AI voices that can be customised with different styles, emotions, and languages. The lineup further includes AI chatbots with human-like voices that can be integrated into other platforms and applications. Aiming to become the leading provider of AI voice cloning, ElevenLabs wants to democratise access to high-quality, personalised voice content. The company sees voice cloning as a transformative technology for communication, content creation, and audio consumption.
A Little About MyShell: The Driving Force Behind OpenVoice
Founded in 2020 in Calgary, Alberta, MyShell has already made significant strides in AI. It secured a $5.6 million seed round led by INCE Capital, with additional investment from Folius Ventures, Hashkey Capital, SevenX Ventures, TSVC, and OP Crypto. With more than 400,000 users, MyShell positions itself as a decentralised platform for discovering, creating, and staking AI-native apps. While OpenVoice remains open source, MyShell earns revenue from three sources: monthly subscriptions to its web app, third-party bot creators who promote products within the app, and AI training data fees.

On MyShell, users can draw on generative AI technologies including large language models (LLMs), Stable Diffusion, and voice cloning. The aim is to turn individuals into super creators who can build sophisticated AI-native apps without extensive coding skills. MyShell envisions a future in which creators can do this within minutes, free of traditional coding barriers.
MyShell’s Innovative Approach to Creator Economy Challenges
MyShell aims to stand out in the creator economy by addressing centralisation, misaligned incentives, and concerns over user data privacy, all issues that are prevalent in today’s AI industry.
OpenVoice’s introduction marks a significant milestone in the voice cloning world, disrupting traditional proprietary approaches. MyShell’s commitment to fostering open-source research aligns with its overarching mission of making AI accessible to all.