Xunfei Zhizuo is a platform developed by Xunfei to provide artificial intelligence content creation services. Its core function is to convert user-entered text into speech, a process often referred to as "AI dubbing" or "speech synthesis". Users can choose from a variety of pre-programmed virtual voices (i.e., "anchors") with different styles, such as news broadcasting, film and TV commentary, or witty humor, to suit different application scenarios.
In addition to pure audio generation, Xunfei Zhizuo can combine text and voiceover to generate videos presented by an "AI digital human". Users only need to provide the text to quickly create a video narrated by a virtual anchor, eliminating the need for a real person to appear on camera. The platform integrates the complete workflow from copy to audio to video, offering a cost-reducing, efficiency-improving tool for users who need to mass-produce promotional videos, instructional videos, short videos, and other content. It builds on Xunfei's technology in speech recognition, natural language processing, and image generation.
Function List
- AI Dubbing: Enter text, choose a preferred anchor voice and background music, and quickly generate a voiceover. Speech rate, pitch, and volume can be adjusted, and you can specify the correct pronunciation of polyphonic characters and English words.
- Digital Human Video: Enter text and select an AI virtual anchor, and the system automatically generates a video of the anchor presenting it. The anchor's appearance, clothing, and background can be customized.
- Voice Cloning: Users can upload samples of their own voice, and the platform clones an exclusive voice that sounds like the user for subsequent dubbing.
- Multi-Anchor Dubbing: Different AI anchors can be assigned to different paragraphs within the same document, making it easy to produce dialogue-style audio.
- Video Templates: Provides a variety of preset video templates; users can apply a generated digital human video to a template to produce a complete video with graphic packaging.
- Intelligent Captioning: Subtitles are generated automatically and matched to the narration while the video is being produced.
- AIGC Toolbox: Integrates with other AI authoring tools to assist users in content creation.
How to Use
Xunfei Zhizuo requires no installation; it can be used directly by visiting the official website in a browser. Operation centers on two core functions: "AI Dubbing" and "Digital Human Video".
I. AI Dubbing workflow
The goal of the AI Dubbing feature is to convert text transcripts directly into high-quality audio files.
- Creating a voiceover project
- Visit the Xunfei Zhizuo website and click the "AI Dubbing" or "Create Now" button on the main interface.
- When you enter the Voiceover Workbench, you will be prompted to create a new project.
- Entering or importing text
- In the text editing area in the center of the workbench, you can directly type or paste the transcript you need to dub.
- If the text is long, you can use the "Import Document" (导入文档) function, which supports .txt, .docx, and other formats.
- Choosing an AI Anchor
- On the right side of the workbench is the "Anchor Selection" panel. There are hundreds of different AI voices, called "anchors", displayed here.
- You can filter anchors by tags such as language (Mandarin, dialect, foreign language), style (e.g. news, advertising, fiction, customer service) or gender.
- Click an anchor's avatar to audition its voice, and choose the anchor that best matches the style of your script.
- Fine-tuning
- Multi-Anchor Voiceover: If your piece takes the form of a dialogue, you can select a paragraph and assign a specific anchor to it. In this way, a single script can be performed as a dialogue by several "people".
- Pause: Where a pause is needed, click the "Insert Pause" (插入停顿) button on the toolbar and set the duration of the silence from 0.1 seconds up to several seconds for a more natural speech rhythm.
- Pronunciation adjustment: For polyphonic characters, the system can usually pick the right reading from context, but it can also be corrected manually: select the character and use the "Polyphonic Character" (多音字) function to choose the correct pinyin. For numbers and English, use the "Number/English" (数字/英文) function to set how they are read (e.g., as a numeric value or as a digit sequence).
- Adjusting speed/pitch: In the right panel you can adjust the overall speech rate and intonation of the generated voice (a markup sketch illustrating these controls follows the steps below).
- Add background music
- In the "Background Music" area below, click "Add Music" to select from the platform's music library or upload your own music files.
- The volume of the background music can be adjusted so that it does not overpower the voice (a local mixing sketch also follows the steps below).
- Generate and Export
- After completing all the settings, click on the "Start Synthesis" or "Audition" button and the system will quickly generate a short audio clip for preview.
- Once you are satisfied with the result, click "Generate Full Audio". The finished audio appears in your personal work center, where you can export it as .mp3 or another format.
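The pause, pronunciation, and rate/pitch controls above are applied through the web workbench, but conceptually they map onto standard speech-synthesis markup. The sketch below is a minimal, hypothetical illustration in Python using W3C SSML elements (`<break>`, `<phoneme>`, `<say-as>`, `<prosody>`); it is not Xunfei Zhizuo's actual API, and whether the platform accepts SSML at all is an assumption made only for illustration.

```python
# Hypothetical illustration only: the workbench controls expressed as W3C SSML.
# Xunfei Zhizuo's real interface is the web workbench; SSML support is assumed
# here purely to show how these adjustments can be represented as markup.

def build_ssml(fragments, rate="medium", pitch="medium"):
    """Wrap prepared text fragments in a <prosody> element that sets rate and pitch."""
    body = "".join(fragments)
    return f'<speak><prosody rate="{rate}" pitch="{pitch}">{body}</prosody></speak>'

fragments = [
    "欢迎收听本期节目。",
    '<break time="500ms"/>',                       # a 0.5-second pause for rhythm
    # Force the "chóng" reading of the polyphonic character 重
    # (the phoneme alphabet and notation vary by TTS engine).
    '今天介绍<phoneme alphabet="sapi" ph="chong 2">重</phoneme>庆的景点。',
    # Read 128 as a value ("one hundred twenty-eight"), not digit by digit.
    '门票价格为<say-as interpret-as="cardinal">128</say-as>元。',
]

print(build_ssml(fragments, rate="95%", pitch="+2st"))
```

The same idea extends to multi-anchor dubbing: each dialogue paragraph would simply be synthesized with a different voice before the clips are joined.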
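If you prefer to balance the background music locally after exporting the voiceover as .mp3, the sketch below shows one way to do it with the open-source pydub library (which requires ffmpeg). The file names and the 12 dB reduction are arbitrary example values, not platform defaults.

```python
from pydub import AudioSegment  # pip install pydub; requires ffmpeg on the PATH

# Mix an exported narration with background music, keeping the music quieter
# so it does not overpower the voice. File names and gain are example values.
voice = AudioSegment.from_file("narration.mp3")
bgm = AudioSegment.from_file("background_music.mp3") - 12  # lower BGM by 12 dB

# Loop the music if it is shorter than the narration, then trim to length.
if len(bgm) < len(voice):
    bgm = bgm * (len(voice) // len(bgm) + 1)
bgm = bgm[: len(voice)]

mixed = voice.overlay(bgm)
mixed.export("final_mix.mp3", format="mp3")
print(f"Mixed {len(mixed) / 1000:.1f} s of audio")
```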
II. Digital Human Video workflow
This feature adds a digital human presenter on top of the AI dubbing to generate videos directly.
- Selecting the video production mode
- Select "Digital Human Video" or the relevant portal on the homepage of the official website.
- Platforms usually offer two modes: one is to use the platform's preset templates (recommended for newbies), and the other is to create freely.
- Choosing a digital human and scene
- In the Video Workbench, first select a "digital human" figure on the right-hand side. The platform offers a wide range of avatars with different styles, professions, and ages.
- Next, choose a background for the digital human. It can be a solid color, a picture, or a preset scene such as a studio or office; you can also upload your own picture or video as the background.
- Entering the driving text
- As with AI dubbing, enter your video copy in the text box. This text is used both to generate the voiceover and to drive the digital human's lip movements.
- You also need to choose a suitable AI anchor voice for the text; it becomes the voice of your digital human.
- Arranging the video scene
- Xunfei Zhizuo provides a timeline interface similar to video editing software.
- You can add "stickers", "text", and other elements to the frame and set when they appear and disappear (a project-structure sketch follows the steps below).
- If desired, you can also upload your own pictures or video clips and intersperse them with the digital human's presentation to enrich the video.
- Preview and Generation
- After finishing all the editing, click the "Preview" button and the system will render a short preview of the video. Check that the digital human's lip movements, voice, and on-screen elements are correct.
- After confirming that there are no errors, click "Generate Full Video". Video rendering will take some time depending on the complexity and length of the video.
- Once rendering is complete, you can download the final .mp4 video file from your personal work center.
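Conceptually, the steps above amount to a driving script plus a timeline of visual elements. The sketch below models that structure as plain Python data so the moving parts are easy to see; every field and value (avatar, background, elements, start/end times) is a made-up example, not Xunfei Zhizuo's actual project format.

```python
from dataclasses import dataclass, field

# Hypothetical model of a digital human video project, mirroring the workbench
# steps above: choose an avatar and background, supply driving text and a voice,
# then place extra elements on a timeline. Field names are illustrative only.

@dataclass
class TimelineElement:
    kind: str        # "sticker", "text", "image", "video_clip"
    content: str     # asset path or literal text
    start_s: float   # when the element appears (seconds)
    end_s: float     # when the element disappears (seconds)

@dataclass
class DigitalHumanProject:
    avatar: str                      # preset digital human figure
    background: str                  # solid color, picture, or preset scene
    script: str                      # text driving both voiceover and lip movement
    anchor_voice: str                # AI anchor used as the digital human's voice
    elements: list[TimelineElement] = field(default_factory=list)

project = DigitalHumanProject(
    avatar="studio_presenter_female_01",
    background="news_studio",
    script="大家好，欢迎收看本期产品介绍……",
    anchor_voice="mandarin_news_female",
    elements=[
        TimelineElement("text", "新品上市", start_s=0.0, end_s=3.0),
        TimelineElement("image", "assets/product.png", start_s=3.0, end_s=8.0),
    ],
)
print(f"{len(project.elements)} overlay elements on the timeline")
```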
Application Scenarios
- Short video content creation
Individual bloggers or marketing teams can quickly generate large numbers of narrated videos, such as popular-science explainers, product introductions, and film commentary. Simply prepare the copy, and an AI digital human can take the place of an on-camera presenter, greatly increasing the frequency of content updates.
- Corporate communications and training
Enterprises can use it to produce internal training materials, policy presentation videos, or corporate news for external release. A unified digital persona and voice helps build a standardized brand image while reducing the cost of hiring actors and film crews.
- Educational courseware production
Teachers and educational institutions can quickly convert written lesson plans into audiobooks or teaching videos. It is especially suitable for language learning, historical storytelling, and similar scenarios, where vivid audio and video are more engaging for students than plain text.
- Advertisement broadcasting and announcements
Shopping malls, subways, online stores, and similar venues can quickly generate audio advertisements or service announcements for promotional activities. The low cost and high efficiency of AI dubbing are especially valuable where content needs to change frequently.
QA
- Is Xunfei Zhizuo free?
The platform offers a free trial quota that lets users try the basic dubbing and video generation features. However, the free tier limits the number of available anchors and the quality and length of exported files, and output may carry a watermark. Access to premium anchors, higher-quality audio and video output, and longer production durations requires a paid membership.
- Can the generated audio and video be used commercially?
It depends on the membership package you purchase. Usually, content generated under a paid commercial membership is licensed for commercial use. Commercial use of content generated under the free or personal tier may carry copyright risk, so read the platform's service agreement carefully before use.
- Can I dub with my own voice?
Yes. Xunfei Zhizuo provides a "voice cloning" function. Follow the prompts to record a specified script (usually dozens to hundreds of sentences), and the platform uses these recordings to train an AI voice model that imitates your tone. Afterwards, you can choose this cloned voice when dubbing (a local recording pre-check sketch follows this Q&A).
- Do the digital human's lip movements and voice match exactly?
In most cases, lip-sync is fairly accurate. One of the platform's core technologies is lip-shape prediction, which drives the avatar's mouth movements based on the pronunciation. However, for words spoken very quickly or for complex word combinations, minor deviations may occasionally occur; these can be mitigated by adjusting pauses and speech rate in the text.
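If you plan to record samples for voice cloning, it can help to sanity-check the files locally before uploading. The sketch below uses Python's standard-library wave module to report sample rate, channel count, and duration for each WAV recording; the 16 kHz/mono expectation is an assumed example, since the platform's actual recording requirements are not specified here.

```python
import wave
from pathlib import Path

# Local sanity check of voice-cloning recordings before upload.
# The 16 kHz mono expectation is an assumption for illustration; consult the
# platform's own recording guidelines for the real requirements.
EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1

def describe_wav(path: Path) -> str:
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    ok = rate == EXPECTED_RATE and channels == EXPECTED_CHANNELS
    flag = "OK   " if ok else "CHECK"
    return f"[{flag}] {path.name}: {rate} Hz, {channels} ch, {duration:.1f} s"

for wav_path in sorted(Path("cloning_samples").glob("*.wav")):
    print(describe_wav(wav_path))
```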