Voiceovers: Quality Control Standards – Bunny Studio Help Center

The Quality Control team has a set of questions they ask when reviewing any deliverable submitted to the platform. Keeping them in mind when producing your deliverable will help avoid revision requests and rejections.

The review process breaks each deliverable down by element so that every aspect is checked. These are the questions asked for each element:

Project Requirements:

Does the voice match the age, gender, accent, and language selected by the client?

Project Instructions:

Did the Pro follow the project instructions provided in the remarks section?
Did the Pro follow the reference sample or external material attached by the client?
Did the Pro read the script exactly as provided?
If the Pro submitted multiple takes, are they complete and organized?

Technical Requirements:

Is there room echo?
Does the voice sound like it was recorded at a proper distance from the mic?
Is there distortion/clipping?
Is the volume normalized to around -3 dB peak?
Are there any editing issues (e.g., clicks, pops, audible cuts)?
Is there the right amount of silence at the start and end (0.5 seconds is suggested)?
Does the audio sound heavily processed (e.g., by noise reduction plugins or EQ)?
Is there heavy compression and/or limiting?
Does the voice sound as if it was recorded with professional equipment?
Are there loud, undesired, or distracting breath noises?
Are there loud, undesired, or distracting mouth clicks/mouth noises?
Are there plosives?
Is there hiss/white noise?
Is there electrical noise/hum?
Are there any background noises (e.g., cars, mouse clicks, fans, pages, people)?
Are there any sibilance issues?
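The silence question above can be checked before submitting. The sketch below measures the leading and trailing silence of a mono recording, assuming the samples are already loaded as a list of signed 16-bit integers. The function name and the -50 dBFS silence threshold are illustrative assumptions, not Bunny Studio settings.

```python
def silence_padding(samples, sample_rate=44100, threshold_dbfs=-50.0):
    """Return (leading, trailing) silence in seconds for 16-bit mono samples.

    A sample is treated as silent when its magnitude is below threshold_dbfs
    relative to 16-bit full scale (32767). The -50 dBFS default is an
    illustrative assumption.
    """
    threshold = 32767 * 10 ** (threshold_dbfs / 20)
    # First sample that rises above the silence threshold.
    first = next((i for i, s in enumerate(samples) if abs(s) >= threshold), None)
    if first is None:
        duration = len(samples) / sample_rate
        return duration, duration  # the whole file is silence
    # Last sample that rises above the threshold, searching from the end.
    last = next(i for i in range(len(samples) - 1, -1, -1)
                if abs(samples[i]) >= threshold)
    leading = first / sample_rate
    trailing = (len(samples) - 1 - last) / sample_rate
    return leading, trailing
```

A result close to (0.5, 0.5) matches the suggested padding.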

Performance Requirements:

Does the voice sound monotone/flat?
Does the voice sound robotic or computer-generated?
Are there pronunciation issues?
Are there mumbled or unclear words?
Does the voice sound nasal or raspy? (It shouldn't, unless the client specifically requested it.)

Other Requirements:

Did the Pro include any contact details or content not relevant to the project?
If the project has the syncing option selected, is the deliverable synced as instructed by the client?
If the audio has background music, is the voice easy to understand?
Does the sample convey a complete idea (i.e., are the sentences complete and is the demo clear)?
If the sample has multiple voices, is it clear which one is the Pro's voice?
If the sample has background music, is the voice easy to understand, and is the mix balanced?
Is the sample labeled correctly by category and other relevant attributes (gender, age, purpose, language, accent)?

Required Formatting for Audio Files

.wav
Mono
44.1 kHz sample frequency
16-bit depth
Normalized to -3dB peak
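The settings above can be checked with Python's standard-library wave module. This is a minimal sketch: the function name check_wav and the 0.5 dB peak tolerance are assumptions for illustration. It reads the header for channels, sample rate, and bit depth, then measures the sample peak in dBFS relative to 16-bit full scale (32767).

```python
import array
import math
import sys
import wave

def check_wav(path, target_peak_dbfs=-3.0, tolerance_db=0.5):
    """Return a list of problems with a .wav file (empty if none were found).

    Checks mono, 44.1 kHz, 16-bit, and a sample peak near -3 dBFS.
    The 0.5 dB tolerance is an assumption, not a published figure.
    The wave module only reads PCM WAV files; other encodings raise wave.Error.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample
        frames = wav.readframes(wav.getnframes())
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if rate != 44100:
        problems.append(f"expected 44100 Hz, got {rate} Hz")
    if width != 2:
        problems.append(f"expected 16-bit, got {8 * width}-bit")
        return problems  # the peak check below assumes 16-bit samples

    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        problems.append("file is silent")
        return problems
    peak_dbfs = 20 * math.log10(peak / 32767)
    if abs(peak_dbfs - target_peak_dbfs) > tolerance_db:
        problems.append(f"peak is {peak_dbfs:.1f} dBFS, expected about {target_peak_dbfs} dBFS")
    return problems
```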

Quality Control Terms & Definitions

Pro: Any of the accepted writers, designers, creatives, and artists who craft top-notch deliverables on the Bunny Studio platform.
Deliverable: The final product that Pros submit to the system (e.g., an article, voiceover, or logo).
Brief: A section of the project form that clients fill out to describe what they need. It includes information on how the client's deliverable should be crafted and what the content should communicate.
.wav: A type of audio file required by Bunny Studio Voice. In a .wav file, the audio data is saved in an uncompressed PCM format. We require that you record your .wav in mono, 44.1 kHz, 16-bit.
Pop filter: A piece of foam or a "windscreen" that attaches to your microphone. It reduces plosive sounds by limiting the amount of air that hits the microphone when you speak. Pop filters are inexpensive and can be purchased at most music stores or online.
De-esser: An audio process tuned to reduce harsh high-frequency sounds, such as the sound produced by the letter “s” (hence the name de-esser).
Normalization: The application of a constant amount of gain to an audio recording to bring its average or peak amplitude to a target level (the norm). We ask that your audio be normalized to -3 dB peak.
Loudness Integrated Value (LUFS): A measure of the average loudness of an audio track over its entire duration, expressed in LUFS (Loudness Units relative to Full Scale). It represents how loud a piece of audio sounds to the human ear, averaged over time.
True Peak value: This is the highest level of an audio signal, accounting for potential inter-sample peaks that occur during digital-to-analog conversion. It's measured in dBTP (decibels True Peak) and helps ensure the audio doesn't distort when played back on different devices.
Noise floor: The level of the signal created by all noise sources and unwanted signals in a recording. Here, noise is any signal other than the one being monitored.
Over-modulated sound: This occurs when a signal, whether from an acoustic source (such as sound recorded into a microphone) or an electronic signal passing through a console, is too strong for its intended target to handle. The result is audio that sounds distorted. If your read is rejected for this reason, your mic gain may be set too high, or the file may have been recorded at too high a level. Record your audio between -3 dB and 0 dB for the best results.
Clip (audio): A type of distortion that occurs at the loudest points of an audio file, when the signal exceeds the maximum level the system can handle. It sometimes results in a pop, click, or cut at that point in the audio. Make sure you record and normalize to around -3 dB.
Plosive: A popping sound created by air hitting the microphone when certain sounds are produced, such as "p," "b," "t," and other "hard consonant" sounds. Adding a pop filter to your microphone, and improving your technique when using it, can reduce this effect. Bunny Studio Voice will not accept reads or samples containing these loud sounds.
Room Echo: The sound of a voice bouncing off walls, ceilings, or other hard surfaces. These reflections travel back into the microphone, creating an echo effect that makes the speaker sound like they are in a large room or a bathroom. (This can happen regardless of the room's actual size.)
Sibilance: A high-frequency hissing sound, most common on consonants such as 's', 'sh', 'ch', 'z', and 'zh'.
Robotic: A read that sounds flat or doesn't convey any emotion. Robotic reads are the main reason clients reject projects.
AI-Generated Voice: A voice that is artificially created using software. These voices may sound flat, lack emotional nuance, or have unnatural timing and pronunciation. It's important to detect these as they do not meet Bunny Studio's quality standards.
Over-processing: Excessive use of audio enhancement tools like noise reduction, EQ, or compression, which can lead to a loss of natural sound quality.
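The Normalization definition above can be sketched in a few lines. This is a minimal illustration, assuming the recording is already loaded as a list of signed 16-bit integer samples (mono); the function name normalize_peak is hypothetical. It measures the sample peak and applies one constant gain so the peak lands at -3 dBFS, which is about 71% of full scale.

```python
def normalize_peak(samples, target_dbfs=-3.0, full_scale=32767):
    """Apply one constant gain so the largest sample magnitude hits target_dbfs.

    samples: 16-bit PCM values as Python ints. Returns a new list.
    This is sample-peak normalization; it ignores inter-sample (true) peaks.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    target = full_scale * 10 ** (target_dbfs / 20)  # -3 dBFS is about 0.708 of full scale
    gain = target / peak
    # Round to the nearest integer and clamp to the 16-bit range.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]
```

Because the same gain is applied to every sample, the relative levels inside the read (including its noise floor) stay the same; only the overall level moves.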
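The True Peak definition above refers to inter-sample peaks: the reconstructed waveform can rise higher between samples than at any sample. The sketch below illustrates the idea by evaluating the signal at four points per sample interval with Hann-windowed sinc interpolation and keeping the highest value. It assumes float samples scaled to [-1.0, 1.0]; the function name and parameters are hypothetical, and this is a simplified illustration, not the reference oversampling filter specified in ITU-R BS.1770.

```python
import math

def true_peak_dbtp(samples, oversample=4, half_taps=16):
    """Estimate the true peak, in dBTP, of float samples scaled to [-1.0, 1.0].

    Evaluates the signal at `oversample` points per sample interval using
    Hann-windowed sinc interpolation and returns the largest magnitude
    found, in dB. A simplified sketch, not the BS.1770 reference filter.
    """
    n = len(samples)
    peak = max((abs(s) for s in samples), default=0.0)  # start from the sample peak
    for i in range(n - 1):
        for k in range(1, oversample):
            t = i + k / oversample  # position between samples i and i + 1
            acc = 0.0
            # Only samples within half_taps of t contribute.
            for m in range(max(0, i - half_taps + 1), min(n, i + half_taps + 1)):
                x = t - m  # never an integer, so no division by zero
                window = 0.5 + 0.5 * math.cos(math.pi * x / half_taps)
                acc += samples[m] * window * math.sin(math.pi * x) / (math.pi * x)
            peak = max(peak, abs(acc))
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

if __name__ == "__main__":
    # A tone at a quarter of the sample rate, shifted by 45 degrees:
    # every sample is about +/-0.707, but the waveform between them reaches 1.0.
    tone = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(400)]
    print(f"sample peak: {20 * math.log10(max(abs(s) for s in tone)):.2f} dBFS")
    print(f"true peak:   {true_peak_dbtp(tone):.2f} dBTP")
```

For that test tone, the sample peak reads about -3 dB while the estimated true peak is close to 0 dBTP, which is the gap a true-peak meter is designed to catch.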
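One rough way to put a number on the Noise floor definition above is to measure the RMS level of the quietest short window in a file, such as the pause before the read starts. This heuristic is an assumption for illustration, not a description of how the Quality Control team measures it. It expects mono 16-bit integer samples, the function name is hypothetical, and the 50 ms window length is arbitrary.

```python
import math

def noise_floor_dbfs(samples, sample_rate=44100, window_seconds=0.05, full_scale=32767):
    """Estimate the noise floor as the RMS level of the quietest window.

    Assumes the recording contains at least one window with no speech
    (for example, the silence at the start or end). Returns dBFS relative
    to 16-bit full scale.
    """
    size = max(1, int(sample_rate * window_seconds))
    quietest = None
    # Step through non-overlapping windows and keep the lowest RMS value.
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        rms = math.sqrt(sum(s * s for s in window) / size)
        if quietest is None or rms < quietest:
            quietest = rms
    if quietest is None:
        raise ValueError("recording is shorter than one analysis window")
    if quietest == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(quietest / full_scale)
```

A lower (more negative) result means a quieter background; hiss, hum, and room noise all raise it.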
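Digital clipping, as described in the Clip (audio) definition above, often shows up as runs of consecutive samples stuck at the 16-bit limit. The sketch below flags such runs in a list of signed 16-bit integer samples; the function name is hypothetical, and the three-sample minimum is an assumed heuristic that ignores a single full-scale sample.

```python
def find_clipping(samples, min_run=3, full_scale=32767):
    """Return (start_index, length) for each run of clipped samples.

    A sample counts as clipped when it sits at the 16-bit limit
    (32767 or -32768). Runs shorter than min_run are ignored.
    """
    runs = []
    start = None  # index where the current clipped run began
    for i, s in enumerate(samples):
        clipped = s >= full_scale or s <= -full_scale - 1
        if clipped and start is None:
            start = i
        elif not clipped and start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # A run that lasts until the final sample.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs
```

This only catches clipping that is still at full scale. If a clipped recording was later turned down, for example by normalizing to -3 dB, the flattened peaks sit below full scale and this check will miss them, although they will still sound distorted.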
