Speech Synthesis Markup Language (SSML)
SSML is an XML-based markup language for controlling pitch, rate, pauses, emphasis, and emotion in synthesized speech. Wrap your content in a <speak> tag.
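As a minimal sketch of the wrapping step, the snippet below builds a <speak> document in Python and parses it to confirm it is well-formed. The helper name wrap_in_speak is hypothetical and not part of any SDK.

```python
import xml.etree.ElementTree as ET


def wrap_in_speak(content: str) -> str:
    """Wrap already-escaped text or SSML markup in a root <speak> element."""
    return f"<speak>{content}</speak>"


ssml = wrap_in_speak("Your content to be synthesized here")

# Parsing confirms the result is well-formed XML with <speak> as the root.
root = ET.fromstring(ssml)
print(ssml)
```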
Escaping Characters
Because SSML is XML, characters that have special meaning in XML (such as &, <, and >) must be escaped when plain text is converted to SSML; otherwise the markup may be misinterpreted.
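One way to do the escaping is with Python's standard library. This sketch escapes the five characters that have predefined XML entities; treating all five as required is an assumption, since the exact set the service expects is not listed here. The helper name escape_for_ssml is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() always handles &, <, and >. The two quote characters are added
# here as extra entities (assumption: the service wants them escaped too).
EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_for_ssml(text: str) -> str:
    """Escape XML special characters in plain text before embedding it in SSML."""
    return escape(text, EXTRA_ENTITIES)


escaped = escape_for_ssml('Tom & Jerry say "5 < 6"')
ssml = f"<speak>{escaped}</speak>"
print(ssml)
```

Escape the plain text first and add tags afterwards; escaping a string that already contains SSML tags would turn the tags themselves into literal text.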
Supported SSML Tags
prosody
The prosody tag controls the expressiveness of synthesized speech by manipulating pitch, rate, and volume.
Parameters
pitch: Adjusts the pitch of speech delivery.
Values:
- x-low, low, medium (default), high, x-high
- Percentage adjustments: -83% to +100% (e.g., +20%, -30%)
rate: Alters speech speed.
Values:
- x-slow, slow, medium (default), fast, x-fast
- Percentage adjustments: -50% to +9900% (e.g., +20%, -30%)
volume: Controls speech loudness.
Values:
- silent, x-soft, medium (default), loud, x-loud
- Decibel adjustments: number with a dB suffix (e.g., -6dB)
- Percentage adjustments (e.g., +20%, -30%)
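The following sketch combines the three prosody settings in one tag, using the standard SSML attribute names pitch, rate, and volume. It also checks percentage values against the limits quoted above. The helper percent_in_range is hypothetical, and requiring an explicit sign on percentages is an assumption based on the examples given.

```python
import re
import xml.etree.ElementTree as ET

# Percentage limits quoted from the parameter descriptions above.
PERCENT_LIMITS = {"pitch": (-83, 100), "rate": (-50, 9900)}


def percent_in_range(attr: str, value: str) -> bool:
    """Return True if value is a signed percentage within the documented limits."""
    match = re.fullmatch(r"([+-]\d+)%", value)
    if match is None:
        return False  # named values such as "slow" are not percentages
    low, high = PERCENT_LIMITS[attr]
    return low <= int(match.group(1)) <= high


ssml = (
    "<speak>"
    '<prosody pitch="+20%" rate="slow" volume="-6dB">'
    "This sentence is higher, slower, and quieter."
    "</prosody>"
    "</speak>"
)

# Parse the document and read back the attributes that were set.
prosody = ET.fromstring(ssml).find("prosody")
print(prosody.attrib)
```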
break
The break tag controls pauses between words, following the W3C SSML specification.
Parameters
strength: Specifies pause strength.
Values:
- none: 0ms
- x-weak: 250ms
- weak: 500ms
- medium: 750ms
- strong: 1000ms
- x-strong: 1250ms
time: Specifies pause duration (0-10 seconds).
Values:
- Milliseconds: number with an ms suffix (e.g., 100ms)
- Seconds: number with an s suffix (e.g., 1s)
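A short sketch of both forms of pause, using the W3C SSML attribute names strength and time on a self-closing break tag. The sentence text is illustrative only.

```python
import xml.etree.ElementTree as ET

ssml = (
    "<speak>"
    'First point.<break strength="strong"/>'  # named strength (1000ms per the list above)
    'Second point.<break time="500ms"/>'      # explicit duration
    "Third point."
    "</speak>"
)

# Collect the attributes of each <break> in document order.
breaks = [b.attrib for b in ET.fromstring(ssml).findall("break")]
print(breaks)
```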
emphasis
The emphasis tag adds or removes emphasis from text. It modifies speech much as prosody does, but without requiring you to set individual speech attributes.
Parameters
level: Specifies emphasis level.
Values:
- reduced
- moderate
- strong
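As an illustration, the sketch below emphasizes a single word using the W3C SSML attribute name level. The sentence is made up for the example.

```python
import xml.etree.ElementTree as ET

ssml = '<speak>I <emphasis level="strong">really</emphasis> mean it.</speak>'

# Read back the emphasized word and its level.
emphasis = ET.fromstring(ssml).find("emphasis")
print(emphasis.get("level"), emphasis.text)
```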
sub
The sub tag substitutes a different pronunciation for the text it contains, following the W3C SSML specification.
Parameters
alias: Specifies the text to be spoken instead of the enclosed text.
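A minimal sketch of a substitution, using the W3C SSML attribute name alias: the written form stays in the text while the alias is what gets spoken. The abbreviation chosen here is only an example.

```python
import xml.etree.ElementTree as ET

ssml = (
    "<speak>"
    '<sub alias="World Wide Web Consortium">W3C</sub> publishes the SSML standard.'
    "</speak>"
)

# The element text is the written form; the alias attribute is the spoken form.
sub = ET.fromstring(ssml).find("sub")
print(sub.text, "->", sub.get("alias"))
```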
speechify:style
The speechify:style tag controls the emotion of the voice. See Emotion Control for the full list of 13 supported emotions and best practices.
Parameters
Sets the voice emotion. Values: angry, cheerful, sad, terrified, relaxed, fearful, surprised, calm, assertive, energetic, warm, direct, bright.
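The sketch below builds a speechify:style tag and rejects values outside the list above. The attribute name emotion is an assumption inferred from the parameter description, and the helper name style is hypothetical; check the API reference before relying on either.

```python
from xml.sax.saxutils import escape

# The 13 emotion values listed above.
EMOTIONS = {
    "angry", "cheerful", "sad", "terrified", "relaxed", "fearful", "surprised",
    "calm", "assertive", "energetic", "warm", "direct", "bright",
}


def style(text: str, emotion: str) -> str:
    """Wrap plain text in a speechify:style tag (attribute name is assumed)."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")
    return f'<speechify:style emotion="{emotion}">{escape(text)}</speechify:style>'


ssml = f"<speak>{style('Great news!', 'cheerful')}</speak>"
print(ssml)
```

The result is not parsed with a generic XML parser here, because the speechify: prefix is not bound to a namespace in this fragment and a strict parser would reject it.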