Text-to-Speech Using the 'Chatterbox' Engine


[Up] [Top]

Documentation for package ‘chatterbox’ version 0.2.1

Help Pages

A B C D E F G H I L M N P Q R S T U V W

-- A --

apply_llama3_rope_scaling Apply Llama3-style RoPE scaling
apply_rotary_emb_s3 Apply rotary position embeddings
apply_rotary_pos_emb Apply rotary position embeddings to Q and K
attention_block Attention block for perceiver

-- B --

basic_res_block Basic residual block for FCM
basic_transformer_block Basic transformer block

-- C --

campplus CAMPPlus speaker encoder
cam_dense_tdnn_block CAM Dense TDNN Block (multiple layers with dense connections)
cam_dense_tdnn_layer CAM Dense TDNN Layer
cam_layer CAM (Context-Aware Masking) Layer
causal_block1d Causal Block 1D - CausalConv + LayerNorm + Mish
causal_cfm Causal Conditional Flow Matching
causal_conv1d Causal Conv1d - pads left only
causal_masked_diff_xvec Causal Masked Diff with Xvector
causal_resnet_block1d Causal ResNet Block 1D
cfm_attention Self-attention for transformer block
cfm_estimator CFM Estimator (ConditionalDecoder)
chatterbox Create (and load) a Chatterbox TTS model
chatterbox_gc_options Recommended torch garbage-collection settings for chatterbox
compute_rope_frequencies Compute rotary position embeddings frequencies
compute_ve_mel Compute mel spectrogram for voice encoder
conformer_encoder_layer Conformer Encoder Layer
conv_rnn_f0_predictor Convolutional RNN F0 Predictor
create_kv_cache Create pre-allocated KV cache
create_mel_filterbank Create mel filterbank
create_voice_embedding Create voice embedding from reference audio

-- D --

dense_layer Dense layer for final embedding
download_chatterbox_models Download Chatterbox Models from HuggingFace
download_chatterbox_turbo_models Download Chatterbox Turbo Models from HuggingFace
drop_invalid_tokens Drop invalid speech tokens

-- E --

espnet_rel_positional_encoding Sinusoidal positional encoding (Espnet RelPositionalEncoding)

-- F --

fcm_module Factorized Convolutional Module (FCM)
feed_forward Feed-forward network for transformer Matches diffusers FeedForward: net = [GELU(proj), Dropout, Linear]
fsmn_multi_head_attention FSMN Multi-Head Attention
fsq_codebook FSQ Codebook module
fsq_vector_quantization FSQ Vector Quantization wrapper

-- G --

gelu_with_proj GELU activation with projection (matches diffusers GELU structure)
generate Generate speech from text
generate_batch Generate speech for several texts with one batched synthesis pass
get_conv_padding Get padding for convolution
get_traced_layers Get or create traced layers for cached inference
gpt2_attention GPT-2 Attention (combined QKV projection)
gpt2_block GPT-2 Transformer Block
gpt2_config GPT-2 Model Configuration
gpt2_layer_norm GPT-2 Layer Normalization
gpt2_mlp GPT-2 MLP (GELU activation)
gpt2_model GPT-2 Model (transformer backbone)

-- H --

hifigan_resblock HiFiGAN Residual Block
hift_generator HiFTNet Generator

-- I --

init_cache_from_first Initialize cache with first token K/V values
integrated_loudness Integrated loudness (ITU-R BS.1770-4)
is_loaded Check if model is loaded

-- L --

learned_position_embeddings Learned position embeddings module
linear_no_subsampling Linear No Subsampling layer
llama_attention Llama attention module
llama_config_520m Create Llama 520M configuration
llama_decoder_layer Llama decoder layer
llama_mlp Llama MLP module
llama_model Llama model (decoder only)
llama_rms_norm RMS Normalization module
load_chatterbox Load Chatterbox model weights
load_chatterbox_turbo Load Chatterbox Turbo model weights
load_conformer_encoder_weights Load Conformer Encoder weights
load_llama_weights Load weights from safetensors into Llama model
load_t3_turbo_weights Load T3 turbo weights from safetensors
load_t3_weights Load T3 weights from safetensors
load_tokenizer Load tokenizer from JSON file (internal)
load_voice_embedding Load a voice embedding from disk
load_voice_encoder_weights Load voice encoder weights from safetensors

-- M --

make_non_pad_mask_s3 Create non-padding mask
make_pad_mask Create padding mask
mask_to_bias Convert mask to attention bias
mish_activation Mish activation
models_available Check if Models are Downloaded

-- N --

normalize_loudness Normalize audio to a target loudness
normalize_tts_text Normalize text for TTS

-- P --

pad_audio_for_tokenizer Pad audio to multiple of token rate
perceiver_resampler Perceiver resampler for conditioning compression
positionwise_feedforward Positionwise Feed Forward
precompute_freqs_cis Precompute rotary position embedding frequencies
pre_lookahead_layer Pre-Lookahead Layer
print.chatterbox Print method for chatterbox
print.chatterbox_gc_options Print method for chatterbox_gc_options
print.voice_embedding Print method for voice_embedding
punc_norm Normalize punctuation for TTS

-- Q --

quick_tts Quick TTS - one-line text-to-speech

-- R --

read_audio Read audio file
reflection_pad1d Reflection padding for 1D (nn_reflection_pad1d equivalent)
rel_position_attention Relative Position Multi-Headed Attention
resample_audio Resample audio
rotate_half Rotate half of the tensor for RoPE

-- S --

s3gen S3Gen Token to Waveform
s3_audio_encoder S3 Audio Encoder V2
s3_log_mel_spectrogram Compute log mel spectrogram for S3Tokenizer
s3_multi_head_attention Multi-Head Attention base module
s3_residual_attention_block Residual attention block
s3_tokenizer S3Tokenizer V2 module
s3_tokenizer_config S3Tokenizer model configuration
save_voice_embedding Save a voice embedding to disk
serve Serve chatterbox over HTTP
sine_gen Sine Generator
sinusoidal_pos_emb Sinusoidal positional embedding for timesteps
snake_activation Snake activation function
source_module_hn_nsf Source Module for Neural Source Filter
statistics_pooling Statistics pooling

-- T --

t3_cond Create T3 conditioning object
t3_cond_enc T3 conditioning encoder
t3_cond_to_device Move T3 conditioning to device
t3_config_english Create T3 configuration (English-only)
t3_config_turbo Create T3 turbo configuration (GPT-2 backbone)
t3_inference_traced T3 inference with JIT tracing (optimized)
t3_model T3 Token-to-Token TTS model
t3_model_turbo T3 Token-to-Token TTS model (Turbo variant with GPT-2 backbone)
tdnn_layer TDNN Layer
timestep_embedding Timestep embedding MLP
tokenize_text Encode text to token IDs using BPE
traceable_attention Traceable attention module with pre-allocated KV cache
traceable_decoder_layer Traceable decoder layer with pre-allocated KV cache
traceable_kv_projector Traceable K/V projection module
traceable_transformer_cached Traceable transformer for cached inference
traceable_transformer_first Traceable transformer for first token (no cache)
transit_layer Transit layer (channel reduction)
transpose_layer Transpose layer for use in sequential
tts_chunked Generate speech for long text (the long-form policy layer)
tts_to_file Generate speech and save to file
turbo_models_available Check if Turbo Models are Downloaded

-- U --

update_kv_cache Update KV cache with new K/V values
update_valid_mask Update valid mask to include new position
upsample_1d Upsample 1D
upsample_conformer_encoder Upsample Conformer Encoder
upsample_conformer_encoder_full Upsample Conformer Encoder

-- V --

voice_convert Convert speech to a target voice
voice_encoder Voice encoder module
voice_encoder_config Voice encoder configuration

-- W --

write_audio Write audio file