Chatterbox Egyptian Arabic (Masri) TTS

Model Summary

Chatterbox Egyptian Arabic (Masri) TTS is a Text-to-Speech model built on top of the Chatterbox Multilingual TTS architecture and configured to generate Egyptian Arabic (Masri) speech.

The model supports:

- Egyptian Arabic text input (language_id="ar")
- Natural prosody and conversational tone
- Optional reference audio prompting for speaker/style transfer

This repository contains the model checkpoints and assets required for inference.

Supported Language

Arabic (ar)
- Intended usage: Egyptian Arabic (Masri) text
- Not optimized for Modern Standard Arabic (MSA) pronunciation

Intended Use

Primary Use Cases

- Egyptian Arabic voice synthesis
- Conversational agents and assistants
- Prototyping Arabic voice UX
- Content creation (narration, demos, accessibility)
- Research and experimentation in Arabic TTS

Out-of-Scope Uses

- Voice impersonation without consent
- Identity spoofing or deceptive content
- Legal, medical, or emergency-critical systems
- Applications that require guaranteed accent purity across Arabic dialects

Inference Behavior

Input

- Text in Egyptian Arabic
- Maximum recommended length: ~300 characters
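
One way to respect the recommended length is to split longer text into chunks before synthesis and generate each chunk separately. The sketch below is a hypothetical helper, not part of the Chatterbox API: the name `split_text`, the punctuation set, and the fallback to word-level splitting are all assumptions chosen for illustration.

```python
import re

MAX_CHARS = 300  # recommended upper bound from this card; not a hard limit

# Split after sentence-ending punctuation, including the Arabic question mark.
_SENTENCE_END = re.compile(r"(?<=[.!?؟…])\s+")


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    A single word longer than max_chars is kept whole and will exceed the limit.
    """
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        # Fall back to word-level pieces when one sentence is too long.
        pieces = [sentence] if len(sentence) <= max_chars else sentence.split()
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to `model.generate` in turn. Prosody may still differ between chunks, so listening to the joined result is advisable.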

Optional Reference Audio

A short reference clip may be provided to influence:
- Speaker identity
- Voice style
- Prosody

If the reference clip is not Egyptian Arabic, its accent may leak into the generated speech.
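
The following sketch shows how a reference clip might be wired into generation. It rests on two assumptions that this card does not confirm: that `generate` accepts an `audio_prompt_path` keyword, as upstream Chatterbox does, and that lowering `cfg_weight` to 0 reduces accent carry-over from a non-Egyptian reference, which upstream Chatterbox guidance suggests. The helper `build_generate_kwargs` is hypothetical.

```python
from typing import Any, Optional


def build_generate_kwargs(
    text: str,
    reference_wav: Optional[str] = None,
    reference_is_egyptian: bool = True,
) -> dict[str, Any]:
    """Assemble keyword arguments for model.generate() (hypothetical helper)."""
    kwargs: dict[str, Any] = {
        "text": text,
        "language_id": "ar",
        "temperature": 0.8,
        "cfg_weight": 0.5,
        "exaggeration": 0.5,
    }
    if reference_wav is not None:
        # Assumption: generate() takes audio_prompt_path, as in upstream Chatterbox.
        kwargs["audio_prompt_path"] = reference_wav
        if not reference_is_egyptian:
            # Assumption: cfg_weight=0 limits accent leakage from the reference.
            kwargs["cfg_weight"] = 0.0
    return kwargs


# Usage with a loaded model (not executed here):
# wav = model.generate(**build_generate_kwargs(text, "speaker.wav"))
```

Keeping the parameters in a plain dictionary makes it easy to log the exact settings used for each clip.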

Example Usage

import torch
from huggingface_hub import snapshot_download
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Select device
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Download model checkpoint
ckpt_dir = snapshot_download(
    repo_id="oddadmix/chatterbox-egyptian-v0",
    repo_type="model",
    revision="main",
)

# Load model
model = ChatterboxMultilingualTTS.from_checkpoint(
    str(ckpt_dir) + "/",
    DEVICE
)

# Optional: move to device explicitly
if hasattr(model, "to"):
    model.to(DEVICE)

# Egyptian Arabic (Masri) text
text = "أنا رايح الشغل دلوقتي وهكلمك أول ما أوصل."

# Generate speech
wav = model.generate(
    text=text,
    language_id="ar",
    temperature=0.8,
    cfg_weight=0.5,
    exaggeration=0.5,
)

# Save output audio
import soundfile as sf
sf.write(
    "egyptian_tts.wav",
    wav.squeeze(0).cpu().numpy(),
    model.sr
)

print("Audio saved as egyptian_tts.wav")

Limitations

- Accent transfer from non-Egyptian reference audio can override the Egyptian dialect
- Long-form synthesis may lose prosodic consistency
- The model is not fine-tuned exclusively on Egyptian Arabic corpora
- No guarantees about speaker identity
- Dialectal spelling variations affect pronunciation
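
Because spelling variations affect pronunciation, it may help to normalize input text to one consistent spelling before synthesis. The sketch below is illustrative only: `normalize_masri` and the `SPELLING_VARIANTS` table are hypothetical, and the two example entries are placeholders rather than spellings verified against this model.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, purely decorative

# Hypothetical examples; a real table would be curated by listening to output.
SPELLING_VARIANTS = {
    "دلوقت": "دلوقتي",
    "ازاي": "إزاي",
}


def normalize_masri(text: str, variants: dict[str, str] = SPELLING_VARIANTS) -> str:
    """Apply light, reversible-in-spirit cleanup before synthesis."""
    # Use composed code points so visually identical words compare equal.
    text = unicodedata.normalize("NFC", text)
    # Drop elongation marks, which carry no phonetic information.
    text = text.replace(TATWEEL, "")
    # Replace whole whitespace-delimited words only; this also collapses
    # repeated spaces. Words with attached punctuation are left unchanged.
    words = [variants.get(word, word) for word in text.split()]
    return " ".join(words)
```

Whole-word matching is a deliberate choice here: substring replacement would corrupt words that merely contain a variant spelling.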

Ethical Considerations

This model can generate realistic human-like speech.
Users must:

- Disclose synthetic audio where appropriate
- Obtain consent for reference voices
- Avoid misuse for impersonation or deception

Citation

If you use this model in research or demos, please cite:

@misc{chatterbox_egyptian_tts,
  title={Chatterbox Egyptian Arabic (Masri) Text-to-Speech},
  author={oddadmix},
  year={2025},
  howpublished={\url{https://huggingface.co/oddadmix/chatterbox-egyptian-v0}}
}

Model size: 0.5B params (Safetensors)
Tensor type: F32
Base model: ResembleAI/chatterbox