Chatterbox Egyptian Arabic (Masri) TTS

Model Summary

Chatterbox Egyptian Arabic (Masri) TTS is a Text-to-Speech model built on top of the Chatterbox Multilingual TTS architecture and configured to generate Egyptian Arabic (Masri) speech.

The model supports:

  • Egyptian Arabic text input (language_id = "ar")
  • Natural prosody and conversational tone
  • Optional reference audio prompting for speaker/style transfer

This repository contains the model checkpoints and assets required for inference.


Supported Language

  • Arabic (ar)
    • Intended usage: Egyptian Arabic (Masri) text
    • Not optimized for Modern Standard Arabic (MSA) pronunciation

Intended Use

Primary Use Cases

  • Egyptian Arabic voice synthesis
  • Conversational agents and assistants
  • Prototyping Arabic voice UX
  • Content creation (narration, demos, accessibility)
  • Research and experimentation in Arabic TTS

Out-of-Scope Uses

  • Voice impersonation without consent
  • Identity spoofing or deceptive content
  • Legal, medical, or emergency-critical systems
  • Applications that require guaranteed accent purity across all Arabic dialects

Inference Behavior

Input

  • Text in Egyptian Arabic
  • Maximum recommended length: ~300 characters
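
Text longer than the recommended length can be split before synthesis. The sketch below is one possible approach, not part of this repository: `split_text` and `MAX_CHARS` are hypothetical names, and breaking on sentence punctuation is an assumption about what works well for this model.

```python
import re

MAX_CHARS = 300  # recommended upper bound stated in this model card


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    A single word longer than max_chars is kept whole rather than cut.
    """
    # Break after . ! ? or the Arabic question mark (U+061F) / semicolon (U+061B).
    sentences = re.split(r"(?<=[.!?\u061f\u061b])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        current = ""
        # The sentence did not fit next to the previous text: rebuild it word
        # by word, starting a new chunk whenever the limit would be exceeded.
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to `model.generate(...)` on its own.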

Optional Reference Audio

  • A short reference clip may be provided to influence:
    • Speaker identity
    • Voice style
    • Prosody

If the reference clip is not in Egyptian Arabic, its accent may leak into the generated speech.
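
A reference clip could be wired into the call roughly as follows. This is a sketch under assumptions: `build_generate_kwargs` is a hypothetical helper, and the keyword `audio_prompt_path` is the name used by upstream Chatterbox, which is assumed (not confirmed by this card) to apply to this checkpoint as well.

```python
from pathlib import Path
from typing import Optional


def build_generate_kwargs(text: str, reference_wav: Optional[str] = None) -> dict:
    """Assemble keyword arguments for model.generate().

    The reference clip is added only when a path is given and the file exists.
    """
    kwargs = {"text": text, "language_id": "ar"}
    if reference_wav is not None:
        path = Path(reference_wav)
        if not path.is_file():
            raise FileNotFoundError(f"Reference clip not found: {path}")
        # Assumption: `audio_prompt_path` is the keyword, as in upstream Chatterbox.
        kwargs["audio_prompt_path"] = str(path)
    return kwargs


# Usage (needs a loaded model; "speaker.wav" is a placeholder path):
# wav = model.generate(**build_generate_kwargs(text, "speaker.wav"))
```

Checking the path up front surfaces a missing clip as a clear error before any synthesis starts.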


Example Usage

import torch
from huggingface_hub import snapshot_download
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Select device
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Download model checkpoint
ckpt_dir = snapshot_download(
    repo_id="oddadmix/chatterbox-egyptian-v0",
    repo_type="model",
    revision="main",
)

# Load model
model = ChatterboxMultilingualTTS.from_checkpoint(
    str(ckpt_dir) + "/",
    DEVICE
)

# Optional: move to device explicitly
if hasattr(model, "to"):
    model.to(DEVICE)

# Egyptian Arabic (Masri) text
text = "أنا رايح الشغل دلوقتي وهكلمك أول ما أوصل."

# Generate speech
wav = model.generate(
    text=text,
    language_id="ar",
    temperature=0.8,
    cfg_weight=0.5,
    exaggeration=0.5,
)

# Save output audio
import soundfile as sf
sf.write(
    "egyptian_tts.wav",
    wav.squeeze(0).cpu().numpy(),
    model.sr
)

print("Audio saved as egyptian_tts.wav")

Limitations

  • A reference clip in another accent can override the Egyptian dialect
  • Long-form synthesis may lose prosodic consistency
  • Not fine-tuned exclusively on Egyptian Arabic corpora
  • No speaker identity guarantees
  • Dialectal spelling variations affect pronunciation
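
One common workaround for the long-form limitation is to synthesize shorter passages separately and join the resulting waveforms with a brief pause. The sketch below only illustrates the joining step. `join_segments` and the 0.2 s default gap are hypothetical choices, and this does not restore prosodic continuity across segments.

```python
import numpy as np


def join_segments(segments, sample_rate: int, gap_seconds: float = 0.2) -> np.ndarray:
    """Concatenate mono waveforms, inserting silence between consecutive ones."""
    if not segments:
        return np.zeros(0, dtype=np.float32)
    # Silence of gap_seconds at the given sample rate.
    gap = np.zeros(int(round(sample_rate * gap_seconds)), dtype=np.float32)
    pieces = []
    for i, segment in enumerate(segments):
        if i > 0:
            pieces.append(gap)
        # Flatten to 1-D so inputs shaped (1, n) and (n,) are both accepted.
        pieces.append(np.asarray(segment, dtype=np.float32).reshape(-1))
    return np.concatenate(pieces)


# Usage (needs a loaded model and a list of short texts):
# parts = [model.generate(text=t, language_id="ar").squeeze(0).cpu().numpy() for t in texts]
# full = join_segments(parts, model.sr)
```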

Ethical Considerations

This model can generate realistic human-like speech.
Users must:

  • Disclose synthetic audio where appropriate
  • Obtain consent for reference voices
  • Avoid misuse for impersonation or deception

Citation

If you use this model in research or demos, please cite:

@misc{chatterbox_egyptian_tts,
  title={Chatterbox Egyptian Arabic (Masri) Text-to-Speech},
  author={oddadmix},
  year={2025},
  howpublished={\url{https://huggingface.co/oddadmix/chatterbox-egyptian-v0}}
}