LusakaLang Topic Analysis Model

This model was fine‑tuned from its sister model, `mbert_LusakaLang_Sentiment_Analysis`, which was itself trained on sentiment data spanning English, Bemba, Nyanja, Zambian slang, and the mixed Zambian language varieties commonly used in everyday communication.

Training Details

- Base model: `mbert_LusakaLang_Sentiment_Analysis`
- Epochs: 20  
- Class weights: enabled (to correct class imbalance)  
- Optimizer: AdamW  
- Loss: Weighted cross‑entropy  
- Temperature scaling: T = 2.3 (applied at inference time)
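The class weights mentioned above are typically derived from label frequencies, with rare classes receiving larger weights so the weighted cross‑entropy loss penalizes mistakes on them more heavily. The card does not specify the exact weighting scheme; a minimal sketch using the common inverse‑frequency formula (an assumption, not the confirmed method) looks like this:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Compute per-class weights inversely proportional to class frequency.

    This is the common sklearn-style 'balanced' formula:
    weight(c) = total_samples / (n_classes * count(c)).
    The actual scheme used for this model is not documented.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}

# Hypothetical imbalanced topic labels: rare classes get larger weights.
labels = ["pricing"] * 8 + ["safety"] * 2
weights = inverse_frequency_weights(labels)
```

With this data, `weights["safety"]` (2.5) is four times `weights["pricing"]` (0.625), so the weighted cross‑entropy loss counteracts the 4:1 class imbalance.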

Why Temperature Scaling?

Class‑weighted training tends to produce sharp, overconfident logits.  
Dividing the logits by T = 2.3 before the softmax softens the output distribution and improves:

- Confidence calibration  
- Noise robustness  
- Handling of positive/neutral text  
- Foreign‑language generalization  
- Reduction of overconfident misclassifications  
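Temperature scaling divides the logits by T before the softmax, which flattens the probability distribution without changing the predicted class. A minimal sketch (the logit values are illustrative, not taken from the model):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling.

    T > 1 flattens the distribution, reducing the overconfidence
    that class-weighted training tends to introduce; T = 1 is the
    standard softmax. The argmax is unchanged for any T > 0.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]          # hypothetical sharpened logits
p_raw = softmax(logits)           # overconfident: top prob ~0.93
p_cal = softmax(logits, 2.3)      # calibrated:    top prob ~0.67
```

The predicted topic is identical in both cases; only the reported confidence changes, which is what calibration is about.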

Training Data

The dataset was primarily synthetic, generated to simulate realistic ride‑hailing feedback in Zambia.  
To ensure authenticity:

- All samples were reviewed by a native Zambian speaker  
- Mixed‑language and slang patterns were corrected  
- Local idioms and slang were added  
- Unnatural AI‑generated phrasing was removed  
- Bemba/Nyanja grammars and tone were validated  

This hybrid approach ensures that the dataset reflects real Zambian communication style.

Train and Validation Loss

(figure: train and validation loss curves)

Confusion Matrix

(figure: confusion matrix)

Word Cloud

(figure: word cloud of the training corpus)

Model size: 0.2B parameters (F32, safetensors)