LusakaLang Topic Analysis Model
This model was fine-tuned from its sister model, `mbert_LusakaLang_Sentiment_Analysis`, which was itself fine-tuned on sentiment data
spanning English, Bemba, Nyanja, Zambian slang, and the mixed Zambian language varieties commonly used in everyday communication.
Training Details
- Base model: `mbert_LusakaLang_Sentiment_Analysis`
- Epochs: 20
- Class weights: enabled (to correct class imbalance)
- Optimizer: AdamW
- Loss: Weighted cross‑entropy
- Temperature scaling: T = 2.3 (applied at inference time)
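The class-weighted loss above can be sketched as follows. This is a minimal numpy illustration, not the actual training code; the topic names and class counts are made-up assumptions, and the weighting formula is the common "balanced" inverse-frequency scheme (n_samples / (n_classes * count)).

```python
import numpy as np

def class_weights(counts):
    # "Balanced" inverse-frequency weights: rare classes get larger weights
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(logits, label, weights):
    # Cross-entropy for one example, scaled by the true class's weight
    z = logits - logits.max()                 # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -weights[label] * log_probs[label]

# Hypothetical imbalanced topic counts (illustrative only)
counts = [800, 150, 50]
w = class_weights(counts)
loss = weighted_cross_entropy(np.array([2.0, 0.5, -1.0]), label=2, weights=w)
```

With these counts the minority class receives the largest weight, so misclassifying it contributes more to the loss, which is the imbalance correction the class-weighted training refers to.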
Why Temperature Scaling?
Class‑weighted training tends to sharpen the logits, producing overconfident predictions.
Dividing the logits by T = 2.3 at inference softens the output distribution, which improves:
- Confidence calibration
- Noise robustness
- Handling of positive/neutral text
- Foreign‑language generalization
- Reduction of overconfident misclassifications
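Temperature scaling divides the logits by T before the softmax, flattening overconfident distributions without changing the predicted class. A minimal numpy sketch, using the model's T = 2.3 (the logits here are made-up example values):

```python
import numpy as np

T = 2.3  # temperature applied at inference time

def softmax(x):
    z = x - x.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])   # hypothetical sharp logits
p_raw = softmax(logits)              # uncalibrated probabilities
p_cal = softmax(logits / T)          # temperature-scaled probabilities

# argmax is unchanged, but the top-class confidence drops
```

Because dividing by a constant preserves the ordering of the logits, calibration affects only the confidence scores, not which topic is predicted.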
Training Data
The dataset was primarily synthetic, generated to simulate realistic ride‑hailing feedback in Zambia.
To ensure authenticity:
- All samples were reviewed by a native Zambian speaker
- Mixed-language and slang patterns were corrected
- Local idioms and slang were added
- Unnatural AI‑generated phrasing was removed
- Bemba/Nyanja grammar and tone were validated
This hybrid approach ensures that the dataset reflects real Zambian communication styles.
Train and Validation Loss
Confusion Matrix
Word Cloud
Model tree for Kelvinmbewe/mbert_LusakaLang_Topic
- Base model: google-bert/bert-base-multilingual-cased
Evaluation results (self-reported, LusakaLang Topic Dataset validation set)
- Accuracy: 0.993
- Precision: 0.987
- Recall: 0.991
- Macro F1: 0.989
- Micro F1: 0.993
- Validation loss: 0.052


