whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR
This model is a fine-tuned version of openai/whisper-small optimized for Turkish Automatic Speech Recognition (ASR).
Model Description
Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.
- Base Model: openai/whisper-small
- Language: Turkish (tr)
- Task: Automatic Speech Recognition
- Dataset: Codyfederer/tr-full-dataset
Training Data
The model was fine-tuned on Codyfederer/tr-full-dataset, which consists of 3,000 Turkish audio-transcription samples split 90% for training and 10% for testing (about 2,700 and 300 samples, respectively).
Training Parameters
Training used the Hugging Face Trainer API with the following Seq2SeqTrainingArguments:
- output_dir: ./whisper-small-tr
- per_device_train_batch_size: 16
- gradient_accumulation_steps: 1
- learning_rate: 3e-5
- warmup_steps: 50
- num_train_epochs: 3
- weight_decay: 0.005
- gradient_checkpointing: True
- fp16: True
- eval_strategy: "steps"
- per_device_eval_batch_size: 8
- predict_with_generate: True
- generation_max_length: 225
- save_steps: 200
- eval_steps: 200
- logging_steps: 25
- report_to: ["tensorboard"]
- load_best_model_at_end: True
- metric_for_best_model: "wer"
- greater_is_better: False
- push_to_hub: True
- hub_model_id: whisper-small-tr
- optim: adamw_torch
- dataloader_num_workers: 4
- dataloader_pin_memory: True
- save_total_limit: 2
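To put these settings in perspective, the following sketch works out roughly how many optimizer steps the run involves and where the step-based evaluations fall. It uses only the numbers listed above plus two assumptions that are not stated in this card: training ran on a single device, and the training split holds 90% of the 3,000 samples.

```python
import math

# Values taken from the training setup described above.
total_samples = 3000
train_fraction = 0.9
per_device_train_batch_size = 16
gradient_accumulation_steps = 1
num_train_epochs = 3
eval_steps = 200
warmup_steps = 50

# Assumption (not stated in the card): a single training device.
num_devices = 1

train_samples = round(total_samples * train_fraction)
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# The last, partially filled batch of each epoch still counts as one step.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Steps at which a step-based evaluation would be triggered.
eval_points = list(range(eval_steps, total_steps + 1, eval_steps))

print(f"training samples:      {train_samples}")
print(f"steps per epoch:       {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"evaluations at steps:  {eval_points}")
print(f"warmup share of run:   {warmup_steps / total_steps:.1%}")
```

Under these assumptions the run is short (about 500 steps), so with eval_steps and save_steps set to 200 only two intermediate checkpoints are compared when load_best_model_at_end selects the best model by WER.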
Performance
Test set evaluation results:
- Word Error Rate (WER): 7.75%
- Character Error Rate (CER): 1.95%
- Loss: 0.1321
Fine-tuning improved Turkish ASR performance over the base openai/whisper-small model; baseline WER and CER figures for the base model on this test set are not reported here.
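For readers unfamiliar with the two metrics, the sketch below implements their standard definitions: the edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. This card does not say which tool or text normalization produced the reported scores, so the helper names (`edit_distance`, `wer`, `cer`) and the example sentences are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example: one missing diacritic in a four-word sentence.
reference = "bugün hava çok güzel"
hypothesis = "bugün hava cok güzel"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

A single wrong character costs a whole word in WER but only one character in CER, which is why CER is typically much lower than WER, as in the results above.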
Usage
Basic Usage
from transformers import pipeline
import torch

pipe = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
    chunk_length_s=30,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

audio_file = "path/to/your/audio.mp3"
result = pipe(audio_file)
print(result["text"])
Gradio Demo
import gradio as gr
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
)

def transcribe(audio):
    if audio is None:
        return ""
    return pipe(audio)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Turkish Speech Recognition",
    description="Upload or record Turkish audio to transcribe.",
)

demo.launch(share=True)
Advanced Usage
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr")

# Load the audio and resample it to 16 kHz, the rate Whisper expects
audio, sr = librosa.load("audio.mp3", sr=16000)

# Convert the waveform to log-Mel input features
input_features = processor(audio, sampling_rate=sr, return_tensors="pt").input_features

# Generate token IDs and decode them to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
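Note that the Advanced Usage example passes the whole waveform to the processor in one call. Whisper's feature extractor works on 30-second windows, so a longer recording would need to be split before it is processed this way (the pipeline in Basic Usage handles this through chunk_length_s=30). The following is a minimal sketch of such a split; the helper name `split_into_chunks` is hypothetical, and unlike the pipeline it uses no overlap between chunks, so words at chunk boundaries may be cut.

```python
def split_into_chunks(samples, sampling_rate=16000, chunk_length_s=30):
    """Split a 1-D sequence of audio samples into consecutive fixed-length chunks.

    Works on lists and NumPy arrays alike, since it only relies on len() and slicing.
    The final chunk may be shorter than chunk_length_s.
    """
    chunk_size = int(sampling_rate * chunk_length_s)
    if chunk_size <= 0:
        raise ValueError("sampling_rate and chunk_length_s must be positive")
    return [
        samples[start:start + chunk_size]
        for start in range(0, len(samples), chunk_size)
    ]


# Example: 70 seconds of silence at 16 kHz
audio = [0.0] * (16000 * 70)
chunks = split_into_chunks(audio)
print([len(chunk) / 16000 for chunk in chunks])  # [30.0, 30.0, 10.0]
```

Each chunk can then be passed through the processor and model.generate separately and the decoded texts joined in order.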
Limitations
- Trained on 3,000 samples, which may limit generalization
- Performance may vary on noisy audio or non-standard dialects
- Best results with clear audio at 16kHz sampling rate
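Because the model works best with 16 kHz input, it can help to check a file's sampling rate before transcription. The sketch below does this with Python's standard `wave` module; it only handles uncompressed WAV files (not MP3), and the helper names `wav_sample_rate` and `needs_resampling` are hypothetical.

```python
import wave

TARGET_SAMPLE_RATE = 16000  # sampling rate recommended in the limitations above


def wav_sample_rate(path):
    """Return the sampling rate (in Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_SAMPLE_RATE):
    """Return True if the WAV file is not already at the target sampling rate."""
    return wav_sample_rate(path) != target_rate
```

If `needs_resampling` returns True, the file can be resampled on load, for example with librosa.load(path, sr=16000) as in the Advanced Usage example.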
Citation
@misc{whisper-small-tr,
author = {emredeveloper},
title = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
}
Acknowledgments
- Base model: openai/whisper-small
- Dataset: Codyfederer/tr-full-dataset
- Built with Hugging Face Transformers