whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR

This model is a fine-tuned version of openai/whisper-small optimized for Turkish Automatic Speech Recognition (ASR).

Model Description

Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.

  • Base Model: openai/whisper-small
  • Language: Turkish (tr)
  • Task: Automatic Speech Recognition
  • Dataset: Codyfederer/tr-full-dataset

Training Data

The model was fine-tuned on Codyfederer/tr-full-dataset, a corpus of 3,000 Turkish audio–transcription pairs, split 90% training / 10% testing.
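The 90/10 split works out to 2,700 training and 300 test examples. The exact splitting code is not given in this card; a minimal, dependency-free sketch of a seeded 90/10 index split (the actual run likely used the `datasets` library's `train_test_split`, which is an assumption):

```python
import random

def train_test_split_indices(n, test_frac=0.1, seed=42):
    """Return (train_idx, test_idx) for a seeded random index split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for reproducibility
    n_test = int(n * test_frac)
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = train_test_split_indices(3000)
print(len(train_idx), len(test_idx))  # 2700 300
```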

Training Parameters

Training used the Hugging Face Seq2SeqTrainer with the following Seq2SeqTrainingArguments:

  • output_dir: ./whisper-small-tr
  • per_device_train_batch_size: 16
  • gradient_accumulation_steps: 1
  • learning_rate: 3e-5
  • warmup_steps: 50
  • num_train_epochs: 3
  • weight_decay: 0.005
  • gradient_checkpointing: True
  • fp16: True
  • eval_strategy: "steps"
  • per_device_eval_batch_size: 8
  • predict_with_generate: True
  • generation_max_length: 225
  • save_steps: 200
  • eval_steps: 200
  • logging_steps: 25
  • report_to: ["tensorboard"]
  • load_best_model_at_end: True
  • metric_for_best_model: "wer"
  • greater_is_better: False
  • push_to_hub: True
  • hub_model_id: whisper-small-tr
  • optim: adamw_torch
  • dataloader_num_workers: 4
  • dataloader_pin_memory: True
  • save_total_limit: 2
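The list above maps directly onto a `Seq2SeqTrainingArguments` call. As a sketch, here is the same configuration as a plain dict; with `transformers` installed, `Seq2SeqTrainingArguments(**training_args)` would reproduce it (that call is assumed, not shown in the card):

```python
# Hyperparameters from the list above, as keyword arguments for
# transformers.Seq2SeqTrainingArguments. Kept as a plain dict here.
training_args = {
    "output_dir": "./whisper-small-tr",
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 3e-5,
    "warmup_steps": 50,
    "num_train_epochs": 3,
    "weight_decay": 0.005,
    "gradient_checkpointing": True,
    "fp16": True,
    "eval_strategy": "steps",
    "per_device_eval_batch_size": 8,
    "predict_with_generate": True,
    "generation_max_length": 225,
    "save_steps": 200,
    "eval_steps": 200,
    "logging_steps": 25,
    "report_to": ["tensorboard"],
    "load_best_model_at_end": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,  # lower WER is better
    "push_to_hub": True,
    "hub_model_id": "whisper-small-tr",
    "optim": "adamw_torch",
    "dataloader_num_workers": 4,
    "dataloader_pin_memory": True,
    "save_total_limit": 2,
}
```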

Performance

Test set evaluation results:

  • Word Error Rate (WER): 7.75%
  • Character Error Rate (CER): 1.95%
  • Loss: 0.1321

The fine-tuned model substantially improves on the base whisper-small model's Turkish ASR performance.
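WER is word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words; CER is the same computed over characters. A minimal sketch of the metric (the training run likely used the `evaluate`/`jiwer` libraries, which is an assumption):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (rolling 1-D DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            # deletion, insertion, substitution/match
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

# one substitution over three reference words -> 1/3
print(wer("merhaba nasılsın bugün", "merhaba nasılsın dün"))
```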

Usage

Basic Usage

```python
from transformers import pipeline
import torch

# chunk_length_s=30 enables transcription of audio longer than 30 seconds
pipe = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
    chunk_length_s=30,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

audio_file = "path/to/your/audio.mp3"
result = pipe(audio_file)
print(result["text"])
```

Gradio Demo

```python
import gradio as gr
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr"
)

def transcribe(audio):
    # gr.Audio with type="filepath" passes a path (or None if no input)
    if audio is None:
        return ""
    return pipe(audio)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Turkish Speech Recognition",
    description="Upload or record Turkish audio to transcribe."
)

demo.launch(share=True)
```

Advanced Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr")

# Whisper expects 16 kHz mono audio; librosa resamples on load
audio, sr = librosa.load("audio.mp3", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription[0])
```

Limitations

  • Trained on 3,000 samples, which may limit generalization
  • Performance may vary on noisy audio or non-standard dialects
  • Best results with clear audio at 16kHz sampling rate

Citation

@misc{whisper-small-tr,
  author = {emredeveloper},
  title = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
}
