whisper-small-tr - Fine-tuned Whisper Small for Turkish ASR

This model is a fine-tuned version of openai/whisper-small optimized for Turkish Automatic Speech Recognition (ASR).

Model Description

Whisper is a pre-trained model for automatic speech recognition and speech translation. This version has been fine-tuned on Turkish audio data to improve performance on Turkish speech recognition tasks.

  • Base Model: openai/whisper-small
  • Language: Turkish (tr)
  • Task: Automatic Speech Recognition
  • Dataset: Codyfederer/tr-full-dataset

Training Data

The model was fine-tuned on Codyfederer/tr-full-dataset, a corpus of 3,000 Turkish audio–transcription pairs, split 90% training / 10% testing.
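The 90/10 split works out to 2,700 training and 300 test examples. The exact splitting code is not given in this card; a minimal, dependency-free sketch of a seeded 90/10 index split (the actual run likely used the `datasets` library's `train_test_split`, which is an assumption):

```python
import random

def train_test_split_indices(n, test_frac=0.1, seed=42):
    """Return (train_idx, test_idx) for a seeded random index split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for reproducibility
    n_test = int(n * test_frac)
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = train_test_split_indices(3000)
print(len(train_idx), len(test_idx))  # 2700 300
```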

Training Parameters

Training used the Hugging Face Seq2SeqTrainer with the following Seq2SeqTrainingArguments:

  • output_dir: ./whisper-small-tr
  • per_device_train_batch_size: 16
  • gradient_accumulation_steps: 1
  • learning_rate: 3e-5
  • warmup_steps: 50
  • num_train_epochs: 3
  • weight_decay: 0.005
  • gradient_checkpointing: True
  • fp16: True
  • eval_strategy: "steps"
  • per_device_eval_batch_size: 8
  • predict_with_generate: True
  • generation_max_length: 225
  • save_steps: 200
  • eval_steps: 200
  • logging_steps: 25
  • report_to: ["tensorboard"]
  • load_best_model_at_end: True
  • metric_for_best_model: "wer"
  • greater_is_better: False
  • push_to_hub: True
  • hub_model_id: whisper-small-tr
  • optim: adamw_torch
  • dataloader_num_workers: 4
  • dataloader_pin_memory: True
  • save_total_limit: 2
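The list above maps directly onto a `Seq2SeqTrainingArguments` call. As a sketch, here is the same configuration as a plain dict; with `transformers` installed, `Seq2SeqTrainingArguments(**training_args)` would reproduce it (that call is assumed, not shown in the card):

```python
# Hyperparameters from the list above, as keyword arguments for
# transformers.Seq2SeqTrainingArguments. Kept as a plain dict here.
training_args = {
    "output_dir": "./whisper-small-tr",
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 3e-5,
    "warmup_steps": 50,
    "num_train_epochs": 3,
    "weight_decay": 0.005,
    "gradient_checkpointing": True,
    "fp16": True,
    "eval_strategy": "steps",
    "per_device_eval_batch_size": 8,
    "predict_with_generate": True,
    "generation_max_length": 225,
    "save_steps": 200,
    "eval_steps": 200,
    "logging_steps": 25,
    "report_to": ["tensorboard"],
    "load_best_model_at_end": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,  # lower WER is better
    "push_to_hub": True,
    "hub_model_id": "whisper-small-tr",
    "optim": "adamw_torch",
    "dataloader_num_workers": 4,
    "dataloader_pin_memory": True,
    "save_total_limit": 2,
}
```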

Performance

Test set evaluation results:

  • Word Error Rate (WER): 7.75%
  • Character Error Rate (CER): 1.95%
  • Loss: 0.1321

The fine-tuned model substantially improves on the base whisper-small model's Turkish ASR performance.
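WER is word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words; CER is the same computed over characters. A minimal sketch of the metric (the training run likely used the `evaluate`/`jiwer` libraries, which is an assumption):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (rolling 1-D DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            # deletion, insertion, substitution/match
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (ref[i - 1] != hyp[j - 1]))
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

# one substitution over three reference words -> 1/3
print(wer("merhaba nasılsın bugün", "merhaba nasılsın dün"))
```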

Usage

Basic Usage

```python
from transformers import pipeline
import torch

# chunk_length_s=30 enables transcription of audio longer than 30 seconds
pipe = pipeline(
    task="automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr",
    chunk_length_s=30,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

audio_file = "path/to/your/audio.mp3"
result = pipe(audio_file)
print(result["text"])
```

Gradio Demo

```python
import gradio as gr
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="emredeveloper/whisper-small-tr"
)

def transcribe(audio):
    # gr.Audio with type="filepath" passes a path (or None if no input)
    if audio is None:
        return ""
    return pipe(audio)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Turkish Speech Recognition",
    description="Upload or record Turkish audio to transcribe."
)

demo.launch(share=True)
```

Advanced Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

processor = WhisperProcessor.from_pretrained("emredeveloper/whisper-small-tr")
model = WhisperForConditionalGeneration.from_pretrained("emredeveloper/whisper-small-tr")

# Whisper expects 16 kHz mono audio; librosa resamples on load
audio, sr = librosa.load("audio.mp3", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription[0])
```

Limitations

  • Trained on 3,000 samples, which may limit generalization
  • Performance may vary on noisy audio or non-standard dialects
  • Best results with clear audio at 16kHz sampling rate

Citation

@misc{whisper-small-tr,
  author = {emredeveloper},
  title = {whisper-small-tr: Fine-tuned Whisper Small for Turkish ASR},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/emredeveloper/whisper-small-tr}}
}
