Madlab Synthetic Data Generator

🧠 Overview

The Madlab SDG 1.2B is part of the MadlabOSS Synthetic Data Generator family, a suite of small, efficient synthetic data generators designed for rule-consistent, semantically coherent variation.
This model was trained on a closed-source dataset created through a multi-stage synthetic data generation process using a modified Madlab training pipeline. It is the first model in the family built on the LFM2.5-instruct foundation, succeeding the earlier LFM2-based iterations (see Evaluation).


πŸš€ Intended Use

This model is optimized for:

  • Madlab synthetic data generation

It is not intended as a general-purpose chatbot.
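
A minimal usage sketch with Hugging Face Transformers, assuming the standard chat API; the prompt wording and the example seed pair are illustrative assumptions, not a documented input format:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MadlabOSS/LFM2.5-1.2B-Instruct-SDG"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Illustrative request; the exact prompt format the model was tuned on
# is an assumption, not documented here.
messages = [{
    "role": "user",
    "content": (
        "Generate 5 variations of the following pair, preserving its meaning.\n"
        "Input: turn on the lights\nTarget: lights_on"
    ),
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```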


🧩 Model Details

  • Base Model: LFM2.5-1.2B-instruct
  • Parameter Count: 1.2 billion
  • Training Type: Supervised fine-tuning
  • Sequence Length: 1024 tokens
  • Precision: FP16
  • Framework: PyTorch / Transformers


πŸ“¦ Training Data

The model was trained on 1444 compressed and encoded dataset pairs, generated entirely with Madlab (a toy illustration of the pair encoding follows below). The dataset was built for:

  • High variation in output
  • Preservation of semantic meaning
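
The actual pair format and encoding scheme are not published. Purely as a hypothetical illustration, an input/target pair could be serialized as JSON, gzip-compressed, and base64-encoded; every name and field below is an assumption:

```python
import base64
import gzip
import json

# Hypothetical only: serialize one input/target pair as JSON, gzip-compress it,
# and base64-encode the result. The actual Madlab scheme is not published.
pair = {"input": "turn on the lights", "target": "lights_on"}
encoded = base64.b64encode(gzip.compress(json.dumps(pair).encode("utf-8")))

# Round-trip to confirm the pair survives compression and encoding.
decoded = json.loads(gzip.decompress(base64.b64decode(encoded)))
assert decoded == pair
print(encoded.decode("ascii"))
```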

πŸ‹οΈ Training Procedure

Hyperparameters

  • Epochs: 6
  • Batch size: 48
  • Learning rate: cosine schedule, peak ~4e-5
  • Optimizer: AdamW
  • Gradient clipping: 1.0
  • Gradient accumulation: 1
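
For reference, a minimal sketch of how these settings map onto Hugging Face TrainingArguments; the output path and warmup are assumptions, and the actual run used a modified Madlab pipeline rather than the vanilla Trainer:

```python
from transformers import TrainingArguments

# A sketch mapping the listed hyperparameters onto TrainingArguments.
args = TrainingArguments(
    output_dir="madlab-sdg-1.2b",   # hypothetical path
    num_train_epochs=6,
    per_device_train_batch_size=48,
    gradient_accumulation_steps=1,
    learning_rate=4e-5,             # peak LR
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    max_grad_norm=1.0,              # gradient clipping
    warmup_ratio=0.03,              # assumption; warmup is not documented
    fp16=True,                      # FP16, per Model Details
)
```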

Hardware

Training was performed on:

  • RTX 6000 Blackwell (96GB)

πŸ“Š Evaluation

(Figure: multi_model_dashboard)

Synthetic Data Expansion Benchmark

A curated set of 30 input/target seed pairs was programmatically expanded using a Python script (a sketch of the harness follows below). The task is to generate 5 variations of each incoming pair.
Metrics include seed pairs covered, total variation count, and semantic quality.
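
The benchmark script itself is not published; the following is a minimal sketch under stated assumptions, with the model call stubbed out and semantic-quality scoring omitted:

```python
import json

def generate_variations(pair: dict, n: int = 5) -> list[dict]:
    """Stub for the call into the SDG model. The real benchmark's prompting,
    decoding, and output parsing are not published; this is an assumption."""
    return [dict(pair) for _ in range(n)]

def run_benchmark(seed_pairs: list[dict], n: int = 5) -> dict:
    """Expand every seed pair and report coverage and variation counts."""
    total_variations = 0
    seeds_covered = 0
    for pair in seed_pairs:
        variations = generate_variations(pair, n=n)
        total_variations += len(variations)
        if variations:  # a seed counts as covered once it yields any variation
            seeds_covered += 1
    return {"seeds_covered": seeds_covered, "total_variations": total_variations}

if __name__ == "__main__":
    # Illustrative seed pairs; the real benchmark uses a curated set of 30.
    seeds = [{"input": "turn on the lights", "target": "lights_on"}]
    print(json.dumps(run_benchmark(seeds, n=5), indent=2))
```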

Note: run numbers below are not aligned with the multi_model_dashboard figure.

| Run | Model | Semantic Quality (0-10) | Variations | Seeds Covered | Efficiency (variations per B params) | Dataset |
|----:|:------|------------------------:|-----------:|--------------:|-------------------------------------:|:--------|
| 1 | LFM2-350M-f16 | 6.5 | 94 | 23 | 268.57 | Madlab sdg small |
| 2 | LFM2-350M-f16 | 3.5 | 46 | 11 | 131.43 | base model |
| 3 | LFM2-350M-f16 | 6.5 | 97 | 22 | 277.14 | Madlab sdg small |
| 4 | Qwen3-coder-30B-instruct-q8 | 8.2 | 149 | 26 | 4.97 | base model |
| 5 | LFM2-350M-f16 | 7.5 | 136 | 21 | 388.57 | Madlab sdg medium |
| 6 | LFM2-2.6B-f16 | 9.0 | 137 | 25 | 52.69 | Madlab sdg medium |
| 7 | LFM2-2.6B-f16 | 9.9 | 180 | 25 | 69.23 | Madlab sdg large |
| 8 | LFM2-2.6B-f16 | 6.2 | 157 | 20 | 60.38 | Madlab sdg test |
| 9 | LFM2-2.6B-f16 | 10.0 | 248 | 27 | 95.38 | Madlab sdg large |
| 10 | Qwen3-235B-q3-k_m | 9.5 | 150 | 27 | 0.64 | base model |
| 11 | LFM2.5-1.2B-instruct-f16 | 9.1 | 244 | 30 | 203.33 | Madlab sdg large |

Efficiency is variations generated per billion parameters, e.g. run 11: 244 / 1.2 ≈ 203.33.

Qualitative Behavior

  • Overperforms in variation count relative to its parameter count (see the Efficiency column)
  • Maintains strict semantic correctness

πŸ”’ Safety

This model is a synthetic data generator. It is not designed for conversational use and is not suitable for anything other than generating synthetic datasets.

It is not designed for:

  • Political advice
  • Medical advice
  • Legal advice
  • General-purpose conversation

⚠️ Limitations

  • Not a general assistant
  • Not trained for coding, math, or open-domain reasoning
  • May refuse tasks outside the Madlab SDG scope
