Model Description

The model was first fine-tuned on the SciQ dataset to learn general scientific question generation from context. It was then fine-tuned on a custom IGCSE / A-Level Biology dataset, developed with the assistance of Gemini 3, to adapt it for paraphrasing and academic style transfer. Although the second stage focuses on Biology, the initial SciQ training means the model still performs well in other science domains (e.g., general science and basic physics or chemistry questions).

Examples

Input: generate a question: answer: astronomers context: Earth is just a tiny speck in the universe. Our planet is surrounded by lots of space. Light travels across empty space. Astronomers can study light from stars to learn about the universe. Light is the visible part of the electromagnetic spectrum . Astronomers use the light that comes to us to gather information about the universe.
Output: What type of researchers use the visible part of the electromagnetic spectrum to gather information about the universe?

Input: generate a question: answer: protostars context: Stars form from giant molecular clouds of gas and dust called nebulae. Under gravity, these clouds collapse and heat up to form protostars. When the core temperature becomes high enough, nuclear fusion begins, converting hydrogen into helium. This process releases energy that exerts outward pressure, balancing gravity and stabilizing the star as a main sequence star.
Output: What do we call the nebulae that collapse and heat up to form under gravitational pressure?

Input: generate a question: answer: tunica media context: Arteries are adapted to withstand high blood pressure. Their walls consist of three layers: the tunica intima (endothelium), the tunica media (smooth muscle and elastic fibers), and the tunica externa (collagen). The elastic fibers allow the artery to stretch during ventricular systole and recoil during diastole, helping to smooth out blood flow and maintain pressure.
Output: Which layer of the arterial wall is composed of smooth muscle and elastic fibers?

Input: generate a question: answer: heterotrophs context: Consumers are organisms that depend on other organisms for food. They take in organic molecules by essentially "eating" other living things. They include all animals and fungi. (Fungi don't really "eat"; they absorb nutrients from other organisms.) They also include many bacteria and even a few plants, such as the pitcher plant shown in the Figure below. Consumers are also called heterotrophs. Heterotrophs are classified by what they eat:
Output: What is the name for the group of organisms that depend on other organisms for food?
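
The inputs above all follow the same pattern: the prefix "generate a question:", followed by an "answer:" field and a "context:" field. A minimal sketch of a helper that assembles this string is shown below. The function name `build_prompt` is hypothetical and not part of the model repository, and collapsing whitespace is an assumption, since the model card does not say how the training inputs were normalized.

```python
def build_prompt(answer: str, context: str) -> str:
    """Assemble an input string in the format used by the examples above.

    Hypothetical helper for illustration; not part of the model repository.
    """
    # Collapse runs of whitespace and newlines into single spaces.
    # Assumption: the model card does not state how inputs were normalized.
    answer = " ".join(answer.split())
    context = " ".join(context.split())
    if not answer or not context:
        raise ValueError("both answer and context must be non-empty")
    return f"generate a question: answer: {answer} context: {context}"


prompt = build_prompt(
    "tunica media",
    "Their walls consist of three layers: the tunica intima (endothelium), "
    "the tunica media (smooth muscle and elastic fibers), and the tunica "
    "externa (collagen).",
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a hand-written input.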

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# The tokenizer is loaded from the base T5 checkpoint; the weights come from the fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("banyaroo/t5-paraphrased-question-generation")

# Input format: "generate a question: answer: <answer> context: <passage>"
text = "generate a question: answer: photolysis context: The light-dependent reactions of photosynthesis occur in the thylakoid membranes. Here, light energy splits water (photolysis) to produce oxygen, ATP, and NADPH."

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,  # sampling is enabled, so the output varies between runs
    top_k=5,
    temperature=0.6,
    no_repeat_ngram_size=3,
)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
# Example output (will vary because of sampling):
# Which chemical reaction occurs in the thylakoid membranes to split water?
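
The generation call above samples with top_k=5 and temperature=0.6. As a rough illustration of what those two settings do to the next-token distribution, here is a plain-Python sketch. It is not the transformers implementation; the function name `top_k_temperature_probs` and the logit values are made up for the example.

```python
import math
import random


def top_k_temperature_probs(logits, top_k, temperature):
    """Return sampling probabilities after temperature scaling and top-k filtering.

    Illustrative only; this is not how the transformers library is implemented.
    """
    if top_k < 1 or temperature <= 0:
        raise ValueError("top_k must be >= 1 and temperature must be > 0")
    # Keep the indices of the top_k highest-scoring tokens.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Divide by the temperature; values below 1 sharpen the distribution.
    scaled = {i: logits[i] / temperature for i in keep}
    # Numerically stable softmax over the kept tokens only.
    peak = max(scaled.values())
    exps = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(exps.values())
    probs = [0.0] * len(logits)  # filtered-out tokens keep probability 0
    for i, e in exps.items():
        probs[i] = e / total
    return probs


logits = [2.0, 1.0, 0.5, 0.1, -1.0, -3.0, -3.5]  # made-up scores for 7 tokens
probs = top_k_temperature_probs(logits, top_k=5, temperature=0.6)
next_token = random.choices(range(len(logits)), weights=probs, k=1)[0]
print([round(p, 3) for p in probs], next_token)
```

With these settings only the five highest-scoring tokens can ever be drawn, and a temperature below 1 shifts more probability toward the top candidates, which keeps the generated questions fairly conservative while still allowing some variation.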

Model size: 0.2B params (Safetensors, tensor type F32)

Base model: google-t5/t5-base