It’s basically when you use a larger model (the teacher) to train a smaller one (the student). You train the student on outputs generated by the teacher (in the classic setup, its soft probability distributions rather than just its final answers) alongside the ground-truth data. Because the teacher’s outputs carry a richer signal per example than hard labels alone, you end up with a much smaller model that approximates the teacher’s behavior.
The catch is that it’s really hard to keep training a distilled model without degrading what it inherited from the teacher, so people prefer the undistilled original whenever they can get it. Without access to the teacher, a distilled model is basically cripple-ware: you can run it, but you can’t easily improve it further.
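If it helps make it feel less like alchemy, here’s a minimal sketch of a single distillation training step, assuming PyTorch; `teacher`, `student`, `T` (temperature), and `alpha` (loss mix) are just illustrative names, not anything standard:

```python
# Minimal sketch of one knowledge-distillation step (assumes PyTorch;
# `teacher`, `student`, `x`, `y`, and `optimizer` are hypothetical stand-ins).
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, y, optimizer, T=2.0, alpha=0.5):
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(x)      # soft targets from the teacher
    student_logits = student(x)

    # KL divergence between softened teacher and student distributions
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Standard cross-entropy against the ground-truth labels
    hard_loss = F.cross_entropy(student_logits, y)

    # Blend the two signals: imitate the teacher and fit the real labels
    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The `alpha` knob is the “teacher outputs plus ground-truth data” part: it balances imitating the teacher against fitting the real labels.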
Thanks for explaining!