Framework of DUKD
When distilling a super-resolution (SR) network, the imitation targets of the student model, i.e., the outputs of the teacher model, are in effect noisy approximations of the ground-truth distribution of high-quality images (GT). Consequently, the teacher's distributional information is overshadowed by the GT no matter how the training data are augmented (i.e., recycled), which limits the effect of knowledge distillation (KD). To exploit the teacher model beyond this GT upper bound, we present Data Upcycling Knowledge Distillation (DUKD), which transfers the teacher's knowledge to the student through upcycled in-domain data derived from the training data. In addition, we impose label-consistency regularization on SR KD via paired invertible augmentations to improve generalizability.
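To make the two ingredients concrete, the sketch below illustrates one possible training step in PyTorch. It is only an interpretation of the ideas above, not the paper's exact procedure: the upcycling operation (re-degrading the teacher's output back to low resolution), the choice of horizontal flip as the invertible augmentation, the L1 losses, and the equal loss weights are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def dukd_step(teacher, student, lr_batch, optimizer, scale=4):
    """One illustrative DUKD-style step (a sketch under assumed design choices)."""
    teacher.eval()
    with torch.no_grad():
        sr_teacher = teacher(lr_batch)            # teacher's SR output on original LR input

    # "Upcycled" in-domain input: re-degrade the teacher output back to LR.
    # (One plausible way to derive new inputs from the training data; assumed here.)
    lr_upcycled = F.interpolate(sr_teacher, scale_factor=1.0 / scale,
                                mode="bicubic", align_corners=False)

    with torch.no_grad():
        sr_teacher_up = teacher(lr_upcycled)      # distillation target with no GT involved

    # KD on upcycled data: the student imitates the teacher directly.
    kd_loss = F.l1_loss(student(lr_upcycled), sr_teacher_up)

    # Label consistency via a paired invertible augmentation (horizontal flip, assumed):
    # augment the input, run the student, invert the augmentation on the output,
    # and require agreement with the teacher's prediction on the un-augmented input.
    flipped_in = torch.flip(lr_upcycled, dims=[-1])
    sr_student_flip = torch.flip(student(flipped_in), dims=[-1])
    lc_loss = F.l1_loss(sr_student_flip, sr_teacher_up)

    loss = kd_loss + lc_loss                      # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the upcycled inputs let the student learn from targets produced solely by the teacher, so its supervision is not capped by the GT, while the invertible-augmentation pair acts as a consistency regularizer on the student's predictions.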