I need to implement some state-of-the-art knowledge distillation (KD) methods to distill the dark knowledge of a teacher network into a student network with PyTorch. I would really appreciate any advice on finding state-of-the-art KD methods.
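As a baseline before trying more recent methods, here is a minimal sketch of the classic logit-based KD loss (Hinton et al., "Distilling the Knowledge in a Neural Network"): a KL-divergence term between temperature-softened teacher and student distributions, mixed with the usual cross-entropy on the hard labels. The function name and the `T`/`alpha` defaults are my own choices, not from any specific paper's code.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD loss.

    Combines KL divergence between softened teacher/student distributions
    (scaled by T^2 to keep gradient magnitudes comparable across temperatures)
    with cross-entropy against the ground-truth labels.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In a training loop you would run the teacher in `torch.no_grad()` mode, compute this loss on each batch, and backpropagate only through the student.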
See the leaderboard at https://paperswithcode.com/sota/knowledge-distillation-on-cifar-100. The method in https://arxiv.org/pdf/2112.00459v3.pdf is very simple to implement; other methods can be more computationally expensive or require training many additional layers. – Roy Apr 12 '23 at 14:38