
ModReduce: A Multi-Knowledge Distillation Framework


dc.contributor.author Abbas, Yahya
dc.contributor.author Badawy, Abdelhakim
dc.contributor.author Mahfouz, Mohamed
dc.contributor.author Hussein, Samah
dc.contributor.author Ayman, Samah
dc.contributor.author Eraqi, Hesham M.
dc.contributor.author Salama, Cherif
dc.date.accessioned 2024-08-24T23:56:43Z
dc.date.available 2024-08-24T23:56:43Z
dc.date.issued 2024-08-25
dc.identifier.issn 2210-142X
dc.identifier.uri https://journal.uob.edu.bh:443/handle/123456789/5877
dc.description.abstract Deep neural networks have achieved revolutionary results in several domains; nevertheless, they require extensive computational resources and have a large memory footprint. Knowledge distillation research aims to enhance the performance of smaller models by transferring knowledge from larger networks; the transferred knowledge can be categorized into three main types: response-based, feature-based, and relation-based. Existing works have explored using one or two knowledge types; however, we hypothesize that distilling all three knowledge types leads to a more comprehensive transfer of information and improves the student's accuracy. In this paper, we propose ModReduce, a unified knowledge distillation framework that distills all three knowledge types using a combination of offline and online knowledge distillation. ModReduce is a generic framework that employs state-of-the-art methods for each knowledge distillation type to learn a better student; as such, it can be updated with new state-of-the-art methods as they become available. During training, three student instances each learn a single knowledge type from the teacher using offline distillation and then use online distillation to teach each other what they have learned, analogous to peer learning in real life, where different students excel in different parts of a subject taught by their teacher and then help each other learn the remaining parts. During inference, only the best-performing student is used, so no additional inference cost is introduced. Extensive experimentation on 15 different teacher-student architectures demonstrates that ModReduce produces a student that outperforms state-of-the-art methods with an average relative improvement of up to 48.29%, without additional inference cost. Source code is available at https://github.com/Yahya-Abbas/ModReduce. en_US
dc.publisher University of Bahrain en_US
dc.subject Knowledge Distillation; Model Compression; Deep Learning; Response Knowledge; Relational Knowledge; Feature Knowledge en_US
dc.title ModReduce: A Multi-Knowledge Distillation Framework en_US
dc.identifier.doi xxxxxx
dc.volume 16 en_US
dc.issue 1 en_US
dc.pagestart 1 en_US
dc.pageend 10 en_US
dc.contributor.authorcountry Egypt en_US
dc.contributor.authorcountry Egypt en_US
dc.contributor.authorcountry Egypt en_US
dc.contributor.authorcountry Egypt en_US
dc.contributor.authorcountry Egypt en_US
dc.contributor.authorcountry Egypt en_US
dc.contributor.authorcountry Egypt en_US
dc.contributor.authoraffiliation The American University in Cairo en_US
dc.contributor.authoraffiliation The American University in Cairo en_US
dc.contributor.authoraffiliation The American University in Cairo en_US
dc.contributor.authoraffiliation The American University in Cairo en_US
dc.contributor.authoraffiliation The American University in Cairo en_US
dc.contributor.authoraffiliation Amazon en_US
dc.contributor.authoraffiliation The American University in Cairo & Ain Shams University en_US
dc.source.title International Journal of Computing and Digital Systems en_US
dc.abbreviatedsourcetitle IJCDS en_US
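
The abstract above describes a training scheme in which three student instances each distill one knowledge type from the teacher offline and then teach each other online. The sketch below illustrates one plausible reading of that objective; it is not the paper's exact formulation. The per-type losses (Hinton-style response KD, MSE feature matching, batch-similarity relation matching), the mutual-KL peer term, the weights alpha and beta, and all function names are illustrative assumptions; the actual framework plugs in state-of-the-art methods for each knowledge type.

    # Hypothetical sketch of a three-student, multi-knowledge distillation loss
    # in the spirit of the abstract. Loss choices and weights are assumptions.
    import torch
    import torch.nn.functional as F

    def response_kd_loss(student_logits, teacher_logits, T=4.0):
        # Response-based (offline) KD: KL divergence between softened logits.
        return F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)

    def feature_kd_loss(student_feat, teacher_feat):
        # Feature-based (offline) KD: match intermediate representations
        # (assumes the features already have matching dimensions).
        return F.mse_loss(student_feat, teacher_feat)

    def relation_kd_loss(student_feat, teacher_feat):
        # Relation-based (offline) KD: match pairwise similarity structure
        # among samples in the batch rather than individual features.
        gs = F.normalize(student_feat @ student_feat.t(), dim=1)
        gt = F.normalize(teacher_feat @ teacher_feat.t(), dim=1)
        return F.mse_loss(gs, gt)

    def peer_loss(logits_a, logits_b, T=1.0):
        # Online (mutual) distillation: student a imitates student b's
        # prediction, which is treated as a fixed target here.
        return F.kl_div(
            F.log_softmax(logits_a / T, dim=1),
            F.softmax(logits_b / T, dim=1).detach(),
            reduction="batchmean",
        ) * (T * T)

    def modreduce_step(labels, t_logits, t_feat, s_outputs, alpha=1.0, beta=1.0):
        # s_outputs: list of (logits, features) for the response-, feature-,
        # and relation-focused students, in that order.
        offline = [
            response_kd_loss(s_outputs[0][0], t_logits),
            feature_kd_loss(s_outputs[1][1], t_feat),
            relation_kd_loss(s_outputs[2][1], t_feat),
        ]
        losses = []
        for i, (logits, _) in enumerate(s_outputs):
            # Each student combines supervised loss, its own offline KD term,
            # and online distillation from its two peers.
            peers = [peer_loss(logits, s_outputs[j][0]) for j in range(3) if j != i]
            losses.append(
                F.cross_entropy(logits, labels)
                + alpha * offline[i]
                + beta * sum(peers) / len(peers)
            )
        return sum(losses)

At inference time only the best-performing of the three students would be kept, consistent with the abstract's claim of no additional inference cost.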

