BMCook
The toolkit for big model "slimming". BMCook efficiently compresses big models to improve their inference efficiency.
By combining quantization, pruning, distillation, and MoEfication, it can retain more than 90% of the original model's performance while accelerating inference by up to 10x.
Features
Model Quantization
Up to 4x faster inference using 1/4 of the storage space.
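As a rough illustration of where the 1/4 storage figure comes from, the sketch below quantizes a float32 weight vector to 8-bit integers plus a single scale factor (the function names are hypothetical and not BMCook's API, which operates on whole model layers):

```python
# Illustrative sketch only: symmetric 8-bit quantization of a weight
# vector. Names here are hypothetical, not BMCook's actual API.

def quantize_int8(weights):
    """Map float weights to int8 values plus one float scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.02, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
# Each int8 value needs 1 byte instead of 4 for float32: 1/4 the storage.
```

The speedup comes from running matrix multiplications in integer arithmetic on the quantized values rather than in floating point.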
Model Pruning
Pruning 50% of the parameters can roughly double inference speed.
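A minimal sketch of the underlying idea, magnitude pruning (the helper below is hypothetical, not BMCook's interface): the smallest-magnitude half of the weights is zeroed so sparse kernels can skip them.

```python
# Hypothetical sketch of magnitude pruning: zero out the
# smallest-magnitude `sparsity` fraction of the weights.

def prune_magnitude(weights, sparsity=0.5):
    """Return weights with the smallest `sparsity` fraction set to zero."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = prune_magnitude([0.9, -0.1, 0.4, -0.05])
# The zeroed entries can be skipped at inference time, which is
# where the speedup comes from.
```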
Model MoEfication
Reduces linear-layer computation by 80% and can roughly double inference speed.
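The core idea of MoEfication is to split a feed-forward layer into expert groups and run only the experts a gate selects for each input. The sketch below is a deliberately simplified, hypothetical illustration of that routing step, not BMCook's implementation:

```python
# Hypothetical sketch: a feed-forward layer split into "experts";
# only the top-scoring experts run for a given input, so the
# remaining experts' parameters are skipped entirely.

def moefy_forward(x, experts, gate_scores, top_k=1):
    """Run only the top_k highest-scoring experts on input x."""
    ranked = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)
    # Skipping the inactive experts is the source of the savings.
    return sum(experts[i](x) for i in ranked[:top_k])

experts = [lambda x: x * 2, lambda x: x * 3, lambda x: x * 10]
out = moefy_forward(5, experts, gate_scores=[0.1, 0.7, 0.2], top_k=1)
```

With `top_k=1` out of 5 equal-sized experts, only 20% of the linear-layer parameters would be touched per input, matching the 80% reduction above.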
Model Distillation
Provides teacher supervision for the modules above, helping compressed models recover the original model's performance.
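Distillation-based supervision typically means matching the compressed (student) model's output distribution to the original (teacher) model's. A minimal, stdlib-only sketch of the standard temperature-softened KL loss (a generic formulation, not necessarily BMCook's exact loss):

```python
# Generic sketch of a knowledge-distillation loss: KL divergence
# between temperature-softened teacher and student distributions.
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over softened probabilities."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

loss = distill_loss([1.0, 2.0, 0.5], [1.2, 2.1, 0.3])
```

The loss is zero when the student exactly matches the teacher and grows as the distributions diverge, so minimizing it pushes the compressed model back toward the original's behavior.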
Supported Methods
Compared to existing compression toolkits, BMCook supports all mainstream acceleration methods for pre-trained language models.
Combination in Any Way
Thanks to decoupled implementations, the compression methods can be freely combined for extreme acceleration.
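One way to picture this decoupling: if each compression step is a pure weights-to-weights transform, any subset can be chained in any order. The pipeline below is a hypothetical sketch of that composition pattern, not BMCook's configuration interface:

```python
# Hypothetical sketch: compression steps as composable
# weights -> weights transforms that can be chained freely.

def prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)
    thr = sorted(abs(w) for w in weights)[k - 1] if k else 0.0
    return [0.0 if abs(w) <= thr else w for w in weights]

def fake_quantize(weights, levels=127):
    """Round weights onto a uniform grid (simulated quantization)."""
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]

def compress(weights, steps):
    for step in steps:
        weights = step(weights)
    return weights

out = compress([0.8, -0.05, 0.3, 0.01], [prune, fake_quantize])
```

Because each step only consumes and produces weights, swapping the order or dropping a step requires no changes elsewhere, which is the property that lets methods be combined arbitrarily.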