BMCook is a toolkit for “slimming” big models: it performs efficient compression of large pre-trained models to improve inference efficiency.
By combining algorithms such as quantization, pruning, distillation, and MoEfication, BMCook can retain over 90% of the original model's performance while accelerating inference by up to 10x.
Model Quantization
Runs up to 4x faster while using 1/4 of the storage space.
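The 1/4-storage claim follows directly from representing 32-bit floats as 8-bit integers. Below is a minimal sketch of symmetric per-tensor int8 quantization; it illustrates the storage arithmetic only and is not BMCook's actual quantization scheme (the function names and the per-tensor scaling are illustrative assumptions).

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map [-|w|max, |w|max] onto [-127, 127].
    # Illustrative only; BMCook's quantization may use a different scheme.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float32 weights.
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)

# int8 stores 1 byte per value vs. 4 bytes for float32: exactly 1/4 the storage.
assert q.nbytes * 4 == w.nbytes

# Round-trip error is bounded by half a quantization step.
err = np.abs(dequantize(q, scale) - w).max()
```

On int8-capable hardware, the same 4x factor in memory traffic is what enables the faster matrix multiplication.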
Model Pruning
Pruning 50% of the parameters can roughly double inference speed.
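A common way to reach 50% sparsity is magnitude pruning: zero out the weights with the smallest absolute values. The sketch below shows the idea with plain NumPy; the function name and the unstructured, per-tensor criterion are assumptions for illustration, not BMCook's pruning strategy.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    # Zero out the `sparsity` fraction of weights with the smallest magnitude.
    # Unstructured magnitude pruning sketch; BMCook's strategies may differ.
    k = int(w.size * sparsity)
    threshold = np.partition(np.abs(w).ravel(), k)[k]
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(128, 128)
pruned, mask = magnitude_prune(w, sparsity=0.5)
# Roughly half the weights survive; the rest are exact zeros,
# which sparse kernels can skip at inference time.
```

The speedup materializes only when the inference kernels actually exploit the zeros (e.g. structured sparsity patterns supported by the hardware).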
Model MoEfication
Reducing active linear-layer parameters by 80% can roughly double inference speed.
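MoEfication splits a dense feed-forward layer into expert groups and activates only a few of them per input, so most linear-layer parameters are skipped. The toy sketch below activates 1 of 5 experts (20% of the parameters, matching the 80% reduction); the gating rule and function are illustrative assumptions, not BMCook's MoEfication algorithm.

```python
import numpy as np

def moefy_ffn(w_in, x, num_experts=5, top_k=1):
    # Split the FFN input projection's columns into expert groups and
    # run only the top_k highest-scoring experts for this input.
    # Toy sketch; the real gate is typically a small learned router.
    d_ff = w_in.shape[1]
    width = d_ff // num_experts
    experts = np.split(w_in, num_experts, axis=1)
    # Simple gate: score each expert by the norm of its pre-activation.
    scores = np.array([np.linalg.norm(x @ e) for e in experts])
    active = np.argsort(scores)[-top_k:]
    out = np.zeros(d_ff)
    for i in active:
        out[i * width:(i + 1) * width] = np.maximum(x @ experts[i], 0.0)
    return out, active

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 100))
x = rng.standard_normal(64)
out, active = moefy_ffn(w, x)
# Only 1 of 5 expert blocks is computed: 20% of the layer's parameters.
```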
Model Distillation
Provides supervision signals from the original model to guide the compression modules above.
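The standard form of such supervision is a knowledge-distillation loss: the compressed student is trained to match the original teacher's softened output distribution. A minimal sketch of the usual KL-divergence objective follows; the temperature value and function names are illustrative assumptions, not BMCook's exact objective.

```python
import numpy as np

def softmax(z, t=1.0):
    # Temperature-scaled softmax; higher t softens the distribution.
    z = z / t
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions: the classic
    # knowledge-distillation loss (a sketch, not BMCook's exact objective).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([2.0, 1.0, 0.1])
student = np.array([1.8, 1.1, 0.2])
loss = distill_loss(student, teacher)
# Identical logits give zero loss; any mismatch gives a positive loss
# that pushes the student toward the teacher's behavior.
```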
Supported Methods
Compared to existing compression toolkits, BMCook supports all mainstream acceleration methods for pre-trained language models.
Flexible Combination
Thanks to decoupled implementations, the compression methods can be freely combined for extreme acceleration.