BMInf
Big model inference on a thousand-yuan GPU. BMInf is a low-cost, high-efficiency inference toolkit for big models: it can run models with more than 10 billion parameters on a single thousand-yuan GPU such as the NVIDIA GTX 1060.
Features
Hardware Friendly
BMInf supports running models with more than 10 billion parameters on a single NVIDIA GTX 1060 GPU.
Open Source
Model parameters are open source: users can run big models locally instead of relying on an online API.
Comprehensive Abilities
BMInf supports CPM1, CPM2.1, and EVA, which together cover text completion, text generation, and dialogue generation.
Convenient Deployment
BMInf makes it fast and convenient to develop downstream applications.
Capabilities
With BMInf, you can run inference on big models even on commodity hardware; a minimal usage sketch follows.
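As an illustration of the workflow, the sketch below loads CPM1 and asks it to continue a prompt. The class name `bminf.models.CPM1` and the `generate` call are assumptions made for this example, not the confirmed API; consult the documentation for the real interface.

```python
import bminf

# Load CPM1 (2.6B parameters). BMInf schedules parameters between CPU and
# GPU memory so the model fits on a single GTX 1060.
# NOTE: the class and method names here are assumed for illustration.
cpm1 = bminf.models.CPM1()

# Ask the model to continue a Chinese prompt ("The sky is azure, ...").
result = cpm1.generate("天空是蔚蓝色，")
print(result)
```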
Performance
We benchmarked BMInf on the CPM2 decoding task across different platforms; its decoding speed far exceeds that of vanilla PyTorch.
[Chart: 10B model decoding speed, BMInf vs. PyTorch]
Supported Models
CPM2.1
CPM2.1 is an upgraded version of CPM2, a general Chinese pre-trained language model with 11 billion parameters. On top of CPM2, CPM2.1 introduces a generative pre-training task and is trained with the continual learning paradigm. In experiments, CPM2.1 shows better generation ability than CPM2.
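Because CPM2.1 adds a generative pre-training objective, it can both fill in blanks and continue text. Below is a hypothetical sketch of blank infilling; the `fill_blank` method and the `<span>` placeholder token are assumptions for illustration, not the confirmed interface.

```python
import bminf

# Load CPM2.1 (11B parameters); class and method names are assumed.
cpm2 = bminf.models.CPM2()

# Mark the blank with a placeholder token and let the model fill it in.
text = "北京环球度假区相关负责人介绍，<span>将于9月20日正式开放。"
print(cpm2.fill_blank(text))
```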
CPM1
CPM1 is a generative Chinese pre-trained language model with 2.6 billion parameters. Its architecture is similar to GPT's, and it can be used for various NLP tasks such as conversation, essay generation, cloze tests, and language understanding.
EVA
EVA is a Chinese pre-trained dialogue model with 2.8 billion parameters. EVA performs well on many dialogue tasks, especially in multi-turn human-bot conversation.
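For multi-turn dialogue, a natural interface takes the conversation history and returns the bot's next utterance. The sketch below assumes a `dialogue` method that accepts a list of prior turns; it is illustrative only.

```python
import bminf

# Load EVA (2.8B parameters); class and method names are assumed.
eva = bminf.models.EVA()

# Pass the conversation so far and get the bot's reply.
context = ["今天天气不错。", "是啊，要不要出去走走？"]
reply = eva.dialogue(context)
print(reply)
```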