You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
-\[2024/09\] LMDeploy PyTorchEngine achieves 1.3x faster on Llama3-8B inference by introducing CUDA graph
53
53
-\[2024/08\] LMDeploy is integrated into [modelscope/swift](https://github.com/modelscope/swift) as the default accelerator for VLMs inference
54
54
-\[2024/07\] Support Llama3.1 8B, 70B and its TOOLS CALLING
55
-
-\[2024/07\] Support [InternVL2](docs/en/multi_modal/internvl.md) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/llm/api_server_tools.md) of InternLM2.5
55
+
-\[2024/07\] Support [InternVL2](docs/en/multi_modal/internvl.md) full-series models and [function call](docs/en/llm/api_server_tools.md) of InternLM2.5
56
56
-\[2024/06\] PyTorch engine support DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, LlaVA-Next
57
57
-\[2024/05\] Balance vision model when deploying VLMs with multiple GPUs
58
-
-\[2024/05\] Support 4-bits weight-only quantization and inference on VLMs, such as InternVL v1.5, LLaVa, InternLMXComposer2
59
-
-\[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2, MiniGemini, InternLMXComposer2.
58
+
-\[2024/05\] Support 4-bits weight-only quantization and inference on VLMs, such as InternVL v1.5 and LLaVa
59
+
-\[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2 and MiniGemini.
60
60
-\[2024/04\] TurboMind adds online int8/int4 KV cache quantization and inference for all supported devices. Refer [here](docs/en/quantization/kv_quant.md) for detailed guide
61
61
-\[2024/04\] TurboMind latest upgrade boosts GQA, rocketing the [internlm2-20b](https://huggingface.co/internlm/internlm2-20b) model inference to 16+ RPS, about 1.8x faster than vLLM.
62
62
-\[2024/04\] Support Qwen1.5-MOE and dbrx.
@@ -171,8 +171,6 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
0 commit comments