Update README.md
README.md CHANGED
@@ -65,7 +65,7 @@ To construct this dataset, we propose an efficient data construction pipeline. S
 
 - **For samples with clear ground truths:**
 the model is prompted to first provide the reasoning process and then give the final answer in the format like `Final Answer: ***`.
-Responses matching the ground truth answer constitute the positive set \\(mathcal{Y}_p\\), while those that do not match make up the negative set \\(\mathcal{Y}_n\\). Additionally, responses that fail to provide a clear final answer are also merged into \\(\mathcal{Y}_n\\).
+Responses matching the ground truth answer constitute the positive set \\(\mathcal{Y}_p\\), while those that do not match make up the negative set \\(\mathcal{Y}_n\\). Additionally, responses that fail to provide a clear final answer are also merged into \\(\mathcal{Y}_n\\).
 Given these responses labeled as positive or negative, we build the preference pairs by selecting a chosen response \\(y_c\\) from \\(\mathcal{Y}_p\\) and a negative response \\(y_r\\) from \\(\mathcal{Y}_n\\).
 
 - **For samples without clear ground truths:**
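The selection rule quoted in this hunk (responses whose extracted final answer matches the ground truth go into \\(\mathcal{Y}_p\\); everything else, including responses with no parsable final answer, goes into \\(\mathcal{Y}_n\\); a chosen/rejected pair is then drawn from the two sets) can be sketched in a few lines of Python. This is only an illustrative sketch: `extract_final_answer`, the regex, and the random pairing strategy are assumptions, not the released data pipeline.

```python
import random
import re

def extract_final_answer(response: str):
    # Hypothetical parser: take the text after "Final Answer:" if present.
    match = re.search(r"Final Answer:\s*(.+)", response)
    return match.group(1).strip() if match else None

def build_preference_pair(responses, ground_truth):
    # Split sampled responses into the positive and negative sets.
    positives, negatives = [], []
    for resp in responses:
        answer = extract_final_answer(resp)
        # Responses without a clear final answer are merged into the negative set.
        if answer is not None and answer == ground_truth:
            positives.append(resp)
        else:
            negatives.append(resp)
    if not positives or not negatives:
        return None
    # Pair a chosen response y_c with a rejected response y_r.
    return {"chosen": random.choice(positives), "rejected": random.choice(negatives)}
```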
@@ -160,7 +160,7 @@ To comprehensively compare InternVL's performance before and after MPO, we emplo
 
 ## Quick Start
 
-We provide an example code to run `InternVL2_5-
+We provide an example code to run `InternVL2_5-4B-MPO` using `transformers`.
 
 > Please use transformers>=4.37.2 to ensure the model works normally.
 
@@ -171,7 +171,7 @@ We provide an example code to run `InternVL2_5-1B` using `transformers`.
 ```python
 import torch
 from transformers import AutoTokenizer, AutoModel
-path = "OpenGVLab/InternVL2_5-
+path = "OpenGVLab/InternVL2_5-4B-MPO"
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
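The fragment above is cut off by the hunk boundary in the middle of the `from_pretrained` call. For orientation, the load typically finishes along the lines below; this is a hedged sketch following the usual InternVL2.5 loading pattern, and `low_cpu_mem_usage` and `use_fast=False` are assumptions rather than lines taken from this diff.

```python
import torch
from transformers import AutoTokenizer, AutoModel

path = "OpenGVLab/InternVL2_5-4B-MPO"
# trust_remote_code is required because InternVL ships custom modeling code.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
```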
@@ -185,7 +185,7 @@ model = AutoModel.from_pretrained(
 ```python
 import torch
 from transformers import AutoTokenizer, AutoModel
-path = "OpenGVLab/InternVL2_5-
+path = "OpenGVLab/InternVL2_5-4B-MPO"
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
@@ -230,8 +230,8 @@ def split_model(model_name):
 
     return device_map
 
-path = "OpenGVLab/InternVL2_5-
-device_map = split_model('InternVL2_5-
+path = "OpenGVLab/InternVL2_5-4B-MPO"
+device_map = split_model('InternVL2_5-4B')
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
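This multi-GPU variant differs from the single-GPU load only in that the `device_map` computed by `split_model` is handed to `from_pretrained`, which shards the layers across the visible GPUs. A minimal sketch, assuming the same keyword arguments as the single-GPU case:

```python
# Sketch (assumed): reuse the path and device_map defined in the hunk above.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map=device_map).eval()
```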
@@ -327,7 +327,7 @@ def load_image(image_file, input_size=448, max_num=12):
     return pixel_values
 
 # If you want to load a model using multiple GPUs, please refer to the `Multiple GPUs` section.
-path = 'OpenGVLab/InternVL2_5-
+path = 'OpenGVLab/InternVL2_5-4B-MPO'
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
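The inference section this hunk touches goes on to call `model.chat` with the preprocessed `pixel_values`. A hedged sketch of that usual pattern, assuming the model and tokenizer are loaded as above, `load_image` is the helper defined just before this hunk, and the image path and generation settings are placeholders:

```python
generation_config = dict(max_new_tokens=1024, do_sample=True)

# Pure-text conversation: no pixel values are passed.
question = 'Hello, who are you?'
response = model.chat(tokenizer, None, question, generation_config)
print(f'User: {question}\nAssistant: {response}')

# Single-image conversation: <image> marks where the image is inserted into the prompt.
pixel_values = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
question = '<image>\nPlease describe the image shortly.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(f'User: {question}\nAssistant: {response}')
```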
@@ -510,7 +510,7 @@ LMDeploy abstracts the complex inference process of multi-modal Vision-Language
 from lmdeploy import pipeline, TurbomindEngineConfig
 from lmdeploy.vl import load_image
 
-model = 'OpenGVLab/InternVL2_5-
+model = 'OpenGVLab/InternVL2_5-4B-MPO'
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
 pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
 response = pipe(('describe this image', image))
@@ -528,7 +528,7 @@ from lmdeploy import pipeline, TurbomindEngineConfig
 from lmdeploy.vl import load_image
 from lmdeploy.vl.constants import IMAGE_TOKEN
 
-model = 'OpenGVLab/InternVL2_5-
+model = 'OpenGVLab/InternVL2_5-4B-MPO'
 pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
 
 image_urls=[
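This hunk sits in the multi-image example, which builds a prompt containing one `IMAGE_TOKEN` per image so the model knows where each image belongs. A hedged sketch of how the example presumably continues; the URLs below are placeholders, not the ones listed in the README.

```python
# Placeholder URLs for illustration only.
image_urls = [
    'https://example.com/image1.jpg',
    'https://example.com/image2.jpg',
]

images = [load_image(url) for url in image_urls]
# One IMAGE_TOKEN per image marks its position in the prompt.
prompt = f'{IMAGE_TOKEN}\n{IMAGE_TOKEN}\nDescribe the two images in detail.'
response = pipe((prompt, images))
print(response.text)
```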
@@ -550,7 +550,7 @@ Conducting inference with batch prompts is quite straightforward; just place the
 from lmdeploy import pipeline, TurbomindEngineConfig
 from lmdeploy.vl import load_image
 
-model = 'OpenGVLab/InternVL2_5-
+model = 'OpenGVLab/InternVL2_5-4B-MPO'
 pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
 
 image_urls=[
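As the hunk context says, batching only requires putting the prompts in a list. A minimal sketch of how the example presumably finishes, assuming `image_urls` holds the URLs listed in the README:

```python
# Each element is a (text, image) prompt; passing the list runs them as one batch.
prompts = [('describe this image', load_image(url)) for url in image_urls]
response = pipe(prompts)
print(response)
```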
@@ -570,7 +570,7 @@ There are two ways to do the multi-turn conversations with the pipeline. One is
 from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
 from lmdeploy.vl import load_image
 
-model = 'OpenGVLab/InternVL2_5-
+model = 'OpenGVLab/InternVL2_5-4B-MPO'
 pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
 
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
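The next hunk's context line, `print(sess.response.text)`, shows this example uses the `pipe.chat` interface, which carries the conversation state in a session object. A hedged sketch of the typical multi-turn flow; the sampling values and the follow-up question are placeholders.

```python
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)

# First turn: text plus image; pipe.chat returns a session holding the history.
sess = pipe.chat(('describe this image', image), gen_config=gen_config)
print(sess.response.text)

# Follow-up turn: pass the previous session so the history is reused.
sess = pipe.chat('What is the woman doing?', session=sess, gen_config=gen_config)
print(sess.response.text)
```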
@@ -586,7 +586,7 @@ print(sess.response.text)
 LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup:
 
 ```shell
-lmdeploy serve api_server OpenGVLab/InternVL2_5-
+lmdeploy serve api_server OpenGVLab/InternVL2_5-4B-MPO --server-port 23333
 ```
 
 To use the OpenAI-style interface, you need to install OpenAI:
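Because the served endpoint is OpenAI-compatible, it can be queried with the standard `openai` client once the server above is running. A hedged sketch: the port matches `--server-port 23333` from the hunk, while the API key and image URL are placeholders.

```python
from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
# Ask the server which model it is serving.
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://example.com/tiger.jpeg'}},
        ],
    }],
    temperature=0.8,
    top_p=0.8)
print(response)
```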
@@ -625,7 +625,7 @@ print(response)
 
 ## License
 
-This project is released under the MIT License. This project uses the pre-trained Qwen2.5-
+This project is released under the MIT License. This project uses the pre-trained Qwen2.5-3B-Instruct as a component, which is licensed under the Apache License 2.0.
 
 ## Citation
 