# InternLM-XComposer-2.5

## Introduction

InternLM-XComposer-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V-level capabilities with merely a 7B LLM backend. IXC-2.5 is trained with 24K interleaved image-text contexts and can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows it to perform especially well on tasks that require extensive input and output contexts. LMDeploy supports the internlm/internlm-xcomposer2d5-7b model with the TurboMind engine.
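For inputs approaching the 96K long-context regime, the TurboMind engine's session length can be enlarged when building the pipeline. The sketch below is illustrative only: the `session_len` and `rope_scaling_factor` values are assumptions for demonstration, not official recommendations.

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Illustrative long-context setup: session_len (in tokens) and
# rope_scaling_factor here are assumed values, not tuned settings.
engine_config = TurbomindEngineConfig(session_len=98304, rope_scaling_factor=2.5)
pipe = pipeline('internlm/internlm-xcomposer2d5-7b', backend_config=engine_config)
```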
## Quick Start

### Installation

Please install LMDeploy by following the installation guide, and install the other packages that InternLM-XComposer-2.5 needs:

```shell
pip install decord
```
### Offline Inference Pipeline

The following sample code shows the basic usage of the VLM pipeline. For more examples, please refer to the VLM Offline Inference Pipeline guide.

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('internlm/internlm-xcomposer2d5-7b')

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
### LoRA Model

InternLM-XComposer-2.5 trained LoRA weights for webpage creation and article writing. Since the TurboMind backend doesn't support S-LoRA, only one LoRA model can be deployed at a time, and the LoRA weights have to be merged into the base model when deploying. LMDeploy provides a conversion script for this, used as follows:

```shell
export HF_MODEL=internlm/internlm-xcomposer2d5-7b
export WORK_DIR=internlm/internlm-xcomposer2d5-7b-web
export TASK=web
python -m lmdeploy.vl.tools.merge_xcomposer2d5_task $HF_MODEL $WORK_DIR --task $TASK
```
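The article-writing LoRA can be merged the same way by switching the task name. A sketch following the same pattern, assuming the script accepts `write` as a task; the resulting directory matches the `-write` model used in the Write Article section below:

```shell
export HF_MODEL=internlm/internlm-xcomposer2d5-7b
export WORK_DIR=internlm/internlm-xcomposer2d5-7b-write
export TASK=write
python -m lmdeploy.vl.tools.merge_xcomposer2d5_task $HF_MODEL $WORK_DIR --task $TASK
```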
### Quantization

The following takes the base model as an example to show the quantization method. If you want to use a LoRA model, please merge the LoRA weights first according to the previous section.

```shell
export HF_MODEL=internlm/internlm-xcomposer2d5-7b
export WORK_DIR=internlm/internlm-xcomposer2d5-7b-4bit

lmdeploy lite auto_awq \
    $HF_MODEL \
    --work-dir $WORK_DIR
```
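To run the 4-bit weights produced above, the TurboMind engine can be pointed at the quantized model and told it is in AWQ format. A minimal sketch, assuming the quantized model was saved to the `WORK_DIR` above:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# model_format='awq' tells the TurboMind engine to load 4-bit AWQ weights.
pipe = pipeline('internlm/internlm-xcomposer2d5-7b-4bit',
                backend_config=TurbomindEngineConfig(model_format='awq'))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```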
## More Examples

### Video Understanding

The following takes the `pipeline.chat` interface as an example to demonstrate its usage. Other interfaces also support inference, but the conversation history has to be concatenated manually.
```python
from lmdeploy import pipeline, GenerationConfig
from transformers.dynamic_module_utils import get_class_from_dynamic_module

HF_MODEL = 'internlm/internlm-xcomposer2d5-7b'

# Load the video-processing helpers shipped in the model repo.
load_video = get_class_from_dynamic_module('ixc_utils.load_video', HF_MODEL)
frame2img = get_class_from_dynamic_module('ixc_utils.frame2img', HF_MODEL)
Video_transform = get_class_from_dynamic_module('ixc_utils.Video_transform', HF_MODEL)
get_font = get_class_from_dynamic_module('ixc_utils.get_font', HF_MODEL)

video = load_video('liuxiang.mp4')  # https://github.com/InternLM/InternLM-XComposer/raw/main/examples/liuxiang.mp4
img = frame2img(video, get_font())
img = Video_transform(img)

pipe = pipeline(HF_MODEL)
gen_config = GenerationConfig(top_k=50, top_p=0.8, temperature=1.0)

# First turn: describe the video frames.
query = 'Here are some frames of a video. Describe this video in detail'
sess = pipe.chat((query, img), gen_config=gen_config)
print(sess.response.text)

# Follow-up turn reuses the previous session.
query = 'tell me the athlete code of Liu Xiang'
sess = pipe.chat(query, session=sess, gen_config=gen_config)
print(sess.response.text)
```
### Multi-Image

```python
from lmdeploy import pipeline, GenerationConfig
from lmdeploy.vl.constants import IMAGE_TOKEN
from lmdeploy.vl import load_image

# Reference each image in the prompt with an IMAGE_TOKEN placeholder.
query = f'Image1 {IMAGE_TOKEN}; Image2 {IMAGE_TOKEN}; Image3 {IMAGE_TOKEN}; I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one'

urls = ['https://raw.githubusercontent.com/InternLM/InternLM-XComposer/main/examples/cars1.jpg',
        'https://raw.githubusercontent.com/InternLM/InternLM-XComposer/main/examples/cars2.jpg',
        'https://raw.githubusercontent.com/InternLM/InternLM-XComposer/main/examples/cars3.jpg']
images = [load_image(url) for url in urls]

pipe = pipeline('internlm/internlm-xcomposer2d5-7b', log_level='INFO')
output = pipe((query, images), gen_config=GenerationConfig(top_k=0, top_p=0.8, random_seed=89247526689433939))
print(output)
```
Since LMDeploy does not support beam search, the generated results may differ significantly from those of transformers with beam search enabled. It is recommended to disable top-k sampling (`top_k=0`, as in the example above) or use a larger `top_k` value to increase diversity.
### Instruction to Webpage

Please convert the web LoRA model first, following the instructions above.

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('/nvme/shared/internlm-xcomposer2d5-7b-web', log_level='INFO')
pipe.chat_template.meta_instruction = None

query = 'A website for Research institutions. The name is Shanghai AI lab. Top Navigation Bar is blue.Below left, an image shows the logo of the lab. In the right, there is a passage of text below that describes the mission of the laboratory.There are several images to show the research projects of Shanghai AI lab.'
output = pipe(query, gen_config=GenerationConfig(max_new_tokens=2048))
print(output)
```
When testing with transformers, we found that if `repetition_penalty` is set, there is a high probability that the decoding stage will not stop when `num_beams` is 1. Since LMDeploy does not support beam search, it is recommended to turn off `repetition_penalty` when running inference with LMDeploy.
### Write Article

Please convert the write LoRA model first, following the instructions above.

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('/nvme/shared/internlm-xcomposer2d5-7b-write', log_level='INFO')
pipe.chat_template.meta_instruction = None

query = 'Please write a blog based on the title: French Pastries: A Sweet Indulgence'
output = pipe(query, gen_config=GenerationConfig(max_new_tokens=8192))
print(output)
```