# InternLM-XComposer-2.5

## Introduction

InternLM-XComposer-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V-level capabilities with merely a 7B LLM backend. IXC-2.5 is trained with 24K interleaved image-text contexts and can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows it to perform especially well on tasks that require extensive input and output contexts. LMDeploy supports the internlm/internlm-xcomposer2d5-7b model with the TurboMind engine.
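For inputs approaching the 96K long-context regime, the TurboMind engine's session length can be enlarged when building the pipeline. The sketch below is illustrative only: the `session_len` and `rope_scaling_factor` values are assumptions for demonstration, not official recommendations.

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Illustrative long-context setup: session_len (in tokens) and
# rope_scaling_factor here are assumed values, not tuned settings.
engine_config = TurbomindEngineConfig(session_len=98304, rope_scaling_factor=2.5)
pipe = pipeline('internlm/internlm-xcomposer2d5-7b', backend_config=engine_config)
```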
## Quick Start

### Installation

Please install LMDeploy by following the installation guide, and install the other packages that InternLM-XComposer-2.5 needs:

```shell
pip install decord
```
### Offline Inference Pipeline

The following sample code shows the basic usage of the VLM pipeline. For more examples, please refer to the VLM Offline Inference Pipeline guide.

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('internlm/internlm-xcomposer2d5-7b')

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
### LoRA Model

InternLM-XComposer-2.5 trained LoRA weights for webpage creation and article writing. Since the TurboMind backend doesn't support S-LoRA, only one LoRA model can be deployed at a time, and the LoRA weights have to be merged into the base model when deploying. LMDeploy provides a conversion script for this, used as follows:

```shell
export HF_MODEL=internlm/internlm-xcomposer2d5-7b
export WORK_DIR=internlm/internlm-xcomposer2d5-7b-web
export TASK=web
python -m lmdeploy.vl.tools.merge_xcomposer2d5_task $HF_MODEL $WORK_DIR --task $TASK
```
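The article-writing LoRA can be merged the same way by switching the task name. A sketch following the same pattern, assuming the script accepts `write` as a task; the resulting directory matches the `-write` model used in the Write Article section below:

```shell
export HF_MODEL=internlm/internlm-xcomposer2d5-7b
export WORK_DIR=internlm/internlm-xcomposer2d5-7b-write
export TASK=write
python -m lmdeploy.vl.tools.merge_xcomposer2d5_task $HF_MODEL $WORK_DIR --task $TASK
```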
### Quantization

The following takes the base model as an example to show the quantization method. If you want to use a LoRA model, please merge the LoRA weights first according to the previous section.

```shell
export HF_MODEL=internlm/internlm-xcomposer2d5-7b
export WORK_DIR=internlm/internlm-xcomposer2d5-7b-4bit

lmdeploy lite auto_awq \
    $HF_MODEL \
    --work-dir $WORK_DIR
```
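To run the 4-bit weights produced above, the TurboMind engine can be pointed at the quantized model and told it is in AWQ format. A minimal sketch, assuming the quantized model was saved to the `WORK_DIR` above:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# model_format='awq' tells the TurboMind engine to load 4-bit AWQ weights.
pipe = pipeline('internlm/internlm-xcomposer2d5-7b-4bit',
                backend_config=TurbomindEngineConfig(model_format='awq'))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```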
## More Examples

### Video Understanding

The following takes the `pipeline.chat` interface as an example to demonstrate its usage. Other interfaces also support inference, but the conversation history has to be concatenated manually.
```python
from lmdeploy import pipeline, GenerationConfig
from transformers.dynamic_module_utils import get_class_from_dynamic_module

HF_MODEL = 'internlm/internlm-xcomposer2d5-7b'

# Load the video-processing helpers shipped in the model repo.
load_video = get_class_from_dynamic_module('ixc_utils.load_video', HF_MODEL)
frame2img = get_class_from_dynamic_module('ixc_utils.frame2img', HF_MODEL)
Video_transform = get_class_from_dynamic_module('ixc_utils.Video_transform', HF_MODEL)
get_font = get_class_from_dynamic_module('ixc_utils.get_font', HF_MODEL)

video = load_video('liuxiang.mp4')  # https://github.com/InternLM/InternLM-XComposer/raw/main/examples/liuxiang.mp4
img = frame2img(video, get_font())
img = Video_transform(img)

pipe = pipeline(HF_MODEL)
gen_config = GenerationConfig(top_k=50, top_p=0.8, temperature=1.0)

# First turn: describe the video frames.
query = 'Here are some frames of a video. Describe this video in detail'
sess = pipe.chat((query, img), gen_config=gen_config)
print(sess.response.text)

# Follow-up turn reuses the previous session.
query = 'tell me the athlete code of Liu Xiang'
sess = pipe.chat(query, session=sess, gen_config=gen_config)
print(sess.response.text)
```
### Multi-Image

```python
from lmdeploy import pipeline, GenerationConfig
from lmdeploy.vl.constants import IMAGE_TOKEN
from lmdeploy.vl import load_image

# Reference each image in the prompt with an IMAGE_TOKEN placeholder.
query = f'Image1 {IMAGE_TOKEN}; Image2 {IMAGE_TOKEN}; Image3 {IMAGE_TOKEN}; I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one'

urls = ['https://raw.githubusercontent.com/InternLM/InternLM-XComposer/main/examples/cars1.jpg',
        'https://raw.githubusercontent.com/InternLM/InternLM-XComposer/main/examples/cars2.jpg',
        'https://raw.githubusercontent.com/InternLM/InternLM-XComposer/main/examples/cars3.jpg']
images = [load_image(url) for url in urls]

pipe = pipeline('internlm/internlm-xcomposer2d5-7b', log_level='INFO')
output = pipe((query, images), gen_config=GenerationConfig(top_k=0, top_p=0.8, random_seed=89247526689433939))
print(output)
```
Since LMDeploy does not support beam search, the generated results may differ significantly from those of transformers with beam search enabled. It is recommended to disable top-k sampling (`top_k=0`, as in the example above) or use a larger `top_k` value to increase diversity.
### Instruction to Webpage

Please convert the web LoRA model first, following the instructions above.

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('/nvme/shared/internlm-xcomposer2d5-7b-web', log_level='INFO')
pipe.chat_template.meta_instruction = None

query = 'A website for Research institutions. The name is Shanghai AI lab. Top Navigation Bar is blue.Below left, an image shows the logo of the lab. In the right, there is a passage of text below that describes the mission of the laboratory.There are several images to show the research projects of Shanghai AI lab.'
output = pipe(query, gen_config=GenerationConfig(max_new_tokens=2048))
print(output)
```
When testing with transformers, we found that if `repetition_penalty` is set, there is a high probability that the decoding stage will not stop when `num_beams` is 1. Since LMDeploy does not support beam search, it is recommended to turn off `repetition_penalty` when running inference with LMDeploy.
### Write Article

Please convert the write LoRA model first, following the instructions above.

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('/nvme/shared/internlm-xcomposer2d5-7b-write', log_level='INFO')
pipe.chat_template.meta_instruction = None

query = 'Please write a blog based on the title: French Pastries: A Sweet Indulgence'
output = pipe(query, gen_config=GenerationConfig(max_new_tokens=8192))
print(output)
```