Offline Engine API#

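SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. Two common use cases are: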

  • Offline batch inference

  • A custom server on top of the engine

This document focuses on offline batch inference and demonstrates four inference modes:

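  • Non-streaming synchronous generation

  • Streaming synchronous generation

  • Non-streaming asynchronous generation

  • Streaming asynchronous generation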

In addition, you can easily build a custom server on top of the SGLang offline engine. A detailed example that runs as a Python script can be found in custom_server.
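As a rough illustration of what a custom server on top of the engine might look like, the sketch below wraps an engine-like object in a small HTTP handler built on Python's standard `http.server` module. `EchoEngine`, `handle_generate`, `make_handler`, `serve`, and the `/generate` route are hypothetical names made up for this example. `EchoEngine` is a stand-in that only mimics `generate(prompt, sampling_params)` returning a dict with a `"text"` key, as used in the cells of this document, so the sketch runs without a model or GPU. It is not the `custom_server` example itself.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine; it only mimics generate()."""

    def generate(self, prompt, sampling_params=None):
        # A real engine would run the model; this stub just echoes the prompt.
        return {"text": f" [echo] {prompt}"}


def handle_generate(engine, body: bytes) -> bytes:
    """Decode a JSON request, call the engine, and encode a JSON response."""
    request = json.loads(body)
    output = engine.generate(request["prompt"], request.get("sampling_params", {}))
    return json.dumps({"text": output["text"]}).encode("utf-8")


def make_handler(engine):
    """Build a request handler class bound to one engine instance."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":  # hypothetical route
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = handle_generate(engine, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return GenerateHandler


def serve(engine, host="127.0.0.1", port=8000):
    """Block and serve requests until interrupted."""
    HTTPServer((host, port), make_handler(engine)).serve_forever()
```

Keeping the request logic in `handle_generate` separates it from the HTTP plumbing, so it can be exercised without opening a socket. Calling `serve(EchoEngine())` would start the stub server; passing a real engine instead is an untested assumption of this sketch.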

Offline Batch Inference#

The SGLang offline engine supports batch inference with efficient scheduling.

[1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Non-streaming Synchronous Generation#

[2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
===============================
Prompt: Hello, my name is
Generated text:  Sante Applegate and I am excited to be part of the Lee’s Summit R-7 School District team as a school counselor at Lee’s Summit High School. This will be my 14th year in education, and I am passionate about helping students succeed academically, personally, and professionally.
Prior to joining the district, I worked as a counselor in several schools in the Kansas City area. I have a Master’s degree in School Counseling from the University of Kansas and a Bachelor’s degree in Psychology from Benedictine College.
I am committed to providing a supportive and inclusive environment for all students. As a school counselor, my goal
===============================
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States, and is the most powerful person in the world. The president is elected by the Electoral College, and serves a four-year term. The president has many responsibilities, including setting national policy, commanding the military, and representing the United States abroad.
The president's powers and duties are outlined in the Constitution and other laws, but some of the key responsibilities include:
1. Setting national policy: The president has the power to propose and implement national policies on a wide range of issues, including foreign policy, domestic policy, and economic policy.
2. Commanding the military: The president
===============================
Prompt: The capital of France is
Generated text:  Paris. The country is located in Western Europe, and its official name is the French Republic. France is a significant economic and political power in Europe and the world, and it is one of the most culturally influential countries in the world. The country has a population of around 67 million people, and the official language is French.
France is a parliamentary republic, with a president as head of state and a prime minister as head of government. The country has a long history of monarchies, empires, and revolutions, and it has played a significant role in shaping modern Europe. France is known for its beautiful cities, art, fashion,
===============================
Prompt: The future of AI is
Generated text:  at the intersection of human experience and technological advancements.
“From the beginning, we knew that technology was going to change the world, but we never imagined how rapidly it would progress. Today, we are at the cusp of a new era where human experience and technological advancements intersect. Artificial intelligence (AI) has the potential to revolutionize the way we live, work, and interact with each other. The key to unlocking its full potential is not just about developing more complex algorithms or hardware but also about understanding the human side of AI – the emotional, social, and cognitive aspects that will determine how we adopt and integrate AI into our lives.
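The `sampling_params` in the cell above set `temperature` and `top_p`. To make these two knobs concrete, here is a minimal pure-Python sketch of the commonly used definitions of temperature scaling and top-p (nucleus) filtering, applied to a toy distribution. The function names are hypothetical and this is not SGLang's sampler; how the engine applies these parameters internally is not covered by this document.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for index in order:
        kept.append(index)
        mass += probs[index]
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {index: probs[index] / mass for index in kept}


if __name__ == "__main__":
    probs = softmax_with_temperature([2.0, 1.0, 0.5, -1.0], temperature=0.8)
    print(top_p_filter(probs, top_p=0.95))
```

Under these definitions, a higher temperature flattens the distribution and a lower `top_p` discards more of the unlikely tokens before sampling.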

Streaming Synchronous Generation#

[3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()

=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Rev. Linda Campbell and I am a licensed and ordained United Methodist Minister. I have been serving in a variety of settings, including congregations, camps, and non-profit organizations, for over 20 years. My passion is to help people discover and deepen their faith, to find meaning and purpose in their lives, and to live more authentically as children of God. I have a Master's degree in Divinity from Drew University, and I am a certified spiritual director and coach. I am also a trained mediator and conflict resolver, and I enjoy facilitating small groups and workshops on a variety of topics related to faith, spirituality, and

Prompt: The capital of France is
Generated text:  one of the most romantic and beautiful cities in the world. From the Eiffel Tower to the Louvre Museum, there is so much to see and experience in Paris. Here are a few reasons why you should visit this amazing city:
The Eiffel Tower, one of the most iconic landmarks in the world, is a must-visit attraction in Paris. You can take the stairs or elevator to the top for breathtaking views of the city. The tower was built for the World's Fair in 1889 and was initially intended to be a temporary structure. However, it has become an iconic symbol of Paris and one of the most

Prompt: The future of AI is
Generated text:  not about the machines themselves, but about how they can improve and enhance our lives. In his book, “Life 3.0: Being Human in the Age of Artificial Intelligence,” Max Tegmark, a professor of physics at MIT, argues that AI has the potential to solve some of the world’s most pressing problems, such as disease, poverty, and climate change.
Tegmark suggests that AI can be a powerful tool for good, but it also raises important questions about the future of work, the nature of consciousness, and the potential risks of creating intelligent machines. He argues that we need to have a more nuanced and informed
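When streaming, it is often handy to display the text as it arrives and also keep the complete result. The helper below is a small sketch of that pattern. `fake_stream` and `collect_stream` are hypothetical names, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)` so the example runs without a model. It assumes each chunk's `"text"` holds only the newly generated piece; if a given SGLang version yields the full text so far instead, the helper would need to track an offset.

```python
def fake_stream(pieces):
    """Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in pieces:
        yield {"text": piece}


def collect_stream(chunks, emit=None):
    """Emit each chunk as it arrives and return the concatenated text."""
    if emit is None:
        # Default: print without a newline so the text appears to grow in place.
        def emit(text):
            print(text, end="", flush=True)

    parts = []
    for chunk in chunks:
        emit(chunk["text"])
        parts.append(chunk["text"])
    return "".join(parts)


if __name__ == "__main__":
    full_text = collect_stream(fake_stream([" Paris", ",", " of", " course."]))
    print()
    print(f"Collected {len(full_text)} characters")
```

Passing a custom `emit` callable (for example, one that appends to a list) makes the helper easy to reuse outside a terminal.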

Non-streaming Asynchronous Generation#

[4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())

=== Testing asynchronous batch generation ===

Prompt: Hello, my name is
Generated text:  Captain Jack Sparrow, and I have been transported back in time to the year 1777. The American colonies are still under British rule, and the seeds of revolution are beginning to be sown. I am in the midst of a heist, stealing a valuable shipment of rum, when I am suddenly confronted by a group of Patriot militia. They are armed to the teeth and look like they mean business.
As the leader of the group steps forward, I can see the fire of rebellion burning in his eyes. He introduces himself as Benjamin Franklin, and I can sense the intellectual and cunning that lies beneath his humble demeanor. I quickly

Prompt: The capital of France is
Generated text:  known for its rich history, stunning architecture, and famous landmarks. It is the epicenter of French culture, cuisine, and fashion. However, the city is also known for its less-than-favorable reputation when it comes to street crime and petty theft. Visitors to Paris are often warned to be cautious of their belongings and to keep a close eye on their luggage.
In light of this, it is no surprise that many tourists and travelers flock to Paris with a sense of skepticism and even fear. But what if you could experience the beauty and charm of Paris without the worries of petty theft and crime? There are several ways to do this

Prompt: The future of AI is
Generated text:  here: Robot chef cooks up a storm in the kitchen
An AI-powered robot chef is cooking up a storm in a Chinese restaurant, marking the beginning of a new era in culinary innovation.
The robot, named Moley, uses artificial intelligence to prepare and cook complex dishes with ease, including intricate sauces and elaborate desserts. It can even learn and adapt to new recipes, making it an invaluable asset to any kitchen.
Moley's AI system is powered by a combination of computer vision, machine learning, and natural language processing, allowing it to analyze and understand the nuances of cooking. The robot can recognize ingredients, detect cooking temperatures and times,
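The cell above awaits a single `async_generate` call for the whole prompt list. Another pattern an asynchronous interface allows is issuing several independent calls, for instance with different sampling parameters per prompt, and awaiting them together with `asyncio.gather`. The sketch below shows the pattern with `StubAsyncEngine`, a hypothetical stand-in that only mimics `async_generate`; `generate_each` is likewise a made-up helper. Whether the real engine handles concurrent calls this way is an assumption that this sketch does not verify.

```python
import asyncio


class StubAsyncEngine:
    """Hypothetical stand-in for sgl.Engine; it only mimics async_generate()."""

    async def async_generate(self, prompt, sampling_params=None):
        await asyncio.sleep(0.01)  # simulate waiting on the model
        temperature = (sampling_params or {}).get("temperature")
        return {"text": f" [stub, temperature={temperature}] {prompt}"}


async def generate_each(engine, requests):
    """Run one async_generate call per (prompt, sampling_params) pair concurrently."""
    calls = [engine.async_generate(prompt, params) for prompt, params in requests]
    # gather() returns results in the same order as the calls were passed in.
    return await asyncio.gather(*calls)


if __name__ == "__main__":
    requests = [
        ("Hello, my name is", {"temperature": 0.8, "top_p": 0.95}),
        ("The capital of France is", {"temperature": 0.0}),
    ]
    outputs = asyncio.run(generate_each(StubAsyncEngine(), requests))
    for (prompt, _), output in zip(requests, outputs):
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

Because `asyncio.gather` preserves argument order, outputs can be paired with their prompts using `zip`, as in the cells of this document.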

Streaming Asynchronous Generation#

[5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())

=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Victoria. I am a senior at Oklahoma Christian University, majoring in early childhood education. This is my first year as a student teacher and I am so excited to be working with you and your child this year! I am looking forward to getting to know you and your child and to watching them grow and learn over the next few months.
As a student teacher, I will be working closely with Mrs. [Teacher's Name] to plan and implement lessons that will meet the needs of your child. I will also be assisting with classroom management, grading, and other tasks that will help make our classroom run smoothly.
If you have any questions

Prompt: The capital of France is
Generated text:  getting a major makeover
Paris is getting a massive renovation that will transform the city's infrastructure, public spaces, and architecture. The city's mayor, Anne Hidalgo, has announced a €20 billion investment plan to revamp the city, which will include the construction of new metro lines, the renovation of historic buildings, and the creation of new public spaces. The plan is expected to create thousands of jobs and boost the city's economy.
One of the most significant projects is the creation of a new metro line, which will connect the city center to the suburbs. The line will be built using innovative and sustainable technologies, and will feature

Prompt: The future of AI is
Generated text:  being shaped by innovations in various fields, including computer vision, natural language processing, and machine learning. One of the most exciting areas of research is in the development of multimodal AI, which enables machines to process and understand multiple sources of data, such as text, images, and audio, simultaneously. In this article, we'll explore the advancements in multimodal AI and their potential applications.
What is Multimodal AI?
Multimodal AI refers to the ability of machines to process and integrate multiple modalities of data, such as text, images, audio, and video. This involves developing AI models that can recognize and understand patterns across
[6]:
llm.shutdown()
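In a longer script, an exception raised between creating the engine and reaching `llm.shutdown()` would skip the shutdown call. One way to guard against that is a small context manager that always calls `shutdown()` on exit. The sketch below uses `StubEngine`, a hypothetical stand-in exposing only `generate` and `shutdown`, and `engine_session`, a made-up helper name; neither is part of SGLang.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in for sgl.Engine with just generate() and shutdown()."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompt, sampling_params=None):
        return {"text": f" [stub] {prompt}"}

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def engine_session(engine):
    """Yield the engine and call shutdown() afterwards, even if an error occurs."""
    try:
        yield engine
    finally:
        engine.shutdown()


if __name__ == "__main__":
    engine = StubEngine()
    with engine_session(engine) as llm:
        print(llm.generate("Hello, my name is")["text"])
    print(f"Shut down: {engine.is_shut_down}")
```

The `try`/`finally` inside `engine_session` is what guarantees the cleanup; a plain `try`/`finally` around the generation code would work equally well.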