Updating a Service

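SkyServe supports updating a deployed service. An update can change:

  • The replica code (e.g., run/setup; useful for debugging)

  • The replica resource specs in resources (e.g., accelerator or instance type)

  • The service specs in service (e.g., number of replicas or autoscaling specs)

During an update, the service remains accessible with no downtime, and its endpoint stays the same. A rolling update is applied by default; you can also request a blue-green update.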

Rolling update

To update an existing service, use sky serve update:

$ sky serve update service-name new_service.yaml

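SkyServe launches the new replicas described by new_service.yaml with the following behavior:

  • When the update starts, traffic continues to be routed to the existing (old) replicas.

  • New replicas (with the new settings) are launched in the background.

  • Whenever the total number of old and new replicas exceeds the expected number of replicas (as decided by the autoscaler), the surplus old replicas are scaled down.

  • Traffic is routed to both old and new replicas until all new replicas are ready.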
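The scale-down rule above can be illustrated with a small model. This is a minimal sketch under stated assumptions, not SkyServe's implementation: the function name simulate_rolling_update is hypothetical, the target replica count is fixed (no autoscaling), new replicas become ready one at a time, and only ready replicas are counted, which matches the example later in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical model of the rolling-update scale-down rule; NOT SkyServe code.
# Assumes a fixed target replica count and that new replicas become READY
# one at a time. Only READY replicas are counted.
simulate_rolling_update() {
  local target=$1 old=$2 new=0
  while (( new < target )); do
    new=$((new + 1))    # one more new replica becomes READY
    # Scale down surplus old replicas while the READY total exceeds the target.
    while (( old > 0 && old + new > target )); do
      old=$((old - 1))
    done
    echo "new_ready=${new} old_ready=${old}"
  done
}

# Going from 2 old replicas to a target of 3 new replicas.
simulate_rolling_update 3 2
```

With a target of 3 and 2 old replicas, the model keeps both old replicas while the first new one becomes ready, then drops one old replica per additional ready new replica, so traffic is served by a mix of versions during the update.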

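Tip

When only the service field is updated and neither workdir nor file_mounts is specified in the service task, SkyServe reuses the old replicas by applying the new service spec and bumping their version (see sky serve status for version information). This significantly reduces the time needed to update the service and avoids potential quota issues.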
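Before updating, you may want to check whether a task file rules out replica reuse. The helper below is a hypothetical sketch, not part of SkyPilot: it only looks for top-level workdir or file_mounts keys with grep, and it cannot tell whether fields other than service changed, so it checks just one half of the condition in the tip.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SkyPilot). Returns success if the task YAML
# declares a top-level `workdir` or `file_mounts` key, which means the reuse
# condition in the tip is not met. It does NOT detect whether fields other
# than `service` changed.
declares_workdir_or_file_mounts() {
  grep -Eq '^(workdir|file_mounts):' "$1"
}

# Example usage:
#   if declares_workdir_or_file_mounts new_service.yaml; then
#     echo "workdir/file_mounts present: old replicas will not be reused"
#   fi
```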

Example

We first launch a simple HTTP service:

$ sky serve up examples/serve/http_server/task.yaml -n http-server

We can use sky serve status http-server to check the status of the service:

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        1m 41s  READY   2/2       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED    RESOURCES       STATUS  REGION
http-server   1   1        54.173.203.169  2 mins ago  1x AWS(vCPU=2)  READY   us-east-1
http-server   2   1        52.87.241.103   2 mins ago  1x AWS(vCPU=2)  READY   us-east-1

The service http-server starts at version 1.
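When following an update, it can help to see how many replicas run each version and in which state. The sketch below is a hypothetical helper named summarize_replicas; it assumes the plain-text layout shown in this guide (a "Service Replicas" header, VERSION in the third column, STATUS and REGION as the last two columns), which may differ between SkyPilot releases.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `sky serve status` output on stdin and counts
# replicas per (VERSION, STATUS) pair. Assumes the table layout shown in this
# guide: VERSION is column 3, STATUS is the second-to-last column.
summarize_replicas() {
  awk '
    /^Service Replicas/ { in_table = 1; next }
    in_table && $1 == "SERVICE_NAME" { next }   # skip the column header
    in_table && NF > 0 { count[$3 " " $(NF-1)]++ }
    END {
      for (key in count) {
        split(key, part, " ")
        printf "version=%s status=%s count=%d\n", part[1], part[2], count[key]
      }
    }
  ' | sort
}

# Example usage:
#   sky serve status http-server | summarize_replicas
```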

Suppose we want to increase the number of replicas from 2 to 3. We can do this by changing the replicas field in the task YAML examples/serve/http_server/task.yaml:

# examples/serve/http_server/task.yaml
service:
  readiness_probe:
    path: /health
    initial_delay_seconds: 20
  replicas: 3

resources:
  ports: 8081
  cpus: 2+

workdir: examples/serve/http_server

run: python3 server.py

We can then update the service with sky serve update:

$ sky serve update http-server examples/serve/http_server/task.yaml

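Because this task specifies a workdir, the old replicas are not reused (see the tip above), and SkyServe triggers the launch of three new replicas.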

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  2        6m 15s  READY   2/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS        REGION
http-server   1   1        54.173.203.169  6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   2   1        52.87.241.103   6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   3   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1
http-server   4   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1
http-server   5   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1

Whenever a new replica becomes ready, traffic is routed to both the old and the new replicas.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1,2      10m 4s  READY   3/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=2)  READY          us-east-1
http-server   4   2        -               1 min ago    1x AWS(vCPU=2)  PROVISIONING   us-east-1
http-server   5   2        -               1 min ago    1x AWS(vCPU=2)  PROVISIONING   us-east-1

Once the total number of ready old and new replicas exceeds the requested number, the surplus old replicas are scaled down.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1,2      10m 4s  READY   3/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=2)  READY          us-east-1
http-server   4   2        18.206.226.82   1 min ago    1x AWS(vCPU=2)  READY          us-east-1
http-server   5   2        -               1 min ago    1x AWS(vCPU=2)  PROVISIONING   us-east-1

Eventually, only the new replicas remain to serve user requests.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME   STATUS  REPLICAS  ENDPOINT
http-server  2        11m 42s  READY   3/3       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP             LAUNCHED    RESOURCES       STATUS  REGION
http-server   3   2        3.93.241.163   3 mins ago  1x AWS(vCPU=2)  READY   us-east-1
http-server   4   2        18.206.226.82  3 mins ago  1x AWS(vCPU=2)  READY   us-east-1
http-server   5   2        3.26.232.31    1 min ago   1x AWS(vCPU=2)  READY   us-east-1
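To block a script until an update has finished, you can poll sky serve status. The function below is a hypothetical sketch named wait_for_rollout; it assumes the same table layout as the outputs above and treats the rollout as done when every listed replica is READY on the target version. The poll interval and maximum number of polls are illustrative defaults, not SkyServe settings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: polls `sky serve status SERVICE` until every listed
# replica is READY on TARGET_VERSION. Assumes the table layout shown in this
# guide (VERSION in column 3, STATUS second to last). MAX_POLLS bounds the
# loop so a stuck update does not block forever.
# Usage: wait_for_rollout SERVICE TARGET_VERSION [INTERVAL_SECONDS] [MAX_POLLS]
wait_for_rollout() {
  local service=$1 target_version=$2 interval=${3:-10} max_polls=${4:-60}
  local poll pending
  for (( poll = 1; poll <= max_polls; poll++ )); do
    # Print -1 if no replica rows were found, else the number of replicas
    # that are not yet READY on the target version.
    pending=$(sky serve status "$service" | awk -v v="$target_version" '
      /^Service Replicas/ { in_table = 1; next }
      in_table && $1 == "SERVICE_NAME" { next }
      in_table && NF > 0 {
        rows++
        if (!($3 == v && $(NF-1) == "READY")) n++
      }
      END { if (rows == 0) print -1; else print n + 0 }')
    if (( pending == 0 )); then
      echo "${service}: all replicas READY on version ${target_version}"
      return 0
    elif (( pending < 0 )); then
      echo "${service}: no replica table in status output" >&2
    else
      echo "${service}: ${pending} replica(s) pending, retrying in ${interval}s"
    fi
    sleep "$interval"
  done
  echo "${service}: rollout not finished after ${max_polls} polls" >&2
  return 1
}

# Example usage:
#   wait_for_rollout http-server 2
```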

Blue-green update

SkyServe also supports blue-green updates with the following command:

$ sky serve update --mode blue_green service-name new_service.yaml

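In this update mode, SkyServe launches the new replicas described by new_service.yaml with the following behavior:

  • When the update starts, traffic continues to be routed to the existing (old) replicas.

  • New replicas (with the new settings) are launched in the background.

  • Traffic is switched to the new replicas only after all of them are ready.

  • The old replicas are scaled down after all new replicas are ready.

During the update, traffic is served entirely by replicas of either the old or the new version, never a mix of both. sky serve status shows the latest service version and the version of each replica.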
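For comparison with the rolling update, here is a simplified model of the blue-green behavior. It is a hypothetical sketch, not SkyServe's implementation, and makes the same assumptions as a rolling-update model would: a fixed target replica count and new replicas becoming ready one at a time.

```bash
#!/usr/bin/env bash
# Hypothetical model of the blue-green rule; NOT SkyServe code.
# Assumes a fixed target replica count and that new replicas become READY
# one at a time.
simulate_blue_green_update() {
  local target=$1 old=$2 new=0 traffic=old
  while (( new < target )); do
    new=$((new + 1))    # one more new replica becomes READY
    if (( new == target )); then
      traffic=new       # switch traffic only once every new replica is READY
      old=0             # then scale down all old replicas
    fi
    echo "new_ready=${new} old_ready=${old} traffic=${traffic}"
  done
}

# Going from 2 old replicas to a target of 3 new replicas.
simulate_blue_green_update 3 2
```

Unlike a rolling update, the old replicas keep serving all traffic until the last new replica is ready, so requests never hit a mix of versions.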

Example

We use the same service, http-server, as an example and update it with sky serve update --mode blue_green:

$ sky serve update http-server --mode blue_green examples/serve/http_server/task.yaml

SkyServe triggers the launch of three new replicas.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  2        6m 15s  READY   2/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS        REGION
http-server   1   1        54.173.203.169  6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   2   1        52.87.241.103   6 mins ago   1x AWS(vCPU=2)  READY         us-east-1
http-server   3   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1
http-server   4   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1
http-server   5   2        -               21 secs ago  1x AWS(vCPU=2)  PROVISIONING  us-east-1

While the new replicas are becoming ready, traffic is still routed to the old replicas only.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  1        10m 4s  READY   3/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  READY          us-east-1
http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=4)  READY          us-east-1
http-server   4   2        -               1 min ago    1x AWS(vCPU=4)  PROVISIONING   us-east-1
http-server   5   2        -               1 min ago    1x AWS(vCPU=4)  PROVISIONING   us-east-1

Once the required number of new replicas is ready, traffic is routed to the new replicas and the old replicas are scaled down.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
http-server  2        10m 4s  READY   3/5       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES       STATUS         REGION
http-server   1   1        54.173.203.169  10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
http-server   2   1        52.87.241.103   10 mins ago  1x AWS(vCPU=2)  SHUTTING_DOWN  us-east-1
http-server   3   2        3.93.241.163    1 min ago    1x AWS(vCPU=4)  READY          us-east-1
http-server   4   2        18.206.226.82   1 min ago    1x AWS(vCPU=4)  READY          us-east-1
http-server   5   2        3.26.232.31     1 min ago    1x AWS(vCPU=4)  READY          us-east-1

Eventually, as with the rolling update, only the new replicas remain to serve user requests.

$ sky serve status http-server

Services
NAME         VERSION  UPTIME   STATUS  REPLICAS  ENDPOINT
http-server  2        11m 42s  READY   3/3       44.206.240.249:30002

Service Replicas
SERVICE_NAME  ID  VERSION  IP             LAUNCHED    RESOURCES       STATUS  REGION
http-server   3   2        3.93.241.163   3 mins ago  1x AWS(vCPU=4)  READY   us-east-1
http-server   4   2        18.206.226.82  3 mins ago  1x AWS(vCPU=4)  READY   us-east-1
http-server   5   2        3.26.232.31    1 min ago   1x AWS(vCPU=4)  READY   us-east-1