Kubernetes#
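When running outside your Kubernetes cluster, SkyPilot uses your local ~/.kube/config file to authenticate and to create resources on your Kubernetes cluster.
When running inside your Kubernetes cluster (for example, as a spot controller or serve controller), SkyPilot can operate using any of the following three authentication methods: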
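Automatically create a service account: SkyPilot can automatically create the service account and roles it needs to manage resources in the Kubernetes cluster. This is the default method when running inside the cluster and requires no additional configuration.
For details on the permissions granted to the service account, see the Minimum permissions required for SkyPilot section below.
Use a custom service account: If you have a custom service account with the necessary permissions, you can configure SkyPilot to use it by adding the following to your ~/.sky/config.yaml file:
kubernetes:
  remote_identity: your-service-account-name
Use your local kubeconfig file: In this case, SkyPilot copies your local ~/.kube/config file to the controller pod and uses it for authentication. To use this method, set remote_identity: LOCAL_CREDENTIALS under the kubernetes section of your ~/.sky/config.yaml file:
kubernetes:
  remote_identity: LOCAL_CREDENTIALS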
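Note
If your cluster uses exec-based authentication in your ~/.kube/config file (for example, GKE uses exec auth by default), SkyPilot may not be able to authenticate using the local kubeconfig method. In that case, consider using one of the service account methods instead.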
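To check whether a kubeconfig relies on exec-based authentication, look for an `exec` block under the relevant `users` entry. The fragment below is a sketch of what such an entry typically looks like on GKE; the user name is a hypothetical placeholder, and the exact fields in your file may differ.

```yaml
# Fragment of ~/.kube/config (illustrative only)
users:
  - name: gke-example-user  # Hypothetical user name
    user:
      exec:  # Presence of this block indicates exec-based authentication
        apiVersion: client.authentication.k8s.io/v1beta1
        command: gke-gcloud-auth-plugin  # External binary invoked to fetch credentials
        provideClusterInfo: true
```

Because the credentials come from an external binary on your machine, a copied kubeconfig may not work inside a controller pod where that binary is unavailable.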
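Note
Service account based authentication applies only when the remote SkyPilot cluster (including spot and serve controllers) is launched inside the Kubernetes cluster. When running outside the cluster (for example, on AWS), SkyPilot uses the local ~/.kube/config file for authentication.
The sections below list the permissions SkyPilot requires, followed by an example YAML that you can use to create a service account with those permissions.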
Minimum permissions required for SkyPilot#
SkyPilot requires permissions equivalent to the following roles in order to manage resources in the Kubernetes cluster:
# Namespaced role for the service account
# Required for creating pods, services and other necessary resources in the namespace.
# Note these permissions only apply in the namespace where SkyPilot is deployed, and the namespace can be changed below.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-sa-role  # Can be changed if needed
  namespace: default  # Change to your namespace if using a different one.
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
# ClusterRole for accessing cluster-wide resources. Details for each resource below:
# (ClusterRoles are cluster-scoped, so no namespace is set in the metadata.)
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-sa-cluster-role  # Can be changed if needed
  labels:
    parent: skypilot
rules:
  - apiGroups: [""]
    resources: ["nodes"]  # Required for getting node resources.
    verbs: ["get", "list", "watch"]
  - apiGroups: ["node.k8s.io"]
    resources: ["runtimeclasses"]  # Required for autodetecting the runtime class of the nodes.
    verbs: ["get", "list", "watch"]
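Tip
If you are using a namespace other than default, make sure to change the namespace in the manifests above.
These roles must apply both to the user account configured in the kubeconfig file and to the service account used by SkyPilot (if one is configured).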
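As an illustration of granting the namespaced role to a kubeconfig user, a RoleBinding with a `User` subject could look like the sketch below. The binding name and the user name are hypothetical; the user name must match whatever identity your cluster's authentication mechanism reports for that user.

```yaml
# Sketch: bind the sky-sa-role Role to a kubeconfig user (names are assumptions)
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-user-rb  # Hypothetical binding name
  namespace: default  # Must be the namespace that contains sky-sa-role
subjects:
  - kind: User
    name: jane@example.com  # Hypothetical user; replace with the authenticated user name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: sky-sa-role  # The Role from the manifest above
  apiGroup: rbac.authorization.k8s.io
```

The ClusterRole would need a corresponding ClusterRoleBinding with the same subject.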
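If you need to view real-time GPU availability with sky show-gpus, if your tasks use object store mounting, or if your tasks need access to ingress resources, you must grant the additional permissions described below.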
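Permissions for sky show-gpus#
sky show-gpus needs to list all pods across all namespaces to calculate GPU availability. To do this, SkyPilot needs the get and list permissions on pods, granted through a ClusterRole: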
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sky-sa-cluster-role-pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
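Tip
If this role is not granted to the service account, sky show-gpus will still work, but it will show only the total number of GPUs on the nodes, not the number of free GPUs.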
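A ClusterRole has no effect until it is bound to a subject. A minimal sketch of such a binding follows, assuming a service account named sky-sa in the default namespace (as in the full example later on this page); the binding name is a hypothetical placeholder.

```yaml
# Sketch: bind the pod-reader ClusterRole to the SkyPilot service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sky-sa-cluster-role-pod-reader-binding  # Hypothetical binding name
subjects:
  - kind: ServiceAccount
    name: sky-sa  # Assumed service account name
    namespace: default  # Namespace where the service account lives
roleRef:
  kind: ClusterRole
  name: sky-sa-cluster-role-pod-reader  # The ClusterRole defined above
  apiGroup: rbac.authorization.k8s.io
```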
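Permissions for object store mounting#
If your tasks use object store mounting (for example, S3 or GCS), SkyPilot needs to run a DaemonSet that exposes the FUSE device as a Kubernetes resource to SkyPilot pods.
To allow this, you also need to create a skypilot-system namespace, in which the DaemonSet runs, and grant the service account the necessary permissions in that namespace.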
# Required only if using object store mounting
# Create namespace for SkyPilot system
apiVersion: v1
kind: Namespace
metadata:
  name: skypilot-system  # Do not change this
  labels:
    parent: skypilot
---
# Role for the skypilot-system namespace to create FUSE device manager and
# any other system components required by SkyPilot.
# This role must be bound in the skypilot-system namespace to the service account used for SkyPilot.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: skypilot-system-service-account-role  # Can be changed if needed
  namespace: skypilot-system  # Do not change this namespace
  labels:
    parent: skypilot
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
Permissions for using Ingress#
If your tasks use Ingress to expose ports, you need to grant the service account the necessary permissions in the ingress-nginx namespace.
# Required only if using ingresses
# Role for accessing ingress service IP
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: ingress-nginx  # Do not change this
  name: sky-sa-role-ingress-nginx  # Can be changed if needed
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["list", "get"]
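Example using a custom service account#
To create a service account with all the permissions SkyPilot requires (including access to object stores), you can use the following YAML.
Tip
In this example, the service account is named sky-sa and is created in the default namespace.
Change the namespace and the service account name as needed.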
# create-sky-sa.yaml
kind: ServiceAccount
apiVersion: v1
metadata:
  name: sky-sa  # Change to your service account name
  namespace: default  # Change to your namespace if using a different one.
  labels:
    parent: skypilot
---
# Role for the service account
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-sa-role  # Can be changed if needed
  namespace: default  # Change to your namespace if using a different one.
  labels:
    parent: skypilot
rules:
  - apiGroups: ["*"]  # Required for creating pods, services, secrets and other necessary resources in the namespace.
    resources: ["*"]
    verbs: ["*"]
---
# RoleBinding for the service account
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-sa-rb  # Can be changed if needed
  namespace: default  # Change to your namespace if using a different one.
  labels:
    parent: skypilot
subjects:
  - kind: ServiceAccount
    name: sky-sa  # Change to your service account name
roleRef:
  kind: Role
  name: sky-sa-role  # Must match the name of the Role defined above
  apiGroup: rbac.authorization.k8s.io
---
# ClusterRole for the service account (cluster-scoped, so no namespace is set)
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sky-sa-cluster-role  # Can be changed if needed
  labels:
    parent: skypilot
rules:
  - apiGroups: [""]
    resources: ["nodes"]  # Required for getting node resources.
    verbs: ["get", "list", "watch"]
  - apiGroups: ["node.k8s.io"]
    resources: ["runtimeclasses"]  # Required for autodetecting the runtime class of the nodes.
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]  # Required for exposing services through ingresses
    resources: ["ingressclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]  # Required for `sky show-gpus` command
    resources: ["pods"]
    verbs: ["get", "list"]
---
# ClusterRoleBinding for the service account (cluster-scoped, so no namespace is set)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sky-sa-cluster-role-binding  # Can be changed if needed
  labels:
    parent: skypilot
subjects:
  - kind: ServiceAccount
    name: sky-sa  # Change to your service account name
    namespace: default  # Change to your namespace if using a different one.
roleRef:
  kind: ClusterRole
  name: sky-sa-cluster-role  # Must match the name of the ClusterRole defined above
  apiGroup: rbac.authorization.k8s.io
---
# Optional: If using object store mounting, create the skypilot-system namespace
apiVersion: v1
kind: Namespace
metadata:
  name: skypilot-system  # Do not change this
  labels:
    parent: skypilot
---
# Optional: If using object store mounting, create role in the skypilot-system
# namespace to create FUSE device manager.
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: skypilot-system-service-account-role  # Can be changed if needed
  namespace: skypilot-system  # Do not change this namespace
  labels:
    parent: skypilot
rules:
  - apiGroups: ["*"]
    resources: ["*"]
    verbs: ["*"]
---
# Optional: If using object store mounting, create rolebinding in the skypilot-system
# namespace to create FUSE device manager.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sky-sa-skypilot-system-role-binding
  namespace: skypilot-system  # Do not change this namespace
  labels:
    parent: skypilot
subjects:
  - kind: ServiceAccount
    name: sky-sa  # Change to your service account name
    namespace: default  # Change this to the namespace where the service account is created
roleRef:
  kind: Role
  name: skypilot-system-service-account-role  # Must match the name of the skypilot-system Role defined above
  apiGroup: rbac.authorization.k8s.io
---
# Optional: Role for accessing ingress resources
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sky-sa-role-ingress-nginx  # Can be changed if needed
  namespace: ingress-nginx  # Do not change this namespace
  labels:
    parent: skypilot
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["list", "get", "watch"]
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: ["roles", "rolebindings"]
    verbs: ["list", "get", "watch"]
---
# Optional: RoleBinding for accessing ingress resources
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sky-sa-rolebinding-ingress-nginx  # Can be changed if needed
  namespace: ingress-nginx  # Do not change this namespace
  labels:
    parent: skypilot
subjects:
  - kind: ServiceAccount
    name: sky-sa  # Change to your service account name
    namespace: default  # Change this to the namespace where the service account is created
roleRef:
  kind: Role
  name: sky-sa-role-ingress-nginx  # Must match the name of the ingress-nginx Role defined above
  apiGroup: rbac.authorization.k8s.io
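Create the service account with the following command:
$ kubectl apply -f create-sky-sa.yaml
After the service account is created, the cluster administrator can distribute kubeconfigs that use the sky-sa service account to the users who need access to the cluster.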
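A distributed kubeconfig of this kind might look roughly like the sketch below. The cluster name, context name, and every value in angle brackets are placeholders, not values from this guide. It assumes token-based authentication for the service account; a token can be issued with, for example, `kubectl create token sky-sa --namespace default`, though such tokens are short-lived by default and your cluster's token policy may call for a different approach.

```yaml
# Sketch of a kubeconfig that authenticates as the sky-sa service account
apiVersion: v1
kind: Config
clusters:
  - name: my-cluster  # Hypothetical cluster name
    cluster:
      server: https://<api-server-address>  # Placeholder
      certificate-authority-data: <base64-encoded-ca-certificate>  # Placeholder
users:
  - name: sky-sa
    user:
      token: <service-account-token>  # Placeholder; static token, so no exec plugin is needed
contexts:
  - name: sky-sa@my-cluster  # Hypothetical context name
    context:
      cluster: my-cluster
      user: sky-sa
      namespace: default  # Namespace where SkyPilot resources are created
current-context: sky-sa@my-cluster
```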
Users should also configure SkyPilot to use the sky-sa service account through ~/.sky/config.yaml:
# ~/.sky/config.yaml
kubernetes:
  remote_identity: sky-sa  # Or your service account name