Spot VMs | Kubernetes Engine Documentation | Google Cloud
source link: https://cloud.google.com/kubernetes-engine/docs/concepts/spot-vms
This page provides an overview of support for Spot VMs in Google Kubernetes Engine (GKE).
Preview
This feature is covered by the Pre-GA Offerings Terms of the Google Cloud Terms of Service. Pre-GA features might have limited support, and changes to pre-GA features might not be compatible with other pre-GA versions. For more information, see the launch stage descriptions.
Overview
Spot VMs are Compute Engine virtual machine (VM) instances that are priced lower than on-demand Compute Engine VMs. Spot VMs offer the same machine types and options as on-demand VMs, but provide no availability guarantees.
Note: For better availability, use smaller machine types.
You can use Spot VMs in your clusters and node pools to run stateless, batch, or fault-tolerant workloads that can tolerate disruptions caused by the ephemeral nature of Spot VMs.
Spot VMs remain available until Compute Engine requires the resources for on-demand VMs. To maximize your cost efficiency, combine using Spot VMs with Best practices for running cost-optimized Kubernetes applications on GKE.
To learn more about Spot VMs, see Spot VMs in the Compute Engine documentation.
Benefits
Spot VMs and preemptible VMs share many benefits, including the following:
- Lower pricing than on-demand Compute Engine VMs.
- Useful for stateless, fault-tolerant workloads that are resilient to the ephemeral nature of these VMs.
- Works with the cluster autoscaler and node auto-provisioning.
In contrast to preemptible VMs, which expire after 24 hours, Spot VMs have no expiration time. Spot VMs are only terminated when Compute Engine needs the resources elsewhere.
Note: GKE continues to support using preemptible VMs in your clusters and node pools. Preemptible VMs are generally available. However, Spot VMs are recommended and replace the need to use preemptible VMs.
How Spot VMs work in GKE
When you create a cluster or node pool with Spot VMs, GKE creates underlying Compute Engine Spot VMs that behave like a managed instance group (MIG). Nodes that use Spot VMs behave like on-demand GKE nodes, but with no guarantee of availability. When the resources used by Spot VMs are required to run on-demand VMs, Compute Engine terminates those Spot VMs to use the resources elsewhere.
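For example, you might create a node pool that uses Spot VMs with the gcloud CLI, similar to the following sketch (the pool name and CLUSTER_NAME are placeholders; the `--spot` flag assumes the gcloud version that supports this Preview feature):

```sh
# Create a node pool of Spot VMs in an existing cluster.
# "spot-pool" and CLUSTER_NAME are placeholder values.
gcloud container node-pools create spot-pool \
    --cluster=CLUSTER_NAME \
    --spot \
    --num-nodes=3
```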
Termination and graceful shutdown of Spot VMs
When Compute Engine needs to reclaim the resources used by Spot VMs, a termination notice is sent to GKE. Spot VMs terminate 30 seconds after receiving a termination notice.
On clusters running GKE version 1.20 and later, the kubelet graceful node shutdown feature is enabled by default. The kubelet notices the termination notice and gracefully terminates Pods that are running on the node.
The kubelet grants non-system Pods 25 seconds to gracefully terminate, after which system Pods (with the system-cluster-critical or system-node-critical priority classes) have five seconds to gracefully terminate.
Note: Setting terminationGracePeriodSeconds to more than 25 seconds in your Pod spec has no effect on the graceful termination of nodes that use Spot VMs.
During graceful Pod termination, the kubelet assigns a Failed status and a Shutdown reason to the terminated Pods. When the number of terminated Pods reaches a threshold, garbage collection cleans up the Pods.
You can also delete shutdown Pods manually using the following command:
kubectl get pods --all-namespaces | grep -i shutdown | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n
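To illustrate the note above, the following sketch shows a Pod spec that requests a longer grace period (the Pod name, image, and command are placeholder assumptions); on nodes that use Spot VMs, values above 25 seconds are not honored:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker          # placeholder name
spec:
  # Requested 60 seconds, but on Spot VM nodes the effective
  # grace period for non-system Pods is at most 25 seconds.
  terminationGracePeriodSeconds: 60
  containers:
  - name: worker
    image: busybox            # placeholder image
    command: ["sh", "-c", "sleep 3600"]
```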
Scheduling workloads on Spot VMs
GKE automatically adds the cloud.google.com/gke-spot=true label to nodes that use Spot VMs. You can schedule specific Pods on nodes that use Spot VMs by using the nodeSelector field in your Pod spec, as in the following example:
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    cloud.google.com/gke-spot: "true"
Alternatively, you can use node affinity to tell GKE to schedule Pods on Spot VMs, similar to the following example:
apiVersion: v1
kind: Pod
spec:
  ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-spot
            operator: In
            values:
            - "true"
  ...
You can also use nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution to prefer that GKE places Pods on nodes that use Spot VMs. Preferring Spot VMs is not recommended, because GKE might schedule the Pods onto existing viable nodes that use on-demand VMs instead.
Note: The cluster autoscaler and node auto-provisioning don't consider preferredDuringSchedulingIgnoredDuringExecution when making autoscaling decisions.
Using taints and tolerations for scheduling
To avoid system disruptions, use a node taint to ensure that GKE doesn't schedule critical workloads onto Spot VMs. When you taint nodes that use Spot VMs, GKE only schedules Pods that have the corresponding toleration onto those nodes.
If you use node taints, ensure that your cluster also has at least one node pool that uses on-demand Compute Engine VMs. Node pools that use on-demand VMs provide a reliable place for GKE to schedule critical system components like DNS.
For information on using a node taint for Spot VMs, see Use taints and tolerations for Spot VMs.
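As a sketch of this setup, you might create the Spot VM node pool with a taint at creation time (the pool name, CLUSTER_NAME, and the exact taint key/value follow the gke-spot label convention and are assumptions here; see the linked page for the canonical steps):

```sh
# Create a tainted Spot VM node pool so only tolerant Pods land on it.
# "spot-pool" and CLUSTER_NAME are placeholder values.
gcloud container node-pools create spot-pool \
    --cluster=CLUSTER_NAME \
    --spot \
    --node-taints=cloud.google.com/gke-spot="true":NoSchedule
```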
Using Spot VMs with GPU node pools
Spot VMs support using GPUs.
When you create a new GPU node pool, GKE automatically adds the nvidia.com/gpu=present:NoSchedule taint to the new nodes. Only Pods with the corresponding toleration can run on these nodes. GKE automatically adds this toleration to Pods that request GPUs.
Your cluster must have at least one existing non-GPU node pool that uses on-demand VMs before you create a GPU node pool that uses Spot VMs. If your cluster only has a GPU node pool with Spot VMs, GKE doesn't add the nvidia.com/gpu=present:NoSchedule taint to those nodes. As a result, GKE might schedule system workloads onto the GPU node pools with Spot VMs, which can lead to disruptions because of the Spot VMs and can increase your resource consumption, because GPU nodes are more expensive than non-GPU nodes.
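For example, a GPU node pool that uses Spot VMs might be created as follows (the pool name, CLUSTER_NAME, and the accelerator type and count are illustrative assumptions):

```sh
# Create a GPU node pool on Spot VMs (all names and the
# accelerator type are placeholder values).
gcloud container node-pools create gpu-spot-pool \
    --cluster=CLUSTER_NAME \
    --spot \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --num-nodes=1
```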
Cluster autoscaler and node auto-provisioning
You can use the cluster autoscaler and node auto-provisioning to automatically scale your clusters and node pools based on the demands of your workloads. Both the cluster autoscaler and node auto-provisioning support using Spot VMs.
Spot VMs and node auto-provisioning
Node auto-provisioning automatically creates and deletes node pools in your cluster to meet the demands of your workloads. When node auto-provisioning creates new node pools to accommodate Pods that require Spot VMs, GKE automatically adds the cloud.google.com/gke-spot=true:NoSchedule taint to nodes in the new node pools. Only Pods with the corresponding toleration can run on nodes in those node pools. You must add the corresponding toleration to your deployments to allow GKE to place the Pods on Spot VMs.
You can ensure that GKE only schedules your Pods on Spot VMs by using both a toleration and either a nodeSelector or a node affinity rule to filter for Spot VMs.
You can also use only a toleration, without filtering for Spot VMs using a nodeSelector or node affinity. In this case, GKE attempts to schedule the Pods on Spot VMs. If there are no available Spot VMs but there are existing on-demand VMs with capacity, GKE schedules the Pods onto the on-demand VMs instead.
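Combining the two approaches, a Pod that both tolerates the auto-provisioned taint and filters for Spot VM nodes might look like the following sketch (the Pod name and container are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spot-only-pod         # placeholder name
spec:
  # Filter so the Pod only lands on Spot VM nodes.
  nodeSelector:
    cloud.google.com/gke-spot: "true"
  # Tolerate the taint that node auto-provisioning adds.
  tolerations:
  - key: cloud.google.com/gke-spot
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: app
    image: busybox            # placeholder image
```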
Spot VMs and cluster autoscaler
The cluster autoscaler automatically adds and removes nodes in your node pools based on demand. If your cluster has Pods that can't be placed on existing Spot VMs, the cluster autoscaler adds new nodes that use Spot VMs.
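For instance, a Spot VM node pool with cluster autoscaling enabled could be created as follows (the pool name, CLUSTER_NAME, and the node-count bounds are illustrative):

```sh
# Create an autoscaled Spot VM node pool; the autoscaler scales
# it between the given bounds based on pending Pods.
gcloud container node-pools create spot-autoscale-pool \
    --cluster=CLUSTER_NAME \
    --spot \
    --enable-autoscaling \
    --min-nodes=0 \
    --max-nodes=5
```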
Modifications to Kubernetes behavior
Using Spot VMs on GKE modifies some guarantees and constraints that Kubernetes provides, such as the following:
- On clusters running GKE versions prior to 1.20, the kubelet graceful node shutdown feature is disabled by default. GKE shuts down Spot VMs without a grace period for Pods, 30 seconds after receiving a preemption notice from Compute Engine.
- Reclamation of Spot VMs is involuntary and is not covered by the guarantees of PodDisruptionBudgets. You might experience greater unavailability than your configured PodDisruptionBudget.
Best practices
When designing a system that uses Spot VMs, you can avoid major disruptions by using the following guidelines:
- Spot VMs have no availability guarantees. Design your systems under the assumption that GKE might reclaim any or all your Spot VMs at any time, with no guarantee of when new instances become available.
- There is no guarantee that Pods running on Spot VMs will shut down gracefully. GKE might not notice that the node was reclaimed until a few minutes after reclamation occurs, which delays the rescheduling of those Pods onto a new node.
- To ensure that your workloads and Jobs are processed even when no Spot VMs are available, ensure that your clusters have a mix of node pools that use Spot VMs and node pools that use on-demand Compute Engine VMs.
- Ensure that your cluster has at least one non-GPU node pool that uses on-demand VMs before you add a GPU node pool that uses Spot VMs.
- Use the Kubernetes on GCP Node Termination Event Handler on clusters running GKE versions prior to 1.20, where the kubelet graceful node shutdown feature is disabled. The handler gracefully terminates your Pods when Spot VMs are preempted.
- While the node names do not usually change when nodes are recreated, the internal and external IP addresses used by Spot VMs might change after recreation.
- Use node taints and tolerations to ensure that critical Pods aren't scheduled onto node pools that use Spot VMs.
- Do not use stateful Pods with Spot VMs. StatefulSets inherently have at-most-one Pod per index semantics, which preemption of Spot VMs could violate, leading to data loss.
- Follow the Kubernetes Pod termination best practices.