5. Storage for Kubernetes Infrastructure Services

Trident focuses on providing persistence to Kubernetes applications, but before you can host those applications, you need to run a healthy, protected Kubernetes cluster. Those clusters are made up of a number of services, each with its own persistence requirements that need to be considered.

Node-local container storage, a.k.a. graph driver storage

One of the often overlooked components of a Kubernetes deployment is the storage which the container instances consume on the Kubernetes cluster nodes, usually referred to as graph driver storage. When a container is instantiated on a node, it consumes capacity and IOPS on that node for many of its operations; only data which is read from or written to a persistent volume is offloaded to the external storage. If the Kubernetes nodes are expected to host dozens, hundreds, or more containers, this can represent a significant amount of temporary capacity and IOPS demanded of the node-local storage.

Even if you don’t have a requirement to keep the data, the containers still need enough performance and capacity to execute their applications. The Kubernetes administrator and the storage administrator should work closely together to determine the requirements for graph storage and ensure that adequate performance and capacity are available.

The typical method of augmenting the graph driver storage is to use a block device mounted at the location where the container instance storage is located, e.g. /var/lib/docker. The host operating systems used to underpin the Kubernetes deployment each have their own method for replacing the graph storage with something more robust than a simple directory on the node; refer to the documentation from Red Hat, Ubuntu, SUSE, etc. for those instructions.
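
For example, on a node where a dedicated block device has been presented for graph storage, the setup might look roughly like the following. This is a minimal sketch: the device name (/dev/sdb), filesystem, and mount point are assumptions that depend on your environment, container runtime, and graph driver.

  # Format the dedicated device; ftype=1 is required by the overlay2 graph
  # driver on XFS (it is the default with current xfsprogs releases).
  mkfs.xfs -n ftype=1 /dev/sdb

  # Mount it at the container storage location before starting the container
  # runtime, and make the mount persistent across reboots.
  mkdir -p /var/lib/docker
  echo '/dev/sdb  /var/lib/docker  xfs  defaults  0 0' >> /etc/fstab
  mount /var/lib/docker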

Note

Block protocol is specifically recommended for graph storage due to the nature of how the graph drivers work. In particular, graph drivers create thin clones of the container image, using a variety of methods depending on the driver. NFS does not support this functionality and results in a full copy of the container image file system for each instance, with significant performance and capacity implications.

If the Kubernetes nodes are virtualized, graph storage could also be addressed by ensuring that the datastore they reside on meets the performance and capacity needs. However, the flexibility of having a separate device, even an additional virtual disk, should be carefully considered. Using a separate device gives the ability to control capacity, performance, and data protection independently and to tailor the policies accordingly. The capacity and performance needed for graph storage can fluctuate dramatically, while data protection is often unnecessary.

5.1. Persistent storage for cluster services

Several critical services are hosted on the Kubernetes master nodes, and other cluster-critical services may be hosted on other node types.

Using capacity provisioned from enterprise storage adds several benefits for each service as delineated below:

  • Performance – Leveraging enterprise storage means being able to provide latency in line with application needs and to control performance using QoS policies. QoS can be used to limit the performance of bully workloads or to guarantee performance for applications as needed.
  • Resiliency – In the event of a node failure, being able to quickly recover the data and present it to a replacement node ensures that the application is minimally affected.
  • Data protection – Putting application data onto dedicated volumes allows the data to have its own snapshot, replication, and retention policies.
  • Data security – Ensuring that data is encrypted and protected all the way to the disk level is important for meeting many compliance requirements.
  • Scale – Using enterprise storage enables deploying fewer instances, sized according to compute resources, rather than increasing the instance count purely for resiliency and performance purposes. Because the data is protected automatically and the storage provides adequate performance, horizontal scale-out does not need to compensate for limited storage performance.

There are some best practices which apply across all of the cluster services, or any application, which should be addressed as well.

  • Determining the amount of acceptable data loss, i.e. the Recovery Point Objective (RPO), is a critical part of deploying a production level system. Having an understanding of this will greatly simplify the decisions around other items described in this section.
  • Cluster service volumes should have a snapshot policy which enables the rapid recovery of data according to your requirements, as defined by the RPO. This enables quick and easy restoration of a service to a point in time by reverting the volume or the ability to recover individual files if needed.
  • Replication can provide both backup and disaster recovery capabilities; each service should be evaluated and its recovery plan considered carefully. Using storage replication may provide rapid recovery with a higher RPO, or it can provide a baseline which expedites restore operations without having to recover all data from other locations. A hedged CLI sketch of a snapshot and replication configuration follows this list.
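
For instance, a snapshot schedule sized to the RPO and an asynchronous replication relationship for a cluster-service volume might be configured from the ONTAP CLI roughly as follows. This is a hedged sketch: the SVM, volume, policy, and schedule names are placeholders, the destination volume is assumed to already exist as a data protection volume, and exact options vary by ONTAP release.

  # Snapshot policy sized to the RPO: 24 hourly copies plus a few daily copies.
  volume snapshot policy create -vserver trident_svm -policy svc_snaps -enabled true -schedule1 hourly -count1 24 -schedule2 daily -count2 3
  volume modify -vserver trident_svm -volume svc_vol -snapshot-policy svc_snaps

  # Asynchronous replication of the same volume to a disaster recovery SVM.
  snapmirror create -source-path trident_svm:svc_vol -destination-path dr_svm:svc_vol_dst -type XDP -schedule hourly
  snapmirror initialize -destination-path dr_svm:svc_vol_dst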

5.1.1. etcd considerations

You should carefully consider the high availability and data protection policy for the etcd instance used by the Kubernetes master(s). This service is arguably the most critical to the overall functionality, recoverability, and serviceability of the cluster as a whole, so it’s imperative that its deployment meets your goals.

The most common method of providing high availability and resiliency to the etcd instance is to horizontally scale the application by having multiple instances across multiple nodes. A minimum of three nodes is recommended.

Kubernetes etcd will, by default, use local storage for its persistence. This holds true whether etcd is co-located with other master services or is hosted on dedicated nodes. To use enterprise storage for etcd a volume must be provisioned and mounted to the node at the correct location prior to deployment. This storage cannot be dynamically provisioned by Trident or any other storage class provisioner as it is needed prior to the Kubernetes cluster being operational.

Refer to your Kubernetes provider’s documentation on where to mount the volume and/or customize the etcd configuration to use non-default storage.
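
As one illustration, a kubeadm-based deployment exposes the etcd data directory through its ClusterConfiguration, allowing it to point at a directory backed by a volume mounted from enterprise storage before the cluster is initialized. This is a sketch only; the API version and the mount point /mnt/etcd are assumptions, and other installers have their own equivalents.

  apiVersion: kubeadm.k8s.io/v1beta2
  kind: ClusterConfiguration
  etcd:
    local:
      # Directory backed by a pre-mounted enterprise storage volume instead
      # of the default /var/lib/etcd on local disk.
      dataDir: /mnt/etcd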

5.1.2. logging considerations

Centralized logging for applications deployed to the Kubernetes cluster is an optional service. Using enterprise storage for logging has the same benefits as with etcd: performance, resiliency, protection, security, and scale.

Depending on how and when the service is deployed, the storage class may define how to dynamically provision storage for the service. Refer to your vendor’s documentation on how to customize the storage for logging services.
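
As a hedged example, a StorageClass along these lines could be referenced by the logging service’s persistent volume claims so that Trident dynamically provisions its storage. The class name and backendType value are placeholders, and the provisioner string differs between Trident releases (older releases use netapp.io/trident, CSI-based releases use csi.trident.netapp.io).

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: logging
  provisioner: netapp.io/trident
  parameters:
    # Selects backends of this driver type; additional parameters can pin the
    # class to a backend or SVM dedicated to cluster services.
    backendType: "ontap-nas"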

5.1.3. metrics considerations

Many third-party metrics and analytics services are available for monitoring and reporting the status and health of every aspect of the cluster and its applications. Many of these tools require persistent storage with specific performance characteristics to function as intended.

Architecturally, many of these function similarly. An agent exists on each node which forwards that node’s data to a centralized collector. The collector aggregates, analyzes, and displays the data. Similar to the logging service, using enterprise storage allows the aggregation service to move across nodes in the cluster seamlessly and ensures the data is protected in the event of node failure.

Each vendor has different recommendations and deployment methodology. Work with your vendor to identify requirements and, if needed, provision storage from an enterprise array to meet the requirements. This document will discuss the Red Hat OpenShift metrics service in a later chapter.

5.1.4. registry considerations

Users and applications interact with the registry service more often than with any other cluster service. Slow access to the registry can have a dramatic effect on the perceived performance of the Kubernetes cluster as a whole: a slow image push or pull operation lengthens tasks which directly affect developers and applications.

Fortunately, the registry is flexible with regard to storage protocol, but keep in mind that different protocols have different implications.

  • Object storage is the default recommendation and is the simplest to use for Kubernetes deployments which expect significant scale or where the images need to be accessed across geographic regions (see the configuration sketch after this list).
  • NFS is a good choice for many deployments as it allows a single repository for the container images while allowing many registry endpoints to front the capacity.
  • Block protocols, such as iSCSI, can be used for registry storage, but they introduce a single point of failure. The block device can only be attached to a single registry node due to the single-writer limitation of the supported filesystems.
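
As a hedged illustration of these choices, the open source Docker Distribution registry selects its storage backend in its configuration file; the bucket, region, and directory values below are placeholders, and other registry implementations have their own equivalents.

  version: 0.1
  storage:
    # Object storage backend: the default recommendation for large scale or
    # geographically distributed deployments.
    s3:
      region: us-east-1
      bucket: registry-images
  # Alternatively, a filesystem backend whose root directory is an NFS mount
  # shared by every registry endpoint:
  # storage:
  #   filesystem:
  #     rootdirectory: /var/lib/registry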

Each organization and each application has different priorities for protecting the images stored in the registry. Generally, registry images are either cached from upstream registries or have images pushed to them during the application build process. The Recovery Time Objective (RTO) influences the chosen protection scheme because it affects the recovery process. If RTO is not an issue, then the applications may be able to simply rebuild the container images and push them into a new instance of the registry. If faster RTO is desired, then use a replication policy which adheres to the desired recovery goal.

5.2. Design choices and guidelines when using ONTAP

A few design and implementation considerations should be addressed prior to deployment using Trident and ONTAP.

5.2.1. Storage Virtual Machines

Storage Virtual Machines (SVMs) are used for administrative delegation within ONTAP. By creating different SVMs, the storage administrator can isolate a particular user, group, or application to only specific resources. When Trident accesses the storage system via an SVM, it is prevented from doing many system level management tasks, which provides additional isolation of capabilities for storage provisioning and management tasks.

Trident can leverage SVMs in several different ways, each of which is explained below. Multiple Trident deployments, i.e. multiple Kubernetes clusters, do not change these methods; when an SVM is shared by multiple Trident instances, each instance simply needs a distinct prefix defined in its backend configuration file.

SVM shared with non Trident-managed workloads

This configuration uses a single SVM, or a small number of SVMs, to host all of the workloads on the cluster, which results in the containerized applications being hosted by the same SVM as other non-containerized workloads. The shared SVM model is common in organizations where multiple isolated network segments exist and adding additional IP addresses is difficult or impossible.

There is nothing inherently wrong with this configuration; however, it is more challenging to apply policies which affect only the container workloads.

Dedicated SVM for Trident-managed workloads

By default, NetApp recommends creating an SVM used solely by Trident for provisioning and deprovisioning volumes. This enables the storage administrator to put controls in place to limit the amount of resources which Trident is able to consume.

As mentioned above, having multiple Kubernetes clusters connect to and consume storage from the same SVM is acceptable; the only change to the Trident configuration is to provide a different prefix for each cluster.

When creating different backends which connect to the same underlying SVM resources but have different features applied (e.g. snapshot policies), use different prefixes. The unique prefixes help the storage administrator identify the volumes and ensure that no confusion results, as in the sketch below.
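
For example, two Kubernetes clusters sharing one SVM might each use a backend definition like the one below, differing only in the storagePrefix value (for instance clusterA_ and clusterB_). The LIF addresses, SVM name, and credentials are placeholders.

  {
      "version": 1,
      "storageDriverName": "ontap-nas",
      "managementLIF": "10.0.0.10",
      "dataLIF": "10.0.0.20",
      "svm": "trident_svm",
      "username": "vsadmin",
      "password": "password",
      "storagePrefix": "clusterA_"
  }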

Multiple SVMs dedicated to Trident-managed workloads

You may consider using multiple SVMs with Trident for many different reasons, including isolating applications and resource domains, strict control over resources, and to facilitate multitenancy. It’s also worth considering using at least two SVMs with any Kubernetes cluster to isolate persistent storage for cluster services from application storage.

Using multiple SVMs, with one dedicated to cluster services, isolates and controls that workload. Since the persistent storage resources needed by the Kubernetes cluster services must exist prior to Trident deployment, the cluster services SVM will not use dynamic provisioning in the same manner as the application SVM.

5.2.2. Kubernetes cluster services

Even for cluster-service persistent volumes created by Trident, serious consideration should be given to using per-volume QoS policies, including QoS minimums when possible, and to customizing the volume options for the application. Below are the default recommendations for the cluster services; evaluate your needs and adjust the policies according to your data protection, performance, and availability requirements to determine what works best for your Kubernetes cluster and applications.

etcd

  • The default snapshot policy is often adequate for protecting against data corruption and loss; however, snapshots are not a backup strategy. Consider increasing the snapshot frequency and decreasing the retention period for etcd volumes, for example keeping 24 hourly snapshots, or 48 snapshots taken every 30 minutes, but not retaining them for more than one or two days. Since any data loss for etcd can be problematic, more frequent snapshots make this scenario easier to recover from.
  • If the disaster recovery plan is to recover the Kubernetes cluster as-is at the destination site, then these volumes should be replicated with SnapMirror or SnapVault.
  • etcd does not have significant IOPS or throughput requirements; however, latency can play a critical role in the responsiveness of the Kubernetes API server. Whenever possible, use the lowest-latency storage available and configure it for the lowest latency possible.
  • A QoS policy should be leveraged to provide a minimum amount of IOPS to the etcd volume(s). The minimum value will depend on the number of nodes and pods deployed to your Kubernetes cluster. Use monitoring to verify that the configured policy is adequate, and adjust it over time as the Kubernetes cluster expands.
  • The etcd volumes should have their export policy or igroup limited to only the nodes which are hosting, or could potentially host, etcd instances. A hedged CLI sketch of both settings follows this list.
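
A hedged sketch of these two recommendations from the ONTAP CLI follows. The SVM, volume, and policy names, the throughput floor, and the node IP addresses are placeholders; QoS minimums are only available on supported platforms, and exact options vary by ONTAP release.

  # QoS policy group with a throughput floor, assigned to the etcd volume.
  qos policy-group create -vserver trident_svm -policy-group etcd_floor -min-throughput 500iops
  volume modify -vserver trident_svm -volume etcd_vol -qos-policy-group etcd_floor

  # Export policy allowing only the nodes which can host etcd instances.
  vserver export-policy create -vserver trident_svm -policyname etcd_only
  vserver export-policy rule create -vserver trident_svm -policyname etcd_only -clientmatch 10.0.0.11,10.0.0.12,10.0.0.13 -protocol nfs -rorule sys -rwrule sys -superuser sys
  volume modify -vserver trident_svm -volume etcd_vol -policy etcd_only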

logging

  • Volumes which provide storage capacity for aggregated logging services need to be protected; however, a moderate RPO is adequate in many instances since logging data is often not critical to recovery. If your application has strict compliance requirements, this may not be the case.
  • Using the default snapshot policy is generally adequate. Optionally, depending on the needs of the administrators, reducing the snapshot policy to one which keeps as few as seven daily snapshots may be acceptable.
  • Logging volumes should be replicated to protect the historical data for use by the application and by administrators, however, recovery may be deprioritized for other services.
  • Logging services have widely variable IOPS requirements and read/write patterns. It’s important to consider the number of nodes, pods, and other objects in the cluster. Each of these will generate data which needs to be stored, indexed, analyzed, and presented, so a larger cluster may have substantially more data than expected.
  • A QoS policy may be leveraged to provide both a minimum and a maximum amount of throughput. Note that the maximum may need to be adjusted as additional pods are deployed, so monitor performance closely to verify that logging is not being adversely affected by storage performance. A hedged CLI sketch follows this list.
  • The volume’s export policy or igroup should be limited to the nodes which host the logging service. This will depend on the particular solution used and the chosen configuration; for example, OpenShift’s logging service is deployed to the infrastructure nodes.
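
Continuing the hedged CLI sketch, a policy group for the logging volume might set both a floor and a ceiling, with the ceiling raised later as monitoring shows the workload growing; the names and values are placeholders.

  # Throughput floor and ceiling for the logging volume.
  qos policy-group create -vserver trident_svm -policy-group logging_qos -min-throughput 1000iops -max-throughput 5000iops
  volume modify -vserver trident_svm -volume logging_vol -qos-policy-group logging_qos

  # Raise the ceiling as additional pods increase the logging workload.
  qos policy-group modify -policy-group logging_qos -max-throughput 10000iops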

metrics

  • The Kubernetes autoscaling features rely on metrics data to determine when scaling operations need to occur. Metrics data also often plays a critical role in show-back and charge-back operations, so ensure that your RPO and RTO policies address the needs of these functions and of the business as a whole.
  • As the number of cluster nodes and deployed pods increases, so too does the amount of data which is collected and retained by the metrics service. It’s important to understand the performance and capacity recommendations provided by the vendor for your metrics service as they can vary dramatically, particularly depending on the amount of time for which the data is retained and the number of metrics which are being monitored.
  • A QoS policy can be used to limit the IOPS or throughput which the metrics service uses; however, it is generally not necessary to use a minimum policy.
  • It is recommended to limit the export policy or igroup to the hosts from which the metrics service is executed, as illustrated after this list. Note that it’s important to understand the architecture of your metrics provider: many have agents which run on all hosts in the cluster, but those agents report to a centralized repository for storage and reporting, and only that group of nodes needs access to the storage.
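
When the metrics data resides on a LUN, restricting access with an igroup that contains only the collector nodes might look roughly like the following; the igroup name, initiator IQNs, and LUN path are placeholders.

  # igroup containing only the nodes which run the metrics collector.
  lun igroup create -vserver trident_svm -igroup metrics_nodes -protocol iscsi -ostype linux -initiator iqn.1994-05.com.redhat:infra-node1,iqn.1994-05.com.redhat:infra-node2

  # Map the metrics LUN to that igroup only.
  lun map -vserver trident_svm -path /vol/metrics_vol/metrics_lun -igroup metrics_nodes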

registry

  • Using a snapshot policy for the registry data may be valuable for recovering from data corruption or other issues, though it is not strictly necessary. A basic snapshot policy is recommended; however, individual container images cannot be recovered (they are stored in a hashed manner), so only a full volume revert can be used to recover data.
  • The workload for the registry can vary widely, however, the general rule is that push operations happen infrequently, while pull operations happen frequently. If a CI/CD pipeline process is used to build, test, and deploy the application(s) this may result in a predictable workload. Alternatively, and even with a CI/CD system in use, the workload can vary based on application scaling requirements, build requirements, and even Kubernetes node add/remove operations. Close monitoring of the workload should be implemented to adjust as necessary.
  • A QoS policy may be implemented to ensure that application instances are still able to pull and deploy new container images regardless of other workloads on the storage system. In the event of a disaster recovery scenario, the registry may have a heavy read workload while applications are instantiated on the destination site. The configured QoS minimum policy will prevent other disaster recovery operations from slowing application deployment.