All Roads Lead to Rome: 5 ways of defining PersistentVolumes in Kubernetes with Helm
Everyone who works with Docker uses volume mounts. Everyone who uses Kubernetes as an orchestrator works with Persistent Volumes and Persistent Volume Claims, affectionately referred to as ‘pv’ and ‘pvc’.
Taming some of the complexity that comes with Kubernetes environments is regularly done with a tool like Helm.
If all the previous statements make sense to you, and you are interested in running CI/CD in a (semi-)automated fashion, definitely read on.
This blog scratches the surface of how the way volumes are defined in your Helm charts impacts upgrades, deletes, and installs (in other words: your CI/CD pipeline).
Pack your bags
What are we going to do? We’ll demonstrate and compare five ways in which a volume can be set up for a Kubernetes cluster (we’re using AKS in our example), and show their - sometimes counterintuitive - impact.
As mentioned, this example runs in Azure, using the Azure Kubernetes Service (AKS). To set things up, we have bundled the 5 examples we want to demonstrate in a single Helm chart.
For 2 of the examples we use pre-existing disks. Nothing special: go into the resource group that your AKS cluster created (other resource groups work as well, as long as the permissions are correct and you use the correct URI in your yaml) and create some disks:
The 5 roads
1. pod with a direct volume mount
2. pod with a persistent volume claim that refers to a persistent volume backed by a predefined disk
3. pod with a persistent volume claim template (+ implicit pvc) together with a defined persistent volume backed by a predefined disk
4. pod with a persistent volume claim that refers to a persistent volume
5. pod with a persistent volume claim template (+ implicit pvc) with no predefined persistent volume
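As a sketch of what the first two roads can look like in the chart’s templates (all names, sizes, and the disk name/URI below are placeholders, not the actual values from our chart):

```yaml
# Road 1: pod with a direct volume mount - an emptyDir, nothing persistent.
apiVersion: v1
kind: Pod
metadata:
  name: example1
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      emptyDir: {}
---
# Road 2: a pv backed by a pre-created Azure disk, a pvc bound to it,
# and a pod mounting the claim.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example2-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  azureDisk:
    kind: Managed
    diskName: example2-disk   # placeholder: the disk created in the resource group
    diskURI: ...              # placeholder: the full resource URI of that disk
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example2-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  volumeName: example2-pv     # bind explicitly to the pv above
---
apiVersion: v1
kind: Pod
metadata:
  name: example2
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example2-pvc
```

Roads 4 and 5 look similar but drop the azureDisk reference, leaving provisioning to the cluster.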
The templates directory contains these yaml files:
There’s some other stuff running in our ‘vanilla’ setup, so I create a nice empty namespace ‘volumetest’ and install the chart.
The ‘before’ picture:
The namespace is empty, but PersistentVolumes (pv) are not in any namespace, so one pv is shown (our api manager volume).
> helm install . --namespace volumetest --name volumes
After a while this should all be running fine.
Two things worth remarking:
Azure disk provisioning is very slow (compared to AWS and GCP), so it may take a while; 30 or 40 seconds is no exception.
Azure sets a hard limit on how many disks you can attach per core: 2 per core. So if you’re running a 3-node cluster with 4 cores per VM, you can attach 3 x 4 x 2 = 24 disks. As each VM already needs an OS disk, this leaves 24 - 3 = 21 disks.
Spot the difference
Example 1 is a bit different: a direct mount (without pv or pvc) creates an emptyDir, which leaves no trace in the list of pvs or pvcs. To check, you can describe the pod:
The other 4 examples look pretty much the same: there is a pod which uses a claim to get a persistent volume. In other words: 4 pods, 4 pvcs, 4 pvs.
So, 4 examples that are totally equivalent, right? Wrong.
Running production-grade clusters involves CI/CD, and with it comes a lot of automation, scripting, redeployments, … . Through Terraform, Ansible, or whatever tool tickles your fancy, chances are you’ll be invoking helm deletes along the way.
Care to wager a guess what will happen?
> helm delete volumes
A nice clean slate it ain’t, as there is considerable fallout after the breakdown:
Let’s go through them:
example 1: deletes cleanly, nothing remains (there was only a pod to begin with)
example 2: no visible trace. The predefined disk still exists in the background, not attached to anything, and installing this chart again will reconnect to the same disk. Note that because Azure decommissions and provisions these resources so slowly, our automation often fails when it performs a delete and a reinstall back to back: although the delete has completed, Azure is not yet done detaching the disk, so you get an error if you try to reattach it. Waiting between those steps, or simply rerunning, solves it.
example 3: the claim has not been deleted, and the volume will hang in ‘Terminating’ indefinitely. A pv cannot be deleted as long as there is an active claim (pvc) on it; it will remain in Terminating, and in the logs you can see that it is waiting for the claim to stop … claiming. The claim was not deleted because we used a persistent volume claim template, and Helm does not consider derived resources (such as the claims generated from the template) part of the release! So a delete will not get rid of it (not even a ‘--purge’ delete).
example 4: no template was used, so the explicitly defined pod, pvc and pv are cleaned up correctly. No traces left.
example 5: a template was used again, so the pvc wasn’t deleted. As the persistent volume itself was not explicitly defined, the dynamically provisioned pv is likewise not considered part of the release by Helm, so no termination is requested. Redeployments (reinstalling the chart) will reuse the same claim and the same volume. Don’t be surprised to run a “helm delete --purge” only to see the same data (and bugs?) show up again. This can be quite confusing if you don’t have a handle on the inner workings of Helm and Kubernetes (luckily, now you do).
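For reference, the claim-template roads (3 and 5) boil down to a StatefulSet with a volumeClaimTemplates section. The controller, not Helm, generates the pvc, which is why the release never owns it (names below are placeholders):

```yaml
# The pvc is generated from the template by the StatefulSet controller,
# so Helm never sees it as part of the release and will not delete it.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example5
spec:
  serviceName: example5
  replicas: 1
  selector:
    matchLabels:
      app: example5
  template:
    metadata:
      labels:
        app: example5
    spec:
      containers:
        - name: app
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data            # the generated pvc will be named data-example5-0
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
```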
Zooming in on this one element of resource declaration in Kubernetes shows that your entire ‘robust’ CI/CD setup can crumble if you don’t invest sufficient time in getting to know the tools you’re working with. In our experience the plethora of tools out there definitely helps you get off to a flying start, but once you move to production-grade deployments you still need to pay your dues and invest the time (or get some help from people who spend all their time doing this, like us at kuori -- plug alert --).
To leave the people who can’t get enough with some extras, here’s a small quiz. Show us in the comments how much of a Boss you are, or beg us to write blogs about the answers if you’re scratching your head over these.
1. For reasons of clarity, we didn’t apply one of the most obvious and easy means to avoid catastrophes with deletes of volumes in production. What is it?
2. What other type(s) of resource are notoriously problematic for Helm to get right?
3. Which of the examples would not have persistence that survives a pod crash (a real crash, not a helm delete)?
4. Getting more serious about the setup of your various volumes: besides what we’ve discussed here, there is another Kubernetes resource that is logically grouped with pv, pvc and pvc templates. Which one is it, and how can it help? (Hint: check things in the screenshots that weren’t discussed.)
The first one to get everything right in the comments wins a magnificent prize: an uber-kudo and the eternal respect of the entire kuori team.