Upgrading Ceph and OKD (OpenShift Origin) with TripleO

In OpenStack’s Rocky release, TripleO is transitioning towards a method of deployment we call config-download. Basically, instead of using Heat to deploy the overcloud end-to-end, we’ll be using Heat only to manage the hardware resources and the Ansible tasks for individual composable services. Execution of software configuration management (which is Ansible at the top level) will no longer go through Heat; it will be done directly. If you want to know the details, I recommend watching James Slagle’s TripleO Deep Dive about config-download.

The transition towards config-download also affects services/components which we deploy by embedding external installers, like Ceph or OKD (aka OpenShift Origin). For example, we previously deployed Ceph via a Heat resource, which created a Mistral workflow, which in turn executed ceph-ansible. This is no longer possible with config-download, so we had to adapt the solution for external installers.

Deployment architecture

Before talking about upgrades, it is important to understand how we deploy services with external installers when using config-download.

Deployment using external installers with config-download was developed during OpenStack’s Queens release cycle for the purpose of installing Kubernetes and OpenShift Origin. In the Rocky release, installation of Ceph and Skydive services transitioned to the same method (shout out to Giulio Fidente and Sylvain Afchain, who ported those services to the new method).

The general solution is described in my earlier Kubernetes in TripleO blog post. I recommend being somewhat familiar with that before reading on.

Upgrades architecture

In OpenStack, and by extension in TripleO, we distinguish between minor updates and major upgrades, but with external installers the distinction is sometimes blurred. The solution described here was applied to both updates and upgrades. We still make a distinction between updates and upgrades with external installers in TripleO (e.g. by having two different CLI commands), but the architecture is the same for both. For the sake of brevity, I will only mention upgrades in the text below, but everything described applies to updates too.

It was more or less a given that we would use Ansible tasks for upgrades with external installers, just as we already use Ansible tasks for their deployment. However, two possible approaches suggested themselves. Option A was to execute a service’s upgrade tasks and then immediately its deploy tasks, favoring upgrade procedures that reuse a significant part of the service’s deployment procedure. Option B was to execute only the upgrade tasks, giving more separation between the deployment and upgrade procedures, at the risk of producing repetitive code in the service templates.

We went with option A (the upgrade procedure includes re-execution of the deploy tasks). The upgrade tasks in this architecture are mainly meant to set variables, which then affect what the deploy tasks do (e.g. select a different Ansible playbook to run). Note that with this solution it is still possible to skip the deploy tasks entirely if needed (using variables and when conditions), but the design optimizes for maximum reuse between the upgrade and deployment procedures.
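To make the control flow of option A concrete, here is a minimal Python sketch. It only models the ordering: upgrade tasks run first and merely set a variable, then the same deploy tasks run and read it, while a skip variable plays the role of a when condition. The variable names and the playbook file names (site.yml, rolling_update.yml) are hypothetical stand-ins, not the contents of the actual TripleO service templates.

```python
from typing import Dict, List, Optional

# Hypothetical playbook names, used for illustration only.
DEFAULT_PLAYBOOK = "site.yml"
UPGRADE_PLAYBOOK = "rolling_update.yml"


def upgrade_tasks(facts: Dict[str, object]) -> None:
    """Upgrade tasks do no real work here; they only set variables."""
    facts["playbook"] = UPGRADE_PLAYBOOK


def deploy_tasks(facts: Dict[str, object], log: List[str]) -> None:
    """Deploy tasks shared by both the deployment and the upgrade."""
    # Stand-in for an Ansible "when" condition that skips the deploy tasks.
    if facts.get("skip_deploy"):
        log.append("deploy tasks skipped")
        return
    # Use the normal deployment playbook unless an upgrade task picked another.
    playbook = facts.get("playbook", DEFAULT_PLAYBOOK)
    log.append(f"run external installer with {playbook}")


def run(mode: str, facts: Optional[Dict[str, object]] = None) -> List[str]:
    """Run a deployment ("deploy") or an upgrade ("upgrade")."""
    facts = dict(facts or {})  # copy so the caller's dict is not mutated
    log: List[str] = []
    if mode == "upgrade":
        upgrade_tasks(facts)   # step 1: only set variables
    deploy_tasks(facts, log)   # step 2: re-execute the deploy tasks
    return log


if __name__ == "__main__":
    print(run("deploy"))
    print(run("upgrade"))
    print(run("upgrade", {"skip_deploy": True}))
```

The point of the sketch is that the upgrade path adds only a small variable-setting step in front of code that already exists for deployment, which is the reuse option A optimizes for.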

[Figure: Upgrades with external installers]

Implementation for Ceph and OKD

With the focus on reuse of deploy tasks, and with both ceph-ansible and openshift-ansible being suitable for such an approach, implementing upgrades via the architecture described above didn’t require much code.

Feel free to skim through the Ceph upgrade and OKD upgrade patches to get an idea of how the upgrades were implemented.

CLI and workflow

In the CLI, external installer upgrades got a new command, openstack overcloud external-upgrade run. (For minor version updates it is openstack overcloud external-update run; service template authors may decide whether they want to distinguish between updates and upgrades, or run the same code for both.)

The command is part of the normal upgrade workflow and should be run between openstack overcloud upgrade prepare and openstack overcloud upgrade converge. It is recommended to execute it after openstack overcloud upgrade run, which corresponds to the point in the upgrade workflow where we previously upgraded Ceph.

After introducing the new external-upgrade run command, we removed the ceph-upgrade run command. This means that Ceph is no longer a special citizen in the TripleO upgrade procedure; it uses the generic commands and hooks available to any other service.

Separate execution of external installers

There may be multiple services utilizing external installers within a single TripleO-managed environment, and the operator might wish to upgrade them separately. A plain openstack overcloud external-upgrade run would upgrade all of them at the same time.

We started adding Ansible tags to the external upgrade and deploy tasks, allowing us to select which installers we want to run. This way, openstack overcloud external-upgrade run --tags ceph would only run ceph-ansible, and similarly openstack overcloud external-upgrade run --tags openshift would only run openshift-ansible. This also allows fine-tuning the point in the upgrade workflow where the operator wants to run a particular external installer upgrade (e.g. before or after the upgrade of natively managed TripleO services).
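A rough Python model of how tags narrow down which external tasks run is shown below. It assumes simplified Ansible tag semantics (when no tags are requested everything runs; otherwise a task runs only if it shares at least one tag with the request) and deliberately ignores special tags such as always and never. The task names are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical external upgrade/deploy tasks, each tagged by the installer
# it belongs to.
TASKS: List[Tuple[str, Set[str]]] = [
    ("set ceph-ansible upgrade variables", {"ceph"}),
    ("run ceph-ansible", {"ceph"}),
    ("set openshift-ansible upgrade variables", {"openshift"}),
    ("run openshift-ansible", {"openshift"}),
]


def select_tasks(tasks: List[Tuple[str, Set[str]]],
                 requested: Optional[Iterable[str]] = None) -> List[str]:
    """Return the names of tasks that would run for the requested tags.

    Simplified model: no requested tags means every task runs; otherwise
    a task runs only if its tags intersect the requested ones. Real Ansible
    additionally handles the special "always" and "never" tags.
    """
    if not requested:
        return [name for name, _ in tasks]
    wanted = set(requested)
    return [name for name, tags in tasks if tags & wanted]


if __name__ == "__main__":
    # Roughly what "external-upgrade run --tags ceph" selects in this model.
    print(select_tasks(TASKS, ["ceph"]))
    # Without --tags, all external tasks are selected.
    print(select_tasks(TASKS))
```

Because the variable-setting upgrade task and the deploy task of one installer share a tag, selecting that tag keeps the option A pairing intact while leaving the other installers untouched.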

A full upgrade workflow making use of these possibilities could then perhaps look like this:

openstack overcloud upgrade prepare <args>
openstack overcloud external-upgrade run --tags openshift
openstack overcloud upgrade run --roles Controller
openstack overcloud upgrade run --roles Compute
openstack overcloud external-upgrade run --tags ceph
openstack overcloud upgrade converge <args>
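The ordering rule described in the CLI section (external-upgrade run belongs between upgrade prepare and upgrade converge) can be expressed as a small check over a command list like the one above. This is a hypothetical helper written for illustration, not part of tripleoclient; it only inspects the two words following openstack overcloud in each command.

```python
from typing import List

WORKFLOW = [
    "openstack overcloud upgrade prepare <args>",
    "openstack overcloud external-upgrade run --tags openshift",
    "openstack overcloud upgrade run --roles Controller",
    "openstack overcloud upgrade run --roles Compute",
    "openstack overcloud external-upgrade run --tags ceph",
    "openstack overcloud upgrade converge <args>",
]


def subcommand(cmd: str) -> str:
    """'openstack overcloud upgrade run --roles X' -> 'upgrade run'."""
    return " ".join(cmd.split()[2:4])


def check_order(commands: List[str]) -> List[str]:
    """Hypothetical check: return ordering problems (empty list if none)."""
    steps = [subcommand(c) for c in commands]
    if "upgrade prepare" not in steps or "upgrade converge" not in steps:
        return ["both 'upgrade prepare' and 'upgrade converge' are required"]
    start = steps.index("upgrade prepare")
    end = steps.index("upgrade converge")
    problems: List[str] = []
    if start > end:
        problems.append("'upgrade prepare' must come before 'upgrade converge'")
    # Every external upgrade must sit strictly between prepare and converge.
    for i, step in enumerate(steps):
        if step == "external-upgrade run" and not start < i < end:
            problems.append(f"{commands[i]!r} is not between prepare and converge")
    return problems


if __name__ == "__main__":
    print(check_order(WORKFLOW))  # the workflow above passes the check
```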