Test It Like You Deploy It

Monty Taylor

http://inaugust.com/talks/test-it-like-you-deploy-it.html

twitter: @e_monty

Who am I?

Office of Technology

Zuul

Ansible

Who am I?

Technical Committee

Foundation Board of Directors

Developer Infrastructure Core Team

What are we going to talk about?

  • Zuul
  • Ansible

What is Zuul?

  • Multi-cloud, scalable, elastic CI/CD engine
  • Validation of speculative future states
  • Single-use VM build nodes - safely run tests that need root
  • Multi-node builds
  • Multi-repo projects
  • Native support for gating configuration

Terminology

  • Periodic: jobs run in response to a timer
  • Post: jobs run after a change merges
  • Check: jobs run when someone proposes a change
  • Gate: jobs run between change approval and landing
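
One way to picture these four pipeline types is as a table that maps trigger events to pipelines. This is a hypothetical sketch. The event names ("timer", "patchset-created", "change-approved", "change-merged") and the `PIPELINE_TRIGGERS` table are illustrative assumptions, not Zuul's configuration schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class Event:
    kind: str     # hypothetical event kind, e.g. "patchset-created"
    project: str  # repository the event refers to

# Hypothetical trigger table mirroring the terminology above. In Zuul these
# associations are pipeline configuration, not hard-coded values.
PIPELINE_TRIGGERS: Dict[str, Set[str]] = {
    "periodic": {"timer"},
    "check": {"patchset-created"},
    "gate": {"change-approved"},
    "post": {"change-merged"},
}

def pipelines_for(event: Event) -> List[str]:
    """Return, sorted by name, the pipelines this event would enqueue into."""
    return sorted(name for name, kinds in PIPELINE_TRIGGERS.items()
                  if event.kind in kinds)

print(pipelines_for(Event("patchset-created", "openstack/nova")))  # ['check']
```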

Why?

The original OpenStack use case

  • Fully automated gated commits
  • Full end-to-end integration tests from scratch for every commit
  • Massive scale

OpenStack Scale by the numbers

  • 2 KJPH (kilo-jobs / hour)
  • 2376 arbitrary developers
  • 1474 git repositories
  • 11727 Jobs
  • Merge 10k Changes / Month

For comparison, Ansible has received 13171 PRs (changes), has merged 8190 of them, and has 37788 commits in its entire lifetime

Multi Repository Speculative Execution

  • Zuul constructs speculative states as-if a change were merged
  • Tests future states without landing those changes first
  • The as-if state can span multiple repos
  • In the Gate pipeline, approved changes are put into a virtual serial queue, then tested in parallel, each as-if every change ahead of it had landed
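
The gate queue behaviour can be sketched as a small simulation. This is a simplified model under stated assumptions: `passes(change, ahead)` is a hypothetical stand-in for running a change's jobs on its speculative state, results are evaluated in a list rather than in parallel, and when a change fails, the changes ahead of it land, it is ejected, and everything behind it is re-tested without it.

```python
from typing import Callable, List, Tuple

def run_gate(queue: List[str],
             passes: Callable[[str, List[str]], bool]) -> Tuple[List[str], List[str]]:
    """Simulate a serial gate queue; return (merged, rejected) changes."""
    merged: List[str] = []
    rejected: List[str] = []
    pending = list(queue)
    while pending:
        # Each pending change is tested as-if everything ahead of it had landed.
        results = [passes(change, merged + pending[:i])
                   for i, change in enumerate(pending)]
        if all(results):
            merged.extend(pending)
            break
        first_fail = results.index(False)
        merged.extend(pending[:first_fail])    # changes ahead of the failure land
        rejected.append(pending[first_fail])   # the failing change is ejected
        pending = pending[first_fail + 1:]     # the rest are re-tested without it
    return merged, rejected

def broken_b(change: str, ahead: List[str]) -> bool:
    # Hypothetical test result: any speculative state containing B fails.
    return "B" not in ahead + [change]

print(run_gate(["A", "B", "C"], broken_b))  # (['A', 'C'], ['B'])
```

In the first round both B and C fail, because C's speculative state includes B. A lands, B is ejected, and C is re-tested on top of A alone, where it passes.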

Zuul Animation

http://docs.openstack.org/infra/publications/zuul/#(18)

Multi-Repo Dependencies


commit 30039f04109efa2263aba6eb302a29bd8d5e8f53
Author: Monty Taylor
Date:   Mon Sep 26 14:14:15 2016 -0500

    Add simple field for disabled flavors

    When we were equalizing out the old silly names from the new pretty
    ones, we missed disabled.

    Change-Id: I4cbf5f7c27f640c566460c18951ab9030aae84e4
    Depends-On: I523e0ab6e376f5ff6205b1cc1748aa6d546919cb
      

Multi-Repo Dependencies - an example

  • shade library runs functional tests against OpenStack
  • A neutron change breaks shade tests (more later)
  • A neutron fix is proposed
  • A shade change Depends-On the proposed neutron change
  • shade tests are run as-if the neutron change had landed
  • The shade change cannot land until the neutron change lands
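
The Depends-On mechanics above can be sketched as two rules: test a change on top of everything it depends on, and refuse to land it until those dependencies have merged. This is a hypothetical sketch. The regular expression, the helper names, and the short Change-Ids "I-shade" and "I-neutron" are illustrative assumptions; Zuul's own footer parsing may differ.

```python
import re
from typing import Dict, List, Set

# Matches footer lines such as "Depends-On: I523e..." (leading spaces allowed,
# as in `git log` output). The exact pattern Zuul uses may differ.
DEPENDS_ON = re.compile(r"^[ \t]*Depends-On:[ \t]*(\S+)[ \t]*$", re.MULTILINE)

def depends_on(message: str) -> List[str]:
    """Return the Change-Ids named in Depends-On footers of a commit message."""
    return DEPENDS_ON.findall(message)

def speculative_state(change_id: str, messages: Dict[str, str]) -> List[str]:
    """Changes to apply, dependencies first, when testing change_id."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(cid: str) -> None:
        if cid in seen:          # guards against dependency cycles
            return
        seen.add(cid)
        for dep in depends_on(messages.get(cid, "")):
            visit(dep)
        order.append(cid)

    visit(change_id)
    return order

def can_land(change_id: str, messages: Dict[str, str], merged: Set[str]) -> bool:
    """A change may land only once every change it depends on has merged."""
    return all(dep in merged for dep in depends_on(messages[change_id]))

messages = {
    "I-shade": "Use new neutron field\n\nDepends-On: I-neutron\n",
    "I-neutron": "Fix neutron regression\n",
}
print(speculative_state("I-shade", messages))       # ['I-neutron', 'I-shade']
print(can_land("I-shade", messages, merged=set()))  # False
```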

Isn't this supposed to prevent neutron from breaking shade?

  • shade and neutron do not share a gating relationship
  • shade has, by choice, a test that runs against OpenStack master
  • Such a test is 'risky' to shade devs
  • shade devs desire the risk - can work around breaks, or submit bugs

Not Specific to OpenStack

  • "Gate" and "Check" are merely configurations
  • 50+ OpenStack Vendors use Zuul for "3rd Party CI" of drivers
  • Wikimedia uses Zuul

Status Pages

https://integration.wikimedia.org/zuul/

http://status.openstack.org/zuul/

Pluggable

  • Triggers
  • Reporters
  • Node Providers
  • Execution content
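
One way to picture these plugin points is a registry keyed by plugin kind and driver name. This is a hypothetical sketch: `register`, `load`, `EmailReporter` and `TimerTrigger` are illustrative names, not Zuul's driver API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (plugin kind, driver name) -> factory.
_REGISTRY: Dict[Tuple[str, str], Callable[..., object]] = {}

def register(kind: str, name: str) -> Callable:
    """Class decorator that records a driver under (kind, name)."""
    def decorator(factory):
        _REGISTRY[(kind, name)] = factory
        return factory
    return decorator

def load(kind: str, name: str, **config) -> object:
    """Instantiate the driver configured for this kind and name."""
    try:
        factory = _REGISTRY[(kind, name)]
    except KeyError:
        raise ValueError(f"no {kind} driver named {name!r}") from None
    return factory(**config)

@register("reporter", "email")
class EmailReporter:
    def __init__(self, to: str) -> None:
        self.to = to

    def report(self, change: str, result: str) -> str:
        # A real reporter would send mail; this only formats the message.
        return f"To {self.to}: change {change} {result}"

@register("trigger", "timer")
class TimerTrigger:
    def __init__(self, cron: str) -> None:
        self.cron = cron  # e.g. "0 2 * * *" for a nightly periodic pipeline

reporter = load("reporter", "email", to="dev@example.org")
print(reporter.report("12345", "SUCCESS"))
```

Keeping drivers behind a lookup like this means pipelines only name a trigger or reporter in configuration, and new back ends can be added without touching the scheduling logic.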

Zuul v2

  • In production for OpenStack for 4 years
  • What most people run
  • Triggers: Gerrit, Periodic
  • Reporters: Gerrit, Email, MySQL
  • Node Providers: Elastic OpenStack Nodepool, Static servers
  • Jobs executed by Jenkins


Zuul v2.5

  • In use only by OpenStack (on purpose)
  • Replaced Jenkins with Ansible
  • Jobs still written using Jenkins Job Builder (JJB); Ansible playbooks are generated from them on the fly

http://logs.openstack.org/09/352209/7/check/gate-networking-ovn-python35/0fae86c/_zuul_ansible/

Why Replace Jenkins?

Not for lack of trying

  • OpenStack started on Jenkins (actually, on Hudson, remember that?)
  • We funded the Jenkins JClouds Plugin
  • We did deep dev in the Gerrit Trigger Plugin
  • Maintained the SCP artifact plugin (added console log support)
  • Added 0mq notification plugin
  • Added Gearman Worker plugin - allowed us to grow to 8 Masters/1000 slaves
  • We wrote Jenkins Job Builder

Before I answer that ...

The world is a better place with Jenkins existing

Jenkins Problems

Security

  • Don't run the web UI on the internet
  • SSH slave plugin: it's possible for a slave to run arbitrary code on the master

Stability

  • almost every Jenkins upgrade has broken us

Scalability

  • Jenkins has global mutexes, especially in plugins
  • A Jenkins master on an extra-large cloud server could handle ~100 concurrent jobs
  • We ran 8 Jenkins Masters with slaves sharded across them

Overkill

  • we only used it as a remote shell execution engine

We know a better engine for remote execution


Zuul v3

Test things like you deploy them

  • Intended for broad use
  • Jobs written in and executed with Ansible
  • Fully support Bare Metal, VMs and Containers
  • Add GitHub PR Support
  • Add AWS and GCE Node Provider Support
  • Multi-node build clusters as first class resource
  • Docker and Kubernetes Build Resources
  • Multi-Tenant
  • In-repo config
  • Self-testing tests

Ansible execution

Merge the mergers and the launchers

  • Constructs relevant repo states and inventory
  • Push repos to test nodes
  • Execute playbook with inventory
  • Playbook can be defined centrally or in a repo
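
The last two steps, building an inventory and executing a playbook against it, can be sketched as follows. This assumes an INI-format Ansible inventory using the standard `ansible_host` variable and the `ansible-playbook -i` option. The node names come from the multi-node example in this talk, while the IP addresses, the playbook path, and the helper names are hypothetical. Repo preparation and pushing repos to nodes are omitted.

```python
import os
import tempfile
from typing import Dict, List

def render_inventory(nodes: Dict[str, str]) -> str:
    """Render an INI-style Ansible inventory from node name -> IP address."""
    return "".join(f"{name} ansible_host={nodes[name]}\n" for name in sorted(nodes))

def prepare_job(nodes: Dict[str, str], playbook: str) -> List[str]:
    """Write the inventory to a fresh work directory and return the
    ansible-playbook command line that would run the job's playbook."""
    workdir = tempfile.mkdtemp(prefix="job-")
    inventory_path = os.path.join(workdir, "inventory")
    with open(inventory_path, "w") as f:
        f.write(render_inventory(nodes))
    return ["ansible-playbook", "-i", inventory_path, playbook]

cmd = prepare_job({"controller": "10.0.0.1", "db": "10.0.0.2"}, "playbooks/ursula.yaml")
print(" ".join(cmd))
# Actually executing it would be: subprocess.run(cmd, check=True)
```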

In repo config

  • Minimal config at startup: little beyond "where are my repos"
  • 'trusted' config repos
  • Config per repo
  • At startup, Zuul asks the fleet of merger-launchers to calculate the config

Test tests before you land them

  • Config is in Git, and Zuul handles multi-repo Git dependencies
  • v3 adds in-tree test definitions
  • Speculative state includes speculative state of test jobs
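
Because job definitions live in the repos, the speculative state a change is tested on includes any edits that change makes to its own jobs. A minimal sketch, assuming job definitions are plain dicts and assuming a hypothetical rule that per-repo config may add jobs but not redefine a trusted one; the job names, the `run` key and the tox commands are illustrative only.

```python
from typing import Dict, List

JobDefs = Dict[str, dict]

def speculative_repo_config(branch_tip: JobDefs, proposed: JobDefs) -> JobDefs:
    """Repo config as-if the proposed change had merged: the change's
    job definitions replace those on the branch tip."""
    config = dict(branch_tip)
    config.update(proposed)
    return config

def build_layout(trusted: JobDefs, repo_configs: List[JobDefs]) -> JobDefs:
    """Combine trusted config with per-repo config. Assumed rule: a
    per-repo config may add jobs but not redefine a trusted one."""
    jobs = dict(trusted)
    for config in repo_configs:
        for name, definition in config.items():
            if name in trusted:
                raise ValueError(f"repo config may not redefine trusted job {name!r}")
            jobs[name] = definition
    return jobs

trusted = {"base": {"timeout": 3600}}
branch_tip = {"unit": {"parent": "base", "run": "tox -e py27"}}
proposed = {"unit": {"parent": "base", "run": "tox -e py35"}}  # the change edits its own job

layout = build_layout(trusted, [speculative_repo_config(branch_tip, proposed)])
print(layout["unit"]["run"])  # prints "tox -e py35"
```

The change is therefore tested with the job definition it proposes, so a broken edit to a test fails in check instead of after it lands.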

First class multi-node support


- job:
    name: ursula
    parent: base
    nodes:
    - name: controller
      image: ubuntu-trusty
    - name: db
      image: centos-7
      

Active node requests

  • Zuul requests named resources
  • Resources are created or checked out of a pool
  • At finish, they are checked back in
  • Single-use environments are deleted on check-in
  • Multi-use environments become available for subsequent jobs
  • Non-node resources, such as a Kubernetes cluster, can also be requested
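
The check-out/check-in cycle above can be sketched as an in-memory pool. This is a hypothetical model: `Pool`, `Node`, `request` and `release` are illustrative names, not Nodepool's API, no cloud calls are made, and non-node resources are not modelled. The node names and images come from the multi-node job example.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Node:
    id: int
    image: str
    single_use: bool

class Pool:
    """Hypothetical in-memory pool; nodes are only simulated."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.available: List[Node] = []   # idle multi-use nodes
        self.deleted: List[int] = []      # ids of destroyed single-use nodes

    def request(self, spec: Dict[str, str], single_use: bool = True) -> Dict[str, Node]:
        """Grant one node per requested name; spec maps name -> image."""
        return {name: self._checkout(image, single_use) for name, image in spec.items()}

    def _checkout(self, image: str, single_use: bool) -> Node:
        if not single_use:
            # Reuse an idle multi-use node with the same image if there is one.
            for i, node in enumerate(self.available):
                if node.image == image:
                    return self.available.pop(i)
        return Node(next(self._ids), image, single_use)  # otherwise "create" one

    def release(self, nodes: Dict[str, Node]) -> None:
        """Check nodes back in at the end of a job."""
        for node in nodes.values():
            if node.single_use:
                self.deleted.append(node.id)   # single-use: deleted on check-in
            else:
                self.available.append(node)    # multi-use: ready for the next job

pool = Pool()
nodes = pool.request({"controller": "ubuntu-trusty", "db": "centos-7"})
pool.release(nodes)
print(pool.deleted)  # [1, 2]
```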

Status

Focus

  • OpenStack first, and the hard problems that brings
  • Extra-hard is handled. So is simple, but Zuul is complex to run if you only have the simple use cases

Next up

  • Getting it ready for the Ansible project
  • Making it truly suitable for non-OpenStack infrastructure teams to run
  • Making the easy tasks simple

For More Information

Thank you!

http://inaugust.com/talks/test-it-like-you-deploy-it.html

twitter: @e_monty