Who am I?

Office of Technology

Zuul

Ansible

Who am I?

Technical Committee

Developer Infrastructure Core Team

Former Foundation Board of Directors

PTL of shade project

What are we going to talk about?

Zuul
Ansible

What is Zuul?

Multi-cloud, scalable, elastic CI/CD engine
Validation of speculative future states
Single-use VM build nodes - safely run tests that need root
Multi-node builds
Multi-repo projects
Native support for gating configuration

Terminology

Periodic: jobs run in response to a timer
Post: jobs run after a change lands
Check: jobs run when someone proposes a change
Gate: jobs run between change approval and landing

Why?

The original OpenStack use case

Fully automated gated commits
Full end-to-end integration tests from scratch for every commit
Massive scale

OpenStack Scale by the numbers

2 KJPH (kilo-jobs / hour)
2376 arbitrary developers
1474 git repositories
11727 Jobs
Merge 10k Changes / Month

For comparison, Ansible has received 13171 PRs (changes), has merged 8190 of them, and has 37788 commits in its entire lifetime

Multi Repository Speculative Execution

Zuul constructs speculative states as-if a change were merged
Tests future states without landing those changes first
The as-if spans multiple repos
In the Gate pipeline, speculative changes are put into a virtual serial queue, then tested in parallel as-if each change combination in front of them had landed
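The gate queue can be sketched roughly as follows. This is a minimal illustration, not Zuul's implementation: `speculative_states`, `run_gate`, and the `passes` test runner are hypothetical names, and the failure handling (drop the failing change, re-test everything behind it without it) is an assumption stated here rather than something shown on the slide.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[str, ...]


def speculative_states(queue: Sequence[str]) -> List[State]:
    """For each change, the state it is tested against:
    itself applied on top of every change ahead of it in the queue."""
    return [tuple(queue[: i + 1]) for i in range(len(queue))]


def run_gate(queue: Sequence[str], passes: Callable[[State], bool]) -> List[str]:
    """Return the changes that land, in landing order.

    `passes` is a hypothetical test runner that takes a speculative state.
    Assumption: a failing change is removed from the queue and the changes
    behind it are re-tested without it.
    """
    landed: List[str] = []
    pending = list(queue)
    while pending:
        states = [tuple(landed + pending[: i + 1]) for i in range(len(pending))]
        # A real system would run these test sets in parallel.
        results = [passes(state) for state in states]
        if all(results):
            landed.extend(pending)
            break
        first_fail = results.index(False)
        # Everything ahead of the failure was tested in a valid state and lands.
        landed.extend(pending[:first_fail])
        # Drop the failing change; re-test what was behind it.
        pending = pending[first_fail + 1:]
    return landed
```

With a queue of A, B, C where B breaks the tests, A lands, B is ejected, and C is re-tested on top of A alone.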

Multi-Repo Dependencies


commit 30039f04109efa2263aba6eb302a29bd8d5e8f53
Author: Monty Taylor
Date:   Mon Sep 26 14:14:15 2016 -0500

    Add simple field for disabled flavors

    When we were equalizing out the old silly names from the new pretty
    ones, we missed disabled.

    Change-Id: I4cbf5f7c27f640c566460c18951ab9030aae84e4
    Depends-On: I523e0ab6e376f5ff6205b1cc1748aa6d546919cb
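Reading the dependency out of a commit message like the one above might look like this. It is a sketch only: the `depends_on` helper and its regular expression are illustrative assumptions, not Zuul's own parser.

```python
import re
from typing import List

# Matches footer lines such as "Depends-On: I523e0ab6...", allowing indentation.
DEPENDS_ON = re.compile(
    r"^[ \t]*Depends-On:[ \t]*(\S+)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def depends_on(message: str) -> List[str]:
    """Return every Depends-On value found in a commit message, in order."""
    return DEPENDS_ON.findall(message)
```

Applied to the commit message above, it would return the single Change-Id listed in the Depends-On footer.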

Multi-Repo Dependencies - an example

shade library runs functional tests against openstack
neutron change breaks shade tests (more later)
neutron fix is proposed
shade change Depends-On proposed neutron change
shade tests are run as-if neutron change has landed
shade change cannot land until neutron change lands
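The two rules in this example (test as-if the dependency has landed, and do not land before it does) can be sketched as below. The function names and the change labels are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Sequence, Set


def speculative_state(change: str, deps: Dict[str, Sequence[str]]) -> List[str]:
    """Changes to apply, dependencies first, to test `change`
    as-if everything it depends on had already landed."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(current: str) -> None:
        if current in seen:  # also guards against dependency cycles
            return
        seen.add(current)
        for dep in deps.get(current, ()):
            visit(dep)
        ordered.append(current)

    visit(change)
    return ordered


def can_land(change: str, deps: Dict[str, Sequence[str]], landed: Set[str]) -> bool:
    """A change may land only once every change it depends on has landed."""
    return all(dep in landed for dep in deps.get(change, ()))
```

Here `deps = {"shade-change": ["neutron-fix"]}` would model the shade change carrying a Depends-On footer that points at the proposed neutron fix.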

Isn't this supposed to prevent neutron from breaking shade?

shade and neutron do not share a gating relationship
shade has, by choice, a test that tests against master of OpenStack
Such a test is 'risky' to shade devs
shade devs desire the risk - can work around breaks, or submit bugs

Not Specific to OpenStack

"Gate" and "Check" are configurations
50+ OpenStack Vendors use Zuul for "3rd Party CI" of drivers
BMW and Wikimedia use Zuul

Pluggable

Triggers
Reporters
Node Providers
Execution content

Zuul v2

In production for OpenStack for 4 years
What most people run
Triggers: Gerrit, Periodic
Reporters: Gerrit, Email, MySQL
Node Providers: Elastic OpenStack Nodepool, Static servers
Jobs executed by Jenkins

Zuul v2.5

In use only by OpenStack (on purpose)
Replaced Jenkins with Ansible
Jobs still written using JJB - playbooks generated on the fly

http://logs.openstack.org/09/352209/7/check/gate-networking-ovn-python35/0fae86c/_zuul_ansible/

Why Replace Jenkins?

Not for lack of trying

OpenStack started on Jenkins (actually, on Hudson, remember that?)
We funded the Jenkins JClouds Plugin
We did deep dev in the Gerrit Trigger Plugin
We maintain the SCP artifact plugin (added console log support)
We added the 0mq notification plugin
We added the Gearman Worker plugin, which allowed us to grow to 8 Masters/1000 slaves
We wrote Jenkins Job Builder

Before I answer that ...

The world is a better place with Jenkins existing

Jenkins Problems

Security

don't run WebUI on the internet
ssh slave plugin - it's possible for a slave to run arbitrary code on the master

Stability

almost every Jenkins upgrade has broken us

Scalability

Jenkins has global mutexes, especially in plugins
Extra large cloud server could handle ~100 concurrent jobs
We ran 8 Jenkins Masters with slaves sharded across them

Overkill

we only used it as a remote shell execution engine

We know a better engine for remote execution

Zuul v3

Test things like you deploy them

Intended for broad use
Jobs written in and executed with ansible
GitHub Support
Multi-node build clusters as first class resource
Self-testing tests
Multi-Tenant
In-repo config
Fully support Bare Metal, VMs and Containers

AWS, GCE, Mac Stadium and Static Node Provider Support
Docker and Kubernetes Build Resources

Ansible execution

Merge the mergers and the launchers

Constructs relevant repo states and inventory
Push repos to test nodes
Execute playbook with inventory
Playbook can be defined centrally or in each repo
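The last two steps could be sketched like this, assuming the nodes are already known by name and address. `render_inventory` and `playbook_command` are hypothetical helpers; the only real interfaces relied on are the `ansible_host` inventory variable and the `-i` option of `ansible-playbook`.

```python
from typing import Dict, List


def render_inventory(nodes: Dict[str, str]) -> str:
    """Render an INI-style Ansible inventory.

    `nodes` maps an inventory hostname to the address Ansible should
    connect to (an illustrative input shape, not Zuul's data model).
    """
    lines = [f"{name} ansible_host={address}" for name, address in nodes.items()]
    return "\n".join(lines) + "\n"


def playbook_command(inventory_path: str, playbook_path: str) -> List[str]:
    """Build the argv for running one playbook against one inventory file."""
    return ["ansible-playbook", "-i", inventory_path, playbook_path]
```

The returned argv could be handed to `subprocess.run` once the inventory has been written to disk; that step is left out so the sketch runs without Ansible installed.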

In repo config

Minimal config at start - other than "where are my repos"
'trusted' config repos
config per-repo
At start, Zuul asks the fleet of merger-launchers to calculate config

Test tests before you land them

Config is in git. Zuul does multi-repo git depends.
v3 adds in-tree test definitions
Speculative state includes speculative state of test jobs
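One way to picture this: the job definitions used to test a change come from the speculative state, so a change that edits a job is tested with the edited job. The sketch below is an assumption-laden illustration; `effective_jobs` and the dictionary shapes are hypothetical and do not reflect Zuul's actual config loading.

```python
from typing import Dict, Iterable, Mapping


def effective_jobs(
    merged_jobs: Mapping[str, dict],
    proposed_changes: Iterable[Mapping[str, dict]],
) -> Dict[str, dict]:
    """Overlay job definitions from proposed changes, in queue order,
    on top of the job definitions that have already landed."""
    jobs = dict(merged_jobs)
    for change_jobs in proposed_changes:
        jobs.update(change_jobs)
    return jobs
```

The landed definitions are never modified, so a rejected change leaves the merged config untouched.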

First class multi-node support


- job:
    name: ursula
    nodes:
    - name: controller
      image: fedora-26
    - name: db
      image: rhel-7

Active node requests

Zuul requests named resources
resources are created/checked out of pool
at finish, checked back in
Single-use environments delete on check in
Multi-use environments become available for subsequent jobs
Request non-node resources, such as kubernetes cluster
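The check-out and check-in lifecycle described above can be sketched as a small pool. `Pool` and its naming scheme are hypothetical; a real provider would create and delete cloud resources where this sketch only tracks names.

```python
import itertools
from typing import List, Set


class Pool:
    """Tracks resources of one label that are checked out and back in."""

    def __init__(self, label: str, single_use: bool) -> None:
        self.label = label
        self.single_use = single_use
        self.idle: List[str] = []
        self.in_use: Set[str] = set()
        self._ids = itertools.count(1)

    def check_out(self) -> str:
        """Reuse an idle resource if one exists, otherwise create a new one."""
        node = self.idle.pop() if self.idle else f"{self.label}-{next(self._ids)}"
        self.in_use.add(node)
        return node

    def check_in(self, node: str) -> None:
        """Single-use resources are deleted; multi-use ones become available again."""
        self.in_use.remove(node)
        if not self.single_use:
            self.idle.append(node)
```

A single-use pool hands out a fresh name on every check-out, while a multi-use pool (for example one standing in for a shared cluster) returns the same resource to the next job.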

Status

Focus

OpenStack next week, and the hard problems that brings
The extra-hard cases are handled. So are the simple ones, but Zuul is complex to run if you only have the simple use cases

Next up

Get it ready for Ansible project
Making it truly suitable for not-OpenStack Infra to run
Making the easy tasks simple

Test it like you Deploy It

Monty Taylor

http://inaugust.com/talks/test-it-like-you-deploy-it.html

twitter: @e_monty

For More Information

Thank you!

http://inaugust.com/talks/test-it-like-you-deploy-it.html

twitter: @e_monty