Zuul

Monty Taylor

http://inaugust.com/talks/zuul.html

twitter: @e_monty

What is Zuul?

  • Multi-cloud, scalable, elastic CI/CD engine
  • Validation of speculative future states
  • Test things like you deploy them
  • Single-use VM build nodes - safely run tests that need root
  • Full support for bare metal, VMs and containers
  • Multi-node builds
  • Multi-repo projects
  • Native support for gating configuration

Terminology

  • Periodic: jobs run in response to a timer
  • Post: jobs run after a change merges
  • Check: jobs run when someone proposes a change
  • Gate: jobs run between change approval and landing
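The four pipelines differ mainly in which event enqueues a change. A minimal Python sketch, assuming a hypothetical `Event` type and made-up event names; in Zuul the mapping lives in per-pipeline trigger configuration, not in code like this:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event model: real trigger drivers (Gerrit, timer, ...) emit
# richer events; only the event kind matters for this sketch.
@dataclass
class Event:
    kind: str                     # illustrative kinds, see PIPELINE_FOR_EVENT
    change: Optional[str] = None  # change identifier; None for timer events

# Illustrative routing table (event names are assumptions, not Zuul's).
PIPELINE_FOR_EVENT = {
    "timer": "periodic",          # fires on a schedule
    "change-proposed": "check",   # someone uploads a change for review
    "change-approved": "gate",    # approved, but not yet landed
    "change-merged": "post",      # the change has landed
}

def route(event: Event) -> Optional[str]:
    """Return the pipeline this event enqueues into, or None if none matches."""
    return PIPELINE_FOR_EVENT.get(event.kind)

if __name__ == "__main__":
    for ev in (Event("change-proposed", "12345"), Event("timer"), Event("comment", "12345")):
        print(f"{ev.kind} -> {route(ev)}")
```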

Why - the original OpenStack use case

  • Fully automated gated commits
  • Full end-to-end integration tests from scratch for every commit
  • Massive scale

OpenStack Development Scale by the numbers

When we say "massive scale"

  • 2 KJPH (kilo-jobs / hour)
  • 2500 arbitrary developers
  • 1474 git repositories
  • 11727 Jobs
  • 450k lifetime changes
  • 10k changes merged in 42 days

For comparison, Ansible has _received_ 13171 PRs (changes), has merged 8190 of them, and has 37788 commits in its entire lifetime.
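To put those figures in per-minute and per-day terms, a quick calculation. It assumes the 2 KJPH rate is sustained around the clock, which the slide does not state:

```python
# Figures from the slides above.
JOBS_PER_HOUR = 2_000            # 2 KJPH
MERGED_CHANGES, DAYS = 10_000, 42
ANSIBLE_RECEIVED, ANSIBLE_MERGED = 13_171, 8_190

jobs_per_minute = JOBS_PER_HOUR / 60      # about 33 jobs per minute
jobs_per_day = JOBS_PER_HOUR * 24         # assumes the rate holds for 24 hours
merges_per_day = MERGED_CHANGES / DAYS    # about 238 merged changes per day
ansible_merge_ratio = ANSIBLE_MERGED / ANSIBLE_RECEIVED

print(f"{jobs_per_minute:.1f} jobs/min, {jobs_per_day} jobs/day")
print(f"{merges_per_day:.1f} merges/day")
print(f"Ansible lifetime merge ratio: {ansible_merge_ratio:.1%}")
# OpenStack merges more changes in 42 days than Ansible has merged in total.
print(MERGED_CHANGES > ANSIBLE_MERGED)
```

This prints 33.3 jobs/min, 48000 jobs/day, 238.1 merges/day and a 62.2% merge ratio for Ansible.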

Multi Repository Speculative Execution

  • Zuul constructs speculative states as-if a change were merged
  • Tests future states without landing those changes first
  • The as-if spans multiple repos
  • In the Gate pipeline, changes are put into a virtual serial queue and tested in parallel, each as if all the changes ahead of it in the queue had already landed
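The gate behaviour can be simulated in a few lines. This is a minimal, synchronous sketch, not Zuul's implementation. `run_gate` and the `passes` callback, which stands in for running every job against a speculative state, are hypothetical. Real Zuul tests the states concurrently and merges a change only once it reaches the head of the queue:

```python
from typing import Callable, List, Sequence, Tuple

# A speculative state is the ordered tuple of changes assumed to have landed.
State = Tuple[str, ...]

def run_gate(queue: Sequence[str],
             passes: Callable[[State], bool]) -> Tuple[List[str], List[str]]:
    """Process a serial gate queue; return (merged, rejected) change lists."""
    pending = list(queue)
    merged: List[str] = []
    rejected: List[str] = []
    while pending:
        # One speculative state per queued change: everything already merged,
        # everything ahead of it in the queue, and the change itself.
        # Zuul would test these in parallel; here they are evaluated in order.
        results = [passes(tuple(merged + pending[: i + 1])) for i in range(len(pending))]
        for i, ok in enumerate(results):
            if ok:
                # Every change ahead of this one has merged, so it can land too.
                merged.append(pending[i])
                continue
            # First failure: eject the change. Results for changes behind it
            # assumed it had landed, so they are discarded and retested.
            rejected.append(pending[i])
            pending = pending[i + 1 :]
            break
        else:
            pending = []  # every state passed; the whole queue merged
    return merged, rejected

if __name__ == "__main__":
    # Hypothetical scenario: change "B" breaks any state that contains it.
    print(run_gate(["A", "B", "C", "D"], lambda state: "B" not in state))
    # (['A', 'C', 'D'], ['B'])
```

When nothing fails, every change is tested against exactly the state it will land on, yet all of those tests can run at once; a failure costs only a retest of the changes behind it.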

Zuul Animation

http://docs.openstack.org/infra/publications/zuul/#(18)

Not Specific to OpenStack

  • "Gate" and "Check" are merely configurations
  • 50+ OpenStack vendors use Zuul for "3rd Party CI" of drivers
  • HP uses Zuul for both OpenStack and non-OpenStack products
  • Wikimedia uses Zuul

Status Pages

https://integration.wikimedia.org/zuul/

http://status.openstack.org/zuul/

Pluggable

  • Triggers
  • Reporters
  • Node Providers
  • Execution content (Ansible)
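One way to picture these plug points is as small interfaces that a scheduler wires together. The class and method names below (`Trigger.poll`, `Reporter.report`, `NodeProvider.request_nodes`, `Executor.run`) are hypothetical and do not mirror Zuul's actual driver API; the toy implementations exist only so the sketch runs:

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

class Trigger(ABC):
    @abstractmethod
    def poll(self) -> List[Dict[str, str]]:
        """Return new events, each naming the change it concerns."""

class Reporter(ABC):
    @abstractmethod
    def report(self, change: str, result: str) -> None:
        """Publish a result somewhere (code review, email, a database...)."""

class NodeProvider(ABC):
    @abstractmethod
    def request_nodes(self, count: int) -> List[str]:
        """Hand out `count` build nodes."""

class Executor(ABC):
    @abstractmethod
    def run(self, nodes: List[str], job: str) -> bool:
        """Run a job on the given nodes; True means success."""

# Toy implementations, for illustration only.
class ListTrigger(Trigger):
    def __init__(self, events: List[Dict[str, str]]) -> None:
        self._events = list(events)

    def poll(self) -> List[Dict[str, str]]:
        events, self._events = self._events, []
        return events

class MemoryReporter(Reporter):
    def __init__(self) -> None:
        self.reports: List[Tuple[str, str]] = []

    def report(self, change: str, result: str) -> None:
        self.reports.append((change, result))

class StaticNodeProvider(NodeProvider):
    def __init__(self, hosts: List[str]) -> None:
        self.hosts = hosts

    def request_nodes(self, count: int) -> List[str]:
        if count > len(self.hosts):
            raise RuntimeError("not enough nodes")
        return self.hosts[:count]

class FakeExecutor(Executor):
    def run(self, nodes: List[str], job: str) -> bool:
        return job != "broken-job"  # pretend only this one job ever fails

def process(trigger: Trigger, provider: NodeProvider, executor: Executor,
            reporter: Reporter, job: str, node_count: int = 1) -> None:
    """Core loop: events in, nodes allocated, job run, result reported."""
    for event in trigger.poll():
        nodes = provider.request_nodes(node_count)
        ok = executor.run(nodes, job)
        reporter.report(event["change"], "SUCCESS" if ok else "FAILURE")

if __name__ == "__main__":
    reporter = MemoryReporter()
    process(ListTrigger([{"change": "12345"}]), StaticNodeProvider(["node1"]),
            FakeExecutor(), reporter, job="unit-tests")
    print(reporter.reports)  # [('12345', 'SUCCESS')]
```

Swapping Gerrit for GitHub, or a static pool for a cloud, then means replacing one class without touching `process`.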

Zuul v2

  • In production for OpenStack for 4 years
  • What most people run
  • Triggers: Gerrit, Periodic
  • Reporters: Gerrit, Email, MySQL
  • Node Providers: OpenStack, Long-lived non-managed servers
  • Jobs executed by Jenkins

Zuul v2.5

  • In use only by OpenStack (on purpose)
  • Replaced Jenkins with Ansible
  • Jobs still written using Jenkins Job Builder (JJB); playbooks are generated from them on the fly
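The on-the-fly generation can be pictured as a translation from a JJB job definition into an Ansible playbook. A minimal sketch, assuming JJB jobs are plain dicts that use only `shell` builders. `jjb_job_to_playbook` is a hypothetical helper, not the code Zuul v2.5 actually used, and the shell command in the example is made up:

```python
import json
from typing import Any, Dict, List

def jjb_job_to_playbook(job: Dict[str, Any], hosts: str = "all") -> List[Dict[str, Any]]:
    """Translate a (simplified) JJB job into a one-play Ansible playbook structure."""
    tasks = []
    for i, builder in enumerate(job.get("builders", []), start=1):
        # Only the shell builder is handled in this sketch; JJB has many others.
        if "shell" not in builder:
            raise ValueError(f"unsupported builder: {sorted(builder)}")
        tasks.append({"name": f"{job['name']} builder {i}", "shell": builder["shell"]})
    return [{"hosts": hosts, "tasks": tasks}]

if __name__ == "__main__":
    job = {
        "name": "gate-networking-ovn-python35",
        "builders": [{"shell": "tox -e py35"}],  # hypothetical command
    }
    # A real generator would write YAML; JSON keeps this sketch stdlib-only.
    print(json.dumps(jjb_job_to_playbook(job), indent=2))
```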

http://logs.openstack.org/09/352209/7/check/gate-networking-ovn-python35/0fae86c/_zuul_ansible/

Why Replace Jenkins?

Not for lack of trying

  • Started on Jenkins (actually, on Hudson, remember that?)
  • Funded the Jenkins JClouds Plugin
  • Did deep dev in the Gerrit Trigger Plugin
  • Maintain the SCP artifact plugin (added console log support)
  • Added 0mq notification plugin
  • Added Gearman Worker plugin (allowed us to grow to 8 Masters/1000 concurrent slaves)
  • Wrote Jenkins Job Builder

Jenkins Problems

Wasn't written originally to be Internet-facing

Security

  • Don't run the web UI on the Internet
  • SSH slave plugin: a slave can run arbitrary code on the master

Stability

  • Almost every Jenkins upgrade has broken us

Scalability

  • Jenkins has global mutexes, especially in plugins
  • Extra large cloud server could handle ~100 concurrent jobs
  • We ran 8 Jenkins Masters with slaves sharded across them

Overkill

  • We only used it as a remote shell execution engine

We know a better engine for remote execution

Zuul V3

  • Jobs written in and executed with Ansible
  • Intended for broad use
  • Triggers: Gerrit, periodic, GitHub (possibly: Bitbucket, GitLab, Stash, fedmsg, email)
  • Reporters: Gerrit, email, GitHub (possibly: Bitbucket, GitLab, Stash, ResultsDB)
  • Node providers: pre-existing servers, dynamic cloud slaves (OpenStack, AWS, GCE), k8s clusters
  • In-repo config
  • Multi-node build clusters as first class resource
  • Multi-Tenant
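Treating a multi-node cluster as a first-class resource means a job asks for a named set of nodes and receives all of them or none. A minimal sketch of that all-or-nothing allocation; `NodeSpec`, `allocate`, the labels and the pool contents are hypothetical and do not reflect Zuul's nodeset format:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class NodeSpec:
    name: str   # role the job refers to, e.g. "controller"
    label: str  # kind of node required, e.g. an image/flavor label

def allocate(nodeset: List[NodeSpec], pool: Dict[str, List[str]]) -> Dict[str, str]:
    """Assign one host per NodeSpec, or raise without taking anything from the pool."""
    needed = Counter(spec.label for spec in nodeset)
    # Check capacity first so a partial cluster is never handed out.
    for label, count in needed.items():
        if len(pool.get(label, [])) < count:
            raise RuntimeError(f"need {count} x {label}; nothing allocated")
    return {spec.name: pool[spec.label].pop(0) for spec in nodeset}

if __name__ == "__main__":
    pool = {"ubuntu": ["host-1", "host-2"], "centos": ["host-3"]}
    cluster = [NodeSpec("controller", "ubuntu"), NodeSpec("compute", "ubuntu")]
    print(allocate(cluster, pool))  # {'controller': 'host-1', 'compute': 'host-2'}
    print(pool)                     # {'ubuntu': [], 'centos': ['host-3']}
```

Checking capacity before popping anything is what makes the allocation atomic in this single-threaded sketch; a real scheduler sharing a pool across jobs would also need locking.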

Focus

So far

  • OpenStack, and the hard problems that brings
  • Extra-hard cases are handled, and so are simple ones, but Zuul is complex to run if you only have simple use cases

Zuul v3

  • Get it ready for the Ansible project
  • Make it truly suitable for non-OpenStack infrastructure teams to run
  • Make the easy tasks simple
  • Make Zuul the thing everyone WANTS to use

For More Information

  • http://docs.openstack.org/infra/zuul/
  • http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html
  • freenode:#zuul
  • http://docs.openstack.org/infra/publications/zuul/#(1)

Thank you!

http://inaugust.com/talks/zuul.html

twitter: @e_monty