From Zero to 2500: Managing Development for OpenStack Completely in the Open

Monty Taylor

http://inaugust.com/talks/zero-to-2500.html

twitter: @e_monty

Who am I?

Office of Technology

Zuul

Ansible

Who am I?

Technical Committee

Developer Infrastructure Core Team

OpenStack

The Four Opens

  • Open Source
    we don't hold back "Enterprise" features, we don't cripple things
  • Open Design
    design process open to all, decisions are not made inside company doors
  • Open Development
    public source code, public code review, all code is reviewed and gated
  • Open Community
    lazy consensus, democratic leadership from participants, public logged meetings in IRC, public archived mailing lists

OpenStack Infra

Tooling, Automation and CI for the OpenStack Project

Why?

The original OpenStack use case

  • Fully automated gated commits
  • Full end-to-end integration tests from scratch for every commit
  • Massive scale
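
A minimal sketch of what "gated commits" means: a change merges only if the tests pass on the branch as it would look with that change applied. This is an illustration under stated assumptions, not the project's actual gating code; `gate` and `passes` are hypothetical names, and a change is reduced to a plain string.

```python
from typing import Callable, List


def gate(branch: List[str],
         queue: List[str],
         passes: Callable[[List[str]], bool]) -> List[str]:
    """Merge queued changes in order; return the changes that were rejected.

    `passes` stands in for the full test run (hypothetical callback).
    It is given the future state of the branch, not the change alone.
    """
    rejected = []
    for change in queue:
        candidate = branch + [change]  # branch tip plus the proposed change
        if passes(candidate):
            branch.append(change)      # tests passed: the change lands
        else:
            rejected.append(change)    # tests failed: the branch is untouched
    return rejected
```

Because each change is tested against the result of everything merged before it, a failing change never reaches the branch.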

OpenStack Scale by the numbers

  • 2 KJPH (kilo-jobs / hour) (1/3 the total Travis job rate)
  • 2376 arbitrary developers
  • 1474 git repositories
  • 11727 Jobs
  • Merge 10k Changes / Month

For comparison: Ansible has received 13171 PRs (changes), has merged 8190 of them, and has 37788 commits in its entire lifetime

Infra operates the same way as OpenStack

How do we do this?

Control plane

http://git.openstack.org/cgit/openstack-infra/system-config

  • All server config management in git
  • Puppet manages the servers: puppet apply
  • Ansible runs puppet: ansible puppet module
  • Ansible OpenStack Dynamic Inventory
  • The only things not public are keys and secrets

It wasn't always this way!

Let me take you on a walk down memory lane ...

We started with 4 cloud servers in Rackspace

  • Hudson Master (https://launchpad.net/~hudson-openstack)
  • Nova Build Node
  • Swift Build Node
  • The other server (now known as old-wiki)

old-wiki is still running! (On Ubuntu 10.04)

I didn't even have access to the cloud account!

The Setup

  • Hudson jobs ran Tarmac, which tested and merged Launchpad Merge Requests
  • Hudson ran Tarmac in a loop and published the build results
  • One Job per project
  • Three of us with direct Hudson Admin permissions

This state persisted for the first year and first three OpenStack releases

Project Proliferation

Each project got a node and a job. Configured by hand. By me.

It got annoying

Config Management!

Please remember we're talking 2011 here

Puppet vs. Chef and git vs. bzr and humans pushing things

We were so excited about sharing Ops best practices!

We were so sad

Brief Rant - I do not want to write Apache configs in Puppet DSL

So we introduced Puppet

http://git.openstack.org/cgit/openstack-infra/system-config/tree/?id=99540d91a75d2b021db01d815e46bd585f9235cd

Open Development

Our developers wanted to collaborate on test jobs.

Giving hundreds of people access to directly edit test jobs == sadness

Did I mention our test jobs implement captive gating?

Jenkins Job Builder

YAML encoding of Jenkins Job definitions with templating

Allowed jobs to go through code review before being applied!
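
The templating idea can be sketched in plain Python: one job template, expanded once per project. This is not Jenkins Job Builder's API or schema; the template keys, the job name pattern, and the `expand` helper are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical job template; `{name}` and `{node}` are filled in per project.
JOB_TEMPLATE = {
    "name": "gate-{name}-unittests",
    "node": "{node}",
    "command": "cd {name} && ./run_tests.sh",
}


def expand(template: Dict[str, str],
           projects: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Produce one concrete job definition per project."""
    return [
        {key: value.format(**project) for key, value in template.items()}
        for project in projects
    ]


jobs = expand(JOB_TEMPLATE, [
    {"name": "nova", "node": "precise"},
    {"name": "swift", "node": "precise"},
])
```

Since both the template and the project list are plain data in a repository, a change to either one can be code reviewed like any other commit.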

Andreas Jaeger is OpenStack's all-time contributions leader. He works on docs and test jobs.

Introduction of Puppetmaster

  • A cron job updated the config repo on the puppetmaster
  • puppet agent ran on each node
  • Landing a commit == config running on hosts (eventually)
  • What to do with passwords and keys?

Introduction of Hiera for Secrets

Hiera let us store YAML files with only secrets. Reference secrets by name in puppet manifests

http://git.openstack.org/cgit/openstack-infra/system-config/tree/manifests/site.pp
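
A rough sketch of the split between public config and private secrets, assuming a made-up `secret:<name>` marker. Puppet and Hiera use their own lookup mechanism; the marker, the `render` helper, and the example values below are hypothetical.

```python
from typing import Dict

SECRET_PREFIX = "secret:"  # hypothetical marker, not Hiera syntax


def render(public_config: Dict[str, str],
           secrets: Dict[str, str]) -> Dict[str, str]:
    """Replace every 'secret:<name>' value with the named secret."""
    resolved = {}
    for key, value in public_config.items():
        if value.startswith(SECRET_PREFIX):
            name = value[len(SECRET_PREFIX):]
            resolved[key] = secrets[name]  # raises KeyError if the secret is missing
        else:
            resolved[key] = value
    return resolved
```

The public config can live in a world-readable repository because it only ever contains the names of secrets, never their values.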

Project Creation

  • Create replica repos on git farm (and on GitHub I suppose)
  • Create repo in Gerrit
  • Push contents

Too much clicking!

jeepyb - Gerrit Project Builder

Lesson: Don't let Monty name things

Ansible to run Puppet

Back up: Salt to run Puppet

Ansible to run Puppet

  • puppet ansible module
  • openstack-infra/ansible-role-puppet
  • Role copies subset of hiera secrets to node before puppet
  • Moved from puppetmaster to puppet apply

Remaining manual human tasks

  • Adding new secrets to hiera
  • Launching new servers

Ansible Role Cloud Launcher

http://git.openstack.org/cgit/openstack/ansible-role-cloud-launcher

profiles:
  - name: admin-clouds
    flavors:
      - name: aoclcompany.xlarge
        ram: 128
        vcpus: 1
  - name: ops
    images:
      - name: ubuntu-trusty
        filename: /home/ubuntu/trusty-server-cloudimg-amd64-disk1.img
  - name: bootstrap-keypair
    keypairs:
      - name: bootstrap-key
        public_key_file: /home/ubuntu/.ssh/id_rsa.pub
clouds:
  - name: awesomecloud
    profiles:
      - admin-clouds
      - bootstrap-keypair
  - name: yaycloud-ops
    oscc_cloud: yaycloud-opsuser
    profiles:
      - bootstrap-keypair
      - ops
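
One way to read the config above: each cloud lists profiles by name, and gets the union of the resources those profiles define. The sketch below encodes that reading in plain Python. It is an assumption about the intent of the config, not the role's actual code; `resources_per_cloud` is a hypothetical helper and the config is trimmed to resource names.

```python
from typing import Any, Dict, List

# The config from the slide, reduced to the fields the sketch needs.
CONFIG: Dict[str, List[Dict[str, Any]]] = {
    "profiles": [
        {"name": "admin-clouds",
         "flavors": [{"name": "aoclcompany.xlarge", "ram": 128, "vcpus": 1}]},
        {"name": "ops",
         "images": [{"name": "ubuntu-trusty"}]},
        {"name": "bootstrap-keypair",
         "keypairs": [{"name": "bootstrap-key"}]},
    ],
    "clouds": [
        {"name": "awesomecloud",
         "profiles": ["admin-clouds", "bootstrap-keypair"]},
        {"name": "yaycloud-ops",
         "profiles": ["bootstrap-keypair", "ops"]},
    ],
}


def resources_per_cloud(config: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Dict[str, List[str]]]:
    """Map each cloud name to the resource names its profiles provide."""
    profiles = {profile["name"]: profile for profile in config["profiles"]}
    result: Dict[str, Dict[str, List[str]]] = {}
    for cloud in config["clouds"]:
        merged: Dict[str, List[str]] = {}
        for profile_name in cloud["profiles"]:
            for kind, items in profiles[profile_name].items():
                if kind == "name":
                    continue  # skip the profile's own name field
                merged.setdefault(kind, []).extend(item["name"] for item in items)
        result[cloud["name"]] = merged
    return result
```

Profiles keep shared pieces, such as the bootstrap keypair, defined once and reused across clouds.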
      

Problems depending on services

Even when the service is Open Source, it can stop being Open Source

Transifex

WAS an Open Source translations system.

Zanata!

We run Zanata ourselves now. (Thanks Lyz!)

Remaining external service dependencies

  • Rackspace Public Cloud
  • Launchpad

Launchpad OpenID -> openstackid

Launchpad Bugs -> storyboard

The Multi-cloud OpenStack Story

  • Our build nodes already span 12 different OpenStack Public Clouds
  • Work starting on spreading the Control Plane out
  • Starting with Vexxhost

Thank you!

http://inaugust.com/talks/zero-to-2500.html

twitter: @e_monty