On Friday and Saturday last week I attended DevOpsDays London 2013. Other people have blogged about the event, but I haven’t seen much coverage of the Open Spaces sessions, so here are my thoughts, which try to fill in those gaps.

Sam Eaton’s talk was very popular. It came out during the talk that he’d just left the job about which he was talking on the Friday, so perhaps he was more frank than he would otherwise have been. He said he was wary of “owning” tools, because they end up “owning you”, creating “silos within yourself”. ActiveMQ was crucial to the approach he discussed. It had been installed for a specific reason, and they then made much wider use of it. He claimed that it made the creation of bespoke tools easier, because you wouldn’t have to worry about communication between them. The familiar theme that it is easier to ask for forgiveness than for permission also featured, and he extended this to say that people should

deliver first, then evangelize

His slides are available here.

Gene Kim’s talk was aimed at helping people sell the DevOps approach. His idea of the

downward spiral of negative feedback

was quite powerful.

I liked the fact that Simon McCartney’s Ignite talk included a round-up at the end of tools he wished he’d known about before he started his “Stackkicker” project, because he would have adapted one of them instead of starting from scratch. A humble and practical admission. Here are his slides.

Daniel Pope gave an Ignite talk on Saturday following up on something he’d mentioned in a Friday Open Space session I’d attended on storage. For testing he uses honeyd and the Fake ARP Daemon to create, in essence, a fake internet. He also referred to

Test-driven development of infrastructure

and unit testing of infrastructure.
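
One way to read “unit testing of infrastructure” is as ordinary assertions run against a live host. The following is only a minimal sketch of that idea, not what was shown in the talk: the function names `port_is_open` and `failed_checks` are my own, and it assumes that “responding correctly” can be approximated by a TCP port accepting connections.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def failed_checks(host, expected_ports):
    """Return the expected ports on host that are not accepting connections."""
    return [port for port in expected_ports if not port_is_open(host, port)]
```

A test run against a freshly built machine could then assert that `failed_checks(host, [22, 80])` is empty, with the port list being whatever the service is supposed to expose.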

Open Spaces

The Open Spaces format was new to me and it was rather successful. The description in fact makes it sound more complicated than it really is. The storage session had people discussing the various approaches they had taken to creating distributed storage, some of them using a product I’d never heard of before called MogileFS. People were rather wary of Gluster and even warier of Ceph – “I’ve heard the block storage is done, and the filesystem…isn’t” being one of the responses. There were lots of references to logstash and sensu in the session on monitoring, both of which I’d been aware of and which now seem to have reached an inflection point in popularity.

The two Open Space sessions I attended on Saturday were on Clouds and Deployment.  Here are my bullet point notes from them:

  • Cloud experiences
    • Orchestration and automatic scaling are a problem
    • Need to get used mentally to killing your servers
    • Problem of cloud instance naming
      • Hard to know which machine is which
      • Solve this with tags
      • Or change the machine host name and put the instance id into the role eg with cloud-init
      • Also discovery-type pattern, eg using mcollective
      • Can use other inventories eg Chef
      • test driven infrastructure
        • Automatically test new instances that are part of a service, and kill them if they don’t respond correctly
    • How do you detect and deal with poorly performing nodes?
    • Interview question – what do you do if things are failing?
      • Rollback and rebuild
    • Kill a problem node first time
      • If it happens repeatedly, investigate
    • Riemann as a dashboard
    • Monitoring of cloud instances?
      • Much more dynamic than physical machine monitoring
      • Combination of mcollective and sensu
        • How to handle when instance ends?
        • Maybe a cron job on Sensu server?
        • Need to keep information about past machines in order to enable historical performance comparisons
        • Same host name may be reused with different sized instances
        • Custom tools needed for this at present
        • Need to tie machine’s details with monitoring output for this
        • Maybe keep all logs and process them afterwards
      • We don’t even know what the questions are re. Cloud, never mind how to solve them, compared with physical data centres
        • Difference between things staying mostly the same and things mostly changing
    • Handling of multiple regions?
      • Security groups don’t transfer automatically between AWS regions
      • VPC should help with this
  • Deployment
    • Prefer to have everything in packages, to be able to track dependencies and check integrity
      • Use mcollective to trigger updates
      • Pulp to manage repositories
      • Build with Jenkins
    • Build with Jenkins and deploy with it too
    • Use versioning in the package, to cope with different application versions
    • Advice to use multiple Jenkins machines for different purposes, rather than try to do everything on one machine
    • How do you know what’s been deployed where, when using Jenkins? (which isn’t primarily a deployment tool)
      • Use post install scripts in rpms to register in graphite
      • Use work flow management plugin in Jenkins
    • Push application artifacts into Nexus, as an alternative approach
    • Liquibase
    • Need to have configuration and binaries integrated ie in same Puppet module, to ensure they’re in sync
    • Want to have a local repository and local mirror of everything you deploy, because you can’t rely on Internet resources being there
    • Be careful using something like Maven, because it will use snapshots from the Internet by default, which hurts reliability and reproducibility
      • Therefore block Internet for such cases
      • Use a proxy?
    • How many developers know how to write spec files?
    • Keep environment configuration separate from application configuration
      • Same tags in version control
    • Restrict access to eg Puppet modules to certain developers
    • Use git tags to keep track of things
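
The suggestion above of registering deployments in graphite from an rpm post install script could look something like the sketch below. It assumes the script calls out to a small Python helper, that deployments are recorded as a value of 1 under a `deploys.<host>.<package>.<version>` path, and that the server is reachable as `graphite.example.com`; all of those names are hypothetical. The only part taken from Graphite itself is its plaintext protocol, which accepts lines of the form `<path> <value> <timestamp>` on port 2003 by default.

```python
import socket
import time


def deploy_metric_line(package, version, host, timestamp=None):
    """Format one Graphite plaintext line recording that a package was deployed."""
    if timestamp is None:
        timestamp = int(time.time())

    def safe(name):
        # Dots separate path components in Graphite, so replace them in names.
        return name.replace(".", "_")

    # The "deploys" prefix and path layout are assumptions, not a convention.
    path = "deploys.{}.{}.{}".format(safe(host), safe(package), safe(version))
    return "{} 1 {}\n".format(path, int(timestamp))


def register_deploy(line, graphite_host="graphite.example.com", port=2003):
    """Send a preformatted line to the Graphite plaintext listener."""
    with socket.create_connection((graphite_host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Keeping the formatting separate from the network call means the line can be checked without a Graphite server, and the post install script only needs to pass in the package name and version it already knows.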




