Fedora 18 on Dell XPS 13 Developer edition

My new work laptop, as discussed recently, is the second release of the Dell XPS 13 Developer edition, with the 1920×1080 screen, which fixes my main complaint about the first version. It feels smugly virtuous to buy a machine that doesn’t suffer from the Windows Tax and represents an explicit gesture of support for Linux.  However, Ubuntu isn’t my preferred distribution, so after staying with that for a trip to CERN, I’ve removed it and replaced it with Fedora 18.

Now, one of the selling points of Project Sputnik is that special efforts have been made to ensure full hardware compatibility. These days Linux hardware support is generally very good, but it’s not yet clear whether all the fixes produced by Dell and Canonical have been submitted upstream, so I knew there might be some regressions. One thing that did work out of the box in Fedora was the “super” key, which in other circumstances might be better known as the “Windows key”. In both Unity and GNOME Shell it brings up the search and launching interface, so it’s quite important. The customised version of Ubuntu 12.04 shipped on the XPS 13 includes a package that specifically disables this key, as detailed elsewhere. There’s no such problem with Fedora.

Installation of Fedora 18 from a USB stick completed very quickly, a testament to the 256GB SSD.

The brightness keys do bring up the on-screen indicator, but the screen brightness doesn’t seem to change, at least on battery power. There is a workaround on the Gentoo wiki here:

echo 0 > /sys/class/backlight/intel_backlight/brightness

[Update: 2013-05-22 – there’s a more automated workaround in this Bugzilla entry, which I haven’t tried yet]
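Because the workaround is just a write to sysfs, it is easy to wrap in a small script that sets the brightness as a proportion of the maximum rather than as a raw number. The following is a minimal Python sketch, not the fix from the Gentoo wiki or the Bugzilla entry: it assumes the same `intel_backlight` device as the command above and the `max_brightness` file that sysfs backlight devices normally expose alongside `brightness`, and the `set_brightness` name is hypothetical. As with the `echo` version, it needs root on a real system.

```python
from pathlib import Path


def set_brightness(fraction, device="intel_backlight",
                   root="/sys/class/backlight"):
    """Set the backlight to a fraction (0.0 to 1.0) of max_brightness.

    Hypothetical helper: it writes to the same sysfs file as the echo
    command, so on a real system it must run as root.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    base = Path(root) / device
    # max_brightness holds the highest raw level the driver accepts.
    max_level = int((base / "max_brightness").read_text().strip())
    level = round(fraction * max_level)
    (base / "brightness").write_text(f"{level}\n")
    return level
```

For example, `set_brightness(0.3)` run as root would set the panel to roughly 30% of its maximum level; the `root` parameter is there mainly so the function can be tried against a scratch directory.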

According to Powertop, the power draw is 15W at full brightness with no configuration changes. This drops to 7.5W when the brightness is reduced and the various power-saving options are activated.
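Powertop is the easiest way to get these figures, but the kernel also exposes raw battery readings under `/sys/class/power_supply`, which is handy for a quick cross-check from a script. A minimal Python sketch, assuming the battery appears as `BAT0` and reports either `power_now` (microwatts) or `current_now` and `voltage_now` (microamps and microvolts); which files exist varies between machines, and the reading only reflects the system's power draw while running on battery.

```python
from pathlib import Path


def battery_power_watts(battery="BAT0", root="/sys/class/power_supply"):
    """Return the instantaneous battery power draw in watts.

    Assumes the kernel exposes either power_now (microwatts) or
    current_now (microamps) together with voltage_now (microvolts).
    """
    base = Path(root) / battery
    power_file = base / "power_now"
    if power_file.exists():
        return int(power_file.read_text()) / 1_000_000
    # Batteries that report charge rather than energy: P = I * V.
    current_ua = int((base / "current_now").read_text())
    voltage_uv = int((base / "voltage_now").read_text())
    # microamps multiplied by microvolts gives picowatts.
    return current_ua * voltage_uv / 1e12
```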

[Update: 2013-05-16 – a couple of times now the battery life indicator in GNOME has been wrong, showing 100% and then suddenly jumping to the correct (much lower) figure – it’s not clear yet whether this is a GNOME problem or something specific to this hardware]
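One way to narrow down whether GNOME or the hardware is at fault would be to compute the charge level directly from the kernel’s raw figures and compare it with the indicator. A minimal Python sketch, assuming the battery is `BAT0` and exposes `energy_now`/`energy_full` or, on some hardware, `charge_now`/`charge_full`; the function name is hypothetical.

```python
from pathlib import Path


def battery_percent(battery="BAT0", root="/sys/class/power_supply"):
    """Compute the remaining charge as a percentage from raw sysfs values."""
    base = Path(root) / battery
    # Batteries report either energy (microwatt-hours) or charge (microamp-hours).
    for prefix in ("energy", "charge"):
        now_file = base / f"{prefix}_now"
        full_file = base / f"{prefix}_full"
        if now_file.exists() and full_file.exists():
            return 100.0 * int(now_file.read_text()) / int(full_file.read_text())
    raise FileNotFoundError(f"no energy_* or charge_* files under {base}")
```

If this figure also sticks at 100% when the indicator does, the problem probably lies below GNOME, in the firmware or kernel; if it doesn’t, GNOME looks the more likely culprit.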

The camera works, and so does sound, including adjustment via the Fn keys.

Suspend and resume work; hibernation hasn’t been tested yet.

Mini-DisplayPort connection to an external monitor via a VGA adapter works.

I’ll probably try running a newer kernel soon, which is supposed to have various fixes for this hardware, and update the post accordingly.

[Update: 2013-05-16 – today I’ve updated to the latest Fedora 18 kernel, 3.9.2, which should have better touchpad handling]

Misleading CUPS error on Ubuntu 12.04

Last week my new work laptop arrived. It’s a Dell XPS 13 “Developer Edition”, with a custom version of Ubuntu 12.04 LTS installed. In time I’ll install Fedora, which is my preferred Linux distribution. However, I thought I should give the pre-installed version a go, given it’s such a novelty (compared with the usual Windows festooned with various nonsense), and it’s not a good idea to mess around with things too much when you’re travelling.

This week I’m at CERN and need to connect to the local printers. Using the normal printing wizard, I kept receiving a “client-error-not-possible” message. Most other people with this error solved it by installing smbclient. However, that’s not relevant here, because the printer connection is via LPD, and in any case smbclient was already installed. In the end, I found that adding the printer via the CUPS web interface worked, and after that I could edit the printer settings in the wizard. A little mysterious, but that’s what you expect from CUPS, I suppose…
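For reference, an LPD printer can also be added from the command line with CUPS’s `lpadmin` tool, which avoids the wizard altogether. This untested Python sketch builds and runs that command; the printer name, host and queue used in the example are hypothetical, and `-P` is only included for when a PPD file is needed.

```python
import subprocess


def lpadmin_lpd_command(name, host, queue, ppd=None):
    """Build an lpadmin command that adds and enables an LPD queue.

    -p names the printer, -E (placed after -p) enables it and accepts
    jobs, -v sets the device URI and -P optionally supplies a PPD file.
    """
    cmd = ["lpadmin", "-p", name, "-E", "-v", f"lpd://{host}/{queue}"]
    if ppd is not None:
        cmd += ["-P", ppd]
    return cmd


def add_lpd_printer(name, host, queue, ppd=None):
    # Usually needs root or membership of the CUPS admin group.
    subprocess.run(lpadmin_lpd_command(name, host, queue, ppd), check=True)
```

For example, `lpadmin_lpd_command("office-printer", "printserver.example.org", "queue1")` gives `["lpadmin", "-p", "office-printer", "-E", "-v", "lpd://printserver.example.org/queue1"]`. The position of `-E` matters: placed before `-p`, `lpadmin` treats it as a request for encryption rather than as enabling the printer.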

DevOpsDays London 2013

On Friday and Saturday last week I attended DevOpsDays London 2013. Other people have blogged about the event, but I haven’t seen much coverage of the Open Spaces sessions, so here are my thoughts, concentrating on the parts that haven’t been covered elsewhere.

Sam Eaton’s talk was very popular. It came out during his talk on the Friday that he’d just left the job he was describing, so perhaps he was more frank than he would otherwise have been. He said he was wary of “owning” tools, because they end up “owning you”, creating “silos within yourself”. ActiveMQ was crucial to the approach he discussed: it had been installed for a specific reason and they then made much wider use of it. He claimed that it made the creation of bespoke tools easier, because you wouldn’t have to worry about communication. The common theme of asking for forgiveness being easier than asking for permission also featured, and he extended this to state that people should

deliver first, then evangelize

His slides are available here.

Gene Kim’s talk was aimed at helping people sell the DevOps approach. His idea of the

downward spiral of negative feedback

was quite powerful.

I liked the fact that Simon McCartney’s Ignite talk included a round-up at the end of tools he wished he’d known about before he started his “Stackkicker” project, because he would have adapted one of them instead of starting from scratch. A humble and practical admission. Here are his slides.

Daniel Pope gave an Ignite talk on Saturday following up on something he’d mentioned in a Friday Open Space session on storage that I’d attended. He uses honeyd and the Fake ARP Daemon to, in essence, create a fake internet for testing. He also referred to

Test-driven development of infrastructure

and unit testing of infrastructure.

Open Spaces

The Open Spaces format was new to me, and it worked rather well; the description in fact makes it sound more complicated than it really is. In the storage session people discussed the various approaches they had taken to building distributed storage, some of them using a product I’d never heard of before called MogileFS. People were rather wary of Gluster and even warier of Ceph – “I’ve heard the block storage is done, and the filesystem…isn’t” being one of the responses. The session on monitoring featured lots of references to logstash and Sensu, both of which I’d already been aware of and which now seem to have reached an inflection point in popularity.

The two Open Space sessions I attended on Saturday were on Clouds and Deployment.  Here are my bullet point notes from them:

  • Cloud experiences
    • Orchestration and automatic scaling is a problem
    • Need to get used mentally to killing your servers
    • Problem of cloud instance naming
      • Hard to know which machine is which
      • Solve this with tags
      • Or change the machine host name and put the instance id into the role eg with cloud-init
      • Also discovery-type pattern, eg using mcollective
      • Can use other inventories eg Chef
      • test driven infrastructure
        • Automatically test new instances that are part of a service, and kill them if they don’t respond correctly
    • How do you detect and deal with poorly performing nodes?
    • Interview question – what do you do if things are failing?
      • Rollback and rebuild
    • Kill a problem node first time
      • If it happens repeatedly, investigate
    • Riemann as a dashboard
    • Monitoring of cloud instances?
      • Much more dynamic than physical machine monitoring
      • Combination of mcollective and sensu
        • How to handle when instance ends?
        • Maybe a cron job on Sensu server?
        • Need to keep information about past machines in order to enable historical performance comparisons
        • Same host name may be reused with different sized instances
        • Custom tools needed for this at present
        • Need to tie machine’s details with monitoring output for this
        • Maybe keep all logs and process them afterwards
      • We don’t even know what the questions are re. Cloud, never mind how to solve them, compared with physical data centres
        • Difference between things staying mostly the same and things mostly changing
    • Handling of multiple regions?
      • Security groups don’t transfer automatically between AWS regions
      • VPC should help with this
  • Deployment
    • Prefer to have everything in packages, to be able to track dependencies and check integrity
      • Use mcollective to trigger updates
      • Pulp to manage repositories
      • Build with Jenkins
    • Build with Jenkins and deploy with it too
    • Use versioning in the package, to cope with different application versions
    • Advice to use multiple Jenkins machines for different purposes, rather than try to do everything on one machine
    • How do you know what’s been deployed where, when using Jenkins? (which isn’t primarily a deployment tool)
      • Use post-install scripts in RPMs to register in graphite
      • Use work flow management plugin in Jenkins
    • Push application artifacts into Nexus, as an alternative approach
    • Liquibase
    • Need to have configuration and binaries integrated ie in same Puppet module, to ensure they’re in synch
    • Want to have a local repository and local mirror of everything you deploy, because you can’t rely on Internet resources being there
    • Be careful using something like Maven, because it will use snapshots from the Internet by default, which hurts reliability and reproducibility
      • Therefore block Internet for such cases
      • Use a proxy?
    • How many developers know how to write spec files?
    • Keep environment configuration separate from application configuration
      • Same tags in version control
    • Restrict access to eg Puppet modules to certain developers
    • Use git tags to keep track of things
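The “test driven infrastructure” note and the “kill a problem node first time, investigate if it happens repeatedly” note combine naturally. The following Python sketch is one hypothetical way of doing that: it probes each instance with a plain TCP connection (a real check would probably exercise the service itself), returns failing instances for termination, and counts failures per role rather than per instance, since a replacement instance gets a new ID. The instance IDs, roles and threshold are illustrative, and actually terminating instances is left to whichever cloud API is in use.

```python
import socket
from collections import Counter


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def triage(instances, history, probe=tcp_probe, repeat_threshold=2):
    """Decide which instances to kill and which roles need investigating.

    instances: mapping of instance id -> (role, host, port)
    history:   Counter of past failures per role, updated in place
    Returns (ids_to_kill, roles_to_investigate).
    """
    to_kill, to_investigate = [], set()
    for instance_id, (role, host, port) in instances.items():
        if probe(host, port):
            continue
        to_kill.append(instance_id)   # first response: just kill it
        history[role] += 1
        if history[role] >= repeat_threshold:
            to_investigate.add(role)  # keeps happening: look deeper
    return to_kill, sorted(to_investigate)
```

Keeping the failure history outside the function also fits the point about retaining information on past machines: the counts survive after the instances themselves have gone.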

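The note about using post-install scripts in RPMs to register in graphite could look something like this: a small helper that an RPM `%post` scriptlet might call, sending a marker to carbon’s plaintext listener (port 2003 by default). The metric naming scheme and the `graphite.example.org` server are hypothetical; the plaintext format itself is `metric.path value timestamp` followed by a newline.

```python
import socket
import time


def _clean(component):
    # Graphite treats dots as path separators, and spaces would break the line.
    return component.replace(".", "_").replace(" ", "_")


def deployment_metric(host, package, timestamp=None):
    """Format a Graphite plaintext line recording that package hit host.

    The value 1 simply marks that a deployment happened at that time.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return f"deployments.{_clean(host)}.{_clean(package)} 1 {timestamp}\n"


def send_to_graphite(line, server="graphite.example.org", port=2003):
    # Hypothetical server name; 2003 is carbon's default plaintext port.
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Plotting `deployments.*.*` with Graphite’s `drawAsInfinite()` function then shows each deployment as a vertical line, which makes it easy to line deployments up against other metrics and answers the “what’s been deployed where” question at least roughly.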


Interleaving bugs

It’s useful to know the component parts of a problem.  Not just their nature, but their number.  If there’s a component you don’t know about, you may spend excessive time on fruitless attempts to solve other parts that in reality rely upon the undiscovered aspect.

Last week I experienced this while working on an OpenStack setup. Instances were to be launched by a remote glideinWMS server, using HTCondor’s EC2 interface. When invoked manually through a simple job description, instances did start on the cloud controller node. When invoked by glideinWMS, they failed. The logs at the remote end showed that the problem was an HTTP 414 return code, indicating that the request URI was too long, but nothing was logged at the OpenStack end specifically relating to the launching of instances (there was information showing that other requests from glideinWMS were succeeding, so it wasn’t a simple connectivity problem).

At first I thought it might be a quota problem. Increasing the defaults had no effect, so it wasn’t that. What was more puzzling was the complete lack of any local trace of the launch request. Eventually I found this OpenStack bug, which looked like it might explain that. Talking it over with two of the incredibly helpful glideinWMS developers, I found some WSGI code with an internally specified limit on the length of incoming requests. It turned out this code was included in OpenStack and, when I increased the limit and restarted the Nova API service, the launch requests coming from the remote site started working immediately. The bug was reported and a fix proposed (again, by a glideinWMS developer).
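The frustrating part was that the request was rejected before it reached anything that would log it. The following Python sketch shows the same kind of limit written as WSGI middleware, but with a log line for every rejection; it is illustrative rather than the code that was actually involved, and the `URILengthGuard` name and 8192-character default are assumptions. A limit enforced by the server itself while it parses the request line kicks in before any middleware runs, which would explain why nothing showed up in the OpenStack logs.

```python
import logging

log = logging.getLogger("uri-length-guard")


class URILengthGuard:
    """WSGI middleware that rejects over-long request URIs with HTTP 414.

    Every rejection is logged, so the server side leaves a trace when a
    client is turned away.
    """

    def __init__(self, app, max_length=8192):
        self.app = app
        self.max_length = max_length

    def __call__(self, environ, start_response):
        uri = environ.get("PATH_INFO", "")
        if environ.get("QUERY_STRING"):
            uri += "?" + environ["QUERY_STRING"]
        if len(uri) > self.max_length:
            log.warning("rejected %d-character URI from %s",
                        len(uri), environ.get("REMOTE_ADDR", "unknown"))
            start_response("414 Request-URI Too Long",
                           [("Content-Type", "text/plain")])
            return [b"Request-URI too long\n"]
        return self.app(environ, start_response)
```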

The whole process was very frustrating until it clicked that I was dealing with two problems and that one was unhelpfully obscuring evidence needed to help solve the other.