Docker tips for Java developers

“latest” comes with no guarantees

Unlike the SNAPSHOT version concept of the ubiquitous Apache Maven dependency manager, the so-called “latest” tag in Docker makes no effort to ensure that a recently published artifact/image is used. Thus:

$ docker run myrepo.com/myimage:latest

will only pull the latest image from the repository the first time it is run. On subsequent executions Docker will simply reuse the myimage:latest image that was pulled into the local repository. An explicit:

$ docker pull myrepo.com/myimage:latest

is required to get the latest image.

Docker Compose up is actually quite smart

Re-running the up command on an already “up” project will check for any deviation between the running services and the configuration in the docker-compose.yml file. A delta with respect to e.g. an environment variable value will automatically be reconciled by Docker Compose: any outdated container(s) are stopped and removed, then relaunched with the updated configuration. This can save serious time compared to the more brute-force method of tearing down and starting up all containers in the project after every docker-compose.yml change.
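
For example, after changing an environment variable of a hypothetical web service in docker-compose.yml, simply re-running up in detached mode should make Compose recreate just that container while leaving unchanged services alone:

$ docker-compose up -d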

Docker volumes require explicit cleanup (for now)

Removing volumes associated with containers requires an extra parameter:

$ docker rm --volumes mycontainer

If --volumes is left out any associated volumes will remain forever, hogging disk space.

The Docker volume subsystem appears set for a significant rework in the upcoming Docker 1.9 release, with new features such as listing and removing orphaned volumes.
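
Until then, the kind of workflow the reworked subsystem is slated to enable looks something like the following (the exact flags are tentative until 1.9 ships; myvolume is a placeholder name):

$ docker volume ls -f dangling=true
$ docker volume rm myvolume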

Configuration management on the Raspberry Pi

Raspberry Pi computers running the Raspbian flavour of the venerable Debian Linux distribution make excellent headless home servers. Inexpensive, small, quiet, reliable, power-efficient and powerful enough to be employed as NAS appliances, BitTorrent boxes, AirPlay receivers, blog hosts, beer fermentation controllers, etc. Setting up a Raspberry Pi for these applications is seldom as trivial as a:

$ apt-get install something

More typical are laborious sessions at the console: cloning Git repos, running DDL scripts and tweaking cron tables, as directed by long step-by-step tutorials in blog posts.

If one only had to endure this once and then be set for the foreseeable future, things might not be too bad.

However, as stable as the Pi hardware is, it does have one Achilles’ heel: persistent storage. The SDHC cards used for holding the root file system are infamous for not holding up very well in the long term (perhaps not too unexpected considering the much less demanding camera applications SD cards are typically designed for). Eventual file system corruption appears to be almost unavoidable, as the plethora of online problem reports bears witness to [1], [2].

This problem leads to frequent (re)imaging of cards. If one wasn’t foresighted enough to keep a full backup image around, one has to start from scratch with an install image, toiling away at the step-by-step guides again.

Enter modern configuration automation tools. The preferred silver bullet of the fledgling devops movement promises big savings in the data center by automating all kinds of system administration tasks on hordes of virtual machines in private or public clouds.

Recently I’ve experimented with applying these tools on a different kind of platform: the humble Raspberry Pi.

The tools provide ways to specify the desired configuration of the nodes under control in various domain-specific languages. These languages typically allow for powerful encapsulation and modularization abstractions, thus managing complexity and promoting reusability. The tools offer several important benefits compared to manual or semi-manual configuration:

  • Large libraries of ready-to-use modules for tasks such as controlling databases, installing packages, etc.
  • Convenient mass application of specifications on multiple nodes
  • Idempotency. Specifications can be re-run repeatedly, ensuring that controlled nodes are up to date with the latest additions.
  • More friendly syntax than bash.

I’ve taken a closer look at Puppet, Pallet and Ansible.

Puppet is a well-known Ruby-based tool that has been around for a while. The Puppet ecosystem is huge and there is a great assortment of third-party modules ready to use for almost any task. I’ve used Puppet to implement a manifest that installs the Tiny Tiny RSS reader on PostgreSQL, with data backups, feed updates, etc.
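
Since no Puppet master is called for on a single Pi, the manifest can be applied locally on the node itself. A minimal sketch, with tt-rss.pp as a hypothetical file name for the manifest described above:

$ sudo puppet apply tt-rss.pp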

Pallet is a newer Clojure-based option. Of the trio it is clearly the one that currently enjoys the least community traction, and the available documentation is sparse and disorganized. Even though I’m attracted to Clojure as a language, these factors caused me to abandon Pallet for now.

Both Puppet and Pallet are heavily geared toward operation in large data centers, offering features to control and organize hundreds of nodes. Puppet in general also relies on central master nodes from which agents on controlled nodes periodically pull new configuration specs. This is a huge impedance mismatch when one only wants to control a couple of Raspberry Pi servers in a living room.

Ansible is a relatively new Python-based tool, hitting 1.0 as recently as February 2013. Compared to Puppet there is less documentation available and the selection of third-party modules is still a little behind. However, Ansible is under heavy development, with features and new modules added at a furious pace; the latest release is 1.3 as of August 2013.

Ansible is by design push-based and agentless, requiring no running daemon on controlled hosts. Since Ansible uses standard ssh as the transport to controlled hosts, it is perfectly possible to control hosts without installing any additional software on them at all. The apt module, however, requires python-apt and aptitude to be installed (easily achieved remotely with the Ansible shell module).

This means that it’s possible to assume control of a freshly imaged Pi without ever manually logging in!
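
As a hedged sketch of such a bootstrap, something like the following ad-hoc command should install the apt module’s prerequisites on freshly imaged Pis (the pis group and hosts inventory file are made-up names; -k asks for the ssh password of the default pi user and --sudo escalates privileges):

$ ansible pis -i hosts -m shell -a "apt-get -y install python-apt aptitude" -u pi -k --sudo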

Ansible is remarkably quick and executes a typical playbook much faster than a corresponding Puppet manifest on the modest Pi hardware. Ansible comes with a great deal of instant gratification, the learning curve being smoother than that of Puppet, much of which can probably be attributed to a simpler specification language. Ansible playbooks are YAML; Puppet manifests use a Ruby-based DSL. YAML is definitely less ceremony, and more cool.

Puppet manifests are declarative in nature: one defines the desired actions to be taken and the relationships between them (like before/after), leaving Puppet to figure out the correct execution sequence, while Ansible’s style is more imperative. Puppet’s style of specification may offer better optimization possibilities on the tool level, and perhaps specs that are easier to maintain in the long run, but so far I’m very happy with Ansible’s simpler approach.
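
Applying a playbook is a single push-style command, and thanks to the idempotency mentioned earlier a second run against already converged Pis should report no changed tasks. A sketch, where site.yml and hosts are hypothetical file names:

$ ansible-playbook -i hosts site.yml -u pi -k --sudo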

The conclusion is simple: to control a couple of Pis, go for Ansible!

Mercurial tooling reaching the tipping point

The MercurialEclipse context menu certainly looks comprehensive…

A tipping point can be defined as the level at which the momentum for change becomes unstoppable.

I’d venture to state that the distributed version control system Mercurial reached its tipping point when version 1.6.0 of MercurialEclipse was recently released. At least for me. 🙂 Over the last few years I’ve been consistently and repeatedly underwhelmed by the state of Mercurial tooling in my favoured integrated development environment, resorting to using hg at the command line exclusively. Command line interfaces are good for many things, but it’s hard to let go of the comforts of a good GUI when one is used to the brilliant Subclipse Subversion plugin.

MercurialEclipse is a quantum leap. I sincerely hope the availability of first class Eclipse support for Mercurial will be the tipping point for distributed version control systems in general and Mercurial in particular!

4 cool things you can do with a distributed version control system

With the Git craze currently flourishing it’s hard to see that there’s competition in the distributed version control system (DVCS) space. In fact there is a strong contender that I’ve had the privilege of getting to know more intimately during the last couple of months. Mercurial was born around the same time as Git, in 2005, when an open and free distributed version control system for maintaining the Linux kernel was urgently needed. It was finally Git that got the honor, but that doesn’t mean Mercurial hasn’t been entrusted with any projects of significance. Among the Mercurial users today are OpenJDK, NetBeans and Mozilla. Open source project hosting provider Google Code adopted Mercurial rather than Git after careful analysis.

I’m a bit of a “late adopter” when it comes to DVCSs, but in the unlikely case there are even later adopters, here’s a brief summary of the whole “distributed” thing: version control systems are something all developers are (or should be?) very familiar with. The most common ones, such as old rusty CVS and the more modern Subversion, are based on the client-server model: developers work on a working copy of a project stored in a central repository on a server. The working copy is updated from the remote repository and the fruits of the developer’s labor are committed the other way around. The central repo stores all revisions of the project ever committed, including commit comments, tags and branches. The working copy is essentially little more than a particular revision from the central repo plus the developer’s outstanding uncommitted changes against this revision.

Enter distributed version control systems. The idea is that developers no longer work with plain working copies of projects; instead they work with full repositories. These distributed repositories are no longer heavyweight, singular entities along the lines of a Subversion server; rather they are small, manageable things that live beside the working copy in the file system (in the case of Mercurial, in the .hg subdirectory of the working copy folder). The repo can easily be zipped up and mailed somewhere. DVCSs, however, provide native functionality for cloning repos between different computers (over a network) or just between different file paths on the same machine. Developers still commit and update the working copy as usual, but the counterpart is now the private local repo. Changes to the repo are pushed to other repos (again, over the network or on the same machine). The other way around is known as a pull. There is no pre-determined repository topology; developers are free to push and pull between different repos in any direction, at any time.

A distributed version control system is a generalization of your typical vanilla client-server version control system. You could use a DVCS in a completely centralized way, always pushing changes from the local repo to a central repo after each commit. A distributed version control system, however, lends itself to a distributed development process well suited for open source projects, and provides some other cool possibilities:

1. Carry around entire projects, including the complete commit history, tags and branches, on a flash stick. Ideal in academic/corporate environments where restrictive firewalls make it problematic to connect to remote repositories.
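
Cloning onto a mounted stick is a one-liner (the paths are of course made up):

~# hg clone ~/myproject /media/stick/myproject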

2. Work on the same project with two different IDEs. Having two of the Java heavyweights (Eclipse, NetBeans or IntelliJ) share a project folder is not something I’d recommend. A better approach is to clone separate repos for each IDE and push/pull deltas between them to your heart’s content. I’ve used this approach with Java Swing apps: utilizing the excellent GUI builder Matisse in NetBeans to lay out the screens while doing the rest of the coding in my preferred IDE, Eclipse.
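
In practice this is just two sibling clones; after committing in one of them, the other catches up with a pull (the directory names are hypothetical):

~# hg clone myproject-eclipse myproject-netbeans
~# cd myproject-netbeans
~/myproject-netbeans# hg pull -u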

3. Speed up deployments. Modern web application frameworks tend to have many dependencies and thus produce voluminous deployment artifacts. On a typical ADSL hookup a 50 MB WAR (a realistic figure for a Grails app that uses a couple of plugins) could take several minutes to upload to a deployment server. A better approach is to clone a repo on the deployment server(s) and simply push the delta since the previous deploy. The WAR can then be built on the server faster than it would have taken to build it locally and upload it.
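
A sketch of such a deploy, assuming Mercurial and the Grails command line are available on the server, and with the host name and paths made up:

~/myapp# hg push ssh://deploy@server//srv/myapp
~/myapp# ssh deploy@server 'cd /srv/myapp && hg update && grails war'

Note that a push only updates the server repo’s history; the explicit hg update is what brings the server’s working copy up to date before the build.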

4. Add version control to anything, anywhere, with ease. Transforming an existing directory structure into the working copy of a full-blown pushable, taggable, revertible and diffable repository is trivial and unobtrusive. This can be immensely powerful.

Want to start tracking the system configuration of a Unix system? Simply:

~# cd /etc
/etc# hg init
/etc# hg addremove
/etc# hg commit -m "Initial system configuration"

Changed your mind and want to revert to the previous non-version-controlled state and remove all traces of the version control?

/etc# rm -rf .hg/

The 10 minute guide to distributed version control with Mercurial

Mercurial is a piece of distributed version control (DVCS) software that is remarkably easy to work with from the command line¹. Users of the grand-daddy of version control software, CVS, or its follow-up, Subversion, will be pleased to notice that the Mercurial developers have gone to great lengths to provide a similar syntax. Where Mercurial deviates it’s mainly due to the nature of DVCS, which brings with it a couple of concepts not used in traditional client/server version control systems.

This guide is designed to give a DVCS novice a feel for distributed version control in general and Mercurial in particular in under 10 minutes. Unix and client/server version control familiarity is assumed.

The Mercurial executable is named after the chemical symbol of the element mercury, Hg. Photo credit: Bionerd

Start by installing Mercurial:
Visit the Mercurial download page and grab an installation package of your choice.
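
Mercurial refuses to commit until it knows who you are, so on a fresh install it is worth adding a username to your ~/.hgrc up front (the name and address are of course made up):

[ui]
username = Jane Doe <jane@example.com>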

Create a repository:
~# mkdir repo1

~# cd repo1
~/repo1# hg init

Add a file to the repo:
~/repo1# touch file.txt
~/repo1# hg add file.txt

Commit the file to the repo:
~/repo1# hg commit -m "Add file.txt"

Clone the repo:
~/repo1# cd ~
~# hg clone repo1 repo2

Make a change in the working copy of the cloned repo and commit it:
~# cd ~/repo2
~/repo2# echo "DOH" > file.txt
~/repo2# hg commit -m "Change file.txt"

Push the change to repo1:
~/repo2# hg push

Update the working copy of repo1 with the pushed change:
~/repo2# cd ~/repo1
~/repo1# hg update

Let’s try pulling changes from repo1 to repo2:
~/repo1# echo "DOH2" >> file.txt
~/repo1# hg commit -m "Append DOH2"
~/repo1# cd ~/repo2
~/repo2# hg pull
~/repo2# hg update
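
As a shortcut, the last two steps can be combined; hg pull -u pulls and updates in one go:

~/repo2# hg pull -u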

Note that good help is never far away:
~# hg help
~# hg help pull

For more on what you can do with Mercurial, read up on 4 cool things you can do with a distributed version control system.

¹) If you are so inclined, there are good graphical Mercurial interfaces such as TortoiseHg available.