Maven repo full? Groovy comes to the rescue!

When working with Maven and continuous integration, the Maven snapshot repository can grow at an alarming rate. Typically each commit to version control triggers a build, a test run and (if the two previous steps succeed) a deployment of the project artifact to the snapshot repo. Pretty soon, particularly if a large multi-module project is the order of the day, the hard drive of the build server will be full of old snapshot releases of interest only to software archaeologists.

As I understand it, Maven repository managers such as Sonatype’s Nexus allow you to specify policies for the retention of outdated snapshot releases, but unfortunately I’ve no experience with such beasts. For simpler scenarios where Maven is configured to dump snapshots directly to the file system, I’ve prepared a small shell script in Groovy which produces a list of the absolute paths of the repo files that can be safely nuked. Set up your favorite build server (e.g. Hudson) to execute:

snapshotCleaner.groovy | xargs rm

weekly and you will never be troubled by running out of disk space again! (You may want to do a trial run without the rm part before trusting Hudson with erasing the files for you on autopilot, though. :-))

The script uses a couple of Groovy niceties including (obviously) the ability to execute Groovy as a Unix shell script, the GDK java.io.File extension eachFileRecurse and the regex find operator =~ in a boolean context. Here’s the snapshot cleaner source:

#!/usr/bin/env /opt/groovy/bin/groovy

def snapshotRepoPath = '/depot/maven_repo/snapshot-repository'

long size = 0

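new File(snapshotRepoPath).eachFileRecurse { File f ->
  try {
    if (isPartOfSnapshotRelease(f)) {
      // Anything whose name does not contain the latest timestamp/build number is outdated
      if (!(f.getName() =~ getLatestDatePatternFromMavenMetaData(f))) {
        println "$f"
        size += f.size()
      }
    }
  } catch (Exception e) {
    // Report on stderr so the message never ends up in the xargs pipe
    System.err.println("For $f received exception $e")
  }
}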

System.err.println "  Total disk space consumed by returned files ${size/(1024*1024)} MB."

def boolean isPartOfSnapshotRelease(File snapshotCandidateFile) {
  if (snapshotCandidateFile.isDirectory()) {
    return false;
  }

  if (snapshotCandidateFile =~ /maven-metadata/) {
    return false
  }

  boolean hasSubDirInParentDir = false
  snapshotCandidateFile.getParentFile().eachFile {File f ->
    if (f.isDirectory())
      hasSubDirInParentDir = true;
  }

  return !hasSubDirInParentDir
}

def String getLatestDatePatternFromMavenMetaData(File file) {
  def x = new XmlSlurper().parse(new File(file.getParent() + File.separator + "maven-metadata.xml"))

  "${x.versioning.snapshot.timestamp}.${x.versioning.snapshot.buildNumber}"
}
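
To see why the =~ test in the main loop picks out the outdated files, here is a minimal sketch that runs the same two steps on made-up data. The inline XML and the file names are hypothetical and only shaped like what Maven typically writes. Note that the pattern joins timestamp and build number with a dot, which as a regex matches any character, so it also matches the dash Maven puts between the two in the file name. On Groovy 3 and later XmlSlurper lives in groovy.xml; on older versions the import below can be dropped.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; older versions resolve XmlSlurper without an import

// Hypothetical metadata, shaped like the maven-metadata.xml the script reads
def metadataXml = '''
<metadata>
  <versioning>
    <snapshot>
      <timestamp>20100315.101112</timestamp>
      <buildNumber>7</buildNumber>
    </snapshot>
  </versioning>
</metadata>
'''

// Same GPath expression as in getLatestDatePatternFromMavenMetaData
def x = new XmlSlurper().parseText(metadataXml)
def latest = "${x.versioning.snapshot.timestamp}.${x.versioning.snapshot.buildNumber}"

// Made-up file names for one current and one outdated snapshot
def names = ['demo-1.0-20100315.101112-7.jar',
             'demo-1.0-20100301.090000-3.jar']

// A Matcher used in a boolean context is true when the pattern is found
def outdated = names.findAll { !(it =~ latest) }
println outdated
```

Only the second file name survives the filter, which is the kind of file the cleaner would print for removal.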