When working with Maven and continuous integration practices, the snapshot repository can grow in size at an alarming rate. Typically each commit to version control will trigger a build, test execution and (if the two previous steps succeed) a deployment of the project artifact to the snapshot repo. Pretty soon, particularly if a large multi-module project is the order of the day, the build server's hard drive will be full of old snapshot releases of interest only to software archaeologists.
As I understand it, Maven repository managers such as Sonatype's Nexus allow you to specify retention policies for outdated snapshot releases, but I unfortunately have no experience with such beasts. For simpler scenarios where Maven is configured to dump snapshots directly to the file system, I've prepared a small shell script in Groovy which produces a list of the absolute paths to repo files that can be safely nuked. Set up your favorite build server (e.g. Hudson) to execute:
snapshotCleaner.groovy | xargs rm
weekly and you will never be troubled by running out of disk space again! (You may want to do a trial run without the rm part before trusting Hudson with erasing the files for you on autopilot, though. :-))
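For such a trial run, something along these lines should do (the file name is just an example); it redirects the candidate paths to a file you can eyeball before handing them to xargs rm:

snapshotCleaner.groovy > deletion-candidates.txt

The size summary goes to stderr, so the file will contain nothing but the paths the script considers safe to delete.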
The script uses a couple of Groovy niceties, including (obviously) the ability to execute Groovy as a Unix shell script, the GDK java.io.File extension eachFileRecurse and the regex find operator =~ in a boolean context. Here's the snapshot cleaner source:
#!/usr/bin/env /opt/groovy/bin/groovy

def snapshotRepoPath = '/depot/maven_repo/snapshot-repository'
long size = 0

// Walk the repo and print every snapshot artifact that does not belong to the
// latest timestamped build recorded in its maven-metadata.xml.
new File(snapshotRepoPath).eachFileRecurse { File f ->
    try {
        if (isPartOfSnapshotRelease(f)) {
            if (!(f.getName() =~ getLatestDatePatternFromMavenMetaData(f))) {
                println "$f"
                size += f.size()
            }
        }
    } catch (Exception e) {
        System.err.print("For $f received exception $e")
    }
}
System.err.println " Total disk space consumed by returned files ${size / (1024 * 1024)} MB."

boolean isPartOfSnapshotRelease(File snapshotCandidateFile) {
    if (snapshotCandidateFile.isDirectory()) {
        return false
    }
    if (snapshotCandidateFile =~ /maven-metadata/) {
        return false
    }
    // A file in a leaf directory (no subdirectories alongside it) is part of a snapshot build.
    boolean hasSubDirInParentDir = false
    snapshotCandidateFile.getParentFile().eachFile { File f ->
        if (f.isDirectory()) hasSubDirInParentDir = true
    }
    return !hasSubDirInParentDir
}

String getLatestDatePatternFromMavenMetaData(File file) {
    def x = new XmlSlurper().parse(new File(file.getParent() + File.separator + "maven-metadata.xml"))
    "${x.versioning.snapshot.timestamp}.${x.versioning.snapshot.buildNumber}"
}
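For reference, here is a minimal sketch of the maven-metadata.xml shape that getLatestDatePatternFromMavenMetaData relies on (the coordinates, timestamp and build number below are made up), together with what the GPath expression extracts from it:

def metadata = '''<metadata>
  <groupId>com.example</groupId>
  <artifactId>my-module</artifactId>
  <version>1.0-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <timestamp>20090412.153212</timestamp>
      <buildNumber>7</buildNumber>
    </snapshot>
  </versioning>
</metadata>'''

def x = new XmlSlurper().parseText(metadata)
// Evaluates to "20090412.153212.7" -- the pattern carried in the file names of the
// latest build (e.g. my-module-1.0-20090412.153212-7.jar), so only those files
// escape the deletion list.
assert "${x.versioning.snapshot.timestamp}.${x.versioning.snapshot.buildNumber}" == '20090412.153212.7'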
We just use a small shell script:
find /devserver/maven/repos -type f -mtime +30 | egrep '.*[0-9]{8,8}\.[0-9]{6,6}.*' | xargs -n 1 rm -v
and this works very well, keeping only SNAPSHOT files that are no older than 30 days. 😉
Nice one there, Alexander! Like the briefness! 🙂
The Groovy script actually deletes *all* SNAPSHOTs that are not of the latest build number; do you know if there is any advantage to keeping SNAPSHOTs around which are not “the latest”?
Yes. I know. 😉
Generally we develop very complicated multi-module projects/applications, so we are forced to keep SNAPSHOTs for our modules/libraries because they are used by other modules/applications. If we removed SNAPSHOTs completely, we simply would not be able to build the modules that depend on other SNAPSHOT versions.
BTW: this script does not kill the SNAPSHOT itself, only the versions numbered by build date/time that are created alongside the SNAPSHOT during every build.
It has worked for us for about 5 or 6 years, ever since we first ran into the free-space problem on our continuous integration server back when we were still using Maven 1.
So I can say that this very short but very powerful solution has completely satisfied our goals for a long time, so it will probably satisfy your goals as well. 😉
Thanks for the script. I actually used your idea and did it in Perl. A bit longer, but I don’t have to start a JRE on my tiny Virtual Machine.