Using makefiles in bioinformatics pipelines

Recently I have been involved in many projects that require parsing text files, and I use small scripts to achieve this. However, the scripts also need to fit into pipelines, and I have found makefiles very handy for that.

Makefiles have long been used in software development to define a set of rules for building programs automatically; they date back to the late 1970s. The make command reads a makefile (named makefile or Makefile), executes the commands in it and produces the desired build. Because make can also invoke scripts, it is a very handy utility in bioinformatics as well.

A makefile consists of multiple rules. Each rule defines which components are needed to create a target and which commands create it. A rule can be compared to a recipe in a cookbook: the components are the ingredients, and the commands are the cooking steps that “compile” the ingredients into the food on the table.

fried_chicken: raw_chicken oil

Formally speaking:

target1 [target2 ...]: [component1 component2 ...]
	[<TAB>command 1]
	[<TAB>command 2]

The targets are defined on the left side, and the components they need (make's own documentation calls them prerequisites) are listed on the right side, just after the colon. Components are optional: a rule that creates a file by just “touching it”, for example, needs none. The commands that create the targets follow on the next lines, each starting with a tab character. By default make executes these commands with the standard Unix shell, /bin/sh, so commands such as cat, cp and rm can be invoked.
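
As a minimal sketch of the syntax above, here are two small rules. The file names (run.log, sequences.fasta, seq_count.txt) are hypothetical and only serve as an illustration; the lines holding the commands must begin with a real tab character.

```make
# A rule without components: the file is created by simply touching it.
run.log:
	touch run.log

# A rule with one component: count the header lines of a FASTA file.
# sequences.fasta is an assumed input file that must already exist.
seq_count.txt: sequences.fasta
	grep -c '^>' sequences.fasta > seq_count.txt
```

With this makefile, make seq_count.txt should run the grep command, while plain make builds the first target in the file (here run.log).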

Another nice feature of make is that a target can itself be a component of another target. For example:

dinner: fried_chicken baked_potatoes

fried_chicken: ...

baked_potatoes: ...

Here, if we issue make dinner, make will check whether fried_chicken and baked_potatoes exist and are up to date; if not, it will run their rules first.
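
The same idea could look like this in a small text-processing pipeline. This is only a sketch with hypothetical file names: summary.txt plays the role of dinner, and its two components are produced by their own rules.

```make
# summary.txt is the final target; both of its components are targets too.
summary.txt: seq_count.txt headers.txt
	cat seq_count.txt headers.txt > summary.txt

# Number of sequences in the (assumed) input file sequences.fasta.
seq_count.txt: sequences.fasta
	grep -c '^>' sequences.fasta > seq_count.txt

# The header lines themselves.
headers.txt: sequences.fasta
	grep '^>' sequences.fasta > headers.txt
```

Issuing make summary.txt should first create seq_count.txt and headers.txt if they are missing, and only then run the cat command.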

There are two common targets: all and clean. By convention, programmers define the all target to build every other target, while clean launches an rm command to clean up the build environment.
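
A possible layout for these two targets is sketched below. The file names are hypothetical, and the .PHONY line is an addition not discussed in this article: in GNU make it declares that all and clean are not real files, so the rules still run even if a file with one of those names happens to exist.

```make
# Assumption: GNU make, which understands the special .PHONY target.
.PHONY: all clean

# all simply lists every final product as a component.
all: seq_count.txt

# sequences.fasta is an assumed input file.
seq_count.txt: sequences.fasta
	grep -c '^>' sequences.fasta > seq_count.txt

# clean removes everything the other rules generate.
# -f keeps rm quiet when a file does not exist.
clean:
	rm -f seq_count.txt
```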

Makefiles have another advantage. Let us assume that we have an all target and that one of its components has been updated (e.g. a newer source file has been downloaded from the internet, so it has a newer timestamp). When we issue make all again, make will discover that the component is newer than the targets built from it and will rerun every rule that lists that component, along with any rules further down the chain. This feature makes it possible to build pipelines.
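
The timestamp check can be tried out with a two-step sketch like the one below. The file names are hypothetical, and raw.txt is assumed to exist already.

```make
all: result.txt

# Step 1: drop comment lines from the raw input.
filtered.txt: raw.txt
	grep -v '^#' raw.txt > filtered.txt

# Step 2: sort the filtered lines.
result.txt: filtered.txt
	sort filtered.txt > result.txt

# First "make all":    both steps run.
# Second "make all":   nothing runs, every target is newer than its components.
# After "touch raw.txt" (or a fresh download of raw.txt), "make all" runs
# step 1 again because raw.txt is newer than filtered.txt, and then step 2
# because filtered.txt is now newer than result.txt.
```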

Before executing make, we may be interested in what it is going to do.

make -n target1 target2 ...

Calling make with -n will show which commands would be issued in a real run, without executing them.

I hope this article has given a brief introduction to make. Here are a few links about make that I found useful:

Advanced Makefile Tricks – describes how to use special macros. These are very useful, e.g. for passing components as arguments to the commands, piping output to the target etc.

Make (software) – the Wikipedia entry about make, which describes its history and shows some examples.
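
As a small taste of the special macros mentioned in the first link, here is a sketch that assumes GNU make and hypothetical file names. $@ stands for the target, $< for the first component and $^ for all components, so the file names do not have to be repeated in the commands.

```make
# $^ expands to "seq_count.txt headers.txt", $@ to "summary.txt".
summary.txt: seq_count.txt headers.txt
	cat $^ > $@

# $< expands to "sequences.fasta" (an assumed input file),
# $@ to the target of the rule.
seq_count.txt: sequences.fasta
	grep -c '^>' $< > $@

headers.txt: sequences.fasta
	grep '^>' $< > $@
```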