The convenience of Makefile in the era of Python and AI
GNU Make: A forgotten friend
No one has uttered the word "Makefile" in my vicinity in the last 10 years of AI research spanning numerous projects and codebases, all written primarily in Python. And why would they? After all, Make is popularly used as a build automation tool for compiled languages like C. In an interpreted language like Python, there is nothing to build. There are beautiful libraries like python-fire, click, or argparse to create easy-to-use command-line interfaces (CLIs) and you can directly run your Python scripts passing in your arguments. But with all its apparent simplicity and beauty, one question has been nagging me for a while: how do I manage the many commands, each with their own set of arguments, without having to retype them every time?
Here's a specific situation I found myself in recently. While debugging a codebase, I had to run a series of Python commands repeatedly. Sometimes I would want to run them all in a specific order, and sometimes I would want to run and test just one of them. Some of the arguments were file paths that depended on the outputs of previous commands. Cycling between these commands quickly became annoying.
Now you might ask, "Why didn't you just write a bash script?"
I could have, but writing a single bash script that supports running one command as well as a sequence of commands would have required implementing a non-trivial amount of logic... in bash. Ugh!
But surely, there must be a simpler, more elegant way to achieve this kind of orchestration? And that's when the lord's light shone upon me through an angel of tokens (gpt-5.2) and I found the profound answer laid bare before me: GNU Make.
When to use Make and how?
In this post, I want to convince you to welcome Make back into your lives with a few concrete use cases. Each use case illustrates a situation where you might reach for Make and demonstrates how convenient it is.
1. You want a named set of standard runs
You often have a handful of “known good” command variants. Instead of keeping them in a notes app, encode them as targets.
Makefile

```make
baseline:
	python -m train --lr 1e-4

lower_lr:
	python -m train --lr 1e-5

warmup:
	python -m train --lr 1e-4 --warmup_steps 1000
```
Now you can run these with the following shorthands instead of typing out the full command each time:

```bash
make baseline
make lower_lr
make warmup
```
This is low-tech, but the ergonomics are great: every variant is named, versioned, reviewable, and lives next to the code. While collaborating, this reduces “which exact flags did you use?” ambiguity in conversations and PRs.
2. You want reasonable defaults you can override per run
Make variables provide a nice pattern: set defaults in one place, override them at invocation time.
Makefile

```make
lr ?= 1e-4
warmup_steps ?= 1000

train:
	python -m train --lr $(lr) --warmup_steps $(warmup_steps)
```
If you want to keep the defaults:

```bash
make train
```

If you want to override them:

```bash
make train lr=3e-5 warmup_steps=500
```
This feels much more natural than bash, where you would have to write a tiny command-line argument parser yourself:
script.sh

```bash
#!/usr/bin/env bash
lr="1e-4"
warmup_steps="1000"

# a tiny argument parser that iterates over arguments and sets variables
while [[ $# -gt 0 ]]; do
  case "$1" in
    --lr) lr="$2"; shift 2;;
    --warmup_steps) warmup_steps="$2"; shift 2;;
    *) echo "Unknown arg: $1" >&2; exit 1;;
  esac
done

python -m train --lr "$lr" --warmup_steps "$warmup_steps"
```
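Incidentally, the ?= operator used in the Make version assigns a default only when the variable has no value yet, which means environment variables override it too. A minimal sketch:

```make
# ?= assigns only when the variable is not already set, so both
# `make train lr=3e-5` and `lr=3e-5 make train` override the default
lr ?= 1e-4

.PHONY: train
train:
	@echo "training with lr=$(lr)"
```

Running `lr=3e-5 make train` prints `training with lr=3e-5`, while a plain `make train` falls back to the default.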
3. You want to orchestrate a pipeline
Consider a common ML workflow:
- download data to a {outdir}/downloads directory
- preprocess the downloaded data and save it to a {outdir}/processed directory
- train on this preprocessed data and save checkpoints to a {outdir}/ckpts directory
This is perfect for Make. Not only do you get named steps and overridable variables, but it is easy to see the path dependencies in the arguments, and the workflow reads like a story.
Makefile

```make
outdir ?= path/to/output
download_dir ?= $(outdir)/downloads
preprocess_path ?= $(outdir)/processed
ckpt_path ?= $(outdir)/ckpts

.PHONY: download preprocess train all

download:
	python -m download --outdir $(download_dir)

preprocess:
	python -m preprocess --input_dir $(download_dir) --output_dir $(preprocess_path)

train:
	python -m train --input_dir $(preprocess_path) --ckpt_dir $(ckpt_path)

all: download preprocess train
```
You can now run the whole pipeline with

```bash
make all
```

or run a single step in the pipeline (for instance, while debugging):

```bash
make preprocess
```
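One thing to note: the phony targets above rerun every time you invoke them. If you want Make to skip steps whose outputs are already up to date, you can give each step a real file target. Here is a sketch using hypothetical .done sentinel files (reusing the variables from the Makefile above):

```make
# sentinel files let Make skip steps whose outputs are newer than their inputs
$(download_dir)/.done:
	python -m download --outdir $(download_dir)
	touch $@

$(preprocess_path)/.done: $(download_dir)/.done
	python -m preprocess --input_dir $(download_dir) --output_dir $(preprocess_path)
	touch $@
```

Here $@ expands to the target's own path, so each step stamps its sentinel on success, and rerunning make only redoes steps whose sentinels are missing or older than their prerequisites.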
4. You need dead simple parallelization
Consider an eval pipeline where you wish to shard inference across two processes (e.g., sharding by sample IDs) and then compute aggregated metrics across all inferred outputs.
In the Makefile below, infer_proc1 and infer_proc2 shard inference across two processes. The infer target depends on both infer_proc1 and infer_proc2, compute_metrics depends on infer, and eval depends on both infer and compute_metrics.
Makefile

```make
.PHONY: infer_proc1 infer_proc2 infer compute_metrics eval

infer_proc1:
	python -m infer --start_idx 0 --end_idx 999

infer_proc2:
	python -m infer --start_idx 1000 --end_idx 1999

infer: infer_proc1 infer_proc2

compute_metrics: infer
	python -m compute_metrics

eval: infer compute_metrics
```
Now we can make use of parallelism with

```bash
make -j2 eval
```

This runs infer_proc1 and infer_proc2 in parallel and, once inference is complete, runs compute_metrics. The pattern scales nicely: you can add more shards (infer_proc3, infer_proc4) and bump the job count (-j4) without touching your Python code.
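If the number of shards keeps growing, a pattern rule can generate the shard targets for you. A sketch, assuming a hypothetical variant of the infer module that accepts --shard and --num_shards flags instead of explicit index ranges:

```make
num_shards ?= 4
shards := $(addprefix infer_shard_,$(shell seq 0 $(shell expr $(num_shards) - 1)))

# no files named infer_shard_* ever exist, so these always rerun
infer_shard_%:
	python -m infer --shard $* --num_shards $(num_shards)

.PHONY: infer
infer: $(shards)
```

Now `make -j4 infer` fans out all four shards, and changing the shard count is a one-variable edit.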
A few quality-of-life tips
If you keep using Make, these tiny additions make things smoother.
Using multiple Makefiles in a project
Say you put two Makefiles in a makefiles/ directory. You can specify which one to run with the -f flag:

```bash
make -f makefiles/train.mk train lr=1e-5 warmup_steps=500
make -f makefiles/eval.mk eval
```
Mark targets as phony
Make normally treats targets as files and uses timestamps to decide whether they need rebuilding. For "run this command" targets like train or clean, mark them as .PHONY so they always run, even if a file named train happens to exist.

```make
.PHONY: download preprocess train all
```
Add a params target
If you are using variables, particularly ones that depend on other variables, it's useful to have a params target that prints them out. You might even want other targets to depend on params so the variables are printed before anything else runs.
```make
lr = 1e-5
batch_size = 32
num_epochs = 10

.PHONY: params train

params:
	@echo "variables:"
	@echo "  lr=$(lr)"
	@echo "  batch_size=$(batch_size)"
	@echo "  num_epochs=$(num_epochs)"

train: params
	python -m train --lr=$(lr) --batch_size=$(batch_size) --num_epochs=$(num_epochs)
```
The @ prefix suppresses the echoing of the command itself, so only the command's output is shown.
Add a help target
Similar to params, it could be useful for future-you (and teammates) to see common ways to use your Makefile.
```make
help:
	@echo "Targets:"
	@echo "  make baseline"
	@echo "  make warmup"
	@echo "  make train lr=1e-5 warmup_steps=500"
	@echo "  make all"
	@echo "  make -j2 eval"
```
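A common idiom worth mentioning (a generic sketch, not tied to the targets in this post) is to generate the help text from ## comments placed on the targets themselves, so the help never drifts out of date:

```make
.PHONY: help train

help:  ## show this help
	@grep -E '^[a-zA-Z_-]+:.*## ' $(MAKEFILE_LIST) | \
		awk 'BEGIN {FS = ":.*## "} {printf "  %-15s %s\n", $$1, $$2}'

train:  ## train with the current defaults
	python -m train --lr $(lr)
```

grep pulls every target line carrying a ## comment out of the Makefile(s), and awk formats the target name and its description into two columns.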
Closing thought
Makefiles are a stable interface for repeatable commands and workflows: minimal, easy to run, easy to version, easy to review. Next time you reach for a quick bash script or copy-paste commands back and forth from a notes app, consider spending 60 seconds putting them behind a Make target instead.