Blog of Andrés Aravena

Numbering the pages of a PDF

28 January 2018

I often have PDF files without page numbers, for example when I print exam questions. Nowadays I prepare my exams in Rmarkdown and compile them to HTML, which is the same format that my students will use. But when I print them from Google Chrome they do not have page numbers, or worse: they have a (wrong) date and show the filename on my computer. I used to change the date on my computer and upload the file to a secret folder on my blog, but this is too much trouble for such a small issue. Lately I have just been printing without page numbers.

I was resigned to this situation, until my wife asked me to put page numbers into some of her PDF documents. Then I had to find a way to do it. Here is how I solved it.

How to add page numbers to a PDF

Adobe has a paid solution, but it is not “command line friendly”. I found a good answer at Command Line FU, “Add page numbers to a PDF”. Their suggestion is

enscript -L1 -b'||Page $% of $=' -o- < \
<(for i in $(seq "$(pdftk "$1" dump_data | grep "Num" | cut -d":" -f2)"); \
do echo; done) | ps2pdf - | pdftk "$1" multistamp - output "${1%.pdf}-header.pdf"

I had forgotten about enscript, a program that I used ten years ago to print my scripts on the PostScript printer. I tested the first part by doing

enscript -L1 -b'||Page $% of $=' -o- < <(for i in $(seq 1 429); do echo; done) | \
ps2pdf - test.pdf

and it worked like a charm. Then I upgraded pdftk, since the version I had installed never worked. The official webpage had only a version for MacOS X 10.6, probably a 32 bit one. But there is a hidden version for 10.11 at https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/pdftk_server-2.02-mac_osx-10.11-setup.pkg. I tested it with the option proposed to find the number of pages:

pdftk 'original.pdf' dump_data |grep Num

It worked, but the grep pattern has to be more specific: "Num" matches too many lines. Instead, I just opened the original.pdf file and took note of the number of pages. The command I used to test was:

enscript -L1 -b'||Page $% of $=' -o- < \
<(for i in $(seq 1 429); do echo; done) | \
ps2pdf - | \
pdftk 'original.pdf' multistamp - output numbered.pdf

and the result was perfect… except for the location of the numbers.
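As for the grep pattern that was too broad, a more specific version might look like the sketch below. It assumes that dump_data reports the page count on a line of the form "NumberOfPages: 429". The helper name count_pages is made up for this example and is not part of pdftk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the output of "pdftk FILE dump_data" on
# standard input and print only the page count.
# Assumption: the count appears on a line that starts with "NumberOfPages:".
count_pages() {
  # Anchoring the pattern at the start of the line skips other fields
  # that merely contain "Num", such as PageMediaNumber.
  awk '/^NumberOfPages:/ { print $2 }'
}
```

It would be used as pdftk 'original.pdf' dump_data | count_pages, which should print a single number instead of a list of matches.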

Printing footers instead of headers

So far I could use the example code to get page numbers in the header, but not in the footer, despite using the option --footer as indicated in the manual page. Asking Google led me to a discussion about “printing Footers using enscript”. Apparently the PostScript code generated by enscript does not handle footers, but this can be fixed by installing some code for fancy headers in the ~/.enscript/footer.hdr file. I followed the advice and tested it:

enscript --fancy-header=footer -L1 -b'||' --footer '||$%' -o- < \
<(for i in $(seq 1 429); do echo; done) | \
ps2pdf - 'test.pdf'

Now the file test.pdf had 429 blank pages with the page number in the lower right corner. The manual also showed options to change the font from Courier to something nicer, such as Times Roman at 10pt. The final command was:

enscript -F Times-Roman10 --fancy-header=footer -L1 -b'||' --footer '||$%' -o- < \
<(for i in $(seq 1 429); do echo; done) | \
ps2pdf - | \
pdftk 'original.pdf' multistamp - output 'numb.pdf'
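To avoid typing the page count by hand, the steps above could be wrapped in a small function. This is only a sketch: the names number_pdf and blank_lines are invented here, and it assumes that enscript, ps2pdf and pdftk are installed, that ~/.enscript/footer.hdr is in place as described above, and that dump_data prints a line of the form "NumberOfPages: 429".

```shell
#!/usr/bin/env bash
# Sketch only: number_pdf and blank_lines are hypothetical names.

# Print N empty lines; with -L1 enscript turns each line into one page.
blank_lines() {
  local n=$1 i=0
  while [ "$i" -lt "$n" ]; do
    echo
    i=$((i + 1))
  done
}

# Stamp page numbers in the footer of INPUT and write the result to OUTPUT.
number_pdf() {
  local input=$1 output=$2 pages
  # Assumption: dump_data reports the page count as "NumberOfPages: N".
  pages=$(pdftk "$input" dump_data | awk '/^NumberOfPages:/ { print $2 }')
  blank_lines "$pages" |
    enscript -F Times-Roman10 --fancy-header=footer -L1 -b'||' --footer '||$%' -o - |
    ps2pdf - |
    pdftk "$input" multistamp - output "$output"
}
```

After loading these definitions, number_pdf 'original.pdf' 'numb.pdf' should do the same as the command above without the hard-coded 429.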

Extra tricks learned

In bash we can use $(seq 1 429) instead of the regular backquote `seq 1 429`. It is easier to read, and sometimes easier to understand. In both cases the shell executes the command inside the parentheses or backquotes, and its standard output becomes command line arguments for the outer command.

We can also use < <(for i in $(seq 1 429); do echo; done) to feed the standard output of one command into the standard input of another. In this case we could just as easily use a pipe, like this:

for i in $(seq 1 429); do echo; done | \
enscript -F Times-Roman10 --fancy-header=footer -L1 -b'||' --footer '||$%' -o - | \
ps2pdf - | pdftk 'original.pdf' multistamp - output numb.pdf

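Both shell expansions execute a command and handle its standard output. They differ in how they deliver this output to the next command. The $() syntax delivers the output as arguments. The <() syntax expands to the name of a file from which the output can be read, and the extra < in front of it redirects that file to the standard input. Both are alternatives to classical syntax: $() replaces backquotes, and < <() replaces a pipe.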
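The difference is easy to see in a tiny experiment. The following sketch needs bash (plain sh does not have process substitution), and the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Command substitution: the output of seq is split into words and
# handed to echo as three separate arguments.
as_args=$(echo $(seq 1 3))

# Process substitution: <(...) expands to a file name, which cat
# receives as an ordinary argument and opens for reading.
as_file=$(cat <(seq 1 3) | wc -l)

# With a redirection in front, the same file becomes the standard
# input of wc, so no file name is passed to it at all.
as_stdin=$(wc -l < <(seq 1 3))

echo "arguments: $as_args"
echo "lines read from the file name:" $as_file
echo "lines read from standard input:" $as_stdin
```

Running echo <(true) is another quick way to see the file name that the shell substitutes; the exact name depends on the system.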

Summary

The next exam will have page numbers in the printed copy.