Some of us have come to learn that recursion is slow and impractical.
In the days when the computer had 64K or 256K of memory, recursive
programs would run of out memory because they repeatedly push
arguments to the runtime stack. Memory itself was slow. These machines
did not have any processor caches. So if performance was a required,
a program would have to be tuned to minimize memory access and do
all its bookkeeping in registers. That was just impossible with
recursive programs.
Our recursive programs ran out of memory quickly. Not having faith in
our hardware engineers, we did whatever we could to come up with
nonrecursive solutions. Although the recursive program for the Tower
of Hanoi puzzle was not very memory intensive, people still wanted
nonrecursive solutions. Here is a very short
one:
void tower(unsigned int n) {
unsigned int c, i, zeros, m = 1 << n;
for(i=1; i < m; i++) {
for(c=i, zeros=1; !(c&1); c>>=1, zeros++);
printf("%d: move disc %d from %d to %d\n", i, zeros,
(i&i1)%3, ((ii1)+1)%3);
}
}
We are not going to explain much of this program because we would have
to type up some proofs by induction. But we can still analyze its
performance. The program uses four unsigned ints. It can be
optimized to do all the processing in registers, except the call to
printf. We ignore the cost of printf in our analysis and
focus on the actual computation. In the following paragraphs, an
operation refers to a mathematical concept while an
instruction refers to a machine instruction. Unless explicitly
noted, we assume the an operation can be implemented with one
instruction, and that one instruction completes in one
clock cycle. The former is practically true; but the latter is
a simplification of reality.
First, we analyze the mysterious computation of zeros, which is
actually one plus the number of ending zeros of the binary
representation of i. If i=36, then zeros=3, because
36 is 10100_{2}, which ends in 2 zeros.
How many times do we have to compute !(c&1), c>>=1 and
zeros++? Half of the time, i is odd. So we compute
only !(c&1), requiring two operations. If c is even,
then we have to keep rightshifting until it becomes odd. Each iteration
of this loop involves !(c&1), c>>=1 and zeros++,
requiring four operations. Therefore, the total number of operations
is: 2 · (2^{n}  1) + 4 · (1 ·
2^{n2} + 2 · 2^{n3} + 3 ·
2^{n4} + ... + (n1) · 2^{nn}), which
equals 6 · 2^{n}  4 · n  14. If we count
more rigorously, when n is even, we need 2^{n}  n 
3 jump instructions to complete the inner loops. And for all
n, we need to jump into or out of the inner loop, requiring
2^{n}1 jump instructions.
There are 7 arithmetic operations in the computation of the
arguments to printf. Adding the i++ and the i <
m, we have a total of 9 operations for each of the
2^{n}1 steps. We also need one jump instruction for
each iteration of the main loop. We make a final approximation of
15 · 2^{n} arithmetic operations and 3 ·
2^{n} jump instructions for the entire procedure. (We
count jump instructions separately because a "jump" in the program
counter may screw up the instruction cache and flush the ALU
pipeline in RISC processors. It is an instruction that may take
more than one clock cycle).
