intertwingly

It’s just data

Continuations for Curmudgeons


This essay is for people who, in web years, are older than dirt.  More specifically, if there was a period of time in which you programmed in a language which did not have garbage collection, then I mean you.  For most people these days, that means that you had some experience with a language named C.

It is a pretty safe bet that — despite your deep technical background — you find that you have some genetic defect that makes it completely impossible for you to understand articles with titles like Continuations Made Simple and Illustrated.  And for some reason, that bothers you.

If so, then maybe this article is for you.  Think of yourself as a frog.  Prepare to be boiled.

Local Variables

Let me start with something far, far, away from continuations.  Something that you should be comfortable with.  It should bring back memories of battles you heroically fought, and mostly won, long ago.  This example is in C:

#include <stdio.h>

int* f(int n) {
  int result = n + 1;
  return &result;
}
 
main() {
  int *n1 = f(1);
  int *n2 = f(2);
 
  printf("%d\n", *n1);
}

OK, now let’s walk through this program.  First a function named f is defined.  It takes in one argument, an integer named n.  It adds one to that value, and stores the result in a local variable named result.  Finally, f returns the address of result.

At this point, any self respecting compiler should issue a warning.  In fact, gcc does exactly that:

test.c: In function 'f':
test.c:5: warning: function returns address of local variable

The problem is that local variables are stored on the stack.  And when f returns, it pops the stack, making this memory available for other uses.

In this program, main calls f twice.  Once with a value of 1 and once with a value of 2.  Quite likely, the second call will use the exact same storage layout as the first call, meaning that the first result will be overlaid.

In fact, on my machine, this is exactly what happens.  When I compile and run this example, I get the following result:

3

If you now look around, you will notice that all the tadpoles that never really programmed in C have jumped out of the pond.

ByValue vs ByReference

The above example could be fixed by using malloc.  Of course, this would introduce a leak, so calls to free would also have to be added.  Aren’t you glad that you are now programming in a language with garbage collection?

Even so, there still are quirks that will snare the unsuspecting programmer.  For this pair of examples, I will use JavaScript.  If you like, you can run these in your favorite web browser.

<script>
  var v = 5;
 
  function f() {
    v = v + 1;
    return v;
  }
 
  var x = f();
  x = x - 2;
  document.write(f());
</script>

In this script, v starts out at 5.  Function f is called, and v gets incremented, and the result gets returned and stored in a variable named x.  That variable is then decremented by two.  Then f is called again, v get incremented again, and the result is printed.

When I run this I get

7

Now, lets change this example slightly.

<script>
  var v = [5];
 
  function f() {
    v[0] = v[0] + 1;
    return v;
  }
 
  var x = f();
  x[0] = x[0] - 2;
  document.write(f()[0]);
</script>

The only difference between this script and the first script is that v is an array of length one.  However, this difference is significant, because arrays are allocated on the heap, so v is really just a reference to the array.  And after the first call, x is a reference to the same array, so when the value at index 0 is decremented by two, this affects the value referenced by v.

Sure enough, when I run the new script, I get

5

Inconsistent?  Yes.  Strange?  Yes.  But generally, this doesn’t cause too much problems.  In any case, you avoid SEGVs as things allocated on the heap never go out of scope.  They simply become eligible to be reclaimed when there is nothing left which references them.

If you are still with me, you don’t realize it yet, but you are about to become dinner.

Frames

Let’s recap.

While a stack can be seen as a consecutive sequence of bytes, in most languages there are either conventions (which are not enforced) or policies (which are) which prevent a function from “popping” off more data than they pushed onto the stack.  In other words, each active scope (e.g., a function) is separated from one another much in the same way as the little bars that you put in between groceries on the conveyor belt in line separate each person’s purchases.

A procedure call pushes the return address (the address of the instruction after the call instruction) onto the stack, along with any arguments, and creates a new frame.  A return statement uses this information to set the Instruction Pointer and Stack Pointer back to what it was as the point of call.

Within the procedure, arguments and local variables may be located based on their offset from the start of the stack segment.

Hashes

Everything so far has been very traditional computer science.  Now, let’s start to diverge from that a bit.  Instead of viewing a stack as a contiguous sequence of bytes, let’s view a stack as a linked list of frames.  And instead of viewing a frame as a contiguous sequence of bytes, let’s substitute a dictionary or hashtable.

Now instead of knowing that this data resides at a predetermined offset in the frame, you would look up x simply by looking up “x” in the frame’s hash table.

In case you didn’t notice, this organization of frames is more oriented towards dynamic languages.

callcc

Now, let’s revisit the notion of a procedure call with this organization of frames.  A procedure call creates a new frame, and stores the current context (consisting of both the instruction pointer and the stack pointer) in the new frame.  The syntax for creating a new frame and storing the current context into a global variable named $cc in Ruby is:

callcc{|$cc|}

The syntax for returning to this exact point, restoring both the instruction pointer and stack pointer, is:

$cc.call

This is naturally called a continuation, as you are continuing where you left off.

At this point, you are probably feeling a little bit disoriented.  But go back and reread the above few paragraphs and examples.  Other than the last examples being in a foreign language, all the words make sense.  You are probably wondering “is that all there is?” and “what’s the big deal?”.  If so, the answers are “pretty much yes”, and “read on”.

One thing to note here is that as long as $cc retains its current value, the stack frame it refers to can not be reclaimed by garbage collection.  Frames themselves are not automatically freed upon encountering return statements; instead they are subject to garbage collection.  Simply by “continuing” at a prior point in the execution, and creating a new continuation, call “stacks” become arbitrary trees.

Generators

At this point, discussions generally veer off into either contrived or complicated examples.  Let’s instead take a simple example where the machinery for continuations is not quite so raw and exposed: Python generators.  Here is an example that prints Fibonacci numbers less than 1000:

def fib():
  yield 0
  i,j = 0,1
  while True:
    yield j
    i,j = j,i+j
 
for n in fib():
  if n>1000: break
  print n

Instead of return statements, the fib function uses yield statements.  This yields one number at a time out of the inexhaustible supply of Fibonacci numbers.  The caller simply invokes that continuation as many times as desired, and all program state (instruction pointer, local variables) are restored, and the fib function starts back up where it left off.

Once the loop is exited, the state if the fib function is eligible for reclaiming.

Cool, eh?

If you try to recode this without continuations, you will find that you will need to return a “token” which contains all the relevant state, and that token will need to be passed back in on subsequent calls.  That’s pretty much what is happening under the covers.  (Note: you may be tempted to use global variables instead.  Don’t.  Your code would no longer be reentrant, and if there were two callers, they would get confused).

Cocoon

Now let’s look at a web application, this time in JavaScript.  A single function implements the entire form based application that involves the display of three (or four, depending on the options selected) web pages.  This is accomplished via the cocoon.sendPageAndWait primitive.

This style of programming is quite different than the resource oriented approach advocated by RESTSome, however, find it to be more intuitive.

This hearkens back to the days in Basic when you coded things like

10 print "Enter your name"
20 input a$
30 print "Hello, "; a$; "!"

When the computer got to the input statement, the operating system would suspend your program until the user provided an input.  At which point, your program would be paged back into memory and would pick up exactly where it left off.

SAX

A SAX parser is an example of a task that could be recast as a generator.  It consumes an InputSource byte by byte and occasionally emits startElement, endElement, characters and other events.  After each event, the parser picks back up where it left off.

While this organization is convenient for parser writers, many consumers find XML Pull Parsers easier to deal with.

Judicious application of continuations allows both constituencies to be satisfied.  From the user of the library’s perspective, they are using a pull parser.  From the person who codes up the SAX handler, each event simply invokes a continuation.  There is no memory bloat like you see with DOM style parsers.  Everybody is happy. 

UI

Event programming, in general, is for wizards.  These wizards live and breathe events.  But even they regress every once and a while.  You can tell when this happens when you see a modal dialog box.

With User Interface Continuations, even scripters can become £337 h4x0r UI gods.  Without a modal dialog box in sight.

Closures

Consider the following JavaScript:

var msg = "Hello, world!";
setTimeout(function(){alert(msg)}, 1000);

The use of setTimeout is common in JavaScript based animation.  The first parameter is a function to be invoked.  The second parameter is the number of miliseconds to wait until this string is executed.

The question you should be asking yourself is “in which context is this function to be evaluated?”  In particular, how does JavaScript go about looking for the value of msg?

The answer is simple: in the current context.  This, it turns out, is a closure, continuation’s more general cousin.  Wrapped up in a nice neat little package.

Continuations are bookmarks — they tell you where to pick up after you last left off. Closures are simply the current lexical stack — they represent the current local context and its immediate parent, recursively.

Classes

If the notion of a nested set of scopes residing on the heap seems foreign to you, it really shouldn’t. Here’s an example in Java:

public class Scopes {

  class Outer {
    class Inner {
       void alert() {System.out.println(msg);}
    }
    String msg = "Schweet!";
    Inner inner = new Inner();
  }

  Outer outer = new Outer();

  public static void main(String[] args) {
    String msg = "Bogus!";
     (new Scopes()).outer.inner.alert();
  } 
}

Epilogue

One thing that people don’t often tell you about continuations is that they don’t actually resume exactly where you left off.  After all, if they did, you would end up in an infinite loop.  The only thing that is really restored is a few pointers.  Everything that resides on the heap, or is stored externally in places like files and databases, remains unaffected when you invoke a continuation.  But like the ByValue or ByReference discussion above, this doesn’t cause problems very often.  In fact, it generally works out pretty much the way you would like it to, without you having to worry about it.

When you realize this, you realize that there really is no magic in continuations.  Sure, more things are put on the heap, but there are ways to optimize this.

Oh, and by the way, you probably taste like chicken right about now.