The Stanford GraphBase: A Platform for Combinatorial Computing

by Donald E. Knuth (New York: ACM Press, 1994), viii+576pp.
Co-published by Addison-Wesley Publishing Company.
ISBN 0-201-54275-7 (original hardcover edition)
ISBN 0-321-60632-9 (paperback edition, published 2009)

This award-winning book demonstrates the art of literate programming with more than 30 examples. Each example is a programmatic essay: a short story that can be read and enjoyed by human beings as readily as it can be read and interpreted by machines. The programs contain state-of-the-art explanations of many important algorithms and data structures. They also define a workbench for combinatorial computing, and standard sets of data that can be used for benchmark tests of competing methods, as well as demonstration programs and games that make use of the data.

Hundreds of example programs that use The Stanford GraphBase will be distributed electronically as supplements to Volume 4 of The Art of Computer Programming when that volume is available, because Knuth will be using The Stanford GraphBase for many of the examples in that book.

Major topics include

Available from the publishers, ACM Press or Addison-Wesley Publishing Company.

Public-domain sources for all programs and data of The Stanford GraphBase are freely available. They can be obtained, for example, by anonymous ftp from the master sources on ftp.cs.stanford.edu in directory ~ftp/pub/sgb. The programs are highly portable and have been installed on a wide variety of computers and operating systems.

Download sgb-words.txt, the 5757 five-letter words of English

Download contiguous-usa.dat, adjacencies between the contiguous United States and DC

Sample graphs from TAOCP:

Other example graphs:

This book was cited by the Association of American Publishers as the best new book in Computer Science, 1994.

There's a nice online writeup in the Stony Brook Algorithm Repository.

Challenge Problems

The book contains several yet-unsolved problems on which I'm hoping readers will make significant progress. One of the most fun problems is based on the 1990 football season: What is the biggest score Stanford can rack up over Harvard by a simple chain of results from that year? My best solution was 2279, but Buel Chandler has improved that to 2448 by applying genetic programming ideas to some of my algorithms. See Chandler's cool page about genetic football for further information.

News flash, 25 November 2001: Mark Cooke has just reported the discovery of a sequence by which Stanford racks up 2473 points over Harvard, together with a matching upper bound (found in February 2002)! Also, Harvard over Stanford by 2358; Penn State over Columbia by 2542 (and that's the max over all pairs of teams). Details to follow...

IMPORTANT ANNOUNCEMENT

An embarrassing off-by-one error was corrected in the programs that create random graphs (random_graph and random_bigraph). When making this change, I decided it was better to conform to the documentation than to remain totally compatible with past history. The graph that now is called random_graph(n,m,x,l,o,f,t,a,b,s), and which properly deserves that name, is the same as the graph that used to be called random_graph(n,m,x,l,o,f,t,a,b+1,s)---unless a=b, in which case the old program was correct and no change was necessary.

Please upgrade to the new versions of the SGB source programs, if your copy is dated earlier than July 1999. (I also made minor corrections in May 2007---worth making, but not crucial. They did, however, affect lisa.dat and jean.dat in tiny ways.)

“Defects” in the data files

Since the data files were prepared by hand, they are subject to human error. They should therefore not be considered to be definite sources of facts, which are correctible like an article in the Wikipedia. They are intended simply as forever-frozen examples of typical data that is more or less accurate.

In particular, I recently learned that I forgot to include any connection between Fantine and her infant daughter Cosette, when I summarized the encounters between the characters of Les Misérables in the data file jean.dat. Fantine and Cosette appear together in chapter 1.4.1 (and then, of course -- alas -- they never see each other again). Upon rereading the chapter, I see that my file should have had "1.4.1:TM,AZ,EP;TM,FN;FN,CO;AZ,EP,CO;TH,TM,FN" in place of the line "1.4.1:TM,FN;TH,TM,FN". However, I shall never update the file jean.dat, because it is “correct by definition.”

Errata

A list of corrections to all changes made to the first printings of this book can be found here.

Here is a list of all nits that have been picked so far since 2009. An asterisk (*) marks significant technical errors that are not merely typographical:

*page 136, line 11 (09 Aug 09)
change $\{0,1,2\}$ to $\{0,1,1\},\quad\{0,1,2\}
*page 136, line 13 (09 Aug 09)
change $\{1,1,1\}$ to $\{1,2,0\},\quad\{1,1,1\}
*page 136, line 18 (09 Aug 09)
change $\{0,2,2\}$ to $\{0,2,2\}$, $\{1,1,2\}$,
*page 241, line 5 of section 12 (30 Jan 21)
change prefix[5] to prefix[8]
page 438, line 3 of section 23 (06 Aug 15)
change "rds," to "rds’,"
page 474, line 11 (09 Mar 11)
change "the vertex of" to "the key of"
page 550, line 13 from the bottom (02 Jul 24)
change "(n1,n2)" to "(n1,n2,directed))"

I was tempted to change the definitions of INF on pages 82 and 101, because those existing 32-bit definitions fail on machines that treat long as 64-bit. [Instead of using <limits.h>, one can say (long)(((unsigned long)-1)>>1) to get the desired effect.] But I decided not to open that can of worms, because I haven't time to check everything for other 64-bit glitches. As always, readers who encounter system-dependent problems should fix them with change files, and share those change files (assuming that the changes have been made with loving care).

I will gratefully deposit 0x$1.00 ($2.56) to the account of the first person who finds and reports anything else that remains technically, historically, typographically, or politically incorrect. Please send suggested corrections to sgb@cs.stanford.edu, or send snail mail to Prof. D. Knuth, Computer Science Department, Gates Building 1B, Stanford University, Stanford, CA 94305-9015 USA.

PLEASE DO NOT SEND EMAIL TO SGB EXCEPT TO REPORT ERRORS! And if you do report an error via email, please do not include attachments of any kind; your message should be readable on brand-X operating systems for all values of X.

Don Knuth's home page

Don Knuth's other books

Valid HTML 4.01 Transitional