\datethis \def\falling#1{^{\ff{#1}}} \def\ff#1{\mkern1mu\underline{\mkern-1mu#1\mkern-2mu}\mkern2mu} @*Intro. This program, inspired by {\mc HISTOSCAPE-COUNT}, calculates the number of $m\times n$ ``whirlpool permutations,'' given $m$ and~$n$. What's a whirlpool permutation, you ask? Good question. An $m\times n$ matrix has $(m-1)(n-1)$ submatrices of size $2\times2$. An $m\times n$ whirlpool permutation is a permutation of $(mn)!$ elements for which the relative order of the elements in each of those submatrices is a ``vortex''---that is, it travels a cyclic path from smallest to largest, either clockwise or counterclockwise. Thus there are exactly eight $2\times2$ whirlpool permutations. If the entries of the matrix are denoted $abcd$ from top to bottom and left to right, they are 1243, 1423, 2134, 2314, 3241, 3421, 4132, 4312. One simple test is to compare $a:b$, $b:d$, $d:c$, $c:a$; the number of `$<$' must be odd. (Hence the number of `$>$' must also be odd.) The enumeration is by a somewhat mind-boggling variant of dynamic programming that I've not seen before. It needs to represent $n+1$ elements of a permutation of~$t$ elements, where $t$ is at most $mn$, and there are up to $(mn)\falling{n+1}$ such partial permutations; so I can't expect to solve the problem unless $m$ and $n$ are fairly small. Even so, when I {\it can\/} solve the problem it's kind of thrilling, because this program makes use of a really interesting way to represent $t\falling{n+1}$ counts in computer memory. The same method could actually be used to enumerate matrices of permutations whose $2\times2$ submatrices satisfy any arbitrary relations, when those relations depend only the relative order of the four elements. (Thus any of $2^{24}$ constraints might be prescribed for each of the $(m-1)(n-1)$ submatrices. The whirlpool case, which accepts only the eight relative orderings listed above, is just one of zillions of possibilities.) It's better to have $m\ge n$. But I'll try some cases with $m #include int m,n; /* command-line parameters */ unsigned long long *count; /* the big array of counts */ unsigned long long newcount[maxmn]; /* counts that will replace old ones */ unsigned long long mems; /* memory references to octabytes */ int x[maxn+1]; /* indices being looped over */ int ay[maxn+1]; int l[maxmn],u[maxmn]; int tpow[maxmn+1]; /* factorial powers $t\falling{n+1}$ */ @; main(int argc,char*argv[]) { register int a,b,c,d,i,j,k,p,q,r,mn,t,tt,kk,bb,cc,pdel; @; for (i=1;i; @; } @ @= if (argc!=3 || sscanf(argv[1],"%d", &m)!=1 || sscanf(argv[2],"%d", &n)!=1) { fprintf(stderr,"Usage: %s m n\n", argv[0]); exit(-1); } mn=m*n; if (m<2 || m>maxn || n<2 || n>maxn || mn>maxmn) { fprintf(stderr,"Sorry, m and n should be between 2 and %d, with mn<=%d!\n", maxn,maxmn); exit(-2); } for (k=n+1;k<=mn;k++) { register unsigned long long acc=1; for (j=0;j<=n;j++) acc*=k-j; if (acc>=0x80000000) { fprintf(stderr,"Sorry, mn\\falling(n+1) must be less than 2^31!\n"); exit(-666); } tpow[k]=acc; } count=(unsigned long long*)malloc(tpow[mn]*sizeof(unsigned long long)); if (!count) { fprintf(stderr,"I couldn't allocate %d entries for the counts!\n", tpow[mn]); exit(-4); } @ Suppose I want to represent $n+1$ specified elements of a permutation of $t+1$ elements. For example, we might have $n=3$ and $t=8$, and the final four elements of a permutation $y_0\ldots y_8$ might be $y_5y_6y_7y_8=3142$. There are $(t+1)\falling{n+1}$ such partial permutations, and I can represent them compactly with $n+1$ integer variables $x_{t-n}$, \dots, $x_{t-1}$,~$x_t$, where $0\le x_j\le j$. The rule is that $x_j$ is $y_j-w_j$, where $w_j$ is the number of elements ``inverted'' by $y_j$ (the number of elements to the right of $y_j$ that are less than~$y_j$). In the example, $w_0w_1w_2w_3=2010$, so $x_5x_6x_7x_8=1132$. (Or going backward, if $x_5x_6x_7x_8=3141$ then $y_5y_6y_7y_8=6251$.) This representation has a beautiful property that we shall exploit. Every permutation $y_0\ldots y_t$ of $\{0,\ldots,t\}$ yields $t+2$ permutations $y'_0\ldots y'_{t+1}$ of $\{0,\ldots,t+1\}$ if we choose $y'_{t+1}$ arbitrarily, and then set $y'_j=y_j+\hbox{[$y_j{\ge}y'_{t+1}$]}$. For example, if $t=8$ and $y_5y_6y_7y_8=3142$, the ten permutations obtained from $y_0\ldots y_8$ will have $y'_5y'_6y'_7y'_8y'_9= 42530$, 42531, 41532, 41523, 31524, 31425, 31426, 31427, 31428, or 31429. And the representations $x'_5x'_6x'_7x'_8x'_9$ of those last five elements will simply be respectively 31420, 31421, \dots, 31429! In general, we'll have $x'_j=x_j$ for $0\le j\le t$, and $x'_{t+1}=y'_{t+1}$ will be arbitrary. @ Now comes the mind-boggling part. I want to maintain a count $c(x_{t-n},\ldots,x_t)$ for each setting of the indices $(x_{t-n},\ldots,x_t)$, but I want to put those counts into memory in such a way that I won't clobber any of the existing counts when I'm updating $t$ to $t+1$. In particular, if $x'_{t+1}\le t-n$, I'll want $c(x'_{t+1-n},\ldots,x'_t,x'_{t+1})$ to be stored in exactly the same place as $c(x'_{t+1},x_{t+1-n},\ldots,x_t)$ was stored in the previous round. But if $x'_{t+1}>t-n$, I'll store $c(x'_{t+1-n},\ldots,x'_t,x'_{t+1})$ in a brand-new, previously unused location of memory. Thus we shall use a memory mapping function $\mu_t$, different for each~$t$, such that $c(x_{t-n},x_{t-n+1},\ldots,x_t)$ is stored in location $\mu_t(x_{t-n},x_{t-n+1},\ldots,x_t)$ during round~$t$ of the computation. Let's go back to the example in the previous section and apply it to whirlpool permutations for $n=3$. Denote the permutation in the first three rows by $y_0\ldots y_8$, where $y_6y_7y_8$ is the third row and $y_5$ is the last element of the second row. (It's a permutation of $\{0,\ldots,8\}$, representing the relative order of a final permutation of $\{0,\ldots,3m-1\}$ that will fill the entire matrix.) At this point we've calculated counts $c(x_5,x_6,x_7,x_8)$ that tell us how many such partial whirlpool permutations have any given setting of $y_5y_6y_7y_8$. In particular, $c(1,1,3,2)$ counts those for which $y_5y_6y_7y_8=3142$. To get to the next round, we essentially want to know how many partial permutations $y'_0\ldots y'_9$ of $\{0,\ldots,9\}$ will have a given setting of $y'_6y'_7y'_8y'_9$; the second row is now irrelevant to future computations. It's the same as asking how many permutations have $y_6y_7y_8=142$. Answer: $c(0,1,3,2)+c(1,1,3,2)+c(2,1,3,2)+c(3,1,3,2)+c(4,1,3,2)+c(5,1,3,2)$, because these count the permutations with $y_5y_6y_7y_8=0142$, 3142, 5142, 6142, 7142, 8142. Those six counts $c(k,1,3,2)$ appear in memory locations $\mu_8(k,1,3,2)$, for $0\le k\le5$. On the next round we'll want $c'(x'_6,x'_7,x'_8,x'_9)=c'(1,3,2,x'_9)$ to be set to their sum. These new counts will appear in memory locations $\mu_9(1,3,2,x'_9)$. So we'd like $\mu_9(1,3,2,k)=\mu_8(k,1,3,2)$ when $0\le k\le5$. Let $\lambda_t(x_{t-n},\ldots,x_t)= \bigl(\cdots((x_tt+x_{t-1})(t-1)+x_{t-2})\cdots\bigr)(t-n+1)+x_{t-n}= x_t t\falling n+x_{t-1}(t-1)\falling{n-1}+\cdots x_{t-n}(t-n)\falling0$ be the standard mixed-radix representation of $(x_t\ldots x_{t-n})$ with radices $(t+1,t,\ldots,t-n+1)$. When each $x_j$ ranges from 0 to~$j$, $\lambda_t(x_{t-n},\ldots,x_t)$ ranges from $\lambda_t(0,\ldots,0)=0$ to $\lambda_t(t-n,\ldots,t)=(t+1)\falling{n+1}-1$. Therefore $\lambda_t$ would be the natural choice for $\mu_t$, if we didn't want to avoid clobbering. Instead, we use $\lambda_t$ only when $x_t$ is sufficiently large: We define $$\mu_t(x_{t-n},\ldots,x_t)=\cases{ \lambda_t(x_{t-n},\ldots,x_t),&if $x_t\ge t-n$;\cr \mu_{t-1}(x_t,x_{t-n},\ldots,x_{t-1}),&if $x_t\le t-n-1$.\cr}$$ This recursion terminates with $\mu_n=\lambda_n$, because $x_n$ is always $\ge0$. One can also show that $\mu_{n+1}=\lambda_{n+1}$. Back to our earlier example, what is $\mu_8(k,1,3,2)$? Since $2\le4$, it's $\mu_7(2,k,1,3)$. And since $3\le3$, it's $\mu_6(3,2,k,1)$. Which is $\mu_5(1,3,2,k)$. Finally, therefore, if $k\le1$, the value is $\lambda_4(k,1,3,2)=68+k$; but if $2\le k\le5$ it's $\lambda_5(1,3,2,k)=60k+34$. In this program we will keep $x_j$ in location $x_{j\bmod(n+1)}$. Consequently the arguments to $\mu_t$ and $\lambda_t$ will always be in locations $(x_{(t+1)\bmod(n+1)},x_{(t+2)\bmod(n+1)},\ldots, x_{t\bmod(n+1)})$. @= int mu(int t) { register int r,a,p,tt; for (r=t%(n+1), tt=t;o,x[r]= x1:@+for (k=0;k<=t;k++) o,l[k]=k+1; o,l[t+1]=0; /* circularly linked list */ k=0, kk=t%(n+1); x2:@+if (k==n) @; oo,p=t+1,q=l[p],x[kk]=0; x3:@+o,ay[k]=q; x4:@+ooo,u[k]=p,l[p]=l[q],k++,kk=(kk?kk-1:n); goto x2; x5:@+o,p=q,q=l[p]; if (q<=t) { oo,x[kk]++; goto x3; } x6:@+if (--k>=0) { kk=(kk==n?0:kk+1); ooo,p=u[k],q=ay[k],l[p]=q; goto x5; } @ At this point we're ready to do the ``inner loop'' calculation, by using all counts $c(x_{t-n},\ldots,x_t)$ for $0\le x_{t-n}\le t-n$ to obtain updated counts that will allow us to increase~$t$. The array $a_{n-1}\ldots a_0$ corresponds to $y_{t-n+1}\ldots y_t$ in the discussion above; we want to loop over all choices for $y_{t-n}$, namely all choices for $a_n$. Fortunately there's a linked list containing precisely those choices, beginning at |l[t+1]|. @= { @; for (d=0;d<=t+1;d++) o,newcount[d]=0; oo,b=ay[n-1],c=ay[0]; if (bcc) { /* whirlpool constraint when |a| not middle */ for (d=bb+1;d<=cc;d++) oo,newcount[d]+=tmp; }@+else { /* whirlpool constraint when |d| not middle */ for (d=0;d<=bb;d++) oo,newcount[d]+=tmp; for (d=cc+1;d<=t+1;d++) oo,newcount[d]+=tmp; } } if (pdel) { for (d=0;d<=t-n;d++) oo,count[p+d*pdel]=newcount[j?d:0]; for (;d<=t+1;d++) ooo,x[kk]=d,count[mu(t+1)]=newcount[j?d:0]; }@+else { for (d=0;d<=t+1;d++) ooo,x[kk]=d,count[mu(t+1)]=newcount[j?d:0]; } } goto x6; } @ Our example of $\mu_8(k,1,3,2)$ shows that the mission of this step is sometimes impossible. But the addressing scheme is usually simple, so I decided to exploit that fact. (Being aware, of course, that premature optimization is the root of all evil in programming.) @= for (tt=t,a=0,r=t%(n+1); a=tt-n) break; if (a==n) pdel=0; /* a difficult case */ else { for (p=pdel=0,a=0;a<=n;a++,r=(r?r-1:n)) { if (r!=kk) p=p*(tt+1-a)+x[r],pdel=pdel*(tt+1-a); else p=p*(tt+1-a),pdel=pdel*(tt+1-a)+1; } } @ @= { t=n*i+j-1; if (t; fprintf(stderr," done with %d,%d ..%lld, %lld mems\n", i,j,count[0],mems); } @ @d thresh 1000000000000000000 @= for (newcount[0]=newcount[1]=newcount[2]=0,p=tpow[mn]-1;p>=0;p--) { if (count[p]>newcount[2]) newcount[2]=count[p],pdel=p; o,newcount[0]+=count[p]; if (newcount[0]>=thresh) ooo,newcount[0]-=thresh,newcount[1]++; } fprintf(stderr,"(Maximum count %lld is obtained for params", newcount[2])@t)@>; for (q=mn-n-1;q," %d",pdel%(q+1)); pdel/=q+1; } fprintf(stderr,")\n"@t(@>); if (newcount[1]==0) printf("Altogether %lld %dx%d whirlpool perms (%lld mems).\n", newcount[0],m,n,mems); else printf("Altogether %lld%018lld %dx%d whirlpool perms (%lld mems).\n", newcount[1],newcount[0],m,n,mems); @*Index.