This post is the continuation of my answer to the elves, cakes and gremlins puzzle.

One remaining question is the validity of the assumption that it's quick and easy to find a value of \(S\) that is coprime to \(L\).

I blithely asserted this was easy, as the probability that two numbers are coprime is \(6/\pi^2\). Whilst this is true (here's a proof and some other interesting information on coprime, or relatively prime, numbers), it's only true if you pick both numbers randomly. In fact, for certain values of \(L\), far fewer of the numbers smaller than \(L\) are coprime to it.
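As a quick sanity check of that \(6/\pi^2 \approx 0.61\) figure, here is a throwaway Monte Carlo sketch in C; the gcd helper, sampling range and sample count are my own arbitrary choices, not part of the puzzle:

#include <stdio.h>
#include <stdlib.h>

static int gcd(int a, int b)
{
    while (b != 0)
    {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

int main(void)
{
    const int samples = 1000000;
    int coprime = 0;
    int i, a, b;

    srand(1);
    for (i = 0; i < samples; i++)
    {
        /* random pair in [1, 10000]; any largish range will do */
        a = 1 + rand() % 10000;
        b = 1 + rand() % 10000;
        if (gcd(a, b) == 1)
            coprime++;
    }
    /* expect something close to 6/pi^2 = 0.6079... */
    printf("coprime fraction: %f\n", (double)coprime / samples);
    return 0;
}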

Euler’s totient function \(\phi(n)\) counts the positive integers smaller than \(n\) that are coprime to \(n\). So, with \(L\) lockers there are only \(\phi(L)\) distinct locker search permutations we can pick. With a maximum of \(E\) elves searching at once, the probability of two using the same locker search sequence is roughly \(E / \phi(L)\), and whilst we know \(E \ll L\), we do not know \(E \ll \phi(L)\). Further, we hunt for coprimes by taking a random guess \(S\) and incrementing modulo \(L-1\) until we have a coprime; if \(L\) is huge, how do we know we won't be looping for a very long time?

The formula for Euler’s totient function can be expressed as follows:

\(\phi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right)\) where the product runs over the distinct primes \(p\) dividing \(n\)
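Translated directly into code, factorising \(n\) by trial division (a sketch of mine, just to make the formula concrete):

/* Euler's totient by the product formula: start with n and
 * apply result *= (1 - 1/p), i.e. result -= result / p, once
 * for each distinct prime factor p of n.
 */
unsigned long phi(unsigned long n)
{
    unsigned long result = n;
    unsigned long p;

    for (p = 2; p * p <= n; p++)
    {
        if (n % p == 0)
        {
            result -= result / p;  /* apply the (1 - 1/p) factor */
            while (n % p == 0)
                n /= p;            /* strip p out entirely */
        }
    }
    if (n > 1)
        result -= result / n;      /* one prime factor above sqrt(n) remains */
    return result;
}

For instance, phi(30) returns 8, matching the hand count used below.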

Is there a lower bound for \(\frac {\phi(n)} n\)? Clearly this ratio is smallest where \(n\) has the largest number of distinct prime factors, i.e. at the primorials. It turns out there is no constant lower bound on the ratio, but a lower bound can be given (for \(n \ge 3\)) by:

\(\phi(n)\geq \frac{n}{e^{\gamma}\log \log n}+O\left(\frac{n}{(\log \log n)^2}\right)\)

where \(\gamma\) is the Euler-Mascheroni constant, i.e. about 0.57721.

Now, \(\log \log(n)\) is pretty small. With 30 lockers (a primorial, and thus as close to the lower bound as we get), \(\phi(30) = 8\), and \(\frac {\phi(n)} n \approx 0.27\). Where we have around \(2^{256}\) lockers, the lower bound on \(\frac {\phi(n)} n\) is still about 0.1, i.e. at least one tenth of the numbers less than \(n\) (where \(n\) is around \(2^{256}\)) are coprime to \(n\). This is good news, as it suggests there will always be plenty of candidate steps to pick, even when the number of lockers approaches the number of atoms in the universe (\({10}^{80}\)). In other words, the chance of two elves picking the same step is of a similar order to that of a SHA-256 hash collision, and as the net result of such a collision would merely be slightly less efficient searching, we can relax.
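That 0.1 figure is a one-liner to verify; a quick numerical sketch (the constant and layout are mine):

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double euler_gamma = 0.5772156649015329;
    double log_n = 256.0 * log(2.0);            /* log of n = 2^256 */
    double denom = exp(euler_gamma) * log(log_n);

    printf("e^gamma * log log n  = %f\n", denom);       /* about 9.2   */
    printf("lower bound phi(n)/n = %f\n", 1.0 / denom); /* about 0.108 */
    return 0;
}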

However, we still don’t know that there aren’t vast adjacent tracts of non-coprime values of \(S\). If that were the case, finding a coprime value of \(S\) might (occasionally) itself take a long time, and because we hunt through the space in order, we might still end up with lots of elves using the same value of \(S\). For instance, if the coprimes of \(L\) were all located in the bottom 5% and top 5% of the numbers between \(0\) and \(L\), then 90% of the time we’d have an expensive hunt through a range barren of coprimes, ending up at the first coprime in the top 5%. Intuitively this seems unlikely, as the coprimes will be evenly distributed, especially given the pretty picture here. But can we prove it? I’m still thinking about this one. The maximum length of a run of integers smaller than \(N\) and non-coprime to \(N\) is given by Jacobsthal’s function (or more accurately the Jacobsthal function less one); empirically it seems to be no more than about \(2 \log (N)\), and in practice the average run is much smaller, far smaller than \(\log \log(N)\). Suggestions on how to prove this appreciated (see note 1 below).
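For small \(N\) at least, the claim is easy to test by brute force; the sketch below (reusing the gcd helper from the earlier Monte Carlo sketch) finds the longest run of consecutive integers below \(N\) that are all non-coprime to \(N\), i.e. the Jacobsthal function less one:

/* Longest run of consecutive integers in [1, N-1] all sharing
 * a factor with N, i.e. j(N) - 1 where j is Jacobsthal's function.
 * Brute force, so only sensible for small N.
 */
int longestNonCoprimeRun(int N)
{
    int run = 0;
    int best = 0;
    int k;

    for (k = 1; k < N; k++)
    {
        if (gcd(k, N) != 1)
        {
            run++;
            if (run > best)
                best = run;
        }
        else
            run = 0;
    }
    return best;
}

For \(N = 30\) this returns 5 (the run 2, 3, 4, 5, 6), comfortably below \(2 \log 30 \approx 6.8\).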

An alternative route (which I think is slower, but which given a true random number generator would be fine) would be to simply pick \(S \in \mathbb{Z^+}\) in the range \(0 \lt S \lt L\), and repeat until \(\gcd (S, L) = 1\). As there are \(\phi(L)\) coprimes to \(L\), each pick will succeed with probability

\(p = \frac {\phi(L)} {L-1}\)

which means one can expect success in:

\(E(n) = \frac 1 {p} = \frac {L-1} {\phi(L)}\)

iterations. Where \(L\) is large, \(L \approx L-1\), hence we can use the simpler estimate:

\(p = \frac {\phi(L)} {L}\)

which leads us to

\(E(n) \leq {e^{\gamma}\log \log L}\)

So for any number of lockers up to \(2^{256}\), we would expect fewer than 11 iterations (as per the above calculation).
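Both this figure and the bail-out probability used below drop out of treating each guess as an independent trial with worst-case success probability \(p \approx 0.108\); a quick check (constants mine):

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double euler_gamma = 0.5772156649015329;
    double log_n = 256.0 * log(2.0);  /* log of L = 2^256 */
    double p = 1.0 / (exp(euler_gamma) * log(log_n));

    /* geometric distribution: expected trials = 1/p */
    printf("expected iterations: %f\n", 1.0 / p);          /* about 9.2  */
    /* chance of 100 consecutive failures */
    printf("P(bail out at 100): %g\n", pow(1.0 - p, 100)); /* about 1e-5 */
    return 0;
}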

So, for those worried about my unproven number theory, an alternative algorithm is set out below. It bails out if it can't find a coprime step after 100 attempts (which should happen less than 0.002% of the time even with \(2^{256}\) lockers, though it makes me feel safer in case of random number generator problems) and falls back to using a step of 1. In pseudo-C again:

int placeCake(int L)
{
    const int maxsteps = 100;
    int s = 1; /* fallback step: 1 is always coprime to L */
    int n;
    int c;

    /* Repeat until we get a step coprime with L
     * Bail out after maxsteps steps and just use 1.
     * Probability analysis says this should almost
     * never happen, and the result is marginally
     * less efficient placement
     */
    for (c = 0; c < maxsteps; c++)
    {
        /* Pick a random step between 1 and L-1 inclusive */
        n = 1 + rand() % (L-1);
        if (gcd (n, L) == 1)
        {
            s = n; /* use this if it's coprime */
            break;
        }
    }

    /* now we know the step is coprime to L */

    /* pick a random first locker */
    n = rand() % L;

    /* L iterations covers all lockers as gcd(s, L) = 1 */
    for (c = 0; c < L; c++)
    {
        if (!placeCakeIfEmpty(n))  /* returns 0 on success */
            return n;
        n = (n + s) % L;
    }

    return -1;
}
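The listing assumes a gcd helper (a Euclidean implementation like the one in the Monte Carlo sketch above will do) and placeCakeIfEmpty from the previous post. To see why coprimality of the step guarantees every locker gets visited, here is a small test harness of my own:

#include <stdio.h>

/* Returns 1 if stepping by s (mod L) from locker 0 visits every
 * locker exactly once in L steps -- true precisely when gcd(s, L) = 1.
 */
static int coversAllLockers(int L, int s)
{
    char seen[64] = {0};  /* big enough for this demo's L */
    int n = 0;
    int c;

    for (c = 0; c < L; c++)
    {
        if (seen[n])
            return 0;     /* looped back early: some lockers never seen */
        seen[n] = 1;
        n = (n + s) % L;
    }
    return 1;
}

int main(void)
{
    /* 7 is coprime to 30; 6 shares a factor and cycles after 5 lockers */
    printf("step 7: %s\n", coversAllLockers(30, 7) ? "covers all" : "cycles early");
    printf("step 6: %s\n", coversAllLockers(30, 6) ? "covers all" : "cycles early");
    return 0;
}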

Note 1: A late edit. It appears that if \(j(N)\) is the Jacobsthal function, then the upper bound:

\(j(N) \ll \log^2 N\)

was proved by H. Iwaniec (On the problem of Jacobsthal, Demonstratio Math. 11, 225–231 (1978), seemingly unavailable online). So the hunt in my first solution is at worst polylogarithmic in the number of lockers, and given the empirical observation above that the average run of non-coprimes is far smaller than \(\log \log N\), the first routine (given in the previous post) should in practice be at least as fast as the second.