Alex Bligh's blog

Alex Bligh's personal blog


This post is the continuation of my answer to the elves, cakes and gremlins puzzle.

One remaining question is the validity of the assumption that it’s quick and easy to find a value of S that is coprime to the value of L.

I blithely asserted this was easy, as the probability that two numbers are coprime is \(6/\pi^2\). Whilst this is true (here’s a proof and some other interesting information on coprime – or relatively prime – numbers), it’s only true if you pick both numbers randomly. In fact, for certain values of L the number of coprime numbers smaller than it is far smaller.

Euler’s totient function \(\phi(n)\) is the number of positive integers smaller than n that are coprime to n. So, with L lockers there are only \(\phi(L)\) distinct locker search permutations we can pick. With a maximum of E elves searching at once, we know that the probability of two using the same locker search sequence is \(E / \phi(L)\), and whilst we know \(E \ll L\), we do not know that \(E \ll \phi(L)\). Further, we hunt for coprimes by taking a random guess S, and incrementing modulo L-1 until we have a coprime; if L is huge, how do we know we won’t be looping for a very long time?

The formula for Euler’s totient function can be expressed as follows:

\(\phi(n) = n \prod_{p|n} \left(1 - \frac 1 p\right)\) where \(p\) is prime

Is there a lower bound for \(\frac {\phi(n)} n\)? Clearly this is at its lowest where n has the largest number of prime factors, i.e. primorials. It turns out there is no constant lower bound on this ratio, but a lower bound can be given by (for \(n \ge 3\)):

\(\phi(n)\geq \frac{n}{e^{\gamma}\log \log n}+O\left(\frac{n}{(\log \log n)^2}\right)\)

where \(\gamma\) is the Euler-Mascheroni constant, i.e. about 0.57721.

Now, \(\log \log(n)\) is pretty small. With 30 lockers (a primorial, so about as bad as it gets for a number of that size), \(\phi({30}) = 8\), and \(\frac {\phi(n)} n \approx 0.27\). Where we have around \(2^{256}\) lockers, the lower bound on \(\frac {\phi(n)} n\) is still about 0.1, i.e. at least one tenth of the numbers less than n (where n is around \(2^{256}\)) are coprime to n. This is good news as it suggests there are always going to be a good number of steps to pick from, even when the number of lockers approaches the number of atoms in the universe (\({10}^{80}\)). In other words, the chance of two elves picking the same step is of a similar order to that of a SHA-256 hash collision, and as the net result of such a collision would be slightly less efficient searching, we can relax.
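As a sanity check on these figures, here is a minimal sketch that computes \(\phi(n)\) by trial division, straight from the product formula above. The function name totient is mine rather than anything standard, and trial division is only sensible for small n (certainly not for anything near \(2^{256}\)).

```c
#include <assert.h>

/* Euler's totient by trial division:
 * phi(n) = n * product of (1 - 1/p) over the primes p dividing n.
 * Illustrative helper only; suitable for small n.
 */
unsigned long totient(unsigned long n)
{
    unsigned long result = n;
    unsigned long p;

    assert(n >= 1);

    for (p = 2; p * p <= n; p++)
    {
        if (n % p == 0)
        {
            /* strip every power of this prime factor */
            while (n % p == 0)
                n /= p;
            /* multiply result by (1 - 1/p) */
            result -= result / p;
        }
    }

    /* whatever is left (if > 1) is a single remaining prime factor */
    if (n > 1)
        result -= result / n;

    return result;
}
```

For the primorials 30 and 210 this gives 8 and 48 respectively, i.e. ratios of roughly 0.27 and 0.23, so the ratio does fall as more small primes are multiplied in, but slowly.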

However, we still don’t know there aren’t vast tracts of non-coprime values of \(S\) which are adjacent. If that were the case, finding a coprime value of \(S\) might (occasionally) itself take a long time, and because we hunt through the space in order, we might still end up with lots of elves using the same value S. For instance, if the coprimes of \(L\) were all located in the bottom 5% and top 5% of the numbers between \(0\) and \(L\), then 90% of the time we’d have an expensive hunt through a range barren of coprimes, and end up with the first coprime in the top 5%. Intuitively this seems unlikely as the coprimes will be evenly distributed, especially given the pretty picture here. But can we prove it? I’m still thinking about this one. The maximum length of a sequence of integers smaller than \(N\) and non-coprime to \(N\) is given by Jacobsthal’s function (or more accurately the Jacobsthal function less one), and seems to be no more than about \(2 \log (N)\), and the average sequence length in practice much smaller – far smaller than \(\log \log(N)\). Suggestions on how to prove this appreciated (see Note 1 below).

An alternative route (which is, I think, slower, but given a true random number generator would be fine), would be to simply pick \(S \in \mathbb{Z^+}\) in the range \(0 \lt S \lt L\), and repeat until \(\gcd (S, L) = 1\). As there are \(\phi(L)\) coprimes to \(L\) among the \(L-1\) candidates, this will succeed with probability
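To get a feel for how long such barren stretches are in practice, the following sketch brute-forces the longest run of consecutive values in \(1 .. N-1\) that share a factor with \(N\). The names gcd_ul and max_noncoprime_run are made up for illustration, and the linear scan is only feasible for small \(N\).

```c
#include <assert.h>

/* Greatest common divisor by the Euclidean algorithm */
unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0)
    {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Longest run of consecutive integers in 1..N-1 that are NOT coprime to N.
 * As 1 and N-1 are always coprime to N, runs never wrap around the ends.
 */
unsigned long max_noncoprime_run(unsigned long N)
{
    unsigned long longest = 0;
    unsigned long current = 0;
    unsigned long s;

    assert(N >= 2);

    for (s = 1; s < N; s++)
    {
        if (gcd_ul(s, N) == 1)
            current = 0;            /* a coprime ends the current run */
        else if (++current > longest)
            longest = current;      /* run extended; remember the longest */
    }

    return longest;
}
```

For \(N = 30\) this gives 5 (the runs 2..6 and 24..28), and for \(N = 210\) it gives 9 (for instance 2..10), which is the Jacobsthal function less one in both cases.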

\(p = \frac {\phi(L)} {L-1}\)

which means one can expect success in:

\(E(n) = \frac 1 {p} = \frac {L-1} {\phi(L)}\)

iterations. Where \(L\) is large, \(L \approx L-1\), hence we can use the simpler estimate:

\(p = \frac {\phi(L)} {L}\)

which leads us to

\(E(n) \leq {e^{\gamma}\log \log L}\)

We know for values of lockers smaller than \(2^{256}\), we would expect fewer than 11 iterations (as per the above calculation).
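
As a rough check on that figure (dropping the error term in the bound, so treat the numbers as approximate), take \(L = 2^{256}\):

\(\log L = 256 \log 2 \approx 177.4\), so \(\log \log L \approx 5.18\)

\(e^{\gamma} \log \log L \approx 1.781 \times 5.18 \approx 9.2\)

That corresponds to \(p\) being no smaller than about \(1/9.2 \approx 0.108\). Assuming the picks are independent, the chance that \(k\) of them in a row all fail is then at most about \((1-p)^k \approx 0.892^k\), which for \(k = 100\) is roughly \(10^{-5}\).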

So, for those worried about my unproven number theory, an alternative algorithm is set out below, which bails out if it can’t find a coprime step after 100 attempts (which should happen less than 0.002% of the time even with \(2^{256}\) lockers, though it makes me feel safer in case of random number generator problems) and falls back to using a step of 1. In pseudo-C again:

#include <stdlib.h>

int gcd(int a, int b);            /* Euclidean algorithm, defined elsewhere */
int placeCakeIfEmpty(int locker); /* returns 0 on success */

/* Assumes L >= 2 */
int placeCake(int L)
{
    const int maxsteps = 100;
    int s = 1; /* guaranteed to be coprime */
    int n;
    int c;

    /* Repeat until we get a step coprime with L
     * Bail out after maxsteps steps and just use 1.
     * Probability analysis says this should almost
     * never happen, and the result is marginally
     * less efficient placement
     */
    for (c = 0; c < maxsteps; c++)
    {
        /* Pick a random step between 1 and L-1 inclusive */
        n = 1 + rand() % (L-1);
        if (gcd (n, L) == 1)
        {
            s = n; /* use this if it's coprime */
            break;
        }
    }

    /* now we know the step is coprime to L */

    /* pick a random first locker */
    n = rand() % L;

    /* L iterations covers all lockers as gcd(L,s)=1 */
    for (c = 0; c < L; c++)
    {
        if (!placeCakeIfEmpty(n))  /* returns 0 on success */
            return n;
        n = (n+s) % L;
    }

    /* every locker was full */
    return -1;
}

 


Note 1: A late edit. It appears that if \(j(N)\) is the Jacobsthal function, then the upper bound:

\(j(N) \ll (\log N)^2\)

was proved by: H. Iwaniec (On the problem of Jacobsthal, Demonstratio Math. 11, 225–231, (1978) – seemingly unavailable online). So it would appear the hunt for a coprime in my first solution (given in the previous post) is time bounded, by a polynomial in \(\log N\). Bear in mind this is a worst-case bound, whereas \(e^{\gamma} \log \log L\) bounds the expected number of iterations of the second routine, so the two figures are not directly comparable, and the bound alone does not show that the first routine is always faster.

Here’s my answer to the elves, cakes and gremlins puzzle. I’m quite sure this is not the only solution, and it may well not be the best solution, but I believe it works.

I think the key to this puzzle is criterion 3: “Assuming there is at least one locker that remains empty throughout an elf’s attempt to find a free locker, it must find that empty locker in no more than L attempts”. Effectively this means that each elf has to check each locker for an empty one, and check each locker at most once. However, if each elf checks the lockers in the same order (or a cyclic permutation thereof), we are going to get the bunching problem mentioned in the puzzle.

My solution is for each elf to pick a locker at random, check whether it is empty, and if not to pick another locker, proceeding through the lockers in a randomly chosen permutation. However, constructing a truly random permutation from a large set of lockers tends to be memory intensive. We don’t require every permutation to be equally likely; we merely require that there is little chance of collision with the permutations chosen by other elves (either simultaneously or before).

One simple solution to this is as follows. When the elf receives a cake to place, he should first determine a number of steps to advance \(S\) which is a randomly chosen number between \(1\) and \(L-1\) that is coprime with \(L\). The elf should then pick a random locker (assuming the lockers are numbered \(0 .. L-1\)), and if the locker has a cake in, move on by \(S\) steps (using modular arithmetic, i.e. if he reaches the end of the lockers, starting from the beginning). As \(S\) and \(L\) are coprime, this will ensure each locker is visited exactly once.

Why? Because after \(n\) steps the elf has moved on by \(nS\) lockers. For the elf to check the same locker twice, \(nS\) would have to be divisible by \(L\). But the smallest multiple of \(S\) divisible by \(L\) is \(LS\) as \(L\) and \(S\) are coprime, so this cannot happen until \(n = L\). We can thus deduce that he’ll visit each locker exactly once. Another way to look at this is that the integers from \(0 .. L-1\), under addition modulo \(L\), form the cyclic abelian group \(\mathbb{Z}/L\mathbb{Z}\).

If

\(C_n = \left \langle {g} \right \rangle\) is the cyclic group of order \(n\) which is generated by \(g\), then:

let

\(a \in C_n: a = g^i\)

and let

\(H = \left \langle {a} \right \rangle\)

i.e. \(H\) is the subgroup of \(C_n\) generated by \(g^i\). Then:

\(\left|{H}\right| = \frac n {\gcd \left\{{n, i}\right\}}\) where gcd is the greatest common divisor.

Hence where \(n\) and \(i\) are coprime, the order of the subgroup is equal to the order of the group.
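This is easy to check empirically for small values. The sketch below (function names are illustrative, not from any library) walks the lockers from locker 0 in steps of s and counts how many distinct lockers it reaches before arriving back at the start; by the result above the count should be \(L / \gcd(L, s)\), and therefore \(L\) exactly when the step is coprime to \(L\).

```c
#include <assert.h>

/* Greatest common divisor by the Euclidean algorithm */
int gcd_int(int a, int b)
{
    while (b != 0)
    {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Number of distinct lockers reached from locker 0 when repeatedly
 * stepping by s modulo L, i.e. the order of the subgroup generated by s.
 * Assumes 2 <= L and 1 <= s < L, with L small enough that n+s cannot overflow.
 */
int lockers_visited(int L, int s)
{
    int n = 0;
    int count = 0;

    assert(L >= 2 && s >= 1 && s < L);

    do
    {
        count++;            /* we are standing at a locker not seen before */
        n = (n + s) % L;    /* move on by s lockers, wrapping round */
    } while (n != 0);       /* back at the start: the cycle is complete */

    return count;
}
```

With 12 lockers, a step of 5 reaches all 12, whereas a step of 8 only ever reaches 3 of them (0, 8 and 4), which is why the step has to be coprime to the number of lockers.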

The remaining task is for the elf to find a value of \(S\) which is coprime to \(L\) and is between \(1\) and \(L-1\). This can easily be done by picking a random number between \(1\) and \(L-1\), and incrementing by one (modulo \(L-1\)), until the \(\gcd\) of \(S\) and \(L\) is \(1\) (i.e. they are coprime). The \(\gcd\) can easily be calculated using the Euclidean algorithm. With a little bit of handwaving (properly addressed in my next post), we can note that the probability of two numbers being coprime is \(6/\pi^2\), so it will find such a coprime pretty quickly, and there are many such coprimes for a large \(L\), certainly many more than the number of elves if \(L \gg E\).

This gives the algorithm as follows (in pseudo-C), which is pleasingly simple:

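#include <stdlib.h>

int gcd(int a, int b);            /* Euclidean algorithm, defined elsewhere */
int placeCakeIfEmpty(int locker); /* returns 0 on success */

/* Assumes L >= 2 */
int placeCake(int L)
{
    int s;
    int n;
    int c;

    /* First pick a random step between 1 and L-1 (inclusive)*/
    s = 1 + rand() % (L-1);

    /* while loop guaranteed to terminate as gcd(L-1,L)=1 */
    while (gcd(s,L) != 1)
    {
        /* move on by one, staying within 1 .. L-1 */
        s = 1 + (s % (L-1));
    }

    /* now we know the step is coprime to L */

    /* pick a random first locker */
    n = rand() % L;

    /* L iterations covers all lockers as gcd(L,s)=1 */
    for (c = 0; c < L; c++)
    {
        if (!placeCakeIfEmpty(n))  /* returns 0 on success */
            return n;
        n = (n+s) % L;
    }

    /* every locker was full */
    return -1;
}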

I’ve done another post on why finding a coprime step is fast, and why there are lots of them.

A while ago, following a line fault, I wrote about why BT TotalCare is a total waste of money. I’m sad to say, following another line fault, the situation has not improved. This time, they fixed the fault in what I suppose is a reasonable period of time – that, you would have thought, was the hard part. But their customer communication (which you would have thought was the easy bit) was dreadful.

Once again, here’s the promise from BT’s own web site on TotalCare (with my emphasis):

Have you considered how much it would cost your business if your Business Phone Lines went down even for a short time? In lost customers, lost revenue, customer dis-satisfaction? … BT will respond within 4 hours of receiving your fault report and if the fault is not cleared during this period, we will advise you of progress.

And from another BT site (again my emphasis):

We guarantee to resolve a “Service Failure” in line with the care/service level you have chosen. For Total Care, this means 24 Hours after you report the fault, unless you have requested specific appointment date. The Total Care working week is 7 Days a week, 52 weeks per year.

What actually happened is:

  • I reported the fault at 17:55 yesterday (and, to be clear, the woman who took the call was helpful).
  • BT’s web based fault tracker never indicated any change until the fault was fixed, but suggested I rang BT for an update.
  • When I rang BT, I was effectively told off, and told (wrongly) that this was a 24 hour response service, not a 4 hour response service.
  • I received only one update from BT, and that was 16 hours and 50 minutes after the fault was reported, and was out of date.

A little digging suggests that whilst the 24 hour ‘guarantee’ is encoded in the contractual documentation (albeit with more holes in than a colander), the four hour response ‘promise’ is nowhere to be found. So, potential buyers of BT TotalCare, beware:

  • The marketing material might say you get a 4 hour response time, but contractually this appears to mean nothing (corrections from BT welcome).
  • As far as I can tell on any recent fault I’ve had, nothing substantive has ever happened out of business hours, even when the fault is at the exchange.
  • Don’t think you will actually get any useful fault updates from BT at all. That’s not to say they won’t fix the line (they did, and within the 24 hours, unlike last time when they claimed they couldn’t work in the dark), just in their own good time. I mean it’s not as if you’re paying extra for this, is it? Oh. Yes you are.

If it wasn’t for the fact BT owns very nearly all the copper infrastructure in the UK, I’d change providers. As it is, I’m stuck with them.

I’ve started keeping diaries of BT faults (obsessional? me?), so anyone really interested in the gory details can click on ‘Page 2’ below.

Last week, Flexiant posted a whitepaper I had written, proclaiming that private cloud was ultimately dead. Jeremiah Dooley (who works in the office of the CTO at VCE) posted a thoughtful riposte yesterday. Firstly, I’d like to thank Jeremiah for taking the time to read the paper before commenting (not everyone did that), and most of all for taking the time to reply. I said on Twitter I would answer, so here it is.

Apologies for the length, but some of the criticisms raised are down to the fact the original paper was condensed to make it readable by those with short attention spans, and that necessarily means some points got lost.

Purpose of Paper

Let’s put this one to rest first. Whilst (hard as it might be to believe given the consequent Twitter storm) this wasn’t originally written by me as a marketing piece but as a ranty blog post, I should be clear that it was subsequently adapted by our marketing department to form one. Adapting in this case means making it less long winded (see the length of this blog post for why this is necessary) and easier to digest (which means taking some material out), and making it look pretty. It is marketing collateral. Its audience is service providers, and specifically people in service providers who are selling to their potential customers. That’s not the same thing as an academic paper or a paper written for a cloud conference.

So, is this just a sales pitch for Flexiant? Well, no – not least as it doesn’t once suggest anyone buys anything. We’ve found whilst all service providers have got the message ‘cloud is hot’, many are ill equipped to sell it, as they often don’t understand the issues and the objections that their customers raise. Their customers’ objections to cloud technology are similar in many cases to objections we saw to virtualisation and objections to outsourcing; whilst some are valid, more often than not at least some are based more on fear, self-protection and specious argument than solid facts. We’re a European company and our customers are predominantly European, so perhaps that applies more here than in the US (though I’ve had plenty of US people agree with me on this point). So what the white paper is designed to do is to provide ammunition to service providers who need to make their case for service provider run clouds, rather than solutions that keep IT in the enterprise. I’d say that’s because we believe the long-term future of IT is in the Service Provider, not the organisation’s own data center; I explain why in the article, and not one of the reasons I give is that ‘Flexiant’s software is great’. I can understand why, given Flexiant sells Service Provider focused software, that might come over as self-serving, but when I load up VCE’s home page, the ‘title’ element says ‘Private Cloud Computing’, so perhaps we all have our biases.

Am I doing anything radically new here? Are my terms particularly controversial? I don’t think so. Here’s an article by Simon Wardley using yet another definition of private and public cloud. And here’s one saying private clouds are a transitional phase whereas public clouds are the long term economic model. Perhaps ‘private clouds are ultimately dead’ is just a bit of plain speaking too far, but it means the same thing. I could look for other sources, but having sat through Simon’s (by his own admission often rather similar) presentations sufficient times without a single person arguing, I’m guessing his views are hardly heretical.

Pet Peeves

Jeremiah doesn’t like having to register to read papers on the web. You know what? Neither do I. In fact I’ll go further: I hate it. I hate it enough that 99% of the time it will stop me reading them. I also hate emails not written in plain text and wrapped at 76 characters, top posted replies, and my iPhone for making it hard for me to avoid either. But I’m not the audience, and this is marketing collateral. Our marketing guys like measuring the effectiveness and distribution of these materials, and asking for registration is hardly unknown – for instance one of VCE’s shareholders here. Given the number of cartoon characters we now appear to know have both memorable phone numbers and a surprising interest in cloud, and given we emailed it to anyone who wanted it, there would appear not to be that high a bar to work around for those who don’t want to give personal details.

And as far as PDFs opening full screen goes, I’m almost sure that’s a product of the web browser. On Safari and Chrome the default (for all PDFs) is to load them in browser. This infuriates me, so I turned it off. All PDF downloads (including Flexiant’s) now appear as downloads.

Anyway, this reply is in badly formatted HTML on my personal website, unadulterated by marketing, and with no registration required, so I am thus hoping it is peeve free.

Definitions

Jeremiah notes that I didn’t use the NIST definitions. I have a few nits to pick with the NIST definitions (in particular the use of the phrase ‘general public’ as opposed to ‘customers in general’), but none that are particularly serious. So let’s have a look at the differences between NIST and what I used, so we can ascertain whether this makes any practical difference:

NIST says:

Private cloud. The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

Community cloud. The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.

Public cloud. The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.

Hybrid cloud. The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

I wrote:

Public cloud – a bank of cloud resources managed by a service provider, located in its datacenter and shared between customers.

Private cloud – a bank of cloud resources managed by an organisation, and used only by that organisation. It is irrelevant whether it is ‘on premises’ or not. This isn’t a distinction of who owns what building, but of who manages what. Just as the organisation might not own its head office building, the physical datacenter might belong to someone else, but the management and operation is done by the organisation concerned.

Hybrid cloud – a particularly ill-defined term that has many definitions. In general it means something on the spectrum between public and private clouds.

So, there are three main differences:

  1. I don’t mention community clouds. I think they are special cases of public clouds, just with a limited ‘public’. Given how little attention these get these days, I think this is a forgivable omission, and Jeremiah doesn’t pick me up on this one.
  2. Our definitions of private cloud are different. I say a private cloud is managed by the organisation to whom the resources are dedicated. NIST say it can be managed by anyone. More on this one immediately below.
  3. I say ‘hybrid cloud’ is an ill-defined term. I stand by that, and more on this under ‘Hybrid Clouds’ below.

Does who manages a private cloud matter?

So let’s look at point 2. My core argument re private clouds is that save in the case of the largest of clouds, the economics of public clouds are likely to be better. Those economics are likely in most cases to outweigh other factors, particularly as many of those other factors are less relevant than they might first appear, or are spurious. The reason why I believe the economics of public clouds are superior to private clouds are derived from scale, and scale comes in the main from multi-tenancy. I could perhaps have said ‘big clouds are better than small clouds’.

So, if by ‘manage’, we are looking solely at the economics of the hardware, I agree it doesn’t much matter who manages it. I agree that a private cloud doesn’t gain some sort of magic technical efficiency because the people with console access are employed by someone other than the organisation using it. However, management also includes the organisational resources to set the cloud up and to provide ongoing support. Those aren’t skills every organisation has, and there are economies of scale to be found here too. But I’d argue in the absence of any form of multi-tenancy, those economies are in general small.

The distinction I was making in my definitions was that it doesn’t matter whether the private cloud is on-premise or not. Put simply, putting your stuff in someone else’s datacenter does not make it either a cloud, or necessarily more efficient. Or (with the intended audience in mind) ‘don’t go selling datacenter space to people and pretend that you’re selling them cloud’. Something more than the location of the equipment needs to change whether it is scale (through multi-tenancy) or something else (perhaps management).

I agree that there is a ‘blackspot’ between my definitions of public and private cloud, which is that a cloud dedicated to one organisation and where no resources are shared (i.e. is entirely single tenant) but which is managed by another organisation doesn’t fit into any category; such a cloud falls between two stools. NIST would call this a private cloud. Jeremiah says my definition is ‘self-serving’. I agree that there are some limited economies to be gleaned from having a single tenant cloud managed by a third party, and I agree that I ignored this case, but I don’t see how this makes the argument self-serving. As Jeremiah says, ‘How an enterprise pays for or maintains ownership of those solutions doesn’t fundamentally change what is going on’. To the extent that’s correct, then a private cloud managed in-house and a private cloud managed by a service provider should be similar, so not a lot would appear to turn upon the fact I didn’t deal with the latter case.

That said, given not a lot turns on it, and given as an industry we spend too much time talking about definitions, it would have been preferable to use the NIST definition of private cloud and avoid the debate. If I revise the paper I shall use that instead.

Hybrid clouds

Jeremiah says:

The dismissal of the Hybrid classification is expected, but disappointing.  Being able to manage resources that have been purchased from public clouds alongside infrastructure that is dedicated is important.  Most of the hypervisor providers have been, or are moving towards tooling that can handle this kind of use case.  If you are a round peg, it’s easy to see the Hybrid use case as a square hole and move on, but customers and service providers are increasingly seeing a need for this.

So here we disagree. Yes, I do slate the classification of hybrid cloud, in the sense that I point out it is an ill-defined term. I don’t, however, slate hybrid clouds in general. In fact I say there are three models, two of which have their uses.

Re the definition, NIST says (ignoring community clouds) that hybrid cloud is a composition of private and public cloud infrastructures that remain unique entities but are bound together. I say the term is used to mean something on the spectrum between public and private cloud, and then later (starting on page 7) decompose this into three models, the first two of which fit directly into the NIST definition (a composition of public and private clouds, with the composition being done in different ways), and the third of which describes a single cloud where some resources are dedicated and some are shared. Whatever the NIST definition says, this usage (meaning a hybrid between a public and private cloud) is in use. I agree it’s confusing, and I make that point myself.

So, why do I draw a distinction between the first two hybrid cloud models? Simply because one makes sense, and the other is (in my opinion) likely to be a nonsense.

I have a suspicion that Jeremiah might actually agree with me here, as his statement ‘In the next paragraph the phrase “cloud-bursting” was used, and so I skipped the entire rest of the Hybrid Cloud section in protest’ rather suggests he thinks as little of it as I do. However, like it or not, we still hear people talking about it, and it thus needs to be addressed. I suspect skipping the rest of the section where I talk about the more useful types of hybrid cloud might explain why Jeremiah thinks I’m slating the whole thing.

So, what are these two models? In the first model, a given service component uses both private and public cloud simultaneously (at least under some circumstances) – for instance the service might ‘burst’ to use the cloud when running over a baseload capacity. In the second model, some services or service components are on a private cloud, some on a public cloud.

Let’s look at that diagrammatically, considering two services A and B, and colour representing whether the workload is performed on a public or a private cloud.

[Figure: Hybrid Cloud Models]

The first model’s problems are in essence that the very things that (by assumption) prevent a service going entirely on public cloud make it unsuitable to go on both public and private cloud. You also have data synchronisation issues if (like most compute tasks) each compute node cannot carry out its tasks completely independently. Or, to put it another way, introducing tens or hundreds of milliseconds of latency between some parts of an infrastructure and others may cause it to perform badly. As far as I’m concerned, apart from a few limited applications, this model – and cloud bursting in particular – is smoke and mirrors. Perhaps someone will pop up and prove me wrong on this, with an example of a significant enterprise application where cloud bursting using a hybrid cloud works better than putting the lot on public cloud. For now, I think it’s all hand-waving.

The second model doesn’t suffer from these problems. Some services or service components go on public cloud, some on private cloud. There’s no dynamic placement lunacy. I agree here common management is useful. But the model itself isn’t generating any additional returns beyond the returns you’re getting from putting one set of services on public cloud, and putting another on private cloud. And you’ve got a few extra challenges. For instance, you may have increased latency between the public and private elements. Assuming the two halves are closely connected, you’ve imported the security concerns of public cloud and consequent need to take additional precautions into the private cloud bit too. So whilst to an extent you have the best of both worlds, you can also have the worst. It’s not a panacea. What I don’t see it doing is producing substantial advantages beyond those of its constituent components. I do however, in the concluding paragraph to this section (the title to which has sadly gone awry) suggest this is a useful approach for enterprises to take, along with the third model (virtual private clouds).

On Commodity Clouds

Jeremiah says:

On the public cloud side, I would wager that the majority of the customers referenced in the GiagOM research regarding pricing being a driver were talking about AWS, not cloud services hosted by traditional service providers.  Using them as the bar, especially on cost, isn’t very useful.

I agree with this. However, that was by way of interesting introductory statistic rather than a basis for my entire argument. I am (and Flexiant is) a strong believer in the ability of service providers to differentiate on things other than price. Service and Service Level is one such area ripe for differentiation as I mention on page 10 – have you ever tried speaking to AWS on the phone? If I thought price was the only important factor, I’d conclude that only service providers approximating the size of AWS would have any significant market share (i.e. that there would be a very high concentration ratio, and Flexiant would have very few customers). However that is not what has happened in any many other service provider market, and not what I believe will happen.

Currently, as an industry we suffer from a problem where cloud products are poorly defined, not portable, and not fungible. Put simply, comparing them is an apples-and-oranges exercise. Lack of substitutability means we have poor price competition, which means excess profit for certain suppliers. We have one licensee with a cloud much smaller than AWS that prices their product significantly cheaper, sells a commodity product, and still makes money, and I know of at least one other provided (not using our software) who does similarly. Thus I’m pretty sure AWS operates at a very healthy margin.

So, given that I think price isn’t everything, why am I banging on about economic drivers? Two reasons: firstly, the law of diminishing returns, and secondly because I am talking about the long term, not the market situation now. On the first point, most enterprises are sub-scale. They will get very significant economic advantages by deploying their technology on a larger platform than the platform they themselves need. There are exceptions – for instance enterprises with IT requirements the size of a medium size service provider are likely to be able to attain a similar cost profile to that sort of service provider. But the larger the service provider gets, the lower the marginal economies of scale. I am betting a service provider half the size of AWS, going to the same hardware vendor, can purchase hardware within a fraction of the percentage point of cost difference. And re the second point, the current lack of comparability between cloud products will disappear as the market and the products within it are commoditised. That doesn’t mean every product will become the same, rather that it will become well defined and the buyer will be better able to select products that are similar or substitute effectively. And what that means is that where two products have similar characteristics, price will become an increasingly important factor.

Let’s pick a real world example from another industry segment: internet access pricing. When I started in that business in the early nineties, prices and products varied hugely. You could buy purportedly the same service for one price, or elsewhere for one tenth of that price. Knowing which to buy was difficult. Pricing was incomprehensible and illogical. Margins varied hugely (though I seem to remember most of them were negative). Now, there are still different types of internet access product, and those have different prices. But the price of a particular type is reasonable consistent across all suppliers.

We’re still in that first stage for cloud. I’ve heard several organisations say that right now they are building private clouds because ‘they can do it cheaper than Amazon’ even for comparatively small clouds. What they mean by this is ‘they can build it more cheaply than Amazon’s current output pricing’. I expect that for commodity cloud (i.e. the market AWS is aimed at) in time margin compression from competition will end that for clouds of such small scale. It’s at this stage that economic drivers will kick in. But the point about price comparability is equally applicable to the choice between similar products or solutions provided on public or private cloud infrastructures.

Let’s consider for a moment the alternative hypothesis: cloud will grow hugely, but despite this there will never be effective competition. This sounds desperately unlikely.

Public Cloud Drivers

Jeremiah says:

As an aside, I don’t agree with including “multi-tenancy” and “commodity” in a list of reasons why customers should choose a public cloud.  Both are things that customers will have to swallow hard and deal with in order to take advantage of the larger value, but I don’t think either is a selling point to the customer.

I agree. That’s why I didn’t list them as demand side drivers. They are supply side drivers (see top of page 5), in that the supplier wants them, because it reduces the cost of providing the services and thereby gives statistical gain. The customer may (or may not) benefit from consequent cheaper pricing.

Private Cloud Drivers

I’m less clear on the basis of Jeremiah’s objections here. Let’s take two things in particular.

Jeremiah writes:

There are TONS of valid reasons why the vast majority of enterprises of all sizes continue to maintain their own infrastructure, or pay a provider to maintain it on their behalf, and almost none of them include elasticity or utility billing.  I’d argue that very, very, few organizations, no matter what the size, do actual utility billing internally.

I’m not sure I understand the point here. I would have thought the point that the enterprise for many applications wants both elasticity (ability to flex resources) and utility billing (i.e. billing on a usage related OpEx basis rather than a CapEx basis) is relatively uncontroversial. Yes, I agree that they do not in general get this if they maintain their own infrastructure or pay someone to manage it on their behalf; that’s rather my point. The economic drivers that push people toward public cloud do not apply to the same extent to private cloud (whoever maintains it). Frankly I would have thought that was pretty uncontroversial. This does not mean there are not other factors that might militate in favour of private cloud solutions. But what it does mean is that the magic ‘cloud’ word does not deliver the same benefits regardless of deployment model.

And he continues:

Here’s a hint: it’s a completely different product, so there is different value realized and different costs involved.  Comparing a company that wants a dedicated infrastructure managed by CSC to a customer who wants to pull VMs from a public cloud provider and migrate their enterprise apps to that model is silly.  They don’t have the same drivers, they don’t have the same expectation of cost, they don’t want the same value.

I’m beginning to think part of the problem here is definitional. My comparison is not between AWS and CSC. It’s between the customer that wants a dedicated infrastructure (whether managed by CSC or the customer, that’s a private cloud) and the customer that wants a managed infrastructure provided by CSC which is not dedicated to one customer but rather to all of CSC’s customers, i.e. is multi-tenant. In my book the latter is public cloud. Under NIST’s terminology I think it is too, as it certainly doesn’t fit within the NIST private, hybrid or community cloud definitions.

Addressing Buyer Objections

Again, I think most of these are misunderstandings of what I wrote (or perhaps my failure to express them adequately).

He writes:

Objection 1 – Public cloud has inadequate SLAs – (paraphrased) “Sure they do, and even if they don’t you don’t really need an SLA anyway, but you need a terms of services contract.”  … VCE customers have, on average, 0.5 infrastructure incidents a year, leading to 83X better availability according to IDC, reducing productivity losses by more than $9,000/yr per 100 users.  There’s an implicit level of accountability with the internal IT teams responsible for those metrics (um, continued employment), but if I want that same level of accountability from AWS (2.0 incidents/year, 72X better availability) I’m asking the wrong questions?  Red herring at best.

Well, I’m not in a position to argue with IDC’s figures, so let’s take them at face value. If so, using VCE gives 83 times better availability than AWS (which is built on commodity hardware). No doubt it costs rather a lot more too. What we learn is that expensive hardware with built in redundancy has higher availability than cheap hardware that doesn’t. No surprises there.

But that’s a difference in technology, not a difference in deployment model. If VCE’s hardware were deployed in a public cloud, would its availability suddenly drop to the same as AWS? I’m guessing not.

What I argue is not that public clouds all have fantastic SLAs. Rather that you get what you pay for. If you want an enterprise-grade SLA (for some value of ‘enterprise-grade’), you aren’t going to get it from AWS. Perhaps you need to go to a cloud built using VCE’s hardware, or one with a different configuration. Undoubtedly, it will have a different price point. However, just because commodity public clouds often do not have such SLAs, that does not mean that the fact that a cloud is public prevents it from having a meaningful SLA. And an internally managed private cloud often has no SLA at all.

Jeremiah writes:

Objection 2 – Public cloud has inadequate security – (paraphrased) “Most problems here are actually your fault, and anything that IS an issue with public clouds is also an issue with private clouds, except public cloud providers are smarter than you.”, Well.  OK then.  Once again, the shitty definition of “private cloud” used plays into the failure of this statement to reflect reality, but this isn’t an actual rebuttal of the objection, it’s just pointing the finger somewhere else.:

Let’s quote in full what I actually said, as it’s shorter than Jeremiah’s comment.

Most security problems are not down to technical failings, but are instead due to poor organisational practice. For example, compare the number of security breaches originating in software bugs to those originating from configuration errors. Whilst cloud does present security challenges, many of these are common to private clouds. Service providers have in-house cloud-focused security expertise, whereas enterprises in general do not.

I’ll leave it to the reader to determine whether Jeremiah’s paraphrasing is a fair summary. As I say, cloud does present security challenges, and some (but not all) are common to all deployment models. Approaching private cloud as if it contains no security challenges would be foolhardy.

He writes:

Objection 3 – For regulatory reasons we cannot use public cloud – (paraphrased) “You don’t really know what those regulations mean.  Hybrid cloud may help here, just ignore that we ragged on it in the previous objection.  You should go get a deeper understanding of the fundamental regulatory oversight your company is in.”

Again, I don’t say what Jeremiah seems to think I said. Firstly, it was only one model of hybrid cloud I slated (though as Jeremiah skipped the section, he missed that). Secondly, I do not say the company doesn’t know what the regulations mean, I say “it is necessary for service providers to get a deep understanding of what the regulatory restrictions actually are, rather than accept the statement that regulation bans them at face value”. Knowing your customer’s industry, and the parameters within which it operates, is surely vital.

Conclusion

I pretty much stand by what I said in the original white paper. In retrospect it would have been better to use the NIST definitions straight, perhaps at the expense of some more words, but I don’t think it changes the thrust of the argument. Given this reply is already longer than the original white paper, and given its intended audience, there’s a limit to the amount of information I could reasonably put therein. However, thanks to Jeremiah for helping draw out these points.

 

Inspired by this Dilbert cartoon, for a bit of fun I decided to plot national debt as a percentage of GDP (from here, using IMF data) against the mathematical ability of 15-year-olds a while ago (from here). I kept it to European countries on the basis they would be at similar stages of development. Is poor mathematical competence in 2003 correlated with debt-ridden nations 8 years later as Dogbert suggests?

Well, the results are shown below. Unsurprisingly, the answer in general is ‘no’. However, the three countries with the highest levels of national debt as a percentage of GDP (Portugal, Italy and Greece) also have nearly the worst ranks of mathematical ability. So perhaps Dogbert just has a limited market of those who are really bad at mathematics.

In case it isn’t obvious, this is just a bit of fun, not a serious proposition. Remember kids, correlation does not imply causality.

Correlation of mathematical ability and national debt as a % of GDP

 

The raw data is below for anyone interested. Countries not in Europe or without data in either source are excluded.


Country  Debt as % GDP Upper Rank Lower Rank Avg Rank
Luxembourg  20.85 22 24 23
Sweden  37.44 15 19 17
Latvia  37.77 25 28 26.5
Czech Republic  41.46 12 17 14.5
Denmark  46.43 13 17 15
Serbia  47.89 32 34 33
Finland  48.56 1 4 2.5
Switzerland  48.65 6 12 9
Norway  49.61 21 24 22.5
Poland  55.39 22 26 24
Netherlands  66.23 2 7 4.5
Spain  68.47 25 28 26.5
Austria  72.20 16 20 18
Hungary  80.45 22 27 24.5
Germany  81.51 17 21 19
France  86.26 14 18 16
Belgium  98.51 5 10 7.5
Iceland  99.19 13 16 14.5
Ireland  104.95 17 21 19
Portugal  106.79 29 31 30
Italy  120.11 29 31 30
Greece  160.81 32 33 32.5
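
For anyone who would rather check the numbers than squint at the chart, here is a minimal sketch that computes a correlation coefficient from the table above. Using Pearson's coefficient on debt against average rank is my assumption for illustration; the chart itself was not produced this way, and the names DATA, pearson and top_three are just labels picked for this example. Remember that a larger rank number means worse at maths, so a positive coefficient would mean more debt goes with a worse rank.

```python
from math import sqrt

# (country, debt as % of GDP, average maths rank), copied from the table above.
DATA = [
    ("Luxembourg", 20.85, 23.0),
    ("Sweden", 37.44, 17.0),
    ("Latvia", 37.77, 26.5),
    ("Czech Republic", 41.46, 14.5),
    ("Denmark", 46.43, 15.0),
    ("Serbia", 47.89, 33.0),
    ("Finland", 48.56, 2.5),
    ("Switzerland", 48.65, 9.0),
    ("Norway", 49.61, 22.5),
    ("Poland", 55.39, 24.0),
    ("Netherlands", 66.23, 4.5),
    ("Spain", 68.47, 26.5),
    ("Austria", 72.20, 18.0),
    ("Hungary", 80.45, 24.5),
    ("Germany", 81.51, 19.0),
    ("France", 86.26, 16.0),
    ("Belgium", 98.51, 7.5),
    ("Iceland", 99.19, 14.5),
    ("Ireland", 104.95, 19.0),
    ("Portugal", 106.79, 30.0),
    ("Italy", 120.11, 30.0),
    ("Greece", 160.81, 32.5),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


debts = [debt for _, debt, _ in DATA]
ranks = [rank for _, _, rank in DATA]
r = pearson(debts, ranks)

# The three most indebted countries, most indebted first.
top_three = [name for name, _, _ in
             sorted(DATA, key=lambda row: row[1], reverse=True)[:3]]

print(f"Pearson r (debt vs average rank): {r:.2f}")
print("Most indebted:", ", ".join(top_three))
```

With only 22 countries, and ranks that are themselves ranges, any coefficient that comes out of this should be treated with the same seriousness as the rest of this post.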

You may remember that around this time last year, Nominet announced a misguided proposal for a cosy relationship with law enforcement agencies. I’ve blogged several times before about this (here, here, here and here), but last week their issue group came up with yet another revised set of draft recommendations that they will be proposing to the board for adoption.

Nominet unfortunately did not provide a version with tracked changes. I’ve highlighted the main differences in the document (and in my comments) in my response.

It won’t come as any great surprise to observers of this process that the same misguided principles remain. The changes from last time are mostly small incremental improvements. The main problem – and I find this somewhat difficult to believe given the number of iterations this document has been through now – is that it is not obvious how ‘connected’ your domain name has to be to an alleged criminal, in order to be subject to the policy; in simple terms there is nothing to stop an alleged criminal using a demon.co.uk address causing the suspension of demon.co.uk, or an employer having its domain suspended because of the unauthorised actions of an employee. On the positive side, the clause which apparently widened the process to include any crime at all has now gone, and they’ve made some further minor improvements to the process.

So, once again, they’ve improved the process and some of the safeguards, but have yet to address at least one key issue, together of course with the overarching point: why on earth should Nominet be doing this at all?

My full response is here.

This time we get eight days to reply (by 18 November) which I suppose is an improvement over three. I encourage you to submit your own response, by writing to policy@nominet.org.uk. I know some people have in the past just written ‘I agree with Alex’, which whilst better than nothing and mildly flattering, is probably not as effective as explaining why (at least in relation to the primary concerns). If, for instance, you were to state in your own words that you think the whole policy is a bad idea, but that you are particularly concerned that innocent registrants might be hit as the policy makes no express provision for minimising collateral damage (as in the demon.co.uk example above), that would probably be useful.
 


 
A brief epilogue for those few people interested in the minutiae of Nominet governance rather than the substance of the issue at hand:

In my original blog post I said: “I think the paper is written by SOCA rather by than Nominet, though that’s not clear.” The issue group page now says “The policy process was started with a proposal submitted by Serious and Organised Crime Agency (SOCA) regarding domain names used in connection with criminal activity”. I’m pretty sure that wasn’t on the issue group page from the beginning, and it’s a welcome clarification. However, you have to wonder why:

  • the paper has no mention of SOCA as an author at all, cryptically referring to “we” which is easily read to mean Nominet, especially as it mentions SOCA as a potential interested party along with others who did not author the paper; and
  • the paper is written in Nominet’s house font, with an author attribute of “Tania” (who might just be Nominet’s ever helpful Tania Baumann, who is no doubt very bored of me by now).

Nominet got a bit of a hysterical pasting when the proposals first came out (e.g. here), because (quite reasonably) the proposals were attributed to Nominet rather than to SOCA. It looked like Nominet were recommending them in their original form, rather than SOCA recommending them. And Nominet were not exactly quick to deny this. As I wrote elsewhere at the time, I put this down to a rather bumpy introduction of the new policy development process, rather than any particular evil doing at Nominet Towers (is that link right? – Ed). A learning point here is that if you do not want your consultation to look like a fait-accompli (and, to be fair to Nominet, it wasn’t – a fair amount has changed since the original set of recommendations even though I remain opposed to the principle) then make it quite clear who authored the original proposals, and that you do not (yet) have a view on them.

I’ve blogged several times before (here, here and here) about Nominet’s misguided proposal for a cosy relationship with law enforcement agencies. Their issue group has come up with a revised set of draft recommendations that they will be proposing to the board for adoption.

As per last time, the same misguided principles remain. There has, however, been a fair amount of improvement, particularly in respect of transparency, process, and an appeals process. However, it’s still (rather startlingly) not obvious how ‘connected’ your domain name has to be to an alleged criminal, in order to be subject to the policy; in simple terms there is nothing to stop an alleged criminal using a demon.co.uk address causing the suspension of demon.co.uk, or an employer having its domain suspended because of the unauthorised actions of an employee. And the Issue Group seems still undecided as to what alleged crimes should be covered, wavering from those that create ‘a clear risk of imminent serious harm to individuals’ to any crime at all (beware your employee who sends an email when driving).

In short, they’ve improved the process and some of the safeguards, but not addressed at least two key issues, and the overarching point: why on earth should Nominet be doing this at all?

My full response is here.

The email to tell you all about this came out this afternoon (17 October) describing a meeting held on 21 September. They want responses by 20 October, which I have to say is not up to Nominet’s normal standards in terms of reasonable consultation periods, but if you want to submit your own response, write to policy@nominet.org.uk. You’ll need to type quickly.

Forgive the attention-seeking title. I’ve blogged before (here and here) about Nominet’s misguided proposal for a rather too cosy relationship with law enforcement agencies (I say “Nominet’s” proposal, though it’s never been entirely clear whether the proposal emanated from SOCA, Nominet or both). Now their issue group has come up with a set of principles that they will be proposing to the board for adoption. As feared, these principles remain misguided, though to be fair to the authors there has been an attempt (albeit very limited) to address some of the concerns raised. My full response is here, but for those not wanting to read a five page PDF, I have summarised below how my four main arguments that Nominet should not suspend domain names allegedly associated with criminal activity have been addressed.

Firstly, that there are already more effective ways of dealing with this, such as going to registrars and use of the ‘investigation lock’. The proposed principles do not set out why existing mechanisms are inadequate. There is some attempt at limiting the ambit of the proposals to situations where other routes have been exhausted, but bizarrely this is only proposed where no serious harm is being caused. Why Nominet should even consider suspending domain names where no serious harm is being caused is beyond me.

Secondly, collateral damage. The proposals appear not to address this point at all. There is no proposed linkage between registrant, alleged criminal and domain name, so nothing to stop an alleged criminal using a demon.co.uk address causing the suspension of demon.co.uk, or an employer having its domain suspended because of the unauthorised actions of an employee.

Thirdly, civil liberties and due process. I accept that civil liberties need to be balanced against protection of the public from serious crime, and that some attempt has been made to address this balance. However, protection ‘of the public’ from counterfeit Ugg Boots (for instance) does not seem to me a particularly significant altar on which to sacrifice civil liberties. The policy should concentrate on crimes more serious than these.

Fourthly, mission creep. The proposals do not address this. Indeed, disappointingly, the mission creep has already started, with an issue group tasked to look at criminal activity already recommending to the board that it look at ‘civil and third party’ complaints too.

Conspicuously absent from the proposals is any cogent explanation of why Nominet should be doing this in the first place.

Nominet is in general quite good at listening to feedback, so I encourage you to submit some, as detailed here. Thankfully, not every romance ends in marriage.