# Alex Bligh's blog

Alex Bligh's personal blog

## Copying sparse files

At Flexiant I developed a utility called sparsecopy, which we use to copy sparse files around (and indeed to make non-sparse files sparse, and to copy sparse files to block devices). In essence, it’s a version of dd which supports sparse files better. It’s hidden deep within Flexiant’s web site, and as we released it under the GPLv2, I thought I’d put it up here too. That link will take you to a tarball, on which you can either do the normal make, make install, or build a Debian / Ubuntu package using debuild. It’s pretty self-explanatory, but here are the options:

### Usage

sparsecopy [OPTIONS] SOURCE DEST

### Options

-o, --overlay

Overlay existing file DEST (rather than truncating)

-n, --nocheck

Skip blank check

-q, --quiet

Quiet

-m, --max SIZE

Copy SIZE bytes maximum from SOURCE

-b, --blocksize SIZE

Use SIZE blocksize in bytes (default 512)

-s, --seek POS

Start writing SOURCE at position POS in DEST

-f, --finalsize SIZE

Set size of DEST to SIZE when done (no effect on block devices)

-c, --check

Check the first block in the destination is zero, else abort

-p, --progress

Display progress indicator on stderr

-h, --help

Display usage

### Notes

Sparsecopy copies from a source file (either sparse or non-sparse) to a destination file, making the destination file sparse on the way. If the destination file already exists then the source file can be superimposed on top of it by using the -o switch (ignored if the destination file doesn’t exist); in this case sparsecopy will (unless -n is specified as well) check that any blocks to be overlaid with zero (i.e. made sparse) are already zero in the destination image. When used with block devices, sparsecopy will not affect the size of the device.

SOURCE can be "-" to indicate stdin. SOURCE or DEST can be raw devices if you wish. If you want to make a sparse file, either use /dev/zero for SOURCE and -m, or (quicker) use /dev/null for SOURCE and -f.
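The basic idea of making the destination sparse on the way can be sketched in a few lines of Python. This is only an illustration of the technique, not the sparsecopy source: the function name `sparse_copy` and its arguments are made up for the example, and it ignores the overlay, seek and block device cases.

```python
import os


def sparse_copy(src_path, dst_path, blocksize=512):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero = bytes(blocksize)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(blocksize)
            if not block:
                break
            if block == zero[:len(block)]:
                # Leave a hole: move the write position on without writing.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
        # A trailing hole only becomes part of the file once the size is set.
        dst.truncate(dst.tell())
```

Whether the skipped ranges really occupy no disk space depends on the filesystem; the contents read back the same either way.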

SIZE and POS can be specified in blocks (default), or use the following suffixes:

B: Bytes (2^0 bytes)
K: Kilobytes (2^10 bytes)
M: Megabytes (2^20 bytes)
G: Gigabytes (2^30 bytes)
T: Terabytes (2^40 bytes)
P: Petabytes (2^50 bytes)
E: Exabytes (2^60 bytes)

Note that blocksize=1024 will set blocksize to 1024 blocks of 512 bytes, i.e. 524288 bytes (use 1024B if this is not what you mean). Also note that disk capacity is often measured using decimal megabytes etc.; we do not adopt this convention for compatibility with dd.
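To make the unit rules concrete, here is a rough Python sketch of how a SIZE or POS argument maps to bytes. It is not taken from the sparsecopy source: `parse_size` is a hypothetical helper, and it assumes the default unit is a 512-byte block, as the note above implies.

```python
# Power-of-two exponent for each suffix listed above.
SUFFIXES = {"B": 0, "K": 10, "M": 20, "G": 30, "T": 40, "P": 50, "E": 60}
BLOCK = 512  # assumed size in bytes of the default "block" unit


def parse_size(text):
    """Return the number of bytes described by a SIZE or POS argument."""
    suffix = text[-1:]
    if suffix in SUFFIXES:
        return int(text[:-1]) << SUFFIXES[suffix]
    # No suffix: the value is a count of 512-byte blocks.
    return int(text) * BLOCK
```

So `parse_size("1024")` gives 524288, whereas `parse_size("1024B")` gives 1024.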

## Microsoft Office Autoupdate on OS-X

I run Office 2004 on OS-X (the last one before the horrible ribbon update). Autoupdate has now been running for 10 minutes on a high performance Mac. It knows it has updates to install, but it is “Searching for programs to update”. The program to update is (unsurprisingly) Office 2004, and it is (unsurprisingly) installed in Applications. It’s presumably easy to detect that, as double clicking a Word file opens the right version of Word etc. But no, Microsoft has apparently determined it is necessary to search every folder on every volume which is a huge amount on my system. Even more amusingly (for those with time to spare), it has 4 updates to install, so searches the disk 4 times. No other updater is this brain-dead.

It’s a pity using Pages and Numbers is not a viable alternative for anyone wanting to exchange documents. Keynote is fantastic though.

## Domain Names and Criminals

Nominet, the UK internet domain registry, has recently consulted on proposals to allow it to suspend domain names ‘used in connection with criminal activity’. This is a fundamentally misguided proposal, which should be resisted. Details of the proposal, such as there are, can be found here; I think the paper is written by SOCA rather than by Nominet, though that’s not clear.

For a bit of background, I co-founded Nominet in 1996, and served on the board until 2007. I have no role there now, but I still have a tag so I can register domains and take an interest in what is going on.

The idea of having a big red switch to cut off criminals as soon as the police discover who they are is, of course, superficially attractive. However, this proposal has four serious problems: firstly, that there are far more effective ways to achieve the desired result; secondly, that of collateral damage; thirdly, that of justice, civil liberties and the respect of due process; and fourthly, that of mission creep.

With a few exceptions (for which it would be possible to apply a different rule) domain names are not registered by the end customer directly with Nominet. The end customer goes to one of over 3,000 registrars. The registrar might be an ISP, or it might be a web hosting company, or someone else. The registrar is likely to have a far better idea who the customer is than Nominet. For instance, the registrar will have billing details, and details of the IP address used to make the registration. If a domain name is allegedly being used for criminal activity, the registrar is in a far better position to track the alleged miscreant down. The registrar also has a function called ‘investigation lock’ which can disable the domain name. They are also in a far better position to tell the police (subject to the appropriate data protection requirements, of course) who is using the domain name, and what for. In short, the registrar already has all the tools necessary to carry out the functions the police might reasonably require, and more. If it is the case that some registrars are ineffective in this role, perhaps there is a case for imposing such cooperation as a clause in the registrar contract. However, I think it more likely that some registrars, quite reasonably, don’t like cutting their customers off on a mere ‘request’ from the police, not least as they are likely to lose the customer if the police happen to be wrong. It seems to me having someone such as the registrar who is hesitant to cut a domain off unless due process has been followed (rather than taking the easy option of suspension) is a good thing.

Next we have the possibility of collateral damage. I have no figures to prove this, but I suspect most domain names are not used solely by one person. They might be used for a household, a business, or even customers of a business (as is the case, for instance, with demon.co.uk). Unlike the registrar, who can often tell whether a domain name is used for the sole purpose of one web site and one email account, Nominet cannot tell to what use a domain name is being put. Let us assume for the purposes of this paragraph that the police are justified in their belief that a domain name is being ‘used in connection with criminal activity’. Cutting off that domain name could have far wider consequences than might be appreciated. Firstly, it might not be the perpetrator’s own domain name: as reported at great length, whether through computer viruses, weak passwords, or other means, it is not uncommon for someone’s email account to get taken over, and for this to be used for sending unsolicited mail. This itself can be an offence, but often unsolicited mail is used for fraudulent purposes (419 scams etc). Cutting off the domain name not only doesn’t solve the problem, but punishes a victim. Even if the perpetrator has some connection to the domain name registrant, it often isn’t the registrant itself. If my employee threatens someone unlawfully using their work email, should the business’s domain name be suspended? If one of Demon Internet’s customers hosted illegal content (and statistics suggest this should read ‘when’, rather than ‘if’), should all their customers lose connectivity? If my child illegally downloads material from the internet, should I lose my domain name? The answer to each of these is transparently ‘no’, but this is the power SOCA propose to give to the police.

Thirdly, we have justice and civil liberties. I have a number of concerns here. Firstly, these are individuals who the police reasonably believe may be involved in criminal activity. By assumption, they have not been charged, let alone been tried. The police can already get a warrant or a court order to which Nominet will respond under its current policy. One must therefore assume that the proposed change in policy is one where the police have neither a warrant nor a court order; one has to ask “why not?” if the situation is sufficiently serious to merit a domain name being cut off. The powers of the police should be proportionate to the likely danger to society. As currently phrased, a domain name could be cut off for relatively trivial offences, such as failing to identify the legal name of the operator of a trading web site (thank you Adrian Kennard), storing a single item of data outside the remit of the Data Protection Act, or any number of other minor offences. The phrase ‘used in connection with criminal activity’ is particularly worrying: emails from an accused to their lawyer are presumably ‘used in connection with criminal activity’. At the very least, this power should be restricted to serious arrestable offences, where evidence would be sufficient to make an arrest were the registrant identified, where the suspension of the domain name will in the reasonable opinion of the police prevent the commission or continuance of the offence, and where this matter has been certified by a senior officer. However, if that test were satisfied, I suspect there would be no difficulty in obtaining the appropriate court order or bench warrant, which makes me think this proposal is meant to be used in other cases.

Fourthly, we have the question of mission creep. The internet is littered with all sorts of unpleasant material. It shares this quality with libraries and public playhouses, and as the Lord Chancellor found out over several hundred years, regulating what is printed or performed to exclude the unpleasant is in the most part doomed to failure. The internet has permitted a whole range of crimes to be committed in innovative ways, just as did electricity, the telegraph, the telephone and mobile phones before it. I am quite sure that with the advent of each invention, well meaning police officers suggested that if they reasonably suspected electricity, or a telephone, or whatever, to be utilised in connection with a crime, they would like the power to ensure it is cut off. No one would say those were reasonable; they would say “go to a court, and argue your case”. I fail to see why the internet is different in this respect. Nominet appears to be engaged in a process of making some false distinction here. A year or two ago, Nominet embarked on the vital task of ridding the world of the web sites of counterfeiters of Ugg Boots. Deep in the small print of the resultant public relations snowstorm, it became evident Nominet suspended or cancelled the domain names merely for not having genuine contact details. I have no problem with this, aside from the way the PR was used to make it look like Nominet was there to protect the great British public from marauding purveyors of ersatz fashion, when it is in fact a domain registry. Less well known is the allegation that at least one of the domain names concerned was not in fact associated with a criminal, but with someone who appears to have been advertising genuine goods (and thus at worst a civil rather than a criminal matter). If nothing else, this should have taught Nominet that playing internet policeman is an area to avoid.

So please, Nominet, think again on this one. This really isn’t a good idea.

Nominet is currently consulting on the issue, so I urge people to respond similarly to policy@nominet.org.uk, including the title “Dealing with domain names used in connection with criminal activity” in the subject line. Feel free to link to this post, or include text from it (with attribution and/or a link here) elsewhere.

## BT TotalCare is a total waste of money

Around 15:15 today, one of my ISDN2 lines I use for voice, and the analogue line I use for DSL (and thus internet access) went down. The ISDN2 line is just dead, the voice line has such a loud buzz on it it’s unusable (occasionally accepts DTMF digits) and the DSL suffers accordingly. Both are meant to be on TotalCare, but according to BT the ISDN2 lines aren’t (it may have got missed off when I moved to OneBill earlier). Both faults were reported to BT between around 15:15 and 15:30.

BT claim they cannot work on the analogue fault ‘because it’s dark’. Apparently health and safety prevents them from working in the dark. This is not in the wilds of nowhere. This is in a street with street lights in London. Presumably BT also have lights. A quick consultation with this useful site suggests BT can avoid work in London save between around 8am and 4pm in the winter. In Glasgow, take an hour off that. Who knows how little time BT can work for in Aberdeen.

Now, I don’t particularly have a problem with people not working if it is in fact dangerous to do so. I can see climbing up a pole might be dangerous. However, every single fault I’ve had with these lines, to my recollection, has been a local loop fault. The last time (several years ago), I remember a chap turning up at 10pm. I’m pretty sure it was dark then. Has it suddenly become more dangerous to work in the dark?

Moreover, here’s the blurb from BT’s own web site on TotalCare:

Have you considered how much it would cost your business if your Business Phone Lines went down even for a short time? In lost customers, lost revenue, customer dis-satisfaction? … BT will respond within 4 hours of receiving your fault report and if the fault is not cleared during this period, we will advise you of progress.

And from here:

We guarantee to resolve a “Service Failure” in line with the care/service level you have chosen. For Total Care, this means 24 Hours after you report the fault, unless you have requested specific appointment date. The Total Care working week is 7 Days a week, 52 weeks per year.

Now, I don’t see anything here about not working in the dark. I also don’t think “well, it’s still dark” sounds like advice on progress.

The ISDN fault is even more comical: that, they say they will look at on Friday 26th. Assuming this means working hours (and avoiding those pesky periods where the sun isn’t shining) they’ll only start looking at it over 40 hours after I’ve reported the fault. And the expected repair time is Friday PM (let’s hope before 4pm, or we’ll be plunged into darkness followed by weekend). Fortunately I have 2 ISDN circuits so this is less disastrous for me than it would be for some people. I suppose waiting 16 hours rather than 40 hours before BT look at a fault is, in some senses, an improvement.

OpenReach faults used to be quite good. But this is frankly appalling.

Update: 24 hours after reporting the fault, they still hadn’t started looking at either of them. And despite two promises given when I rang in to escalate the faults and ring me back, I heard nothing. I eventually got hold of a supervisor (who, to be fair, did ring me back) who claimed the inaction was due to ‘snow and unforeseeable events’. Well, it’s not snowing in London (and the BT region is pretty small), and darkness has to be one of life’s more foreseeable features. Apparently they might look at the analogue line tonight and perhaps get around to the ISDN tomorrow some time. Clearly I have neither been ‘advised of progress’ nor have they met their ‘guarantee’.

## Adding SQL support to ISC dhcpd

I wanted to integrate SQL support direct into dhcpd for various reasons, so I wrote a patch. This integrates ISC dhcpd 4.2.0 with libdbi, and thus into MySQL, postgres, and any other SQL database supported by libdbi. This allows dhcpd to dynamically reference an SQL database based on hardware address.

Note that in general doing SQL lookups from dhcpd is a bad idea. The sort of environment where this is useful is the sort of environment where the LDAP patch is useful, where there is no LDAP around but there is an SQL database. An arbitrary SQL query can be specified, which can return anything that can be put in a ‘host {...}‘ block.

The patch, against dhcpd 4.2.0 release, can be found here (that link was updated on 2 Dec 2010 – the old version is here).

The patch was heavily inspired by the LDAP patch – thank you ISC, Ntelos Inc, Brian Masney & David Cantrell.

So far the patch has had very light testing indeed. Do not use it in a production environment. It should (unlike, I think, the LDAP patch) work fine with IPv6, though I’ve only tested it on IPv4.

To compile:

• Apply the patch here to the unpacked ISC dhcp source tree.
• Regenerate the configure script (requires GNU autoconf and automake):
aclocal
libtoolize --copy --force
autoconf

• Run ‘./configure‘ with the ‘--with-dbi‘ argument to enable DBI.
• Run ‘make‘ to build ISC dhcp.

Documentation is in the patch, but I have provided an outline below.

My main purpose in posting this here (and similarly to dhcp-users) is to see whether it is useful to anyone other than me. I don’t know much about the innards of dhcpd so I have probably mucked up the memory management (though this was in general taken from ldap.c).

This file provides a generalised interface to libdbi. The following
configuration elements can be set (all are strings):
dbi-host:     host on which database resides
dbi-driver:   name of driver (e.g. mysql)
dbi-dbname:   name of database
dbi-query:    database query

The file allows dynamic configuration of dhcp parameters looked up by
hardware address. The user can specify a statement (likely to be SELECT
in an SQL environment) which returns data providing the dhcp parameters
associated with that particular hardware address.

The query is the SELECT statement passed to the SQL backend, into which
the following are substituted:
%h : the hardware address
%t : the media type
%% : a percent sign

The query does not need a trailing semicolon. Be careful that quotes
in the query do not interfere with quotes in the config file.

Columns returned are processed in order, with column names corresponding
to entries in the file. The values in the first row are appended after a
" " and terminated with a ";".

For example, if the table 'dhcp' contained columns haddr, ipv4 and
droute, holding the hardware address, IPv4 address and default routes
for IPv4 respectively, the following query might be used:

SELECT ipv4 AS 'fixed-address', droute AS 'option routers'
FROM dhcp WHERE haddr = '%h'

If ipv4 were returned as 192.200.0.15, and droute as 192.200.0.1, this
would generate a host block equivalent to:
host dummyhostname {
fixed-address 192.200.0.15;
option routers 192.200.0.1;
};

The following column names have special significance:
name:      the name of the host block (should be irrelevant)
entry:     a string to be copied verbatim into the configuration
(including the relevant semi-colon)

An example of the use of 'entry' on MySQL is as follows, and has
the same effect as the above example:
SELECT CONCAT('fixed-address ',ipv4,'; ',
'option routers ',droute,';') AS 'entry'
FROM dhcp WHERE haddr = '%h'

This allows for more flexible (if less legible) queries, and avoids
problems with databases which are unhappy returning column names
containing non-alphanumeric characters.
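As a rough illustration of the two steps described above (substituting into the query, then turning the first row into a host block), here is a small Python sketch. It is not the patch's C code: the function names are invented for the example, the textual form of the hardware address and media type is an assumption, and no SQL escaping is attempted.

```python
def expand_query(template, haddr, media_type):
    """Substitute %h, %t and %% into the configured query string."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            code = template[i + 1]
            if code == "h":
                out.append(haddr)
            elif code == "t":
                out.append(media_type)
            elif code == "%":
                out.append("%")
            else:
                # Unknown escape: pass it through unchanged.
                out.append(ch + code)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def host_block(row, default_name="dummyhostname"):
    """Build a host block from (column name, value) pairs, in column order."""
    name = default_name
    lines = []
    for column, value in row:
        if column == "name":
            name = value
        elif column == "entry":
            # Copied verbatim, so it must carry its own semicolon.
            lines.append(value)
        else:
            lines.append("%s %s;" % (column, value))
    body = "\n".join("  " + line for line in lines)
    return "host %s {\n%s\n}" % (name, body)
```

Feeding it the row from the first example, `[("fixed-address", "192.200.0.15"), ("option routers", "192.200.0.1")]`, produces a block with those two statements inside `host dummyhostname { ... }`.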


## Linux PV drivers for Xen HVM – building normally within an arbitrary kernel tree

I wrote here about how to build PV drivers for Linux using the unmodified_drivers directory from Xen. The problem with this method is that you need to Xenify an existing kernel. And although the process can be automated a little, it still requires someone to keep Xen patches in sync with the kernel. That’s hard work for someone, and certainly too hard work for me.

So I simplified things. By stripping out only the files that are actually used when building the PV drivers, and putting them into a normal kernel build configuration, I can now build PV drivers for a fully HVM kernel (i.e. a normal kernel), just by adding a patch. This has the great merit that you can use it with a kernel package from an arbitrary distribution.

To install, simply apply this patch to your kernel build tree, and build as normal.

Notes:

• To turn it on, you need to switch on the relevant config file option (either CONFIG_XEN_PVDRIVERS=y or CONFIG_XEN_PVDRIVERS=m).
• The main changes are the location of files.
• I have imported a spaghetti of .h files. These are all included, one way or another. Most of them are pointlessly included. These could be pruned, but that would make them less easy to maintain.
• Supporting building otherwise than as a module (CONFIG_XEN_PVDRIVERS=y) was non-trivial because the code did all sorts of strange and I believe unnecessary things when MODULE was not defined if CONFIG_XEN was false too. I suspect this may not have been tested in recent times. Rather than work out what it was doing, I changed MODULE in the source to another keyword which is always defined as true. Though this now works, it is less tested. It seems not to modprobe the vbd devices correctly; I suspect it may need a MODULE_ALIAS_BLOCKDEV_MAJOR in or something.
• I rewrote the Makefile and build system for the pvdrivers.
• I think the remaining code changes boil down to renaming the function ctrl_alt_del(), a name which is also used elsewhere in the kernel, which broke non-modular compilation.
• It will not work with CONFIG_XEN turned on. Whilst you don’t need CONFIG_XEN turned on for a fully HVM kernel, I can see why you might want a dual use kernel. The only reason it won’t work is namespace collisions. This is probably fixable at some cost to maintainability.

Today, I flew from London to Edinburgh, which is a fairly frequent trip for me, given Flexiant is in Livingston, and I live in London. I always try to fly BMI, on the basis that it’s not BA, which is the alternative.

Today, however, BMI experienced a complete logic failure in customer service. The London to Edinburgh configuration has 3 seats on each side. It appears to be a single class cabin – all the seat pitches are the same, and BMI has recently advertised it has gone single class.

I was seated in seat 5D, with seat 5E occupied by a rather large gentleman, and seat 5F also occupied. Rows 1 to 4 were almost completely unoccupied (I think a total of 6 people in 24 seats) and the whole right hand side of row 4 was entirely empty. Nowhere was anyone in rows 1 to 4 more than one person to 3 seats. So I slipped forward a row, and the gentleman in the center seat moved one sideways. You would have thought this would have helped everyone, but no.

Within about 10 seconds, I was accosted by a stewardess. Apparently rows 1 to 4 are “reserved for a different class” (of ticket, rather than of person, I assume). I pointed out that the seats were exactly the same – no difference in pitch, and no other passenger could claim to be inconvenienced or have less space as the row was completely empty. She then said she would be “serving differently” in that area, and couldn’t possibly have me there. “Serving differently” means offering people a free sandwich, rather than a paid for sandwich, a deal available as I understand it for silver Diamond Club card holders anyway. I didn’t care, I’d brought my own sandwich. But no, avoiding having to remember not to offer me a sandwich was preferable to having an argument with one passenger and inconveniencing three.

Whilst the volcanic ash had stopped the flights, I went by train instead. Far more comfortable, and a first class ticket cost little more than a standard class flight. And I got work done whilst traveling. Perhaps I shall do that in future.

## How to compile Xen pv drivers for Linux HVM on a modern kernel

I was somewhat surprised to find that this is non-trivial.

The xensource hg repository contains a directory called unmodified_drivers. To be polite, the instructions for building these are woefully incomplete. Also, pulling down the entire xen source code to build a couple of drivers is a pain in the backside.

In essence, what you need to do is as follows:

• Grab a recent kernel source from kernel.org. I used 2.6.32.11.
• Find a set of patches to xenify it (I used Andrew Lyon’s Gentoo patches).
• Configure the kernel in non-Xen mode (i.e. a normal kernel, with the options you would want for full HVM).
• Build the kernel.
• Build the PV drivers.

I have simplified this process as follows:

• I wrote a little script to patch the kernel (more accurately, I adopted and adapted a perfectly useful script from Boris Derzhavets).
• I pulled the unmodified_drivers directory out of the rest of the Xen source tree, fixed a bug in a Makefile, wrote a build script, and put it in its own .tgz.

So now what you need to do is as follows (I’m building on Ubuntu Lucid, but it doesn’t much matter):

wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.32.11.tar.bz2
tar xvjf linux-2.6.32.11.tar.bz2
cd linux-2.6.32.11
wget http://www.alex.org.uk/pvdrivers-0.01.tgz
tar xvzf pvdrivers-0.01.tgz
pvdrivers-0.01/xenify.sh xen-patches-2.6.32-1.tar.bz2
# Use a normal HVM config *NOT* a Xen config
cp /boot/config-$(uname -r) .config
(exit)
make localmodconfig
(accept defaults)
make -j 9
cd pvdrivers-0.01
./build.sh


And that should be it. In case you are wondering where pvdrivers-0.01.tgz came from, here’s roughly how you make it:

cd pvdrivers
hg clone http://xenbits.xen.org/xen-4.0-testing.hg
cd xen-4.0-testing.hg
# Fix duff makefile
perl -pi -e 's/ vbd.o$/ vbd.o vcd.o/' unmodified_drivers/linux-2.6/blkfront/Kbuild
# Remove the unneeded Xen Source code
ls -1 | egrep -v 'unmodified|COPYING' | xargs rm -rf
rm -rf .hg
cd ..


You’ll then need to add the two shell scripts I wrote (or your own equivalent). The main surprises here are:

• The XL environment variable needs to point to a kernel source tree with a configured and built xenified kernel in, but one which has not been configured for Xen.
• The XEN environment variable does not need to point to a Xen source tree. Instead it needs to point to include/xen within the Linux tree.

The 3.3-testing PV drivers do not build (though it’s an easy fix, one of the hypervisor includes has moved). However, PV drivers built using this package (from 4.0-testing) seem to work just fine with older versions of Xen, so there’s not a lot of point in bothering.