Alex Bligh's blog


At Flexiant, we needed a general websocket-to-TCP proxy. A cute Apache module for extensible websocket programming has been developed by self.disconnect, and is available on github here. I’ve hacked around with this to produce a generic, Apache-licensed websocket proxy. A websocket connection is made over HTTP or HTTPS to Apache; this calls the websocket module, which in turn calls my module, which makes an outbound TCP connection.

Some features:

  • It supports base64 and text format sockets, so it can handle (for instance) VNC, which works using base64 encoding.
  • Configuration is supplied through normal apache directory based configuration (see details below).
  • It’s extensible (currently by changing the code, rather than calling additional modules)
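As a rough illustration of what the base64 mode amounts to, here is a Python sketch of the translation in each direction. It is not code from the module (which is written in C); the function names are made up for the example, and it assumes each websocket text frame carries one complete base64 string.

```python
import base64


def websocket_to_tcp(frame_text):
    """Decode one base64 text frame from the browser into raw bytes for the TCP side."""
    return base64.b64decode(frame_text, validate=True)


def tcp_to_websocket(chunk):
    """Encode raw bytes read from the TCP socket as a base64 text frame."""
    return base64.b64encode(chunk).decode("ascii")


# Example: the 12-byte RFB version greeting a VNC server sends first.
greeting = b"RFB 003.008\n"
frame = tcp_to_websocket(greeting)
round_trip = websocket_to_tcp(frame)
```

The point of the encoding is that arbitrary binary data (such as the VNC protocol) survives a channel that only carries text.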

The configuration section looks like this, in this case set up to proxy VNC using a novnc client:

<IfModule mod_websocket.c>
  <Location /vncproxy>
    SetHandler websocket-handler
    WebSocketHandler /usr/lib/apache2/modules/mod_websocket_tcp_proxy.so tcp_proxy_init
    WebSocketTcpProxyBase64 on
    WebSocketTcpProxyHost 192.168.199.199
    WebSocketTcpProxyPort 5900
    WebSocketTcpProxyProtocol base64
  </Location>
</IfModule>

I discovered that on a default apache install, you may also want to change the RequestReadTimeout, i.e.

<IfModule reqtimeout_module>
  RequestReadTimeout body=300,minrate=1
</IfModule>

I’m going to try and persuade self.disconnect to stick this in the examples directory for apache-websocket, but in the meantime the source is available here. Put it in the examples directory of apache-websocket, then compile, install and enable it with:

sudo apxs2 -c -i -a -I.. mod_websocket_tcp_proxy.c

I blogged here about adding SQL support to dhcpd. I’ve fixed a couple of issues – mainly memory leaks. I have no idea how I missed these before (sigh). The good news is that the server ran for sufficiently long for the process to get to many gigabytes in size, and nothing crashed!

You can find a new patch, against dhcpd 4.2.0-P1 (but should work against dhcpd 4.2.1-RC1) here.

See the original post here for build instructions and documentation.

I wrote here about the dangers of Registry operators, particularly Nominet, allowing suspension of domain names outside the due process of courts, to make matters ‘easier’ for law enforcement. One of the many reasons why this is a bad idea is that innocent bystanders can get caught in the crossfire. Now, exactly that has happened in the US as described here.

The Register alleges that the domain name ‘mooo.com’ was suspended as it was ‘seized’ by ‘Immigration and Customs Enforcement, the main investigative arm of the US Department of Homeland Security’, who then replaced the web sites hosted at ‘mooo.com’ with a banner including the text ‘Advertisement, distribution, transportation, receipt, and possession of child pornography constitute federal crimes that carry penalties for first time offenders of up to 30 years in federal prison, a $250,000 fine, forfeiture and restitution.’

It appears, however, that mooo.com (which today reports ‘no web site configured’) was actually some sort of shared free web hosting service. I’m obviously not in a position to say whether one or more users had hosted unlawful material there, but even if this is the case, it seems likely that a large number of innocent users were affected too. There does not seem to be any suggestion the operators of mooo.com were involved. The solution to this problem is not for the domain name in question to be suspended (particularly on no notice), with consequent collateral damage, but rather to go after the publishers of the unlawful material. I am not in any way suggesting child pornography is not a serious crime. Rather, I am suggesting it is the perpetrators who should be targeted, impact on innocent parties should be minimised, and that the authorities pursuing them should be subject to the rule of law. Particularly worrying in this case, if the Register is correct about the notice appearing on web sites of innocent people, is the likely inference drawn by others that these people were somehow involved in child pornography.

I am sure it is not unheard of for people to upload unlawful material to Facebook, Myspace, or other services reliant on a single domain name; I do not believe these services should get special protection just because the law enforcement community is likely to be more familiar with them.

I should point out that in this case what The Register describes as ‘secret court proceedings’ (in reality perhaps just an ex parte hearing) was involved. What Nominet and/or SOCA propose (it’s difficult to tell who is doing the proposing) is that even that safeguard would be dropped.

I blogged here about adding SQL support to dhcpd. I’m now reasonably happy with the patch, but have made a couple of tiny updates. These concern the documentation, and a bug which prevented the (probably useless) “name:” field from working.

You can find a new patch, again against dhcp 4.2.0, here.

See the original post here for build instructions and documentation.

I developed at Flexiant a utility called sparsecopy, which we use to copy around sparse files (and indeed to make non-sparse files sparse, and to copy sparse files to block devices). In essence, it’s a version of dd which supports sparse files better. It’s hidden deep within Flexiant’s web site, and as we released it under the GPLv2, I thought I’d put it up here too. That link will take you to a tarball, on which you can either do the normal make, make install, or build a Debian / Ubuntu package using debuild. It’s pretty self-explanatory, but here are the options:

Usage

sparsecopy [OPTIONS] SOURCE DEST

Options

-o, --overlay           Overlay existing file DEST (rather than truncating)
-n, --nocheck           Skip blank check
-q, --quiet             Quiet
-m, --max SIZE          Copy SIZE bytes maximum from SOURCE
-b, --blocksize SIZE    Use SIZE blocksize in bytes (default 512)
-s, --seek POS          Start writing SOURCE at position POS in DEST
-f, --finalsize SIZE    Set size of DEST to SIZE when done (no effect on block devices)
-c, --check             Check the first block in the destination is zero, else abort
-p, --progress          Display progress indicator on stderr
-h, --help              Display usage

Notes

Sparsecopy copies from a source file (either sparse or non-sparse) to a destination file, making the destination file sparse on the way. If the destination file already exists then the source file can be superimposed on top of it by using the -o switch (ignored if the destination file doesn’t exist); in this case sparsecopy will (unless -n is specified as well) check that any blocks to be overlaid with zero (i.e. made sparse) are already zero in the destination image. When used with block devices, sparsecopy will not affect the size of the device.
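To give a feel for the basic idea, here is a minimal Python sketch of a copy that makes the destination sparse. It is not sparsecopy's own code and leaves out overlay mode, the blank check and block devices; the function name, file names and test payload are invented for the example. Whether holes really end up unallocated depends on the filesystem.

```python
import os
import tempfile


def sparse_copy(src_path, dst_path, blocksize=512):
    """Copy src_path to dst_path, seeking over all-zero blocks instead of writing them."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(blocksize)
            if not block:
                break
            if block.count(0) == len(block):
                # Blank block: move the write position forward and leave a hole.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
            copied += len(block)
        # A file that ends in a hole still needs its full length set.
        dst.truncate(copied)
    return copied


# Demonstration on a temporary file: data, a run of zeros, more data, trailing zeros.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.img")
dest = os.path.join(workdir, "dest.img")
payload = b"a" * 512 + bytes(4096) + b"b" * 100 + bytes(1000)
with open(source, "wb") as f:
    f.write(payload)

sparse_copy(source, dest)
with open(dest, "rb") as f:
    result = f.read()
```

Reading the destination back gives the same bytes as the source, because holes read as zeros; only the space used on disk differs.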

SOURCE can be "-" to indicate stdin. SOURCE or DEST can be raw devices if you wish. If you want to make a sparse file, either use /dev/zero for SOURCE and -m, or (quicker) use /dev/null for SOURCE and -f.

SIZE and POS can be specified in blocks (default), or use the following suffixes:

B: Bytes (2^0 bytes)
K: Kilobytes (2^10 bytes)
M: Megabytes (2^20 bytes)
G: Gigabytes (2^30 bytes)
T: Terabytes (2^40 bytes)
P: Petabytes (2^50 bytes)
E: Exabytes (2^60 bytes)

Note that blocksize=1024 will set the blocksize to 1024 blocks of 512 bytes each (use 1024B if this is not what you mean). Also note that disk capacity is often measured using decimal megabytes etc.; we do not adopt this convention, for compatibility with dd.
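The arithmetic can be sketched in Python as follows. This is an illustration of the rules as described above, not the parser sparsecopy actually uses; `parse_size` is a made-up name, and the sketch assumes a bare number always means 512-byte blocks and that suffixes are upper case.

```python
BLOCK = 512  # assumed size in bytes of the default unit, one block

# Power of two for each suffix listed above.
SUFFIXES = {"B": 0, "K": 10, "M": 20, "G": 30, "T": 40, "P": 50, "E": 60}


def parse_size(text):
    """Return a byte count for strings such as '1024', '1024B' or '4G'."""
    suffix = text[-1:]
    if suffix in SUFFIXES:
        return int(text[:-1]) * (1 << SUFFIXES[suffix])
    # No suffix: the number is a count of blocks.
    return int(text) * BLOCK


in_blocks = parse_size("1024")   # 1024 blocks of 512 bytes
in_bytes = parse_size("1024B")   # exactly 1024 bytes
```

So a plain 1024 is 512 times larger than 1024B, which is the trap the note above warns about.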

I run Office 2004 on OS-X (the last one before the horrible ribbon update). Autoupdate has now been running for 10 minutes on a high performance Mac. It knows it has updates to install, but it is “Searching for programs to update”. The program to update is (unsurprisingly) Office 2004, and it is (unsurprisingly) installed in Applications. It’s presumably easy to detect that, as double clicking a Word file opens the right version of Word etc. But no, Microsoft has apparently determined it is necessary to search every folder on every volume, of which there are a huge number on my system. Even more amusingly (for those with time to spare), it has 4 updates to install, so searches the disk 4 times. No other updater is this brain-dead.

It’s a pity using Pages and Numbers is not a viable alternative for anyone wanting to exchange documents. Keynote is fantastic though.

Nominet, the UK internet domain registry, has recently consulted on proposals to allow it to suspend domain names ‘used in connection with criminal activity’. This is a fundamentally misguided proposal, which should be resisted. Details of the proposal, such as they are, can be found here; I think the paper is written by SOCA rather than by Nominet, though that’s not clear.

For a bit of background, I co-founded Nominet in 1996, and served on the board until 2007. I have no role there now, but I still have a tag so I can register domains and take an interest in what is going on.

The idea of having a big red switch to cut off criminals as soon as the police discover who they are is, of course, superficially attractive. However, this proposal has four serious problems: firstly, that there are far more effective ways to achieve the desired result; secondly, that of collateral damage; thirdly, that of justice, civil liberties and the respect of due process; and fourthly, that of mission creep.

With a few exceptions (for which it would be possible to apply a different rule) domain names are not registered by the end customer directly with Nominet. The end customer goes to one of over 3,000 registrars. The registrar might be an ISP, or it might be a web hosting company, or someone else. The registrar is likely to have a far better idea who the customer is than Nominet. For instance, the registrar will have billing details, and details of the IP address used to make the registration. If a domain name is allegedly being used for criminal activity, the registrar is in a far better position to track the alleged miscreant down. The registrar also has a function called ‘investigation lock’ which can disable the domain name. They are also in a far better position to tell the police (subject to the appropriate data protection requirements, of course) who is using the domain name, and what for. In short, the registrar already has all the tools necessary to carry out the functions the police might reasonably require, and more. If it is the case that some registrars are ineffective in this role, perhaps there is a case for imposing such cooperation as a clause in the registrar contract. However, I think it more likely that some registrars, quite reasonably, don’t like cutting their customers off on a mere ‘request’ from the police, not least as they are likely to lose the customer if the police happen to be wrong. It seems to me having someone such as the registrar who is hesitant to cut a domain off unless due process has been followed (rather than taking the easy option of suspension) is a good thing.

Next we have the possibility of collateral damage. I have no figures to prove this, but I suspect most domain names are not used solely by one person. They might be used for a household, a business, or even customers of a business (as is the case, for instance, with demon.co.uk). Unlike the registrar, who can often tell whether a domain name is used for the sole purpose of one web site and one email account, Nominet cannot tell to what use a domain name is being put. Let us assume for the purposes of this paragraph that the police are justified in their belief that a domain name is being ‘used in connection with criminal activity’. Cutting off that domain name could have far wider effects than might be appreciated. Firstly, it might not be the perpetrator’s own domain name: as reported at great length, whether through computer viruses, weak passwords, or other means, it is not uncommon for someone’s email account to get taken over, and for this to be used for sending unsolicited mail. This itself can be an offence, but often unsolicited mail is used for fraudulent purposes (419 scams etc). Cutting off the domain name not only doesn’t solve the problem, but punishes a victim. Even if the perpetrator has some connection to the domain name registrant, it often isn’t the registrant itself. If my employee threatens someone unlawfully using their work email, should the business’s domain name be suspended? If one of Demon Internet’s customers hosted illegal content (and statistics suggest this should read ‘when’, rather than ‘if’), should all their customers lose connectivity? If my child illegally downloads material from the internet, should I lose my domain name? The answer to each of these is transparently ‘no’, but this is the power SOCA propose to give to the police.

Thirdly, we have justice and civil liberties. I have a number of concerns here. Firstly, these are individuals who the police reasonably believe may be involved in criminal activity. By assumption, they have not been charged, let alone been tried. The police can already get a warrant or a court order to which Nominet will respond under its current policy. One must therefore assume that the proposed change in policy is one where the police have neither a warrant nor a court order; one has to ask “why not?” if the situation is sufficiently serious to merit a domain name being cut off. The powers of the police should be proportionate to the likely danger to society. As currently phrased, a domain name could be cut off for relatively trivial offences, such as failing to identify the legal name of the operator of a trading web site (thank you Adrian Kennard), storing a single item of data outside the remit of the Data Protection Act, or any number of other minor offences. The phrase ‘used in connection with criminal activity’ is particularly worrying: emails from an accused to their lawyer are presumably ‘used in connection with criminal activity’. At the very least, this power should be restricted to serious arrestable offences, where evidence would be sufficient to make an arrest were the registrant identified, where the suspension of the domain name will in the reasonable opinion of the police prevent the commission or continuance of the offence, and where this matter has been certified by a senior officer. However, if that test was satisfied, I suspect there would be no difficulty in obtaining the appropriate court order or bench warrant, which makes me think this proposal is meant to be used in other cases.

Fourthly, we have the question of mission creep. The internet is littered with all sorts of unpleasant material. It shares this quality with libraries and public playhouses, and as the Lord Chancellor found out over several hundred years, regulating what is printed or performed to exclude the unpleasant is in the most part doomed to failure. The internet has permitted a whole range of crimes to be committed in innovative ways, just as did electricity, the telegraph, the telephone and mobile phones before it. I am quite sure that with the advent of each invention, well meaning police officers suggested that if they reasonably suspected electricity, or a telephone, or whatever, to be utilised in connection with a crime, they would like the power to ensure it is cut off. No one would say those were reasonable; they would say “go to a court, and argue your case”. I fail to see why the internet is different in this respect. Nominet appears to be engaged in a process of making some false distinction here. A year or two ago, Nominet embarked on the vital task of ridding the world of the web sites of counterfeiters of Ugg Boots. Deep in the small print of the resultant public relations snowstorm, it became evident Nominet suspended or cancelled the domain names merely for not having genuine contact details. I have no problem with this, aside from the way the PR was used to make it look like Nominet was there to protect the great British public from marauding purveyors of ersatz fashion, when it is in fact a domain registry. Less well known is the allegation that at least one of the domain names concerned was not in fact associated with a criminal, but with someone who appears to have been advertising genuine goods (and thus at worst a civil rather than a criminal matter). If nothing else, this should have taught Nominet that playing internet policeman is an area to avoid.

So please, Nominet, think again on this one. This really isn’t a good idea.

Nominet is currently consulting on the issue, so I urge people to respond similarly to policy@nominet.org.uk, including the title “Dealing with domain names used in connection with criminal activity” in the subject line. Feel free to link to this post, or include text from it (with attribution and/or a link here) elsewhere.

I wanted to integrate SQL support direct into dhcpd for various reasons, so I wrote a patch. This integrates ISC dhcpd 4.2.0 with libdbi, and thus with MySQL, PostgreSQL, and any other SQL database supported by libdbi. This allows dhcpd to dynamically reference an SQL database based on hardware address.

Note that in general doing SQL lookups from dhcpd is a bad idea. The sort of environment where this is useful is the same sort of environment where the LDAP patch is useful, except that there is no LDAP around but there is an SQL database. An arbitrary SQL query can be specified, which can return anything that can be put in a ‘host {...}’ block.

The patch, against dhcpd 4.2.0 release, can be found here (that link was updated on 2 Dec 2010 – the old version is here).

The patch was heavily inspired by the LDAP patch – thank you ISC, Ntelos Inc, Brian Masney & David Cantrell.

So far the patch has had very light testing indeed. Do not use it in a production environment. It should (unlike, I think, the LDAP patch) work fine with IPv6, though I’ve only tested it on IPv4.

To compile:

  • Apply the patch here to the unpacked ISC dhcp source tree.
  • Regenerate the configure script (requires GNU autoconf and automake):
    aclocal
    libtoolize --copy --force
    autoconf
    autoheader
    automake --foreign --add-missing --copy
  • Run ‘./configure’ with the ‘--with-dbi’ argument to enable DBI.
  • Run ‘make’ to build ISC dhcp.

Documentation is in the patch, but I have provided an outline below.

My main purpose in posting this here (and similarly to dhcp-users) is to see whether it is useful to anyone other than me. I don’t know much about the innards of dhcpd so I have probably mucked up the memory management (though this was in general taken from ldap.c).

This file provides a generalised interface to libdbi. The following
configuration elements can be set (all are strings):
   dbi-host:     host on which database resides
   dbi-driver:   name of driver (e.g. mysql)
   dbi-username: database username
   dbi-password: database password
   dbi-dbname:   name of database
   dbi-query:    database query

The file allows dynamic configuration of dhcp parameters looked up by
hardware address. The user can specify a statement (likely to be SELECT
in an SQL environment) which returns data providing the dhcp parameters
associated with that particular hardware address.

The query is the SELECT statement passed to the SQL backend, into which
the following are substituted:
    %h : the hardware address
    %t : the media type
    %% : a percent sign

The query does not need a trailing semicolon. Be careful that quotes
in the query do not interfere with quotes in the config file.
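The substitution rules can be sketched in Python like this. It illustrates the rules as listed, not the patch's C code: `expand_query` is a made-up name, the colon-separated hardware address and the media type value are assumed formats, and passing unknown % sequences through unchanged is an assumption too.

```python
def expand_query(template, hwaddr, media_type):
    """Expand %h, %t and %% in a query template in a single left-to-right pass."""
    replacements = {"h": hwaddr, "t": media_type, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        nxt = template[i + 1:i + 2]
        if ch == "%" and nxt in replacements:
            out.append(replacements[nxt])
            i += 2
        else:
            # Ordinary character (or an unrecognised % sequence): copy as-is.
            out.append(ch)
            i += 1
    return "".join(out)


query = expand_query(
    "SELECT ipv4 AS 'fixed-address' FROM dhcp WHERE haddr = '%h'",
    "00:16:3e:00:00:01",  # illustrative hardware address
    "1",                  # illustrative media type
)
```

Doing it in one pass matters: with chained search-and-replace, a template containing %%h could end up with the hardware address substituted where a literal %h was wanted.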

Columns returned are processed in order, with column names corresponding
to entries in the file. The values in the first row are appended after a
" " and terminated with a ";".

For example, if the table 'dhcp' contained columns haddr, ipv4,
droute corresponding to hardware address, IPv4 address and
default routes for IPv4,  the following query might be used:

    SELECT ipv4 AS 'fixed-address', droute AS 'option routers'
           FROM dhcp WHERE haddr = '%h'

If ipv4 were returned as 192.200.0.15, and droute as 192.200.0.1, this
would generate a host block equivalent to:
    host dummyhostname {
         fixed-address 192.200.0.15;
         option routers 192.200.0.1;
    };

The following column names have special significance:
    name:      the name of the host block (should be irrelevant)
    entry:     a string to be copied verbatim into the configuration
               (including the relevant semi-colon)

An example of the use of 'entry' on MySQL is as follows, and has
the same effect as the above example:
    SELECT CONCAT('fixed-address ',ipv4,'; ',
                  'option routers ',droute,';') AS 'entry'
           FROM dhcp WHERE haddr = '%h'

This allows for more flexible (if less legible) queries, and avoids
problems with databases which are unhappy returning column names
containing non-alphanumeric characters.
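To make the mapping from a result row to a host block concrete, here is a small Python sketch. It only mimics the behaviour described above by building text; the real patch works inside dhcpd in C, and `render_host_block` and the (column, value) row format are inventions for the example.

```python
def render_host_block(row, default_name="dummyhostname"):
    """Turn one result row, given as (column, value) pairs in order, into host block text."""
    name = default_name
    statements = []
    for column, value in row:
        if column == "name":
            # Special column: names the host block.
            name = value
        elif column == "entry":
            # Special column: copied verbatim, semicolons included.
            statements.append(value)
        else:
            # Ordinary column: column name, a space, the value, then ";".
            statements.append(column + " " + value + ";")
    body = "\n".join("    " + s for s in statements)
    return "host " + name + " {\n" + body + "\n}"


by_columns = render_host_block([
    ("fixed-address", "192.200.0.15"),
    ("option routers", "192.200.0.1"),
])
by_entry = render_host_block([
    ("entry", "fixed-address 192.200.0.15; option routers 192.200.0.1;"),
])
```

Both calls produce the same two statements, once on separate lines and once on a single line, which is why the two example queries have the same effect.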

I wrote here about how to build PV drivers for Linux using the unmodified_drivers directory from Xen. The problem with this method is that you need to Xenify an existing kernel. And although the process can be automated a little, it still requires someone to keep Xen patches in sync with the kernel. That’s hard work for someone, and certainly too hard work for me.

So I simplified things. By stripping out only the files that are actually used when building the PV drivers, and putting them into a normal kernel build configuration, I can now build PV drivers for a fully HVM kernel (i.e. a normal kernel), just by adding a patch. This has the great merit that you can use it with a kernel package from an arbitrary distribution.

To install, simply apply this patch to your kernel build tree, and build as normal.

Notes:

  • To turn it on, you need to switch on the relevant config file option (either CONFIG_XEN_PVDRIVERS=y or CONFIG_XEN_PVDRIVERS=m).
  • The main changes are the location of files.
  • I have imported a spaghetti of .h files. These are all included, one way or another. Most of them are pointlessly included. These could be pruned, but that would make them less easy to maintain.
  • Supporting building otherwise than as a module (CONFIG_XEN_PVDRIVERS=y) was non-trivial because the code did all sorts of strange and I believe unnecessary things when MODULE was not defined if CONFIG_XEN was false too. I suspect this may not have been tested in recent times. Rather than work out what it was doing, I changed MODULE in the source to another keyword which is always defined as true. Though this now works, it is less tested. It seems not to modprobe the vbd devices correctly; I suspect it may need a MOD_ALIAS_BLOCKDEV_MAJOR in or something.
  • I rewrote the Makefile and build system for the pvdrivers.
  • I think the remaining code changes boil down to renaming the function ctrl_alt_del(), whose name is also used elsewhere in the kernel, which broke non-modular compilation.
  • It will not work with CONFIG_XEN turned on. Whilst you don’t need CONFIG_XEN turned on for a fully HVM kernel, I can see why you might want a dual use kernel. The only reason it won’t work is namespace collisions. This is probably fixable at some cost to maintainability.

I was somewhat surprised to find that building Xen PV drivers for an otherwise unmodified Linux kernel is non-trivial.

The xensource hg repository contains a directory called unmodified_drivers. To be polite, the instructions for building these are woefully incomplete. Also, pulling down the entire xen source code to build a couple of drivers is a pain in the backside.

In essence, what you need to do is as follows:

  • Grab a recent kernel source from kernel.org. I used 2.6.32.11.
  • Find a set of patches to xenify it (I used Andrew Lyon’s Gentoo patches).
  • Configure the kernel in non-Xen mode (i.e. a normal kernel, with the options you would want for full HVM).
  • Build the kernel.
  • Build the PV drivers.

I have simplified this process as follows:

  • I wrote a little script to patch the kernel (more accurately, I adopted and adapted a perfectly useful script from Boris Derzhavets).
  • I pulled the unmodified_drivers directory out of the rest of the Xen source tree, fixed a bug in a Makefile, wrote a build script, and put it in its own .tgz.

So now what you need to do is as follows (I’m building on Ubuntu Lucid, but it doesn’t much matter):

wget \
  http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.32.11.tar.bz2
tar xvjf linux-2.6.32.11.tar.bz2
cd linux-2.6.32.11
wget http://www.alex.org.uk/pvdrivers-0.01.tgz
tar xvzf pvdrivers-0.01.tgz
wget \
  http://gentoo-xen-kernel.googlecode.com/files/xen-patches-2.6.32-1.tar.bz2
pvdrivers-0.01/xenify.sh xen-patches-2.6.32-1.tar.bz2
# Use a normal HVM config *NOT* a Xen config
cp /boot/config-`uname -r` .config
# Open menuconfig, then exit straight away
make menuconfig
# Accept the defaults when prompted
make localmodconfig
make -j 9
cd pvdrivers-0.01
./build.sh

And that should be it. In case you are wondering where pvdrivers-0.01.tgz came from, here’s roughly how you make it:

mkdir pvdrivers
cd pvdrivers
hg clone http://xenbits.xen.org/xen-4.0-testing.hg
cd xen-4.0-testing.hg
# Fix duff makefile
perl -pi -e 's/ vbd.o$/ vbd.o vcd.o/' \
   unmodified_drivers/linux-2.6/blkfront/Kbuild
# Remove the unneeded Xen Source code
ls -1 | egrep -v 'unmodified|COPYING' | xargs rm -rf
rm -rf .hg
cd ..

You’ll then need to add the two shell scripts I wrote (or your own equivalent). The main surprises here are:

  • The XL environment variable needs to point to a kernel source tree with a configured and built xenified kernel in, but one which has not been configured for Xen.
  • The XEN environment variable does not need to point to a Xen source tree. Instead it needs to point to include/xen within the Linux tree.

The 3.3-testing PV drivers do not build (though it’s an easy fix: one of the hypervisor includes has moved). However, PV drivers built using this package (from 4.0-testing) seem to work just fine with older versions of Xen, so there’s not a lot of point in bothering.