I had an interesting situation today where qemu-img create performed oddly when using one particular NFS filer. The symptoms were:

  • with -f qcow2 it worked as expected, and a 500G image takes approximately 1MB on disk
  • with -f qcow2 -o preallocation=metadata, the image took hundreds of gigabytes

This is not meant to happen. The files are meant to be sparse (i.e. have holes in them), and the preallocated metadata is pretty small. On every other NFS filer I’ve tried, and on every local filesystem I’ve tried, this works as expected.

I therefore needed to narrow the problem down. Armed with an strace of what qemu-img was actually doing, I built sparsetest, which creates a sparse file – the source is here.
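
I haven’t reproduced the whole tool here, but the heart of it is simple enough to sketch. The following is a simplified illustration, not the actual sparsetest source: the hard-coded sizes mirror the -b 4K -s100M -w1M run shown below, and the variable names are mine. It writes one small block of junk at the start of each 1MB stride, leaving holes in between, then sets the final logical size with ftruncate.

/* Simplified sketch of what sparsetest does (illustrative, not the real source). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const off_t  logical_size = 100 * 1024 * 1024;  /* -s 100M */
    const off_t  stride       = 1024 * 1024;        /* -w 1M   */
    const size_t block_size   = 4096;               /* -b 4K   */
    char buf[4096];

    if (argc < 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    memset(buf, 0xAA, sizeof(buf));                 /* junk, so it can't be elided */

    int fd = open(argv[1], O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* one 4K write at the start of every 1MB; the gaps should become holes */
    for (off_t off = 0; off + (off_t)block_size <= logical_size; off += stride)
        if (pwrite(fd, buf, block_size, off) != (ssize_t)block_size) {
            perror("pwrite"); return 1;
        }

    /* finally set the logical extent to exactly 100MB (the -i flag discussed
     * below does this before the writes instead) */
    if (ftruncate(fd, logical_size) < 0) { perror("ftruncate"); return 1; }

    close(fd);
    return 0;
}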

The results are pretty interesting. Here’s a normal ext4 volume.

$ ./sparsetest -b 4K -s100M -w1M test
Results:
  Intended logical size:       104857600 bytes;             100 M;           25600 blocks of 4096 bytes
  Optimum physical size:          409600 bytes;               0 M;             100 blocks of 4096 bytes
   Actual physical size:          409600 bytes;               0 M;             100 blocks of 4096 bytes

Used 100 writes of 4096 bytes every 1048576 bytes in ascending order
Created 800 512 byte blocks on disk
Density as % of actual physical size over logical size: 0.390625 %
Efficiency as % of optimum physical size over actual: 100.000000 %

What I’ve asked it to do there is write a file with a logical size of 100MB, writing 4K of random junk at the start of every 1MB of that file. Once the writes are done, it uses ftruncate to set the logical extent to exactly 100MB.

So the 100 4K sections of random junk are the only content in the file. The file gratifyingly occupies exactly 409,600 bytes on disk – precisely what it should, so it is 100% efficient at encoding the sparse nature of the file. And it’s 0.39% dense, i.e. only 0.39% of the logical space is reflected in physical space.
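
For what it’s worth, the physical figures in that output are the kind of thing you get from stat(2): st_blocks reports the number of 512-byte units actually allocated to the file, independent of the filesystem’s own block size, which is where the “800 512 byte blocks on disk” line comes from. A fragment along these lines (illustrative only, not the real reporting code; report() and its parameters are made up for the example) yields the density and efficiency percentages:

/* Illustrative only: deriving the physical-size, density and efficiency
 * figures from stat(2). st_blocks counts 512-byte units allocated on disk. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

static void report(const char *path, off_t logical, off_t optimum)
{
    struct stat st;
    if (stat(path, &st) < 0) { perror("stat"); return; }

    off_t physical = (off_t)st.st_blocks * 512;     /* bytes actually on disk */

    printf("physical: %lld bytes (%lld 512-byte blocks)\n",
           (long long)physical, (long long)st.st_blocks);
    printf("density:    %f %%\n", 100.0 * physical / logical);  /* physical / logical */
    printf("efficiency: %f %%\n", 100.0 * optimum / physical);  /* optimum / physical */
}

int main(int argc, char **argv)
{
    if (argc > 1)
        report(argv[1], 100 * 1024 * 1024, 100 * 4096);  /* matches the run above */
    return 0;
}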

I have a flag to set the logical extent of the file using ftruncate at the start rather than at the end. Unsurprisingly, this makes no difference here.

$ ./sparsetest -i -b 4K -s100M -w1M test
Results:
  Intended logical size:       104857600 bytes;             100 M;           25600 blocks of 4096 bytes
  Optimum physical size:          409600 bytes;               0 M;             100 blocks of 4096 bytes
   Actual physical size:          409600 bytes;               0 M;             100 blocks of 4096 bytes

Used 100 writes of 4096 bytes every 1048576 bytes in ascending order
Created 800 512 byte blocks on disk
Density as % of actual physical size over logical size: 0.390625 %
Efficiency as % of optimum physical size over actual: 100.000000 %

So, let’s see what happens on the filer in question:

$ ./sparsetest -b 4K -s100M -w1M /path/to/test
Results:
  Intended logical size:      104857600 bytes;            100 M;          25600 blocks of 4096 bytes
  Optimum physical size:         409600 bytes;              0 M;            100 blocks of 4096 bytes
   Actual physical size:      131252224 bytes;            125 M;          32044 blocks of 4096 bytes

Used 100 writes of 4096 bytes every 1048576 bytes in ascending order
Created 256352 512 byte blocks on disk
Density as % of actual physical size over logical size: 125.171875 %
Efficiency as % of optimum physical size over actual: 0.312071 %

Eek! My 100MB sparse file is no longer sparse. In fact it’s negatively sparse! It uses 125MB on disk (a density of 125%). And the efficiency is tiny (0.3%).

So out of interest, let’s run it calling ftruncate before writing the data to the file, so that the writing itself never expands the file.

$ ./sparsetest -i -b 4K -s100M -w1M /path/to/test
Results:
  Intended logical size:      104857600 bytes;            100 M;          25600 blocks of 4096 bytes
  Optimum physical size:          409600 bytes;             0 M;            100 blocks of 4096 bytes
   Actual physical size:          413696 bytes;             0 M;            101 blocks of 4096 bytes

Used 100 writes of 4096 bytes every 1048576 bytes in ascending order
Created 808 512 byte blocks on disk
Density as % of actual physical size over logical size: 0.394531 %
Efficiency as % of optimum physical size over actual: 99.009901 %

Well that’s pretty much normal.

So what’s happening here is that when a sparse file is expanded using ftruncate, the filer is fine. If the sparse file is expanded using pwrite at an offset beyond the end of the file, bad things happen: it would appear that the whole range by which the file is extended (or indeed more) is physically allocated to the file.
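
If you want to check for that behaviour in isolation, without the full tool, a single extending pwrite on an empty file is enough to show it. Here’s a minimal sketch (the file name, offset and sizes are just examples): on a well-behaved filesystem one 4K write should allocate roughly eight 512-byte blocks, however far past the end of the file it lands; a much larger st_blocks count suggests the gap has been allocated too.

/* Minimal check: does a single pwrite far beyond EOF allocate the gap? */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    char buf[4096];
    memset(buf, 0xAA, sizeof(buf));

    int fd = open(argv[1], O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* extend the empty file by writing one 4K block at the 99MB mark */
    if (pwrite(fd, buf, sizeof(buf), (off_t)99 * 1024 * 1024) != (ssize_t)sizeof(buf)) {
        perror("pwrite"); return 1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    printf("%lld 512-byte blocks allocated for one 4K write\n",
           (long long)st.st_blocks);

    close(fd);
    return 0;
}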

I suspect this may not be the filer vendor’s fault (I’m not naming them unless they want me to), but rather a product of the (Linux-based) underlying filesystem the filer uses (I don’t yet know what that is). I suspect it has something to do with how the extending write is journaled.

I’ve never seen this before. But if you want to test your filesystem’s treatment of sparse files, here’s some GPL code that will let you do it.