Copied directories don’t match in size

Q: When I copy files to another directory, the inode usage and the size of the copied directories are different. What is happening?

A: When you copy files from one filesystem to another you will almost *ALWAYS* have a different amount of space occupied as well as a different number of inodes used. The reasons are twofold: directories and sparse files. Let’s start with directories:

  • Directories: A directory is just a special file that holds filenames and their inode numbers. When a directory is first created it is 96 bytes in size (the exact starting size depends on the filesystem). As files are created inside it, the directory grows to accommodate the additional entries. However, if you create 10,000 files in the directory and then remove all but 1 of them, the directory will still be several kilobytes in size. The removed entries leave empty slots behind, and compacting the directory after every removal would cost far too much, so the directory is left as is, with the empty slots waiting to be reused.

    Now if you copy this directory to a new location with cp -r, tar, cpio or any other backup program, the new directory is created at 96 bytes and the one remaining file fits nicely in it. The occupied space shown by du or bdf will therefore differ between the original (bigger) and the copy (smaller). The result is perfectly OK, though.
  • Sparse files: A sparse file is created when a program writes a record, uses lseek to skip ahead, and then writes another record at position 1,000,000 (the millionth record). The resulting file contains 2 valid records and 999,998 records' worth of nulls. On the original system, the full length shows up in wc -c and ls -l, but the unwritten records (the "holes") are never allocated on disk, so they are not counted by bdf or du. Depending on the size of the file and its sparseness, the difference between apparent and actual size may be VERY large. For example, create your own sparse file with:
    dd if=/etc/issue of=/var/tmp/sparse bs=4096k seek=1

    where you will see that the original file is just a few dozen bytes and the result, measured with ls -l or wc -c, is a little over 4 MB, yet du shows it occupying only slightly more space than the original /etc/issue file. A cp of the file (with a cp that does not preserve holes; GNU cp, for example, tries to keep the copy sparse by default) will create a new file of the same size according to ls -l or wc -c, but du will now show a MUCH larger size than the original, because the nulls are written out as real disk blocks. However, the original and the copy compare identical with diff or cmp, and programs cannot tell any difference between the two files.
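
The directory behavior described in the first bullet can be tried with a short shell sketch. It assumes a POSIX shell and a scratch area under $TMPDIR (or /tmp); the dirdemo path and the f1 ... f10000 file names are made up for the demo. The byte counts it prints depend on the filesystem, and some filesystems do shrink directories, in which case the two sizes may match.

```shell
#!/bin/sh
# Show that a directory keeps its grown size after most entries are removed,
# while a fresh copy of it starts small. The dirdemo path is hypothetical.
set -e

base=${TMPDIR:-/tmp}/dirdemo.$$
mkdir -p "$base/orig"

# Create 10,000 empty files (f1 ... f10000) so the directory grows.
( cd "$base/orig" &&
  awk 'BEGIN { for (i = 1; i <= 10000; i++) print "f" i }' | xargs touch )
echo "with 10000 files: $(ls -ld "$base/orig" | awk '{ print $5 }') bytes"

# Remove every file except f1; on most filesystems the directory keeps its size.
find "$base/orig" -type f ! -name f1 -exec rm -f {} +
orig_size=$(ls -ld "$base/orig" | awk '{ print $5 }')
echo "with 1 file left: $orig_size bytes"

# Copy the directory; the copy is created fresh and holds only one entry.
cp -r "$base/orig" "$base/copy"
copy_size=$(ls -ld "$base/copy" | awk '{ print $5 }')
copy_files=$(find "$base/copy" -type f | wc -l | tr -d ' ')
echo "fresh copy:       $copy_size bytes, $copy_files file(s)"

# Clean up the scratch directories.
rm -rf "$base"
```

Both directories hold the same single file, so a file-by-file comparison passes even though du reports different totals.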

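The sparse-file effect can also be reproduced end to end. This sketch follows the dd approach above but writes two short records of its own instead of reading /etc/issue, so it does not depend on that file existing. For the copy it uses dd without conv=sparse, which writes every block it reads, including the zeros read back from the hole. The sparsedemo path and the record text are hypothetical, exact du figures depend on the filesystem's block size, and a filesystem without sparse-file support will show both files fully allocated.

```shell
#!/bin/sh
# Build a sparse file, make a fully allocated copy, and compare apparent
# size (ls -l, wc -c) with allocated space (du -k). Paths are hypothetical.
set -e

base=${TMPDIR:-/tmp}/sparsedemo.$$
mkdir -p "$base"

# Write a first record, then let dd seek 4 MB (one 4096k block) into the
# file without truncating it and write a second record there.
# The 4 MB in between is never written, so it stays a hole.
printf 'first record\n' > "$base/sparse"
printf 'last record\n' | dd of="$base/sparse" bs=4096k seek=1 conv=notrunc 2>/dev/null

# dd without conv=sparse writes out the zeros it reads from the hole,
# so the copy occupies real disk blocks for the whole length.
dd if="$base/sparse" of="$base/copy" bs=64k 2>/dev/null

ls -l "$base/sparse" "$base/copy"   # apparent sizes: identical
du -k "$base/sparse" "$base/copy"   # allocated space: copy is much larger

sparse_bytes=$(wc -c < "$base/sparse" | tr -d ' ')
copy_bytes=$(wc -c < "$base/copy" | tr -d ' ')
sparse_kb=$(du -k "$base/sparse" | awk '{ print $1 }')
copy_kb=$(du -k "$base/copy" | awk '{ print $1 }')

# cmp -s is silent and exits 0 only if the contents are byte-for-byte equal.
if cmp -s "$base/sparse" "$base/copy"; then same=yes; else same=no; fi
echo "contents identical: $same"

rm -rf "$base"
```

The apparent size is 4,194,304 bytes of offset plus the 12-byte second record, while du reports only a few kilobytes for the sparse original and over 4,000 KB for the copy.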
So, in summary, you can’t use bdf or du (or df) to verify a directory copy. Instead, use find to count the files and directories in both trees and, if necessary, use ls -l to get the sizes of the source and destination files and compare those numbers. Be sure to exclude the lost+found directories, as they may not exist in both source and destination. For a faster but coarser check, compare inode counts with df -i; because df -i reports on the whole filesystem containing each directory, this only helps when the source and the destination each have a filesystem to themselves.
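
The find and ls -l check above can be sketched as a pair of shell functions. This is only one way to do it: the names tree_summary and compare_trees are made up for illustration, sizes are summed from the ls -l size column rather than compared file by file, and file names containing newlines would throw the counts off.

```shell
#!/bin/sh
# Summarize and compare two directory trees, skipping lost+found.
# Function names are hypothetical; assumes no newlines in file names.

tree_summary() {
    # $1: directory to summarize. Prints "directories files bytes", where
    # bytes is the sum of the ls -l size column over all regular files.
    ( cd "$1" || exit 1
      dirs=$(find . -name lost+found -prune -o -type d -print | wc -l)
      files=$(find . -name lost+found -prune -o -type f -print | wc -l)
      bytes=$(find . -name lost+found -prune -o -type f -exec ls -ln {} + |
              awk '{ s += $5 } END { print s + 0 }')
      # Unquoted on purpose: drops the padding some wc versions add.
      echo $dirs $files $bytes )
}

compare_trees() {
    # $1: source tree, $2: destination tree.
    # Prints both summaries; returns 0 only when they match.
    src_summary=$(tree_summary "$1") || return 1
    dst_summary=$(tree_summary "$2") || return 1
    echo "source:      $src_summary"
    echo "destination: $dst_summary"
    [ "$src_summary" = "$dst_summary" ]
}
```

For example, running compare_trees /data /newdata (hypothetical paths) prints both summaries and returns a non-zero status if the counts or total apparent size differ, regardless of what du or bdf report.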
