Block vs file level copy

Discussion in 'Apple' started by Király, Mar 8, 2014.

  1. Király

    Király Guest

    We had a discussion a while back about the speed of a block level vs a
    file level copy of one volume to another. It's generally considered
    that a block-level copy is faster, but some argued that a file-level
    copy will be faster if the volume contains a lot of free space. There
    was more disagreement about whether a block-level copy copies that free
    space or not. I had the chance to test it over the last two days.

    I had a 2TB volume that was only 25% full (of Time Machine backups). I
    wanted to copy it to a 3TB volume.

    Disk Utility's Restore function, which unmounts both the volumes and
    does a block-level copy, took 16 hours over FW800.

    SuperDuper, which leaves the volumes mounted and does a file-level copy,
    took 5 hours.

    That settled it for me.
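
    For reference, a rough command-line sketch of the two approaches (the
    disk identifiers and paths here are made up; as far as I know asr is
    what backs Disk Utility's Restore, and rsync just stands in for a
    file-level copier like SuperDuper -- a Time Machine volume's hard links
    and ownership make rsync a poor real-world choice for this job):

    # block level: whole-volume restore, both volumes get unmounted first
    $ sudo asr restore --source /dev/disk2s2 --target /dev/disk3s2 --erase --noprompt

    # file level: walk the directory tree and copy only the data that exists
    $ sudo rsync -a /Volumes/Backups/ /Volumes/NewBackups/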
     
    Király, Mar 8, 2014
    #1

  2. Alan Browne

    Alan Browne Guest

    Useful. Thanks.

    Your data shows that block level actually moves through the volume
    faster: 100% of it in 16 hours (6.25%/hour) versus 25% of it in 5 hours
    (5%/hour) for the file copy. So the source disk would need to be at
    least 80% full before a block copy wins on total time.

    That's a bit rough, however, since another issue with file-level
    transfer is that the heads have to move around a lot more, whereas with
    a block copy they would (in theory) only step from cylinder to cylinder
    as needed.

    So I'd guesstimate that the breakeven point is somewhere around 50 - 60%
    based on your numbers.
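
    Spelling out that breakeven arithmetic with the same numbers (a quick
    sketch with bc, ignoring the seek overhead just mentioned): block level
    covers the whole 2 TB in 16 hours no matter how full it is, while the
    file copy moved 0.5 TB of actual data in 5 hours.

    # block level: hours per TB of volume, regardless of fullness
    $ echo 'scale=3; 16/2' | bc
    8.000
    # file level: hours per TB of actual data (0.5 TB in 5 h)
    $ echo 'scale=3; 5/0.5' | bc
    10.000
    # breakeven fullness f: f * 2 TB * 10 h/TB = 16 h  =>  f = 0.8
    $ echo 'scale=3; 16/(2*10)' | bc
    .800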
     
    Alan Browne, Mar 8, 2014
    #2

  3. JF Mezei

    JF Mezei Guest

    (Mr. Browne complains if I don't quote some text, so I quoted some text.)

    Although Apple has hidden the problems of disk fragmentation and other
    optimisation issues from the masses, I'm just wondering how much a file
    copy recreates the disk as it was, or whether it rebuilds the files
    from scratch on new, more contiguous disk space?

    While large drives do allow files to be moved around to find contiguous
    free space, once your drive use grows above a certain percentage I have
    to wonder how well the OS can defragment under the hood.

    Would rebuilding on a file basis reorganise the disk "clean" and give
    it a fresh start? Or is the difference between the original and the
    copy in terms of fragmentation and file placement so minimal that it is
    not a consideration?
     
    JF Mezei, Mar 8, 2014
    #3
  4. Lewis

    Lewis Guest

    A block-level copy is always faster if the block size is set properly.
    The smaller the block size, the slower the copy (but the better the
    copy can work around read errors, which is why you might want to use a
    small block size).

    $ dd if=/dev/zero bs=1 count=1024 of=test
    1024+0 records in
    1024+0 records out
    1024 bytes transferred in 0.007203 secs (142166 bytes/sec)
    $ dd if=/dev/zero bs=1024 count=1024 of=test
    1024+0 records in
    1024+0 records out
    1048576 bytes transferred in 0.007179 secs (146061124 bytes/sec)

    So, 0.13MB/s versus 140MB/s, or 1000 times slower with a block size of 1
    (512 bytes) instead of 512K.

    But it gets worse:

    $ dd if=/dev/zero bs=2048 count=1024 of=test
    1024+0 records in
    1024+0 records out
    2097152 bytes transferred in 0.008700 secs (241048286 bytes/sec)
    $ dd if=/dev/zero bs=4096 count=1024 of=test
    1024+0 records in
    1024+0 records out
    4194304 bytes transferred in 0.014646 secs (286377764 bytes/sec)

    2MB block size and we're up to almost 275MB/s

    $ dd if=/dev/zero bs=8192 count=1024 of=test
    1024+0 records in
    1024+0 records out
    8388608 bytes transferred in 0.017961 secs (467045054 bytes/sec)
    $ dd if=/dev/zero bs=16384 count=1024 of=test
    1024+0 records in
    1024+0 records out
    16777216 bytes transferred in 0.110376 secs (152000743 bytes/sec)

    OK, so we peaked at 445MB/s with a blocksize of 8MB and doubling the
    blocksize again really slowed us down.

    So, 0.13MB/s or 445MB/s?

    Disk Utility uses what block size?
     
    Lewis, Mar 8, 2014
    #4
  5. Lewis

    Lewis Guest

    s/hidden/eliminated/
     
    Lewis, Mar 8, 2014
    #5
  6. billy

    billy Guest

    $ diskutil info disk0s2
    Device Identifier: disk0s2
    Device Node: /dev/disk0s2

    Volume Name: Macintosh HD

    File System Personality: Journaled HFS+
    Type (Bundle): hfs

    Partition Type: Apple_HFS

    Total Size: 749.3 GB (749296615424 Bytes) (exactly 1463469952 512-Byte-Units)
    Volume Free Space: 538.8 GB (538799898624 Bytes) (exactly 1052343552 512-Byte-Units)
    Device Block Size: 512 Bytes

    I'd hope block copies are done in at least 4096 byte chunks,
    but I don't know how to check that...
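
    One rough way to get a feel for it (just a sketch; disk0 here is only
    an example, and this measures raw read speed at different block sizes,
    not what Disk Utility actually does internally):

    # read ~10 MB from the raw device at two block sizes; dd reports the
    # transfer rate when it finishes
    $ sudo dd if=/dev/rdisk0 of=/dev/null bs=512 count=20000
    $ sudo dd if=/dev/rdisk0 of=/dev/null bs=1m count=10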

    Billy Y..
     
    billy, Mar 8, 2014
    #6
  7. Just want to point out that the bs option of dd takes its argument (1
    and 1024 above) in bytes (not multiples of 512 bytes). So the above
    copied 1-byte and 1024-byte blocks. Note the totals at the end of
    each run:
    1024 bytes transferred [1 byte x 1024]
    1048576 bytes transferred [1024 bytes x 1024]
    (Perhaps you've confused dd with dump.)
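
    So to get the large blocks that were intended above, bs has to be given
    in actual bytes, or with one of the size suffixes the BSD dd on OS X
    accepts. For example:

    # bs is literal bytes; k/m/g suffixes are understood by BSD dd
    $ dd if=/dev/zero of=test bs=1048576 count=64   # 64 one-megabyte blocks
    $ dd if=/dev/zero of=test bs=1m count=64        # same thing, via a suffix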

    Martin
     
    Martin Frost me at invalid stanford daht edu, Mar 9, 2014
    #7
  8. Lewis

    Lewis Guest

    The man page said:

    Input data is read and written in 512-byte blocks.

    Which is what threw me off.
     
    Lewis, Mar 9, 2014
    #8
  9. Alan Browne

    Alan Browne Guest

    Agreed. As I said, "a bit rough", "guesstimate", "somewhere around"...
    I didn't know that, but it makes sense.

    --
    Those who have reduced our privacy, whether they are state
    or commercial actors, prefer that we do not reduce theirs.
    - Jaron Lanier, Scientific American, 2013.11.

    Privacy has become an essential personal chore that most
    people are not trained to perform.
    - Jaron Lanier, Scientific American, 2013.11.

    ... it may be that "in the cloud" really isn't the best term
    for the services these companies offer. What they really
    want is to have us "on the leash."
    -David Pogue, Scientific American, 2014.02
     
    Alan Browne, Mar 9, 2014
    #9
  10. A file copy does not re-create the on-disk structure of the file; it
    copies only the data.
     
    Oregonian Haruspex, Mar 10, 2014
    #10
  11. Siri Cruz

    Siri Cruz Guest

    A Disk Utility block-level copy is a sector-by-sector copy, as accurately as the
    hardware can do it. When a disk drive fails, you can lose critical sectors such
    as the superblock before you realise it. Attempting to repair the failing drive
    is pointless; instead you want to copy what you can off the old drive and then
    repair the file system on good hardware.

    If the source disk is intact and readable, sure, just copy the inodes. You might
    even be able to speed it up further by reading multiple files at once: the more
    sector reads in the queue, the more likely the disk head is near a requested
    sector.
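
    A minimal sketch of that "copy what you can" step with dd (device names
    are made up; conv=noerror keeps it reading past bad sectors and sync
    pads the unreadable blocks so offsets stay aligned; a dedicated tool
    like GNU ddrescue handles retries and logging better, but the idea is
    the same):

    # pull whatever is readable off the failing disk onto a healthy one
    $ sudo dd if=/dev/rdisk2 of=/dev/rdisk3 bs=4096 conv=noerror,sync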
     
    Siri Cruz, Mar 10, 2014
    #11
  12. Siri Cruz

    Siri Cruz Guest

    'Disk fragmentation' is only a problem on a single-process OS where a process
    has only one or a few files open. A multi-tasking operating system is going to
    be taking hits all over the disk regardless of how individual files are
    allocated. And on top of the files you know you have open, the kernel is also
    doing its own disk activity with swapping, paging, inode-stat updates,
    checkpoints, etc.

    One of the old-time problems was actually logical sectors sitting too close to
    each other, back when disk drives were fast compared to CPUs: the processor
    could not set up for the next sector within the inter-sector gap, so successive
    sector reads/writes ended up a whole rotation apart. The solution was to space
    logical sectors out over noncontiguous physical sectors (aka half-tracking).
     
    Siri Cruz, Mar 10, 2014
    #12
  13. Siri Cruz

    Siri Cruz Guest

    One of the original purposes of dd was reel-to-reel magnetic tape. Magnetic tape
    is formatted as blocks of bytes separated by gaps.

    Magnetic tape was sold with random bad spots. The drive would detect a bad write
    and just write again a bit further along; the longer a block, the more likely it
    is to land on a bad spot. Also, the tape could not be stopped mid-block: the
    hardware had to be able to read or write an entire block as it passed over the
    heads. And on older systems the amount of real memory the kernel could use as a
    buffer was much more limited.

    The tape has to be moving at a constant velocity through the head to read and
    write blocks: the gaps have to be large enough to stop the tape and start it up
    when the processor is not ready for the next block. When blocks are small, the
    gaps between them (which are wasted tape) take up more of the reel. When the
    blocks are large, they are more likely to hit a bad spot or overrun buffers.

    Also, blocks on one reel don't all have to be the same size. A tape block is
    defined by the stretch of tape that passes through the heads, unlike a disk
    surface, which periodically repeats the same angular displacement.

    That made managing tape blocks an art and a science of using a tape reel as
    efficiently as possible. Many of dd's options are left over from those needs.
     
    Siri Cruz, Mar 10, 2014
    #13
