Shrinking physical volumes in LVM on a Linux Guest in ESXi 5.0

  • The problem:

    Linux guest (openSUSE 12.1) with multiple virtual disks attached.

    3 disks are in a logical volume, two of which are exactly 2TB.

    None of the disks are independent, and due to the backup software we use, they cannot be made independent.

    When the two 2TB virtual disks are "dependent", the snapshot fails stating that the file is too large for the datastore. When I put those two disks in independent mode, snapshots work fine (the other disk is 1.8TB).

    I have therefore concluded that shrinking the two 2TB disks by even 100GB should solve the problem; however, I am having trouble conceptualizing how to go about making those disks smaller without breaking the LVM setup entirely.

    The actual LV has 1.3TB free, so there is plenty of space to shrink with.

      What I need to accomplish:

      Deallocate 100GB from the two 2TB virtual disks within the Linux guest.

      Shrink the two virtual disks by 100GB within vSphere (not as complicated).

      Are there any vSphere/LVM gurus who can give me a clue?

      Edit:

      Fixing formatting:

      Something like this?

          e2fsck -f /dev/VGroup1
          resize2fs /dev/VGroup1 5922108040K              # that is a 200GB shrink, in KB
          lvreduce -L 209715200K /dev/VGroup1
          pvresize /dev/sdb1 --setphysicalvolumesize 2042625023K    # and the same for /dev/sdc1

      Correct?

      Another thought occurred to me: maybe, to be on the safe side, I should reduce the filesystem by 25GB more than I plan on reducing the disks, to ensure that the physical volumes aren't left smaller than the filesystem.
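
      For reference, here is a minimal sketch of the in-guest ordering I have in mind, with placeholder sizes and a hypothetical LV name (/dev/VGroup1/lvdata), keeping each layer a little smaller than the one below it:

          # unmount and check the filesystem first (LV name is a placeholder)
          umount /dev/VGroup1/lvdata
          e2fsck -f /dev/VGroup1/lvdata

          # shrink the filesystem below the final target to leave a margin
          resize2fs /dev/VGroup1/lvdata 5400G

          # shrink the LV, still keeping it larger than the filesystem
          lvreduce -L 5500G /dev/VGroup1/lvdata

          # shrink each 2TB PV so it fits inside the smaller virtual disk;
          # pvresize will refuse if allocated extents sit beyond the new size,
          # in which case a pvmove is needed first
          pvresize --setphysicalvolumesize 1900G /dev/sdb1
          pvresize --setphysicalvolumesize 1900G /dev/sdc1

          # only then shrink the VMDKs in vSphere, and afterwards grow the
          # filesystem back onto the margin with resize2fs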

      Answers

      • Well, it is crash-consistent, but it can lose recently written data not yet committed to the journal. Any application that can recover from power loss can recover from non-coordinated snapshots as well. This is especially true for transactional databases like MySQL InnoDB, as opposed to MySQL MyISAM, which will burn data in that case. IIRC MySQL MyISAM will burn data even with frozen filesystem snaps, but not as badly. Backing up MySQL MyISAM requires FLUSH TABLES WITH READ LOCK as well. So it all comes down to how well your application recovery is designed. – korkman Sep 4 '11 at 12:00
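
        For MyISAM the usual pattern is to hold the global read lock while the snapshot is taken; a rough sketch (credentials, LV names and snapshot size are placeholders, and it assumes the VG has free extents for an LVM snapshot):

          # session 1: take the global read lock and keep this client open
          mysql -u root -p
          #   mysql> FLUSH TABLES WITH READ LOCK;

          # session 2: take the snapshot while the lock is held, e.g. an LVM
          # snapshot of the volume holding the MyISAM data files
          lvcreate --snapshot --size 10G --name mysql_snap /dev/VGroup1/lvdata

          # session 1: release the lock as soon as the snapshot exists
          #   mysql> UNLOCK TABLES;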

        • There is 4TB free on the datastore, so I don't think that is the issue. – Stew Apr 15 '12 at 12:38

        • Every time you perform an operation with LVM, by default, the previous metadata is archived in /etc/lvm/archive. You can use vgcfgrestore to restore it, or grab the extents by hand (harder, but lvcreate(8) should cover it).

          Edit:

          And to make it as easy as possible, I should add that you can find the last backup before your destructive operation by looking at descriptions:

          # grep description /etc/lvm/archive/vg01_*
          /etc/lvm/archive/vg01_00001.vg:description = "Created before executing 'lvremove -f /dev/vg01/foo'"
          /etc/lvm/archive/vg01_00002.vg:description = "Created before executing 'lvremove -f /dev/vg01/bar'"
          /etc/lvm/archive/vg01_00003.vg:description = "Created before executing 'lvremove -f /dev/vg01/baz'"
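
          Once the right archive is identified, restoring it is short work; a minimal sketch using the vg01 names from the example above:

          # list the archived metadata versions LVM knows about for this VG
          vgcfgrestore --list vg01

          # restore the metadata as it was before the destructive command
          vgcfgrestore -f /etc/lvm/archive/vg01_00001.vg vg01

          # re-activate and check the recovered LVs
          vgchange -ay vg01
          lvs vg01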
          

          Edit:

          The normal allocation policy (the default) will allocate a stripe from the first free PE when there is enough room to do so. If you want to confirm where the LV was allocated, you can look in the archive files; they are perfectly human-readable.

          • Try inspecting the block device with sudo less -f /dev/sda5, which should show you all recent changes to the LVM metadata. This may be more accurate than what vgcfgrestore finds in /etc/lvm, especially when the disk is corrupted. Try extracting the right version by timestamp to a file and running vgcfgrestore from that file. – Martian Nov 19 '11 at 19:50
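
            If /etc/lvm itself is unusable, the same metadata also lives in a ring buffer near the start of the PV; a rough sketch of pulling it out by hand (device name taken from the comment above, offsets approximate):

              # dump the first megabyte of the PV and look for metadata descriptions
              dd if=/dev/sda5 bs=1M count=1 | strings | grep -B2 -A2 description

              # copy it to a file, cut out the wanted configuration by hand,
              # then point vgcfgrestore -f at the edited file
              dd if=/dev/sda5 bs=1M count=1 of=/tmp/pv-header.bin
              strings /tmp/pv-header.bin | less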

            • Oh, it's multi-extent? Hmm, not a big fan of doing that, but yeah, I can see how that changes things. Still, at least you'll be able to follow that tool chain I mentioned. – Chopper3 Apr 15 '12 at 12:40

            • There aren't any really good recovery options, and there are no tools that support this to my knowledge. However, see the data recovery section of LVM dangers and caveats for some articles on manual recovery.

              Generally it's best to do a raw image backup of the broken volume(s) or underlying disks, and do the data recovery on the backup versions, so that you have a way of retrying the recovery.

              The above answer also has a section on resizing LVM volumes - expanding them is reasonably safe, and it's usually better to use lvresize instead of deleting and recreating.

              On a related point: since you are using VMware, you should also take care that hard disk write cache flushes (write barriers) propagate correctly from the guest Linux kernel through the hypervisor and any host OS. It's also important that write barriers are set up in the guest FS and guest kernel, which should be 2.6.33 or higher.
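
              A quick way to sanity-check this from inside the guest (device and mount names are examples):

                # the guest kernel should be 2.6.33 or newer for reliable barrier support
                uname -r

                # look for nobarrier / barrier=0 in the mount options of the data filesystem
                grep VGroup1 /proc/mounts

                # the kernel log may mention barriers being disabled or failing
                dmesg | grep -i barrier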

            • This isn't really a VMware issue; the problem with the 2TB VMDKs is that there's no space left on the datastore to commit a snapshot, and as you say, dropping the size of the VMDKs will allow that to work.

              Now, obviously you can use the usual chain of umount, e2fsck, resize2fs, lvreduce and pvresize, then reduce the VMDK size within the vSphere client. But here's another thought: if you have enough temporary space, you could just convert them to thin disks. Obviously there can be a write penalty for this, but it would mean you wouldn't have to touch your guest filesystem.
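
              If you go the thin-disk route, a rough sketch of the clone-and-swap from the ESXi shell (datastore path and file names are placeholders, and the VM needs to be powered off and re-pointed at the new disk):

                # clone the thick disk into a thin-provisioned copy
                vmkfstools -i /vmfs/volumes/datastore1/guest/guest_1.vmdk \
                           /vmfs/volumes/datastore1/guest/guest_1-thin.vmdk -d thin

                # after attaching the thin copy to the VM and verifying it works,
                # delete the original thick disk
                vmkfstools -U /vmfs/volumes/datastore1/guest/guest_1.vmdk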

                • I removed the comment and added it to an edit on the original question, better formatting. – Stew Apr 15 '12 at 13:03

                • To recover a deleted LVM volume I'd guess you'd need to go low-level and re-create the volume using device mapper, not LVM. Also, you can use lvdisplay --maps to see the LE to PE mapping (the exact and most important thing you need to recreate). – Hubert Kario Oct 9 '11 at 23:39
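
                  A very rough illustration of that low-level route (every name, sector count and offset below is made up; the real values must come from a mapping recorded before the deletion):

                    # record the LE -> PE mapping while the LV still exists
                    lvdisplay --maps /dev/VGroup1/lvdata

                    # re-create the mapping read-only with device mapper, using the
                    # recorded start offset and length in 512-byte sectors
                    dmsetup create recovered --readonly --table '0 41943040 linear /dev/sdb1 2048'

                    # the data should then be visible for a read-only check
                    fsck -n /dev/mapper/recovered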

                  • I would need to have run "lvdisplay --maps" before deleting the volumes, right? (When I run "lvdisplay --maps" on the new server, it shows the LE to PE are mapped the way I would expect, with the first logical volume starting at PE 0.) – Vegar Nilsen Oct 10 '11 at 9:05

                  • It isn't consistent. Whether the dom0 LV contains a partition table, etc., or just a filesystem is irrelevant. The dom0 can only freeze those filesystems mounted in the dom0.

                  • This did not go well; however, I am going to attempt it again when I can plan for a bit of downtime on that volume. Thanks for the answer, this is the route I am going to try, and worst case I can delete everything and recreate it from backup. Thanks! – Stew Apr 17 '12 at 7:02

                  • A thought: perhaps the simplest way to achieve this would be: add an additional 1.8TB disk, run pvmove on one of the 2TB disks, and when all the data is moved off that disk to the new one, remove it from the VG and the virtual machine. Rinse and repeat for the second disk. As a matter of fact, I am going to try that today. – Stew Apr 15 '12 at 11:52
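
                    A sketch of that plan (assuming the new disk shows up as /dev/sdd and is partitioned as /dev/sdd1; device names are guesses):

                      # make the new virtual disk an LVM PV and add it to the volume group
                      pvcreate /dev/sdd1
                      vgextend VGroup1 /dev/sdd1

                      # migrate every allocated extent off the first 2TB disk (this can take hours)
                      pvmove /dev/sdb1

                      # once it is empty, drop it from the VG and detach the virtual disk in vSphere
                      vgreduce VGroup1 /dev/sdb1
                      pvremove /dev/sdb1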

                  • yes, you need to have the mapping from before deletion. – Hubert Kario Oct 12 '11 at 12:50
