Learn through suffering, really!
I started building Red Hat guests on VMware with no experience whatsoever, and while doing so I followed nothing but what I had already learnt and what I do know: AIX. Along the way, I used the Linux Logical Volume Manager (LVM) to create volume groups and logical volumes, which I topped with file systems. Following best practice, I populated /etc/fstab with the UUIDs of the disks. Why? Because the literature says that universally unique identifiers (whose AIX equivalent is the PVID) are the preferred way to associate disks with their file systems. Once a UUID is assigned to a disk it never changes, whereas a device name such as /dev/sda may change to something else if more disks are added to the host.
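On a live Red Hat system, udev publishes these stable names as symlinks under /dev/disk/by-uuid. Each symlink points at whatever kernel device node currently carries that UUID. The following is a minimal sketch that resolves a UUID to its current device node. The helper name dev_for_uuid is hypothetical, and the sketch assumes GNU readlink -f (standard on RHEL). The directory is a parameter so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# dev_for_uuid UUID [UUID_DIR]
# Hypothetical helper: print the device node that UUID currently resolves to.
# UUID_DIR defaults to udev's /dev/disk/by-uuid; returns 1 if no such link.
dev_for_uuid() {
    uuid_dir=${2:-/dev/disk/by-uuid}
    link="$uuid_dir/$1"
    # The entry must exist as a symlink; otherwise nothing carries this UUID.
    [ -L "$link" ] || { echo "no device with UUID $1" >&2; return 1; }
    # Follow the symlink chain to the real node (e.g. /dev/sda1 or /dev/dm-3).
    readlink -f "$link"
}
```

For example, dev_for_uuid e7d8928a-fc04-40fd-b625-e9da99732c3b might print /dev/sda1 today and a different name after disks are added. The UUID itself stays the same, which is the whole point of using it in /etc/fstab.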
To see the UUIDs currently in use, run the blkid command:
# blkid | sort
/dev/mapper/oracle_vg-u01_lv: UUID="7a9e4e58-174b-4567-93a7-9a479d4ce999" TYPE="ext4"
/dev/mapper/vg_sys-lv_home: UUID="8e393748-4432-4752-900d-dbdd71a4f7bb" TYPE="ext4"
/dev/mapper/vg_sys-lv_root: UUID="c2c612e2-ccf0-4311-b705-c1788a8afbbf" TYPE="ext4"
/dev/mapper/vg_sys-lv_swap: UUID="03f634ac-28d6-44fb-9742-dfb1e621e358" TYPE="swap"
/dev/mapper/vg_sys-lv_temp: UUID="0e830c56-c697-403f-993e-ae442e6827f8" TYPE="ext4"
/dev/mapper/vg_sys-lv_usr: UUID="cbda1389-a042-4059-b43a-bb0920e02d2d" TYPE="ext4"
/dev/mapper/vg_sys-lv_var: UUID="48e1864c-1e09-4204-88ad-7ca16429c8cd" TYPE="ext4"
/dev/sda1: UUID="e7d8928a-fc04-40fd-b625-e9da99732c3b" TYPE="ext4"
/dev/sda2: UUID="6wZrl9-XhOr-RALD-9Sbp-Lhoi-GRYe-EIs4LH" TYPE="LVM2_member"
/dev/sdb: UUID="1mddKd-lDw7-YjDe-BVt9-4pKO-j8Oh-w3x6UA" TYPE="LVM2_member"
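blkid can also print a single value for a single device, for example blkid -s UUID -o value /dev/mapper/oracle_vg-u01_lv. When working from saved output like the listing above, a small awk filter does the same job. The following is a sketch only. The name uuid_of is hypothetical, and the sketch assumes no field in the line contains spaces (a LABEL with spaces would break the whitespace splitting).

```shell
#!/bin/sh
# uuid_of DEVICE
# Hypothetical helper: read blkid-style lines ('DEV: UUID="..." TYPE="..."')
# from stdin and print the UUID reported for DEVICE.
uuid_of() {
    awk -v dev="$1:" '$1 == dev {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^UUID="/) {      # matches UUID= only, not PARTUUID= or UUID_SUB=
                u = $i
                sub(/^UUID="/, "", u)  # strip the leading UUID="
                sub(/"$/, "", u)       # strip the closing quote
                print u
            }
    }'
}
```

Usage: blkid | uuid_of /dev/mapper/oracle_vg-u01_lv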
From the output above, we can all see which UUID is assigned to the logical volume u01_lv (a member of the oracle_vg volume group).
The next listing shows the entry in the host's /etc/fstab file that refers to u01_lv by its UUID, exactly as I originally entered it:
UUID=7a9e4e58-174b-4567-93a7-9a479d4ce999 /u01 ext4 defaults 1 2
I then noticed that the “other” logical volumes in this file had these values in the last two columns, so without much thinking I followed the established pattern and used them too. By the way, these other logical volumes belonged to the guest “
Days became weeks, weeks became months. Everything worked like a charm. About a year later, a guest had to be rebooted and it did not come back. The console reported that the UUID of the disk holding u01_lv had changed. Isn't this just peachy?
At that point I recognized a mistake I had made without even knowing it. For as long as the UUID of the sdb disk stayed the same, my mistake remained hidden. But eventually... For reasons unknown to us, one disk UUID (which should always be “unique” and “constant”) changed, and its corresponding entry in /etc/fstab was no longer valid. This resulted in the following sequence.

First, Linux sees that /dev/sdb has a different UUID and prints a message about it to the console. Next, it tries to mount /u01, which is not there. Because the entry asks for a boot-time check, it also runs fsck on a logical volume that is not there either. The system has no idea what is going on, so it surrenders and drops us into “Maintenance Mode”: please help me! Nothing can be done from there, because /etc/fstab cannot be edited while / is mounted read-only.
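The failure above boils down to an /etc/fstab entry whose UUID no longer matches any device. The following is a minimal sketch of a check one could run before a reboot, assuming udev's /dev/disk/by-uuid symlinks are present (as they are on RHEL). The name check_fstab_uuids is hypothetical, and both paths are parameters so the check can be tried on copies.

```shell
#!/bin/sh
# check_fstab_uuids [FSTAB] [UUID_DIR]
# Hypothetical helper: report every UUID= entry in FSTAB that has no matching
# entry in UUID_DIR (udev's /dev/disk/by-uuid on a live system).
# Returns 1 if any entry is unresolved, 0 otherwise.
check_fstab_uuids() {
    fstab=${1:-/etc/fstab}
    uuid_dir=${2:-/dev/disk/by-uuid}
    rc=0
    # Only the first two fields (device spec, mount point) matter here.
    while read -r spec mnt _rest; do
        case $spec in
            UUID=*)                      # comments and blank lines fall through
                uuid=${spec#UUID=}
                if [ ! -e "$uuid_dir/$uuid" ]; then
                    echo "missing: $spec ($mnt)"
                    rc=1
                fi
                ;;
        esac
    done < "$fstab"
    return $rc
}
```

A non-zero exit status means at least one mount would fail at boot, which is exactly the situation that sent this guest into Maintenance Mode.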
Within a few minutes the guest was booted into rescue mode. In this mode, the contents of the / file system can be modified, for example /etc/fstab. In that file, both the start and the end of the line describing the /u01 file system had to be changed. To obtain the new UUID, run the blkid command. Then comes the end of the line, the last two digits to be exact: on Mike's advice, I replaced the existing numbers with 0 (zero). The line now reads:
UUID=fa9e4e58-174b-4567-93a7-9a479d4ce341 /u01 ext4 defaults 0 0
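The same two-part edit can be scripted. It replaces the device spec with the new UUID and sets the fifth (dump) and sixth (fsck pass) fields to 0, following Mike's advice. The following is a sketch only; fix_fstab_entry is a hypothetical name. It writes the result to stdout so the output can be reviewed before it replaces the real file. Note that awk rewrites the edited line with single spaces between fields.

```shell
#!/bin/sh
# fix_fstab_entry FSTAB MOUNTPOINT NEW_UUID
# Hypothetical helper: print FSTAB with the entry for MOUNTPOINT rewritten to
# use NEW_UUID and "0 0" in the last two columns; other lines are unchanged.
fix_fstab_entry() {
    awk -v mnt="$2" -v uuid="$3" '
        $1 !~ /^#/ && $2 == mnt { $1 = "UUID=" uuid; $5 = 0; $6 = 0 }
        { print }
    ' "$1"
}
```

For example: fix_fstab_entry /etc/fstab /u01 NEW_UUID > /tmp/fstab.new, where NEW_UUID is the value blkid reports for the logical volume. Compare /tmp/fstab.new with the original and keep a backup before putting it in place.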
Why the two zeros? After the host was back online, I spent some time and actually read, you guessed it, the man pages. Well, actually I got this information from www.linfo.org, where I found a very detailed description of the /etc/fstab contents. Below are the two sections dealing with the fifth and the sixth fields.
(5) The fifth column is used to determine whether the dump command will backup the file system. This column is rarely used and has two options: 0, do not dump, which is used for most partitions, and 1, dump, which is used for the root partition.
(6) The sixth column is used by the fsck program to determine the order in which the computer checks the file systems when it boots. The three possible values for the column are: 0, do not check, 1, check first (only the root partition should have this setting) and 2, check after the root partition has been checked. Most Linux distributions set all the partitions to 0, except for the root partition. If maintenance is important, 2 should be used, although this can increase the amount of time required for booting.
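To see which entries the boot-time check will touch, and in what order, one can list every entry with a non-zero sixth field. The following is a sketch only, and fsck_order is a hypothetical name. Entries with fewer than six fields are skipped, because a missing pass number is treated as 0 (do not check).

```shell
#!/bin/sh
# fsck_order [FSTAB]
# Hypothetical helper: print "PASS MOUNTPOINT" for every non-comment entry
# whose sixth field is non-zero, sorted by pass number (lowest first).
fsck_order() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 != 0 { print $6, $2 }' "${1:-/etc/fstab}" | sort -n
}
```

On the host described here, /u01 would show up with pass 2 before the fix and disappear from the list after it.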
Going back to the changed UUID, the following VMware Knowledge Base article might explain why it happened: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1026710
During this “difficult” time, our new Linux/Tivoli administrator Mike “Ski” Swierczynski (also an ex-Marine) walked me through the recovery process. He was the one who pointed out the wrong options (the last two columns) in /etc/fstab. Thanks Mike, and Semper Fi!