
recovering rootvg missing vSCSI disks

Getting ready for an AIX upgrade, it became apparent that “something” had happened to one of the two VIO servers of this managed system (frame). All sixteen guests (LPARs) had a missing disk. The missing disk was always hdisk0, which in our case points to vios1 (the hdisk1’s are delivered from the disk pool of vios2).

At this time both VIO servers are fully operational, so the question is how to quickly recover and restore the rootvg mirroring on the affected partitions.

Start with listing the host dump devices.

# sysdumpdev -l
primary              /dev/dump0
secondary            /dev/dump1
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         fw-assisted
full memory dump     disallow

Temporarily disable them.

# sysdumpdev -P -p /dev/sysdumpnull
# sysdumpdev -P -s /dev/sysdumpnull

Verify the last two steps:

# sysdumpdev -l
primary              /dev/sysdumpnull
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    TRUE
dump compression     ON

After LVM detects issues with a disk lasting longer than a “certain” length of time, it declares the disk missing and is no longer interested in the device. You have to make LVM “re-analyze” the rootvg disks by executing the varyonvg rootvg command. LVM will change the state of the previously missing disk to active and, since the volume group is mirrored, this will also automatically trigger the syncvg command, resulting in the gradual disappearance of stale partitions.

# varyonvg rootvg

Verify that both disks are active

# lsvg -p rootvg
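
With sixteen partitions to check, it may help to filter the lsvg -p output instead of reading it. The sketch below is only an illustration: the function name pv_not_active is made up, and it assumes the usual layout of the output (a volume group line, a header line, then one line per disk with the state in the second column).

```shell
# pv_not_active: read "lsvg -p <vg>" output on stdin and print every
# physical volume whose state is not "active".
# Assumption: line 1 is "<vg>:", line 2 is the column header, and each
# following line starts with the PV name followed by the PV state.
pv_not_active() {
    awk 'NR > 2 && NF >= 2 && $2 != "active" { print $1, $2 }'
}

# Usage on an AIX host (not executed here):
#   lsvg -p rootvg | pv_not_active
```

No output means that every disk of the volume group is reported as active.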

Check for logical volume synchronization with the ps command and, if lvsync is not running, start it by executing syncvg -P 32 -v rootvg.

Finally, re-activate both dump devices 🙂

# sysdumpdev -P -p /dev/dump0
# sysdumpdev -P -s /dev/dump1

Copy/paste/reuse on all remaining partitions.
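
Since the same steps are repeated on every partition, the dump device names do not have to be typed by hand. What follows is a minimal sketch, not a tested procedure: the function names (dump_device, park_dumps, restore_dumps) and the shell variables are made up for this example, and it assumes that sysdumpdev -l prints the roles exactly as shown above.

```shell
# dump_device: read "sysdumpdev -l" output on stdin and print the device
# configured for the given role ("primary" or "secondary").
dump_device() {
    awk -v role="$1" '$1 == role { print $2 }'
}

# park_dumps: remember the current dump devices in two shell variables,
# then point both roles at /dev/sysdumpnull. Defined only, not called here.
park_dumps() {
    saved=$(sysdumpdev -l) || return 1
    saved_primary=$(printf '%s\n' "$saved" | dump_device primary)
    saved_secondary=$(printf '%s\n' "$saved" | dump_device secondary)
    sysdumpdev -P -p /dev/sysdumpnull &&
    sysdumpdev -P -s /dev/sysdumpnull
}

# restore_dumps: put back whatever park_dumps remembered.
restore_dumps() {
    sysdumpdev -P -p "$saved_primary" &&
    sysdumpdev -P -s "$saved_secondary"
}
```

On a partition the order would be park_dumps, varyonvg rootvg, lsvg -p rootvg, wait for the synchronization to finish, and restore_dumps at the very end.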

Posted in Real life AIX.

extending file system in LINUX

This morning there is a ticket in my queue to extend the /tmp file system on one of the RedHat 6.2 hosts to 6 GB. A few weeks earlier, when I needed to do that, I used two commands: first lvextend to enlarge the logical volume underlying the file system, and next resize2fs to extend the file system so that it uses the additional capacity of its logical volume.

Today, I actually took the time to read the output of man lvextend and I recognized that this operation, just like in AIX, can be done in a single step. The objective is to make /tmp 6 GB big.

# df -h /tmp
Filesystem            Size  Used Avail Use% Mounted on
                        4.0G  137M  3.7G   4% /tmp

# lvextend -r -L 6G /dev/mapper/vg_sys-lv_temp
  Extending logical volume lv_temp to 6.00 GiB
  Logical volume lv_temp successfully resized
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/mapper/vg_sys-lv_temp is mounted on /tmp; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/mapper/vg_sys-lv_temp to 1572864 (4k) blocks.
The filesystem on /dev/mapper/vg_sys-lv_temp is now 1572864 blocks long.

# df -h /tmp
Filesystem            Size  Used Avail Use% Mounted on
                      6.0G  137M  5.5G   3% /tmp

Note: It is the -r that makes lvextend extend both the logical volume and the file system associated with it. Using -L 6G sets the target size to 6 GB, but using -L +6G would add 6 GB to the current 4 GB and make /tmp 10 GB.
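
The difference between the absolute and the relative form of -L is plain arithmetic, and so is the block count printed by resize2fs. The two helper functions below are hypothetical, written only to spell that arithmetic out; they handle whole-GiB values and nothing else.

```shell
# lv_target_gib: print the size in GiB that "lvextend -L <spec>" would
# aim for, given the current size in GiB. Only whole-GiB specs such as
# "6G" (absolute) or "+6G" (relative) are handled in this sketch.
lv_target_gib() {
    current=$1
    spec=$2
    case $spec in
        +*G) n=${spec#+}; n=${n%G}; echo $((current + n)) ;;
        *G)  echo "${spec%G}" ;;
        *)   echo "unsupported size: $spec" >&2; return 1 ;;
    esac
}

# blocks_to_gib: convert a count of 4 KiB blocks, as reported by
# resize2fs, to GiB.
blocks_to_gib() {
    echo $(( $1 * 4 / 1024 / 1024 ))
}
```

For the 4 GB /tmp above, lv_target_gib 4 6G gives 6 and lv_target_gib 4 +6G gives 10, while blocks_to_gib 1572864 confirms that the block count from resize2fs is indeed 6 GiB.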

Posted in Real life AIX.

the cost of distraction …… restoring permissions on a filesystem and its contents

Today is a special day for us. We screwed up and now we need to change the ownership of every file and directory in a file system. How did we get there? We needed to change the owners of the file systems whose names follow the pattern /u01 through /u60. So what we did was chown oracle.dba /u* instead of chown oracle.dba /u[0-9][0-9]!
As a result, the /usr file system and all of its contents got a new “mommy” and “daddy”, now known as oracle.dba ….. Now the original owners have to be restored, and they might not be root.system …..
So what did we do? We located a host with the identical version of AIX as the two hosts we had just messed up. After logging in, we executed the following script:

cd /tmp
rm reset.perms.out 2>/dev/null
# keep the find -ls fields from the mode string onward:
# mode, links, owner, group, size, date (3 fields), name
find /usr -ls | awk '{print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' |
awk '{ if ( NF == 9 ) {
# entries with more than 9 fields (symbolic links, names with spaces) are skipped
perms = 0    # start from zero for every entry
printf ("chown %s.%s %s\n",$3,$4,$9)
if(substr($1,2,1) == "r")
            perms = perms + 400
if(substr($1,3,1) == "w")
            perms = perms + 200
if(substr($1,4,1) == "x")
            perms = perms + 100
if(substr($1,4,1) == "S")
            perms = perms + 4000
if(substr($1,4,1) == "s")
            perms = perms + 4100
if(substr($1,5,1) == "r")
           perms = perms + 40
if(substr($1,6,1) == "w")
           perms = perms + 20
if(substr($1,7,1) == "x")
           perms = perms + 10
if(substr($1,7,1) == "S")
           perms = perms + 2000
if(substr($1,7,1) == "s")
           perms = perms + 2010
if(substr($1,8,1) == "r")
           perms = perms + 4
if(substr($1,9,1) == "w")
           perms = perms + 2
if(substr($1,10,1) == "x")
           perms = perms + 1
if(substr($1,10,1) == "T")
           perms = perms + 1000
if(substr($1,10,1) == "t")
           perms = perms + 1001
# chmod is printed after chown on purpose: chown may clear the setuid/setgid bits
           printf("chmod %d %s # %s\n",perms,$9,$1)
} }' >reset.perms.out

This script scans /usr and records the owner, group, and permissions of every entry. The result is stored in the file /tmp/reset.perms.out, which was copied with scp to each host that needed the /usr ownership restored. Next, reset.perms.out was made “executable” with chmod 700 reset.perms.out and executed. Nice!

You do know what to change if you need to use this script on a different file system, right? Yes, just replace the /usr above with the file system of your choice.
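
The heart of the script is the translation of the symbolic mode string into octal digits. The same mapping can be written more compactly by walking the three rwx triplets in a loop. The function below, mode_to_octal, is a made-up name and only a sketch of that idea; it is not the script we actually ran, and it assumes a well-formed ten-character mode string as its argument.

```shell
# mode_to_octal: print the four-digit octal mode for a symbolic mode
# string such as "-rwsr-xr-x". The leading file type character is ignored.
mode_to_octal() {
    printf '%s\n' "$1" | awk '{
        special = 0; digits = ""
        for (g = 0; g < 3; g++) {          # g: 0 = owner, 1 = group, 2 = other
            r = substr($1, 2 + 3 * g, 1)
            w = substr($1, 3 + 3 * g, 1)
            x = substr($1, 4 + 3 * g, 1)
            d = 0
            if (r == "r") d += 4
            if (w == "w") d += 2
            if (x ~ /[xst]/) d += 1        # lowercase s or t implies execute
            # s/S or t/T set the setuid (4), setgid (2) or sticky (1) bit
            if (x ~ /[sStT]/) special += ((g == 0) ? 4 : ((g == 1) ? 2 : 1))
            digits = digits d
        }
        print special digits
    }'
}
```

For example, mode_to_octal -rwsr-xr-x prints 4755 and mode_to_octal drwxrwxrwt prints 1777.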

Posted in Real life AIX.


adventures with npiv, xiv and brocade switches

Something really strange happened to me today ….. A year or so ago, I built two “lpars” using the standard (for this site) approach: the rootvg disks as vSCSI devices and SAN disks via two virtual FC adapters from each VIO server (four FC adapters per LPAR). Later each partition received two SAN disks and everything went dormant for almost six months. Last Friday, I asked for more storage, got it, created volume groups, logical volumes and file systems, and at the end of the day I rebooted both hosts and went home.

Monday, I was back in my cube to continue what I had left behind and there I received my surprise: one of the two hosts had no SAN storage! Executing lspv showed only the two vSCSI disks defining the rootvg. Executing the command lsdev -Cc disk showed the “missing” disks, but in the “Defined” state! I spent a few minutes trying to “resurrect” them with the usual rmdev -dl hdisk# / cfgmgr, to no avail. I gave up and took a moment to think about it.

I know that if resources (virtual adapters, memory, CPU, and so forth) are added “dynamically” via the HMC with the “Dynamic Logical Partitioning” option and the host is rebooted later, the added resources will be “gone”, they will disappear. By the way, this is something I cannot get used to (I think DLPAR should be permanent). I also know that to make a DLPAR “modification” permanent I have to modify the partition PROFILE, and if a reboot is required, I power the host off and then ACTIVATE its profile on POWER ON!

So what went wrong this time? Is this an AIX/VIOS/HMC error, or mine, or just simply some kind of witchcraft? I have no idea. I called for help and an IBM engineer informed me that they do not have any other customers reporting something like this …… This leaves me and the witchcraft, so let us spread the guilt equally 🙂

But there is a lesson to be learnt from this experience, and if you are new to VIOS, NPIV, AIX and everything in between, this post could be your lesson too. Flip to the next page and you will find valuable and interesting material (straight from IBM customer support) showing how to zone an XIV LUN (via a Brocade switch) to an AIX partition with virtual FC adapters. Enjoy it!

Posted in AIX, Real life AIX.

rpm for AIX administrator

As I go deeper and deeper into the LINUX woods, it becomes apparent that I had better learn something about both rpm and yum. So what are they? The first one was originally developed to install software packages from a local source, meaning someone downloaded the package or packages to the host prior to the execution of rpm. The second one (Yellowdog Updater, Modified) was created with a remote software repository in mind. You could say that rpm is like installp and yum is sort of like SUMA (both in AIX), except that rpm was later modified and now you can use it to install a software package from a remote source, and the plot thickens …..

What follows is a listing of examples of rpm in action.
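
For an AIX administrator the quickest way into rpm is probably a translation table. The lookup function below is not the listing this post refers to; it is a small sketch with a made-up name (rpm_for), and the pairings are my own rough equivalents rather than exact one-to-one matches.

```shell
# rpm_for: print the rpm command that roughly corresponds to a familiar
# AIX lslpp/installp invocation. The pairings are approximate.
rpm_for() {
    case $1 in
        "lslpp -L")    echo "rpm -qa" ;;               # list everything installed
        "lslpp -f")    echo "rpm -ql <package>" ;;     # files delivered by a package
        "lslpp -w")    echo "rpm -qf <file>" ;;        # which package owns a file
        "installp -a") echo "rpm -ivh <file.rpm>" ;;   # install from a local file
        "installp -u") echo "rpm -e <package>" ;;      # remove a package
        *) echo "no mapping for: $1" >&2; return 1 ;;
    esac
}
```

So rpm_for "lslpp -w" answers with rpm -qf <file>, the command to use when you wonder which package delivered a given file.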

Posted in LINUX.


enabling ftp on a LINUX (RedHat) host

Someone asked me to enable ftp on a LINUX host – what a lesson of humility!!!! I have no clue what to do! I have to learn! So, if you are like me, an otherwise “experienced” AIX administrator who is learning LINUX, you may benefit from this post.

Remember that in RedHat the ftp server package is called vsftpd …… on other LINUX distributions this name could vary(?). Also, the package could be installed while the ftp service is not activated …..

To start, check if ftp is set up to run on your box. Execute the following command to see if the ftp service is known to the system and at which run levels it is set to start.

$ chkconfig --list | grep -i vsftpd
vsftpd     0:off  1:off  2:off  3:off  4:off  5:off  6:off

It looks like it is installed but not activated. Make it run at levels 3, 4 and 5.

$ chkconfig --level 345 vsftpd on

$ chkconfig --list | grep -i vsftpd
vsftpd     0:off  1:off  2:off  3:on  4:on  5:on  6:off
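
When this check has to be repeated on many hosts, the on/off columns can be reduced to just the run levels that are switched on. A minimal sketch follows; the function name runlevels_on is made up, and it assumes the chkconfig --list line format shown above.

```shell
# runlevels_on: read "chkconfig --list" lines on stdin and print, for the
# named service, the run levels whose state is "on" (space separated).
runlevels_on() {
    awk -v svc="$1" '$1 == svc {
        out = ""
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")                 # "3:on" -> kv[1]="3", kv[2]="on"
            if (kv[2] == "on")
                out = out (out == "" ? "" : " ") kv[1]
        }
        print out
    }'
}

# Usage on the host (not executed here):
#   chkconfig --list | runlevels_on vsftpd
```

An empty line in the output means the service is known but not enabled at any level.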

Either bounce the box or start the service by hand.

$ service vsftpd start

If the package is not installed (check with rpm -qa | grep -i ftp), then install it with yum install vsftpd to download it directly from the RedHat online repository.

To identify the users prohibited from using this service, view the file shown below:

$ cat /etc/vsftpd/ftpusers
# Users that are not allowed to login via ftp

If you want to allow root to use ftp, remove the appropriate entry from /etc/vsftpd/ftpusers.

By the way, LINUX has more than one ftp “package” to choose from.
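
Whether a given account is on the deny list can be checked without reading the whole file. The helper below is a sketch with a made-up name (ftp_denied); it assumes one user name per line, which is how the file is laid out.

```shell
# ftp_denied: succeed (exit 0) when the user is listed in the deny file.
# The whole line has to equal the user name, so comment lines starting
# with "#" never match.
ftp_denied() {
    user=$1
    file=${2:-/etc/vsftpd/ftpusers}
    grep -qxF -- "$user" "$file"
}

# Usage (not executed here):
#   ftp_denied root && echo "root may not use ftp"
```

Depending on the vsftpd configuration, the file /etc/vsftpd/user_list may be consulted as well, so it is worth checking both before concluding that a user is allowed in.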

Posted in LINUX.


entstat equivalent for FC adapters in AIX.

The entstat -d ent# command delivers a lot of useful information about a network adapter. We often use it to determine if the adapter has “LINK” and at what speed it moves data to and from the host. Its equivalent for FC adapters is called fcstat and it works for both physical and virtual FC adapters.

# fcstat -e fcs0 | grep -E "Type|fcs|Port Name|Port Speed"
Device Type: 8Gb PCI Express Dual Port FC Adapter (df1000f333108a03) (adapter/pciex/df1000f114108a0)
World Wide Port Name: 0x10000220C985EA6B
Port Speed (supported): 8 GBIT
Port Speed (running):   8 GBIT
Port Type: Fabric
# fcstat -e fcs0 | grep -E "Type|fcs|Port Name|Port Speed"
Device Type: Virtual Fibre Channel Client Adapter (adapter/vdevice/IBM,vfc-client)
World Wide Port Name: 0xC05076039B790032
Port Speed (supported): UNKNOWN
Port Speed (running):   8 GBIT
Port Type: Fabric

Even more FC adapter statistics can be obtained with the -D option, for example fcstat -D fcs0. This command generates almost twice as much information as fcstat -e fcs0.
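
On a host with several FC adapters, the WWPN and the running speed are usually all that the SAN team asks for. The following sketch pulls just these two values out of the fcstat output; fc_summary is a made-up name, and the label text it matches is taken from the output shown above.

```shell
# fc_summary: read fcstat output on stdin and print the World Wide Port
# Name followed by the running port speed.
fc_summary() {
    awk -F': *' '
        /^World Wide Port Name/   { wwpn = $2 }
        /^Port Speed \(running\)/ { speed = $2 }
        END { print wwpn, speed }'
}

# Usage on an AIX host (not executed here):
#   for a in $(lsdev -Cc adapter -F name | grep '^fcs'); do
#       fcstat -e "$a" | fc_summary
#   done
```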

Posted in Real life AIX.


How to capture boot debug of a SAN boot PowerVM Virtual I/O Server or AIX/NPIV client partition that is failing to boot?

Does it interest you? Go to page number 2.

Posted in Real life AIX.

links to various performance tools for AIX

You will find it all in one place following this link

Posted in AIX.

a few words about EtherChannel

Originally, this technology was meant to protect a host against a failure of its network adapter and/or network switch. Additionally, some unscrupulous salesmen claimed a fantastic increase in throughput: two adapters tied together will double, and three adapters tied together will triple, the throughput of the EtherChannel adapter built on them. Yes, in a salesman’s pure land of fantasy.

In reality you may get a few percent more (max. about 20%). You do not build an EtherChannel for a volume increase but for a good night’s sleep! There is a big difference between EtherChannel and (Link) Aggregation; these terms have different meanings! Check your switch documentation for the term it uses for what AIX (IBM) calls EtherChannel to avoid confusion, as in most cases aggregation means a trunk of ports and not the EtherChannel. By the way, AIX also supports link aggregation.

As you contemplate EtherChannel for your own use, keep in mind that AIX has for a long time (I do not remember the version number) had the feature called backup adapter …… and that at a certain time (check it against your version of AIX) an EtherChannel adapter built on top of two physical NICs (one being the PRIMARY and the second being the BACKUP) was not really an EtherChannel adapter and required no changes to the settings of the switch ports…. Finally, it is a BAD idea to connect an EtherChannel adapter and its BACKUP adapter to a single switch, really.

Sometimes, the cables that you are convinced lead to different switches are in reality attached to the same switch: the one which just lost power. As a result, the most important application was just killed by the cluster daemon on the node that now has no network connectivity and was moved to the standby node in the other data center, with the understandable few minutes of delay in the application services. It could not have happened at a better moment!

To make a long story short, at the end of the day the cables have to be traced and labeled, and a date scheduled for their swap. The question remains the same: how do you verify which cable goes to switch A and which cable goes to switch B? What? Why is that a question at all? Well, as the BACKUP adapter carries no traffic, its switch port cannot “see” its MAC address.

Let’s agree that our EtherChannel adapter is ent8, that it consists of ent0, and that its backup adapter is ent4. The following shows how you can produce these details. In the output below, the entry adapter_names (plural) is not a mistake. An EtherChannel NIC may employ more than one physical NIC, and this set of NICs may be protected by yet another physical NIC whose role is to take over for all the component adapters if they all fail and die; this is the BACKUP adapter. For as long as a single component adapter of the EtherChannel is working, the BACKUP adapter does nothing; it springs to life when the last component NIC dies an honorable death.

# lsattr -El ent8 | grep adapter  
adapter_names  ent0 EtherChannel Adapters                      True
backup_adapter ent4 Adapter used when whole channel fails      True
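
If the same question has to be answered for many EtherChannel devices, the two attribute lines can be reduced to a short summary. This is a sketch only: channel_members is a made-up name, and it assumes that multiple primary adapters are listed comma separated in adapter_names.

```shell
# channel_members: read "lsattr -El <EtherChannel>" output on stdin and
# print one "primary:" line and one "backup:" line.
channel_members() {
    awk '$1 == "adapter_names"  { gsub(",", " ", $2); print "primary:", $2 }
         $1 == "backup_adapter" { print "backup:", $2 }'
}

# Usage on an AIX host (not executed here):
#   lsattr -El ent8 | channel_members
```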

It is easy to verify that each participating adapter is connected to a different switch while you are implementing EtherChannel: just assign an IP address to each adapter, one by one, each time asking the LAN administrator to validate the connection with the switch (he can see the individual adapter’s MAC address). Later it is not as easy….. To repeat the same procedure you have to destroy the EtherChannel, and sometimes you may not be able to do it for reasons that are beyond you. So in this case just flip the roles: let the BACKUP become ACTIVE and the MAC should be seen (hopefully on the other switch).

MAC: is it the MAC or is it not the MAC? Well, EtherChannel is really cool, as it not only allows you to group NICs together to keep the resulting logical adapter alive if its components start to die; it additionally allows you to have a backup adapter just in case all the “primary” components (adapters) are no longer operational. The whole idea of multiple adapters sharing the same IP address raises an immediate question: “what about the MAC address associated with this IP address?” Why? A changing MAC address may not be well accepted by the operating system on the receiving end, by its applications, or by the intermediate routers or switches, as in reality this breaks one of the principles of TCP/IP. I think it is better to use the EtherChannel ability to assign one MAC to all its component adapters. In my case I create this new MAC by replacing the leading characters of the first MAC with the string BADBEEF, which consists of all valid hex characters …:-) This way it is easy to spot the address of an EtherChannel adapter on a switch.
So consult your switch/router manual ahead of the go-live date. There is one more issue to consider here: are the switches capable of sharing VLANs? Can a vlanA on port X of switch A also be assigned to port Y of switch B? Most likely this is not an issue, but still ask before the go-live.
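
The BADBEEF trick is easy to script. The sketch below assumes that the string overwrites the first seven hex digits of the adapter’s own MAC and that the remaining five digits are kept; the function name badbeef_mac and the sample address in the usage line are made up for illustration.

```shell
# badbeef_mac: print a 12-digit MAC in which the first seven hex digits
# of the given address are replaced by BADBEEF. Colons are stripped and
# letters are upper-cased first.
badbeef_mac() {
    mac=$(printf '%s' "$1" | tr -d ':' | tr 'a-f' 'A-F')
    [ "${#mac}" -eq 12 ] || { echo "expected 12 hex digits: $1" >&2; return 1; }
    printf 'BADBEEF%s\n' "${mac#???????}"    # drop 7 characters, keep the last 5
}

# Usage with a made-up address:
#   badbeef_mac 00:1a:64:b2:c3:d4     # prints BADBEEF2C3D4
```

A nice side effect: the first byte, 0xBA, has the “locally administered” bit set and the multicast bit clear, so such an address should not collide with a vendor-assigned one.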

Since the existing EtherChannel adapter could not be destroyed, another way had to be identified to validate the connectivity of its components. There is such a way! All that needs to be done is to flip the adapter roles! To flip the roles (so ent4 becomes the ACTIVE and ent0 becomes the BACKUP) you execute the next command, ethchan_config.

# /usr/lib/methods/ethchan_config -f ent8

I like to exercise my extremities so I do not have /usr/lib/methods in my PATH ….. 🙂

So far it works because the EtherChannel adapter had only one primary adapter (ent0). What if there are more? You could use the -d command line option to remove the component adapters until the EtherChannel has only one active physical adapter, and after testing you could re-add whatever you have removed with the -a option, for example:

# /usr/lib/methods/ethchan_config -d ent8 entX

Test connectivity, and re-add the adapter to the EtherChannel:

# /usr/lib/methods/ethchan_config -a ent8 entX

By the way, the last command is a really cool tool for creating and manipulating EtherChannel devices, highly recommended (of course everything EtherChannel-wise can be done via smitty too).

If you can, I recommend you insist that the cables from the EtherChannel adapters are of a different color than the cable used to provide connectivity to its BACKUP adapter; really, sometimes it is worth it.

In a few earlier posts, I mentioned the fact that computer science is 2.333% (approx.) based on magic, and as such any procedure provided by a manufacturer and/or these posts may fail when YOU are attempting to perform it. This one, like any other of your failures as a system administrator, may also be the result of a combination of operating system and firmware versions, your overall luck and your accumulated karma. With this in mind, make sure you understand what you are about to do, set your expectations appropriately, test your procedure and schedule its date/time to minimize any negative outcome.

Play it safe in a data center; be a hero in a pub celebrating your victory over a machine later!

Posted in Real life AIX.

Copyright © 2016 - 2017 Waldemar Mark Duszyk. All Rights Reserved.