
VIOS upgrade pitfalls – unintended

I do not know for how many of us this is true, but for a lot of us VIOS servers (at least two per frame) deliver disks to their clients (partitions/LPARs), which use them as their rootvg disks. Upgrading a VIOS, it is easy to forget this fact, which ends in the client partitions' demise.

What I am referring to is the fact that to activate the server OS upgrade you have to reboot the VIOS, and as a result its clients will lose access to the disks provided by the server being rebooted! Their rootvg copies will become as stale as Limburger cheese, well, almost.
If you still do not recognize this fact and proceed to upgrade the second VIOS, then guess what is going to happen a few seconds after that VIOS gets its reboot? A pure stink!
At this point, all client partitions (LPARs) smell really bad and so do you, not to mention that you may perspire more than before the upgrade ….. But do not worry – you are in good company! I almost did it to myself today.

So what should you do after the reboot of the first VIOS server? Well, start with verifying that its OS level is what you wanted: execute the ioslevel command. Satisfied with the results, log in to each and every one of its client partitions and establish which dump device (primary/secondary) is delivered by the just upgraded VIOS, then disable the appropriate dump device (if in doubt, disable both).

To disable the Primary dump device:

# sysdumpdev -P -p /dev/sysdumpnull

To disable the Secondary dump device:

#  sysdumpdev -P -s /dev/sysdumpnull

Next, execute the following few commands:

# lsvg -p rootvg
# varyonvg -bu rootvg
# varyonvg rootvg
# lsvg -p rootvg

Wait for the syncvg processes to do their job and at the end re-enable the dump devices:

# sysdumpdev -P -p /dev/dump0
# sysdumpdev -P -s /dev/dump1

By the way, remember that your dump devices may not have the same names as mine.
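The whole per-client sequence can be wrapped in a small script. Below is a dry-run sketch of my own (not part of the original procedure): it only echoes the AIX commands so they can be reviewed first, and dump0/dump1 are the example device names from above, which will likely differ on your hosts.

```shell
#!/bin/sh
# Dry-run sketch of the per-client resync after a VIOS reboot.
# "run" only echoes each AIX command; change it to execute "$@" when ready.
# dump0/dump1 are example dump device names; yours may differ.
CMDS=""
run() { CMDS="$CMDS$* ; "; echo "$@"; }

run sysdumpdev -P -p /dev/sysdumpnull   # disable the primary dump device
run sysdumpdev -P -s /dev/sysdumpnull   # disable the secondary dump device
run varyonvg -bu rootvg                 # pick up the disks that came back
run varyonvg rootvg                     # normal varyon; syncvg resyncs stale copies
run sysdumpdev -P -p /dev/dump0         # re-enable the dump devices
run sysdumpdev -P -s /dev/dump1
```

Once the echoed list looks right, the echo wrapper can be dropped and the script run for real on each client partition.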

But what will happen if you attempt to varyonvg a volume group containing an active dump device?

0516-1774 varyonvg: Cannot varyon volume group with an active dump device on a missing physical volume. Use sysdumpdev to temporarily replace the dump device with /dev/sysdumpnull and try again.

Posted in Real life AIX, VIO.


firmware upgrade, serviceable events and HMC

The manual recommends closing all "serviceable" events before performing a firmware upgrade. If there are only a few events, their closing can easily be done from the HMC GUI using the Manage Serviceable Events task (click on the wrench icon within the HMC GUI to access it). You can use the task to view service events and select one or multiple events to close.
However, the easiest and quickest way to close many service events is from the HMC command line, using the following command.

# chsvcevent -o closeall

Follow this link recommended by Sebastian (see comments) – very cool indeed!

Posted in Real life AIX.


printing to a specific paper tray in AIX

It can easily be accomplished with the command called qprt.
Below, I am sending a file, defined by its path and name as /tmp/, to tray 1 (-u1) of the printer associated with the print queue called myPrintQueue.

/usr/bin/qprt -u1 -PmyPrintQueue /tmp/

Be aware that in most cases the destination print queue must match the type (PostScript, ASCII, PCL, ….) of the file to be printed or nothing will print.

Posted in AIX, Real life AIX.


removing and validating removal of IP aliases

A few weeks ago, as a result of hurricane Sandy's bad behavior, I had to manually bring on-line cluster resources which normally are controlled by PowerHA.
In order to provide access to data and services, I varied on the cluster volume groups, mounted their file systems and added the service address (as an alias) to a network interface.
Yesterday was the time to return everything back to its normal state.

The task at hand was not really difficult. After the application is shut down, the file systems will be unmounted, the volume groups varied off and exported from this node, then imported and varied on on the other node in the cluster. Then these volume groups will be changed so they are not automatically varied on after the host boots.
Next, the service IP alias will have to be removed from the interfaces, the cluster services started on all nodes, the cluster synced and verified, and finally the cluster resources will be activated on the selected node.

Have you ever removed (via smitty) a default route, changed it, and shortly later recognized that your host now has two default routes? Well, it happened to me.
Have you ever removed (via smitty) an IP alias and still seen it "sitting" there on the interface? Well, it happened to me.

So the most burning question for me was this one: "how to determine" what a host thinks about its IP aliases – do I have one, more than one, or none?
An AIX host has a single source of all its knowledge, which is its ODM. To really know the answer to the last question means to query the host's ODM database for the presence of IP alias definitions (repeat against each suspected interface).

# odmget CuAt | grep -p en0
        name = "en0"
        attribute = "netaddr"
        value = ""
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 4

        name = "en0"
        attribute = "netmask"
        value = ""
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 8

        name = "en0"
        attribute = "state"
        value = "up"
        type = "R"
        generic = "DU"
        rep = "sl"
        nls_index = 5

        name = "en0"
        attribute = "alias4"
        value = ","
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 0

        name = "en0"
        attribute = "alias4"
        value = ","
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 0

The answers delivered by the ifconfig command cannot really be trusted in the case at hand. It may show nothing (no address) while the ODM still believes, and insists, that there is an alias or aliases associated with a specific network interface.
The ODM always wins. If the ODM says that an interface has an alias (either type 4 or 6), the chdev command will have to be used to remove it. See the line below.

chdev -l en0 -a delalias4=,
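A quick way to answer the "one, more, or none?" question is to count the alias4 stanzas in the odmget output with awk. The sketch below is runnable anywhere because a here-document stands in for the real ODM data; on AIX you would instead pipe `odmget -q "name=en0" CuAt` into the same awk one-liner.

```shell
#!/bin/sh
# Count alias4 stanzas for en0. On AIX, replace the here-document with:
#   odmget -q "name=en0" CuAt | awk '/attribute = "alias4"/ {n++} END {print n+0}'
count=$(awk '/attribute = "alias4"/ {n++} END {print n+0}' <<'EOF'
        name = "en0"
        attribute = "alias4"
        value = ","

        name = "en0"
        attribute = "alias4"
        value = ","
EOF
)
echo "en0 alias4 entries still in the ODM: $count"
```

A non-zero count means a chdev delalias4 call (one per remaining alias) is still needed before the ODM and reality agree.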

Posted in Real life AIX.

edit files in place – remove lines

A day after Thanksgiving – a turkey sandwich for breakfast consumed already, another one waiting on my desk to be eaten for lunch 🙂 …..

As our workload increases, we try to "automate" as much as possible – I am no exception. Today, "magic" brought me a ticket to remove user accounts from a Lawson server. For this application it means removing the user entries from /etc/passwd (which exist there just for the sake of the Lawson application), followed by the removal of the user definitions from the LDAP repository (in my case).

The file containing the list of users to be removed is called terminate.list and it has the following format:

BondJ    950078   James  Bond
GoldM    937840   Gold Member
MarkD    937079   Mark Duszyk

It contained a considerable number of entries – I either had to find out how to “automate” this process or do it all by hand 🙁

A while ago, I had a post about "in-place" file edits. Then, I was interested in the "find & replace" operation, which really is not far from "find and delete". So, based on the previously gained knowledge, the following snippet was created to answer today's needs.


cp /etc/passwd ~duszyk/PASSWORD

for user in `tr '[A-Z]' '[a-z]' < terminate.list | awk '{print $1}'`
do
    rmuser -R LDAP $user
    vi +%g/$user/d +wq ~duszyk/PASSWORD
done

Instead of modifying /etc/passwd, the file is copied to my home directory under the new name of PASSWORD. Next, the contents of terminate.list are converted to lower case and the first token from each line (the login name) is extracted and stored in the variable user.
The rmuser -R LDAP $user command purges the LDAP repository - it removes the user definition.
The next line calls vi to perform a global (%g) search and delete (d) of the login name represented by the variable $user.

After this script executes, I will inspect PASSWORD and, if satisfied, I will copy it to /etc/passwd.

Next time you have to find/delete a number of entries in a text file - remember that this task does not need to be done manually!
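The same find-and-delete can also be done without vi, using sed. Below is a runnable sketch with sample data standing in for terminate.list and the PASSWORD copy; on the real host the rmuser -R LDAP call would go inside the loop, as above.

```shell
#!/bin/sh
# Runnable sketch of the same find-and-delete, done with sed instead of vi.
# Sample files stand in for terminate.list and the PASSWORD copy.
cd "${TMPDIR:-/tmp}" || exit 1
cat > terminate.list <<'EOF'
BondJ    950078   James  Bond
GoldM    937840   Gold Member
EOF
cat > PASSWORD <<'EOF'
root:!:0:0::/root:/usr/bin/ksh
bondj:!:214:1::/home/bondj:/usr/bin/ksh
goldm:!:215:1::/home/goldm:/usr/bin/ksh
mduszyk:!:216:1::/home/mduszyk:/usr/bin/ksh
EOF

# Lower-case the login names, then delete each user's line from PASSWORD.
for user in `tr '[A-Z]' '[a-z]' < terminate.list | awk '{print $1}'`
do
    # on the real host: rmuser -R LDAP $user
    sed "/^$user:/d" PASSWORD > PASSWORD.tmp && mv PASSWORD.tmp PASSWORD
done
cat PASSWORD
```

Anchoring the pattern as ^$user: makes sure only the matching login field is hit, not a user name that happens to appear elsewhere on a line.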

Posted in Real life AIX.


removing remote print queues

This morning I got a request to remove several print queues defined in the following way:

# enq -isWA
Queue                 Dev            Status       Job Files   User  PP   %  Blks  Cp Rnk
-------------------- -------------- --------- ------ ----- --- ---
chca_printx          hp@sat20x      READY
chca_printx-c        hp@sat20x      READY
chca_printx-c-l      hp@sat20x      READY
chca_printx-l        hp@sat20x      READY
chca_printz          hp@sat20z      READY
chca_printz-c        hp@sat20z      READY
chca_printz-c-l      hp@sat20z      READY
chca_printz-l        hp@sat20z      READY

Without giving it much thought, I logged in to the host and executed smitty rmpq, selected the first queue to be removed and hit the Enter key, only to see my action fail promptly. This made me think about the task at hand …. Looking at the last output for a while longer, I recognized what I was seeing – these are remote print queues attached to HP print servers (in this case). It took me another moment to remember that I had done something like this years ago, and that back then it took me two steps to delete such a print device and its queue or queues.
This time around, each HP print server has several print queues associated with it – four, to be precise.

AIX has three basic commands to "manually" remove "printing devices". They are rmque, rmquedev and rmvirprt. The first command removes a printer queue, the second one removes a printer or plotter queue device, and finally the last one removes a virtual printer.

It seems that the natural order of action is to remove every print queue attached to the printer queue device and finally to delete the device itself!

# rmque -qchca_printx
# rmque -qchca_printx-c
# rmque -qchca_printx-c-l
# rmque -qchca_printx-l

With the print queues removed, it is time to remove the printer device.

# rmquedev -dhp@sat20x
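Since both print servers carry the same four queue-name variants, the removal commands can be generated in a loop. A dry-run sketch using the names from the listing above; it only prints the commands, so the list can be inspected (and then piped to sh, or the echoes dropped) before anything is actually removed.

```shell
#!/bin/sh
# Dry-run: build the removal commands for both HP print servers and
# their four queue-name variants; nothing is removed, only printed.
cmds=$(
for dev in hp@sat20x hp@sat20z
do
    base="chca_print${dev#hp@sat20}"        # -> chca_printx / chca_printz
    for q in "$base" "$base-c" "$base-c-l" "$base-l"
    do
        echo "rmque -q$q"                   # queues first ...
    done
    echo "rmquedev -d$dev"                  # ... then the queue device
done
)
echo "$cmds"
```

The ordering matters: the queues must be gone before rmquedev will remove the device they were attached to.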

The same procedure applied to the second set of queues and it, too, was gone; the ticket was closed.

It is funny, but this is not the first time that I recognize the limits of my memory, which puts me in a philosophical mood – we are "masters" only for a short time; other than that, we just pretend ….

Tomorrow, have a Nice Thanksgiving Everybody!!!

Posted in AIX, Real life AIX.


adding disks to a concurrent volume group

Today, I had to add a few more disks to two concurrent volume groups (shared in a PowerHA cluster) to increase their storage capacity. Below are the new disks that are still waiting to be assigned. Each cluster node has the same order of disks, so all of them show the same hdisk numbers.

hdisk44         none                                None
hdisk45         none                                None
hdisk46         none                                None
hdisk47         none                                None
hdisk48         none                                None
hdisk49         none                                None

In AIX there is just one way to add a disk to an existing volume group – use the extendvg command. I tried it against each of the "new/free" disks, with identical results.

RDC:/root>extendvg -f epcdbm_vg hdisk44
0516-1254 extendvg: Changing the PVID in the ODM.
The distribution of this command (168) failed on one or more passive nodes.
0516-1396 extendvg: The physical volume 00c9f275db894e19, was not found in the
system database.
0516-792 extendvg: Unable to extend volume group.

At first, I thought that this time these disks needed to be made explicitly available for LVM "consumption", so I executed the following against each of them, receiving the same response each time.

RDC:/root>chpv -v a hdisk44
0516-304 getlvodm: Unable to find device id hdisk44 in the Device
        Configuration Database.
0516-722 chpv: Unable to change physical volume hdisk44.

Having nothing to lose, I decided to execute the following three commands against each of them.

RDC:/root>mkvg -y wmd hdisk44
0516-1254 mkvg: Changing the PVID in the ODM.
RDC:/root>varyoffvg wmd
RDC:/root>exportvg wmd

After I was done on this node, I moved to the next one in the cluster and repeated the same for each disk there too.

TKP/root>mkvg -y wmd hdisk44
0516-1254 mkvg: Changing the PVID in the ODM.
0516-1398 mkvg: The physical volume hdisk44, appears to belong to
another volume group. Use the force option to add this physical volume
to a volume group.
0516-862 mkvg: Unable to create volume group.

These messages are not only expected but welcome – the PVIDs are being propagated to each node in the cluster. Finally, I return to the node with the active resource groups and execute extendvg against each appropriate volume group, followed by the list of the selected disks. This time, it works!
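The per-disk mkvg/varyoffvg/exportvg dance is easy to script. A dry-run sketch of my own that only prints the commands for the six disks above; on the node that should stamp the PVIDs, pipe the output to sh (or drop the echoes) once it looks right.

```shell
#!/bin/sh
# Dry-run of the PVID "stamping" trick above: per disk, create a scratch
# VG (wmd), vary it off and export it. Commands are printed, not executed.
cmds=$(
for d in hdisk44 hdisk45 hdisk46 hdisk47 hdisk48 hdisk49
do
    echo "mkvg -y wmd $d"
    echo "varyoffvg wmd"
    echo "exportvg wmd"
done
)
echo "$cmds"
```

On the remaining cluster nodes the same mkvg loop is expected to fail with 0516-1398, which, as noted above, is exactly the confirmation that the PVIDs have arrived.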

For an alternate way of adding disks to a concurrent VG, please see the comment from "ironman".
What really makes my day is the comment from Matt – "reorgvg" can be used against a concurrent volume group!!!!!! This rocks!!! Check your AIX level and, if you do use CVGs, upgrade – it is worth it!!!!

Posted in Real life AIX.


HDS controllers, disks and how to establish which is which?

Some AIX hosts get their disks from a Hitachi-based SAN. Systems like the USP_V limit both the LUN size on the controller end (256GB) and the disk queue depth (32) on the AIX hdisk end, in order to provide good I/O characteristics. For AIX hosts with rapidly growing capacity requirements the end result is always the same – a large number of disks, which is not a problem as long as there is no LVM mirroring. But if a host uses disks from two SAN fabrics which are mirrored by LVM, and if additionally ShadowImage is involved to make backups, then the AIX administrator will quickly become uncomfortable ….. Follow to the next page if this sounds interesting.

Posted in AIX, HDS, Real life AIX.


possible paging issues with AIX 6.1.7

On hosts with a specific application (EPIC/CACHE) we observed incidents of overwhelming paging, which was eventually tracked down to "enabled" 64KB memory pages – it happened after the upgrade to AIX 6.1.7. All other applications (ORACLE, LAWSON, CLARITY, etc.) did not show any adverse behavior and are running just fine. For the one that did "page a lot" we disabled the support for 64KB pages.

Here is the official explanation from our IBM support engineer:

There are not currently any hard and fast rules regarding the ratio of paging space to memory. The requirements change based on apps, workload, … Paging space is tuned based on the needs of the server.

Regarding your system continually running out of paging space, we believe you may have hit an APAR: IV26272: REDUCE EARLY WORKING STORAGE PAGING.

An explanation and possible resolution:

The kernel parameter numperm_global was implemented and enabled with AIX 6.1 TL7 SP4 / 7.1 TL1 SP4 to be able to look at paging from a global perspective. That means that before AIX 6.1 TL7 SP4 / 7.1 TL1 SP4 the numperm_global tunable was not available. Unfortunately, in some environments numperm_global might cause early paging, due to a failed pin check on 64K pages related to different-size mempools. There are two possible ways to prevent the problem from happening:

1) Disable global numperm (numperm_global=0)
2) Increase the number of unpinned pages for the page size that is close to maxpin. In the case above, the 64K page pool is close to maxpin. Most of the pinned pages are kernel heap, and there is a large number of 4K computational pages used by the customer's application. Forcing the application to use 64K pages (LDR_CNTRL) will reduce the pinned percentage for the 64K pool and therefore prevent the problem from happening.
Customers who are running workloads like DB2 or Oracle and follow the best practices (using the 64K page size) are unlikely to experience this early paging problem.
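Point 2 mentions forcing an application onto 64K pages via LDR_CNTRL. As an illustration only (my assumption of a typical setting, not taken from the post; which segments to force is application-dependent and worth confirming with support), such a setting looks like this in the application's start script:

```shell
# Example only: force 64K pages for an application's data, text and stack
# segments. LDR_CNTRL options are joined with "@". Set it in the start
# script of the one application, not system-wide, and unset it afterwards
# so it does not leak into unrelated processes.
LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K
export LDR_CNTRL
echo "LDR_CNTRL=$LDR_CNTRL"
```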


If you want to learn more about AIX and memory pages, follow the link provided by Lonny Niederstadt from EPIC Corporation.

Thanks Lonny!

Posted in Real life AIX.

Copyright © 2015 - 2016 Waldemar Mark Duszyk. - best viewed with your eyes.. Created by Blog Copyright.