
Processors, cores, sockets, etc ….

In AIX, the number of processors refers to the number of cores, never to sockets or chip modules. However, it is hard to tell from a given output whether it refers to logical, virtual, or physical processors unless you know which command was used, whether it ran in an LPAR holding only some of the resources, whether SMT is turned on, and whether virtual processors are in use.
Take a look below – maybe you will find answers to your questions among these commands and their output.

The command lsdev -Cc processor shows the number of physical processors (or, in a shared-processor LPAR, virtual processors).

# lsdev -Cc processor
proc0 Available 00-00 Processor
proc2 Available 00-02 Processor

The command lparstat -i shows the virtual and logical processors.

# lparstat -i | grep CPU
Online Virtual CPUs : 2
Maximum Virtual CPUs : 15
Minimum Virtual CPUs : 1
Maximum Physical CPUs in system : 2
Active Physical CPUs in system : 2
Active CPUs in Pool : 2
Physical CPU Percentage : 25.00%
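If the percentage above looks puzzling: Physical CPU Percentage is, as far as I can tell, the entitled capacity spread over the online virtual CPUs, so the sample values above imply an entitlement of half a processor. A sketch of the arithmetic (hard-coded sample numbers, not live lparstat output):

```shell
# entitlement = online virtual CPUs x physical CPU percentage
virtual_cpus=2        # "Online Virtual CPUs" above
physical_pct=25       # "Physical CPU Percentage" above (25.00%)
entitlement=$(awk "BEGIN {printf \"%.2f\", $virtual_cpus * $physical_pct / 100}")
echo "entitled capacity: $entitlement processing units"
```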

The command topas -L shows logical processors, but mpstat shows virtual ones.
The SMT thread processors can be seen with bindprocessor:

# bindprocessor -q
The available processors are: 0 1 2 3

The next command also delivers SMT information.

# lsattr -El proc0
frequency 1498500000 Processor Speed False
smt_enabled true Processor SMT enabled False
smt_threads 2 Processor SMT threads False
state enable Processor state False
type PowerPC_POWER5 Processor type False
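The outputs above tie together: with 2 online virtual CPUs and 2 SMT threads per core, bindprocessor -q lists 4 logical processors. A quick sketch of the arithmetic (hard-coded sample values; on a live LPAR you would pull them from lparstat -i and lsattr -El proc0):

```shell
# logical processors = online virtual CPUs x SMT threads per core
virtual_cpus=2      # from: lparstat -i   ("Online Virtual CPUs")
smt_threads=2       # from: lsattr -El proc0  ("smt_threads")
logical_cpus=$((virtual_cpus * smt_threads))
echo "logical processors: $logical_cpus"
```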

The next two commands also deliver CPU-related information.

# lscfg -v | grep -i proc
Model Implementation: Multiple Processor, PCI bus
proc0 Processor
proc2 Processor
# prtconf | pg
System Model: IBM,9111-520
Machine Serial Number: 10EE6FE
Processor Type: PowerPC_POWER5
Number Of Processors: 2
Processor Clock Speed: 1499 MHz
CPU Type: 64-bit
Kernel Type: 64-bit

Posted in AIX, Real life AIX.


A New Year’s Resolution?

Today, my schedule held a simple task: relocating cluster resources (volume groups and the service address) from one cluster node to another. The application administrator decided to shut the application down ahead of time to speed up the failover – it was lunch time and we wanted to do whatever was possible to shorten this event. A few minutes later, I synced the cluster resources from each of its nodes – just for good measure, I thought while doing it.
The relocation of cluster resources started as planned. To track its progress, on the "target" node I executed the command tail -f /var/hacmp/log/hacmp.out. It did not take long for failure messages to show up on the screen of the "source" node. Its smitty screen displayed the following lines:

Command: failed        stdout: yes           stderr: no

Before command completion, additional instructions may appear below.
Attempting to move resource group EpicTrain_RG to node epctrtpu001.
Waiting for the cluster to process the resource group movement request....
Waiting for the cluster to stabilize.......................

ERROR: Event processing has failed for the requested resource group movement.  The cluster is unstable and requires manual intervention to continue processing.

Woo! What is going on here? It was already impossible to access the source node by its service address – a sure indication that the cluster service address (an IP alias) had already been removed from the source node's network interface. I opened a new terminal session and logged in using the "routable" IP alias placed on the source node's "boot" adapter. Executing the command lsvg -o showed that one volume group was still active – the error message did not lie, I thought. The next command, df, showed a mounted file system. Executing the lsvg -l command with the name of the active volume group proved that the mounted file systems belong to this group.

Looking at the situation, I developed the following plan: I will unmount the stubborn file systems and vary their volume group off; on the "target" node, I will manually import this volume group, disable its ability to come on-line (vary on) automatically at boot, and vary it off again. Finally, all cluster nodes will be rebooted and synchronized, cluster services will be started on all of them, and the relocation will be tried one more time.
Content with the plan, I executed the umount command against the first offending file system. It took a while for this command to fail – "something" or someone was using it. It was time to behave like the master of this cluster. No more mercy – it is getting late and I am getting really hungry. The following snippet, executed from the command line, was employed to unmount the file systems.

# for fs in `lsvgfs modupgrade_vg`
> do
> fuser -kuxc $fs;umount $fs
> done
/epic/redalert:        1c(root) 2293912c(root) 2359472r(root) 2490554c(root) 2621560c(root) 2883634r(root) 3408060r(root) 3670132r(root) 3735672r(root) 3801208r(root) 3866746r(root)

As soon as I hit the last "Enter", the screen showed some processes being killed (courtesy of the fuser -kuxc command) and then it froze. Yes, this node was going for a reboot! What is going on here? Wait a moment – the screen on the other node started to change too. The /var/hacmp/log/hacmp.out log on the target node came alive – the resource relocation finally started!
After a short while the cluster service IP address became available again, and the application administrator logged in to tend to her application. It did not take long for Lana to call me – "Mark, not all file systems survived this relocation". This is going to be one late lunch indeed; I logged on and started to look around. Both nodes showed the same volume groups and an identical count of file systems. How, then, could the target node not have the same file systems as the source?

Have you ever wondered how the df command works? We can speculate that it reconciles information delivered via the lsvg command (logical volume names, their sizes and the size of the physical partition) with the stanzas present in the file /etc/filesystems, which tie logical volumes together with file systems and their attributes. A file system name is defined by the stanza label; its logical volume and any additional attributes are contained in the stanza body. If you take this reasoning a bit further, you may see that the same logical volume may be "mounted" on different occasions using different file system names, aka stanza labels, aka mount points (directories).
Try it for yourself. Unmount a file system, create a new mount point with the mkdir command, replace the label of the appropriate stanza in the /etc/filesystems file, and mount using the new directory name. It works, right?
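The experiment above can be sketched on a scratch copy, so nothing real is touched (the stanza below is a made-up example; real stanzas live in /etc/filesystems):

```shell
# Build a throwaway stanza file; the label line is the mount point.
cat > /tmp/filesystems.demo <<'EOF'
/epic/rtlupg02:
        dev             = /dev/trone02_lv
        vfs             = jfs2
        mount           = false
EOF
# Renaming the label "re-homes" the logical volume to a new directory.
sed 's|^/epic/rtlupg02:|/epic/rlsupg22:|' /tmp/filesystems.demo
```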

The file /etc/filesystems has different dates and sizes on each of my nodes; their contents are not identical. Are you thinking what I'm thinking? Using vi, I compared the contents of these files on each cluster node. It did not take long to realize that the "missing" file systems are not really missing – they are associated with different mount points; they were renamed. Look below: one node has the entry as "/epic/rtlupg02" but the other has it as "/epic/rlsupg22".
Source node example entry:

/epic/rtlupg02:
dev             = /dev/trone02_lv
vfs             = jfs2
log             = INLINE
mount           = false
check           = false
account         = false

Target node example entry:

/epic/rlsupg22:
dev             = /dev/trone02_lv
vfs             = jfs2
log             = INLINE
mount           = false
check           = false
account         = false

Shortly after, it became obvious not only that the target node's /etc/filesystems is different, but also that this node does not have the directories (mount points) associated with the "missing" file systems. Some file systems had been renamed on the source node, and this information had not been propagated to the remaining nodes in the cluster. Recovery was not difficult: unmount the selected file systems, create new mount points and adjust their ownership, edit the appropriate stanzas, mount the renamed file systems (aka the "missing" ones), and call the application administrator to verify and start the application.

Now it is time to answer the question that everybody has in mind: what happened? I cannot say why this particular volume group of the resource group did not move to the target node. I have no idea why the operating system on the source node was not able to unmount its file systems and consequently vary this volume group off. This mystery forever belongs to the 1% of computer science commonly known as witchcraft – or maybe this was just my karma?
On the other hand, the mystery of the missing file systems is not a mystery at all. Somehow, someone using AIX LVM instead of PowerHA LVM renamed a number of file systems on the source node, and as a result the /etc/filesystems files on the two nodes were no longer the same. But I do not have the luxury of pondering this issue. The line is quite long: reset a few passwords, install two lpars, expand a file system, find out why two people have apparently identical logins, and so forth.

A few hours later, while on a train going home, I suddenly experienced a sort of spiritual awakening. The memories returned and I knew exactly why the /etc/filesystems files from a few hours ago were different. It was me! Yes, it was me! In early October, Sandy came and soaked us with heavy rain. It also brought heavy winds that destroyed some trees, which on their way down to the ground took with them the aerial fiber links connecting us with our data centers. As a result, one of these data centers was effectively isolated and unreachable for several days. To provide computer services to users, we broke the clusters and their mirrors and started providing services from the nodes in the available data center. In the case of this particular cluster, a few days later its application administrators requested that some file systems be renamed. Without contact with the "remote" node, the local LVM was employed to answer this request. After the fact, I made a mental note to reconcile the contents of /etc/filesystems as soon as all nodes rejoined the cluster. After connectivity was restored, nobody wanted to go for another downtime to fully synchronize and verify the clusters. We were all very happy with restored access to all data centers, all hosts accounted for and all clusters up and running again. The memories of previous intentions and the things to remember completely faded away.

This story shows that in order to work with less stress, a sysadmin must be better organized. I have to develop and follow a system that will help me keep track of everything that was put aside to be done at a later date.

What immediately comes to mind is editing the ~/.profile file with a colored echo statement stating what has to be done and when. Afterwards, every time I log in, a colorful banner will remind me about it.
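A minimal sketch of such a banner (the escape codes assume an ANSI-capable terminal; the date and task text are placeholders) – append it to ~/.profile:

```shell
# Red REMINDER banner printed at every login.
TODO_DATE="2013-01-15"
TODO_TEXT="reconcile /etc/filesystems on both cluster nodes"
printf '\033[1;31mREMINDER (due %s): %s\033[0m\n' "$TODO_DATE" "$TODO_TEXT"
```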

Posted in Real life AIX.

VIOS upgrade pitfalls – unintended

I do not know exactly how many of us, but for a lot of us, VIOS servers (at least two per frame) deliver disks to their clients (partitions/lpars), which use them as their rootvg disks. When upgrading a VIOS it is easy to forget this fact, which ends in the client partitions' demise.

What I am referring to is the fact that to activate the server OS upgrade you have to reboot it, and as a result its clients will lose access to the disks provided by the VIOS server being rebooted! Their rootvg will become as stale as Limburger cheese – well, almost.
If you still do not recognize this fact and proceed to upgrade the second VIOS, then guess what is going to happen a few seconds after this VIOS gets its reboot? A pure stink!
At this point, all client partitions (lpars) smell really bad and so do you – not to mention you may perspire more than before the upgrade… But do not worry – you are in good company! I almost did it to myself today.

So what should you do after the reboot of the first VIOS server? Well, start by verifying that its OS level is what you wanted – execute the ioslevel command. Satisfied with the results, log in to each and every client partition and establish which dump device (primary/secondary) is delivered by the just-upgraded VIOS – then disable the appropriate dump device (if in doubt, disable both).

To disable the Primary dump device:

# sysdumpdev -P -p /dev/sysdumpnull

To disable the Secondary dump device:

#  sysdumpdev -P -s /dev/sysdumpnull

Next, execute the following few commands:

# lsvg -p rootvg
# varyonvg -bu rootvg
# varyonvg rootvg
# lsvg -p rootvg

Wait for the syncvg processes to do their job and at the end enable the dump devices:

# sysdumpdev -P -p /dev/dump0
# sysdumpdev -P -s /dev/dump1

By the way, remember that your dump devices may not have the same names as mine.
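Rather than eyeballing the lsvg output while the mirrors resynchronize, a small helper can extract the stale-PP count (a sketch; the parsing assumes the standard lsvg header layout, and the sample echo below stands in for live lsvg rootvg output):

```shell
# Pull the stale-PP count out of `lsvg` output (expects it on stdin).
stale_pps() {
    awk '/STALE PPs/ {print $NF}'
}
# On a live host:  while [ "$(lsvg rootvg | stale_pps)" != "0" ]; do sleep 30; done
echo "STALE PVs:          0                        STALE PPs:      6" | stale_pps   # prints 6
```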

But what will happen if you attempt to varyonvg a volume group containing an active dump device?

0516-1774 varyonvg: Cannot varyon volume group with an active dump device on a missing physical volume. Use sysdumpdev to temporarily replace the dump device with /dev/sysdumpnull and try again.

Posted in Real life AIX, VIO.


firmware upgrade, serviceable events and HMC

The manual recommends closing all "serviceable" events before performing a firmware upgrade. If there are only a few events, their "closing" can easily be done from the HMC GUI using the Manage Serviceable Events task (click the wrench icon within the HMC GUI to access it). You can use the task to view service events and select one or multiple events to close.
However, the easiest and quickest way to close many service events is from the HMC command line using the following command:

# chsvcevent -o closeall

Follow this link recommended by Sebastian (see comments) – very cool indeed!

Posted in Real life AIX.


printing to a specific paper tray in AIX

It can easily be accomplished by executing the command called qprt.
Below, I am sending a file, defined by its path and name as /tmp/, to tray 1 (-u1) of the printer associated with the print queue called myPrintQueue.

/usr/bin/qprt -u1 -PmyPrintQueue /tmp/

Be aware that in most cases the destination print queue must match the type (PostScript, ASCII, PCL, …) of the file to be printed or nothing will print.

Posted in AIX, Real life AIX.


removing and validating removal of IP aliases

A few weeks ago, as a result of hurricane Sandy's bad behavior, I had to manually bring on-line cluster resources which are normally controlled by PowerHA.
In order to provide access to data and services, I varied on the cluster volume groups, mounted their file systems and added the service address (as an alias) to a network interface.
Yesterday was the time to return everything back to its normal state.

The task at hand was not really difficult. After the application is shut down, the file systems will be unmounted, the volume groups varied off and exported from this node, then imported and varied on on the other node in the cluster. These volume groups will then be changed so they are not automatically varied on after the host boots.
Next, the service IP alias will have to be removed from the interfaces, the cluster services started on all nodes, the cluster synced and verified, and finally the cluster resources will be activated on the selected node.

Have you ever removed a default route (via smitty), changed it, and shortly later recognized that your host now has two default routes? Well, it happened to me.
Have you ever removed an IP alias (via smitty) and still seen it "sitting" there on the interface? Well, it happened to me.

So the most burning question for me was this: how do I determine what a host thinks about its IP aliases – do I have one, more than one, or none?
An AIX host has a single source of all its knowledge: its ODM. To really know the answer means querying the host's ODM database for the presence of IP alias definitions (repeat against each suspected interface).

# odmget CuAt | grep -p en0
        name = "en0"
        attribute = "netaddr"
        value = ""
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 4

        name = "en0"
        attribute = "netmask"
        value = ""
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 8

        name = "en0"
        attribute = "state"
        value = "up"
        type = "R"
        generic = "DU"
        rep = "sl"
        nls_index = 5

        name = "en0"
        attribute = "alias4"
        value = ","
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 0

        name = "en0"
        attribute = "alias4"
        value = ","
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 0

The answers delivered by the ifconfig command cannot really be trusted in the case at hand. It may show nothing (no address) while the ODM still believes and insists that there is an alias or aliases associated with a specific network interface.
The ODM always wins. If the ODM says that an interface has an alias (either type 4 or 6), the chdev command will have to be used to remove it. See the line below.

chdev -l en0 -a delalias4=,
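To avoid scanning the full CuAt dump by eye, a small helper can count the alias stanzas left behind (a sketch; on a live host the input would come from the odmget command shown above, and the sample printf stands in for that output):

```shell
# Count "alias4" attribute stanzas in odmget output (read from stdin).
count_aliases() {
    grep -c 'attribute = "alias4"'
}
# Sample standing in for `odmget CuAt | grep -p en0` output:
printf 'attribute = "alias4"\nattribute = "alias4"\n' | count_aliases   # prints 2
```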

Posted in Real life AIX.

edit files in place – remove lines

A day after Thanksgiving – one turkey sandwich already consumed for breakfast, another waiting on my desk to be eaten for lunch 🙂 …

As our workload increases, we try to "automate" as much as possible – I am no exception. Today, "magic" brought me a ticket to remove user accounts from a Lawson server. For this application, it means removing the user entries from /etc/passwd (which exist there just for the sake of the Lawson application), followed by removal of the user definitions from the LDAP repository (in my case).

The file containing the list of users to be removed is called terminate.list and it has the following format:

BondJ    950078   James  Bond
GoldM    937840   Gold Member
MarkD    937079   Mark Duszyk

It contained a considerable number of entries – I either had to find a way to "automate" this process or do it all by hand 🙁

A while ago, I wrote a post about "in-place" file edits. Then, I was interested in the "find & replace" operation, which really is not far from "find and delete". So, based on the previously gained knowledge, the following snippet was created to answer today's needs.

cp /etc/passwd ~duszyk/PASSWORD

for user in `tr '[A-Z]' '[a-z]' < terminate.list | awk '{print $1}'`
do
rmuser -R LDAP $user
vi +%g/$user/d +wq ~duszyk/PASSWORD
done

Instead of modifying /etc/passwd directly, the file is copied to my home directory under the new name PASSWORD. Next, the contents of terminate.list are converted to lower case and the first token of each line (the login name) is extracted and stored in the variable user.
The rmuser -R LDAP $user command purges the LDAP repository – it removes the user definition.
The next line calls vi to perform a global (%g) search and delete (d) of the login name held in the variable $user.

After this script executes, I will inspect PASSWORD and, if satisfied, copy it over /etc/passwd.
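If you would rather keep vi out of scripts, sed does the same "find and delete" (a sketch; AIX sed has no in-place flag, so the result goes to a temporary file first, and "bondj" plus the demo file are stand-ins, not real accounts):

```shell
# Delete every line for the login from a scratch copy of passwd.
user=bondj
printf 'bondj:x:201:1::/home/bondj:/bin/ksh\nroot:x:0:0::/:/bin/ksh\n' > /tmp/PASSWORD.demo
sed "/^$user:/d" /tmp/PASSWORD.demo > /tmp/PASSWORD.new &&
    mv /tmp/PASSWORD.new /tmp/PASSWORD.demo
cat /tmp/PASSWORD.demo    # only the root line remains
```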

Next time you have to find and delete a number of entries in a text file, remember that this task does not need to be done manually!

Posted in Real life AIX.


removing remote print queues

This morning I got a request to remove several print queues defined in the following way:

# enq -isWA
Queue                 Dev            Status       Job Files   User  PP   %  Blks  Cp Rnk
-------------------- -------------- --------- ------ ----- --- ---
chca_printx          hp@sat20x      READY
chca_printx-c        hp@sat20x      READY
chca_printx-c-l      hp@sat20x      READY
chca_printx-l        hp@sat20x      READY
chca_printz          hp@sat20z      READY
chca_printz-c        hp@sat20z      READY
chca_printz-c-l      hp@sat20z      READY
chca_printz-l        hp@sat20z      READY

Without giving it much thought, I logged in to the host and executed smitty rmpq, selected the first queue to be removed and hit the Enter key – only to see my action fail promptly. This made me think about the task at hand… Looking at the output above for a while longer, I recognized what I was seeing: these are remote print queues attached to HP print servers (in this case). It took me another moment to remember that I had done something like this years ago, and that it took two steps to delete such a print device and its queue or queues.
This time around, each HP print server has several print queues associated with it – four, to be precise.

AIX has three basic commands to "manually" remove "printing devices". They are rmque, rmquedev and rmvirprt. The first command removes a printer queue, the second one removes a printer or plotter queue device, and the last one removes a virtual printer.

It seems the natural order of action is to remove every print queue attached to the printer queue device and finally to delete the device itself!

# rmque -qchca_printx
# rmque -qchca_printx-c
# rmque -qchca_printx-c-l
# rmque -qchca_printx-l

The print queues removed, it is time to remove the printer device.

# rmquedev -dhp@sat20x

The same procedure applied to the second set of queues, and they too were gone – the ticket was closed.

It is funny, but this is not the first time that I recognize the limits of my memory, which puts me in a philosophical mood – we are "masters" only for a short time; beyond that, we just pretend…

Have a Nice Thanksgiving tomorrow, Everybody!!!

Posted in AIX, Real life AIX.


adding disks to concurrent volume group ……

Today, I had to add a few more disks to two concurrent volume groups (shared in a PowerHA cluster) to increase their storage capacity. Below are the new disks that are still waiting to be assigned. Each cluster node has the same order of disks, so all of them show the same hdisk numbers.

hdisk44         none                                None
hdisk45         none                                None
hdisk46         none                                None
hdisk47         none                                None
hdisk48         none                                None
hdisk49         none                                None

In AIX there is just one way to add a disk to an existing volume group – the extendvg command. I tried it against each "new/free" disk, with identical results.

RDC:/root>extendvg -f epcdbm_vg hdisk44
0516-1254 extendvg: Changing the PVID in the ODM.
The distribution of this command (168) failed on one or more passive nodes.
0516-1396 extendvg: The physical volume 00c9f275db894e19, was not found in the
system database.
0516-792 extendvg: Unable to extend volume group.

At first, I thought that this time the disks needed to be made explicitly available for LVM "consumption", so I executed the following against each of them, receiving the same response each time.

RDC:/root>chpv -v a hdisk44
0516-304 getlvodm: Unable to find device id hdisk44 in the Device
        Configuration Database.
0516-722 chpv: Unable to change physical volume hdisk44.

Having nothing to lose, I decided to run the following three commands against each of them.

RDC:/root>mkvg -y wmd hdisk44
0516-1254 mkvg: Changing the PVID in the ODM.
RDC:/root>varyoffvg wmd
RDC:/root>exportvg wmd

After I was done on this node, I moved to the next one in the cluster and repeated the same steps for each disk there too.

TKP/root>mkvg -y wmd hdisk44
0516-1254 mkvg: Changing the PVID in the ODM.
0516-1398 mkvg: The physical volume hdisk44, appears to belong to
another volume group. Use the force option to add this physical volume
to a volume group.
0516-862 mkvg: Unable to create volume group.

These messages are not only expected but welcome – the PVIDs are being propagated to each node in the cluster. Finally, I returned to the node with the active resource groups and executed extendvg against each appropriate volume group, followed by the list of the selected disks. This time, it worked!

For an alternate way of adding disks to a concurrent VG, please see the comment from "ironman".
What really makes my day is the comment from Matt – "reorgvg" can be used against a concurrent volume group! This rocks! Check your AIX level and, if you use CVGs, upgrade – it is worth it!

Posted in Real life AIX.


HDS controllers, disks and how to establish which is which?

Some AIX hosts get their disks from a Hitachi-based SAN. Systems like the USP_V limit both the LUN size on the controller end (256GB) and the disk queue depth (32) on the AIX hdisk end – in order to provide good I/O characteristics. For AIX hosts with rapidly growing capacity requirements the end result is always the same: a large number of disks, which is not a problem as long as there is no LVM mirroring. But if a host uses disks from two SAN fabrics which are mirrored by LVM, and if additionally ShadowImage is involved to make backups, then an AIX administrator will quickly become uncomfortable… Follow to the next page if this sounds interesting.

Posted in AIX, HDS, Real life AIX.


Copyright © 2016 - 2018 Waldemar Mark Duszyk. All Rights Reserved. Created by Blog Copyright.