

AIX Technology Level update strategies

I found this very nice post on developerWorks from Brazil. For some it could be a refresher of what they already know; for the rest, a nice tool to increase AIX knowledge. Please follow this link:

http://www.ibm.com/developerworks/aix/library/au-aixtlupdate/index.html

Posted in Real life AIX.



LINUX on POWER with mirrored boot disks

Finally, I was able to get running (REDHAT 6.2/6.3) what apparently was set to run from the start. Why did it not in my case? Because in one case I did not follow the current recommendations, and in the second case I did not check what I was given.

LINUX boot mirroring will work on POWER only if the disks are SAN or physical disks from the VIO servers – no pools! Here I made a mistake: I used storage pools from which I carved the boot disks.
The other case is so simple that it is really embarrassing; I got an already made partition where I installed LINUX. Since this was an already active partition, I did not check the attributes of the disks assigned to it! Later, when I checked them, I could not believe my eyes; the disk on one VIO server had the correct value for its reserve_policy attribute, aka no_reserve, but the disk on the other VIO server was set to single_path! Lol!!!!

The following documents the required changes:

$ lsdev -dev hdisk3 -attr reserve_policy
value
single_path
$ lsmap -all
....................................................................
vhost2          U8204.E8A.10C9551-V2-C26     0x00000005

VTD                   client3_hdisk1
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk3
Physloc               U7311.D20.10054BC-P1-C08-T2-L9-L0
Mirrored              false
$ rmvdev -vtd client3_hdisk1
client3_hdisk1 deleted
$ chdev -dev hdisk3 -attr reserve_policy=no_reserve
hdisk3 changed
$ lsdev -dev hdisk3 -attr reserve_policy
value
no_reserve
$ mkvdev -vdev hdisk3 -vadapter vhost2 -dev client3_hdisk1

From this moment on, everything returned to normal. With the LINUX partition live, I took each VIO server down in turn, and all that happened to LINUX was a message like this one:

# ibmvscsi 3000001a: Virtual adapter failed rc 2!
# ibmvscsi 3000001a: error after reset

The partition stayed alive, unaffected by the loss of a boot disk. During the boot, the kernel correctly identified the missing disk and proceeded to boot from the other one. So except for the change of real-base to 1000000 from the OK prompt and making sure the "real" disks (not elements of storage pools) carry the proper attributes, everything else was just a piece of cake aka a cupcake 🙂
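
For reference, the real-base change mentioned above is done at the Open Firmware OK prompt. This is a sketch from memory (the "0 >" prompt and the final reset step may look different on your box, so verify against your firmware documentation); reset-all restarts the partition so the new value takes effect:

0 > setenv real-base 1000000
0 > reset-all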

Posted in Linux, Real life AIX.



how to change partition SAN fabric access

Imagine two VIO servers, each with at least two FC adapters, and a data center with two SAN "fabrics" accessed through two SAN switches. There are a number of possibilities for which "fabric/switch" each FC adapter is connected to. For example:

FCa of VIOS1 could be attached to switch A while FCb of the same VIO server is attached to switch B.
FCc of VIOS2 could be attached to switch A while FCd of the same VIO server is connected to switch B.

Somebody else could have it reversed. Someone else could have both VIOS1 FC adapters attached to switch A and the other two adapters connected to switch B. Let’s end the speculations.

What I want to show in this post is how to change the relationship between the physical FC adapter of a VIO server and its virtual FC adapters. One may ask why? Well, last Friday I logged in to a "new" VIOS server and, without spending a minute or two to "learn" the existing method used to connect physical/virtual adapters, I ended up with two partitions whose virtual FC adapters were connected to the same fabric. Oops indeed! Why is this not good? If a host mirrors data, it is better that the disks/mirrors come from different fabrics, so when access to one fabric is lost the volume group or groups stay online because LVM still has access to the other mirror (the other fabric) – assuming, of course, that each volume group's quorum requirement is OFF.
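
A minimal sketch of the remapping itself on the VIOS command line – the vfchost and fcs names below are placeholders for whatever lsmap and lsnports report on your system:

$ lsnports                               # list the NPIV-capable physical FC ports
$ lsmap -all -npiv                       # review the current virtual-to-physical mappings
$ vfcmap -vadapter vfchost2 -fcp         # with no fcp name given, the mapping is removed
$ vfcmap -vadapter vfchost2 -fcp fcs1    # re-map to a port attached to the other fabric
$ lsmap -vadapter vfchost2 -npiv         # verify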

Posted in AIX, Linux.



Linux on steroids aka pSeries, openLDAP and TDS

I am getting more involved with RedHat. As the new RedHat "lpars" are being built, host-based authentication is again starting to show its ugly side.
It is important that all our operating systems – AIX, Windows and LINUX – not only use the same logins and the same passwords but also that the password attributes are consistent across all platforms. The latter makes any security audit a real breeze.

I opened a PMR hoping to get some help configuring the IBM TDS client on LINUX. I think, because I was too busy with other issues and probably too relaxed (I had just returned from two weeks in USVI), I did not escalate it with the duty manager, and after a week of miserable emails between me and the engineer in charge of my PMR, I decided to drop it. Close it please, I said.

I had to work over the last weekend and, having long breaks in between (dictated by the schedule), I decided to poke around with the hope of getting LINUX and my LDAP with AD pass-through to cooperate. The following shows the procedure that allowed me to successfully configure the LINUX openLDAP client to cooperate with an IBM TDS LDAP server configured with pass-through authentication against Active Directory.
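
As a rough illustration of what the RHEL 6 client side boils down to – a sketch only, where the server URI and base DN (tds.example.com, dc=example,dc=com) are placeholders, not the real values:

# point NSS and PAM at the TDS server and enable LDAP authentication
authconfig --enableldap --enableldapauth \
           --ldapserver=ldap://tds.example.com \
           --ldapbasedn="dc=example,dc=com" --update

# sanity check - can the host resolve an LDAP user? (jdoe is a made-up login)
getent passwd jdoe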

Posted in AIX, ldap, Linux.



what mirror (side) is stale?

It is not just the fruit of a bored mind; stuff like this really happens …..
Someone tells you that there are/were issues with one of the SAN fabrics that a particular AIX host gets its LUNs/disks from, and now the application users complain about its slowness. You cannot just syncvg to get rid of the STALE partitions, as the "problematic" SAN fabric is still undergoing "repairs". By the way, nobody around can tell you which of these SAN fabrics is "sick" ……. You may be sitting at the keyboard wondering why you got out of bed today. Well, it is not really as bad as you think.

There are at least two methods to determine which side of a mirror is STALE. There is the command mirscan, which is capable of much more than finding the disks with STALE partitions. I have not really used it, as it is slow because of the amount of "stuff" it does and the amount of info it processes. The one that is fast and simple is one you already know: lspv, slightly modified for this purpose.
To identify disks with STALE partitions, execute lspv -M -L hdisk?.

# lspv -M -L hdisk57
hdisk57:1       u80_lv:17736:2  stale
hdisk57:2       u80_lv:17748:2  stale
hdisk57:3       u80_lv:17760:2  stale
hdisk57:4       u80_lv:17772:2  stale
hdisk57:5       u80_lv:17784:2  stale
hdisk57:6       u80_lv:17796:2  stale
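
To scan every disk in the system at once, a quick loop along these lines does the trick (a sketch; the error redirection hides disks that belong to no volume group):

# lspv | while read d rest; do lspv -M -L $d 2>/dev/null | grep -qw stale && echo $d; done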

To unmirror – to remove the side that contains the STALE partitions – execute unmirrorvg -c ? vg_name hdisk? hdisk?? hdisk??? ..... using all the appropriate disks known to contain the not-so-fresh partitions; they are the STALE mirror. A sketch follows below. If you find out that more than one mirror (side) in a volume group contains STALE partitions, you may start looking for backups or call IBM for support.
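
As a sketch only – the volume group name and the second disk are hypothetical, picked to match the lspv output above:

# unmirrorvg u80vg hdisk57 hdisk58      # drop the mirror copy held by the stale disks
# once the fabric is repaired, re-mirror and synchronize:
# mirrorvg -s u80vg hdisk57 hdisk58     # -s creates the copies but defers the sync
# syncvg -v u80vg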

Posted in Real life AIX.



fixing paths in mpio environment

A SAN fabric event may disable a path or paths to one or more SAN disks. In the case of a pure MPIO environment, the process to re-enable the missing (disabled) paths is demonstrated below.

First, locate any failed paths. To list all disks with path issues, you could execute the lspath | grep hdisk | grep -v Enabled command and scan the result. The output will include any disks with paths in the following states: Disabled, Failed, Defined, Missing, etc.
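
For example (the output line below is illustrative; lspath prints the status, the disk, and its parent adapter):

# lspath | grep hdisk | grep -v Enabled
Failed  hdisk87 fscsi0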

To display the paths of a specific disk, execute:

# lspath -l hdisk87 -H -F"name parent path_id connection status"
name    parent path_id connection                     status

hdisk87 fscsi0 0       50060e8005bec311,e000000000000 Failed
hdisk87 fscsi1 1       50060e8005bec321,e000000000000 Enabled

The path to hdisk87 over fscsi0 has failed. How do we enable it again?

A disk path can be enabled using at least two techniques. One calls for the removal of the failed path followed by a re-scan of the FC adapters (executing the command cfgmgr). The following illustrates this process:

# rmpath -dl hdisk87 -p fscsi0 -w 50060e8005bec311,e000000000000
path Deleted
# lspath -l hdisk87 -H -F"name parent path_id connection status"
name    parent path_id connection                     status

hdisk87 fscsi1 1       50060e8005bec321,e000000000000 Enabled

# cfgmgr

# lspath -l hdisk87 -H -F"name parent path_id connection status"
name    parent path_id connection                     status

hdisk87 fscsi0 0       50060e8005bec311,e000000000000 Enabled
hdisk87 fscsi1 1       50060e8005bec321,e000000000000 Enabled

You could also try to enable the Failed path directly, as shown next:

# lspath -l hdisk87 -H -F"name parent path_id connection status"
name    parent path_id connection                     status

hdisk87 fscsi0 0       50060e8005bec311,e000000000000 Enabled
hdisk87 fscsi1 1       50060e8005bec321,e000000000000 Failed

# chpath -s enabled -l hdisk87 -p fscsi1
paths Changed

# lspath -l hdisk87 -H -F"name parent path_id connection status"
name    parent path_id connection                     status

hdisk87 fscsi0 0       50060e8005bec311,e000000000000 Enabled
hdisk87 fscsi1 1       50060e8005bec321,e000000000000 Enabled

Posted in Real life AIX.



NUM_PARALLEL_LPS for AIX and for PowerHA

Well, the vacations are over and I am back at the workbench……. . Last weekend we were moving a cluster from one data center to another, from one network to another, from one set of SAN switches to another. It all not only ended well but proved to be yet another opportunity to learn something new.

With the physical relocation completed, some volume groups varied on with STALE partitions, which, once detected by the cluster daemons, immediately resulted in the cluster executing the syncvg -P 4 ...... command against the appropriate volume groups.

For me, the number of LPs being "synced" in parallel was too small; as a matter of fact, I (for some reason unknown even to me) always set the parallelism value to 24 (-P 24). On each machine where I have a root login, I put this entry in its /etc/profile:

export NUM_PARALLEL_LPS=24

This line gives me peace of mind; whenever the syncvg command is executed (by me or someone else), the parallelism, aka the number of LPs processed (synchronized) in parallel by the syncvg command, will be set to 24 (unless it has been explicitly set to a different value). By the way, I think that the maximum value is 32.

Looking into the contents of /etc/profile on the cluster nodes and not finding my entry, I put it in.

At the next cluster POWER ON, two volume groups still had STALE partitions, and regardless of export NUM_PARALLEL_LPS=24 in the /etc/profile file, the syncvg command was executed with -P 4 instead.

As Google was not really helpful in this case, I had to start looking for the answer somewhere else. Looking into the file /var/hacmp/log/hacmp.out, I discovered why the entry I placed inside /etc/profile does not work – because PowerHA does not look into this file for the answer! It looks into a different file instead! Here is the excerpt, read for yourself:

This logical volume has stale partitions, so sync it. Doing 4 stale partitions at a time seems to be a win most of the time. However, we will honor the NUM_PARALLEL_LPS value in /etc/environment, if set. 

As you can see, PowerHA (HACMP) peeks inside /etc/environment to find out if NUM_PARALLEL_LPS has been preset; otherwise the syncvg command will be executed with NUM_PARALLEL_LPS=4.

To make both LVM and HA happy, one could place the entry defining the required value of this variable in the file /etc/environment. This way AIX will load this variable into the environment and HA will be able to find it too. Remember that variables placed in /etc/environment do not require the export statement, so the simple assignment is enough: NUM_PARALLEL_LPS=24.
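
For example, to append the entry (only when it is not already there) and verify it:

# grep -q '^NUM_PARALLEL_LPS=' /etc/environment || echo 'NUM_PARALLEL_LPS=24' >> /etc/environment
# grep NUM_PARALLEL_LPS /etc/environment
NUM_PARALLEL_LPS=24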

UPDATE: if a cluster volume group is mirrored with either of the two SYNC options (background/foreground), you have no way to control the speed of the "mirroring" – the mirrorvg command does not have this option. So with this in mind, mirror without the "SYNC" and then, at a later time, execute syncvg against the appropriate volume group. Done this way, the sync will proceed using the value of NUM_PARALLEL_LPS defined in the file /etc/environment.
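
In other words – a sketch, with a hypothetical volume group and disk:

# mirrorvg -s datavg hdisk4      # -s creates the mirror copies but skips the sync
# syncvg -v datavg               # run later; honors NUM_PARALLEL_LPS from /etc/environment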

Posted in Real life AIX.



an empty file system with no free capacity ….?

It sounds like an oxymoron, doesn’t it? I saw it once many years ago. Today, a colleague of mine noticed it on one of his machines.
Look below; the /epic/sup07 file system has very little free space left.

/epic/sup07/ifc/stream> df -g .
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/sup07_lv      5.00      0.01  100%        9     1% /epic/sup07

After a closer inspection, we can locate only a few sub-directories, all of them empty.

/epic/sup07/ifc/stream> cd ../../
/epic/sup07> ls -l
total 0
drwxr-sr-x    3 epicadm  cachegrp        256 May 11 09:24 dcifc
drwxr-sr-x    3 epicadm  cachegrp        256 May 11 09:24 ifc
drwxr-xr-x    2 root     system          256 Nov 14 10:44 lost+found
/epic/sup07> du -ak . | sort -nr | more

Yes, there is nothing here for us to see, but there still may be something there – like, for example, open files ….
We leave the file system so our presence does not distort it and execute the lsof command against it.

 /epic/sup07> cd

/root> lsof /epic/sup07
In while loop:256
Value of I :61   np:256
COMMAND   PID  USER   FD   TYPE DEVICE   SIZE/OFF NODE NAME
cache 10289344 lyko cwd VDIR 37,39 256 2 /epic/sup07(/dev/sup07_lv)
ksh 11993330 epicadm cwd VDIR 37,39 256 2 /epic/sup07(/dev/sup07_lv)
dsmc 146 root 10 VREG 37,39 5360353284 129 /epic/sup07(/dev/sup07_lv)
ksh 17694934 lyko cwd VDIR 37,39 256 2 /epic/sup07(/dev/sup07_lv)

Yes, there is a huge file in /epic/sup07, created by the dsmc command. After further investigation and a determination that no active backup/restore is taking place, we kill the offending process. By the way, in the last output we shortened the PID of dsmc in order to make the output fit the screen. Here we go, killing the process.

/root> kill -9 14618974

Now, is the free space back or not?

/root> df -g /epic/sup07
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/sup07_lv      5.00      4.98    1%        8     1% /epic/sup07

Well, we got back what was/is rightfully ours 🙂
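
By the way, when lsof is not installed on a given host, the native AIX fuser command can also point at the processes keeping a file system busy:

# fuser -cu /epic/sup07      # -c: report on the whole file system, -u: show each process owner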

Posted in Real life AIX.



10 Gb Ethernet adapters – the missing details

Today, I was asked to verify the firmware level of the new 10 Gb Ethernet adapters we recently installed in our TSM servers. Without much thinking, the following command was executed:

lscfg -vl ent4
  ent4             U78A0.001.DNWHPY1-P1-C2-T1  10 Gigabit Ethernet Adapter (ct3)

        Network Address.............00145E9952AE
        Displayable Message.........10 Gigabit Ethernet Adapter (ct3)

Wooo, a lot of information is missing; the firmware version is one of the missing pieces….
To get the missing information, the last command has to be modified slightly…..

lscfg -vpl ent4
  ent4             U78A0.001.DNWHPY1-P1-C2-T1  10 Gigabit Ethernet Adapter (ct3)

        Network Address.............00145E9952AE
        Displayable Message.........10 Gigabit Ethernet Adapter (ct3)


  PLATFORM SPECIFIC

  Name:  ethernet
    Node:  ethernet@0
    Device Type:  network
    Physical Location: U78A0.001.DNWHPY1-P1-C2-T1

Well, the required information is still missing….. Let’s try something else:

lscfg -vp | grep -p "10 Gigabit"
  hba0             U78A0.001.DNWHPY1-P1-C2-T1                                    10 Gigabit Ethernet-SR PCI-Express Host Bus Adapter (2514300014108c03)

      10 Gigabit Ethernet-SR PCI Express Adapter:
        EC Level....................D76809
        FRU Number..................46K7897
        Part Number.................46K7897
        Manufacture ID..............1037
        Feature Code/Marketing ID...5769
        Serial Number...............YL11212300B0
        Network Address.............00145E9952AE
        ROM Level.(alterable).......RR0120
        Hardware Location Code......U78A0.001.DNWHPY1-P1-C2-T1

  ent4             U78A0.001.DNWHPY1-P1-C2-T1                                    10 Gigabit Ethernet Adapter (ct3)

        Network Address.............00145E9952AE
        Displayable Message.........10 Gigabit Ethernet Adapter (ct3)

Finally, we got it all!
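
And when only the firmware (ROM) level is of interest, the same output can be trimmed further:

# lscfg -vp | grep -p "10 Gigabit" | grep "ROM Level"
        ROM Level.(alterable).......RR0120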

Posted in AIX, Real life AIX.



command line editing pains cont’d ……

For example, if a previous command line had the following shape:

crfs -v jfs2 -d u50_lv -A yes -a log=INLINE -m /u50

how would you change all 5’s into 6’s so we could re-execute it in the following way:

crfs -v jfs2 -d u60_lv -A yes -a log=INLINE -m /u60

Assuming that we are working in the ksh shell and the history mechanism has already been activated (set -o vi), we would recall the line by executing (for example) Esc /u50 so the command line reads:

crfs -v jfs2 -d u50_lv -A yes -a log=INLINE -m /u50

Next, we have to enter what I call the vi file edit mode by hitting the Esc-v key sequence, which automatically opens the vi editor on a temporary file whose contents are the recalled entry:

crfs -v jfs2 -d u50_lv -A yes -a log=INLINE -m /u50

To change every 5 into 6, we execute

Esc:%s/5/6/g

followed with

Esc:wq!

and this is all; upon exiting vi, ksh immediately executes the edited command line.

Esc represents the appropriate key on the keyboard.
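
As an aside, for a single substitution there is also the ksh built-in alias r (for fc -e -); unlike the vi trick above, it replaces only the first occurrence if I remember correctly:

r 5=6 crfs      # re-execute the most recent crfs command with the first 5 changed to a 6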

Posted in Real life AIX.




Copyright © 2016 - 2017 Waldemar Mark Duszyk. All Rights Reserved.