

you are only as good as the last few weeks

Today is Friday! I am busy migrating a two-node cluster from XIV to HDS SAN storage. Nothing could be easier, right? After the HDS disks are zoned to the hosts' WWPNs, cfgmgr picks them up, and I have established which new disk comes from which HDS controller, I extend the volume group and execute the following command to create the third mirror (on the set of new disks delivered by the first HDS controller).

# mirrorvg -S -c 3 lawap_vg hdisk10 hdisk11 hdisk12 hdisk13
0516-404 allocp: This system cannot fulfill the allocation request.
There are not enough free partitions or not enough physical volumes
to keep strictness and satisfy allocation requests.  The command
should be retried with different allocation characteristics.
0516-1517 mklvcopy: Failed to create a valid partition allocation.
0516-842 mklvcopy: Unable to make logical partition copies for
logical volume.
0516-1199 mirrorvg: Failed to create logical partition copies
for logical volume u10_lv.
0516-1200 mirrorvg: Failed to mirror the volume group.

What? There is space, for God's sake. Do you see it?

# lsvg -p lawap_vg
lawap_vg:
PV_NAME   PV STATE    TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk3    active      799         400         80..00..00..160..160
hdisk5    active      15          0           00..00..00..00..00
hdisk4    active      15          0           00..00..00..00..00
hdisk7    active      799         400         65..00..15..160..160
hdisk10   active      249         249         50..50..49..50..50
hdisk11   active      249         249         50..50..49..50..50
hdisk12   active      249         249         50..50..49..50..50
hdisk13   active      249         249         50..50..49..50..50

The lslv output shows that the logical volumes are set to the minimum number of disks (INTER-POLICY: minimum) and that they are relocatable. The chlv command comes to mind, and I run the following kludge just to be sure.

for lv in `lsvg -l lawap_vg | grep jfs2 | awk '{print $1}'`
do
    # -s n: strict allocation off, -e x: inter-physical volume policy maximum
    chlv -s n -e x $lv
done

Next, I recall the mirrorvg command from the shell history, and it dies promptly, generating as many error messages as before.

“What is going on?” I ask, surprised. I read the man pages and still cannot get it working. After a while, as I am looking at the lslv output while describing my predicament to my wife, it suddenly hits me what I have forgotten! It is the UPPER BOUND parameter of each logical volume in this volume group! Yes, Sir: its value is currently smaller than the number of disks this vg now has!
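Before reaching for chlv again, the affected logical volumes can be listed. Below is a minimal sketch that pulls the UPPER BOUND value out of lslv output. It assumes lslv prints the label "UPPER BOUND:" with its value on the same line; the function name upper_bound is made up for this example.

```shell
# upper_bound: read `lslv LV` output on stdin and print the value that
# follows the "UPPER BOUND:" label (assumes label and value share a line).
upper_bound() {
    awk -F'UPPER BOUND:' 'NF > 1 { split($2, v, " "); print v[1] }'
}

# Hypothetical usage on AIX: report jfs2 LVs whose UPPER BOUND is smaller
# than the number of disks the volume group will have (8 in this story).
#   for lv in `lsvg -l lawap_vg | grep jfs2 | awk '{print $1}'`
#   do
#       ub=`lslv $lv | upper_bound`
#       [ "$ub" -lt 8 ] && echo "$lv: UPPER BOUND $ub"
#   done
```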

I modify the previous kludge to set it to 8, matching the number of disks in the volume group.

for lv in `lsvg -l lawap_vg | grep jfs2 | awk '{print $1}'`
do
    # -u 8: UPPER BOUND, the maximum number of disks used for new allocation
    chlv -u 8 $lv
done
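Hard-coding 8 works for this volume group, but the number of disks can also be counted from the lsvg -p output. A small sketch, assuming the lsvg -p layout shown earlier (a "vgname:" line, a header line, then one line per disk); pv_count is a hypothetical helper name.

```shell
# pv_count: read `lsvg -p VG` output on stdin and count the disks listed.
# The first two lines ("VG:" and the PV_NAME header) are skipped.
pv_count() {
    awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Hypothetical usage on AIX, instead of the hard-coded 8:
#   n=`lsvg -p lawap_vg | pv_count`
#   for lv in `lsvg -l lawap_vg | grep jfs2 | awk '{print $1}'`
#   do
#       chlv -u $n $lv
#   done
```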

I am ready to try again. This time mirroring works as it should, and I smile for a while until I realize that I have not adjusted the queue depths to the values HDS recommends for AIX. Well, I will wait until mirroring is done, remove the third mirror, remove the disks from the volume group, change their queue depths, and go back to the mirrorvg command, which I now once again remember how to run.
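The back-out plan above can be written down as a dry run that only prints the commands for review. This is a sketch under assumptions: the background sync started by mirrorvg -S has finished, the new disks hold only the third copy, and the queue depth is passed in as a parameter because the HDS-recommended value is not given here. redo_mirror_plan is a hypothetical name.

```shell
# redo_mirror_plan VG DEPTH DISK...: print (do not run) the commands that
# drop the third copy from DISK..., take the disks out of VG, set their
# queue_depth to DEPTH and mirror again.
redo_mirror_plan() {
    vg=$1; depth=$2; shift 2
    echo "unmirrorvg -c 2 $vg $*"      # keep 2 copies, remove the one on DISK...
    echo "reducevg $vg $*"
    for d in "$@"
    do
        echo "chdev -l $d -a queue_depth=$depth"
    done
    echo "extendvg $vg $*"
    echo "mirrorvg -S -c 3 $vg $*"
}
```

For example, redo_mirror_plan lawap_vg 16 hdisk10 hdisk11 hdisk12 hdisk13 prints eight commands to check and paste one by one; 16 is only a placeholder, not a recommendation.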

Have a nice weekend!

Posted in Real life AIX.


concurrent volume groups - issues with their replications

If you replicate “enhanced” concurrent volume groups with FlashCopy or ShadowImage, you may run into problems. The recreatevg command may fail to bring the volume group copy onto a different host (a backup host), with this message:

0516-1972 varyonvg: The volume group is varied on in other node in concurrent
mode; you cannot vary on the volume group in non-concurrent mode.
Use -O flag to force varyon the volume group if needed.
0516-1320 recreatevg: Unable to recreate volume group.

If you search FixCentral for a solution, you may find one! According to http://www-01.ibm.com/support/docview.wss?uid=isg1IV41209, IBM has these APARs:

6100-08 – use AIX APAR IV41209
7100-02 – use AIX APAR IV40515

My cluster runs on 7.1.3 and the TSM server on 6.1.9, both with these APARs present, but recreatevg still does not work. Apparently these APARs do not always help, since IBM also offers this “local fix” on the same page:

If recreatevg is being run on a different server than the source VG, and there are no name conflicts, then the following steps can be used:

# importvg -y vgname hdiskX
# varyonvg -O vgname
# varyoffvg vgname
# exportvg vgname

Then run recreatevg.
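IBM's local fix can be wrapped in a function so the four steps stop at the first failure. A sketch only: ibm_local_fix is a hypothetical name, and it is meant to run on the backup host when there are no volume group name conflicts, which is the precondition IBM states.

```shell
# ibm_local_fix VG DISK: import the copy, force a varyon with -O,
# then vary it off and export it again, stopping at the first failure.
ibm_local_fix() {
    vg=$1; disk=$2
    importvg -y "$vg" "$disk" &&
    varyonvg -O "$vg" &&
    varyoffvg "$vg" &&
    exportvg "$vg"
}
```

After it returns successfully, run recreatevg as usual.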

If you are familiar with the recreatevg command, you know that it automatically varies on the “recreated” volume group. Look again at the message produced by the failing recreatevg command. Does it really apply to varyonvg, or to recreatevg?

# recreatevg -f -y epcchePRD_vg -L /SImg -Y SImg $epcchePRD_vg_disks
0516-1972 varyonvg: The volume group is varied on in other node in concurrent
mode; you cannot vary on the volume group in non-concurrent mode.
Use -O flag to force varyon the volume group if needed.
0516-1320 recreatevg: Unable to recreate volume group.

I think this message is poorly written. To me, it suggests using -O as an argument to the varyonvg command.
What it really means is to add “-O” to the recreatevg command, like this:

# recreatevg -f -y epcchePRD_vg \
              -L /SImg -Y SImg -O $epcchePRD_vg_disks
epcchePRD_vg

It works! By the way, the variable epcchePRD_vg_disks contains a space-separated list of the appropriate disks.

Posted in Real life AIX.



cannot create or extend a volume group, shadowimage fails too.....

Somewhere, someone gave too much power to nmon. If you suddenly find that you cannot do what the title of this post says, check whether nmon is running.
If it is, kill it and repeat the command(s) that previously failed. Do they work now? I think so.

Over the last few months I noticed this on a few of our hosts, but since I had not patched them myself, I failed to make the connection. Yesterday, my colleague patched our DSMC backup servers to AIX 6.1.9.2, and tonight's ShadowImage backups failed.

The following output explains everything:

# recreatevg -f -y epcdbm_vg $epcdbm_vg_disk
Method error (/usr/lib/methods/chgdisk):
0514-062 Cannot perform the requested function because the
specified device is busy.
pv
0516-1320 recreatevg: Unable to recreate volume group.

The command fails, so I kill nmon:

# ps -ef | grep nmon
    root  4587738 11731134   0 11:13:15  pts/2  0:00 grep nmon
    root 10748198        1   0 00:00:01      -  0:02 /usr/bin/topas_nmon  -f -T -d -A -m /var/nmon -s 180 -c 480 -youtput_dir=/var/nmon/tsmdbrpu001 -ystart_time=00:00:00,Jul25,2014
# kill -9 10748198

Let’s try to bring the vg back.

# recreatevg -f -y epcdbm_vg $epcdbm_vg_disk
epcdbm_vg

It worked! As I said, the same trick makes extendvg work as well.

To resolve this new issue permanently, I modified my backup scripts. Before bringing the volume groups in, my scripts now execute the following:

pids=`ps -ef | grep nmon | grep -v grep | awk '{print $2}'`
[ -n "$pids" ] && kill -9 $pids
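The grep -v grep trick works, but the PID filter can also live in a single awk program, which makes it easy to test against saved ps -ef output. A sketch, assuming the ps -ef layout shown above (PID in the second column); nmon_pids is a hypothetical name, and the filter simply skips the grep and awk lines whose own command lines mention nmon.

```shell
# nmon_pids: read `ps -ef` output on stdin and print the PIDs of nmon
# processes, ignoring grep/awk helper processes that mention "nmon".
nmon_pids() {
    awk '/nmon/ && !/ grep / && !/awk / { print $2 }'
}

# Hypothetical usage in a backup script:
#   pids=`ps -ef | nmon_pids`
#   [ -n "$pids" ] && kill -9 $pids
```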

At the end, just before the exit, I restart nmon using the same entry as in root's own crontab:

/var/nmon/nmon.sh

Please let me know if you know about any other fix.

Have a good weekend!!!
:-)

Posted in Real life AIX.



checksums …..

After downloading the upgrade media for an HMC, I wanted to check the integrity of the packages. During the download I had to expand the file system holding them, and this gave me the “verify it” idea.

These are the contents I got from IBM server:

# ls -ltr
total 6219440
-rw-r--r--    1 root     system      2730176 Jul 08 06:59 bzImage
-rw-r--r--    1 root     system    817065984 Jul 08 07:20 disk1.img
-rw-r--r--    1 root     system   1456427008 Jul 08 07:58 disk2.img
-rw-r--r--    1 root     system           78 Jul 08 08:21 hmcnetworkfiles.sum
-rw-r--r--    1 root     system    873922560 Jul 08 08:21 disk3.img
-rw-r--r--    1 root     system     34185788 Jul 08 08:22 initrd.gz

The hmcnetworkfiles.sum file lists each file and its checksum value. See for yourself:

# cat hmcnetworkfiles.sum
02364:bzImage
06816:initrd.gz
55470:disk1.img
33312:disk2.img
45622:disk3.img

To compute the checksums of these files, use the sum command, perhaps in a loop like this one:

# for f in `awk -F ':' '{print $2}' hmcnetworkfiles.sum`
do
sum $f | awk '{print $1":"$3}'
done

In this case, the loop generated the following output:

02364:bzImage
06816:initrd.gz
55470:disk1.img
33312:disk2.img
45622:disk3.img

Comparing this output with the contents of hmcnetworkfiles.sum shows that the files were transferred intact.
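Instead of comparing the two lists by eye, the check can be automated. A minimal sketch, assuming the manifest uses the checksum:filename format of hmcnetworkfiles.sum and that the local sum command uses the same algorithm that produced it (as it did on the AIX host above); verify_sums is a hypothetical name. Checksums are compared as numbers, so a leading zero does not cause a false mismatch.

```shell
# verify_sums MANIFEST: check every "checksum:filename" line of MANIFEST
# against the local sum command. Prints OK, FAILED or MISSING per file
# and returns 1 if anything did not match.
verify_sums() {
    rc=0
    while IFS=: read -r expected name
    do
        [ -n "$name" ] || continue
        if [ ! -f "$name" ]
        then
            echo "MISSING $name"; rc=1; continue
        fi
        # the first field of sum's output is the checksum
        actual=`sum "$name" | awk '{print $1}'`
        # numeric comparison, so "02364" equals "2364"
        if awk -v a="$actual" -v e="$expected" 'BEGIN { exit !(a + 0 == e + 0) }'
        then
            echo "OK $name"
        else
            echo "FAILED $name"; rc=1
        fi
    done < "$1"
    return $rc
}
```

Run it in the download directory as verify_sums hmcnetworkfiles.sum. Keep in mind that sum produces a weak 16-bit checksum: good enough to catch a truncated or corrupted transfer, not to detect tampering.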

Posted in Real life AIX.



map FC devices on aix box

A host with a large number of FC adapters, disks, tape drives, and so forth can occasionally put you in a difficult situation, especially when someone in a meeting asks you “what is where”.
The following few lines of shell can help you answer.

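#!/usr/bin/ksh

# W.M.Duszyk 2/12
# map FC devices to their adapters and interfaces

# for every FC adapter (fcsN) ...
for f in $(lsdev | awk '/^fcs/ {print $1}'); do
        printf "%s - " "$f"
        # ... find its FC SCSI protocol device (fscsiN)
        j=$(lsdev -p "$f" | awk '/fscsi/ {print $1}')
        # print "fscsiN: " without a newline so the devices follow on the same line
        printf "%s: " "$j"
        # MPIO devices show up as paths of fscsiN (device name in column 2)
        z=$(lspath -p "$j")
        if [ -n "$z" ]
        then
                echo "$z" | awk '/rmt|smc|disk|d1/ {printf "%s ", $2} END {print "\n"}'
        else
                # otherwise list the devices whose parent is fscsiN
                lsdev -p "$j" | awk '/rmt|smc|disk/ {printf "%s ", $1} END {print "\n"}'
        fi
done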

For example:

# ./MapDev.ksh
fcs0 - fscsi0:

fcs1 - fscsi1:

fcs2 - fscsi2:

fcs3 - fscsi3: hdisk30 hdisk31 hdisk32 hdisk33 hdisk34 hdisk35 hdisk36 hdisk37 hdisk38 hdisk39 hdisk40 hdisk41 hdisk42 hdisk43 hdisk44 hdisk45 hdisk46 hdisk47 hdisk48 hdisk49 hdisk50 hdisk51 hdisk52

fcs4 - fscsi4: hdisk30 hdisk31 hdisk32 hdisk33 hdisk34 hdisk35 hdisk36 hdisk37 hdisk38 hdisk39 hdisk40 hdisk41 hdisk42 hdisk43 hdisk44 hdisk45 hdisk46 hdisk47 hdisk48 hdisk49 hdisk50 hdisk51 hdisk52

fcs5 - fscsi5:

fcs6 - fscsi6: rmt12 rmt13 rmt14 rmt15 rmt54 rmt55 rmt82 rmt83 rmt100 rmt101 rmt102 rmt103 rmt132 rmt133 rmt154 rmt155 rmt168 rmt169 rmt170 rmt171 rmt196 rmt197 rmt198 rmt199 rmt226 rmt227 rmt248 rmt249 rmt250 rmt251 rmt252 rmt253 rmt254 rmt255 rmt256 rmt257 rmt258 rmt259 rmt286 rmt287 rmt288 rmt289 smc1 smc11 smc17

fcs7 - fscsi7: rmt16 rmt17 rmt18 rmt19 rmt44 rmt45 rmt46 rmt47 rmt56 rmt57 rmt64 rmt65 rmt66 rmt104 rmt105 rmt106 rmt107 rmt134 rmt135 rmt156 rmt157 rmt172 rmt173 rmt174 rmt175 rmt200 rmt201 rmt202 rmt203 rmt228 rmt229 rmt296 rmt297 rmt298 rmt299 rmt300 rmt301 rmt302 rmt303 rmt304 rmt305 rmt306 rmt307 smc2 smc9 smc19

fcs8 - fscsi8: rmt20 rmt21 rmt22 rmt23 rmt58 rmt59 rmt67 rmt68 rmt69 rmt108 rmt109 rmt110 rmt111 rmt116 rmt117 rmt118 rmt158 rmt159 rmt176 rmt177 rmt178 rmt179 rmt204 rmt205 rmt206 rmt207 rmt230 rmt231 rmt260 rmt261 rmt262 rmt263 rmt264 rmt265 rmt266 rmt267 rmt268 rmt269 rmt270 rmt271 rmt336 rmt337 rmt338 rmt339 rmt340 rmt341 smc3 smc12 smc23

fcs9 - fscsi9: rmt24 rmt25 rmt26 rmt27 rmt60 rmt61 rmt70 rmt71 rmt72 rmt112 rmt113 rmt114 rmt115 rmt119 rmt120 rmt121 rmt140 rmt141 rmt142 rmt180 rmt181 rmt182 rmt183 rmt208 rmt209 rmt210 rmt211 rmt232 rmt233 rmt272 rmt273 rmt274 rmt275 rmt276 rmt277 rmt278 rmt279 rmt280 rmt281 rmt282 rmt283 smc4 smc13

fcs10 - fscsi10:rmt28 rmt29 rmt30 rmt31 rmt62 rmt63 rmt73 rmt74 rmt75 rmt84 rmt85 rmt86 rmt87 rmt122 rmt123 rmt124 rmt143 rmt144 rmt145 rmt184 rmt185 rmt186 rmt187 rmt212 rmt213 rmt214 rmt215 rmt234 rmt235 rmt236 rmt237 rmt238 rmt239 rmt240 rmt241 rmt242 rmt243 rmt244 rmt245 rmt246 rmt247 rmt284 rmt285 smc5 smc14 smc16

fcs11 - fscsi11: rmt32 rmt33 rmt34 rmt35 rmt48 rmt49 rmt76 rmt77 rmt88 rmt89 rmt90 rmt91 rmt125 rmt126 rmt127 rmt146 rmt147 rmt148 rmt188 rmt189 rmt190 rmt191 rmt216 rmt217 rmt218 rmt219 rmt308 rmt309 rmt310 rmt311 rmt324 rmt325 rmt326 rmt327 rmt328 rmt329 rmt330 rmt331 rmt332 rmt333 rmt334 rmt335 smc6 smc15 smc20 smc22

fcs12 - fscsi12: rmt0 rmt1 rmt2 rmt3 rmt36 rmt37 rmt38 rmt39 rmt50 rmt51 rmt78 rmt79 rmt92 rmt93 rmt94 rmt95 rmt128 rmt129 rmt136 rmt137 rmt138 rmt139 rmt149 rmt150 rmt151 rmt160 rmt161 rmt162 rmt163 rmt220 rmt221 rmt222 rmt223 rmt312 rmt313 rmt314 rmt315 rmt316 rmt317 rmt318 rmt319 rmt320 rmt321 rmt322 rmt323 smc7 smc10 smc21

fcs13 - fscsi13: rmt40 rmt41 rmt42 rmt43 rmt52 rmt53 rmt80 rmt81 rmt96 rmt97 rmt98 rmt99 rmt130 rmt131 rmt152 rmt153 rmt164 rmt165 rmt166 rmt167 rmt192 rmt193 rmt194 rmt195 rmt224 rmt225 rmt290 rmt291 rmt292 rmt293 rmt294 rmt295 smc8 smc18

Posted in Real life AIX.


mounting cifs from WIN2012

WIN2012 is not really supported. These were the words of the IBM engineer working on the PMR I opened to find out how to mount WIN2012 shares (CIFS). Well, we can mount CIFS shares from previous releases of Windows, but not from 2012. He then emailed me a link to a Microsoft document explaining how to disable SMB signing, the reason for the failure. I was advised to follow the WIN2003 procedure, which still applies to WIN2012.

You can read the document by following this link: “Overview of Server Message Block signing”.

To mount CIFS share in AIX:

# mount -v cifs -n lawisnqw1/lawsona/law1199 \
                 -o wrkgrp=wmd-edu,fmode=755 /tjtest /tjtest

or, in general form:

# mount -v cifs -n WINhost/WinUser/WinUserPassword \
                -o wrkgrp=wmd-edu,fmode=755 /ShareName /AIXmountPoint

To mount a CIFS share in RedHat6:

# mount.cifs //lawisnqw1/tjtest /test \
                -o username=lawsona,password=law1199,domain=wmd-edu

Now, let’s wait for Igor to find a moment of free time to disable SMB signing and see whether this puts the request to bed. :-)

Posted in Linux, Real life AIX.


Kerberos, Active Directory and ftp

It is not surprising that more and more users look to Active Directory as a way to unify and simplify user authentication and authorization, and to save some money too.
After all, Active Directory and Windows are in almost every office on this planet. Not to mention that if you work in a heavily audited environment, the ability to have a single store of user definitions is a real blessing!

For about two months now, I have been trying to get ftp working for AIX users authenticated by the Kerberos services provided by Active Directory (2012).
I am not doing it alone: I have an open PMR, and after numerous iptraces, snaps, and so forth, my Kerberos-authenticated users still cannot use ftp... and this sucks!
The locally defined users (admin accounts) can, but the “flesh and bone” users, the ones “living” in Active Directory, cannot.

Is there someone out there who got this working and who is willing to share his/her knowledge, please?

Thanks,
MarkD:-)

Posted in Real life AIX.



nim client removal

I have not done any patching for a while, and today, when I had to remove a NIM client definition, I could not remember the second command to use. Now I do, so here is the process, for the record:

First, reset the client

# nim -F -o reset NIM_CLIENT_NAME

Next, deallocate all resources associated with the client.

# nim -o deallocate -a subclass=all  NIM_CLIENT_NAME

At this stage the client can be removed.

# nim -o remove -F  NIM_CLIENT_NAME

NIM_CLIENT_NAME is the hostname of the client to be removed.
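The three steps can be chained in a small function that stops at the first failure. A sketch only; remove_nim_client is a hypothetical name, and the nim flags are exactly the ones used above.

```shell
# remove_nim_client NAME: reset the client, deallocate all of its
# resources, then remove its definition, stopping at the first failure.
remove_nim_client() {
    [ -n "$1" ] || return 1
    nim -F -o reset "$1" &&
    nim -o deallocate -a subclass=all "$1" &&
    nim -o remove -F "$1"
}
```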

Posted in Real life AIX.



issues with a file system............

Last Friday evening, our SAN administrator migrated the disks of some hosts from one contraption to another. The affected AIX machines immediately lost their sanity, and all of them had to be rebooted to get their file systems back to a usable state. A day later, an application administrator sent out an email informing us that two of his hosts were missing the same file system. These file systems are not “shared”: each of the two machines has its own (SAN-delivered) disks, and the identical file system name is the only thing they have in common.

Since we follow the practice of naming logical volumes as closely as possible after the file systems they hold, it was easy to figure out what was going on. Look at the output below:

# lsvg -l epcshreu001_vg
epcshreu001_vg:
LV NAME        TYPE   LPs  PPs     PVs  LV STATE      MOUNT POINT
epicbin_lv     jfs2   4       4    1    closed/syncd  #
engaudit_lv    jfs2   4       4    1    open/syncd    /epic/engaudit
epicprd_lv     jfs2   80      80   1    open/syncd    /epic/prd
epicjournal_lv jfs2   120     120  1    open/syncd    /epic/jrnlshde1

I decided to mount the file system using its proper name, which was still correctly defined as a stanza in /etc/filesystems. It worked like a charm :-)

# grep -p /epic/bin: filesystems
/epic/bin:
        dev             = /dev/epicbin_lv
        vfs             = jfs2
        log             = INLINE
        mount           = true
        check           = false
        options         = rw
        account         = false
# mount /dev/epicbin_lv /epic/bin 

Inspecting this file system revealed that its contents were intact! I created a new file system and filled it with a copy of /epic/bin, just in case. Next, /epic/bin and the other file systems in this volume group were unmounted, and the volume group was varied off and exported. Then the volume group was imported and varied on, and lsvg -l epcshreu001_vg still showed the strange-looking # instead of the mount point.

Running syncvg -v epcshreu001_vg and syncvg -l epicbin_lv changed nothing.

Here we go again: all the file systems were unmounted, and the volume group was varied off and exported. Next, I made a copy of /etc/filesystems and inspected the file, looking for the /epic/bin: stanza. Yes, it is there! But it should not be: every time a volume group is exported, this file is updated and the stanzas of the file systems belonging to the exported volume group are removed. I think this is how it works, right?
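A quick way to see whether a stale stanza survived exportvg is to look up stanzas by their dev attribute. A minimal awk sketch, assuming the usual /etc/filesystems layout (a "name:" header at the left margin, indented "attribute = value" lines, comment lines starting with *); stanza_for_dev is a hypothetical name.

```shell
# stanza_for_dev DEVICE [FILE]: print the name of every stanza in FILE
# (default /etc/filesystems) whose "dev" attribute equals DEVICE.
stanza_for_dev() {
    awk -v dev="$1" '
        /^\*/ { next }                                   # comment lines
        /^[^ \t].*:[ \t]*$/ { name = $1; sub(/:$/, "", name); next }  # stanza header
        $1 == "dev" && $2 == "=" && $3 == dev { print name }
    ' "${2:-/etc/filesystems}"
}
```

For example, after exportvg epcshreu001_vg, stanza_for_dev /dev/epicbin_lv should print nothing; any output points at a leftover stanza.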

The stanza was removed, and the volume group was imported and varied on. This time, lsvg -l epcshreu001_vg shows /epic/bin in place of the offending # character, as it should! Finally, mount all mounts all the file systems and opens them for user access.

Now, what happened here? Is this the result of the recent SAN migration? No, no, no! I do not think so. I bet my dollar that, at some point in the past, a manual edit of the file left some hidden “special” character behind, which prevented AIX from removing this file system stanza. The SAN migration and the reboot that followed it just happened to expose this, and when the second host was built as a copy of the first one, its /etc/filesystems was copied too.
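If you want to test the hidden-character theory, the file can be scanned for bytes that are neither printable nor blank. A sketch, assuming that a stray control character (for example a carriage return from an editor on another platform) is what we are looking for; find_hidden_chars is a hypothetical name. Running grep under the C locale makes every non-ASCII byte count as non-printable as well.

```shell
# find_hidden_chars [FILE]: print "lineno:line" for every line of FILE
# (default /etc/filesystems) containing a byte that is neither printable
# nor a space/tab in the C locale.
find_hidden_chars() {
    LC_ALL=C grep -n '[^[:print:][:blank:]]' "${1:-/etc/filesystems}"
}
```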

Posted in Real life AIX.



Power7, SMT, CPU utilization, etc

There is a lot of room for misunderstanding CPU utilization with SMT active (either 2 or 4 threads). Lately, I have found myself in a situation where I not only have to know what is going on with CPU utilization, but also have to be able to show and explain it to my clients and my bosses.
For all of you who need to learn more about SMT and CPU utilization, check at least these two posts by Mr. Nigel Griffiths, IBM:

nmon – I can’t see all the CPUs on-screen. Please Help!

nmon – new online Physical CPU Graphs arrive for latest AIX 6.1

More reading, prompted by a comment from Rob: Power7 CPU and Virtual Processors. You may need to download this document (a PowerPoint presentation) to be able to read it.

Posted in Real life AIX.




© 2008-2014 www.wmduszyk.com - best viewed with your eyes.