

removing remote print queues

This morning I got a request to remove several print queues defined in the following way:

# enq -isWA
Queue                 Dev            Status       Job Files   User  PP   %  Blks  Cp Rnk
-------------------- -------------- --------- ------ ----- --- ---
chca_printx          hp@sat20x      READY
chca_printx-c        hp@sat20x      READY
chca_printx-c-l      hp@sat20x      READY
chca_printx-l        hp@sat20x      READY
chca_printz          hp@sat20z      READY
chca_printz-c        hp@sat20z      READY
chca_printz-c-l      hp@sat20z      READY
chca_printz-l        hp@sat20z      READY

Without giving it much thought, I logged on to the host, executed smitty rmpq, selected the first queue to be removed and hit the Enter key – only to see my action fail promptly. This made me think about the task at hand. Looking at the output above for a while longer, I recognized what I was seeing – these are remote print queues attached to HP print servers (in this case). It took me another moment to remember that I had done something like this years ago, and that back then it took two steps to delete such a print device and its queue or queues.
This time around, each HP print server has several print queues associated with it – four, to be precise.

AIX has three basic commands to “manually” remove “printing devices”: rmque, rmquedev and rmvirprt. The first removes a printer queue, the second removes a printer or plotter queue device, and the last one removes a virtual printer.

It seems the natural order of action is to remove every print queue attached to the printer queue device and finally to delete the device itself.

# rmque -qchca_printx
# rmque -qchca_printx-c
# rmque -qchca_printx-c-l
# rmque -qchca_printx-l

With the print queues removed, it is time to remove the printer device.

# rmquedev -dhp@sat20x

The same steps applied to the second set of queues made it disappear as well, and the ticket was closed.
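For the record, the second pass looked just like the first – the queue and device names come straight from the enq listing above:

# rmque -qchca_printz
# rmque -qchca_printz-c
# rmque -qchca_printz-c-l
# rmque -qchca_printz-l
# rmquedev -dhp@sat20z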

It is funny, but this is not the first time that I recognize the limits of memory (mine), which puts me in a philosophical mood – we are “masters” only for a short time, other than that we just pretend ….

Tomorrow, have a Nice Thanksgiving Everybody!!!

Posted in AIX, Real life AIX.



adding disks to concurrent volume group ……

Today, I had to add a few more disks to two concurrent volume groups (shared in a PowerHA cluster) to increase their storage capacity. Below are the new disks that are still waiting to be assigned. Each cluster node has the same disk ordering, so all of them show the same hdisk numbers.

hdisk44         none                                None
hdisk45         none                                None
hdisk46         none                                None
hdisk47         none                                None
hdisk48         none                                None
hdisk49         none                                None

In AIX there is just one way to add a disk to an existing volume group – the extendvg command. I tried it against each of the “new/free” disks with identical results.

RDC:/root>extendvg -f epcdbm_vg hdisk44
0516-1254 extendvg: Changing the PVID in the ODM.
The distribution of this command (168) failed on one or more passive nodes.
0516-1396 extendvg: The physical volume 00c9f275db894e19, was not found in the
system database.
0516-792 extendvg: Unable to extend volume group.
RDC:/root>

At first, I thought that this time these disks needed to be explicitly made available for LVM “consumption”, so I executed the following against each of them, and I received the same response each time.

RDC:/root>chpv -v a hdisk44
0516-304 getlvodm: Unable to find device id hdisk44 in the Device
        Configuration Database.
0516-722 chpv: Unable to change physical volume hdisk44.
RDC:/root>

Having nothing to lose, I decided to run the following three commands against each of them.

RDC:/root>mkvg -y wmd hdisk44
0516-1254 mkvg: Changing the PVID in the ODM.
wmd
RDC:/root>varyoffvg wmd
RDC:/root>exportvg wmd

After I was done on this node, I moved to the next one in the cluster and repeated the same for each disk there too.

TKP/root>mkvg -y wmd hdisk44
0516-1254 mkvg: Changing the PVID in the ODM.
0516-1398 mkvg: The physical volume hdisk44, appears to belong to
another volume group. Use the force option to add this physical volume
to a volume group.
0516-862 mkvg: Unable to create volume group.
TKP:/root>

These messages are not only expected but welcome – the PVIDs are being propagated to each node in the cluster. Finally, I return to the node with the active resource groups and execute extendvg against each appropriate volume group, followed by the list of the selected disks. This time, it works!
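A sketch of that final step – assuming (for illustration only) that hdisk44 through hdisk46 go to epcdbm_vg; the second volume group receives the remaining disks the same way:

RDC:/root>extendvg -f epcdbm_vg hdisk44 hdisk45 hdisk46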

UPDATE:
For an alternate way of adding disks to a concurrent VG please see the comment from “ironman”.
What really makes my day is the comment from Matt – “reorgvg” can be used against a concurrent volume group!!!!!! This rocks!!! Check your AIX level, and if you do use CVGs, upgrade – it is worth it!!!!

Posted in Real life AIX.



HDS controllers, disks and how to establish which is which?

Some AIX hosts get their disks from a Hitachi-based SAN. Systems like the USP_V limit both the LUN size on the controller end (256GB) and the disk queue depth (32) on the AIX hdisk end – in order to provide good I/O characteristics. For AIX hosts with rapidly growing capacity requirements the end result is always the same – a large number of disks, which is not a problem as long as there is no LVM mirroring. But if a host uses disks from two SAN fabrics which are mirrored by LVM, and if additionally ShadowImage is involved to make backups, then the AIX administrator will quickly become uncomfortable ….. Follow to the next page if this sounds interesting.
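For the curious, both limits are easy to check from the AIX side; a quick sketch (hdisk44 is just an example name, substitute one of the Hitachi disks):

# lsattr -El hdisk44 -a queue_depth
# getconf DISK_SIZE /dev/hdisk44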

Posted in AIX, HDS, Real life AIX.




possible paging issues with AIX 6.1.7

On hosts with a specific application (EPIC/CACHE) we observed incidents of overwhelming paging, which were subsequently traced to 64K memory pages being “enabled” – it started happening after the upgrade to AIX 6.1.7.5. All other applications (ORACLE, LAWSON, CLARITY, etc.) did not show any adverse behavior and are running just fine. For the one that did “page a lot” we disabled the support for 64K pages.
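To see how much 4K and 64K memory a host actually uses (and pages), the stock tools already break the numbers down per page size; a quick sketch – the exact columns vary with the AIX level:

# vmstat -P all
# svmon -G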

Here is the official explanation from our IBM support engineer:

There are not currently any hard and fast rules regarding the ratio of paging space to memory. The requirements change based on apps, workload, … Paging space is tuned based on the needs of the server.

Regarding your system continually running out of paging space we believe you may have hit an APAR.  IV26272: REDUCE EARLY WORKING STORAGE PAGING
http://www-01.ibm.com/support/docview.wss?uid=isg1IV26272 

An explanation and possible resolution 

The kernel parameter numperm_global was implemented and enabled with AIX 6.1 TL7 SP4 / 7.1 TL1 SP4 to be able to look at paging from a global perspective. That means that before AIX 6.1 TL7 SP4 / 7.1 TL1 SP4 the numperm_global tunable was not available. Unfortunately, numperm_global might cause early paging in some environments due to a failed pincheck on 64K pages related to different size mempools. There are two possible ways to prevent the problem from happening:

1) Disable global numperm (numperm_global=0)                                   
2) Increase the number of unpinned pages for the page size that is close to maxpin - in this case, the 64K page pool is close to maxpin. Most of the pinned pages are kernel heap, and there is a large number of 4K computational pages used by the customer's application. Forcing the application to use 64K pages (LDR_CNTRL) will reduce the pinned percentage for the 64K pool and therefore prevent the problem from happening.
                                                                              
Customers who are running workloads like DB2 or Oracle and follow the best practices (using 64K page size) are unlikely to experience this early paging problem. 
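For completeness, option 1 comes down to a single vmo tunable – the first command below displays the current value, the second disables it persistently. A sketch only: on some levels numperm_global is a restricted tunable, in which case vmo will insist on the -F flag and print an extra warning.

# vmo -o numperm_global
# vmo -p -o numperm_global=0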

UPDATE:

If you want to learn more about AIX and memory pages, follow the link provided by Lonny Niederstadt from EPIC Corporation.
http://www-03.ibm.com/systems/resources/systems_p_os_aix_whitepapers_multiple_page.pdf

Thanks Lonny!

Posted in Real life AIX.


secldapclntd will not work with SSL

Once, there was an AIX system whose LDAP client refused to run on top of SSL. No way, ever! An AIX update did not help, new LDAP client software did not help, an SSH/SSL upgrade did not help, a GSKit patch did not help. It seemed that this system was cursed.

# start-secldapclntd
Starting the secldapclntd daemon.
3001-710 SSL initialization failed. Check the SSL key path and key password in the /etc/security/ldap/ldap.cfg file.
3001-710 SSL initialization failed. Check the SSL key path and key password in the /etc/security/ldap/ldap.cfg file.
The secldapclntd daemon failed to start.

The ldapsearch command, executed with SSL and a key file, kept failing and generating:

ldap_ssl_client_init failed! rc == -1, failureReasonCode == 804400244
Unknown SSL error

Well, here comes To Vo, who says: “Mark, please execute this command:”

/opt/IBM/ldap/V6.3/bin/idslink -igl32 -f

It works, it works like a charm, thanks To Vo!
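With the libraries relinked, the daemon start and a quick status check look like this (the status command is just a sanity check, not part of To Vo's fix):

# start-secldapclntd
# ls-secldapclntd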

Posted in Real life AIX.



ANS1030E The operating system refused TSM request for memory

One AIX machine refused TSM backups of a certain file system, throwing the following message:

ANS1999E Incremental processing of '/filesystem/name' stopped.

Followed by:

ANS1030E The operating system refused a TSM request for memory allocation.

Investigation of /etc/security/limits revealed nothing unusual:

default:
        fsize = -1
        core = 2097151
        cpu = -1
        data = 262144
        rss = 65536
        stack = 65536
        nofiles = -1

Looking into nmon and vmstat did not indicate any memory shortages either. This host runs 64-bit AIX and never had this problem before… The uptime command showed just fourteen days. I asked for permission to reboot this machine – it may have to wait a few days.

Looking on-line, I found a few IBM TSM notes on this subject and with the newly gained knowledge, I implemented the following two changes.

The following line was added to the dsm.sys file:

memoryefficientbackup yes

The next line was added to the include-exclude file (inclexcl); it is one continuous line, not two, even if your browser wraps it.

INCLUDE.FS /filesystem/name MEMORYEFFICIENTBACKUP=DISKCACHEMETHOD DISKCACHELOCATION=/TSM_cache

Where /filesystem/name is the path that dsmc previously failed to back up.
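Before the next incremental, the client can be asked to list the include-exclude statements it actually sees – a quick sanity check, assuming the standard TSM BA client command line:

# dsmc query inclexcl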

Following these two changes and a refresh of the dsmc daemon, the next incremental backup worked like a charm.

Posted in Real life AIX.



AIX Support Center Tools

Do you know what zsnap is all about? What about devscan or the VIOS Advisor? An IBM engineer just gave me this link to the IBM SUPPORT CENTER TOOLS. Something new, something good.

Posted in Real life AIX.


replacing FC adapters …… port speed matches adapter speed?

After relocating a host from one rack to another, its two 8Gb PCI Express Dual Port FC Adapters failed. They “live” in a 5802 I/O drawer attached to an 8204-E8A. Each adapter lost one port – the lower one. I guess misery loves company, right? The affected ones were fcs7 and fcs9. Both, when treated with the sanscan utility (described in an earlier post), returned the same message.

# sanscan fscsi7
sanscan v2.2
Copyright (C) 2010 IBM Corp., All Rights Reserved
Opening device /dev/fscsi7 failed with errno ENETUNREACH
Cleaning up...
Completed with error(s)

The diag routine, executed with and without the “wrap plug”, did not help – both adapters were declared OK. Still, I have seen bad adapters misdiagnosed by these utilities. New fiber cables were stretched from the switch to both ports – done. Still the issue persisted. Magic, pure magic.

Before you declare the cards bad or failed, make sure that the SAN administrator set the switch ports to match the speed of the FC adapters attached to them.
If the adapters are 8Gbps, make sure the ports are set to 8Gbps. If they are 4Gbps, make sure the ports are set to 4Gbps. Otherwise you may spend more than 24 hours quite unnecessarily suspecting the cards, waiting for their replacements and facing the identical situation after the new cards are in, while listening to people around you spinning tales of bad motherboards, issues with AIX, the level of your own skills and so forth ……… life can be really entertaining.
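On the AIX side the supported and the negotiated speeds of an adapter are visible in fcstat; a sketch only – fcs7 is taken from the case above, and the exact wording of the output depends on the firmware and AIX levels:

# fcstat fcs7 | grep -i speed

On the Brocade side, switchshow (run from the switch CLI) lists the speed of every port.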

UPDATE:

a. I wrote this post after a very long session (over 24 hours) which apparently impaired my brain….. As a result, I was (wrongly) under the impression that adapters and devices present in the I/O drawer attached to my host are not hot-pluggable….. Nothing could be further from the TRUTH!!!!!!! If lsdev can see them, the Hot Plug tasks in either smitty or diag will see them too. Yes, the contents of 5802 I/O drawers are fully hot-pluggable.

b. We use Brocade switches, which support up to 16Gbps when they are LICENSED to do so! A few minutes ago, we discovered that the new switch we failed to connect to last Saturday is not licensed for 8Gbps!!!! We used the licenseshow command to verify it.

Posted in Real life AIX.



TDS LDAP client issues

One AIX host went through a motherboard replacement. Surprisingly, after the host was powered ON, nobody could log in. The only way to do it was via the HMC. It did not take long to determine that none of the commands associated with secldapclntd worked. The following commands failed: lsldap -a passwd, lsuser -R LDAP and ls-secldapclntd.

This host communicates with TDS LDAP servers over SSL, so the “key” files and their password quickly became the primary suspects. We renamed the original /etc/security/ldap/xxxxxxx.kdb file and replaced it with a file copied from another host. The new file was renamed to match the name of the original. The secldapclntd was restarted, and any expectations of fixing this issue died quickly. LDAP still not working!

Next, the ldapsearch command was tried with the “.kdb” file and the bind user passwords in plain text.

host:RDC:/root> ldapsearch -h tdsServerName -Z \
-K /etc/security/ldap/FileName.kdb -P kdbPassword \
-D cn=bindUser -w bindUserPassword \
-b ou=People,cn=aixdata,dc=wmd,dc=edu -s sub "objectclass=*"

The results were nothing short of spectacular! OK, so there is something fishy about one of the passwords….. When the “key” files were created (about two years ago), the keys were set to expire in 10 years…. Could it be that the encrypted password of the bind account querying the TDS LDAP servers on behalf of this host somehow stopped working? To test this hypothesis, the bindpwd line in /etc/security/ldap/ldap.cfg was changed – its content, a.k.a. the “encrypted” password, was replaced with the plain-text version.

From this:

# LDAP server bind DN password
bindpwd:{DESv2}A3D8A8F5BCEA39 E 04599999996F2 E8F9CCC1AA3B68EC1DA

To this:

# LDAP server bind DN password
bindpwd:plaintextpassword

The secldapclntd was refreshed and the host exhibited full LDAP functionality again. It is obvious that the existing encrypted password string was no longer accepted. To create a new one, we executed the next command with the original password:

host:RDC:/root>secldapclntd -e original_password
{DESv2}744655DF EC085C3A53A5A7F436C6DC4host:RDC:/root>

The line above needs to be copied without the host:RDC:/root> (which in my case is the host prompt) and pasted into the ldap.cfg file as shown next.

# LDAP server bind DN password
bindpwd:{DESv2}744655DF EC085C3A53A5A7F436C6DC4

Recycle secldapclntd after this change!
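The stock client scripts take care of the recycle; a minimal sketch, ending with a status check:

# restart-secldapclntd
# ls-secldapclntd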

Posted in ldap, Real life AIX.




