

ANS1030E The operating system refused TSM request for memory

One AIX machine refused the TSM backup of certain file systems, throwing the following messages:

ANS1999E Incremental processing of '/filesystem/name' stopped.

Followed by

ANS1030E The operating system refused a TSM request for memory allocation.

Investigation of /etc/security/limits revealed nothing unusual:

default:
        fsize = -1
        core = 2097151
        cpu = -1
        data = 262144
        rss = 65536
        stack = 65536
        nofiles = -1
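
For reading the stanza above: the size limits in /etc/security/limits (fsize, core, data, rss, stack) are expressed in 512-byte blocks, and -1 means unlimited. A minimal sketch of the conversion to megabytes follows; the function name blocks_to_mb is made up for illustration.

```shell
# Hypothetical helper: convert a limit given in 512-byte blocks to whole MB.
# A value of -1 means "unlimited" in /etc/security/limits.
blocks_to_mb() {
    if [ "$1" -eq -1 ]; then
        echo "unlimited"
    else
        echo $(( $1 * 512 / 1048576 ))
    fi
}

blocks_to_mb 262144   # data
blocks_to_mb 65536    # rss and stack
```

With the values shown, data works out to 128 MB, and rss and stack to 32 MB each.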

Neither nmon nor vmstat indicated any memory shortage. This host runs 64-bit AIX and never had this problem before… The uptime command showed just fourteen days. I asked for permission to reboot this machine – it may wait a few days.

Looking on-line, I found a few IBM TSM notes on this subject and with the newly gained knowledge, I implemented the following two changes.

The following line was added to the file called dsm.sys

memoryefficientbackup yes

The next line was added to the inclexcl file. It is one continuous line, not two as your browser may show.

INCLUDE.FS /filesystem/name MEMORYEFFICIENTBACKUP=DISKCACHEMETHOD DISKCACHELOCATION=/TSM_cache

Where /filesystem/name is the path that dsmc previously failed to back up.

Following these two changes and a refresh of the dsmc daemon, the next incremental worked like a charm.
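
A quick way to confirm that both edits are in place before the next scheduled backup is to search the option files for the keywords. The sketch below is not a TSM tool: the function name is invented, and the paths in the commented usage lines are placeholders for wherever your dsm.sys and include-exclude file live.

```shell
# Hypothetical helper: succeed if a TSM options file contains a line that
# mentions the given keyword (case-insensitive), ignoring comment lines
# that start with '*'.
tsm_option_present() {
    # $1 = options file, $2 = keyword to look for
    grep -v '^[[:space:]]*\*' "$1" | grep -i -q "$2"
}

# Usage (paths are placeholders, adjust to your installation):
# tsm_option_present /path/to/dsm.sys memoryefficientbackup && echo "dsm.sys ok"
# tsm_option_present /path/to/inclexcl diskcachelocation && echo "inclexcl ok"
```

The check only proves that the keyword is present in a non-comment line; it does not validate the option value.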

Posted in Real life AIX.



AIX Support Center Tools

Do you know what zsnap is all about? What about devscan or VIOS Advisor? An IBM engineer just gave me this link to the IBM SUPPORT CENTER TOOLS. Something new, something good.

Posted in Real life AIX.


replacing FC adapters …… port speed matches adapter speed?

After relocating a host from one rack to another, its two 8Gb PCI Express Dual Port FC Adapters failed. They “live” in a 5802 I/O drawer attached to an 8204-E8A. Each adapter lost one port, the lower one. I guess misery loves company, right? The affected ones were fcs7 and fcs9. Both, when treated with the sanscan utility (described in an earlier post), returned the same message.

# sanscan fscsi7
sanscan v2.2
Copyright (C) 2010 IBM Corp., All Rights Reserved
Opening device /dev/fscsi7 failed with errno ENETUNREACH
Cleaning up...
Completed with error(s)

The diag routine, executed with and without the “wrap plug”, did not help – both adapters were declared OK. Still, I have seen bad adapters misdiagnosed by these utilities. New fiber cables were stretched from the switch to both ports. Still the issue persisted. Magic, pure magic.

Before you declare the cards bad or failed, make sure that the SAN administrator has set the switch ports to match the speed of the FC adapters attached to them.
If the adapters are 8Gbps, make sure the ports are set to 8Gbps. If they are 4Gbps, make sure the ports are set to 4Gbps. Otherwise you may spend more than 24 hours unnecessarily suspecting the cards, waiting for their replacements and facing the identical situation after the new cards are in, all while listening to people around you spinning tales of bad motherboards, issues with AIX, the level of your own skills and so forth ……… life can be really entertaining.
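
From the AIX side, the adapter's own view of the link speed can give a first hint. The sketch below assumes that fcstat prints lines of the form "Port Speed (supported): 8 GBIT" and "Port Speed (running): 8 GBIT"; the function name is invented, and the check says nothing about how the switch port itself is configured, so the SAN administrator still has to confirm that side.

```shell
# Hypothetical helper: read fcstat-style output on stdin and compare the
# "Port Speed (supported)" and "Port Speed (running)" lines.
# Assumption: the lines look like "Port Speed (running):   8 GBIT".
check_port_speed() {
    awk -F: '
        /Port Speed \(supported\)/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); sup = $2 }
        /Port Speed \(running\)/   { gsub(/^[ \t]+|[ \t]+$/, "", $2); run = $2 }
        END {
            if (sup == "" || run == "") { print "speed lines not found"; exit 2 }
            if (sup == run) { print "OK: running at " run }
            else { print "MISMATCH: supported " sup ", running " run; exit 1 }
        }'
}

# Usage on the affected host (not executed here):
# fcstat fcs7 | check_port_speed
```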

UPDATE:

a. I wrote this post after a very long session (over 24 hours), which apparently impaired my brain… As a result, I was (wrongly) under the impression that the adapters and devices in the I/O drawer attached to my host are not hot-pluggable. Nothing could be further from the TRUTH! If lsdev can see them, the Hot Plug tasks in either smitty or diag will see them too. Yes, the contents of 5802 I/O drawers are fully hot-pluggable.

b. We use Brocade switches, which support up to 16Gbps when they are LICENSED to do so! A few minutes ago, we discovered that the new switch we failed to connect to last Saturday is not licensed for 8Gbps!!!! We used the licenseshow command to verify it.

Posted in Real life AIX.



TDS LDAP client issues

One AIX host went through a motherboard replacement. Surprisingly, after the host was powered ON nobody could log in. The only way in was via the HMC. It did not take long to determine that the commands associated with secldapclntd did not work. The following commands failed: lsldap -a passwd, lsuser -R LDAP and ls-secldapclntd.

This host communicates with TDS LDAP servers over SSL, so the “key” files and their password quickly became the primary suspects. We renamed the original /etc/security/ldap/xxxxxxx.kdb file and replaced it with a file copied from another host. This new file was renamed to match the name of the original. The secldapclntd daemon was restarted and any expectations of fixing this issue died quickly. LDAP still not working!

Next, the ldapsearch command was tried with the “.kdb” file password and the bind user password given in plain text.

host:RDC:/root> ldapsearch -h tdsServerName -Z \
-K /etc/security/ldap/FileName.kdb -P kdbPassword \
-D cn=bindUser -w bindUserPassword \
-b ou=People,cn=aixdata,dc=wmd,dc=edu -s sub "objectclass=*"

The results were nothing short of spectacular! OK, so there is something fishy about one of the passwords… When the “key” files were created (about two years ago), the keys were set to expire in 10 years… Could it be that the encrypted password of the bind account querying the TDS LDAP servers on behalf of this host somehow stopped working? To test this hypothesis, the bindpwd line in /etc/security/ldap/ldap.cfg was changed: the “encrypted” password was replaced with its plain text version.

From this:

# LDAP server bind DN password
bindpwd:{DESv2}A3D8A8F5BCEA39 E 04599999996F2 E8F9CCC1AA3B68EC1DA

To this:

# LDAP server bind DN password
bindpwd:plaintextpassword

The secldapclntd was refreshed and the host exhibited full LDAP functionality again. Evidently the existing encrypted password string was no longer accepted. To create a new one, we executed the next command with the original password:

host:RDC:/root>secldapclntd -e original_password
{DESv2}744655DF EC085C3A53A5A7F436C6DC4host:RDC:/root>

The line above needs to be copied without the trailing host:RDC:/root> (which is, in my case, the host prompt) and pasted into the ldap.cfg file as shown next.

# LDAP server bind DN password
bindpwd:{DESv2}744655DF EC085C3A53A5A7F436C6DC4

Recycle secldapclntd after this change!
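
Capturing the output of secldapclntd -e in a shell variable sidesteps the problem of the prompt being glued to the encrypted string. The sketch below is only an illustration: the function name set_bindpwd and the /tmp output path are made up, and the helper prints a modified copy rather than editing ldap.cfg in place, so the result can be reviewed first.

```shell
# Hypothetical helper: print a copy of an ldap.cfg-style file with the
# bindpwd line replaced by the given value. The original file is not modified.
set_bindpwd() {
    # $1 = config file, $2 = new value (for example the {DESv2}... string)
    NEWPWD=$2 awk '
        /^bindpwd:/ { print "bindpwd:" ENVIRON["NEWPWD"]; next }
        { print }
    ' "$1"
}

# Usage on the affected host (not executed here; the output path is a placeholder):
# enc=$(secldapclntd -e original_password)
# set_bindpwd /etc/security/ldap/ldap.cfg "$enc" > /tmp/ldap.cfg.new
```

The value is passed through the environment rather than with awk -v so that any backslashes in it are not interpreted.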

Posted in ldap, Real life AIX.



Integrating Red Hat with Active Directory

This morning, I found this publication showing everything required for a successful integration with AD – including time synchronization, DNS, Samba setup and so forth.
If you are branching to LINUX or just have to support LINUX in addition to AIX then this document may help you.

Integrating Red Hat with Active Directory

Posted in Linux.



a nice VMWARE link

My friend Adam is into VMWARE. Today, he sent me an email with this link, which he highly recommends. If you are “into” VMWARE this could be something for you too.

http://vsphere-land.com/

My friend Tony recommends this link

http://planet.vsphere-land.com/

Posted in Real life AIX.



any issues upgrading to AIX 6.1.7.5?

Recently, we upgraded two machines to AIX 6.1.7.5 and both experienced incidents of very heavy paging to the point that they had to be rebooted – we are not sure that there is any relation between this upgrade and paging. Has anybody else experienced the same?
If so, please let me know – we have to patch a few extremely “important” machines and we do not want to complicate our lives.

Thanks!!!!

Update:

So far it looks like TL7SP5 automatically turns ON support for 64 KB memory pages. You can check it by executing vmstat -P ALL. To disable this feature, execute vmo -r -o vmm_mpsize_support=0, agree to update multibos and reboot the system.

Posted in Real life AIX.


when ShadowImage pairs do not cooperate

After a while, a long while indeed, we had to move our ShadowImage-based backup process to another server. When I say server, I mean the host with active horcm processes, i.e. the host which owns the P-VOLs.
After a few modifications to our backup scripts we were ready to go. We queried the disk status and were immediately surprised – the pairs were not in the same state!

pairdisplay -g epicdb_SI1 -CLI -fcx -IM0
Group   PairVol L/R   Port# TID  LU-M   Seq# LDEV# P/S Status % P-LDEV# M
epicdb_SI1 000A-0055 L   CL1-A-1  0 2 0  29386 a P-VOL PAIR  99 55 -
epicdb_SI1 000A-0055 R   CL3-B-0  0 0 0  29386 55 S-VOL PAIR  99  a -
epicdb_SI1 000D-0056 L   CL1-A-1  0 3 0  29386 d P-VOL PAIR  99 56 -
.....................................................................
epicdb_SI1 00A5-00AB L   CL1-A-1  0 19 0  29386 a5 P-VOL COPY 94 ab -
epicdb_SI1 00A5-00AB R   CL3-B-2  0 15 0  29386 ab S-VOL COPY 94 a5 -
epicdb_SI1 00A6-00AC L   CL1-A-1  0 20 0  29386 a6 P-VOL COPY 94 ac -
epicdb_SI1 00A6-00AC R   CL3-B-2  0 16 0  29386 ac S-VOL COPY 94 a6 -
epicdb_SI1 00A7-00AD L   CL1-A-1  0 21 0  29386 a7 P-VOL COPY 94 ad -
epicdb_SI1 00A7-00AD R   CL3-B-2  0 17 0  29386 ad S-VOL COPY 94 a7 

The pairs were not in the same state and nobody could say why. The decision was made to split the whole consistency group.

# pairsplit -g epicdb_SI1 -d 000A-0055 -IM0

# pairdisplay -g epicdb_SI1 -CLI -fcx -IM0
Group   PairVol L/R   Port# TID  LU-M   Seq# LDEV# P/S Status % P-LDEV# M
epicdb_SI1 000A-0055 L   CL1-A-1  0 2 0  29386  a P-VOL PSUS  99 55 W
epicdb_SI1 000A-0055 R   CL3-B-0  0 0 0  29386 55 S-VOL COPY  99  a -
epicdb_SI1 000D-0056 L   CL1-A-1  0 3 0  29386  d P-VOL PAIR  99 56 -
epicdb_SI1 000D-0056 R   CL3-B-0  0 1 0  29386 56 S-VOL PAIR  99  d -
epicdb_SI1 0013-0057 L   CL1-A-1  0 4 0  29386 13 P-VOL PAIR  99 57 -
epicdb_SI1 0013-0057 R   CL3-B-0  0 2 0  29386 57 S-VOL PAIR  99 13 -
.....................................................................
epicdb_SI1 00A5-00AB L   CL1-A-1  0 19 0  29386 a5 P-VOL COPY 94 ab -
epicdb_SI1 00A5-00AB R   CL3-B-2  0 15 0  29386 ab S-VOL COPY 94 a5 -
epicdb_SI1 00A6-00AC L   CL1-A-1  0 20 0  29386 a6 P-VOL COPY 94 ac -
epicdb_SI1 00A6-00AC R   CL3-B-2  0 16 0  29386 ac S-VOL COPY 94 a6 -
epicdb_SI1 00A7-00AD L   CL1-A-1  0 21 0  29386 a7 P-VOL COPY 94 ad -
epicdb_SI1 00A7-00AD R   CL3-B-2  0 17 0  29386 ad S-VOL COPY 94 a7 

As you see, the split worked for some but not all disk pairs. Something is definitely rotten in the state of Denmark (do not get me wrong, I love the state of Denmark, for me Copenhagen is the most beautiful city ever).
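
With dozens of rows it is easy to miss a pair in the wrong state, so a small summary can help. The sketch below counts rows per volume role and status; it assumes each data row of pairdisplay -CLI output contains P-VOL or S-VOL immediately followed by the status, as in the listings above. The function name is invented for illustration.

```shell
# Hypothetical helper: count pairdisplay rows per volume role and status.
# Assumption: each data row contains P-VOL or S-VOL followed by the status.
pair_status_summary() {
    awk '
        {
            for (i = 1; i < NF; i++)
                if ($i == "P-VOL" || $i == "S-VOL") { count[$i " " $(i + 1)]++; break }
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}

# Usage on the host running horcm (not executed here):
# pairdisplay -g epicdb_SI1 -CLI -fcx -IM0 | pair_status_summary
```

A healthy group should collapse to one line per role; more than one status per role is the situation described here.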

By the way, trying to split the pairs one by one (as shown below) did not change the situation either.

# pairsplit -g epicdb_SI1 -d 000D-0056 -IM0
# pairsplit -g epicdb_SI1 -d 0013-0057 -IM0

In the last two cases HORCM produced the following messages:

pairsplit: [EX_CMDRJE] An order to the control/command device was rejected. Refer to the command log(/HORCM/log0/horcc_marcoPolo.log) for details. It was rejected due to SKEY=0x05, ASC=0x26, SSB=0xB9A1,0x2310 on Serial#(XXXXX)

It was the SAN administrator's idea to split the pairs using the -S option – apparently it was time to stop being the nice AIX administrator that I always am. This execution deletes the disk pairs, which meant (at least to us) that all observed inconsistencies would be gone too. At the end the disks should show SMPL as their MODE.

pairsplit -g epicdb_SI1 -S -IM0

The following execution of the pairdisplay command shows all disks in SMPL mode – their basic state, in which there are no PAIRs. Now, the PAIRs must be re-created by executing the paircreate command.

paircreate -g epicdb_SI1 -m grp -vl -IM0

At this moment we were very happy campers indeed. Little did we know that the next scheduled backup was destined to fail due to recreatevg fussing about one of the disks not being a member of the same volume group as the rest of them ….. It was a very short night indeed, little sleep but a lot of stress – this is the life I have chosen, and I love it!
:-)

Posted in Real life AIX.


to prevent an error from showing in errorlog

Sometimes the contents of the error log are not really errors, but they are still interpreted as such: it is in the error log, so it must be an error. Right? Well, for a lot of analysts this is true; for a lot of admins, not always. Today, the error log of one machine showed a number of entries defined by

LABEL:          DR_DMA_MIGRATE_FAIL
IDENTIFIER:     4DA8FE60
.................................
Description
Memory related DR operation failed

These messages generated a lot of excitement today …. After verifying firmware, FC adapter settings and a few other “things”, not to mention calling IBM support, we determined beyond any doubt that they do no harm. It is time to turn them off with the errupdate command. What follows is self-explanatory.

# errupdate
=4DA8FE60: 
Report=False 
<Ctrl-D>
<Ctrl-D>

If you change your mind and want to log these messages again do the following:

# errupdate
=4DA8FE60:
Report=True
<Ctrl-D>
<Ctrl-D>
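
The stanza typed interactively can also be generated by a small helper, which is handy when several identifiers need the same treatment. This is only a sketch: the function name errupdate_stanza is invented, and the piped usage assumes that errupdate reads the stanza from standard input, as the interactive sessions suggest.

```shell
# Hypothetical helper: print an errupdate stanza that switches reporting
# for one error identifier on or off.
errupdate_stanza() {
    # $1 = error identifier (for example 4DA8FE60), $2 = on | off
    case "$2" in
        on)  value=True ;;
        off) value=False ;;
        *)   echo "usage: errupdate_stanza ID on|off" >&2; return 1 ;;
    esac
    printf '=%s:\nReport=%s\n' "$1" "$value"
}

# Usage on the AIX host (not executed here):
# errupdate_stanza 4DA8FE60 off | errupdate
```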

Posted in Real life AIX.






Copyright © 2015 - 2016 Waldemar Mark Duszyk. - best viewed with your eyes.. Created by Blog Copyright.