
Methods for Identifying Memory Leaks in AIX Systems

The previous post resulted from Adi’s question about the (virtual) memory consumption of a particular Oracle process. Next, “ironman” suggested using “workloads” to answer such questions. This made my brain generate the following thought: is it possible to use workload(s) to restrict the consumption of resources and so prevent the “not enough memory to fork a process” message from showing up again? Well, you know the implication of this message, right? For those who have never seen it: the message indicates that it could already be too late to log in, and all that you can do is reboot the box…
Well, we are going to try it with Adi sometime next week, and I will let you know how it worked. By the way, today on a test LAWSON server I noticed paging space utilization at 63% (sudden, and growing) with Java being the main offender, so this “host/application component” could become a second candidate for using a workload to arrest unlimited resource consumption by an application.

While searching for workload wisdom, I found another little gem, which is responsible for the title of this post: a document authored by two IBMers, Barry J. Saad and Harold R. Lee, titled “Methods for Identifying Memory Leaks in AIX Systems“. This document not only sheds light on the heap and memory allocation techniques but also provides tools that every AIX administrator can use to identify the presence of a “leaky” application. The simplicity of this document, compared with the built-in difficulty and complexity of its scope, is simply mind boggling. It takes two subject matter masters to write a technical document that even a preschooler is able to follow and understand – no kidding.

By the way, the authors use the ps -gv pvid command to obtain the current value (under the SIZE column) of the virtual memory (RAM + paging) used by a process. This value is expressed in units of KB, not pages!
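
Since SIZE is a plain KB figure, a series of readings taken some time apart can be compared directly. The following is only a minimal sketch of that idea and not the method from the IBM document: size_growth is a hypothetical helper, and the sample values are made up for illustration.

```shell
# size_growth: given SIZE samples in KB (oldest first), report the total
# growth and whether the value ever went down between two samples.
# Hypothetical helper; the samples are typed in by hand here.
size_growth() {
    sg_first=$1
    sg_prev=$1
    sg_mono=yes
    for sg_sample in "$@"; do
        if [ "$sg_sample" -lt "$sg_prev" ]; then
            sg_mono=no            # memory was given back at some point
        fi
        sg_prev=$sg_sample
    done
    echo "$(( sg_prev - sg_first )) KB growth, monotonic: $sg_mono"
}

# Made-up readings of the SIZE column, taken at intervals.
size_growth 1000 1200 1500
```

A value that only ever grows is a hint, not proof, of a leak; a process that is still warming up looks the same over a short interval.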

Posted in AIX, WLM.


aix process memory consumption – how much does it use?

It is Friday night, and I am back to the We used to have two boxers; one was white, and many mistook Kajtek for an American bulldog… I am looking at the dogs waiting for adoption, trying to come up with an excuse I could use to convince my wife. I already tried “this is the best intruder detection “system”, which does not require a backup power supply…” and I fell short. She just gave me look number FIVE!

While I am looking at the available pooches, Adi “reaches and touches” me via the MS Communicator: “Mark, are you there?” He has some issues with a database server and asks me a few questions about memory consumption. “Do you know if the DSMC agent really should be using “so and so” memory?”, where “so and so” refers to a specific number expressed in MBs. Well, to be honest, I have no idea. So we chat for a few more minutes, I kill some processes, and then, to make a long story short, I offer to reboot this machine, which I can do at will since it is not a production one but a low-level testing server which currently sees no usage. Adi agrees, I reboot the host, text him good night, and get into my bed.

The question remains: how to establish the memory usage of a single process? The most difficult aspect of this question is remembering that the value found is expressed in pages! So if you know the process ID (the number in the second column of the output of ps -ef | grep particular_process_name), you can find its memory consumption like this:

svmon -P 17367068 | more
Pid Command   Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
17367068 java   440648     9786        0   435928      Y     Y     N

PageSize         Inuse        Pin       Pgsp    Virtual
s    4 KB        31080       1450          0      26360
m   64 KB        25598        521          0      25598
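
To turn the page counts into something easier to read, the arithmetic can be sketched as below. It assumes the top-line Inuse figure is counted in 4 KB units, which the sample output supports: 31080 small pages plus 25598 medium pages of 64 KB (16 small pages each) add up to 440648. The function name pages_to_mb is made up for this example.

```shell
# pages_to_mb: convert a count of 4 KB pages into whole megabytes.
# Assumption: the figure passed in is expressed in 4 KB units.
pages_to_mb() {
    echo $(( $1 * 4 / 1024 ))
}

# Cross-check against the per-page-size breakdown shown above:
# 31080 pages of 4 KB plus 25598 pages of 64 KB, in KB.
echo $(( 31080 * 4 + 25598 * 64 ))   # prints 1762592
pages_to_mb 440648                   # prints 1721
```

So the java process above holds roughly 1.7 GB in memory.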

The same question could be rephrased or extended to, for example, “what are the five largest memory consumers on this system?” There are at least two ways to answer this question.

>svmon -Pt5 | perl -e 'while(<>){print if($.==2||$&&&!$s++);$.=0 if(/^-+$/)}'
Pid Command   Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
17367068 java  442000  9786      0   435928      Y     Y     N
13893728 java  423720  9663      0   417884      Y     Y     N
17957084 java  383216  9220      0   379021      Y     Y     N
10158108 java  219702  8445      0   218189      Y     Y     N
15990974 kulagent 215901  8380   0    44549      Y     Y     N

Yes, I do not use the previous command, as I am old and my memory is not what it used to be… I use this instead.

svmon -Pt5 | grep -p Pid
Pid Command   Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
17367068 java 441251  9786        0   435927      Y     Y     N

Pid Command   Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
13893728 java 422329  9663        0   417884      Y     Y     N

Pid Command   Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
17957084 java 384788  9220        0   379021      Y     Y     N

Pid Command   Inuse      Pin  Pgsp  Virtual 64-bit Mthrd  16MB
10158108 java 219710 8445        0   218189      Y     Y     N

Pid Command   Inuse      Pin  Pgsp  Virtual 64-bit Mthrd  16MB
15990974 kulagent  215932 8380        0    44549      Y     Y     N

The output is not as compact, but at least the command is easy to remember. If you know other ways and feel like sharing, please leave a comment.
By the way, if you are interested in finding the top five users of your host's paging space, execute svmon -Pgt 5.
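
For what it is worth, here is one more way to get the compact listing, sketched with awk instead of perl. It assumes, as in the output shown above, that each data line directly follows its “Pid Command” header; compact_svmon is a name invented for this example.

```shell
# compact_svmon: read "svmon -Pt<N>" style output on stdin, print the
# header once and then only the per-process data lines.
# Assumption: the data line is the line right after each header.
compact_svmon() {
    awk '
        /^ *Pid +Command/ {
            if (!seen++) print                    # keep the first header only
            if ((getline line) > 0) print line    # the data line that follows
        }
    '
}

# Usage on an AIX host (not executed here):
#   svmon -Pt5 | compact_svmon
```

Wrapped in a function in the shell profile, it is as easy to remember as the grep -p variant.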


In one of the comments, ironman suggested using the script that is part of the perfpmr package (a free download) and/or the Workload Manager to accurately define the amount of memory used. I downloaded and executed the script, and the amount of information it generates left me speechless and scratching my head; I do not think it can be used to list the usage of a specific process. On the other hand, I am beginning to believe that ironman’s second suggestion is closer to what I am looking for and need. If you are like me and are new to workloads, you may start learning about them by reading Nigel Griffith’s presentation titled “Setting up AIX Workload Manager in 30 minutes“. There is also a RedBook wholly dedicated to this subject titled “AIX 5L Workload Manager (WLM)”.

Posted in Real life AIX.

heartbeat networks and PowerHA

HACMP ver. 5.4.1 introduced a new type of heartbeat network, the multi-node one. This raises the following questions: what are the differences, and why do we now have a choice between the two, the traditional and the multi-node heartbeat network?

The following illustration answers both questions.

There is a cluster with four (for example) nodes in an environment with two SAN fabrics. In the traditional heartbeat network, each node needs read/write access to four disks to communicate with its neighbors: one disk per fabric (for redundancy) makes two disks per neighbor, and two neighbors make four disks.
The multi-node heartbeat requires only two disks (one from each fabric) shared by all the nodes in the cluster. This network definitely takes less administrative effort and fewer physical resources. There is one important point to consider before choosing this network type: one needs to be concerned with the utilization of the disk sets underlying the LUNs created for the multi-node network. If these disk sets are heavily used, the LUNs could be slow, and the resulting lost heartbeats undermine the whole reason for this network… In the traditional heartbeat network, each disk (LUN) is used by only two nodes. In the multi-node network, each heartbeat disk is used by all the nodes, which means more traffic on these disks…
The multi-node heartbeat network requires a disk of at least 32MB configured in an enhanced concurrent volume group with one uniquely named logical volume. The traditional network requires only disks; no logical volumes need to be present. Finally, neither type requires dedicated disks, although dedicated disks are the preferred way.
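
The disk arithmetic can be sketched in a few lines. The per-node figure comes straight from the description above; the cluster-wide total for the traditional network is my own extrapolation and assumes the nodes form a ring of at least three, each heartbeating with two neighbors. hb_disk_count is a name made up for this example.

```shell
# hb_disk_count NODES FABRICS
# Assumption: traditional heartbeats form a ring (3 or more nodes), so
# the cluster has NODES node pairs, each needing one disk per fabric.
hb_disk_count() {
    hb_nodes=$1
    hb_fabrics=$2
    # Two neighbors per node, one disk per fabric for each neighbor.
    echo "traditional: $(( 2 * hb_fabrics )) per node, $(( hb_nodes * hb_fabrics )) in the cluster"
    # One shared disk per fabric, seen by every node.
    echo "multi-node: $hb_fabrics in the cluster"
}

hb_disk_count 4 2
# prints:
#   traditional: 4 per node, 8 in the cluster
#   multi-node: 2 in the cluster
```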


Posted in HACMP, Real life AIX.


hints, tips and usage of the instfix command

I found this “gem” a few days ago, and for my own good I decided to copy and re-post it here. This is a re-post of IBM TechNote Ref#T1011859, definitely worth reading, especially during the yearly OS-upgrade cycle. For your convenience and mine:

Hints, Tips and usage of the ‘instfix’ command
This document will describe many of the various and most common uses of the ‘instfix’ command.

The main topics covered will include:

– TL versus ML – Which is correct?
– Usage of the ‘instfix’ command to check for APARs
– Usage of the ‘instfix’ command to install APARs
– Adding missing APAR information to the ‘fix’ object class of the ODM

Posted in Real life AIX.


creating multi-node disk heartbeats with smitty cl_manage_mndhb

This post shows how to set up a multi-node disk heartbeat with smitty cl_manage_mndhb. What is the difference between the traditional and the multi-node disk heartbeat? The first one is a “network” between two nodes, where one disk is shared between only two nodes. The second one allows one disk to be shared by multiple nodes. The second one requires the creation of a logical volume. The first one does not need any volumes. The principle of the “single point of failure” still applies: it is not a good idea to have only one multi-node heartbeat disk in a cluster.

Executing this shortcut, the administrator is presented with a screen offering the following choices:

Create a new Volume Group and Logical Volume for Multi-Node Disk Heartbeat
Add a Concurrent Logical Volume for Multi-Node Disk Heartbeat
Show Volume Groups in use for Multi-Node Disk Heartbeat
Stop using a Volume Group for Multi-Node Disk Heartbeat
Configure failure action for Multi-Node Disk Heartbeat Volume Groups

It is peculiar that the first two options imply the creation of a logical volume; after all, creating a traditional disk heartbeat using the standard method does not require me/you to create a logical volume.

I tried this option today, and I have to say that I really like it, as it is a simpler one which does everything in a single step. But I failed the very first time I did it. Looking at the output from smitty, it very quickly became apparent why. See the error message for yourself:

Error executing mklv -y mndhb_lv_01 -u 1 -c 1 -e m -t jfs -v n -w n -r n mndhb_vg_01 32 hdisk2 on node #####

The last line shows a request to create a logical volume whose size equals 1 x 32 = 32MB (the 32 after the volume group name is the number of logical partitions). What is the size of the disk (hdisk2) I specified for this action?

bootinfo -s hdisk2

This disk is 20MB. How is it possible to fit a 32MB logical volume on a 20MB physical disk? It is not possible!!! So to get this show running, I had to ask the SAN administrator to “expand” this LUN and the LUN from another SAN controller (another fabric) to 40MB (I like this “round” number). Next, after running chvg -g vg_name to make AIX aware of the new disk size, I could finish what I intended to do.
From now on, I have to remember to always ask for 40MB LUNs if they are intended to be used as “multi-node” heartbeat disks.
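
A quick pre-check along these lines would have saved me the failed attempt. This is only a sketch: lv_fits is a hypothetical helper, the 1MB partition size is taken from the 1 x 32 arithmetic above, and requiring the disk to be strictly larger than the logical volume is my own allowance for the volume group's bookkeeping space.

```shell
# lv_fits DISK_MB LP_COUNT PP_MB
# Succeeds only if the disk is larger than the requested logical volume.
# Assumption: some room beyond the LV itself is needed on the disk.
lv_fits() {
    lf_need=$(( $2 * $3 ))
    if [ "$1" -gt "$lf_need" ]; then
        echo "ok: ${1}MB disk can hold a ${lf_need}MB logical volume"
        return 0
    fi
    echo "too small: ${1}MB disk cannot hold a ${lf_need}MB logical volume"
    return 1
}

# On an AIX host the first argument could come from bootinfo (not run here):
#   lv_fits "$(bootinfo -s hdisk2)" 32 1
```

With the numbers from this post, lv_fits 20 32 1 fails and lv_fits 40 32 1 succeeds.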

Posted in HACMP, Real life AIX.


working with “strangely” named files

Today, I decided to put all my JPEG files onto one flash drive – I got a “picture frame”!!!! There was just one problem. My camera names files using the same scheme in every sub-directory, and each sub-directory is named after the current date. So today I finally have to spend some time renaming all the files stored in these directories so that they are uniquely named. To cut the suspense and to show you what I mean, look below. Do you see what I mean?

-rw-r--r--    1 mduszyk  staff 2706165 Mar 19 2011  Picture 201.jpg
-rw-r--r--    1 mduszyk  staff 2783445 Mar 19 2011  Picture 202.jpg
-rw-r--r--    1 mduszyk  staff 2480088 Mar 19 2011  Picture 203.jpg
-rw-r--r--    1 mduszyk  staff 2553840 Mar 19 2011  Picture 204.jpg
-rw-r--r--    1 mduszyk  staff 2476612 Mar 19 2011  Picture 205.jpg
-rw-r--r--    1 mduszyk  staff 2572827 Mar 19 2011  Picture 206.jpg

It could be because of the early Sunday morning. I mean a really early one; I have just poured my first cup of coffee. Without much thinking (the body is at the keyboard but the brain is still in bed) I type:

for file in `ls | awk '{print $1}'`
do
        mv $file aaa$file
done

Before the same hand that just hit the ENTER key moves to grab the coffee cup, my eyes catch AIX spitting back garbage in pretty much the following format.

ls: 0653-341 The file Picture does not exist.
ls: 0653-341 The file 202.jpg does not exist.

Of course, the rename does not work and nothing happens. So what is going on here? Apparently there is “some character” between Picture and the number following it. To the ls command it looks like there are two objects, not just one. The first object is called Picture and the second one is a numeral with the extension jpg.

I scratch my head, still no coffee for me. Mickey the Cat just jumped onto my table and looks at me with the look in his eyes that tells me “FEED ME!!!!” – I obey without a word.

Getting Mickey’s food, I get my first sinister idea! Let us use ls -i to get the inode associated with each file and process them with find -inum to rename them. I serve Mickey his food and I feel empowered, life is so great!

Back at the keyboard, I execute:

for file in `ls -i | grep Picture | awk '{print $1}'`
do
     find . -inum $file -exec mv aaa$file {} \;
done

Pretty much as before, it does not work. Two times down for me.

It is obvious that what I just entered does not get the name of the file to rename; aaa$file acts as the first argument to mv (the source) and not the second one (the target). I drink my coffee, stare at the screen and think. I get an idea and I type it:

for file in `ls -i | grep Picture | awk '{print $1}'`
do
    find . -inum $file -exec mv {} aazaa$file.jpg \;
done

In the find snippet shown last, the {} takes the place of the first argument of the mv command: the file matching the inode number delivered by ls -i. Next, the second argument is created in the format aazaa$file.jpg, and now mv works as designed. Keep in mind that $file holds the inode number here, not the original name, so Picture 201.jpg is no more; it is replaced by a file named aazaa, followed by its inode number, followed by .jpg. After I am done renaming this set of files and they are moved to the flash drive, the second batch will get a different prefix, and so forth, until all of the images are processed and safely stored on my new flash drive in the new picture frame.

Of course, there are other ways to work this situation. For example, one could figure out what character separates the Picture from the number and use sed to either remove it or replace it with something else resulting in one “solid” file name to process. By the way, how do you delete a file name with a blank character at the end of its name? Inodes and find is my bet.
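
One of those other ways, sketched here as an alternative and not as what I actually ran that morning, is to let the shell glob the names and to quote every variable, so that the blank never splits a name in two. The function name rename_spaced and the aaa prefix are just examples, and the sketch assumes the mystery character is ordinary whitespace.

```shell
# rename_spaced DIR PREFIX
# Rename every *.jpg in DIR to PREFIX plus its old name with all
# whitespace removed, e.g. "Picture 201.jpg" -> "aaaPicture201.jpg".
# Assumption: the character inside the names is plain whitespace.
rename_spaced() {
    rs_dir=$1
    rs_prefix=$2
    for rs_path in "$rs_dir"/*.jpg; do
        [ -e "$rs_path" ] || continue          # no match: the glob stays literal
        rs_name=${rs_path##*/}                 # strip the directory part
        rs_new=$(printf '%s' "$rs_name" | tr -d '[:space:]')
        mv "$rs_path" "$rs_dir/$rs_prefix$rs_new"
    done
}

# Usage (not executed here):
#   rename_spaced . aaa
```

Quoting also answers the trailing-blank question: rm './name ' names exactly that file, blank included.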

Posted in Real life AIX.


If I were to give you a gift, what would it be?

In your own place, at your own time – I hope you will enjoy it. Follow this link with your eyes wide open 🙂

Posted in Real life AIX.


What do they mean when they say “stanza”?

and why you should never manually edit files in the /etc/security…….

A lot of AIX configuration files have the “stanza” format. Look at /etc/qconfig or almost any file in /etc/security to see what I mean. So what is a “stanza”?
It is a block of ASCII text that starts with a token (a word) followed by : and ends with at least one blank line.

Why do I write about it today? Well, yesterday I asked my colleague (Jon is, among other things, the Tivoli Management Framework Administrator) to execute on all our AIX hosts (he can do it with a single keystroke) one “small” script that I put together to enable LDAP authentication for two specific users. Here are the contents of this script (there is only one long line starting with the echo command, not several, as your browser may show):


rmuser -p svcvulscan 
rmuser -p svcvulscan2

echo "svcvulscan:\n\tSYSTEM = LDAP\n\tregistry = LDAP\n\nsvcvulscan2:\n\tSYSTEM = LDAP\n\tregistry = LDAP\n" >> /etc/security/user

In a perfect world this should work like a charm… but I forgot that this is the real world. What happened? On some AIX hosts, the user defined last in the file prior to running this script could no longer log in. Why? If you look above at the line starting with the echo statement, you will notice that the entry svcvulscan: just gets appended to the file. Plain and simple.
But what is going to happen if the last line in /etc/security/user is not a “blank” line? In this case, the last stanza in this file extends, “swallowing” the svcvulscan entry and, as a result, making the last user in this file an LDAP user. The following illustrates what I mean.

brownh:
        admin = false
svcvulscan:
        SYSTEM = LDAP
        registry = LDAP

svcvulscan2:
        SYSTEM = LDAP
        registry = LDAP

To really make the point and to clear any doubts, look at the following:

# grep -p brownh /etc/security/user
brownh:
        admin = false
svcvulscan:
        SYSTEM = LDAP
        registry = LDAP

At this moment, AIX will not allow brownh to log in; AIX cannot make sense of this user's stanza in /etc/security/user! It is not just this user: svcvulscan will not be able to function either.
To fix it, yours truly had to insert a blank line above svcvulscan to mark the end of the stanza defining brownh.
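
A check like the following could be run across hosts to spot such “swallowed” stanzas. It is a rough sketch under two assumptions of mine: a stanza header is a single word followed by a colon on a line of its own, and lines starting with * are comments and may directly precede a header. find_swallowed is a hypothetical name.

```shell
# find_swallowed FILE
# Print "line-number: header" for every stanza header that is not
# preceded by a blank line (or by a comment line starting with *).
find_swallowed() {
    awk '
        /^[A-Za-z0-9_.-]+:[ \t]*$/ {
            if (NR > 1 && prev !~ /^[ \t]*$/ && prev !~ /^\*/)
                print NR ": " $0
        }
        { prev = $0 }     # remember this line for the next comparison
    ' "$1"
}

# Usage (not executed here):
#   find_swallowed /etc/security/user
```

No output means that every stanza header starts after a blank line.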

Could this be avoided? Sure. Look below.


rmuser -p svcvulscan
rmuser -p svcvulscan2

echo "\nsvcvulscan:\n\tSYSTEM = LDAP\n\tregistry = LDAP\n\nsvcvulscan2:\n\tSYSTEM = LDAP\n\tregistry = LDAP\n" >> /etc/security/user

Do you see that the script now enters a blank line before inserting the stanzas (the \n in front of svcvulscan:)? It does not really matter how many blank lines are used to separate stanzas, but there must be at least one for a stanza to be a stanza. 🙂

What I have described in this post would not have happened if, on some machines, at one point or another and for some “then” valid reason, some AIX administrator (it could be me) had not manually edited the contents of /etc/security/user and forgotten to leave at least one blank line at the end of this file. Have a good day!
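
The same idea can be made a little more defensive by checking how the file ends before appending anything. The following is a sketch, not the script Jon executed: append_stanza is a hypothetical helper, it uses printf because not every shell's echo interprets \n, and it should be tried on a scratch copy first.

```shell
# append_stanza FILE NAME ATTRIBUTE...
# Append a stanza to FILE, first making sure the previous stanza is
# closed by a blank line. Each ATTRIBUTE is a string like "SYSTEM = LDAP".
append_stanza() {
    as_file=$1
    as_name=$2
    shift 2
    if [ -s "$as_file" ]; then
        # No newline at the very end of the file: finish that line first.
        [ -z "$(tail -c 1 "$as_file")" ] || printf '\n' >> "$as_file"
        # Last line is not blank: add the separating blank line.
        [ -z "$(tail -n 1 "$as_file")" ] || printf '\n' >> "$as_file"
    fi
    printf '%s:\n' "$as_name" >> "$as_file"
    for as_attr in "$@"; do
        printf '\t%s\n' "$as_attr" >> "$as_file"
    done
    printf '\n' >> "$as_file"       # close the new stanza as well
}

# Usage against a scratch copy (not executed here):
#   append_stanza /tmp/user.copy svcvulscan "SYSTEM = LDAP" "registry = LDAP"
```

Because the function closes its own stanza with a blank line, calling it a second time for svcvulscan2 does not add a second separator.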

Posted in Real life AIX.


Improving PowerVM Environment

There is no question about it – PowerVM is here to stay. Its flexibility, the ease of deploying new “partitions” combined with the ease of modifying the existing ones, transformed PowerVM from a novelty into something to be expected in each data center housing AIX. Earlier, when building PowerVM environments (VIOS + partitions) and, to be precise, when configuring the networking side of these environments, I noticed that the network adapters of “my” partitions were all attached to one virtual switch (Ethernet 0).

Well, how did this Ethernet 0 switch come to be, and if the digit 0 following “Ethernet” indicates the possibility of additional switches (for example Ethernet 1, 2, ...), how does one create and use them? Are there any advantages or disadvantages of building and employing multiple Ethernet switches with PowerVM? For those interested in this subject, I recommend studying this document: “Using Virtual Switches in PowerVM to Drive Maximum Value of 10Gb Ethernet” – thanks Rob for locating it!

Usually, if one builds two VIO servers in a frame, one does it to provide a level of redundancy, protecting the partitions against a failure of one of the VIO servers delivering resources to the frame's partitions. If this is the case, then the presence of a single Ethernet switch is a single point of failure, right? This could be one more reason for you to get acquainted with the above document…

Posted in Real life AIX, VIO.


VIOS Advisor Explained

“The goal of the VIOS advisor is not to provide another monitoring tool, but instead have an expert system view performance metrics already available to the customer and make assessments and recommendations based on the expertise and experience available within the IBM systems performance group.”

Sounds interesting? It does? Follow this link to the latest article by Rob McNelly in the “IBM Systems Magazine“, AIX edition.

Posted in AIX, Real life AIX.


Copyright © 2015 - 2016 Waldemar Mark Duszyk. - best viewed with your eyes.. Created by Blog Copyright.