

illegal root access

Suddenly, out of the blue, one particular environment became a source of issues for us – the loyal servants of our user community, aka yours truly, the system administrators.
First, from being OK, the environment became sluggish, slow, unresponsive. IBM asked for snaps, which after verification proved that there was nothing wrong with the application server. Its memory, network, CPU and disk I/O did not show any stress; on the contrary, there was (and is) an abundance of resources. So the Oracle DBA in charge of the database associated with this application was asked to “look” into the database server, and he went looking all the way.

A day went by with nothing, but the following morning the same application owner declared that some cron jobs and some of his scripts had gone missing and that group memberships had been modified – who dares to do these things? To add more importance, his statement included the following line: “It does not look good if/when I have to explain this to managerA or ManagerB”. Since it is entirely possible that a virgin mind may read this post, I will refrain from announcing here what exactly crossed my mind in response to this guy's last statement.

So the pressure is on, every day in our mailboxes we find a new email with the same question – “have you found anything yet?”

Posted in Real life AIX.



tuning AIX for XIV ….

or how to get out of a bad situation… Life is, and always will be, life, and as such it is neither good nor bad. It is LIFE and we have got to live it and have fun doing it. Having satisfied the philosophical side of my nature, I will proceed to the task at hand.

This post describes a procedure that allows changing SAN attributes like num_cmd_elems (in the case of FC adapters) and queue_depth (for SAN disks) with no need to reboot the host.

Posted in AIX, Real life AIX.



timeout_policy the new PCM attribute

Straight from IBM AIX Support:

The new PCM attribute for default AIX PCM, called timeout_policy.

http://www-01.ibm.com/support/docview.wss?uid=isg1IZ96396

timeout_policy adjusts the behavior of the PCM (Path Control Module) related to command timeouts, and transport errors. Setting timeout_policy to either fail_path or disable_path may decrease performance degradation when a MPIO device encounters intermittent SAN fabric issues on some, but not all the paths to the device.

retry_path = First occurrence of command timeout on path will not cause immediate path failure. If a path that failed due to transport issues is recovered by a health check, then that path may be used immediately.

fail_path = Path will be failed on first occurrence of a command timeout (assuming it is not the last path in the path group). If a path that failed due to transport issues recovers, the path will not be used for read/write I/O until a period of time has expired with no failures on that path. Enabling this feature may add a delay before read/write I/O is routed to paths that have just recovered from a transport error.

disable_path = Path will be failed on first occurrence of a command timeout (assuming it is not the last path in the path group). If a path that failed due to transport issues recovers, the path will not be used for read/write I/O until a period of time has expired with no failures on that path. If this path continues to experience multiple command timeouts during a period of time, then it may be disabled. Disabled paths remain disabled (and not usable), until a user specifically runs the chpath command to enable the disabled path (or the affected disk is reconfigured or system rebooted). This option is not recommended for most users, since it may require manual intervention to recover paths. Refer to chpath and lspath man pages for more details regarding uses of these commands.
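
As a minimal sketch of how the attribute might be inspected and changed: the disk name hdisk2 and the helper function below are hypothetical, and the function only prints the commands instead of running them, since they only make sense on an AIX host using the default PCM. lsattr and chdev are standard AIX commands; whether the change can be applied to a disk that is currently open is not covered by the text above and is not assumed here.

```shell
# Hypothetical helper: validate a timeout_policy value and print (not run)
# the AIX commands that would show and then set it for one MPIO disk.
timeout_policy_cmds() {
    disk=$1
    policy=$2
    case $policy in
        retry_path|fail_path|disable_path) ;;
        *) echo "unknown timeout_policy: $policy" >&2; return 1 ;;
    esac
    echo "lsattr -El $disk -a timeout_policy"
    echo "chdev -l $disk -a timeout_policy=$policy"
}

# hdisk2 is an assumed device name; review the output before using it on AIX.
timeout_policy_cmds hdisk2 fail_path
```

Printing the commands first keeps the sketch safe to try anywhere and leaves the decision to run them with the administrator.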

Posted in Real life AIX.



removing ShadowImage from an AIX host

In the past, an HDS ShadowImage environment was installed and configured with its P-VOLs (the source disks) on one AIX host and its S-VOLs (the target disks) on a second AIX host.
For convenience, all VOLs were grouped into a consistency group (a named collection of P/S-VOL pairs). All control over this environment (HORCM) was set up on the first host.
After all pairs in the consistency group were created with the paircreate command, the data “inside” the S-VOLs of the consistency group is accessed by “splitting” the S-VOLs with the pairsplit command. Following the split, the recreatevg command, executed against the hdisks identified in horcm0.conf and horcm1.conf as the S-VOLs, recreates the volume group and its contents. If there is no more need for this data, this volume group is destroyed. With a new need for the data, the consistency group is re-synchronized, split and so forth, all over again…

One day, the need for ShadowImage disappeared completely. This post shows how to safely remove ShadowImage from a host with a running HORCM instance (or instances) and how to return all VOLs to their original (SMPL) state, as they were before the paircreate command was executed for the very first time.

Posted in Real life AIX.



command line editing pains ….. .

Hi,

It has been bugging me for some time already.
Do you know how to do this: we need to change u50_lv into u60_lv and /u50 into /u60 in a “single step”, kind of like a global change in vi?

crfs -v jfs2 -d u50_lv -A yes -a log=INLINE -m /u50

I know how to turn each 5 into a 6, but this requires two steps and I want to do it with just one (is this possible at all?).
Please, someone show me how this is done.

How to do it as a command line edit, not as an edit of a line in a file…

Thanks,

MarkD 🙂
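
One possible answer, offered as a sketch and not as the only way: in bash, the history modifier !!:gs/u50/u60/ re-runs the previous command with every u50 replaced; in ksh with set -o vi, pressing Esc and then v opens the current line in the full editor, where :s/u50/u60/g works as usual. Neither can be shown in a non-interactive script, so the block below demonstrates the same one-step global substitution with sed on the command held in a variable (the variable names are made up for illustration).

```shell
# The command from the question, held in a variable instead of the history.
old_cmd='crfs -v jfs2 -d u50_lv -A yes -a log=INLINE -m /u50'

# One global substitution changes both u50_lv and /u50.
new_cmd=$(printf '%s\n' "$old_cmd" | sed 's/u50/u60/g')

# Print the result for review; it is deliberately not executed here.
printf '%s\n' "$new_cmd"
```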

Posted in Real life AIX.


to remove a file with special characters in its name

-rw-r-----  1 root   system 1582339  Apr 05 20:24 smit.log
-rw-------  1 root   system  14751   Apr 06 07:36 .lsof_lawaptpu001
-rw-r-----  1 root   system      0   Apr 06 08:02 *
-rw-r-----  1 root   system 136792   Apr 10 05:25 dsmerror.log
-rw-r-----  1 root   system 37641208 Apr 10 05:25 dsmsched.log

Looking at the output above, you may find a surprise, and if you have not dealt with such surprises before you may already be thinking about how to remove the “offending” file. Executing rm * will quickly decrease the number of files in this location, and most likely that is not what one intends to do. We have a few options.

One could execute the rm command while suppressing the special meaning of the character or characters included in the file name.

# rm '*'

Above, the quotes stop the shell from “expanding” the * into the names of all files in the directory (other than those starting with a dot). The rm command receives the single literal character * and removes only the one file whose name is a single *.
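
To try this safely, here is a small sketch that works in a scratch directory. The directory comes from mktemp -d (assumed to be available on the host) and the file names are borrowed from the listing above purely for illustration.

```shell
# Work in a scratch directory so nothing real is at risk.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create two ordinary files and one file whose name is a single asterisk.
touch smit.log dsmerror.log '*'

# The quotes keep the shell from expanding the asterisk,
# so rm sees exactly one literal name.
rm '*'

# Only the ordinary files should remain.
remaining=$(ls)
printf '%s\n' "$remaining"
```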

What to do if one finds a file named something like * ? (an asterisk, some spaces and a question mark)? In this case, it is often difficult to establish the number of spaces in the file name. In this situation, you may use the ls -i command to find the inode number associated with the file.

# ls -i
   52 *
   36 *    ? 
   11.java
   40 .lsof_lawaptpu001
   47 .sh_history
   64 .ssh
   31 .toc
16384 .topasrecrc
    6 .vi_history

Next, the find command is instructed to locate the file by the inode number identified earlier and to hand its name to the rm command for prompt removal.

# find . -inum 36 -exec rm '{}' \;

If instead of removing the file you are interested in renaming it, you could proceed as follows:

# find . -inum 36 -exec mv '{}' new_file_name \;

where the new_file_name is the new name.
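
The whole inode-based removal can be rehearsed in a scratch directory, as in the sketch below. The directory (from mktemp -d, assumed to be available), the file names and the awk filter used to pick the inode are illustrative assumptions; on a real system you would simply read the inode number from the ls -i output. The -xdev flag is an extra precaution not used above: inode numbers are unique only within one file system.

```shell
# Work in a scratch directory so nothing real is at risk.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# A file whose name mixes an asterisk, several spaces and a question mark,
# plus one ordinary file that must survive.
touch '*    ?' keep.me

# ls -i prints "inode name"; pick the line containing a literal question
# mark and keep its first field, the inode number.
inode=$(ls -i | awk 'index($0, "?") > 0 {print $1}')

# Remove the file by inode, staying on this file system only.
find . -xdev -inum "$inode" -exec rm '{}' \;

remaining=$(ls)
printf '%s\n' "$remaining"
```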

Posted in Real life AIX.



argument list is too long……

Today, during our lunch break, we attempted to fix my friend's stationary bike (an exercise machine). Being the “real” admins, we attempted the repairs without reading the manual – but of course! Well, you are free to imagine the outcome; now Adi has one almost functioning exercise machine and a few spare parts!

After lunch, Adi calls for help – one of his file systems is 96% full, and in order to reclaim its capacity the files older than 90 days need to be removed. Even though he has full access to the directory tree and its contents and his command has always worked before, this time the procedure does not work… He gets an error message about “something” being “too long”. The command he executes is:

find . -mtime +90 -exec rm -f {} \;

In the same directory, I execute the command ls -ltr, which after a long while also refuses to work: “the argument list is too long”. In the past, I discovered that to prevent a situation like this one I need to increase the value of the ncargs attribute, which belongs to the sys0 device. Currently, this attribute is set to 256 blocks of 4 KB each, and I cannot increase its value now as I think a reboot is required to make the change effective.

# lsattr -El sys0 | grep ncargs
ncargs    256     ARG/ENV list size in 4K byte blocks     True
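
The arithmetic behind that value, as a small sketch: 256 blocks of 4 KB give a 1 MB limit for the combined argument and environment list. The variable names are illustrative, and getconf ARG_MAX simply reports the limit in force on whatever host runs it, which may differ from the figure computed here.

```shell
# ncargs as reported by lsattr above: a number of 4 KB blocks.
ncargs=256
limit_bytes=$((ncargs * 4096))
echo "ncargs=$ncargs allows $limit_bytes bytes of arguments and environment"

# The limit in force on the current host (POSIX getconf).
getconf ARG_MAX
```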

I gradually increase the age of the files to be removed from 90 through 180, 360 and 720 days, and still AIX responds with the same message: “the argument list is too long”… There must be a huge number of files in this directory…

It has to be my “lunch” time experience that made me execute the command man xargs. This decision proved to be a really good one indeed! I find that one can limit the number of arguments processed by the xargs command using its -L parameter followed by an appropriate number. With the freshly acquired knowledge, I modify the find command as follows:

find . -mtime +90 | xargs -L180 rm

Guess what? Now, the files are being removed!!!!!

I still have to digest the meaning of the -L flag of the xargs command. According to the man page, “The generated command line length is the sum of the size, in bytes, of the Command and each Argument treated as strings, including a null byte terminator for each of these strings. The xargs command limits the command line length”. Digging deeper into the man page, one finds that

-L Number
            Runs the Command parameter with the specified number of nonempty parameter lines read from standard input. The last invocation of the Command parameter can have fewer parameter lines if fewer than the specified Number remain. A line ends with the first new-line character unless the last character of the line is a space or a tab. A trailing space indicates a continuation through the next nonempty line.

I am not sure about the meaning of these statements, so I interrupted the last command and changed the -L value to ncargs * 4 = 1024, as in

find . -mtime +90 | xargs -L1024 rm

The procedure still works; files older than 90 days are being removed. Did I request up to 1024 lines (file names) at a time for the rm command to process? Is there any relation between ncargs and the -L value or not? You who know, please leave a comment, the inquiring minds want to know 🙂
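
A hedged answer to the question above, based only on the man page text quoted earlier: -L counts input lines, so -L1024 hands rm up to 1024 file names per invocation (find prints one name per line), while ncargs caps the total size in bytes of the argument and environment list. The two numbers are measured in different units, so there appears to be no direct relation between them. The sketch below uses echo in place of rm, and made-up file names, to show the batching.

```shell
# Five made-up names, one per line, as find would print them.
batches=$(printf '%s\n' f1 f2 f3 f4 f5 | xargs -L 2 echo rm)

# Each output line is one command xargs would have run.
printf '%s\n' "$batches"
```

With a batch size of 2, the five names end up in three invocations, the last one with a single name. Note that xargs splits names containing spaces or newlines; where the local find supports it, find . -mtime +90 -exec rm -f {} + batches the names without that problem.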

Posted in Real life AIX.



centralized authentication – preparing for its failure

As the conventional wisdom goes, in UNIX (AIX) “plain” users can (it is recommended they do) authenticate via some global methods like NIS, LDAP, Kerberos, and so forth. For the application (also known as the administrative) and the “system” accounts it is recommended that they authenticate locally – that they are defined on the host.

I do not argue that for the few “really” secured environments it is an excellent idea to authenticate administrative users with a token (so passwords are never the same) or to use the equivalent method of obtaining the password for a specific account from a specific location and then immediately changing it and recording the change in the same secure depository so the next time the password is needed it will be used and also immediately changed. Yes, some admins work like that and I say it again – I understand and do not dispute the need for extreme security measures.

But, “the shoes that fit John may not fit his little brother Johnny”… So for some other organizations the security requirements described above may not be appropriate.
There are already a number of organizations storing all login names and passwords in a central depository like LDAP or AD, to name just a few.
Without diving into details, the reasoning follows this path: if users cannot log in because our authentication mechanism is not functioning, why do I need to worry whether an admin account can or cannot do the same?
You may also ask a UNIX administrator how often he/she has to change the Oracle administrative password if there are 30 or more Oracle servers and 6 DBAs. Well, sometimes often, sometimes not, but it is always the source of the same pain for both UNIX and DB administrators.

If the communications between the password depository and the client are made secure (for example, using SSL) and the repository is protected from “outside” interference, why not authenticate even the administrative accounts centrally instead of locally?
I vote for the centralized authentication, what about you?

Still, I believe that it is (if possible) a splendid idea to allow an administrative account the opportunity to authenticate locally when the global authentication mechanism is not functional. Why not?
For example, isn’t it nice to be able to log in as root or an application administrator to gracefully shut down the host or the application even when their credentials cannot be resolved by the LDAP server?

As always, AIX comes through and delivers…. The rest of this post shows how to quickly allow a user to authenticate locally when IBM TDS (LDAP) service is not available.

Posted in Real life AIX.



disabling journaling of AIX jfs2 file systems …

Since the introduction of version 6.1, file system journaling is no longer a permanent feature of AIX. If this is “news” to you, then you most likely wonder why. Well, let’s think about it for a moment.
Every “write” is written to the file system and is also “acknowledged” in the journal associated with the file system (the log logical volume). This implies that for each write there are actually two.
With this knowledge in mind, one may clearly see the beauty of INLINE logs by visualizing a disk head moving to write to a file system and then moving (relocating) to the location on the platter where the journal “sits”: the INLINE log is placed in the middle of its file system, so the time required for the disk head to travel is shorter than if the log is located outside of its file system.

Regardless of the journal (log) type, time is spent moving back and forth, from the file system to its log volume and back again. This head movement sometimes represents a waste, as in specific circumstances the journal updates are counterproductive: they waste time.

Posted in Real life AIX.



fixing offline paths in dlnkmgr environment.

The dlnkmgr SAN driver is no exception: as with sdddrv, sddpcm or plain MPIO, one or more “paths” may go missing for one reason or another. This post shows how to enable missing paths in the dlnkmgr environment.

Posted in HDS, Real life AIX.




Copyright © 2016 - 2017 Waldemar Mark Duszyk. All Rights Reserved. Created by Blog Copyright.