
user administration in modern times ……

It is often repeated that a system administrator should investigate non-local user authentication methods once the server farm grows beyond a few machines. I am not going to define what “a few machines” means. Instead, I will say that as long as an authentication environment (for example, Active Directory) is already in place and can provide authentication services to UNIX, then regardless of the number of UNIX hosts, they should all be integrated with it.

For a lot of us (maybe for most of us), Active Directory is the “already in place” authentication environment. As soon as a UNIX host is built, it should participate in your local Active Directory domain, which means that the Active Directory UNIX services support must already be enabled and populated with the necessary groups and users.

Even on hosts that participate in the global authentication environment, application administrative accounts are often still left at the host-local level (/etc/passwd and /etc/group). Why? I am not talking about operating system administrative accounts (the ones created during operating system installation). What I have in mind are accounts like oracle, oinstall, deploy, nagios, and so forth. I believe that they should also be defined in Active Directory.

Some of you may start looking around for a stone or a rock to throw in my direction. Please do not do it yet (let me hide first). Some of you are already screaming: “How do I protect my applications against an Active Directory failure?” Well, start at the top and make Active Directory highly available (load balancer?). But what will happen if Active Directory fails anyway? Will users be able to log in to their PCs to access the data delivered by or residing on a UNIX host? If users are not able to log in to their laptops, why do you worry about the application account?

For the few of you who still find a reason to worry, you may be relieved to know that AIX, for example, has the option to use a secondary authentication method, which you can set to local authentication. If Active Directory fails to authenticate your user, AIX will use the data in /etc/passwd. By the way, Active Directory can also synchronize UNIX local passwords …… Isn’t this sweet?

What if there is no Active Directory? Can you use NIS, NIS+ and/or LDAP?

The moral of this post is: get out of user, group, UID and GID administration, get out of local user management, as much as possible.

The sooner you do it, the more time you will have to waste or to learn.

Posted in Linux, Real life AIX.


even sudo could be dangerous

sudo is just a tool. It is its usage that determines whether it defends or defeats you. Consider the following excerpt from /etc/sudoers:

#Allows members of the users group to shutdown this system
#%users  localhost=/sbin/shutdown -h now

#Read drop-in files from /etc/sudoers.d (the # does not mean a comment)
#includedir /etc/sudoers.d

At first glance, it looks like all the lines above are comments: each line starts with the # character, which is the usual UNIX comment marker.
If looks can be deceiving, this is a prime example! Look again, and this time read it!

The last line is not a comment. It is a working directive that tells sudo to process the directives found in the files in the /etc/sudoers.d directory! Oops, this is a surprise, isn’t it? Especially for AIX administrators just learning LINUX….

There are a few files in this directory:

# ls
00_admin  01_devel  50_deploy  93_httpd  98_tomcat

One of these has the following line:

%admin    ALL=(ALL)   NOPASSWD: ALL

All you can do now is SCREAM!!!!!!!! The line says that all members of the admin group can execute all commands as any user without being asked for a password! Let’s identify the members of this group.

# grep admin /etc/group

The three “team mates” (ren, stimpy and donald) have ALL root privileges, and if you ever forget the root password you may ask one of them to reset it for you. Are you sure this is what you want?
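An audit like the one above can be scripted. The following is a minimal sketch, assuming a POSIX shell and grep -E; the function names find_sudo_includes and find_nopasswd_rules are invented for this illustration and are not part of sudo.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch, not part of sudo).

# Print the active include directives of a sudoers file.
# "#include" and "#includedir" are directives despite the leading "#";
# newer sudo releases also accept "@include" and "@includedir".
find_sudo_includes() {
    grep -E '^[[:space:]]*[#@]include(dir)?[[:space:]]' "$1"
}

# Print the lines of a sudoers file that grant NOPASSWD.
# Comment lines are dropped first. A "#" followed by a digit is kept,
# because sudoers treats "#1000" as a numeric uid, not as a comment.
find_nopasswd_rules() {
    grep -Ev '^[[:space:]]*#([^0-9]|$)' "$1" | grep 'NOPASSWD'
}
```

To scan everything sudo reads on a host like the one described, something like `for f in /etc/sudoers /etc/sudoers.d/*; do find_nopasswd_rules "$f"; done` should do. For a single account, `sudo -l -U ren` asks sudo itself what that user may run.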

Posted in AIX, Linux, Real life AIX.


what application is using this port?

From time to time, I have to find out which process is using a given port. Today, after being asked this question and finding the answer in my old “PILOT” (do you remember PDAs?) database, I recognized the need to put these answers here.

There are probably more ways to accomplish this task; I know about these two. In each case we are interested in the application using port 19255 (cache):

The first solution:

markD:RDC:/root>lsof -i :19255
cache   47775962 epicdmn 3u  IPv4 0xf1000e000131cbb8  0t0  TCP *:19255 (LISTEN)
cache   47775962 epicdmn 4u  IPv4 0xf1000e000098d3b8  0t0  TCP> (ESTABLISHED)

The output can be reduced to just the command name as follows:

markD:RDC:/root>lsof -i :19255 | grep 19255  | awk '{print $1}'

The second solution (requires three steps):

markD:RDC:/root>netstat -Aan | grep 19255
f1000e0012ea6bb8 tcp4   0  0  *.19255       *.*          LISTEN

markD:RDC:/root/>rmsock f1000e0012ea6bb8 tcpcb
The socket 0xf1000e0012ea6808 is being held by proccess 47775962 (cache).

markD:RDC:/root/>ps ax | grep 47775962
 47775962  - A  0:00 cache -s/epic/mod/cachesys/mgr -cj -p1121 JOB^%ZdUJ
 52035720 pts/12 A     0:00 grep 47775962
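The first two steps of the second solution can be chained together. This is a minimal sketch, assuming the netstat -Aan and rmsock output formats shown above; the function names pcb_for_port, pid_from_rmsock and who_owns_port are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse output in the formats shown above.

# Read `netstat -Aan` output on stdin and print the PCB address
# (first column) of the LISTEN socket bound to the given port.
pcb_for_port() {
    awk -v port="$1" '$NF == "LISTEN" && $5 ~ ("\\." port "$") { print $1 }'
}

# Read `rmsock` output on stdin and print the PID it reports.
# rmsock spells the word "proccess", so both spellings are accepted.
pid_from_rmsock() {
    sed -n 's/.*held by proc*ess \([0-9][0-9]*\) .*/\1/p'
}

# Chain the steps on a live AIX host: port -> PCB address -> PID.
who_owns_port() {
    pcb=$(netstat -Aan | pcb_for_port "$1" | head -n 1)
    [ -n "$pcb" ] || return 1
    rmsock "$pcb" tcpcb | pid_from_rmsock
}
```

On an AIX host, `ps -fp "$(who_owns_port 19255)"` would then show the owning process without the extra grep line in the output.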

Posted in AIX, Real life AIX.


ssh help wanted

I have 140 hosts: 80% AIX, the rest LINUX. From one host, I can ssh (no password needed) to all but 4 hosts. Why? I have checked everything I can think of. I can ssh to these four from any other host with zero problems.
Why does this one host cause me grief? Any ideas, please?

————-The next day……..
How many of you have noticed that when you are faced with a seemingly difficult issue, as soon as you share your thoughts (and your grief + the pain) with someone else, the resolution magically appears shortly afterwards? I do believe that the collective compassion multiplied by the desire to help you is the solution delivery vehicle. I really do.
First, I want to thank all who answered my call! It worked again! All four hosts had the same issue.

I have a host from which we can log in to and execute commands (ssh) on all other hosts with no need to enter the root password. This mechanism works for all but four machines, which do not allow root logins and do not allow the following transaction either:

# ssh-copy-id -i root@badHost
root@badHost's password:
Permission denied, please try again.
root@grdoraqp1's password:
Permission denied, please try again.
root@grdoraqp1's password:

The “business” end of badHost looks like this:

# ls -ld /root/.ssh
drwx------    2 root     system          256 May 14 10:38 .ssh

The inside of the /root/.ssh:

# ls -l
total 32
drwx------    2 root     system          256 May 14 10:38 .
drwx------    5 root     system         4096 Apr 09 15:43 ..
-rw-------    1 root     system          396 May 14 10:38 authorized_keys
-rw-------    1 root     system         1679 May 14 10:33 id_rsa
-rw-r-----    1 root     system          396 May 14 10:33

I noticed that root is the only account having these issues. For no apparent reason (I do not know it yet, but the solution is being delivered right now :-)), I decided to change root's password to something different and really simple.

# passwd root
Changing password for "root"
3004-616 User "root" does not exist.
3004-709 Error changing password for "root".

Whoa, this is a surprise! I can log in to this host with putty but I cannot change the password? Let’s see what AIX thinks about this account.

# cd /etc
# grep -w root: passwd

Nothing wrong with the line above. Let’s dig deeper.

# cd security
# grep -p root: user

        admin = true
        expires = 0
        SYSTEM = "compat"
        account_locked = false
        rlogin = false
        loginretries = 0
        histexpire = 0
        histsize = 0
        minage = 0
        maxage = 0
        maxexpired = -1
        minalpha = 0
        minother = 0
        minlen = 0
        mindiff = 0
        maxrepeats = 8
        dictionlist =
        pwdchecks =
        admgroups = asmadmin,dba,oinstall,itmusers

Now, if you have not been dealing with authentication issues lately, you may miss it. Something is missing from this output! Do you know what?
Since the root account is authenticated locally, the missing line is:

 registry = files

As soon as this line was added, all the issues disappeared…… My ssh issues are over.
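The same check can be repeated on other hosts without reading the stanza by eye. This is a minimal sketch, assuming the usual stanza layout of /etc/security/user (an unindented "name:" header followed by indented "attribute = value" lines); the function name has_registry is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the stanza of user $1 in stanza file $2
# contains a "registry" attribute, fail otherwise.
has_registry() {
    awk -v user="$1" '
        # A stanza header is an unindented, non-comment line such as "root:".
        /^[^ \t*]/ && NF == 1 && $1 ~ /:$/ { in_stanza = ($1 == user ":"); next }
        in_stanza && $1 == "registry" { found = 1 }
        END { exit !found }
    ' "$2"
}

# Possible use on an AIX host:
#   has_registry root /etc/security/user || echo "root has no registry attribute"
# The attribute itself can be displayed with:  lsuser -a registry SYSTEM root
# and set without editing the file with:       chuser registry=files root
```

Setting the attribute with chuser rather than by hand avoids typos in the file, though editing the stanza directly, as done here, gives the same result.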


thanks gents!

Posted in Real life AIX.

LDAP replication issues and how to deal with them

Last week, we recognized that something was wrong with our LDAP environment. We noticed that some newly created user accounts were present on only one LDAP server instead of on both of the servers we have configured as PEERs. By now, it is painfully obvious that in a replicated environment, verifying the contents of all participating servers (suppliers/replicas/PEERs) is of paramount importance.
There are a few ways to do that. The simplest one is to execute the ldapsearch or idsldapsearch command against each LDAP server of the replicated environment, looking for a user created a few minutes or hours ago ……
The other way (not more complicated, but possibly less known) is to verify the state of the replicated environment by executing the idsldapdiff or ldapdiff command. These two commands not only show the differences between a pair of servers, they are also the tools to reconcile any discrepancies in the replicated environment. They saved last Friday (why do most issues tend to happen on a Friday?).

The following lines show the ldapdiff command executed in query mode (no fixing, no reconciliation will take place): show me the differences!

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
           -sh aixtds1 -sp 389 -sD cn=root -sw secretpassword \
           -ch aixtds2 -cp 389 -cD cn=root -cw secretpassword \
           2>&1 | tee ldapDiff.Out

The -b argument defines the base DN where the comparison starts.
The -sh, -sp, -sD and -sw options define the connection attributes for the Supplier server.
The -ch, -cp, -cD and -cw options define the same attributes for the Consumer server.
These attributes are the hostname, the port used to communicate (in my case no SSL is used for the querying), the DN used to bind to the server (-sD/-cD) and finally the password to be used (-sw/-cw). The trailing 2>&1 | tee ldapDiff.Out displays all output and also saves it to the file called ldapDiff.Out.

The last command generates no output if both servers have the same content. The following excerpt is an example of what to expect when the contents of the two servers differ.

< uid=mcknightr,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=viskerm1,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=finnp,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=zavorskim,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=ferraroa,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=zervosr,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=kirklandv,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=nettless,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=klusman,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=landgraf,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=nair,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=garbesi,ou=People,cn=aixdata,dc=wmd,dc=edu

With the two servers out of sync, it is time to sync them. The same ldapdiff command will do the work. In a busy environment, you should first stop the server you are about to sync. If your environment does not see much traffic, you can do it with all LDAP servers UP and RUNNING.

First, we will sync the “client” (aixtds2).

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
                   -sh aixtds1 -sp 389 -sD cn=root  \
                   -sw secretpassword \
                   -ch aixtds2 -cp 389 -cD cn=root  \
                   -cw secretpassword -a -F -x

Next, we will verify it.

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
                   -sh aixtds1 -sp 389 -sD cn=root \
                   -sw secretpassword \
                   -ch aixtds2 -cp 389 -cD cn=root \
                   -cw secretpassword

Now, we will sync the “supplier”:

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
                   -sh aixtds2 -sp 389 -sD cn=root \
                   -sw secretpassword \
                   -ch aixtds1 -cp 389 -cD cn=root \
                   -cw secretpassword -a -F -x

Next, we will verify it.

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
                  -sh aixtds2 -sp 389 -sD cn=root \
                  -sw secretpassword \
                  -ch aixtds1 -cp 389 -cD cn=root \
                  -cw secretpassword

At any time (as long as you have access to a browser), the state of replication can be investigated using the Administrative Console of the TDS LDAP servers.

Posted in Real life AIX.


working with LINUX LVM

I am running out of space in a file system, and its volume group does not have any free capacity left. Today is a very special day: for the first time, I have to add a disk to a guest to expand its volume group and grow one of its file systems.

Currently, the guest has the following disks:

# ls /dev/sd*
/dev/sda  /dev/sda1  /dev/sda2  /dev/sdb  /dev/sdb1

which are employed as follows:

# pvs
  PV         VG           Fmt  Attr PSize   PFree
  /dev/sda2  vg_syssatpl1 lvm2 a--   39.80g 19.29g
  /dev/sdb1  vg_satellite lvm2 a--  100.00g   243m

Adding the disk was easy – VMware and a few mouse clicks. Now, to check that the disk was really acquired:

# ls /dev/sd*
/dev/sda  /dev/sda1  /dev/sda2  /dev/sdb  /dev/sdb1  /dev/sdc

The new disk is called sdc. To move this disk into the LVM realm, it has to become a “physical volume” (remember, in AIX it is exactly the same).

# pvcreate /dev/sdc

Now, to add it to the destination volume group (vg_satellite), we need to execute:

# vgextend vg_satellite /dev/sdc
  Volume group "vg_satellite" successfully extended

To simultaneously extend the logical volume (vg_satellite-satellite_lv) and its file system by 20GB:

# lvextend -r -L +20G /dev/mapper/vg_satellite-satellite_lv
  Extending logical volume satellite_lv to 70.83 GiB
  Logical volume satellite_lv successfully resized
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/mapper/vg_satellite-satellite_lv is mounted on /var/satellite; on-line resizing required
old desc_blocks = 4, new_desc_blocks = 5
Performing an on-line resize of /dev/mapper/vg_satellite-satellite_lv to 18567168 (4k) blocks.
The filesystem on /dev/mapper/vg_satellite-satellite_lv is now 18567168 blocks long.

Almost sweating… The first time always makes you nervous.
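To calm the nerves next time, the free space of the volume group can be checked before running lvextend. This is a minimal sketch, assuming the LVM2 pvs reporting options --noheadings, --units m, --nosuffix and -o; the function name vg_free_mib is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read lines of "vg_name pv_free" (free space in MiB)
# on stdin and print the total free space of the named volume group.
vg_free_mib() {
    awk -v vg="$1" '$1 == vg { sum += $2 } END { printf "%d\n", sum }'
}

# Possible use on the guest, checking for 20 GiB (20480 MiB) of room:
#   free=$(pvs --noheadings --units m --nosuffix -o vg_name,pv_free |
#          vg_free_mib vg_satellite)
#   [ "$free" -ge 20480 ] || echo "vg_satellite has only ${free} MiB free"
```

Summing per physical volume means the number already includes a freshly added disk once vgextend has run.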

Posted in LINUX, Real life AIX.

users issues on RedHat

What is trivial for me in AIX is not trivial in LINUX, so I have got to learn (:-)). This morning, a ticket in my queue announced that a user was locked out of a host. In AIX, lsuser would be the first command I executed. But this is not AIX, so I check my “training” manual, next I Google, and then Mike comes to the office, so I ask Mike.

Mike says: check /etc/passwd and /etc/shadow. Do you see any “strange” characters, anything not “normal” in there? Nope, all looks OK.
Next, we go to /var/log and check the contents of the file called secure. There is really nothing inside it; this file is very fresh, just a few hours old.
Mike says: look for the previous “versions” of this file.

# ls -l secure*
-rw------- 1 root root  2836 Apr 30 07:48 secure
-rw------- 1 root root 16623 Apr 27 07:44 secure.1
-rw------- 1 root root  6177 Apr 20 07:33 secure.2
-rw------- 1 root root 17075 Apr 13 07:32 secure.3
-rw------- 1 root root 13100 Apr  6 07:59 secure.4

OK, let’s check whether the first of them contains any information (we are looking for a user called jamesb).

# grep -i jamesb secure.1
Apr 25 20:15:54 IronMike sshd[19825]: pam_unix(sshd:account): expired password for user jamesb (password aged)
Apr 25 20:15:54 IronMike sshd[19825]: Accepted password for jamesb from port 53690 ssh2
Apr 25 20:15:54 IronMike sshd[19825]: pam_unix(sshd:session): session opened for user jamesb by (uid=0)
Apr 25 20:16:17 IronMike passwd: pam_unix(passwd:chauthtok): authentication failure; logname=jamesb uid=21126 euid=0 tty=pts/3 ruser= rhost=  user=jamesb
Apr 25 20:16:19 IronMike sshd[19825]: pam_unix(sshd:session): session closed for user jamesb
Apr 25 20:16:50 IronMike sshd[20409]: pam_unix(sshd:account): expired password for user jamesb (password aged)
Apr 25 20:16:50 IronMike sshd[20409]: Accepted password for jamesb from port 53695 ssh2
Apr 25 20:16:50 IronMike sshd[20409]: pam_unix(sshd:session): session opened for user jamesb by (uid=0)
Apr 25 20:18:48 IronMike passwd: pam_unix(passwd:chauthtok): password changed for jamesb
Apr 25 20:18:48 IronMike sshd[20409]: pam_unix(sshd:session): session closed for user jamesb
Apr 25 20:20:24 IronMike sshd[22419]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=  user=jamesb
Apr 25 20:20:26 IronMike sshd[22419]: Failed password for jamesb from port 53703 ssh2
Apr 25 20:20:41 IronMike sshd[22420]: Disconnecting: Too many authentication failures for jamesb
Apr 25 20:20:41 IronMike sshd[22419]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=  user=jamesb

It looks like jamesb had to change his password and promptly forgot what it was. Let’s reset it to let jamesb in.

By the way, I noticed that the archiving method for the /var/log/secure file has changed between RedHat 6.1 and 6.3. In the newer RedHat version, the archived versions are named by appending a date to the original file name.

# ls /var/log/secure*
secure  secure-20130407  secure-20130414  secure-20130421  secure-20130428

Please let me know if you know a better way.
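One possible shortcut, whatever the rotated files are called, is to let the shell glob pick them all up and summarize the interesting events. This is a minimal sketch, assuming sshd and pam_unix messages worded like the ones above; the function name auth_summary is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count selected authentication events for a user.
# Usage: auth_summary USER [LOGFILE...]   (reads stdin when no file is given)
# Matching is by plain substring, so "jamesb" would also count
# "password changed" lines of a longer name such as "jamesbrown".
auth_summary() {
    user=$1
    shift
    cat "$@" | awk -v u="$user" '
        index($0, "Failed password for " u " ")       { failed++ }
        index($0, "expired password for user " u " ") { expired++ }
        index($0, "password changed for " u)          { changed++ }
        END { printf "failed=%d expired=%d changed=%d\n", failed, expired, changed }
    '
}

# Possible use, covering both naming schemes of the rotated files:
#   auth_summary jamesb /var/log/secure*
```

A result with an expired password, a change and then failures tells the same story as the listing above.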

Posted in Real life AIX.


/etc/fstab, UUIDs, RedHat and VMware

Learn through suffering, really!

I started to build VMware RedHat guests with no experience whatsoever, and while doing so I followed nothing but what I had already learnt and what I do know: AIX. While building, I used the LINUX Logical Volume Manager to create volume groups and logical volumes, which I topped with file systems. Following best practices, I populated /etc/fstab with the UUIDs of the disks. Why? Because the literature says that universally unique identifiers (whose AIX equivalent is called PVID) are the preferred way to associate disks with their file systems: once a UUID is assigned to a disk it should never change, whereas a device name like /dev/sda may change to something else if more disks are added to the host.

To see the currently used UUIDs, execute the command called blkid.

#  blkid | sort
/dev/mapper/oracle_vg-u01_lv: UUID="7a9e4e58-174b-4567-93a7-9a479d4ce999" TYPE="ext4"
/dev/mapper/vg_sys-lv_home: UUID="8e393748-4432-4752-900d-dbdd71a4f7bb" TYPE="ext4"
/dev/mapper/vg_sys-lv_root: UUID="c2c612e2-ccf0-4311-b705-c1788a8afbbf" TYPE="ext4"
/dev/mapper/vg_sys-lv_swap: UUID="03f634ac-28d6-44fb-9742-dfb1e621e358" TYPE="swap"
/dev/mapper/vg_sys-lv_temp: UUID="0e830c56-c697-403f-993e-ae442e6827f8" TYPE="ext4"
/dev/mapper/vg_sys-lv_usr: UUID="cbda1389-a042-4059-b43a-bb0920e02d2d" TYPE="ext4"
/dev/mapper/vg_sys-lv_var: UUID="48e1864c-1e09-4204-88ad-7ca16429c8cd" TYPE="ext4"
/dev/sda1: UUID="e7d8928a-fc04-40fd-b625-e9da99732c3b" TYPE="ext4"
/dev/sda2: UUID="6wZrl9-XhOr-RALD-9Sbp-Lhoi-GRYe-EIs4LH" TYPE="LVM2_member"
/dev/sdb: UUID="1mddKd-lDw7-YjDe-BVt9-4pKO-j8Oh-w3x6UA" TYPE="LVM2_member"

From the output above, we can all see that the logical volume named u01_lv (a member of the oracle_vg volume group) is assigned UUID=7a9e4e58-174b-4567-93a7-9a479d4ce999.

The next listing shows the line from the host’s /etc/fstab file using the UUID for u01_lv, as I originally entered it.

UUID=7a9e4e58-174b-4567-93a7-9a479d4ce999  /u01 ext4 defaults   1 2

Then, I noticed that the “other” logical volumes in this file had 1 and 2 in the last two columns, so without much thinking I followed the already established pattern and used them too. By the way, these other logical volumes belonged to the guest’s “rootvg”.

Days became weeks, weeks became months. Everything worked like a charm. About a year later, a guest had to be rebooted and it did not come back. The console “said” that the UUID of the disk holding u01_lv had changed…… Isn’t this just peachy?

At this time, I recognized the mistake that I had made without even knowing it. For as long as the UUID of the sdb disk stayed the same, my mistake remained hidden. But eventually …… For some reason beyond our knowledge, one disk UUID (which should always be “unique” and “constant”) had changed, and its corresponding entry in the /etc/fstab file was no longer true, which resulted in the following. First, the LINUX kernel “sees” that /dev/sdb has a different UUID and throws a message about it to the console. Next, the kernel wants to mount /u01, which is not there, so it decides to fsck the logical volume, and the volume is not there …. The kernel has no idea what is going on, and it surrenders, delivering us to the “Maintenance Mode”: please help me! Here, nothing can be done because /etc/fstab cannot be edited while / is mounted read-only!

It took a few minutes to boot the guest in rescue mode; in this mode you can modify the contents of the / file system. What contents? For example, /etc/fstab. Inside this file, the start and the end of the line describing the /u01 file system have to be modified. To obtain the new UUID, one needs to execute the blkid command. Next comes the end of the line, the last two digits to be exact. As per Mike’s advice, I replaced the existing numbers with 0 (zero). So now the line reads:

UUID=fa9e4e58-174b-4567-93a7-9a479d4ce341  /u01 ext4 defaults   0 0

Why the two zeros? After the host was back on-line, I spent some time and actually read the, you guessed it, man pages. Actually, I got this info from a source where I found a very detailed description of the /etc/fstab contents. Below are the two sections dealing with the fifth and the sixth fields.

(5) The fifth column is used to determine whether the dump command will backup the file system. This column is rarely used and has two options: 0, do not dump, which is used for most partitions, and 1, dump, which is used for the root partition.
(6) The sixth column is used by the fsck program to determine the order in which the computer checks the file systems when it boots. The three possible values for the column are: 0, do not check, 1, check first (only the root partition should have this setting) and 2, check after the root partition has been checked. Most Linux distributions set all the partitions to 0, except for the root partition. If maintenance is important, 2 should be used, although this can increase the amount of time required for booting.
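A stale UUID like the one in this story can be spotted before the next reboot by comparing /etc/fstab against what blkid reports. This is a minimal sketch, assuming fstab lines of the form UUID=... and blkid output of the form UUID="..." as shown above; the function name missing_fstab_uuids is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: report every UUID= entry of an fstab file ($1)
# that does not appear in saved blkid output ($2).
missing_fstab_uuids() {
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); gsub(/"/, "", $1); print $1, $2 }' "$1" |
    while read -r uuid mnt; do
        grep -qF -- "UUID=\"$uuid\"" "$2" ||
            echo "$mnt: UUID $uuid not found in blkid output"
    done
}

# Possible use on the guest:
#   blkid > /tmp/blkid.out
#   missing_fstab_uuids /etc/fstab /tmp/blkid.out
```

No output means every UUID referenced in the file is currently visible to blkid; it says nothing about whether the last two columns are sensible.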

Going back to the changed UUID it could be possible that this is the reason why –

During this “difficult” time, our new LINUX/TIVOLI administrator Mike “Ski” Swierczynski (also an ex-Marine) walked me through the recovery process, and he was the one who pointed out the wrong options (the last two columns) in /etc/fstab. Thanks, Mike, and Semper Fi!

Posted in Real life AIX.

recovering rootvg missing vSCSI disks

While getting ready for an AIX upgrade, it became apparent that “something” had happened to one of the two VIO servers of this managed system (frame). All sixteen guests (lpars) had a missing disk. The missing disk was always hdisk0, which points (in our case) to vios1 (the hdisk1s are delivered from the disk pool of vios2).

At this time both VIOS servers are fully operational, so the current question is how to quickly recover and restore the rootvg mirroring on the affected partitions.

Start by listing the host dump devices.

# sysdumpdev -l
primary              /dev/dump0
secondary            /dev/dump1
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         fw-assisted
full memory dump     disallow

Temporarily disable them.

# sysdumpdev -P -p /dev/sysdumpnull
# sysdumpdev -P -s /dev/sysdumpnull

Verify the last two steps:

# sysdumpdev -l
primary              /dev/sysdumpnull
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    TRUE
dump compression     ON

After LVM detects issues with a disk lasting longer than a “certain” length of time, it declares the disk missing and is no longer interested in the device. You have to make LVM “re-analyze” the rootvg disks by executing the varyonvg rootvg command. LVM will change the state of the previously missing disk to active, and in this case, since the volume group is mirrored, this will also automatically trigger the syncvg command, resulting in the gradual disappearance of stale partitions.

# varyonvg rootvg

Verify that both disks are active:

# lsvg -p rootvg

Check for logical volume synchronization with the ps command, and if lvsync is not running, start it by executing syncvg -P 32 -v rootvg

Finally, activate both dumps 🙂

# sysdumpdev -P -p /dev/dump0
# sysdumpdev -P -s /dev/dump1

Copy/paste/reuse on all remaining partitions.
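When repeating this on sixteen partitions, the "both disks are active" check is easier to script than to eyeball. This is a minimal sketch, assuming the usual lsvg -p layout (a volume group line, a column header line, then one line per disk with the state in the second column); the function name inactive_pvs is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `lsvg -p <vg>` output on stdin and print the
# name and state of every physical volume that is not "active".
inactive_pvs() {
    awk 'NR > 2 && $2 != "active" { print $1, $2 }'
}

# Possible use on each partition after varyonvg rootvg:
#   lsvg -p rootvg | inactive_pvs
```

Empty output means every disk of the volume group is reported as active, so it is safe to move on to the synchronization check.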

Posted in Real life AIX.

Copyright © 2016 - 2017 Waldemar Mark Duszyk. All Rights Reserved. Created by Blog Copyright.