

ssh help wanted

I have 140 hosts: 80% AIX, the rest LINUX. From one host, I can ssh (no password needed) to all but four of them. Why? I have checked everything I can think of. I can ssh to these four from any other host with zero problems.
Why does this one host cause me grief? Any ideas, please?
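
One way to narrow this kind of problem down is to probe every host non-interactively and list the ones that refuse key-based login. What follows is only a minimal sketch, assuming a plain list of host names on standard input. The function name check_hosts and the SSH override variable are made up for illustration; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# check_hosts: read host names (one per line) on stdin and report
# whether a key-based, non-interactive root login works for each.
# SSH is a hypothetical override so the probe can be stubbed out;
# by default the real ssh client is used.
check_hosts() {
    while read -r host; do
        [ -z "$host" ] && continue
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        # </dev/null keeps ssh from eating the rest of the host list.
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
               "root@$host" true </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
        fi
    done
}
```

Something like check_hosts < hosts.txt | grep FAIL would then leave only the hosts worth a closer look.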

————-The next day……..
How many of you have noticed that when you face a seemingly difficult issue, the resolution magically appears shortly after you share your thoughts (and your grief + the pain) with someone else? I do believe that the collective compassion multiplied by the desire to help you is the solution delivery vehicle. I really do.
First, I want to thank all who answered my call! It worked again! All four hosts had the same issue.

I have a host from which we can log in to and execute commands (ssh) on all other hosts with no need to enter the root password. This mechanism works for all but four machines, which do not allow root logins and do not allow the following transaction either:

# ssh-copy-id -i id_rsa.pub root@badHost
root@badHost's password:
Permission denied, please try again.
root@badHost's password:
Permission denied, please try again.
root@badHost's password:

The “business” end of badHost looks like this:

# ls -ld /root/.ssh
drwx------    2 root     system          256 May 14 10:38 .ssh

The inside of the /root/.ssh:

# ls -la
total 32
drwx------    2 root     system          256 May 14 10:38 .
drwx------    5 root     system         4096 Apr 09 15:43 ..
-rw-------    1 root     system          396 May 14 10:38 authorized_keys
-rw-------    1 root     system         1679 May 14 10:33 id_rsa
-rw-r-----    1 root     system          396 May 14 10:33 id_rsa.pub

I noticed that root is the only one having these issues. For no apparent reason (I do not know it yet but the solution is being delivered right now :-) ), I decided to change root's password to something really simple, different.

# passwd root
Changing password for "root"
3004-616 User "root" does not exist.
3004-709 Error changing password for "root".

Woo, this is a surprise! I can log in to this host with PuTTY but I cannot change the password? Let’s see what AIX thinks about this account.

# cd /etc
# grep -w root: passwd
root:!:0:0::/root:/usr/bin/ksh

Nothing wrong with the line above. Let’s dig deeper.

# cd security
# grep -p root: user

root:
        admin = true
        expires = 0
        SYSTEM = "compat"
        account_locked = false
        rlogin = false
        loginretries = 0
        histexpire = 0
        histsize = 0
        minage = 0
        maxage = 0
        maxexpired = -1
        minalpha = 0
        minother = 0
        minlen = 0
        mindiff = 0
        maxrepeats = 8
        dictionlist =
        pwdchecks =
        admgroups = asmadmin,dba,oinstall,itmusers

Now, if you have not been “dealing” with authentication issues lately, you may miss it. Something is missing from this output! Do you know what?
Since the root account is authenticated locally, the missing line is:

 registry = files

As soon as this line was added, all the issues disappeared. My ssh issues are over.
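
A missing attribute like this is easy to overlook by eye, so here is a small, hedged sketch of a check for it. It parses a stanza file laid out like /etc/security/user and prints the value of one attribute for one user, returning a non-zero status when the attribute is absent. The function name stanza_attr is made up for illustration. On AIX itself the usual tools for looking at and setting the attribute would be lsuser and chuser.

```shell
#!/bin/sh
# stanza_attr FILE USER ATTR
# Print the value of ATTR inside the "USER:" stanza of an AIX-style
# stanza file (the layout of /etc/security/user). Returns 1 when the
# attribute is absent from that stanza.
stanza_attr() {
    awk -v user="$2" -v attr="$3" '
        /^[A-Za-z0-9_.-]+:/ {            # a stanza header such as "root:"
            name = $1
            sub(/:$/, "", name)
            in_stanza = (name == user)
            next
        }
        in_stanza && $1 == attr && $2 == "=" {
            sub(/^[^=]*= */, "")         # drop "attr = ", keep the value
            print
            found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

For example, stanza_attr /etc/security/user root registry || echo "registry is not set for root" would have flagged the four bad hosts.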

:-)

thanks gents!

Posted in Real life AIX.


LDAP replication issues and how to deal with them

Last week, we recognized that something was wrong with our LDAP environment. We noticed that some “newly” created user accounts were present on only one LDAP server instead of on both of the servers we have configured as PEERs. By now, it is painfully obvious that in a replicated environment, verifying the contents of all participating servers (suppliers/replicas/PEERs) is of paramount importance.
There are a few ways to do that. The simplest one is to execute the ldapsearch or tdsldapsearch command against each LDAP server of the replicated environment, looking for a user created a few minutes or hours ago.
The other way to verify the state of the replicated environment (not more complicated but possibly less known) is to execute the idsldapdiff or ldapdiff command. These two commands not only “show” the “differences” between a pair of servers, they are also the tools to reconcile any discrepancies in the replicated environment. They saved last Friday (why do most “issues” tend to happen on Friday?).
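
For the first, simpler approach, the comparison of what each server returned can be scripted. The sketch below assumes you have already exported one DN per line from each server into two text files (how you export them is up to you). The function name dn_diff and the "<" / ">" prefixes are choices made for this illustration, borrowed from diff conventions, and are not part of any LDAP tool.

```shell
#!/bin/sh
# dn_diff FIRST_LIST SECOND_LIST
# Each argument is a text file with one DN per line. Lines found only
# in the first file are printed with "< ", lines found only in the
# second file with "> ". Identical lists produce no output.
dn_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    comm -23 "$a" "$b" | sed 's/^/< /'   # only in the first list
    comm -13 "$a" "$b" | sed 's/^/> /'   # only in the second list
    rm -f "$a" "$b"
}
```

This only tells you which entries are missing on one side; it does not compare attributes and it does not repair anything.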

The following line shows the ldapdiff command executed in query mode (no fixing, no reconciliation will take place): show me the differences!

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
           -sh aixtds1 -sp 389 -sD cn=root -sw secretpassword \
           -ch aixtds2 -cp 389 -cD cn=root -cw secretpassword \
           2>&1 | tee ldapDiff.Out

The -b argument defines the base DN where the “comparison” starts.
The -sh, -sp, -sD and -sw arguments define the connection attributes for the Supplier server.
The -ch, -cp, -cD and -cw arguments define the same attributes for the Consumer server.
These attributes are the hostname, the port used to communicate (in my case no SSL is used for the querying), the DN used to bind to the server (-sD/-cD) and finally the bind password (-sw/-cw). The 2>&1 | tee ldapDiff.Out part sends all output, errors included, both to the screen and to the file called ldapDiff.Out.

The last command generates no output if both servers have the same content. The following excerpt is an example of what to expect when the contents of the two servers differ.

...............................
< uid=mcknightr,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=viskerm1,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=finnp,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=zavorskim,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=ferraroa,ou=People,cn=aixdata,dc=wmd,dc=edu
< uid=zervosr,ou=People,cn=aixdata,dc=wmd,dc=edu
...............................
> uid=kirklandv,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=nettless,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=klusman,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=landgraf,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=nair,ou=People,cn=aixdata,dc=wmd,dc=edu
> uid=garbesi,ou=People,cn=aixdata,dc=wmd,dc=edu

With the two servers out of sync, it is time to sync them. The same ldapdiff command will do the work. In a busy environment, you should first stop the server you are about to sync. If your environment does not see much traffic, you can do it with all LDAP servers UP and RUNNING.

First, we will sync the “consumer” (aixtds2).

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
                   -sh aixtds1 -sp 389 -sD cn=root  \
                   -sw secretpassword \
                   -ch aixtds2 -cp 389 -cD cn=root  \
                   -cw secretpassword -a -F -x

Next, we will verify it.

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
                   -sh aixtds1 -sp 389 -sD cn=root \
                   -sw secretpassword \
                   -ch aixtds2 -cp 389 -cD cn=root \
                   -cw secretpassword

Now, we will sync the “supplier”:

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
                   -sh aixtds2 -sp 389 -sD cn=root \
                   -sw secretpassword \
                   -ch aixtds1 -cp 389 -cD cn=root \
                   -cw secretpassword -a -F -x

Next, we will verify it.

# ldapdiff -b "cn=aixdata,dc=wmd,dc=edu" \
                  -sh aixtds2 -sp 389 -sD cn=root \
                  -sw secretpassword \
                  -ch aixtds1 -cp 389 -cD cn=root \
                  -cw secretpassword

At any time (as long as you have access to a browser), the state of replication can be investigated using the Administrative Console of the TDS LDAP servers.
More information on this subject can be found following these links:
TroubleshootReplicationTopology.pdf
idsldapdiffldapdiffMAN.pdf

Posted in Real life AIX.



working with LINUX LVM

I am running out of space in a file system and its volume group does not have any free capacity left. Today is a very special day: for the first time, I have to add a disk to the guest to expand its volume group and grow one of its file systems.

Currently, the guest has the following disks:

# ls /dev/sd*
/dev/sda  /dev/sda1  /dev/sda2  /dev/sdb  /dev/sdb1

which are employed as follows:

# pvs
  PV         VG           Fmt  Attr PSize   PFree
  /dev/sda2  vg_syssatpl1 lvm2 a--   39.80g 19.29g
  /dev/sdb1  vg_satellite lvm2 a--  100.00g   243m

Adding the disk was easy – VMware and a few mouse clicks. Now, to check that the disk was really acquired:

# ls /dev/sd*
/dev/sda  /dev/sda1  /dev/sda2  /dev/sdb  /dev/sdb1  /dev/sdc

The new disk is called sdc. To move this disk into the LVM realm, it has to become a “physical volume” (remember, in AIX it is exactly the same).

# pvcreate /dev/sdc

Now, to add it to the destination volume group (vg_satellite), we need to execute:

# vgextend vg_satellite /dev/sdc
  Volume group "vg_satellite" successfully extended

To simultaneously extend logical volume (vg_satellite-satellite_lv) and its file system (by 20GB):

# lvextend -r -L +20G /dev/mapper/vg_satellite-satellite_lv
  Extending logical volume satellite_lv to 70.83 GiB
  Logical volume satellite_lv successfully resized
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/mapper/vg_satellite-satellite_lv is mounted on /var/satellite; on-line resizing required
old desc_blocks = 4, new_desc_blocks = 5
Performing an on-line resize of /dev/mapper/vg_satellite-satellite_lv to 18567168 (4k) blocks.
The filesystem on /dev/mapper/vg_satellite-satellite_lv is now 18567168 blocks long.
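
The block count reported by resize2fs can be cross-checked against the size reported by lvextend with a little arithmetic. This is just a sketch; the helper name blocks_to_gib is made up, and the 4 KiB default block size is taken from the "(4k)" in the output above.

```shell
#!/bin/sh
# blocks_to_gib BLOCKS [BLOCK_SIZE_BYTES]
# Convert a file system block count into GiB (default: 4 KiB blocks).
blocks_to_gib() {
    awk -v n="$1" -v bs="${2:-4096}" \
        'BEGIN { printf "%.2f\n", n * bs / (1024 * 1024 * 1024) }'
}

blocks_to_gib 18567168    # prints 70.83, matching the lvextend message
```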

Almost sweating… The first time always makes you nervous.
:-)

Posted in LINUX, Real life AIX.


users issues on RedHat

What is trivial for me in AIX is not trivial in LINUX, got to learn (:-)). This morning, a ticket in my queue announced that a user was locked out of a host. In AIX, the lsuser command would be the first one I executed. But this is not AIX, so I check my “training” manual, next I Google, and then Mike comes to the office so I ask Mike.

Mike says: check /etc/passwd and /etc/shadow. Do you see any “strange” characters, anything not “normal” in there? Nope, all looks OK.
Next, we go to /var/log and check the contents of the file called secure. There is nothing really inside it; this file is very fresh, just a few hours old.
Mike says: look for the previous “versions” of this file.

# ls -l secure*
-rw------- 1 root root  2836 Apr 30 07:48 secure
-rw------- 1 root root 16623 Apr 27 07:44 secure.1
-rw------- 1 root root  6177 Apr 20 07:33 secure.2
-rw------- 1 root root 17075 Apr 13 07:32 secure.3
-rw------- 1 root root 13100 Apr  6 07:59 secure.4

OK, let’s check if the first of them contains any information (we are looking for a user called jamesb).

# grep -i jamesb secure.1
Apr 25 20:15:54 IronMike sshd[19825]: pam_unix(sshd:account): expired password for user jamesb (password aged)
Apr 25 20:15:54 IronMike sshd[19825]: Accepted password for jamesb from 10.25.40.3 port 53690 ssh2
Apr 25 20:15:54 IronMike sshd[19825]: pam_unix(sshd:session): session opened for user jamesb by (uid=0)
Apr 25 20:16:17 IronMike passwd: pam_unix(passwd:chauthtok): authentication failure; logname=jamesb uid=21126 euid=0 tty=pts/3 ruser= rhost=  user=jamesb
Apr 25 20:16:19 IronMike sshd[19825]: pam_unix(sshd:session): session closed for user jamesb
Apr 25 20:16:50 IronMike sshd[20409]: pam_unix(sshd:account): expired password for user jamesb (password aged)
Apr 25 20:16:50 IronMike sshd[20409]: Accepted password for jamesb from 10.25.40.3 port 53695 ssh2
Apr 25 20:16:50 IronMike sshd[20409]: pam_unix(sshd:session): session opened for user jamesb by (uid=0)
Apr 25 20:18:48 IronMike passwd: pam_unix(passwd:chauthtok): password changed for jamesb
Apr 25 20:18:48 IronMike sshd[20409]: pam_unix(sshd:session): session closed for user jamesb
Apr 25 20:20:24 IronMike sshd[22419]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.25.40.3  user=jamesb
Apr 25 20:20:26 IronMike sshd[22419]: Failed password for jamesb from 10.25.40.3 port 53703 ssh2
Apr 25 20:20:41 IronMike sshd[22420]: Disconnecting: Too many authentication failures for jamesb
Apr 25 20:20:41 IronMike sshd[22419]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.25.40.3  user=jamesb

It looks like jamesb had to change his password and promptly forgot what it was. Let’s reset it to let jamesb in.
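
Reading a long grep listing by eye works, but a quick tally of the event types tells the same story faster. The following is a minimal sketch that counts the four message types seen in the listing above; the function name auth_summary is made up for illustration, and the patterns are simply the literal phrases from those log lines.

```shell
#!/bin/sh
# auth_summary USER FILE...
# Count the sshd/passwd events logged for USER in secure-style logs.
auth_summary() {
    user=$1
    shift
    grep -h -- "$user" "$@" | awk '
        /expired password/   { n["expired"]++ }
        /Accepted password/  { n["accepted"]++ }
        /Failed password/    { n["failed"]++ }
        /password changed/   { n["changed"]++ }
        END {
            printf "expired=%d accepted=%d failed=%d changed=%d\n",
                   n["expired"], n["accepted"], n["failed"], n["changed"]
        }'
}
```

Because it takes any number of files, auth_summary jamesb /var/log/secure* covers the current log and every rotated copy in one go, whatever the rotation naming scheme happens to be.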

By the way, I noticed that the archiving method for the /var/log/secure file has changed from RedHat 6.1 to 6.3. In the newer RedHat version, the “archived” versions are named by appending a date to the original file name.

# ls /var/log/secure*
secure  secure-20130407  secure-20130414  secure-20130421  secure-20130428

Please let me know if you know a better way.
:-)

Posted in Real life AIX.



when and how to virtualize, a few years later

Over two and a half years ago, I posted a “handful” of my own ideas on the subject of virtualization (when and how to virtualize). A few days ago, someone I know very well completed research of their own costs of running multiple virtualization platforms built on VMware and IBM VIOS technology. The verdict is in and it does not favor virtualization regardless of its maker.

Using the dollars spent on hardware + software and including the costs of associated maintenance, it is clear that virtualization is not just more expensive than buying individual hardware and software, but in many (if not all) cases it is also “the” reason behind the “waste” of resources. It seems that, contrary to popular belief, virtualization does not promote better hardware utilization.
First, they compared the costs of the exact number of guests/lpars they have versus the costs of individual hosts configured like the lpars/guests (CPU/RAM, etc). The second option was less expensive.

What about my second claim, the “waste of resources”? When you build a virtual “silo” you always scope more resources (RAM, CPU, etc) than you really need because you “need to have” the “room to grow”. It seems that we do not like to live with no room to grow(?) But is it really necessary? In most cases this additional capacity is not used for months at a time, and when it is finally used, the “reserve” must be replenished (as soon as possible) because you do not know what the future holds for you. So you buy it and it waits again for the opportunity to be used, maybe weeks, maybe months, maybe more. Your money, which could at least earn interest in a bank, is spent and gone while what you have purchased waits to be used. Are you absolutely sure that this is the best way of allocating your capital? Couldn’t you wait with this purchase a little bit longer? Just kidding, relax.

You say that you virtualized because of mobility. As soon as you say it, you have to think not just about the additional license (I do not remember exactly, but I think it still requires a license) but also about the costs of SAN. Yes, mobility cannot exist without SAN boot (and a few other minor requirements like identical virtual adapters and partition IDs on the source and the destination managed system; why are these restrictions still there?).

The same people that I already mentioned calculated the cost of 1GB SAN storage versus 1GB of a local disk. The verdict – SAN is more expensive. Is it really necessary to SAN boot every AIX/LINUX/SOLARIS/HP machine in your data center? Really? Give me one good reason to blindly set all UNIX hosts for SAN boot. Is it because you do not foot the bill or because all your partitions are “mobile”?

Virtualization, at least when done with VIOS, places restrictions on the environment. Why restrictions? OK, you have X partitions and the firmware needs to be upgraded.

Question: How many partitions have to experience DOWNTIME?
Answer: All.
Question: How many application owners do you have to negotiate with to agree on a date/time for this operation?
Answer: A lot.

So you do partition mobility in order to avoid downtime during a firmware upgrade? Can you come up with a better reason, please?

Does anybody out there have access to electric bills from before and after virtualization? I find it almost impossible to believe that the virtualized data center uses less electricity than the traditional one. A network adapter serving one guest should not take more power than a network adapter serving ten or twenty. Even an LHA adapter serving one host should consume less power than when it is shared by more than one host. If I am wrong, then such an adapter must be working “slower” to consume less power. More load on a resource equals more generated heat; isn’t this always the case? I do not want proof from a testing company contracted by the manufacturer. I would like to see the bills from before and after, really.

Am I saying that virtualization sucks? Nope, not at all. There is definitely room for it, and if used for the right reasons (for example in a data center with a limited footprint, under the deck of a battleship?) it brings benefits. Sometimes the flexibility it offers is more important than the costs associated with it.
But if implemented for its own sake, based on nothing else but a salesman’s propaganda, it drains one’s wallet and it does so very efficiently indeed. Can an application be put behind a load balancer? Yes? Then what do you need partition mobility for?

So what are the choices? How to virtualize on a budget? I put my bet on wpars. Get a RAID controller and a drawer full of disks and use wpars with their rootvg storage based on these “RAID-ed” disks (do not LVM mirror it); use SAN for volume groups with data if their capacities are large enough to make it the better option. This virtualization model does not involve the costs of deploying a VIO server or servers. Additionally, the resources (and the costs) that you would have to allocate to VIOS can now be allocated to wpars, and if you really want to, you can relocate a wpar too. Not to mention that the wpar “parent” does not need to be a dedicated partition like a VIO server. I know about parents running applications just like their wpars do: like father, like son (so to speak).

I have to admit, I liked wpars from the moment I saw one, and I still like them a lot!

Posted in Real life AIX.


/etc/fstab, UUID‘s RedHat and VMware

Learn through suffering, really!

I started to build VMware RedHat guests with no experience whatsoever, and while doing so I followed nothing else but what I had already learnt and what I do know: AIX. While building, I used the LINUX Logical Volume Manager to create volume groups and logical volumes, which I topped with file systems. Following best practices, I populated /etc/fstab with the UUIDs of the disks. Why? Because the literature says that universally unique identifiers (the AIX equivalent is called PVID) are the preferred way to associate disks with their file systems, because once a UUID is assigned to a disk it will never change (a disk name like /dev/sda may change to something else if more disks are added to the host).

To see the currently used UUIDs, execute the command called blkid.

#  blkid | sort
/dev/mapper/oracle_vg-u01_lv: UUID="7a9e4e58-174b-4567-93a7-9a479d4ce999" TYPE="ext4"
/dev/mapper/vg_sys-lv_home: UUID="8e393748-4432-4752-900d-dbdd71a4f7bb" TYPE="ext4"
/dev/mapper/vg_sys-lv_root: UUID="c2c612e2-ccf0-4311-b705-c1788a8afbbf" TYPE="ext4"
/dev/mapper/vg_sys-lv_swap: UUID="03f634ac-28d6-44fb-9742-dfb1e621e358" TYPE="swap"
/dev/mapper/vg_sys-lv_temp: UUID="0e830c56-c697-403f-993e-ae442e6827f8" TYPE="ext4"
/dev/mapper/vg_sys-lv_usr: UUID="cbda1389-a042-4059-b43a-bb0920e02d2d" TYPE="ext4"
/dev/mapper/vg_sys-lv_var: UUID="48e1864c-1e09-4204-88ad-7ca16429c8cd" TYPE="ext4"
/dev/sda1: UUID="e7d8928a-fc04-40fd-b625-e9da99732c3b" TYPE="ext4"
/dev/sda2: UUID="6wZrl9-XhOr-RALD-9Sbp-Lhoi-GRYe-EIs4LH" TYPE="LVM2_member"
/dev/sdb: UUID="1mddKd-lDw7-YjDe-BVt9-4pKO-j8Oh-w3x6UA" TYPE="LVM2_member"

From the output above, we can all see that the logical volume named u01_lv (a member of the oracle_vg volume group) is assigned UUID=7a9e4e58-174b-4567-93a7-9a479d4ce999.

The next listing represents the contents of the host /etc/fstab file using the UUID code for u01_lv – as I originally entered it.

UUID=7a9e4e58-174b-4567-93a7-9a479d4ce999  /u01 ext4 defaults   1 2

Then, I noticed that the “other” logical volumes in this file had 1 and 2 in the last two columns, so without much thinking I followed the already established pattern and used them too. By the way, these other logical volumes belonged to the guest “rootvg“.

Days became weeks, weeks became months. Everything worked like a charm. About a year later, a guest had to be rebooted and it did not come back. The console “said” that the UUID of the disk holding u01_lv had changed. Isn’t this just peachy?

At this time, I recognized the mistake which I had made without even knowing it. For as long as the UUID of the sdb disk stayed the same, my mistake remained hidden. But eventually, for some reason beyond our knowledge, one disk UUID (which should always be “unique” and “constant”) changed and its corresponding entry in the /etc/fstab file was no longer true, which resulted in the following. First, the LINUX kernel “sees” that /dev/sdb has a different UUID and throws a message about it to the console. Next, the kernel wants to mount /u01, which is not there, so it decides to fsck the logical volume, and the volume is not there either. The kernel has no idea what is going on and it surrenders, delivering us to the “Maintenance Mode” (please help me!). Here, nothing can be done because /etc/fstab cannot be edited as / is mounted read-only!
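
A mismatch like this can be spotted before a reboot by comparing the UUID= entries in /etc/fstab with what blkid currently reports. The sketch below is hedged accordingly: it assumes the blkid output was saved to a file first, and the function name stale_uuids is made up for illustration.

```shell
#!/bin/sh
# stale_uuids FSTAB BLKID_OUTPUT
# Print every UUID= entry of FSTAB (UUID and mount point) that does
# not appear in BLKID_OUTPUT, a file holding saved blkid output.
stale_uuids() {
    awk '
        FILENAME == ARGV[1] {                 # first file: blkid output
            if (match($0, / UUID="[^"]*"/))
                known[substr($0, RSTART + 7, RLENGTH - 8)] = 1
            next
        }
        $1 ~ /^#/ { next }                    # skip fstab comments
        $1 ~ /^UUID=/ {
            id = substr($1, 6)                # text after "UUID="
            if (!(id in known))
                print id, $2
        }
    ' "$2" "$1"
}
```

A typical use would be blkid > /tmp/blkid.out followed by stale_uuids /etc/fstab /tmp/blkid.out; any line it prints is an entry that would fail to mount at the next boot.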

It took a few minutes and the guest was booted in rescue mode, in which you can modify the contents of the / file system. What contents? For example /etc/fstab. Inside this file, the start and the end of the line describing the /u01 file system have to be modified. To obtain the new UUID, one needs to execute the blkid command. Next comes the end of the line, the last two digits to be exact. As per Mike’s advice, I replaced the existing numbers with 0 (zero). So now the line reads:

UUID=fa9e4e58-174b-4567-93a7-9a479d4ce341  /u01 ext4 defaults   0 0

Why the two zeros? After the host was back on-line, I spent some time and actually read the, you guess what (the man pages). Actually, I got this info from www.linfo.org, where I found a very detailed description of the /etc/fstab contents. Below are the two sections dealing with the fifth and the sixth element.

(5) The fifth column is used to determine whether the dump command will backup the file system. This column is rarely used and has two options: 0, do not dump, which is used for most partitions, and 1, dump, which is used for the root partition.
(6) The sixth column is used by the fsck program to determine the order in which the computer checks the file systems when it boots. The three possible values for the column are: 0, do not check, 1, check first (only the root partition should have this setting) and 2, check after the root partition has been checked. Most Linux distributions set all the partitions to 0, except for the root partition. If maintenance is important, 2 should be used, although this can increase the amount of time required for booting.
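
To see at a glance which entries ask for a dump or a boot-time fsck, the two columns can be listed per mount point. This is a minimal sketch; the function name fstab_checks is made up, and treating missing fifth and sixth fields as 0 follows what the fstab(5) man page describes.

```shell
#!/bin/sh
# fstab_checks FSTAB
# For every non-comment entry print the mount point, the dump flag
# (field 5) and the fsck pass number (field 6). Missing fields are
# reported as 0.
fstab_checks() {
    awk '
        $1 ~ /^#/ || NF < 4 { next }          # comments, blank lines
        {
            dump = (NF >= 5) ? $5 : 0
            pass = (NF >= 6) ? $6 : 0
            printf "%s dump=%s fsck=%s\n", $2, dump, pass
        }
    ' "$1"
}
```

Running fstab_checks /etc/fstab | grep -v 'fsck=0' lists only the file systems that will be checked, and can therefore block, at boot.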

Going back to the changed UUID, it is possible that this is the reason why: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1026710

During this “difficult” time, our new LINUX/TIVOLI administrator Mike “Ski” Swierczynski (also an ex Marine) walked me through the recovery process and he was the one who pointed out the wrong options (the last two columns) in /etc/fstab. Thanks Mike and Semper FI!

Posted in Real life AIX.


recovering rootvg missing vSCSI disks

Getting ready for an AIX upgrade, it became apparent that “something” had happened to one of the two VIO servers of this managed system (frame). All sixteen guests (lpars) had a missing disk. The missing disk was always hdisk0, which points (in our case) to vios1 (the hdisk1‘s are delivered from the disk pool of vios2).

At this time both VIOS servers are fully operational, so the question is how to quickly recover and restore the rootvg mirroring on the affected partitions.

Start with listing the host dump devices.

# sysdumpdev -l
primary              /dev/dump0
secondary            /dev/dump1
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON
type of dump         fw-assisted
full memory dump     disallow

Temporarily disable them.

# sysdumpdev -P -p /dev/sysdumpnull
# sysdumpdev -P -s /dev/sysdumpnull

Verify the last two steps:

# sysdumpdev -l
primary              /dev/sysdumpnull
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    TRUE
dump compression     ON

After LVM detects issues with a disk lasting longer than a “certain” length of time, it will declare this disk missing and LVM will no longer be interested in this device. You have to make LVM “re-analyze” rootvg disks executing the varyonvg rootvg command. LVM will change the state of the previously missing disk to active and in this case since volume group is mirrored this will also automatically trigger the syncvg command resulting in gradual disappearance of stale partitions.

# varyonvg rootvg

Verify that both disks are active:

# lsvg -p rootvg
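
With sixteen partitions to go through, it helps to let a filter do the looking. The sketch below reads lsvg -p output on standard input and prints any physical volume whose state is not "active". It assumes the usual layout of that output (a volume group line, a header line, then one line per disk with the state in the second column); the function name not_active_pvs is made up for illustration.

```shell
#!/bin/sh
# not_active_pvs: read "lsvg -p <vg>" output on stdin and print the
# name and state of every physical volume that is not "active".
# Assumes two leading lines (VG name, column header) before the disks.
not_active_pvs() {
    awk 'NR > 2 && NF >= 2 && $2 != "active" { print $1, $2 }'
}
```

Used as lsvg -p rootvg | not_active_pvs, it prints nothing once both disks are back.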

Check for logical volume synchronization with the ps command, and if lvsync is not running, start it by executing syncvg -P 32 -v rootvg

Finally, activate both dumps :-)

# sysdumpdev -P -p /dev/dump0
# sysdumpdev -P -s /dev/dump1

Copy/paste/reuse on all remaining partitions.

Posted in Real life AIX.


extending file system in LINUX

This morning there is a ticket in my queue to extend the /tmp file system on one of the RedHat 6.2 hosts to 6 GB. A few weeks earlier, when I needed to do that, I used two commands: first lvextend, to enlarge the logical volume underlying the file system, and next resize2fs, to extend the file system to use the additional capacity of its logical volume.

Today, I actually took the time to read the output of man lvextend and I recognized that this operation, just like in AIX, can be done in a single step. The objective is to make /tmp 6 GB big.

# df -h /tmp
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_sys-lv_temp
                        4.0G  137M  3.7G   4% /tmp

# lvextend -r -L 6G /dev/mapper/vg_sys-lv_temp
  Extending logical volume lv_temp to 6.00 GiB
  Logical volume lv_temp successfully resized
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/mapper/vg_sys-lv_temp is mounted on /tmp; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/mapper/vg_sys-lv_temp to 1572864 (4k) blocks.
The filesystem on /dev/mapper/vg_sys-lv_temp is now 1572864 blocks long.

# df -h /tmp
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_sys-lv_temp
                      6.0G  137M  5.5G   3% /tmp

Note: It is the -r that makes lvextend extend both the logical volume and the file system associated with it. Using -L 6G sets the target size to 6 GB, whereas -L +6G would add 6 GB and make /tmp 10 GB.
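
The difference between the absolute and the relative form of -L is easy to get wrong, so here is a tiny sketch of the arithmetic. The helper name lv_target_gb is made up, and it only understands whole numbers with a G suffix; it does not call LVM at all.

```shell
#!/bin/sh
# lv_target_gb CURRENT_GB SIZE_SPEC
# Show the size (in GB) a logical volume ends up with for a given -L
# argument: "6G" is an absolute target, "+6G" is added to the current size.
lv_target_gb() {
    spec=${2%[Gg]}                  # strip the unit suffix
    case $spec in
        +*) echo $(( $1 + ${spec#+} )) ;;
        *)  echo "$spec" ;;
    esac
}

lv_target_gb 4 6G     # prints 6
lv_target_gb 4 +6G    # prints 10
```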

Posted in Real life AIX.


the cost of distraction …… restoring permissions on a filesystem and its contents

Today is a special day for us. We screwed up and now we need to change the ownership of every file and directory in a file system. How did we get there? We needed to change the owners of file systems whose names followed the pattern of /u01 through /u60. So what we did was chown oracle.dba /u* instead of chown oracle.dba /u[0-9][0-9]!
As a result, the /usr file system and all of its contents got a new “mommy” and “daddy”, now known as oracle.dba. Now the original owners have to be restored, and they might not all be root.system.
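
The difference between the two patterns can be reproduced safely in a throwaway directory. This is only an illustration; the helper name expand_in and the temporary directory are made up, and nothing outside that directory is touched.

```shell
#!/bin/sh
# expand_in DIR PATTERN
# Show what the shell expands PATTERN to inside DIR. The function body
# runs in a subshell, so the caller's working directory is untouched.
expand_in() (
    cd "$1" || exit 1
    # Unquoted on purpose: let the shell expand the glob here.
    echo $2
)

demo=$(mktemp -d)
mkdir "$demo/u01" "$demo/u02" "$demo/usr"
expand_in "$demo" 'u*'            # prints: u01 u02 usr
expand_in "$demo" 'u[0-9][0-9]'   # prints: u01 u02
rm -rf "$demo"
```

Previewing a pattern with echo before handing it to chown costs a few seconds and would have saved /usr.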
So what did we do? We located a host with the identical version of AIX as the two hosts we had just messed up. After logging in, we executed the following script:

#!/bin/ksh
# Run on a healthy reference host: generate chown/chmod commands that
# reproduce the owner, group and mode of the entries under /usr.
cd /tmp
rm reset.perms.out 2>/dev/null
# Keep fields 3-12 of the "find -ls" listing (mode, links, owner, group, ...).
find /usr -ls |awk '{print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'|
# Only 9-field lines are handled; lines with extra fields (for example
# symbolic links, shown as "name -> target") are skipped.
awk '{ if ( NF == 9 ) {
printf ("chown %s.%s %s\n",$3,$4,$9)
{
perms=0
if(substr($1,2,1) == "r")
            perms = perms + 400
if(substr($1,3,1) == "w")
            perms = perms + 200
if(substr($1,4,1) == "x")
            perms = perms + 100
if(substr($1,4,1) == "S")
            perms = perms + 4000
if(substr($1,4,1) == "s")
            perms = perms + 4100
if(substr($1,5,1) == "r")
           perms = perms + 40
if(substr($1,6,1) == "w")
           perms = perms + 20
if(substr($1,7,1) == "x")
           perms = perms + 10
if(substr($1,7,1) == "S")
           perms = perms + 2000
if(substr($1,7,1) == "s")
           perms = perms + 2010
if(substr($1,8,1) == "r")
           perms = perms + 4
if(substr($1,9,1) == "w")
           perms = perms + 2
if(substr($1,10,1) == "x")
           perms = perms + 1
if(substr($1,10,1) == "T")
           perms = perms + 1000
if(substr($1,10,1) == "t")
           perms = perms + 1001
           printf("chmod %d %s # %s\n",perms,$9,$1)
}
}
}' >reset.perms.out

This script scans /usr and records the owner and permissions of each of its entries as chown and chmod commands. This information is stored in the file /tmp/reset.perms.out, which was then copied with scp to each host that needed its /usr ownership restored. Next, reset.perms.out was made “executable” (chmod 700 reset.perms.out) and executed. Nice!
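
Before trusting a generated file like this on a production host, it is worth spot-checking a few of the mode conversions. The sketch below is a standalone restatement of the same idea (symbolic mode string in, octal number out); the function name mode_to_octal is made up for illustration and it is not the script used above.

```shell
#!/bin/sh
# mode_to_octal MODE_STRING
# Convert an "ls -l" style mode string such as -rwsr-xr-x into the
# four-digit octal number chmod expects.
mode_to_octal() {
    awk -v m="$1" 'BEGIN {
        split("400 200 100 40 20 10 4 2 1", bit, " ")
        perms = 0
        for (i = 1; i <= 9; i++) {
            c = substr(m, i + 1, 1)        # position 1 is the file type
            if (c ~ /[rwxst]/) perms += bit[i]
        }
        if (substr(m, 4, 1)  ~ /[sS]/) perms += 4000   # setuid
        if (substr(m, 7, 1)  ~ /[sS]/) perms += 2000   # setgid
        if (substr(m, 10, 1) ~ /[tT]/) perms += 1000   # sticky
        printf "%04d\n", perms
    }'
}

mode_to_octal -rwsr-xr-x    # prints 4755
mode_to_octal drwxrwxrwt    # prints 1777
```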

You do know what to change if you need to use this script on a different file system, right? Yes, just replace the /usr above with the file system of your choice.

Posted in Real life AIX.



adventures with npiv, xiv and brocade switches

Something really strange happened to me today. A year or so ago, I built two “lpars” using the standard (for this site) approach: the rootvg disks as vSCSI devices and SAN disks via two virtual FC adapters from each VIO server (four FC adapters in each LPAR). Later each partition received two SAN disks and everything went dormant for almost six months. Last Friday, I asked for more storage, got it, created volume groups, logical volumes and file systems, and at the end of the day I rebooted both hosts and went home.

Monday, I was back in my cube to continue what I had left behind, and here I received my surprise: one of the two hosts had no SAN storage! Executing lspv showed only the two vSCSI disks defining the rootvg. Executing the command lsdev -Cc disk showed the “missing” disks, but in the “Defined” state! I spent a few minutes trying to “resurrect” them with the usual rmdev -dl hdisk# / cfgmgr, to no avail. I gave up and took a moment to think about it.

I know that if resources (virtual adapters, memory, CPU, and so forth) are added “dynamically” via HMC with the “Dynamic Logical Partitioning” option and the host is rebooted later, the added resources will be “gone”; they will disappear. By the way, this is something I cannot get used to (I think DLPAR changes should be permanent). I also know that to make a DLPAR “modification” permanent I have to modify the partition PROFILE, and if a reboot is required, I power the host down and then ACTIVATE its profile on POWER ON!

So what went wrong this time? Is this an AIX/VIOS/HMC error, or mine, or simply some kind of witchcraft? I have no idea. I called for help and an IBM engineer informed me that they do not have any other customers reporting something like this. This leaves me and the witchcraft, so let’s spread the guilt equally :-)

But there is a lesson to be learnt from this experience, and if you are new to VIOS, NPIV, AIX and everything in between, this post could be your lesson too. Flip to the next page and you will find valuable and interesting material (straight from IBM customer support) showing how to zone an XIV LUN (via a Brocade switch) to an AIX partition with virtual FC adapters. Enjoy it!

Posted in AIX, Real life AIX.




© 2008-2013 www.wmduszyk.com - best viewed with your eyes.