
rpmdb: unable to join the environment error

See below what happened when I wanted to look at an rpm package:

# rpm -q --last kernel
rpmdb: unable to join the environment
error: db3 error(11) from dbenv->open: Resource temporarily unavailable
error: cannot open Packages database in /var/lib/rpm
rpmdb: unable to join the environment
error: db3 error(11) from dbenv->open: Resource temporarily unavailable

There was something not correct with the rpm environment on this machine! After a quick search, I found an online article providing this solution:

# rm -f /var/lib/rpm/__db*
# echo "%__dbi_cdb create private cdb mpool mp_mmapsize=16Mb mp_size=1Mb" > /etc/rpm/macros
# rpm --rebuilddb

I tried it and it failed too.

I would probably have found the answer sooner if I had checked the state of /var, but as they say – it is never too late. I decided to execute the previous command with more verbose output. Look what it said:

# rpm -vv --rebuilddb
D: rebuilding database /var/lib/rpm into /var/lib/rpmrebuilddb.51978
D: creating directory /var/lib/rpmrebuilddb.51978
error: failed to create directory /var/lib/rpmrebuilddb.51978: 
No space left on device

What is that at the very end? Yes, /var was full! From this point it was all easy – resize /var and re-run rpm --rebuilddb.
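With hindsight, a pre-flight free-space check would have saved the detour. A minimal sketch; the 100 MB threshold and the helper name are my own assumptions, not part of the original fix:

```shell
#!/bin/sh
# Hypothetical helper: confirm a directory's filesystem has at least
# $2 KB free before a costly operation such as `rpm --rebuilddb`.
check_space() {
    dir=$1; need_kb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

if check_space /var 102400; then
    echo "enough space under /var, safe to run: rpm --rebuilddb"
else
    echo "free some space under /var first"
fi
```

Had such a guard run first, the "No space left on device" surprise would have surfaced before the rebuild started.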

Posted in Linux.

SystemMirror, building clusters and fighting with the CAA services

I am expecting something good to come my way soon. Why? I have suffered for a few days trying to get a very simple cluster to work; it consistently refused all of my efforts, insisting on not letting the first node start the HA services and join the cluster. This is not an extraordinary cluster – there is nothing special about it: two nodes, one resource group, and that is all.

But first, this is how the cluster was created.

# clmgr add cluster lawmsmpa1 \
            nodes=lawmsmpa1c1,lawmsmpa1c2 \
            type=NSC \
            heartbeat_type=unicast

Note that hdisk2 has an identical PVID on both nodes and its reserve_policy is set to no_reserve on both nodes.

# lsattr -El hdisk2 | grep reserve_policy
reserve_policy no_reserve 
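Checking this by eye on two nodes is error-prone. A small sketch that parses `lsattr`-style output and confirms the policy; the sample line piped in below is illustrative, on a real node you would feed it `lsattr -El hdisk2`:

```shell
#!/bin/sh
# Hypothetical check: read lsattr-style output on stdin and verify
# that the reserve_policy attribute equals no_reserve.
reserve_ok() {
    awk '$1 == "reserve_policy" { print $2 }' | grep -qx no_reserve
}

# Illustrative lsattr output line for hdisk2:
printf 'reserve_policy no_reserve Reserve Policy True\n' | reserve_ok \
    && echo "hdisk2: reserve_policy OK"
```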

Next, I defined the application controller, aka the “entity” identifying the highly available application's start and stop scripts.

# clmgr add application_controller lawmsmp \
            startscript=/usr/es/sbin/cluster/scripts/start_cluster.ksh

The service address was defined next.

# clmgr add service_ip \
            netmask= \

Finally, the cluster resource group is defined as follows.

# clmgr add resource_group lawmsmpRG \
            nodes=lawmsmpa1c1,lawmsmpa1c2 \
            startup=OHN fallover=FNPN fallback=NFB \
            service_label=lawmsmpa1 \
            applications=lawmsmp \
            volume_group=lawson_vg

At this point, we should have the caavg_private volume group present on hdisk2 on both nodes… Well, this was not the case. It was present on the second node (lawmsmpa1c2) but not on the first one, which by the way is the primary node of this cluster (it was declared first while defining the cluster).
Rebooting both nodes definitely did not help. The situation did not change. Every attempt to start cluster services failed on the primary node. The error message was always the same:

lawmsmpa1c1: rc.cluster: Error: CAA cluster services are not active on this node.


RSCT cluster services (cthags) are not active on this node

And the clconfd service was not running.

# lssrc -g caa
Subsystem         Group            PID          Status
  clcomd           caa              13041908     active
  clconfd          caa              failed

Indeed, it does not work. The cluster services and its resource group could be started and brought online on the second node but not on the first one. Looking at /etc/services on lawmsmpa1c1, I noticed that one HA entry was missing. The missing entry was

caa_cfg         6181/tcp

Well, this could explain at least some of the reasons behind this disaster, but not all. The cluster sync following the update to the services file did not help. The repository was still mangled and the caavg_private volume group was still present only on the second node. Time for some scrubbing!

Executed on both cluster nodes:

# export CAA_FORCE_ENABLED=1                                            
# rmcluster -f -r hdisk2
# lsattr -El cluster0                                                   
# rmdev -dl cluster0

On lawmsmpa1c1

# mkvg -f -y scrubvg hdisk2                                              
# varyoffvg  scrubvg

On lawmsmpa1c2

# importvg -f -y scrubvg hdisk2             
# varyoffvg scrubvg                                                      
# exportvg  scrubvg

On lawmsmpa1c1

# exportvg scrubvg

Yes, we validated it – the disk for the repository volume group is accessible from both nodes.
On both nodes

# shutdown -Fr

After the reboot, the next command executed on both nodes showed that the cluster definition was still present (do not be surprised – we only cleaned the repository disk!):

# odmget -q name=cluster0 CuAt
        name = "cluster0"
        attribute = "node_uuid"
        value = "749fb3f2-05f9-11e5-95f1-56bdbf443b02"
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 3

I crossed my fingers and asked to synchronize the cluster, which, if everything works as intended, will create the caavg_private volume group on both nodes – the way it was designed to be!

# clmgr sync cluster

It worked like a charm and the cluster started like nothing bad ever happened. Thank God it is Friday! 🙂

By the way, PowerHA 7.1.3 has the following requirement:

CAA nodename = hostname = COMMUNICATION_PATH

Because hostname and COMMUNICATION_PATH are both in the short-name format, the contents of /etc/hosts also followed this formula.

lawmsmpa1c1
lawmsmpa1c2
lawmsmpa1
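The naming rule is nothing more than an equality over three strings. A sketch of what a pre-install sanity script might assert; on a real node the three values would come from lscluster, hostname, and the HACMP node definition, here they are supplied literally:

```shell
#!/bin/sh
# Sketch of the PowerHA 7.1.3 naming rule: the CAA node name, the
# hostname, and COMMUNICATION_PATH must all be the same short name.
names_consistent() {
    [ "$1" = "$2" ] && [ "$2" = "$3" ]
}

if names_consistent lawmsmpa1c1 lawmsmpa1c1 lawmsmpa1c1; then
    echo "naming rule satisfied"
fi
```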

Posted in HACMP, Real life AIX.


migrating from nslcd to sssd with AD/KRB

Recently, I found some articles stating that Red Hat is deprecating nslcd and choosing sssd as its successor. Looking for more info, I learned that sssd offers caching of credentials, which means that even in the absence of an authentication authority – be it LDAP, AD, KRB and so forth – users will still be able to log into a host (as long as they have done it before and their data still resides in the local host's database). This alone makes the migration to sssd worth the effort! Can you imagine storing all application administrative accounts like oracle, lawson, epicadm and many more in AD instead of locally? Isn't it great?

If you look around this blog, you will find a few posts about implementing nslcd-based LDAP/KRB5/AD authentication/authorization with Red Hat. In this environment, Active Directory provides both the LDAP and Kerberos services, and there is a PAM component as well. What follows is an illustration of the steps taken to migrate such an environment to sssd. There are many other ways sssd could be implemented!

First, I will back up the existing authentication/authorization environment, just in case I run out of time switching to sssd and have to restore the system to its original shape.

# authconfig --savebackup=wmdbackup

Second, we will need to make sure the proper packages are installed:

# yum -y install sssd-ldap sssd-ad sssd-client \
                 sssd-common sssd-common-pac \
                 sssd-krb5 sssd-krb5-common 

Next, you will need to stop nslcd and nscd. We are replacing nslcd with sssd for authentication, and since sssd also does caching, we disable nscd to keep the two from conflicting:

# service nslcd stop
# service nscd stop
# chkconfig --level 2345 nslcd off
# chkconfig --level 2345 nscd off

Now, execute this little command to set sssd going.

# authconfig --enablesssd --enablesssdauth \
             --enablelocauthorize --enablemkhomedir

Be aware that the last command changes nsswitch.conf and a few files in the /etc/pam.d directory. Look for sss.

It is time to edit /etc/sssd/sssd.conf, starting with setting the appropriate permissions.

# chmod -R 600 /etc/sssd

If these permissions are not exactly as shown, the daemon will refuse to start.
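A quick way to verify a file's mode before starting the daemon (GNU `stat` is assumed; the demonstration below uses a scratch file rather than the real sssd.conf):

```shell
#!/bin/sh
# Create a scratch file, give it the mode sssd requires for its
# configuration file, and read the mode back with GNU stat.
f=$(mktemp)
chmod 600 "$f"
mode=$(stat -c '%a' "$f")
echo "mode is $mode"    # 600 is what sssd expects on sssd.conf
rm -f "$f"
```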

In my case, I asked Red Hat to provide me this file for my current version, which is RHEL 6.6. I copied it to the host, and after starting sssd I was able to log in remotely with no further issues. But there was one caveat with the file as delivered – it guarantees that any user log-in results in an AD-wide search. It sets ldap_search_base to the very top of my domain, so every search has to start at the top and follow every branch until a result is found. This is not what I want. I want to limit the search (time) to the very specific branches of my AD tree – the ones storing UNIX users' log-in and group membership info. Initially, I included the following statements:

ldap_user_search_base=OU=Secured,OU=Corporate Users,DC=wmd,DC=edu
ldap_group_search_base=OU=Unix,OU=Security Groups,OU=Corporate Groups,dc=wmd,dc=edu

Normally, a named parameter is followed by an equals sign and then the value to be assigned, right? Not in this case. With such entries inside its configuration file, sssd wouldn't start. After a while of searching, I found one site that has it right! Instead of the equals sign you have to use a comma. The whole sssd.conf is shown below.

config_file_version = 2
services = nss, pam
domains = WMD.EDU
debug_level = 4

ldap_id_use_start_tls = False
cache_credentials = True
id_provider = ldap
auth_provider = krb5
chpass_provider = krb5
ldap_schema = rfc2307bis
ldap_force_upper_case_realm = True
ldap_user_object_class = user
ldap_group_object_class = group
ldap_user_gecos = displayName
ldap_user_home_directory = unixHomeDirectory
ldap_uri = ldap://
ldap_search_base = dc=wmd,dc=edu
ldap_user_search_base,OU=Managed By Others,DC=wmd,DC=edu
ldap_user_search_base,OU=Shared,OU=Corporate Users,DC=wmd,DC=edu
ldap_user_search_base,OU=Secured,OU=Corporate Users,DC=wmd,DC=edu
ldap_user_search_base,OU=ServiceAccounts,OU=Corporate Servers,DC=wmd,DC=edu
ldap_group_search_base,ou=Unix,ou=Security Groups,ou=Corporate Groups,dc=wmd,dc=edu
ldap_default_bind_dn = CN=aixldapquery,OU=ServiceAccounts,OU=Corporate Servers,DC=wmd,DC=edu
ldap_default_authtok_type = password
ldap_default_authtok = bind_account_password_goes_here
ldap_tls_cacertdir = /etc/openldap/cacerts
ldap_referrals = false

krb5_realm = WMD.EDU
krb5_kpasswd =
krb5_server =
krb5_canonicalize = False

To make sure that sssd starts at reboot, we will do this:

# chkconfig --level 2345 sssd on

Let’s start it and test it.

# service sssd start

# id duszyk
 uid=923810(duszyk) gid=216(operator) groups=216(operator)

# getent passwd duszyk
duszyk:*:923810:216:Duszyk, Waldemar M:/home/duszyk:/bin/bash
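The two lookups above are the key smoke test. A sketch wrapping one of them so it can be dropped into a post-migration script; the root user is used here only because it always resolves, substitute a directory-backed account on a real host:

```shell
#!/bin/sh
# Confirm that NSS user lookups work after the switch to sssd.
user_resolves() {
    getent passwd "$1" > /dev/null 2>&1
}

if user_resolves root; then
    echo "user lookup OK"
else
    echo "user lookup FAILED - check sssd and nsswitch.conf" >&2
fi
```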

What follows are a few general observations that I have developed/learned in my so-far very short interaction with sssd.
If a user's UID/GID number has changed for some reason, the SSSD cache must be cleared for that user before the user can log in again. The sssd-tools package contains the necessary utilities.

# yum install sssd-tools 
# sss_cache -u markd

There can be times when it is useful to seed users into the SSSD database rather than waiting for them to log in and be added (kickstart comes to mind…?). New users can be added with the sss_useradd command.

# sss_useradd --UID 501 --home /home/jsmith --groups admin,dev-group jsmith

The cache purge utility, sss_cache, invalidates records in the SSSD cache for a user, a domain, or a group. Invalidating the current records forces the cache to retrieve the updated records from the identity provider, so changes can be realized quickly. Most commonly, this is used to clear the cache and update all records:

# sss_cache -E

The sss_cache command can also clear all cached entries for a particular domain:

# sss_cache -Ed WMD.EDU

If the administrator knows that a specific record (user, group, or netgroup) has been updated, then sss_cache can purge the records for that specific account and leave the rest of the cache intact:

# sss_cache -u markd

Be careful when you delete a cache file. This operation has significant effects: Deleting the cache file deletes all user data, both identification and cached credentials. Consequently, do not delete a cache file unless the system is online and can authenticate with a user name against the domain’s servers. Without a credentials cache, offline authentication will fail. If the configuration is changed to reference a different identity provider, SSSD will recognize users from both providers until the cached entries from the original provider time out.

While troubleshooting a failed sssd, remove its cache before restarting the service.

# rm -f /var/lib/sss/db/*
# service sssd start

Posted in Real life AIX.


re-sizing disks in VMware

It is not a good idea to partition disks with fdisk. It is better to let LVM work with whole physical disks – let LVM manage the disks. If the disks were not wholly owned by LVM, the following procedure would fail and a disk would not be able to “grow”.

# pvs
  PV         VG      Fmt  Attr PSize  PFree
  /dev/sda2  vg_sys  lvm2 a--  69.51g    6.91g
  /dev/sdb   vg_u01  lvm2 a--  30.00g   10.00g
  /dev/sdc   vg_sshi lvm2 a--  30.00g   10.00g
  /dev/sdd   soa_vg  lvm2 a--  16.00g 1020.00m

There is about 1GB left in soa_vg, but the application owner demands 15GB more. This request is easy to address as long as the disks are not partitioned with fdisk, which is my case. Using the VMware console, we “grow” the appropriate disk by an additional 15GB. Next, the operating system must be instructed to re-scan its disks, which is achieved with these steps.

# cd /sys/class/scsi_disk
# for i in `ls`; do echo "1" > $i/device/rescan; done 

Now, let LVM know to “re-size” the physical volume.

# pvresize -v /dev/sdd
   DEGRADED MODE. Incomplete RAID LVs will be processed.
    Using physical volume(s) on command line
    Archiving volume group "soa_vg" metadata (seqno 4).
    Resizing volume "/dev/sdd" to 62912512 sectors.
    No change to size of physical volume /dev/sdd.
    Updating physical volume "/dev/sdd"
    Creating volume group backup "/etc/lvm/backup/soa_vg" (seqno 5).
  Physical volume "/dev/sdd" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

Were we successful?

# pvs
  PV         VG      Fmt  Attr PSize  PFree
  /dev/sda2  vg_sys  lvm2 a--  69.51g  6.91g
  /dev/sdb   vg_u01  lvm2 a--  30.00g 10.00g
  /dev/sdc   vg_sshi lvm2 a--  30.00g 10.00g
  /dev/sdd   soa_vg  lvm2 a--  30.00g 15.00g

Now we can increase the size of the file systems in soa_vg.
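From here the usual LVM steps apply. A sketch of that last step; the logical volume name soa_lv and the ext4 filesystem are my assumptions (the post does not name them), and the commands are guarded so they only act when the LV actually exists:

```shell
#!/bin/sh
# Hypothetical follow-up: grow an LV into the newly freed 15GB and
# then grow the filesystem inside it.
LV=/dev/soa_vg/soa_lv    # assumed name; adjust to the real LV

if [ -b "$LV" ]; then
    lvextend -L +15G "$LV"
    resize2fs "$LV"      # ext3/ext4; use xfs_growfs for XFS
else
    echo "skipping: $LV not present on this host"
fi
```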

Posted in LINUX, Real life AIX.

re-synchronizing Satellite server

To re-synchronize (re-build) the metadata database of a channel or channels, you can execute one of the following commands.

To rebuild an individual channel metadata:

# satellite-sync -c rhel-x86_64-server-6 --force-all-packages

To rebuild all channels:

# satellite-sync --force-all-packages

These operations are time consuming! The more channels to sync, the more time it will take.
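When several (but not all) channels need the treatment, a loop keeps each sync isolated. A sketch; the channel labels are examples only, and the actual satellite-sync call is left commented so the loop shows the intent without running for hours:

```shell
#!/bin/sh
# Hypothetical wrapper: resync a list of channels one at a time so a
# failure in one channel does not abort the rest.
channels="rhel-x86_64-server-6 rhn-tools-rhel-x86_64-server-6"

for ch in $channels; do
    echo "syncing $ch ..."
    # satellite-sync -c "$ch" --force-all-packages   # uncomment on the Satellite
done
```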

Posted in Linux.

upgrading RH Satellite server schema …..

After upgrading the operating system (RedHat 6.6) on my Satellite server (version 5.7), the web GUI stopped functioning – but luckily for me, the same GUI “told” me that in order to get it to work the Satellite server's schema needed to be updated. The rest of this post documents the process. And yes, I am not using Oracle as the database.

Back up the Satellite database.

[root@rh_satellite1]# /usr/sbin/rhn-satellite stop
[root@rh_satellite1]# db-control backup /ColdBackup
[root@rh_satellite1]# /usr/sbin/rhn-satellite start

Stop the Satellite services except the database.

[root@rh_satellite1]# rhn-satellite stop --exclude postgresql
Shutting down spacewalk services...
Stopping RHN Taskomatic...
Stopped RHN Taskomatic.
Stopping cobbler daemon:                                   [  OK  ]
Stopping rhn-search...
Stopped rhn-search.
Stopping MonitoringScout ...
[ OK ]
Stopping Monitoring ...
[ OK ]
Shutting down osa-dispatcher:                              [  OK  ]
Stopping httpd:                                            [  OK  ]
Stopping tomcat6:                                          [  OK  ]
Terminating jabberd processes ...
Stopping s2s:                                              [  OK  ]
Stopping c2s:                                              [  OK  ]
Stopping sm:                                               [  OK  ]
Stopping router:                                           [  OK  ]

Find out the version of the current schema.

[root@rh_satellite1]# rhn-schema-version

Find out the “new” schema version waiting to be installed.

[root@rh_satellite1]# rpm -q satellite-schema

Upgrade the schema.

[root@rh_satellite1]# spacewalk-schema-upgrade
Schema upgrade: [satellite-schema-] -> [satellite-schema-]
Searching for upgrade path: [satellite-schema-] -> [satellite-schema-]
Searching for upgrade path: [satellite-schema-] -> [satellite-schema-]
The path: [satellite-schema-] -> [satellite-schema-]
Planning to run spacewalk-sql with [/var/log/spacewalk/schema-upgrade/20150505-130353-script.sql]

Please make sure you have a valid backup of your database before continuing.

Hit Enter to continue or Ctrl+C to interrupt:
Executing spacewalk-sql, the log is in [/var/log/spacewalk/schema-upgrade/20150505-130353-to-satellite-schema-].
The database schema was upgraded to version [satellite-schema-].

Verify the version.

[root@rh_satellite1]# rpm -q satellite-schema
[root@rh_satellite1]# rhn-schema-version

Start the Satellite.

[root@rh_satellite1]# rhn-satellite start --exclude postgresql
Starting spacewalk services...
Initializing jabberd processes ...
Starting router:                                           [  OK  ]
Starting sm:                                               [  OK  ]
Starting c2s:                                              [  OK  ]
Starting s2s:                                              [  OK  ]
Starting tomcat6:                                          [  OK  ]
Waiting for tomcat to be ready ...
Starting httpd:                                            [  OK  ]
Starting osa-dispatcher:                                   [  OK  ]
Starting Monitoring ...
[ OK ]
Starting MonitoringScout ...
[ OK ]
Starting rhn-search...
Starting cobbler daemon:                                   [  OK  ]
Starting RHN Taskomatic...

For good measure.

[root@rh_satellite1]# service rhn-search cleanindex
Stopping rhn-search...
Stopped rhn-search.
Starting rhn-search...

Posted in Linux.


multiple search bases in ldap and AD

The time required to search for data is a function of the repository's size, right? The same applies to LDAP and AD – they are both data repositories. In their case, one may speed up searches by using multiple search bases. Case in point: in my Active Directory, users are defined in one of the following places:

OU=Secured,OU=Corporate Users,DC=wmd,DC=edu
OU=Users,OU=Ping,ou=Managed By Others,DC=wmd,DC=edu
OU=Users,OU=Research,ou=Managed By Others,DC=wmd,DC=edu
OU=ServiceAccounts,OU=Corporate Servers,DC=wmd,DC=edu

Their UNIX group definitions are stored here:

OU=Unix,OU=Security Groups,OU=Corporate Groups,DC=wmd,DC=edu

In the case of a Red Hat, CentOS, or Scientific Linux host using nslcd, one limits the scope of LDAP searches by adjusting the contents of /etc/nslcd.conf:

# Customize certain database lookups.
base passwd OU=Secured,OU=Corporate Users,DC=wmd,DC=edu
base passwd OU=Users,OU=Ping,ou=Managed By Others,DC=wmd,DC=edu
base passwd OU=Users,OU=Research,ou=Managed By Others,DC=wmd,DC=edu
base passwd OU=ServiceAccounts,OU=Corporate Servers,DC=wmd,DC=edu
base group OU=Unix,OU=Security Groups,OU=Corporate Groups,DC=wmd,DC=edu 

Any change to this file must be followed by a mandatory nslcd daemon restart.

# service nslcd restart

In the case of AIX, one goes straight to /etc/security/ldap/ldap.cfg.

userbasedn OU=Secured,OU=Corporate Users,DC=wmd,DC=edu
userbasedn OU=Users,OU=Ping,ou=Managed By Others,DC=wmd,DC=edu
userbasedn OU=Users,OU=Research,ou=Managed By Others,DC=wmd,DC=edu
userbasedn OU=ServiceAccounts,OU=Corporate Servers,DC=wmd,DC=edu
groupbasedn OU=Unix,OU=Security Groups,OU=Corporate Groups,DC=wmd,DC=edu 

A change to this file must be followed by a refresh of the LDAP daemon. One way of doing it is shown below.

# restart-secldapclntd

Posted in Real life AIX.

repository (vtopt0) issue while patching vios

It is my turn to patch the VIO servers… 🙂

vioaprpu001:/home/padmin>updateios -commit
User can not perform updates with media repository(s) loaded.
Please unload media images.

What is going on here? What is loaded and where?

VTD             Media                              Size(mb)
vtopt0          RH6.2.iso                          unknown

Really? Someone played with RedHat? Apparently so, except now the repo does not seem to be really OK. The next command should unload the repo.

vioaprpu001:/home/padmin>unloadopt -vtd vtopt0
Device "vtopt0" is not in AVAILABLE state.

We need to get more info about this repository – its adapters, state, and whatever else we can find.

vioaprpu001:/home/padmin>lsmap -all
SVSA            Physloc                       Client Partition ID
--------------- ----------------------------- -------------------
vhost13              U9117.MMB.1060FFP-V1-C19      0x00000000

VTD                   vtopt0
Status                Defined
LUN                   0x8100000000000000
Backing device        /var/vio/VMLibrary/RH6.2.iso
Mirrored              N/A

It should be Available, not Defined. Someone definitely built it and then dismembered it only partially. Well, let's get rid of it.

vioaprpu001:/home/padmin>rmvdev -vtd vtopt0
vtopt0 deleted

Now, going back to the patching I should be doing today.

vioaprpu001:/home/padmin>updateios -commit
All updates have been committed.

… and whatever steps follow.

Posted in Real life AIX.

user creation/password encryption in RedHat

Creating a new user account while simultaneously setting its password is a two-step procedure.

a. encrypt the password (in this case the password is abc123)

# openssl passwd
Password:
Verifying - Password:
2n50KL0pCn096
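The interactive prompt can be avoided in scripts. A sketch; the -1 option selecting the MD5-based crypt scheme is my choice, not the post's:

```shell
#!/bin/sh
# Generate a crypt(3)-style hash non-interactively, suitable for
# `useradd -p`. WARNING: the password appears in the process list.
HASH=$(openssl passwd -1 'abc123')
echo "$HASH"
```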

b. call the useradd command to do the rest

# useradd -c 'testing gecos' -g oinstall \
-m -d /home/testing -p 2n50KL0pCn096 newuser

In the last case, the newuser login account's primary group will be oinstall.

# groups newuser
newuser : oinstall

But if we create it replacing -g with -G:

# useradd -c 'testing gecos' -G oinstall \
-m -d /home/testing -p 2n50KL0pCn096 newuser

The new account's primary group will be created automatically and called newuser, and its group membership will list the following two groups: newuser and oinstall.

# groups newuser
newuser : newuser oinstall

Posted in Real life AIX.

list packages installed (and not) from a specific repository

Sometimes you wonder which packages come from which repository…? Here is the answer:

# yumdb search from_repo RepoName

For example, to learn which packages come from the epel repository, you execute:

# yumdb search from_repo epel | grep -v '='
Loaded plugins: product-id, rhnplugin
This system is receiving updates from RHN Classic or RHN Satellite.

How do you list the packages in a repository? First, generate a listing of the repositories the host subscribes to with yum repolist.

# yum repolist
Loaded plugins: product-id, rhnplugin, security, subscription-manager
This system is receiving updates from RHN Classic or RHN Satellite.
repo id            repo name                              status
clone-rhel-x86_64-server-6            .....................
clone-rhel-x86_64-server-optional-6   .....................
clone-rhn-tools-rhel-x86_64-server-6 ......................
epel                                  .....................
vmware-tools                          .....................

To list what packages can be installed from the repository called vmware-tools:

# yum --disablerepo "*" --enablerepo "vmware-tools" list available
Loaded plugins: product-id, rhnplugin, security, subscription-manager
This system is receiving updates from RHN Classic or RHN Satellite.
Available Packages
vmware-tools-core.x86_64     8.6.5-2                    vmware-tools
vmware-tools-esx.x86_64      8.6.5-2                    vmware-tools

Posted in Real life AIX.


Copyright © 2016 - 2017 Waldemar Mark Duszyk. All Rights Reserved. Created by Blog Copyright.