AIX Migrating from AIX 53 to AIX 61 using nimadm
id : 7r4xaxfx7f
category : computer
blog : unix
created : 04/24/12 - 14:19:46
rsh configuration
  • enable rsh between the NIM master and the client LPAR:
lpar# chsubserver -a -v shell -p tcp6 -r inetd
lpar# refresh -s inetd
lpar# vi /root/.rhosts
nim_master root
lpar# chmod 600 /root/.rhosts

  • test it from nim_master:
nim_master# rsh lpar hostname
lpar
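If the rsh test fails, a quick client-side check of the .rhosts file helps narrow it down. This is a minimal sketch assuming a POSIX shell on the LPAR; `check_rhosts` is a hypothetical helper name. It only verifies that the file has mode 600 and contains the `nim_master root` entry, not that inetd actually serves rsh.

```shell
# check_rhosts FILE HOST USER  (hypothetical helper, not an AIX command)
# Succeeds when FILE has mode 600 (-rw-------) and contains a "HOST USER" line.
check_rhosts() {
    file=$1 host=$2 user=$3
    [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }

    # First column of ls -ld is the mode string; tolerate a trailing ACL/SELinux marker.
    mode=$(ls -ld "$file" | awk '{print $1}')
    case $mode in
        -rw-------*) ;;
        *) echo "bad mode $mode on $file (expected 600)" >&2; return 1 ;;
    esac

    # Look for an exact "host user" pair, ignoring extra whitespace.
    awk -v h="$host" -v u="$user" '$1 == h && $2 == u { found = 1 } END { exit !found }' "$file" || {
        echo "no '$host $user' entry in $file" >&2
        return 1
    }
}

# Usage on the client LPAR:
#   check_rhosts /root/.rhosts nim_master root && echo ".rhosts looks OK"
```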

nimadm volume group on nim_master
  • on nim_master, create a volume group named nimadmvg. This VG must be at least as large as the client LPAR's rootvg.
  • nimadmvg must be empty before the migration starts, and will be empty again after the migration.
nim_master# mkvg -S -s 128M -y nimadmvg hdisk17 hdisk37 hdisk38 hdisk39
nim_master# lsvg -l nimadmvg
nimadmvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
nim_master# lsvg -p nimadmvg
nimadmvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk38           active            255         255         51..51..51..51..51
hdisk37           active            255         255         51..51..51..51..51
hdisk36           active            255         255         51..51..51..51..51
hdisk17           active            269         268         54..53..53..54..54
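To check the size requirement, the free space of nimadmvg is the sum of the FREE PPs column of `lsvg -p` multiplied by the PP size. A minimal sketch, assuming the column layout shown above and a PP size given in MB (128 MB in this example); `vg_free_mb` is a hypothetical helper name.

```shell
# vg_free_mb PP_SIZE_MB  (hypothetical helper; reads "lsvg -p VG" output on stdin)
# Prints the total free space of the volume group in MB.
vg_free_mb() {
    awk -v pp="$1" '
        $1 ~ /^hdisk/ { free += $4 }      # column 4 is FREE PPs
        END { print free * pp }'
}

# Usage on the NIM master (the PP SIZE field of "lsvg nimadmvg" gives the PP size):
#   lsvg -p nimadmvg | vg_free_mb 128
```

With the listing above this gives 1033 free PPs, about 129 GB. On the client, `lsvg rootvg` shows USED PPs and PP SIZE, so the rootvg footprint can be compared the same way.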

fileset on nim master and on destination spot
  • fileset bos.alt_disk_install.rte must be installed on nim_master.
  • its level must be exactly the same as in the destination SPOT.
    • on nim_master :
nim_master# lslpp -l | grep -i bos.alt_disk_install.rte
  bos.alt_disk_install.rte   6.1.6.1  COMMITTED  Alternate Disk Installation
  bos.alt_disk_install.rte   6.1.6.1  COMMITTED  Alternate Disk Installation

    • on the destination SPOT (in our case we upgrade from 5.3 TL12 to 6.1 TL6):
nim_master# nim -o showres 'spot_aix610TL06' | grep bos.alt_disk_install.rte
  bos.alt_disk_install.rte   6.1.6.1    A     F    Alternate Disk Installation

  • note that bos.alt_disk_install.rte is at the same level (6.1.6.1) on nim_master and in the destination SPOT. lslpp lists it twice because the fileset has both a usr and a root part.
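The level comparison can be scripted instead of reading two listings side by side. A sketch assuming the output formats shown above, where the level is the second column in both `lslpp -l` and `nim -o showres` output; `fileset_level` is a hypothetical helper name.

```shell
# fileset_level NAME  (hypothetical helper)
# Reads lslpp -l or "nim -o showres" output on stdin and prints the level
# (second column) of the first line whose first column is NAME.
fileset_level() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage on the NIM master:
#   fs=bos.alt_disk_install.rte
#   master=$(lslpp -l "$fs" | fileset_level "$fs")
#   spot=$(nim -o showres spot_aix610TL06 | fileset_level "$fs")
#   [ "$master" = "$spot" ] && echo "OK: $fs $master" || echo "MISMATCH: master=$master spot=$spot"
```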
free pv on client lpar
  • on the client LPAR, verify that enough free PVs (disks that belong to no volume group) are available to hold a copy of its rootvg:
lpar# lspv | grep -w None | awk '{print $1}'
hdisk3
hdisk4
hdisk5
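Free disks are the ones `lspv` reports with `None` in its third (volume group) column. The same filter written as a reusable function, assuming that column layout; `free_pvs` is a hypothetical name.

```shell
# free_pvs  (hypothetical helper; reads lspv output on stdin)
# Prints the hdisk names whose volume group column is "None".
free_pvs() {
    awk '$3 == "None" { print $1 }'
}

# Usage on the client LPAR:
#   lspv | free_pvs
#   bootinfo -s hdisk3     # disk size in MB, to confirm the free disks are large enough
```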

running nimadm
  • run the nimadm command from the NIM master with these arguments:
    • -j : volume group on nim_master used for the cache file systems (use lsvg).
    • -c : NIM client name of the LPAR to migrate.
    • -s : SPOT name (use lsnim -t spot).
    • -l : lpp_source name (use lsnim -t lpp_source).
    • -d : target disk(s) on the client LPAR for altinst_rootvg.
    • -Y : accepts the software license agreements.
nim_master# nimadm -j nimadmvg -c lpar -s spot_aix610TL06 -l lpp_aix610TL06 -d "hdisk3 hdisk4 hdisk5" -Y
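When the command line is retyped for each LPAR, putting each option in its own variable makes it easier to review before launching a long migration. A sketch with hypothetical function and variable names; it uses only the flags listed above, and it only prints the nimadm command unless RUN=1 is set.

```shell
# nimadm_cmd  (hypothetical wrapper; prints the command, runs it only if RUN=1)
# Variables set by the caller:
#   CACHEVG  cache VG on the NIM master  -> -j
#   CLIENT   NIM client name             -> -c
#   SPOT     SPOT resource name          -> -s
#   LPP      lpp_source name             -> -l
#   DISKS    target disks on the client  -> -d
nimadm_cmd() {
    for v in CACHEVG CLIENT SPOT LPP DISKS; do
        eval "val=\${$v:-}"
        [ -n "$val" ] || { echo "nimadm_cmd: $v is not set" >&2; return 1; }
    done
    printf 'nimadm -j %s -c %s -s %s -l %s -d "%s" -Y\n' \
        "$CACHEVG" "$CLIENT" "$SPOT" "$LPP" "$DISKS"
    if [ "${RUN:-0}" = 1 ]; then
        nimadm -j "$CACHEVG" -c "$CLIENT" -s "$SPOT" -l "$LPP" -d "$DISKS" -Y
    fi
}

# Usage on the NIM master:
#   CACHEVG=nimadmvg CLIENT=lpar SPOT=spot_aix610TL06 LPP=lpp_aix610TL06 \
#   DISKS="hdisk3 hdisk4 hdisk5" nimadm_cmd          # review only
#   RUN=1 CACHEVG=nimadmvg CLIENT=lpar ... nimadm_cmd  # launch
```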

  • The log file is /var/adm/ras/alt_mig/lpar_alt_mig.log (lpar being the client name).
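A migration runs for a long time, so it can be followed from another session with `tail -f` on this log. To see only the phase reached so far, a filter on the phase banners works, assuming the log contains the same `Executing nimadm phase N.` lines as the console output below; `nimadm_phase` is a hypothetical helper name.

```shell
# nimadm_phase LOGFILE  (hypothetical helper)
# Prints the number of the last "Executing nimadm phase N." banner in LOGFILE,
# or "none" if no banner has been written yet.
nimadm_phase() {
    awk '/^Executing nimadm phase [0-9]+\./ { n = $4; sub(/\.$/, "", n) }
         END { print (n == "" ? "none" : n) }' "$1"
}

# Usage on the NIM master:
#   nimadm_phase /var/adm/ras/alt_mig/lpar_alt_mig.log
```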

phase 1

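  • nim_master runs alt_disk_copy on the NIM client (see the client command in the output below); the alternate root volume group (altinst_rootvg) is created during this phase.
  • Be careful: logical volume names in the client's rootvg must be 11 characters or fewer, because each one gets the alt_ prefix and AIX limits LV names to 15 characters.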
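The LV name length can be checked on the client before starting nimadm. A minimal sketch, assuming the `lsvg -l rootvg` layout shown earlier for nimadmvg (a "vgname:" line, a header line, then one LV per line); `long_lvnames` is a hypothetical helper name.

```shell
# long_lvnames [MAX]  (hypothetical helper; reads "lsvg -l rootvg" output on stdin)
# Prints the LV names longer than MAX characters (default 11).
long_lvnames() {
    awk -v max="${1:-11}" '
        NR > 2 && NF > 0 && length($1) > max + 0 { print $1 }'   # lines 1-2: "rootvg:" and header
}

# Usage on the client LPAR (no output means every name fits):
#   lsvg -l rootvg | long_lvnames
```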
+-----------------------------------------------------------------------------+
Executing nimadm phase 1.
+-----------------------------------------------------------------------------+
Cloning altinst_rootvg on client, Phase 1.
Client alt_disk_install command: alt_disk_copy -j -M 6.1 -P1 -d "hdisk3 hdisk4 hdisk5"
Calling mkszfile to create new /image.data file.
Checking disk sizes.
Creating cloned rootvg volume group and associated logical volumes.
Creating logical volume alt_hd5.
Creating logical volume alt_paging00.
Creating logical volume alt_tivolilv.
Creating logical volume alt_hd8.
Creating logical volume alt_hd4.
Creating logical volume alt_hd2.
Creating logical volume alt_hd9var.
Creating logical volume alt_hd3.
Creating logical volume alt_hd1.
Creating logical volume alt_hd10opt.
Creating logical volume alt_devdumphd3.
Creating logical volume alt_cdc.
Creating logical volume alt_ccm63lv.
Creating logical volume alt_adminlv.
Creating logical volume alt_sinagioslv.
Creating logical volume alt_sysloadlv.
Creating logical volume alt_samlv.
Creating logical volume alt_tempolv.
Creating logical volume alt_omnilv.
Creating logical volume alt_db2v9lv.
Creating logical volume alt_orascriptlv.
Creating logical volume alt_ora1020lv.
Creating logical volume alt_orabdumplv.
Creating logical volume alt_oratarlv.
Creating logical volume alt_oranetlv.
Creating logical volume alt_oraadumplv.
Creating logical volume alt_oracdumplv.
Creating logical volume alt_oraudumplv.
Creating logical volume alt_tinalv.
Creating logical volume alt_hd11admin.
Creating /alt_inst/ file system.
Creating /alt_inst/admin file system.
Creating /alt_inst/home file system.
Creating /alt_inst/opt file system.
Creating /alt_inst/tempo file system.
Creating /alt_inst/tmp file system.
Creating /alt_inst/tools/list/admin file system.
Creating /alt_inst/tools/list/ccm63 file system.
Creating /alt_inst/tools/list/cdc file system.
Creating /alt_inst/tools/list/db2/V9.5FP8 file system.
Creating /alt_inst/tools/list/omnivision/collectnode file system.
Creating /alt_inst/tools/list/oracle/adump file system.
Creating /alt_inst/tools/list/oracle/bdump file system.
Creating /alt_inst/tools/list/oracle/cdump file system.
Creating /alt_inst/tools/list/oracle/network file system.
Creating /alt_inst/tools/list/oracle/product/10.2.0.4 file system.
Creating /alt_inst/tools/list/oracle/scripts file system.
Creating /alt_inst/tools/list/oracle/tar file system.
Creating /alt_inst/tools/list/oracle/udump file system.
Creating /alt_inst/tools/list/sam file system.
Creating /alt_inst/tools/list/sinagios file system.
Creating /alt_inst/tools/list/sysload file system.
Creating /alt_inst/tools/list/tina file system.
Creating /alt_inst/tools/list/tivoli file system.
Creating /alt_inst/usr file system.
Creating /alt_inst/var file system.
Generating a list of files
for backup and restore into the alternate file system...
Phase 1 complete.

phase 2

  • nim_master creates the cache file systems in the nimadmvg volume group. Some initial checks for the required migration disk space are performed.
+-----------------------------------------------------------------------------+
Executing nimadm phase 2.
+-----------------------------------------------------------------------------+
Creating nimadm cache file systems on volume group nimadmvg.
Checking for initial required migration space.
Creating cache file system /infdbdc1_alt/alt_inst
Creating cache file system /infdbdc1_alt/alt_inst/admin
Creating cache file system /infdbdc1_alt/alt_inst/home
Creating cache file system /infdbdc1_alt/alt_inst/opt
Creating cache file system /infdbdc1_alt/alt_inst/tempo
Creating cache file system /infdbdc1_alt/alt_inst/tmp
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/admin
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/ccm63
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/cdc
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/db2/V9.5FP8
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/omnivision/collectnode
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/adump
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/bdump
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/cdump
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/network
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/product/10.2.0.4
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/scripts
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/tar
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/udump
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/sam
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/sinagios
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/sysload
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/tina
Creating cache file system /infdbdc1_alt/alt_inst/tools/list/tivoli
Creating cache file system /infdbdc1_alt/alt_inst/usr
Creating cache file system /infdbdc1_alt/alt_inst/var

phase 3

  • nim_master copies the client's data into the cache file systems in the nimadmvg volume group.
+-----------------------------------------------------------------------------+
Executing nimadm phase 3.
+-----------------------------------------------------------------------------+
Syncing client data to cache ...

phase 4

  • if a pre-migration script resource has been specified, it is executed at this time.
+-----------------------------------------------------------------------------+
Executing nimadm phase 4.
+-----------------------------------------------------------------------------+
nimadm: There is no user customization script specified for this phase.

phase 5

  • system configuration files are saved. Initial migration space is calculated and appropriate file system expansions are made. The bos image is restored and the device database is merged (similar to a conventional migration). All of the migration merge methods are executed, and some miscellaneous processing takes place.
+-----------------------------------------------------------------------------+
Executing nimadm phase 5.
+-----------------------------------------------------------------------------+
Saving system configuration files.
Checking for initial required migration space.
Setting up for base operating system restore.
/infdbdc1_alt/alt_inst
Restoring base operating system.
Merging system configuration files.
Running migration merge method: ODM_merge Config_Rules.
Running migration merge method: ODM_merge SRCextmeth.
Running migration merge method: ODM_merge SRCsubsys.
Running migration merge method: ODM_merge SWservAt.
Running migration merge method: ODM_merge pse.conf.
Running migration merge method: ODM_merge vfs.
Running migration merge method: ODM_merge xtiso.conf.
Running migration merge method: ODM_merge PdAtXtd.
Running migration merge method: ODM_merge PdDv.
Running migration merge method: convert_errnotify.
Running migration merge method: passwd_mig.
Running migration merge method: login_mig.
Running migration merge method: user_mrg.
Running migration merge method: secur_mig.
Running migration merge method: RoleMerge.
Running migration merge method: methods_mig.
Running migration merge method: mkusr_mig.
Running migration merge method: group_mig.
Running migration merge method: ldapcfg_mig.
Running migration merge method: ldapmap_mig.
Running migration merge method: convert_errlog.
Running migration merge method: ODM_merge GAI.
Running migration merge method: ODM_merge PdAt.
Running migration merge method: merge_smit_db.
Running migration merge method: ODM_merge fix.
Running migration merge method: merge_swvpds.
Running migration merge method: SysckMerge.

phase 6

  • all system filesets are migrated using installp. Any required RPM images are also installed during this phase.
+-----------------------------------------------------------------------------+
Executing nimadm phase 6.
+-----------------------------------------------------------------------------+
Installing and migrating software.
Updating install utilities.
[..]
+-----------------------------------------------------------------------------+
                         Installing Software...
+-----------------------------------------------------------------------------+

installp: APPLYING software for:
        bos.rte.mlslib 6.1.6.0


. . . . . << Copyright notice for bos >> . . . . . . .
 Licensed Materials - Property of IBM

 5765G6200
   Copyright International Business Machines Corp. 1985, 2010.
   Copyright AT&T 1984, 1985, 1986, 1987, 1988, 1989.
   Copyright Regents of the University of California 1980, 1982, 1983, 1985, 1986, 1987, 1988, 1989.
   Copyright BULL 1993, 2010.
   Copyright Digi International Inc. 1988-1993.
   Copyright Interactive Systems Corporation 1985, 1991.
   Copyright ISQUARE, Inc. 1990.
   Copyright Innovative Security Systems, Inc. 2001-2006.
   Copyright Mentat Inc. 1990, 1991.
   Copyright Open Software Foundation, Inc. 1989, 1994.
   Copyright Sun Microsystems, Inc. 1984, 1985, 1986, 1987, 1988, 1991.

 All rights reserved.
 US Government Users Restricted Rights - Use, duplication or disclosure
 restricted by GSA ADP Schedule Contract with IBM Corp.
. . . . . << End of copyright notice for bos >>. . . .

Filesets processed:  1 of 2  (Total time:  3 secs).

installp: APPLYING software for:
        X11.compat.lib.X11R6_motif 6.1.6.0


. . . . . << Copyright notice for X11.compat >> . . . . . . .
 Licensed Materials - Property of IBM

 5765G6200
   Copyright International Business Machines Corp. 2007, 2010.
   Copyright Massachusetts Institute of Technology, 1985, 1994.

 All rights reserved.
 US Government Users Restricted Rights - Use, duplication or disclosure
 restricted by GSA ADP Schedule Contract with IBM Corp.
. . . . . << End of copyright notice for X11.compat >>. . . .

Finished processing all filesets.  (Total time:  8 secs).
[..]

phase 7

  • if a post-migration script resource has been specified, it is executed at this time.
+-----------------------------------------------------------------------------+
Executing nimadm phase 7.
+-----------------------------------------------------------------------------+
nimadm: There is no user customization script specified for this phase.

phase 8

  • the bosboot command is run to create a client boot image, which is written to the client's alternate boot logical volume (alt_hd5).
+-----------------------------------------------------------------------------+
Executing nimadm phase 8.
+-----------------------------------------------------------------------------+
Creating client boot image.
bosboot: Boot image is 45273 512 byte blocks.
Writing boot image to client's alternate boot disk hdisk3.

phase 9

  • all the migrated data is now copied from the NIM master's local cache file systems and synced to the client's alternate rootvg via rsh.
+-----------------------------------------------------------------------------+
Executing nimadm phase 9.
+-----------------------------------------------------------------------------+
Adjusting client file system sizes ...
Adjusting size for /
Adjusting size for /admin
Adjusting size for /home
Adjusting size for /opt
Adjusting size for /tempo
Adjusting size for /tmp
Adjusting size for /tools/list/admin
Adjusting size for /tools/list/ccm63
Adjusting size for /tools/list/cdc
Adjusting size for /tools/list/db2/V9.5FP8
Adjusting size for /tools/list/omnivision/collectnode
Adjusting size for /tools/list/oracle/adump
Adjusting size for /tools/list/oracle/bdump
Adjusting size for /tools/list/oracle/cdump
Adjusting size for /tools/list/oracle/network
Adjusting size for /tools/list/oracle/product/10.2.0.4
Adjusting size for /tools/list/oracle/scripts
Adjusting size for /tools/list/oracle/tar
Adjusting size for /tools/list/oracle/udump
Adjusting size for /tools/list/sam
Adjusting size for /tools/list/sinagios
Adjusting size for /tools/list/sysload
Adjusting size for /tools/list/tina
Adjusting size for /tools/list/tivoli
Adjusting size for /usr
Adjusting size for /var
Syncing cache data to client ...

phase 10

  • the NIM master cleans up and removes the local cache file systems.
+-----------------------------------------------------------------------------+
Executing nimadm phase 10.
+-----------------------------------------------------------------------------+
Unmounting client mounts on the NIM master.
forced unmount of /infdbdc1_alt/alt_inst/var
forced unmount of /infdbdc1_alt/alt_inst/usr
forced unmount of /infdbdc1_alt/alt_inst/tools/list/tivoli
forced unmount of /infdbdc1_alt/alt_inst/tools/list/tina
forced unmount of /infdbdc1_alt/alt_inst/tools/list/sysload
forced unmount of /infdbdc1_alt/alt_inst/tools/list/sinagios
forced unmount of /infdbdc1_alt/alt_inst/tools/list/sam
forced unmount of /infdbdc1_alt/alt_inst/tools/list/oracle/udump
forced unmount of /infdbdc1_alt/alt_inst/tools/list/oracle/tar
forced unmount of /infdbdc1_alt/alt_inst/tools/list/oracle/scripts
forced unmount of /infdbdc1_alt/alt_inst/tools/list/oracle/product/10.2.0.4
forced unmount of /infdbdc1_alt/alt_inst/tools/list/oracle/network
forced unmount of /infdbdc1_alt/alt_inst/tools/list/oracle/cdump
forced unmount of /infdbdc1_alt/alt_inst/tools/list/oracle/bdump
forced unmount of /infdbdc1_alt/alt_inst/tools/list/oracle/adump
forced unmount of /infdbdc1_alt/alt_inst/tools/list/omnivision/collectnode
forced unmount of /infdbdc1_alt/alt_inst/tools/list/db2/V9.5FP8
forced unmount of /infdbdc1_alt/alt_inst/tools/list/cdc
forced unmount of /infdbdc1_alt/alt_inst/tools/list/ccm63
forced unmount of /infdbdc1_alt/alt_inst/tools/list/admin
forced unmount of /infdbdc1_alt/alt_inst/tmp
forced unmount of /infdbdc1_alt/alt_inst/tempo
forced unmount of /infdbdc1_alt/alt_inst/opt
forced unmount of /infdbdc1_alt/alt_inst/home
forced unmount of /infdbdc1_alt/alt_inst/admin
forced unmount of /infdbdc1_alt/alt_inst
Removing nimadm cache file systems.
Removing cache file system /infdbdc1_alt/alt_inst
Removing cache file system /infdbdc1_alt/alt_inst/admin
Removing cache file system /infdbdc1_alt/alt_inst/home
Removing cache file system /infdbdc1_alt/alt_inst/opt
Removing cache file system /infdbdc1_alt/alt_inst/tempo
Removing cache file system /infdbdc1_alt/alt_inst/tmp
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/admin
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/ccm63
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/cdc
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/db2/V9.5FP8
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/omnivision/collectnode
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/adump
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/bdump
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/cdump
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/network
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/product/10.2.0.4
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/scripts
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/tar
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/oracle/udump
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/sam
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/sinagios
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/sysload
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/tina
Removing cache file system /infdbdc1_alt/alt_inst/tools/list/tivoli
Removing cache file system /infdbdc1_alt/alt_inst/usr
Removing cache file system /infdbdc1_alt/alt_inst/var

phase 11

  • the alt_disk_install command is called again to make the final adjustments and put altinst_rootvg to sleep. The bootlist is set to the target disk.
+-----------------------------------------------------------------------------+
Executing nimadm phase 11.
+-----------------------------------------------------------------------------+
Cloning altinst_rootvg on client, Phase 3.
Client alt_disk_install command: alt_disk_copy -j -M 6.1 -P3 -d "hdisk3 hdisk4 hdisk5"
## Phase 3 ###################
Verifying altinst_rootvg...
Modifying ODM on cloned disk.
forced unmount of /alt_inst/var
forced unmount of /alt_inst/usr
forced unmount of /alt_inst/tools/list/tivoli
forced unmount of /alt_inst/tools/list/tina
forced unmount of /alt_inst/tools/list/sysload
forced unmount of /alt_inst/tools/list/sinagios
forced unmount of /alt_inst/tools/list/sam
forced unmount of /alt_inst/tools/list/oracle/udump
forced unmount of /alt_inst/tools/list/oracle/tar
forced unmount of /alt_inst/tools/list/oracle/scripts
forced unmount of /alt_inst/tools/list/oracle/product/10.2.0.4
forced unmount of /alt_inst/tools/list/oracle/network
forced unmount of /alt_inst/tools/list/oracle/cdump
forced unmount of /alt_inst/tools/list/oracle/bdump
forced unmount of /alt_inst/tools/list/oracle/adump
forced unmount of /alt_inst/tools/list/omnivision/collectnode
forced unmount of /alt_inst/tools/list/db2/V9.5FP8
forced unmount of /alt_inst/tools/list/cdc
forced unmount of /alt_inst/tools/list/ccm63
forced unmount of /alt_inst/tools/list/admin
forced unmount of /alt_inst/tmp
forced unmount of /alt_inst/tempo
forced unmount of /alt_inst/opt
forced unmount of /alt_inst/home
forced unmount of /alt_inst/admin
forced unmount of /alt_inst
Changing logical volume names in volume group descriptor area.
Fixing LV control blocks...
Fixing file system superblocks...
Bootlist is set to the boot disk: hdisk3 blv=hd5

phase 12

  • cleanup is executed to end the migration.
+-----------------------------------------------------------------------------+
Executing nimadm phase 12.
+-----------------------------------------------------------------------------+
Cleaning up alt_disk_migration on the NIM master.
Cleaning up alt_disk_migration on client infdbdc1

ending
  • on the client LPAR, check that the bootlist is set to the altinst_rootvg disk:
lpar# lspv
hdisk0          00cf55c68810289c                    rootvg
hdisk1          00cf55c6a65524a1                    datavg              active
hdisk2          00cf55c6a655251d                    datavg              active
hdisk3          00cf55c6a6552594                    altinst_rootvg      active
hdisk4          00cf55c6a655264d                    altinst_rootvg      active
hdisk5          00cf55c6a65526f0                    altinst_rootvg      active
lpar# bootlist -o -m normal
hdisk3 blv=hd5 pathid=0
hdisk3 blv=hd5 pathid=1
hdisk3 blv=hd5 pathid=2
hdisk3 blv=hd5 pathid=3
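Before rebooting, the bootlist check can be made scriptable: every path must point to the first disk of altinst_rootvg (hdisk3 here). A sketch assuming the `bootlist -o` output format shown above; `bootlist_on` is a hypothetical helper name.

```shell
# bootlist_on DISK  (hypothetical helper; reads "bootlist -o -m normal" output on stdin)
# Succeeds only if there is at least one entry and every entry uses DISK.
bootlist_on() {
    awk -v d="$1" 'NF { n++; if ($1 != d) bad = 1 } END { exit (n == 0 || bad) }'
}

# Usage on the client LPAR:
#   bootlist -o -m normal | bootlist_on hdisk3 && echo "next boot: hdisk3"
```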

  • reboot and check oslevel :
lpar# shutdown -Fr
lpar# oslevel -s
6100-06-06-1140
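After the reboot, the first two fields of `oslevel -s` are the release and the technology level (6100-06 here). A sketch of a check that the LPAR is at least at the target release and TL, so it still passes on a later TL; `oslevel_ge` is a hypothetical helper name.

```shell
# oslevel_ge LEVEL TARGET  (hypothetical helper)
# LEVEL:  full "oslevel -s" string, e.g. 6100-06-06-1140
# TARGET: minimum "release-TL", e.g. 6100-06
# Succeeds when LEVEL is at or above TARGET.
oslevel_ge() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "-"); split(want, w, "-")
        if (h[1] + 0 != w[1] + 0) exit !(h[1] + 0 > w[1] + 0)   # different release
        exit !(h[2] + 0 >= w[2] + 0)                            # same release: compare TL
    }'
}

# Usage on the client LPAR:
#   oslevel_ge "$(oslevel -s)" 6100-06 && echo "migration OK"
```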



AIX PowerHA Node DOWN Do not plug both adapters on same network switch
id : dawg2dfqj3
category : computer
blog : unix
created : 04/18/12 - 18:13:43

Problem
One of our cluster nodes went down. The shutdown was not caused by a human action or a human error.
Analysis
  • here are the errpt entries recorded around the failure. Three of them explain the problem: EC0BCCD4, 9DEC29E1 and C69F5C9B.
# errpt | more
F3931284   0412224712 I H ent1           ETHERNET NETWORK RECOVERY MODE
C69F5C9B   0412224612 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
6D19271E   0412224612 I O topsvcs        Topology Services daemon stopped
AA8AB241   0412224612 T O OPERATOR       OPERATOR NOTIFICATION
BC3BE5A3   0412224612 P S SRC            SOFTWARE PROGRAM ERROR
BC3BE5A3   0412224612 P S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0412224612 I S SRC            SOFTWARE PROGRAM ERROR
12081DC6   0412224612 P S haemd          SOFTWARE PROGRAM ERROR
9DEC29E1   0412224612 P O grpsvcs        Group Services daemon exit to merge doma
F3931284   0412224612 I H ent0           ETHERNET NETWORK RECOVERY MODE
173C787F   0412224012 I S topsvcs        Possible malfunction on local adapter
173C787F   0412224012 I S topsvcs        Possible malfunction on local adapter
EC0BCCD4   0412223912 T H ent0           ETHERNET DOWN
EC0BCCD4   0412223912 T H ent1           ETHERNET DOWN
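errpt lists the newest entries first, so this listing reads from the bottom up. To keep only the identifiers of interest, in chronological order, a small awk filter is enough. A sketch assuming the summary layout shown above; `errpt_chrono` is a hypothetical helper name.

```shell
# errpt_chrono ID...  (hypothetical helper; reads errpt summary output on stdin)
# Keeps only lines whose IDENTIFIER is one of the given IDs and prints them
# oldest first (errpt itself prints newest first).
errpt_chrono() {
    awk -v ids="$*" '
        BEGIN { n = split(ids, list, " "); for (i = 1; i <= n; i++) want[list[i]] = 1 }
        ($1 in want) { line[++c] = $0 }
        END { for (i = c; i >= 1; i--) print line[i] }'
}

# Usage:
#   errpt | errpt_chrono EC0BCCD4 9DEC29E1 C69F5C9B
```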

    • chronologically (errpt lists the newest entries first), the first two are ETHERNET DOWN (EC0BCCD4), on ent0 and ent1.
    • the next of the three is Group Services daemon exit to merge domains (9DEC29E1).
    • the last one is SOFTWARE PROGRAM ABNORMALLY TERMINATED (C69F5C9B).
  • this last entry is in fact a core dump (CORE_DUMP) of the cluster manager process (clstrmgr):
# errpt -a -j C69F5C9B | more
---------------------------------------------------------------------------
LABEL:          CORE_DUMP
IDENTIFIER:     C69F5C9B

Date/Time:       Thu Apr 12 22:46:59 2012
Sequence Number: 203279
Machine Id:      00C8A6104C00
Node Id:         proas6c2
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
              1962144
FILE SYSTEM SERIAL NUMBER
           4
INODE NUMBER
           0         396
CORE FILE NAME
/var/hacmp/core
PROGRAM NAME
clstrmgr
STACK EXECUTION DISABLED
           0
COME FROM ADDRESS REGISTER

PROCESSOR ID
  hw_fru_id: N/A
  hw_cpu_id: N/A

ADDITIONAL INFORMATION
pthread_k 88
??
_p_raise 8C
raise 30
abort B8
die__Fi 5A8
announcem 330
kill_grp_ 158
ha_gs_dis 2E24
ha_gs_dis 50
DoMainLoo 768
main 804
__start 9C

Symptom Data
REPORTABLE
1
INTERNAL ERROR
1
SYMPTOM CODE
PIDS/5765E6200 LVLS/520 PCSS/SPI2 FLDS/clstrmgr SIG/6 FLDS/die__Fi VALU/5a8

  • Look at PROGRAM NAME: clstrmgr is the faulty process.
  • Then look at the clstrmgr log file (/var/hacmp/log/clstrmgr.debug; clstrmgr.debug.1 is the previous, rotated copy):
# cd /var/hacmp/log
# tail -5 clstrmgr.debug.1
Thu Apr 12 22:46:57 announcementCb: GsToken 2, AdapterToken 3, rm_GsToken 1
Thu Apr 12 22:46:57 announcementCb: GRPSVCS announcment code=512; exiting
Thu Apr 12 22:46:57 CHECK FOR FAILURE OF RSCT SUBSYSTEMS (topsvcs or grpsvcs)
Thu Apr 12 22:46:57 die: clstrmgr on node 2 is exiting with code 4
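On a node with several rotated debug files, the exit messages can be pulled out directly instead of paging through each file. A sketch assuming the `die: clstrmgr on node N is exiting with code C` wording shown above; `clstrmgr_exits` is a hypothetical helper name.

```shell
# clstrmgr_exits FILE...  (hypothetical helper)
# Prints "<timestamp> node N exit code C" for each clstrmgr exit message found.
clstrmgr_exits() {
    sed -n 's/^\(.*\) die: clstrmgr on node \([0-9][0-9]*\) is exiting with code \([0-9][0-9]*\).*/\1 node \2 exit code \3/p' "$@"
}

# Usage (current and rotated logs):
#   clstrmgr_exits /var/hacmp/log/clstrmgr.debug*
```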

  • The last two lines of clstrmgr.debug.1 are the most interesting:
    • they explain why clstrmgr dumped core: an RSCT subsystem was down, in this case grpsvcs (Group Services).
    • clstrmgr is exiting (with code 4).
  • For a deeper look, check the GS_DOM_MERGE_ER entry in errpt:
# errpt -a -j 9DEC29E1 | more
---------------------------------------------------------------------------
LABEL:          GS_DOM_MERGE_ER
IDENTIFIER:     9DEC29E1

Date/Time:       Fri Apr 13 00:06:18 2012
Sequence Number: 15236
Machine Id:      00C8A6104C00
Node Id:         proas8c2
Class:           O
Type:            PERM
Resource Name:   grpsvcs

Description
Group Services daemon exit to merge domains

Probable Causes
Network between two node groups has repaired

Failure Causes
Network communication has been blocked.
Topology Services has been partitioned.

        Recommended Actions
        Check the network connection.
Check the Topology Services.
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists

Detail Data
DETECTING MODULE
RSCT,NS.C,1.107.1.49,4461
ERROR ID
6Vb0vR0O5pVD/wap09...4....................
REFERENCE CODE

DIAGNOSTIC EXPLANATION
NS::Ack(): The master requests to dissolve my domain because of the merge with other domain 1.15

  • Network communication was blocked, resulting in:
    • an exit of grpsvcs,
    • then a clstrmgr CORE_DUMP,
    • then a node halt.
If the cluster manager exits abnormally, a machine will typically halt. Most of
the time, some type of exit message is logged at the end of the clstrmgr.debug
file. The message can give you or your support representatives an idea of the
cause of the failure.
The AIX® resource controller subsystem monitors the cluster manager daemon process. If the controller detects that the Cluster Manager daemon has exited abnormally (without being shut down using the clstop command), it executes the /usr/es/sbin/cluster/utilities/clexit.rc script to halt the system. This prevents unpredictable behavior from corrupting the data on the shared disks.

  • OK, so what does /usr/es/sbin/cluster/utilities/clexit.rc contain?
[..]
# Do a sync, then a short sleep to attempt to flush the messages
# we just logged to disk, and allow background processes to complete.
# Because the secondary node will start taking over the resources
# very quickly, we can't wait indefinitely.  This node must be halted
# to avoid conflict over the resources.
sync &
sleep 2
# halt the node
[[ "$PLATFORM" = "__AIX__" ]] && halt -q
[..]

  • So is this halt -q normal?
Solution
Yes, this halt is normal PowerHA behaviour: the halt itself was not the problem, it was the reaction to one.
If you really want to change this behaviour (not recommended), you can do it with /etc/cluster/hacmp.term:
Edit the /etc/cluster/hacmp.term file to change the default action after
an abnormal exit. The clexit.rc script checks for the presence of this file
and, if the file is executable, calls it instead of halting the system
automatically.
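To audit which branch each node would take, the rule described above can be reproduced as a dry run. This is a simplified illustration written from that description, not the actual clexit.rc code; `clexit_action` is a hypothetical helper name, and it only prints the action, it never halts anything.

```shell
# clexit_action [TERMFILE]  (hypothetical, simplified illustration; NOT the real clexit.rc)
# Prints what clexit.rc would do after an abnormal clstrmgr exit, following the
# documented rule: run TERMFILE if it is executable, otherwise halt the node.
clexit_action() {
    term=${1:-/etc/cluster/hacmp.term}
    if [ -x "$term" ]; then
        echo "run $term"
    else
        echo "halt -q"
    fi
}

# Usage on each cluster node (expected: "halt -q" unless hacmp.term was customized):
#   clexit_action
```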

After checking with the network team, we found that both adapters were plugged into the same network switch. Plugging the network adapters into different switches avoids this problem.
Do not forget to run test cases before putting a PowerHA cluster into production.