Scenario:
In our two-node RAC environment, which had been working fine, the Linux team accidentally changed the ownership of the grid and oracle directories on one of the nodes. Existing connections kept working, but new connections could not be established.
We stopped CRS on the affected node with crsctl stop crs and started it again. All CRS services came up fine, but starting the instance on the affected node failed with an error.
After the permissions were changed back, the error persisted. We troubleshot and rebooted the affected server several times without success. In the end we compared the directories, set the permissions manually, and started the database instance.
Operating system: SUSE 12 SP5
Oracle version: 12.1.0.2.0 Enterprise Edition
Note: the original names and services have been changed.
Node names: srv1, srv2
Instance names: PROD1, PROD2
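The directory comparison mentioned in the scenario can be made repeatable by dumping mode, owner and group for every file under a home on each node and diffing the two listings. A minimal sketch, assuming GNU find and stat; the function names perm_manifest and compare_manifests are hypothetical helpers, not part of any Oracle tooling:

```shell
#!/bin/sh
# perm_manifest DIR: print "mode owner group path" for every entry under DIR.
# Paths are relative to DIR so listings from two nodes line up.
perm_manifest() {
    ( cd "$1" && find . -exec stat -c '%a %U %G %n' {} + | LC_ALL=C sort -k4 )
}

# compare_manifests GOOD BAD: show entries that differ between two saved
# listings. Like diff, it returns 0 when they match and 1 when they differ.
compare_manifests() {
    diff "$1" "$2"
}
```

For example, run perm_manifest against the Grid home as root on the healthy node and redirect the output to a file, do the same on the affected node, copy one file across, and pass both files to compare_manifests. Only the lines printed by diff need to be reviewed by hand.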
Error :
Alert log error:
ORA-00449: background process 'RBAL' unexpectedly terminated with error 448
ORA-27121: unable to determine size of shared memory segment
Linux-x86_64 Error: 13: Permission denied
Srvctl command error:
PRCR-1013 : Failed to start resource ora.prod.db
PRCR-1064 : Failed to start resource ora.prod.db on node srv1
Troubleshooting scenario and solution:
grid@srv1:~> srvctl start instance -i PROD2 -d prod
Invalid instance name(s): PROD2
grid@srv1:~> srvctl status database -d prod
Instance PROD1 is running on node srv2
Instance PROD2 is not running on node srv1
grid@srv1:~> srvctl start instance -i PROD2 -d prod
PRCR-1013 : Failed to start resource ora.prod.db
PRCR-1064 : Failed to start resource ora.prod.db on node srv1
CRS-5017: The resource action "ora.prod.db start" encountered the following error:
ORA-00449: background process 'RBAL' unexpectedly terminated with error 448
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/srv1/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.prod.db' on 'srv1' failed
CRS-2674: Start of 'ora.prod.db' on 'srv1' failed
oracle@srv1:/home/grid> cd
oracle@srv1:~> . oraenv
ORACLE_SID = [+ASM2] ? PROD2
The Oracle base remains unchanged with value /u01/app/oracle
oracle@srv1:~> sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Thu May 27 16:29:22 2021
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-00449: background process 'RBAL' unexpectedly terminated with error 448
- Check cluster services
grid@srv1:~> crsctl check cluster -all
**************************************************************
srv1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
srv2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
srv1:/u01/app/12.1.0/grid/crs/install # crsctl status resource -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.dg
ONLINE ONLINE srv1 STABLE
ONLINE ONLINE srv2 STABLE
ora.DATA.dg
ONLINE ONLINE srv1 STABLE
ONLINE ONLINE srv2 STABLE
ora.FRA1.dg
ONLINE ONLINE srv1 STABLE
ONLINE ONLINE srv2 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE srv1 STABLE
ONLINE ONLINE srv2 STABLE
ora.NEWFRA2.dg
ONLINE ONLINE srv1 STABLE
ONLINE ONLINE srv2 STABLE
ora.asm
ONLINE ONLINE srv1 Started,STABLE
ONLINE ONLINE srv2 Started,STABLE
ora.net1.network
ONLINE ONLINE srv1 STABLE
ONLINE ONLINE srv2 STABLE
ora.ons
ONLINE ONLINE srv1 STABLE
ONLINE ONLINE srv2 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE srv1 STABLE
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE srv2 STABLE
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE srv2 STABLE
ora.MGMTLSNR
1 ONLINE ONLINE srv2 169.254.77.49 192.16
8.0.64,STABLE
ora.cvu
1 ONLINE ONLINE srv2 STABLE
ora.mgmtdb
1 ONLINE ONLINE srv2 Open,STABLE
ora.oc4j
1 ONLINE ONLINE srv2 STABLE
ora.scan1.vip
1 ONLINE ONLINE srv1 STABLE
ora.scan2.vip
1 ONLINE ONLINE srv2 STABLE
ora.scan3.vip
1 ONLINE ONLINE srv2 STABLE
ora.srv1.vip
1 ONLINE ONLINE srv1 STABLE
ora.srv2.vip
1 ONLINE ONLINE srv2 STABLE
ora.prod.db
1 ONLINE ONLINE srv2 Open,STABLE
2 ONLINE OFFLINE Instance Shutdown,ST
ABLE
ora.prod.prodb.svc
1 ONLINE ONLINE srv2 STABLE
ora.prod.prodb_preconnect.svc
1 ONLINE OFFLINE STABLE
--------------------------------------------------------------------------------
- Followed SRDC - How to Collect Standard Information for Clusterware Startup Issues (Doc ID 2766730.1)
- Download the attached archive startUpCheck_Linux.tar.gz (the script works only on Linux)
$ cd /tmp/
$ tar -zxvf startUpCheck_Linux.tar.gz
$ chmod +x startUpCheck_Linux.{sh,py}
The archive contains two files, startUpCheck_Linux.sh and startUpCheck_Linux.py. Run the script as the root user as follows:
# ./startUpCheck_Linux.sh -n <node list> -i <private/asm interface list>
- Get the cluster_interconnect interface name
srv1:~ # /u01/app/12.1.0/grid/bin/gpnptool get 2>/dev/null | xmllint --format - | egrep 'cluster_interconnect' | awk '{print $4}'| awk -F '"' '{print $2}'
p8p2
- Ran the startup check; the checks that completed look fine.
- The script fails on SUSE 12 SP5 with Oracle RAC (see the KeyError in the output); on SUSE 15 it works well.
srv1:/tmp # ./startUpCheck_Linux.sh -n srv1 -i p8p2
Logfile location : /tmp/srv1_2021-5-27_14-38-26.log
Verifying if script is executed by root user ...PASSED
Verifying runlevel ...PASSED
Verifying HA stack status ...Traceback (most recent call last):
File "startUpCheck_Linux.py", line 2759, in <module>
HA_STACK_UP = validateHAStack()
File "startUpCheck_Linux.py", line 1062, in validateHAStack
BOOTSTRAP_LOC = BOOTSTRAP_DICT[OS_VERSION]
KeyError: '12-SP5'
srv1:/tmp # ./startUpCheck_Linux.sh -n srv1,srv2 -i p8p2
Logfile location : /tmp/srv1_2021-5-27_14-39-58.log
Verifying if script is executed by root user ...PASSED
Verifying runlevel ...PASSED
Verifying HA stack status ...Traceback (most recent call last):
File "startUpCheck_Linux.py", line 2759, in <module>
HA_STACK_UP = validateHAStack()
File "startUpCheck_Linux.py", line 1062, in validateHAStack
BOOTSTRAP_LOC = BOOTSTRAP_DICT[OS_VERSION]
KeyError: '12-SP5'
srv1:/tmp #
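The traceback shows a dictionary lookup failing for the key '12-SP5', which suggests the script has no entry for this service pack. How the script derives OS_VERSION is not shown in the output, so the following is only an assumption: on SUSE the same string normally appears in the VERSION field of /etc/os-release. A small sketch to read that field (os_release_field is a hypothetical helper name):

```shell
#!/bin/sh
# os_release_field FILE KEY: print the value of KEY from an os-release style
# file, with the surrounding double quotes removed.
os_release_field() {
    sed -n "s/^$2=//p" "$1" | tr -d '"'
}
```

Running os_release_field /etc/os-release VERSION on the affected node shows the value to compare with the key in the KeyError.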
- We first ran rootcrs.sh ourselves but got the same error. We then ran rootcrs.sh -prepatch and rootcrs.sh -postpatch, but the instance still did not start.
Location : $GI_HOME/crs/install/
rootcrs.sh -prepatch
rootcrs.sh -postpatch
srv1:/u01/app/12.1.0/grid/crs/install # ./rootcrs.sh -prepatch
Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
Oracle Clusterware active version on the cluster is [12.1.0.2.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [2919480821].
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'srv1'
CRS-2673: Attempting to stop 'ora.crsd' on 'srv1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'srv1'
CRS-2673: Attempting to stop 'ora.CRS.dg' on 'srv1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'srv1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'srv1'
CRS-2673: Attempting to stop 'ora.NEWFRA2.dg' on 'srv1'
CRS-2673: Attempting to stop 'ora.FRA1.dg' on 'srv1'
CRS-2677: Stop of 'ora.CRS.dg' on 'srv1' succeeded
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'srv1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.srv1.vip' on 'srv1'
CRS-2677: Stop of 'ora.FRA1.dg' on 'srv1' succeeded
CRS-2677: Stop of 'ora.NEWFRA2.dg' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'srv1'
CRS-2677: Stop of 'ora.DATA.dg' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'srv1'
CRS-2677: Stop of 'ora.asm' on 'srv1' succeeded
CRS-2677: Stop of 'ora.scan1.vip' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'srv2'
CRS-2677: Stop of 'ora.srv1.vip' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.srv1.vip' on 'srv2'
CRS-2676: Start of 'ora.scan1.vip' on 'srv2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'srv2'
CRS-2676: Start of 'ora.srv1.vip' on 'srv2' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'srv2' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'srv1'
CRS-2677: Stop of 'ora.ons' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'srv1'
CRS-2677: Stop of 'ora.net1.network' on 'srv1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'srv1' has completed
CRS-2677: Stop of 'ora.crsd' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'srv1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'srv1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'srv1'
CRS-2677: Stop of 'ora.storage' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'srv1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'srv1'
CRS-2673: Attempting to stop 'ora.evmd' on 'srv1'
CRS-2673: Attempting to stop 'ora.asm' on 'srv1'
CRS-2677: Stop of 'ora.mdnsd' on 'srv1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'srv1' succeeded
CRS-2677: Stop of 'ora.crf' on 'srv1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'srv1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'srv1' succeeded
CRS-2677: Stop of 'ora.asm' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'srv1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'srv1'
CRS-2677: Stop of 'ora.cssd' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'srv1'
CRS-2677: Stop of 'ora.gipcd' on 'srv1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'srv1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2021/05/28 13:15:13 CLSRSC-347: Successfully unlock /u01/app/12.1.0/grid
srv1:/u01/app/12.1.0/grid/crs/install # ./rootcrs.sh -postpatch
Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
2021/05/28 13:25:06 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
2021/05/28 13:25:33 CLSRSC-4003: Successfully patched Oracle Trace File Analyzer (TFA) Collector.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'srv1'
CRS-2672: Attempting to start 'ora.evmd' on 'srv1'
CRS-2676: Start of 'ora.mdnsd' on 'srv1' succeeded
CRS-2676: Start of 'ora.evmd' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'srv1'
CRS-2676: Start of 'ora.gpnpd' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'srv1'
CRS-2676: Start of 'ora.gipcd' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'srv1'
CRS-2676: Start of 'ora.cssdmonitor' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'srv1'
CRS-2672: Attempting to start 'ora.diskmon' on 'srv1'
CRS-2676: Start of 'ora.diskmon' on 'srv1' succeeded
CRS-2676: Start of 'ora.cssd' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'srv1'
CRS-2672: Attempting to start 'ora.ctssd' on 'srv1'
CRS-2676: Start of 'ora.ctssd' on 'srv1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'srv1'
CRS-2676: Start of 'ora.asm' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'srv1'
CRS-2676: Start of 'ora.storage' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'srv1'
CRS-2676: Start of 'ora.crf' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'srv1'
CRS-2676: Start of 'ora.crsd' on 'srv1' succeeded
CRS-6017: Processing resource auto-start for servers: srv1
CRS-2672: Attempting to start 'ora.net1.network' on 'srv1'
CRS-2676: Start of 'ora.net1.network' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.ons' on 'srv1'
CRS-2673: Attempting to stop 'ora.srv1.vip' on 'srv2'
CRS-2677: Stop of 'ora.srv1.vip' on 'srv2' succeeded
CRS-2672: Attempting to start 'ora.srv1.vip' on 'srv1'
CRS-2676: Start of 'ora.ons' on 'srv1' succeeded
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'srv2'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'srv2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'srv2'
CRS-2677: Stop of 'ora.scan1.vip' on 'srv2' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'srv1'
CRS-2676: Start of 'ora.srv1.vip' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'srv1'
CRS-2676: Start of 'ora.scan1.vip' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'srv1'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'srv1' succeeded
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.prod.db' on 'srv1'
CRS-5017: The resource action "ora.prod.db start" encountered the following error:
ORA-00449: background process 'RBAL' unexpectedly terminated with error 448
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/srv1/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.prod.db' on 'srv1' failed
CRS-2679: Attempting to clean 'ora.prod.db' on 'srv1'
CRS-2681: Clean of 'ora.prod.db' on 'srv1' succeeded
CRS-2672: Attempting to start 'ora.prod.db' on 'srv1'
CRS-5017: The resource action "ora.prod.db start" encountered the following error:
ORA-00449: background process 'RBAL' unexpectedly terminated with error 448
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/srv1/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.prod.db' on 'srv1' failed
CRS-2679: Attempting to clean 'ora.prod.db' on 'srv1'
CRS-2681: Clean of 'ora.prod.db' on 'srv1' succeeded
===== Summary of resource auto-start failures follows =====
CRS-2807: Resource 'ora.prod.db' failed to start automatically.
CRS-2807: Resource 'ora.prod.prodb_preconnect.svc' failed to start automatically.
CRS-6016: Resource auto-start has completed for server srv1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
Oracle Clusterware active version on the cluster is [12.1.0.2.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [2919480821].
PRCC-1010 : _mgmtdb was already enabled
PRCR-1002 : Resource ora.mgmtdb is already enabled
srv1:/u01/app/12.1.0/grid/crs/install #
- Checked the software component with cluvfy. Before the issue this check was passing.
grid@srv1:~> cluvfy comp software -n all
Component: crs
Node Name: srv1
/u01/app/12.1.0/grid/lib/acfslib.pm..."Permissions" did not match reference
/u01/app/12.1.0/grid/lib/okalib.pm..."Permissions" did not match reference
/u01/app/12.1.0/grid/lib/afdlib.pm..."Permissions" did not match reference
/u01/app/12.1.0/grid/bin/oerr.pl..."Permissions" did not match reference
3165 files verified
Software check failed
Verification of software was unsuccessful on all the specified nodes.
grid@srv2:~>
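The report above flags four files, none of which is the bin/oracle binary that was fixed in the end, so it is worth pulling the flagged paths out of the report and checking them separately. A minimal sketch, assuming the cluvfy output was saved to a text file and that flagged lines have the format shown above; mismatched_files is a hypothetical helper name:

```shell
#!/bin/sh
# mismatched_files FILE: print the path of every file that a saved
# "cluvfy comp software" report flags as "did not match reference".
mismatched_files() {
    sed -n 's|^[[:space:]]*\(/[^"]*\)\.\.\."[^"]*" did not match reference.*$|\1|p' "$1"
}
```

The resulting list can be passed to ls -l on both nodes to compare the flagged files side by side.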
Alert log error in the PROD2 instance:
Fri May 28 16:14:05 2021
NOTE: failed to open +DATA/prod/PARAMETERFILE/spfile.289.1051537879
ORA-01034: ORACLE not available
ORA-27121: unable to determine size of shared memory segment
Linux-x86_64 Error: 13: Permission denied
################################################
- Tried the steps in "To check and fix file permissions on Grid Infrastructure environment" (Doc ID 1931142.1), but without success.
grid@srv1:~>
srv1:/u01/app/12.1.0/grid/crs/install # ./rootcrs.sh
Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
2021/05/31 08:20:35 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.
2021/05/31 08:22:21 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.
2021/05/31 08:22:21 CLSRSC-456: The Oracle Grid Infrastructure has already been configured.
srv1:/u01/app/12.1.0/grid/crs/install #
srv1:/u01/app/12.1.0/grid/crs/install # ./rootcrs.sh -init
Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
grid@srv1:~> crsctl check cluster -all
**************************************************************
srv1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
srv1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
grid@srv1:~> date
Mon May 31 08:49:08 CEST 2021
grid@srv1:~>
- Finally, after verifying the permissions manually and running chmod 6751 on the oracle binary on the affected node, the instance started.
ls -las $ORACLE_HOME/bin/oracle    # RDBMS home
ls -las $ORACLE_GRID/bin/oracle    # Grid home
chmod 6751 /u01/app/12.1.0/grid/bin/oracle
srv1
grid@srv1:~> ls -las /u01/app/oracle/product/12.1.0/dbhome_2/bin/oracle
316688 -rwsr-s--x 1 oracle asmadmin 324287048 Sep 28 2019 /u01/app/oracle/product/12.1.0/dbhome_2/bin/oracle
grid@srv1:~> ls -las /u01/app/12.1.0/grid/bin/oracle
285260 -rwxr-x--x 1 grid oinstall 292103400 Sep 22 2019 /u01/app/12.1.0/grid/bin/oracle
srv2
grid@srv2:~> ls -las /u01/app/oracle/product/12.1.0/dbhome_2/bin/oracle
316688 -rwsr-s--x 1 oracle asmadmin 324287048 Sep 28 2019 /u01/app/oracle/product/12.1.0/dbhome_2/bin/oracle
grid@srv2:~> ls -las /u01/app/12.1.0/grid/bin/oracle
285260 -rwsr-s--x 1 grid oinstall 292103400 Sep 22 2019 /u01/app/12.1.0/grid/bin/oracle
chmod 6751 /u01/app/12.1.0/grid/bin/oracle
chmod 6751 /u01/app/oracle/product/12.1.0/dbhome_2/bin/oracle
- We stopped CRS and started it again on the affected node, and the instance started.
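In the listings above, the healthy binaries show -rwsr-s--x, which is octal 6751 (setuid and setgid on top of 751), while the Grid home binary listed under srv1 shows -rwxr-x--x (751). A likely explanation, offered here as an assumption rather than a confirmed diagnosis, is that without the setuid bit the process no longer runs as the owner of the binary and cannot attach to the ASM shared memory segment, which would fit ORA-27121 with Error 13. The check can be scripted as follows; check_mode is a hypothetical helper and assumes GNU stat:

```shell
#!/bin/sh
# check_mode FILE EXPECTED: compare the octal mode of FILE, as printed by
# GNU stat without a leading zero, against EXPECTED (for example 6751).
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read.
check_mode() {
    actual=$(stat -c '%a' "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK $1 $actual"
    else
        echo "MISMATCH $1 has $actual, expected $2"
        return 1
    fi
}
```

Example usage on each node:

check_mode /u01/app/12.1.0/grid/bin/oracle 6751
check_mode /u01/app/oracle/product/12.1.0/dbhome_2/bin/oracle 6751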