Currently running NetBackup client v7.6 on Solaris 10 with a 4-node RAC (Oracle 11.2.0.3). The backup (restore) validation script is failing with errors similar to:
channel ch03: ORA-19870: error while restoring backup piece bk_10601_1_858286314
ORA-19507: failed to retrieve sequential file, handle="bk_10601_1_858286314", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received from media manager layer, error text:
   Backup file <bk_10601_1_858286314> not found in NetBackup catalog
but the bplist command shows that NetBackup does have the supposedly missing file(s).
The validation script is kicked off by OEM12 against the "clustered database" target, so the job should still run as long as any node is up.
The channel allocation in the backup scripts connects through a service, RMAN_SVC (managed with srvctl), that goes through the SCAN rather than a particular host that may be down:
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE' CONNECT='sys/pwd@RMAN_SVC';
RMAN_SVC =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.cis.ccsd.net)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = RMAN_SVC)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 1)
      )
    )
  )
The issue appears to be that a file backed up on Node1 cannot be retrieved when the restore operation runs on any other node (Node2..NodeN).
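One way to test that theory might be to tell NetBackup explicitly which client name to search under when the restore channel is allocated. The sketch below is only an assumption about how the validation script could be adapted: node1 is a placeholder for whatever client name bplist reports for the piece, and RESTORE DATABASE VALIDATE stands in for whatever the real script runs. It also assumes the master server permits the other nodes to restore backups made under that client name.

```
RUN {
  ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE' CONNECT='sys/pwd@RMAN_SVC';
  # Placeholder client name: use the name the piece is cataloged under,
  # as reported by bplist, instead of node1.
  SEND 'NB_ORA_CLIENT=node1';
  # Stand-in for the actual validation command used by the script.
  RESTORE DATABASE VALIDATE;
  RELEASE CHANNEL ch00;
}
```

If the validation then succeeds from Node2..NodeN, that would suggest the catalog lookup is keyed on the client name of whichever node the SCAN happened to route the channel to, rather than the file genuinely being missing.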