RMAN-20036 During Recovery Catalog Resync in Data Guard
- Radoslaw Kut
- Database, RMAN, Recovery Catalog, Data Guard
- 23 May, 2026
When using Oracle Data Guard together with an RMAN recovery catalog, one of the operations you may run regularly is:
RESYNC CATALOG FROM DB_UNIQUE_NAME ALL;
In a healthy configuration this command should synchronize the recovery catalog with all known DB_UNIQUE_NAME entries for the Data Guard configuration.
In my case, however, the command failed with:
ORA-20036: Invalid record order
RMAN-20036: invalid record order
It is worth mentioning that the recovery catalog registration was correct, the Data Guard sites were visible, and the RMAN connections were successful.
The actual problem was hiding elsewhere.
Environment
Oracle version : 19.30
Primary DB_UNIQUE_NAME : DBPRMY
Standby DB_UNIQUE_NAME : DBSTBY
Recovery catalog : RCATDB
Recovery catalog owner : RCAT
Backup user : RMANBKP with SYSBACKUP
Connection method : Secure External Password Store / Oracle wallet
RMAN was started from the standby host using wallet credentials:
rman target "'/@dbstby as sysbackup'" catalog /@rcatdb
The backup strategy was being moved to the physical standby database.
The Symptom
The connection succeeded:
Recovery Manager: Release 19.0.0.0.0 - Production on Sun May 17 02:48:10 2026
Version 19.30.0.0.0
connected to target database: DBSTBY (DBID=802992507)
connected to recovery catalog database
But the resync failed:
RMAN> resync catalog from db_unique_name all;
Error stack:
resync catalog from db_unique_name all;
resyncing from database with DB_UNIQUE_NAME DBPRMY
starting full resync of recovery catalog
got ORA-20036: Invalid record order
ORA-06512: at "RCAT.DBMS_RCVCAT", line 12315 during resync
retrying with snapshot controlfile
starting full resync of recovery catalog
RMAN Command Id : 2026-05-17T02:48:14
RMAN Command Id : 2026-05-17T02:48:14
RMAN Command Id : 2026-05-17T02:48:14
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of resync from db_unique_name command at 05/17/2026 02:48:19
RMAN-20036: invalid record order
At first glance, this looked like a recovery catalog metadata problem. The error came from:
RCAT.DBMS_RCVCAT
So the obvious suspects were catalog metadata, control file records, Data Guard site metadata, or an RMAN catalog bug.
But the catalog registration looked fine.
Step 1: Validate Data Guard Registration in the Catalog
Querying the recovery catalog showed both Data Guard members:
SELECT db_key,
site_key,
database_role,
db_unique_name
FROM rcat.rc_site
WHERE db_key = 174061;
Result:
DB_KEY SITE_KEY DATABASE_ROLE DB_UNIQUE_NAME
--------- ----------- ---------------- -----------------
174061 174063 PRIMARY DBPRMY
174061 174230 STANDBY DBSTBY
RMAN also showed both databases:
LIST DB_UNIQUE_NAME OF DATABASE;
Output:
List of Databases
DB Key DB Name DB ID Database Role Db_unique_name
------- ------- ----------------- --------------- ------------------
174061 DBPRMY 802992507 PRIMARY DBPRMY
174061 DBPRMY 802992507 STANDBY DBSTBY
The connect identifiers were configured explicitly:
CONFIGURE DB_UNIQUE_NAME 'DBPRMY' CONNECT IDENTIFIER 'dbprmy';
CONFIGURE DB_UNIQUE_NAME 'DBSTBY' CONNECT IDENTIFIER 'dbstby';
And verified with:
SHOW ALL FOR DB_UNIQUE_NAME 'DBPRMY';
SHOW ALL FOR DB_UNIQUE_NAME 'DBSTBY';
So Data Guard catalog metadata looked correct.
Step 2: Check V$RMAN_STATUS
The clue appeared in V$RMAN_STATUS on the primary database:
SELECT recid,
stamp,
status,
start_time,
end_time
FROM v$rman_status
WHERE status NOT IN (
'COMPLETED',
'FAILED',
'COMPLETED WITH WARNINGS',
'COMPLETED WITH ERRORS'
);
Result:
RECID STAMP STATUS START_TIME END_TIME
--------- ------------- ---------- ------------- ------------
178197 1232732813 RUNNING 08-MAY-26 17-MAY-26
A job marked RUNNING since May 8, while troubleshooting happened on May 17, was a strong hint that something had been left behind.
An old RMAN process from several days earlier was still alive - a quiet zombie in the machinery.
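The check above can be turned into a small pre-flight filter. What follows is only a sketch: `flag_stale_rman_jobs` is a hypothetical helper name, and the pipe-separated `RECID|STATUS|AGE_DAYS` input format is an assumption, produced for example by the query shown in the trailing comment.

```shell
# Hypothetical helper (not from the original troubleshooting session):
# report RMAN status records still in a RUNNING state after more than
# MAX_DAYS days.
# Input on stdin: one "RECID|STATUS|AGE_DAYS" row per line (assumed format).
flag_stale_rman_jobs() {
  local max_days="${1:-1}"
  awk -F'|' -v max="$max_days" '
    # STATUS can be RUNNING, RUNNING WITH WARNINGS or RUNNING WITH ERRORS.
    $2 ~ /^RUNNING/ && ($3 + 0) > (max + 0) {
      printf "stale RMAN job: recid=%s age_days=%s\n", $1, $3
    }
  '
}

# Possible producer for the rows (assumes OS authentication as SYSDBA):
#   sqlplus -s / as sysdba <<'SQL' | flag_stale_rman_jobs 1
#   SET HEADING OFF FEEDBACK OFF PAGESIZE 0
#   SELECT recid || '|' || status || '|' || TRUNC(SYSDATE - start_time)
#   FROM   v$rman_status
#   WHERE  status LIKE 'RUNNING%';
#   SQL
```

Any line it prints is a candidate for the investigation in the next step. The one-day threshold is arbitrary and should be longer than the longest legitimate backup or restore window.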
Step 3: Find the Old RMAN Process
On the operating system:
ps -ef | grep rman
Returned:
oracle 2030922 3879325 0 May08 pts/5 00:00:02 rman
The process tree showed child Oracle server processes:
pstree -ap 2030922
Output:
rman,2030922
├─oracle_2030929_,2030929 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
├─oracle_2031225_,2031225 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
├─oracle_2037543_,2037543 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
├─oracle_2037545_,2037545 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
├─oracle_2037547_,2037547 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
├─oracle_2037549_,2037549 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
├─oracle_2037557_,2037557 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
├─oracle_2037559_,2037559 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
├─oracle_2037563_,2037563 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
└─oracle_2037565_,2037565 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
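Spotting such a process by eye in `ps -ef` output is easy to miss on a busy host. A minimal sketch of an age-based filter follows. `old_rman_clients` is a hypothetical name, and the input format assumes the procps `ps` found on Linux, where `etimes` reports elapsed seconds.

```shell
# Hypothetical helper: print "PID AGE_DAYS" for rman client processes that
# have been running for at least MIN_DAYS days.
# Input on stdin: "PID ELAPSED_SECONDS COMMAND" rows, e.g. from
#   ps -eo pid=,etimes=,comm=        (procps on Linux; an assumption)
old_rman_clients() {
  local min_days="${1:-1}"
  awk -v min="$min_days" '
    $3 == "rman" && ($2 + 0) >= min * 86400 {
      printf "%s %d\n", $1, int($2 / 86400)
    }
  '
}

# Example usage:
#   ps -eo pid=,etimes=,comm= | old_rman_clients 1
```

The filter only reports; deciding whether a listed process is really abandoned still needs a human look at what it is doing.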
Then database sessions confirmed the same picture:
SELECT s.sid,
s.serial#,
s.username,
s.status,
s.program,
s.module,
s.client_info,
s.logon_time,
p.spid
FROM v$session s
JOIN v$process p ON p.addr = s.paddr
WHERE s.program LIKE '%rman%'
OR s.module LIKE '%rman%'
OR s.client_info LIKE '%rman%'
-- s.process holds the client OS PID; p.spid is the server process PID
OR s.process = '2030922';
SID SERIAL# USERNAME STATUS PROGRAM MODULE CLIENT_INFO LOGON_TIME SPID
_______ __________ ___________ ___________ _________________________________________ _________________________________________ __________________________ _____________ __________
1175 39329 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) rman@rac1.ora.host.local (TNS V1-V3) 08-MAY-26 2030929
589 51504 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) rman@rac1.ora.host.local (TNS V1-V3) 08-MAY-26 2031225
1400 34838 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) restore archivelog rman channel=ORA_DISK_1 08-MAY-26 2037543
1515 36580 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) restore archivelog rman channel=ORA_DISK_2 08-MAY-26 2037545
18 58055 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) restore archivelog rman channel=ORA_DISK_3 08-MAY-26 2037547
134 41345 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) restore archivelog rman channel=ORA_DISK_4 08-MAY-26 2037549
254 15343 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) restore archivelog rman channel=ORA_DISK_5 08-MAY-26 2037557
363 21269 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) restore archivelog rman channel=ORA_DISK_6 08-MAY-26 2037559
473 24328 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) restore archivelog rman channel=ORA_DISK_7 08-MAY-26 2037563
598 39225 SYS INACTIVE rman@rac1.ora.host.local (TNS V1-V3) restore archivelog rman channel=ORA_DISK_8 08-MAY-26 2037565
The sessions were old and inactive, with module information showing:
restore archivelog
and RMAN channels such as:
rman channel=ORA_DISK_1
rman channel=ORA_DISK_2
...
rman channel=ORA_DISK_8
The module value restore archivelog suggests an old RMAN restore operation that had never fully finished.
RC_RMAN_STATUS stores historical RMAN operation records and does not represent current RMAN sessions. However, RESYNC CATALOG reads metadata from the target control file. In this case V$RMAN_STATUS, which reflects RMAN status records from the control file, still showed an old operation as RUNNING. The still-existing RMAN client process and its server sessions likely left the control file RMAN status hierarchy in a state that DBMS_RCVCAT could not process during full Data Guard resync, resulting in RMAN-20036: invalid record order.
Step 4: Kill the Old RMAN Client Gracefully
The cleanest first step was not to kill database sessions directly, but to terminate the RMAN client process:
kill -15 2030922
After a few seconds:
ps -fp 2030922
pstree -ap 2030922
The process disappeared. No kill -9 was required.
If the process had survived, the next escalation would have been:
kill -9 2030922
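The TERM-then-KILL escalation described above could be wrapped in a small function so that the forceful signal is sent only after a grace period. This is a sketch under assumptions: `stop_rman_client` is a hypothetical name and the 10-second default timeout is arbitrary.

```shell
# Hypothetical helper: stop a process with SIGTERM first and fall back to
# SIGKILL only if it is still alive after TIMEOUT seconds.
stop_rman_client() {
  local pid="$1" timeout="${2:-10}" waited=0

  # Nothing to do if the process is already gone.
  kill -0 "$pid" 2>/dev/null || { echo "pid $pid is not running"; return 0; }

  kill -15 "$pid" 2>/dev/null          # graceful request first
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -9 "$pid" 2>/dev/null       # escalation of last resort
      echo "pid $pid did not exit after ${timeout}s, sent SIGKILL"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "pid $pid exited after SIGTERM"
}

# Example usage:
#   stop_rman_client 2030922 10
```

Giving the client time to exit on SIGTERM lets it close its server sessions itself, which is why the forceful signal is kept as a fallback.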
If database sessions had remained after killing the client, they could be removed with:
ALTER SYSTEM KILL SESSION 'sid,serial#' IMMEDIATE;
or in RAC:
ALTER SYSTEM KILL SESSION 'sid,serial#,@inst_id' IMMEDIATE;
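With eight channel sessions plus two control sessions, typing these statements by hand is error-prone. A possible approach is to generate them from the SID and SERIAL# columns for review before running anything. `gen_kill_statements` is a hypothetical helper, and the whitespace-separated `SID SERIAL# [INST_ID]` input format is an assumption.

```shell
# Hypothetical helper: turn "SID SERIAL# [INST_ID]" rows into KILL SESSION
# statements. It only prints SQL for review; nothing is executed.
gen_kill_statements() {
  awk -v q="'" '
    # Single instance: sid,serial#
    NF == 2 { printf "ALTER SYSTEM KILL SESSION %s%s,%s%s IMMEDIATE;\n", q, $1, $2, q }
    # RAC: sid,serial#,@inst_id
    NF >= 3 { printf "ALTER SYSTEM KILL SESSION %s%s,%s,@%s%s IMMEDIATE;\n", q, $1, $2, $3, q }
  '
}

# Example usage:
#   printf '1175 39329\n589 51504\n' | gen_kill_statements
```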
In this case, graceful termination was enough.
Another option is described in the Oracle note "Resync Catalog fails with “ORA-20036: Invalid Record Order” during “RMANSTATUSRESYNC”" (KB145166).
Step 5: Retest the Catalog Resync
After killing the old RMAN process, the same RMAN command was tested again from the standby:
rman target "'/@dbstby as sysbackup'" catalog /@rcatdb
RMAN connected successfully:
Recovery Manager: Release 19.0.0.0.0 - Production on Sun May 17 13:42:46 2026
Version 19.30.0.0.0
connected to target database: DBSTBY (DBID=802992507)
connected to recovery catalog database
Then:
RESYNC CATALOG FROM DB_UNIQUE_NAME ALL;
This time it worked:
resyncing from database with DB_UNIQUE_NAME DBPRMY
starting full resync of recovery catalog
full resync complete
full resync from standby disabled, attempting a partial resync
starting resync of recovery catalog
resync complete
Problem solved.
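If the resync runs from a scheduled job, the same verdict can be reached by scanning the RMAN log for error codes. The sketch below assumes the session output was written to a file, for example via RMAN's `log` command-line option. `rman_error_codes` and `resync_clean` are hypothetical helper names.

```shell
# Hypothetical helper: list the distinct ORA-/RMAN- codes found in an RMAN log.
rman_error_codes() {
  LC_ALL=C grep -oE '(ORA|RMAN)-[0-9]{5}' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: succeed only when the log contains no such codes.
resync_clean() {
  [ -z "$(rman_error_codes "$1")" ]
}

# Example usage (the log path is an assumption):
#   resync_clean /tmp/resync.log || rman_error_codes /tmp/resync.log
```

Treating every ORA- or RMAN- code as a failure is deliberately strict; a real job might want to allow known harmless messages.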
Summary
Sometimes RMAN-20036 looks like catalog corruption, but the problem may be much more physical: a forgotten RMAN process still sitting on the host.
Before rebuilding metadata, unregistering databases, or blaming the catalog, check for old RMAN sessions.
Sometimes the most effective fix is simply:
kill -15 <old_rman_pid>
In this case, the ghost left politely. The catalog resynced. The backup could continue.