RMAN-20036 During Recovery Catalog Resync in Data Guard

Radoslaw Kut
Database, RMAN, Recovery Catalog, Data Guard
23 May, 2026

When using Oracle Data Guard together with an RMAN recovery catalog, one of the operations you may run regularly is:

RESYNC CATALOG FROM DB_UNIQUE_NAME ALL;

In a healthy configuration this command should synchronize the recovery catalog with all known DB_UNIQUE_NAME entries for the Data Guard configuration.

In my case, however, the command failed with:

ORA-20036: Invalid record order
RMAN-20036: invalid record order

It is worth mentioning that the recovery catalog registration was correct, the Data Guard sites were visible, and RMAN connections were successful.
The actual problem was hiding elsewhere.

Environment

Oracle version          : 19.30
Primary DB_UNIQUE_NAME  : DBPRMY
Standby DB_UNIQUE_NAME  : DBSTBY
Recovery catalog        : RCATDB
Recovery catalog owner  : RCAT
Backup user             : RMANBKP with SYSBACKUP
Connection method       : Secure External Password Store / Oracle wallet

RMAN was started from the standby host using wallet credentials:

rman target "'/@dbstby as sysbackup'" catalog /@rcatdb

The backup strategy was being moved to the physical standby database.

The Symptom

The connection succeeded:

Recovery Manager: Release 19.0.0.0.0 - Production on Sun May 17 02:48:10 2026
Version 19.30.0.0.0

connected to target database: DBSTBY (DBID=802992507)
connected to recovery catalog database

But the resync failed:

RMAN> resync catalog from db_unique_name all;

Error stack:

resync catalog from db_unique_name all;

resyncing from database with DB_UNIQUE_NAME DBPRMY
starting full resync of recovery catalog
got ORA-20036: Invalid record order
ORA-06512: at "RCAT.DBMS_RCVCAT", line 12315 during resync
retrying with snapshot controlfile
starting full resync of recovery catalog
RMAN Command Id : 2026-05-17T02:48:14
RMAN Command Id : 2026-05-17T02:48:14
RMAN Command Id : 2026-05-17T02:48:14
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of resync from db_unique_name command at 05/17/2026 02:48:19
RMAN-20036: invalid record order

At first glance, this looked like a recovery catalog metadata problem. The error came from:

RCAT.DBMS_RCVCAT

So the obvious suspects were catalog metadata, control file records, Data Guard site metadata, or an RMAN catalog bug.
But the catalog registration looked fine.

Step 1: Validate Data Guard Registration in the Catalog

Querying the recovery catalog showed both Data Guard members:

SELECT db_key,
       site_key,
       database_role,
       db_unique_name
FROM   rcat.rc_site
WHERE  db_key = 174061;

Result:

   DB_KEY    SITE_KEY    DATABASE_ROLE    DB_UNIQUE_NAME
--------- ----------- ---------------- -----------------
   174061      174063 PRIMARY          DBPRMY
   174061      174230 STANDBY          DBSTBY

RMAN also showed both databases:

LIST DB_UNIQUE_NAME OF DATABASE;

Output:

List of Databases
DB Key  DB Name  DB ID            Database Role    Db_unique_name
------- ------- ----------------- ---------------  ------------------
174061  DBPRMY   802992507        PRIMARY          DBPRMY
174061  DBPRMY   802992507        STANDBY          DBSTBY

The connect identifiers were configured explicitly:

CONFIGURE DB_UNIQUE_NAME 'DBPRMY' CONNECT IDENTIFIER 'dbprmy';
CONFIGURE DB_UNIQUE_NAME 'DBSTBY' CONNECT IDENTIFIER 'dbstby';

And verified with:

SHOW ALL FOR DB_UNIQUE_NAME 'DBPRMY';
SHOW ALL FOR DB_UNIQUE_NAME 'DBSTBY';

So Data Guard catalog metadata looked correct.

Step 2: Check V$RMAN_STATUS

The clue appeared in V$RMAN_STATUS on the primary database:

SELECT recid,
       stamp,
       status,
       start_time,
       end_time
FROM   v$rman_status
WHERE  status NOT IN (
         'COMPLETED',
         'FAILED',
         'COMPLETED WITH WARNINGS',
         'COMPLETED WITH ERRORS'
       );

Result:

    RECID         STAMP     STATUS    START_TIME     END_TIME
--------- ------------- ---------- ------------- ------------
   178197    1232732813 RUNNING    08-MAY-26     17-MAY-26

A job marked RUNNING since May 8, while the troubleshooting happened on May 17, was a strong hint that something had been left behind.
Most likely an old RMAN process from several days earlier was still alive: a quiet zombie in the machinery.

Step 3: Find the Old RMAN Process

On the operating system:

ps -ef | grep rman

Returned:

oracle   2030922 3879325  0 May08 pts/5    00:00:02 rman
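The ps -ef | grep rman check works, but it also matches the grep process itself and reports only a start date (May08), not an age. Below is a minimal sketch of a tighter check. It assumes a Linux host with procps-ng ps, whose etimes column gives elapsed time in whole seconds. find_stale_rman is a hypothetical helper name, and the one-day threshold in the usage comment is an arbitrary example, not a value taken from this incident.

```shell
#!/bin/sh
# Hypothetical helper: print PIDs of RMAN client processes that have been
# running for at least $1 seconds.
# Input (stdin): lines of "PID ELAPSED_SECONDS COMMAND_NAME", as produced by
#   ps -eo pid=,etimes=,comm=        (procps-ng on Linux; an assumption)
# Matching the command name exactly keeps "grep rman" out of the result.
find_stale_rman() {
    awk -v min_age="$1" '$3 == "rman" && ($2 + 0) >= (min_age + 0) { print $1 }'
}

# Example on the database host (one day = 86400 seconds):
#   ps -eo pid=,etimes=,comm= | find_stale_rman 86400
```

Filtering on elapsed seconds rather than a printed date makes the "this job is days old" observation from V$RMAN_STATUS directly checkable on the host.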

The process tree showed child Oracle server processes:

pstree -ap 2030922

Output:

rman,2030922
  ├─oracle_2030929_,2030929 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  ├─oracle_2031225_,2031225 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  ├─oracle_2037543_,2037543 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  ├─oracle_2037545_,2037545 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  ├─oracle_2037547_,2037547 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  ├─oracle_2037549_,2037549 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  ├─oracle_2037557_,2037557 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  ├─oracle_2037559_,2037559 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  ├─oracle_2037563_,2037563 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
  └─oracle_2037565_,2037565 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

Then database sessions confirmed the same picture:

SELECT s.sid,
       s.serial#,
       s.username,
       s.status,
       s.program,
       s.module,
       s.client_info,
       s.logon_time,
       p.spid
FROM   v$session s
JOIN   v$process p ON p.addr = s.paddr
WHERE  s.program LIKE '%rman%'
   OR  s.module  LIKE '%rman%'
   OR  s.client_info LIKE '%rman%'
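       -- V$SESSION.PROCESS holds the client OS PID (the rman executable itself);
       -- V$PROCESS.SPID only holds server process PIDs such as 2030929
   OR  s.process = '2030922';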

  SID  SERIAL#  USERNAME  STATUS    PROGRAM                               MODULE                                CLIENT_INFO              LOGON_TIME  SPID
_____  _______  ________  ________  ____________________________________  ____________________________________  _______________________  __________  _______
 1175    39329  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  rman@rac1.ora.host.local (TNS V1-V3)                           08-MAY-26   2030929
  589    51504  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  rman@rac1.ora.host.local (TNS V1-V3)                           08-MAY-26   2031225
 1400    34838  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  restore archivelog                    rman channel=ORA_DISK_1  08-MAY-26   2037543
 1515    36580  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  restore archivelog                    rman channel=ORA_DISK_2  08-MAY-26   2037545
   18    58055  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  restore archivelog                    rman channel=ORA_DISK_3  08-MAY-26   2037547
  134    41345  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  restore archivelog                    rman channel=ORA_DISK_4  08-MAY-26   2037549
  254    15343  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  restore archivelog                    rman channel=ORA_DISK_5  08-MAY-26   2037557
  363    21269  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  restore archivelog                    rman channel=ORA_DISK_6  08-MAY-26   2037559
  473    24328  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  restore archivelog                    rman channel=ORA_DISK_7  08-MAY-26   2037563
  598    39225  SYS       INACTIVE  rman@rac1.ora.host.local (TNS V1-V3)  restore archivelog                    rman channel=ORA_DISK_8  08-MAY-26   2037565

The sessions were old and inactive, with module information showing:

restore archivelog

and RMAN channels such as:

rman channel=ORA_DISK_1
rman channel=ORA_DISK_2
...
rman channel=ORA_DISK_8

This pointed to an old RMAN operation, an archived log restore judging by the MODULE column, that had never fully finished.
RC_RMAN_STATUS in the recovery catalog stores historical RMAN operation records; it does not represent current RMAN sessions. RESYNC CATALOG, however, reads its metadata from the target control file, and V$RMAN_STATUS, which exposes the RMAN status records kept in that control file, still showed the old operation as RUNNING. The still-existing RMAN client process and its server sessions most likely left the control file's RMAN status hierarchy in a state that DBMS_RCVCAT could not process during the full Data Guard resync, which surfaced as RMAN-20036: invalid record order.

Step 4: Kill the Old RMAN Client Gracefully

The cleanest first step was not to kill database sessions directly, but to terminate the RMAN client process:

kill -15 2030922

After a few seconds:

ps -fp 2030922
pstree -ap 2030922

The process disappeared. No kill -9 was required.
If the process had survived, the next escalation would have been:

kill -9 2030922

If database sessions had remained after killing the client, they could be removed with:

ALTER SYSTEM KILL SESSION 'sid,serial#' IMMEDIATE;

or in RAC:

ALTER SYSTEM KILL SESSION 'sid,serial#,@inst_id' IMMEDIATE;
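When several leftover sessions have to go, typing each statement by hand is error-prone. A minimal sketch, assuming the SID and SERIAL# values (plus INST_ID from GV$SESSION on RAC) have been spooled as whitespace-separated lines: kill_session_sql is a hypothetical helper that only prints the two statement forms shown above and executes nothing.

```shell
#!/bin/sh
# Hypothetical helper: turn "SID SERIAL# [INST_ID]" lines on stdin into
# ALTER SYSTEM KILL SESSION statements. Nothing is run against the database.
kill_session_sql() {
    awk -v q="'" '
        # Skip anything that is not purely numeric (headers, separators, blanks).
        $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ { next }
        # Single instance: 'sid,serial#'
        NF == 2 { printf "ALTER SYSTEM KILL SESSION %s%s,%s%s IMMEDIATE;\n", q, $1, $2, q }
        # RAC: 'sid,serial#,@inst_id'
        NF == 3 && $3 ~ /^[0-9]+$/ {
            printf "ALTER SYSTEM KILL SESSION %s%s,%s,@%s%s IMMEDIATE;\n", q, $1, $2, $3, q
        }'
}

# Example:
#   printf '%s\n' '1175 39329' '1400 34838' | kill_session_sql
```

Printing the statements instead of executing them leaves a chance to check each SID and SERIAL# against V$SESSION before anything is killed.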

In this case, graceful termination was enough.
Another option is described in the Oracle Support note Resync Catalog fails with “ORA-20036: Invalid Record Order” during “RMANSTATUSRESYNC” (KB145166).

Step 5: Retest the Catalog Resync

After killing the old RMAN process, the same RMAN command was tested again from the standby:

rman target "'/@dbstby as sysbackup'" catalog /@rcatdb

RMAN connected successfully:

Recovery Manager: Release 19.0.0.0.0 - Production on Sun May 17 13:42:46 2026
Version 19.30.0.0.0

connected to target database: DBSTBY (DBID=802992507)
connected to recovery catalog database

Then:

RESYNC CATALOG FROM DB_UNIQUE_NAME ALL;

This time it worked:

resyncing from database with DB_UNIQUE_NAME DBPRMY
starting full resync of recovery catalog
full resync complete

full resync from standby disabled, attempting a partial resync
starting resync of recovery catalog
resync complete

Problem solved.

Summary

Sometimes RMAN-20036 looks like catalog corruption, but the problem may be much more physical: a forgotten RMAN process still sitting on the host.
Before rebuilding metadata, unregistering databases, or blaming the catalog, check for old RMAN sessions.
Sometimes the most effective fix is simply:

kill -15 <old_rman_pid>

In this case, the ghost left politely. The catalog resynced. The backup could continue.
