Monday, August 11, 2014

OGG-01154 OGG-01296 OGG-01668 Oracle GoldenGate Delivery for Oracle: PROCESS ABENDING

We have a golden gate setup; it had 10 replicate on the target side. Out of 10, 6 belong to the single group where the tables are split with the range as specified below.

Example of mapping

map BHUVAN.HONEY_JEST, TARGET BHUVAN.HONEY_JEST, KEYCOLS (CLIENT, PRODUCT_ID), FILTER (@RANGE(1,6,CLIENT, PRODUCT_ID));

Suddenly two replicate ABENDED from the group with the below error message. Both the replicates showing the same error message
========================================================
Message from the GGSCI> "view report LP1R33"
========================================================

2014-08-06 14:31:52  WARNING OGG-00869  Aborting BATCHSQL transaction. Detected inconsistent result: executed 274 operations in batch, resulting in 273 affected rows.
2014-08-06 14:31:52  WARNING OGG-01137  BATCHSQL suspended, continuing in normal mode.
2014-08-06 14:31:52  WARNING OGG-01003  Repositioning to rba 36663714 in seqno 8280.
2014-08-06 14:31:52  WARNING OGG-01004  Aborted grouped transaction on 'BHUVAN.HONEY_JEST', Database error 1403 (OCI Error ORA-01403: no data found, SQL <UPDATE "BHUVAN".
"HONEY_JEST" SET "INACT" = :a3,"CHGNR" = :a4 WHERE "MANDT" = :b0 AND "OBJNR" = :b1 AND "STAT" = :b2>).
2014-08-06 14:31:52  WARNING OGG-01003  Repositioning to rba 36663714 in seqno 8280.
2014-08-06 14:31:52  WARNING OGG-01154  SQL error 1403 mapping BHUVAN.HONEY_JEST to BHUVAN.HONEY_JEST OCI Error ORA-01403: no data found, SQL <UPDATE "BHUVAN"."HONEY_JEST" SE
T "INACT" = :a3,"CHGNR" = :a4 WHERE "MANDT" = :b0 AND "OBJNR" = :b1 AND "STAT" = :b2>.
2014-08-06 14:31:52  WARNING OGG-01003  Repositioning to rba 36663714 in seqno 8280.

Source Context :
  SourceModule            : [er.errors]
  SourceID                : [/scratch/aime1/adestore/views/aime1_adc4150267/oggcore/OpenSys/src/app/er/errors.cpp]
  SourceFunction          : [take_rep_err_action]
  SourceLine              : [632]
  ThreadBacktrace         : [12] elements
                          : [/oracle/product/ggs_1121019/GIS/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x2ae44beeb28e]]
                          : [/oracle/product/ggs_1121019/GIS/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x2ec) [0x2ae44bee3dec]]
                          : [/oracle/product/ggs_1121019/GIS/libgglog.so(_MSG_ERR_MAP_TO_TANDEM_FAILED(CSourceContext*, ggs::gglib::ggapp::CQualDBObjName<(DBObjType)
1> const&, ggs::gglib::ggapp::CQualDBObjName<(DBObjType)1> const&, CMessageFactory::MessageDisposition)+0x53) [0x2ae44bedfdc1]]
                          : [/oracle/product/ggs_1121019/GIS/replicat(take_rep_err_action(short, int, char const*, extr_ptr_def*, __std_rec_hdr*, char*, file_def*, b
ool)+0xaa7) [0x5275b1]]
                          : [/oracle/product/ggs_1121019/GIS/replicat(process_extract_loop()+0x26ce) [0x54187e]]
                          : [/oracle/product/ggs_1121019/GIS/replicat(replicat_main(int, char**)+0x759) [0x553ce9]]
                          : [/oracle/product/ggs_1121019/GIS/replicat(ggs::gglib::MultiThreading::MainThread::ExecMain()+0x4f) [0x5f240f]]
                          : [/oracle/product/ggs_1121019/GIS/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*)+
0x104) [0x5f2664]]
                          : [/oracle/product/ggs_1121019/GIS/replicat(ggs::gglib::MultiThreading::MainThread::Run(int, char**)+0x8b) [0x5f276b]]
                          : [/oracle/product/ggs_1121019/GIS/replicat(main+0x2c) [0x552e8c]]
                          : [/lib64/libc.so.6(__libc_start_main+0xf4) [0x38a481d9c4]]
                          : [/oracle/product/ggs_1121019/GIS/replicat(__gxx_personality_v0+0x342) [0x4c4caa]]
2014-08-06 14:31:52  ERROR   OGG-01296  Error mapping from BHUVAN.HONEY_JEST to BHUVAN.HONEY_JEST.
========================================================
Message from ggserr.log
========================================================
2014-08-06 14:31:52  INFO    OGG-00975  Oracle GoldenGate Manager for Oracle, mgr.prm:  REPLICAT LP1R33 starting.
2014-08-06 14:31:53  WARNING OGG-00869  Oracle GoldenGate Delivery for Oracle, LP1R33.prm:  Aborting BATCHSQL transaction. Detected inconsistent result: executed 279 operations in batch, resulting in 278 affected rows.
2014-08-06 14:31:54  WARNING OGG-01137  Oracle GoldenGate Delivery for Oracle, LP1R33.prm:  BATCHSQL suspended, continuing in normal mode.
2014-08-06 14:31:54  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, LP1R33.prm:  Repositioning to rba 36663714 in seqno 8280.
2014-08-06 14:31:54  WARNING OGG-01004  Oracle GoldenGate Delivery for Oracle, LP1R33.prm:  Aborted grouped transaction on 'BHUVAN.HONEY_JEST', Database error 1403 (OCI Error ORA-01403: no data found, SQL <UPDATE "BHUVAN"."HONEY_JEST" SET "INACT" = :a3,"CHGNR" = :a4 WHERE "MANDT" = :b0 AND "OBJNR" = :b1 AND "STAT" = :b2>).
2014-08-06 14:31:54  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, LP1R33.prm:  Repositioning to rba 36663714 in seqno 8280.
2014-08-06 14:31:54  WARNING OGG-01154  Oracle GoldenGate Delivery for Oracle, LP1R33.prm:  SQL error 1403 mapping BHUVAN.HONEY_JEST to BHUVAN.HONEY_JEST OCI Error ORA-01403: no data found, SQL <UPDATE "BHUVAN"."HONEY_JEST" SET "INACT" = :a3,"CHGNR" = :a4 WHERE "MANDT" = :b0 AND "OBJNR" = :b1 AND "STAT" = :b2>.
2014-08-06 14:31:54  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, LP1R33.prm:  Repositioning to rba 36663714 in seqno 8280.
2014-08-06 14:31:54  ERROR   OGG-01296  Oracle GoldenGate Delivery for Oracle, LP1R33.prm:  Error mapping from BHUVAN.HONEY_JEST to BHUVAN.HONEY_JEST.
2014-08-06 14:31:54  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, LP1R33.prm:  PROCESS ABENDING.

Finding
1)  Error message guides me as a possible mapping issue.
a)  It might be miss match in the structure of the table between the source & target
b)  It might be miss match with the constraints
c)  Might be missing trandata on the source
d)  Might be missing some indexes on the target
2)  Since it has thrown with data missing error message; i thought it might be the problem with missing data on the target side.
3)  I have reviewed the sourcedefs file.
4)  Reviewed the replicate param file of working replicate & non-working replicate. No difference found.
5)  Any additional or hiddle character in it;- nothing was there

Solution
Note: DONT TRY IN THE PRODUCTION ENVIRONMENT DIRECTLY; I AM GIVING IT FOR AN EDUCTIONAL PURPOSE ONLY.
Since Handle collision was not enabled before, couple of my DML statement were ignored. When a trail file trying to update a statement, it is missing in the target table; so it started to throw error and stopped the replicate.

1)  I have identified the first time when then problem occurred.
In my case it is 2014-08-06 14:31:52
2)  I have checked whether trail file are available 15 min prior to the above timing.
3)  I have include the “HANDLECOLLISIONS” parameter in the replicat.
4)  Change the start time of replicat
GGSCI (wyclorap006) 3> edit param LP1R33

è Include “HANDLECOLLISIONS”
è Alter the replicat start time
GGSCI (wyclorap006) 4> alter REPLICAT LP1R33, BEGIN 2014-08-06 14:25:00
REPLICAT altered.

è Started the replicat

GGSCI (wyclorap006) 6>  start REPLICAT LP1R33

Sending START request to MANAGER ...
REPLICAT LP1R33 starting
5)  Review the replicat process through “info LP1R33” command; i am find the RBA is changing & checkpoint is happening.
6)  Once the replicat catch up the delay; remove the “HANDLECOLLISIONS” from the replicat. You can have exception table to collect all the handle collisions.

Wednesday, August 6, 2014

ADDING NEW DISK TO THE EXISTING DISK GROUP/REBALANCING EXISTING DISK GROUP

Things to consider while adding new disk/ Rebalancing existing Disk Group

1)         Make sure the disk are visible on all the cluster nodes; Disk readiness purely depend on the Os & version;

For ex:
Linux 5 – we will be using oracleasm as utility
Linux 6 – we will be using udev utility

2)         When you are planning to add disk to the disk group on the primary database, try to do it when database is having less load.

3)         Increase the ASM power limit to 9 or 10, depending on your environment. Once the rebalancing is completed, bring it to the original value.
Parameter è asm_power_limit
To increase or decrease the asm_power_limit

SQL> alter system set asm_power_limit=9; -- you need to issue it in the ASM instance.

4)         Add disks to one disk group at a time because rebalancing might cause the slowness to the database.

5)         Have a eye on the alert log of the ASM instance. If you are using cluster databases then you need to have a eye on all the instances in the cluster.

6)         You can monitor the rebalancing on the ASM instance using the below query
select group_number, operation, state, power, actual, sofar, est_work, est_rate, est_minutes from gv$asm_operation;

7)         you can check the size of the diskgroup before & after adding the Diskgroup in the ASM.
$ asmcmd –p
ASMCMD> lsdg BHU_B_SYSTEM

8)         Messages in the alert, when the rebalancing starts & when it completes
Disk added on the Node#1 and final status updates on the Node#2

-On Node One
Wed Aug 06 17:09:43 2014
SQL> ALTER DISKGROUP BHU_B_SYSTEM ADD  DISK '/dev/mapper/BHU_B_SYSTEM_05' SIZE 51200M /* ASMCA */
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (21,4) to disk (/dev/mapper/BHU_B_SYSTEM_05)
NOTE: requesting all-instance membership refresh for group=21
NOTE: initializing header on grp 21 disk BHU_B_SYSTEM_0004
NOTE: requesting all-instance disk validation for group=21
Wed Aug 06 17:09:45 2014
NOTE: skipping rediscovery for group 21/0xdb38ead3 (BHU_B_SYSTEM) on local instance.
NOTE: requesting all-instance disk validation for group=21
NOTE: skipping rediscovery for group 21/0xdb38ead3 (BHU_B_SYSTEM) on local instance.
NOTE: initiating PST update: grp = 21
Wed Aug 06 17:09:51 2014
GMON updating group 21 at 9546 for pid 73, osid 16865
NOTE: PST update grp = 21 completed successfully
NOTE: membership refresh pending for group 21/0xdb38ead3 (BHU_B_SYSTEM)
GMON querying group 21 at 9547 for pid 18, osid 101970
NOTE: cache opening disk 4 of grp 21: BHU_B_SYSTEM_0004 path:/dev/mapper/BHU_B_SYSTEM_05
Wed Aug 06 17:09:56 2014
NOTE: Attempting voting file refresh on diskgroup BHU_B_SYSTEM
GMON querying group 21 at 9548 for pid 18, osid 101970
SUCCESS: refreshed membership for 21/0xdb38ead3 (BHU_B_SYSTEM)
Wed Aug 06 17:09:56 2014
SUCCESS: ALTER DISKGROUP BHU_B_SYSTEM ADD  DISK '/dev/mapper/BHU_B_SYSTEM_05' SIZE 51200M /* ASMCA */
NOTE: starting rebalance of group 21/0xdb38ead3 (BHU_B_SYSTEM) at power 9
Starting background process ARB0
Wed Aug 06 17:09:56 2014
ARB0 started with pid=74, OS id=91751
NOTE: assigning ARB0 to group 21/0xdb38ead3 (BHU_B_SYSTEM) with 9 parallel I/Os
NOTE: Attempting voting file refresh on diskgroup BHU_B_SYSTEM
Wed Aug 06 17:11:01 2014

-On Second One
NOTE: disk validation pending for group 21/0xdb38cf46 (BHU_B_SYSTEM)
SUCCESS: validated disks for 21/0xdb38cf46 (BHU_B_SYSTEM)
NOTE: disk validation pending for group 21/0xdb38cf46 (BHU_B_SYSTEM)
NOTE: Assigning number (21,4) to disk (/dev/mapper/BHU_B_SYSTEM_05)
SUCCESS: validated disks for 21/0xdb38cf46 (BHU_B_SYSTEM)
NOTE: membership refresh pending for group 21/0xdb38cf46 (BHU_B_SYSTEM)
Wed Aug 06 17:09:54 2014
GMON querying group 21 at 9721 for pid 18, osid 84704
NOTE: cache opening disk 4 of grp 21: BHU_B_SYSTEM_0004 path:/dev/mapper/BHU_B_SYSTEM_05
GMON querying group 21 at 9722 for pid 18, osid 84704
SUCCESS: refreshed membership for 21/0xdb38cf46 (BHU_B_SYSTEM)
Wed Aug 06 17:09:58 2014
NOTE: Attempting voting file refresh on diskgroup BHU_B_SYSTEM

Adding disk
1)         I am using asmca to add the disk to the existing Diskgroup.
2)         You need to plan what kind of redundancy you are opting for, if you are going for a new diskgroup.
3)   As soon as you added the disk in the asmca, you will be shown an output as “Disks added successfully to disk group DISK_GROUP_NAME”. You need to see it in the sql prompt(on ASM instance) to know about the rebalancing status.

Happy Learning :)