Monday, July 30, 2018

resmgr:cpu quantum and oracle 12.1.0.2 standard edition

Post upgrade to Oracle database to 12.1.* , we are facing lot of issue, specially when database version is standard edition.

There are multiple reason for that .
1. Oracle allows maximum 16 CPU threads at any time.
2. With earlier versions of oracle and with enterprise edition, we can disable resource manager but not with oracle SE2. Our application sessions get stuck on  resmgr:cpu quantum for long and it creates lot of perofrmance issues. 

To mitigate performance issue related to resource manager, we keep resource manager disabled. But with Oracle SE2 , even if we set resource manager plan null and disable all default maintenance windows, still we can see same wait event. 

This becomes even more critical when even single application query gets stuck on resource manager and when check in background , you will see CPU utilization on DB server is very less. 

Ex: 
Cpu(s):  1.9%us,  0.6%sy,  0.0%ni, 97.0%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st

Above is the cpu utilization on database sever and query was getting stuck for almost 50 minutes. 

On further analysis found that even a single core(single thread) consumes ~100% CPU that thread with get caged by instance. 

Ex: 
top - 03:38:16 up 21 days,  2:36,  1 user,  load average: 2.48, 4.06, 4.38
Tasks: 481 total,   2 running, 479 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.6%us,  0.3%sy,  0.0%ni, 90.0%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  74252448k total, 52846708k used, 21405740k free,  1178888k buffers
Swap: 58720212k total,    24576k used, 58695636k free, 45195604k cached

90% ideal


but still query waiting on cpu
from last 285 seconds
query plan is perfect
Plan hash value: 2943048428

------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                     |       |       |   393 (100)|          |
|   1 |  SORT AGGREGATE                      |                     |     1 |   450 |            |          |
|   2 |   NESTED LOOPS                       |                     |       |       |            |          |
|   3 |    NESTED LOOPS                      |                     |     1 |   450 |   393   (1)| 00:00:01 |
|   4 |     VIEW                             |                     |     1 |   408 |   390   (1)| 00:00:01 |
|   5 |      HASH UNIQUE                     |                     |     1 |    75 |   390   (1)| 00:00:01 |
|*  6 |       FILTER                         |                     |       |       |            |          |
|   7 |        NESTED LOOPS                  |                     |       |       |            |          |
|   8 |         NESTED LOOPS                 |                     |     1 |    75 |   389   (0)| 00:00:01 |
|*  9 |          HASH JOIN                   |                     |     1 |    61 |   388   (0)| 00:00:01 |
|  10 |           TABLE ACCESS BY INDEX ROWID| LOC                 |   450 |  8550 |    16   (0)| 00:00:01 |
|* 11 |            INDEX RANGE SCAN          | LOC_STORESTATUS_IDX |   450 |       |     1   (0)| 00:00:01 |
|  12 |           TABLE ACCESS BY INDEX ROWID| SOURCING            |   374 | 15708 |   372   (0)| 00:00:01 |
|* 13 |            INDEX RANGE SCAN          | XIE3SOURCING        |   374 |       |     5   (0)| 00:00:01 |
|* 14 |          INDEX UNIQUE SCAN           | ITEM_PK             |     1 |       |     0   (0)|

On further drill down Identified that one of the core was utilizing 100% CPU.

Tasks: 494 total,   3 running, 491 sleeping,   0 stopped,   0 zombie
Cpu0  :  2.3%us,  0.3%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 99.7%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  1.7%us,  0.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.7%us,  0.3%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.7%us,  0.3%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  1.0%us,  0.3%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  1.3%us,  0.0%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.3%us,  0.3%sy,  0.0%ni, 99.0%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

So any thread which tries to consume 100% CPU will get caged by resource manager. 

So to help overcome this issue , only option to run smaller queries those does not consume more CPU. Else you will be doomed. 

Note: In another environment , where we were getting same issue and process was taking 4-5 hours to complete, after increasing CPU cores from 12 to 16 and enabling huge pages helped complete process in 40 minutes.

Useful links: https://blog.dbi-services.com/oracle-standard-edition-two/

No comments:

Post a Comment