进行一次大批量插入后cpu一直很高查看等待事件 ’cursor: mutex S’的处理


1 故障现象

cpu 持续接近100%,前端应用几乎无法执行,其cpu状态如下:

CPU User% Kern% Wait% Idle% Reads 668 Rawin 0
ALL 91.4 8.6 0.0 0.0 Writes 659 Ttyout 283
Forks 0 Igets 0
Network KBPS I-Pack O-Pack KB-In KB-Out Execs 0 Namei 33
Total 265.1 735.6 656.6 143.8 121.3 Runqueue 93.5 Dirblk 0
Waitqueue 0.0
Disk Busy% KBPS TPS KB-Read KB-Writ MEMORY
Total 8.0 1630.8 17.0 20.0 1610.8 PAGING Real,MB 126464
Faults 380 % Comp 17
FileSystem KBPS TPS KB-Read KB-Writ Steals 0 % Noncomp 31
Total 3.8 9.5 3.8 0.0 PgspIn 0 % Client 31
PgspOut 0
Name PID CPU% PgSp Owner PageIn 0 PAGING SPACE
oracle 5834746 2.9 11.7 oracle PageOut 0 Size,MB 32768
oracle 5505964 2.9 11.7 oracle Sios 0 % Used 0
oracle 8061986 2.9 11.7 oracle % Free 100
oracle 6816722 2.9 11.7 oracle NFS (calls/sec)
oracle 1574850 2.9 11.7 oracle SerV2 0 WPAR Activ 0
oracle 3146246 2.9 11.7 oracle CliV2 0 WPAR Total 0
oracle 984976 2.9 11.6 oracle SerV3 0 Press: "h"-help
oracle 852794 2.9 11.6 oracle CliV3 0 "q"-quit
oracle 7209388 2.9 11.6 oracle
oracle 2557246 2.9 11.6 oracle
oracle 1770876 2.9 11.6 oracle
oracle 2687246 2.9 11.6 oracle
oracle 7669190 2.9 11.6 oracle
oracle 9503500 2.9 11.6 oracle
oracle 6423588 2.9 11.6 oracle
oracle 8324232 2.9 11.6 oracle
oracle 2228902 2.9 11.6 oracle
oracle 7078532 2.9 11.6 oracle
oracle 4719422 2.9 11.7 oracle
oracle 1377004 2.9 11.6 oracle

通过plsql 查看,得出等待的sql 一直是这条:

"INSERT INTO "HFM_ERRORLOG" ("SGUID","LLOGTYPE","DTIMESTAMP","SSERVERNAME","SAPPNAME","SXMLERROR") VALUES (:1,:2,:3,:4,:5,:6) RETURNING ROWID into :7".

 

通过以下语句查出

select sql_id,sql_text,version_count from v$sqlarea where version_count>5
order by 3 desc 

得出语句“frdqd1ydt7cu2 INSERT INTO "HFM_ERRORLOG" ("SGUID","LLOGTYPE","DTIMESTAMP","SSERVERNAME","SAPPNAME","SXMLERROR") VALUES (:1,:2,:3,:4,:5,:6) RETURNING ROWID into :7 ”的version_count值是8922,可见这条语句是不停的解析而没有共享。

 

查看一个小时内的awr的报告如下:

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
cursor: mutex S 1,227,565,156 166,211 0 77.95 Concurrency
DB CPU   50,504   23.69  
kksfbc child completion 64,124 3,345 52 1.57 Other
library cache lock 35,338 1,593 45 0.75 Concurrency
log file sync 17,543 388 22 0.18 Commit

通过以上分析,怀疑是bug,向oracle 提了SR,最终确认是遇到了Bug 10636231 - HIGH VERSION COUNT FOR INSERT STATEMENTS WITH REASON INST_DRTLD_MISMATCH。

得出应该打patch patch 10636231.,升级了opatch到OPatch patch of version 11.2.0.3.0,

地址是:

https://support.oracle.com/CSP/ui/flash.html#tab=KBHome(page=KBHome&id=()),(page=KBNavigator&id=(bmDocTitle=How%20To%20Download%20And%20Install%20The%20Latest%20OPatch%20Version&from=BOOKMARK&bmDocID=274526.1&bmDocDsrc=KB&bmDocType=HOWTO&viewingMode=1143))

然后打了patch 10636231。问题解决。

 

sr (3-5553680601)内容如下:

LIUZIJIE@CATTSOFT.COM - 2012年4月6日 下午03时12分40秒 [Close Requested By Customer]
hi
i installed it,and it is ok, now.
thank you very much

Oracle 技术支持 - 2012年4月6日 上午11时02分24秒 [Notes]
Hi,
I see that there is no interim patch applied on this Oracle home. Hence there is no possibility for conflicts. Please go ahead and download the 
patch 10636231 from MOS and install it.
Regards,
Kathirvel
Cluster Technical Coordinator (Performance)

LIUZIJIE@CATTSOFT.COM - 2012年4月6日 上午09时58分46秒 [Update from Customer]
i have upload opatch detail
thank you

LIUZIJIE@CATTSOFT.COM - 2012年4月6日 上午09时56分16秒 [Update from Customer]
Upload to gtcr successful for the file opatch.txt.

LIUZIJIE@CATTSOFT.COM - 2012年4月6日 上午09时54分20秒 [Update from Customer]
$ORACLE_HOME/OPatch/opatch lsinventory -detail
Invoking O
Patch 11.2.0.1.1
Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation. All rights reserved.
Oracle Home : /u01/app/oracle/product/11.2.0
Central Inventory : /u01/app/oraInventory
from : /etc/oraInst.loc
OPatch version : 11.2.0.1.1
OUI version : 11.2.0.2.0
OUI location : /u01/app/oracle/product/11.2.0/oui
Log file location : /u01/app/oracle/product/11.2.0/cfgtoollogs/opatch/opatch2012-04-06_09-53-14AM.log
Patch history file: /u01/app/oracle/product/11.2.0/cfgtoollogs/opatch/opatch_history.txt
Lsinventory Output file location : /u01/app/oracle/product/11.2.0/cfgtoollogs/opatch/lsinv/lsinventory2012-04-06_09-53-14AM.txt
--------------------------------------------------------------------------------
Installed Top-level Products (1): 
Oracle Database 11g 11.2.0.2.0
There are 1 products installed in this Oracle Home.
Installed Products (136): 
Agent Required Support Files 10.2.0.4.3
Assistant Common Files 11.2.0.2.0
Bali Share 1.1.18.0.0
Buildtools Common Files 11.2.0.2.0
Character Set Migration Utility 11.2.0.2.0
Cluster Verification Utility Common Files 11.2.0.2.0
Database Configuration and Upgrade Assistants 11.2.0.2.0
Database SQL Scripts 11.2.0.2.0
Database Workspace Manager 11.2.0.2.0
Deinstallation Tool 11.2.0.2.0
Enterprise Edition Options 11.2.0.2.0
Enterprise Manager Agent 10.2.0.4.3
Enterprise Manager Agent Core Files 10.2.0.4.3
Enterprise Manager Common Core Files 10.2.0.4.3
Enterprise Manager Common Files 10.2.0.4.3
Enterprise Manager Database Plugin -- Agent Support 11.2.0.2.0
Enterprise Manager Database Plugin -- Repository Support 11.2.0.2.0
Enterprise Manager Grid Control Core Files 10.2.0.4.3
Enterprise Manager plugin Common Files 11.2.0.2.0
Enterprise Manager Repository Core Files 10.2.0.4.3
Exadata Storage Server 11.2.0.1.0
Expat libraries 2.0.1.0.1
Generic Connectivity Common Files 11.2.0.2.0
HAS Common Files 11.2.0.2.0
HAS Files for DB 11.2.0.2.0
Installation Common Files 11.2.0.2.0
Installation Plugin Files 11.2.0.2.0
Installer SDK Component 11.2.0.2.0
JAccelerator (COMPANION) 11.2.0.2.0
LDAP Required Support Files 11.2.0.2.0
OLAP SQL Scripts 11.2.0.2.0
Oracle 11g Warehouse Builder Required Files 11.2.0.2.0
Oracle Advanced Security 11.2.0.2.0
Oracle Application Express 11.2.0.2.0
Oracle Call Interface (OCI) 11.2.0.2.0
Oracle Clusterware RDBMS Files 11.2.0.2.0
Oracle Code Edito

Oracle 技术支持 - 2012年4月6日 上午02时34分39秒 [ODM Action Plan]
Zijie,
In order to fix the issue you have to apply the patch for bug#10636231 which is available on MOS for your platform.
Before you do that though, upload the current patchlist of your environment so I can check if there are any conflicts 
with already installed patches.
OS>$ORACLE_HOME/OPatch/opatch lsinventory -detail
Thank you.
Demetre.

Oracle 技术支持 - 2012年4月6日 上午02时30分18秒 [ODM Research]
Zijie,
The high CPU with very high wait on ’cursor: mutex S’ is caused by the very high child cursors of the following insert( 7728 at times).
SQL ordered by Version Count
* Only Statements with Version Count greater than 20 are displayed
Version Count Executions SQL Id SQL Module SQL Text
4,096 frdqd1ydt7cu2 INSERT INTO "HFM_ERRORLOG" ("S...
The root cause of the high child cursors is due to the following bug:
Bug 10636231 - HIGH VERSION COUNT FOR INSERT STATEMENTS WITH REASON INST_DRTLD_MISMATCH
Demetre.

Oracle 技术支持 - 2012年4月6日 上午02时24分38秒 [ODM Data Collection]
Name
--------
=== ODM Data Collection ===
SSD
---------
System state dump
--------------------------------------
97: 
Cmd: Insert
98: 
Cmd: Insert
99: 0: waiting for ’cursor: mutex S’ 
Cmd: Insert
100:0: waiting for ’cursor: mutex S’ 
Cmd: Insert
101:0: waiting for ’cursor: mutex S’ 
Cmd: Insert
102:0: waiting for ’cursor: mutex S’ 
Cmd: Insert
134:0: waiting for ’cursor: mutex S’ 
Cmd: Insert
135:0: waiting for ’SQL*Net message from client’ 
136:0: waiting for ’cursor: mutex S’ 
Cmd: Insert
137:0: waiting for ’SQL*Net message from client’ 
138: 
Cmd: Insert
139:0: waiting for ’SQL*Net message from client’ 
140:0: waiting for ’cursor: mutex S’ 
Cmd: Insert
141:0: waiting for ’cursor: mutex S’ 
Cmd: Insert
142:0: waiting for ’SQL*Net message from client’ 
143: 
Cmd: Insert
144:0: waiting for ’cursor: mutex S’ 
Cmd: Insert
The insert the above processes are running is:
INSERT INTO "HFM_ERRORLOG" ("SGUID","LLOGTYPE","DTIMESTAMP","SSERVERNAME","SAPPNAME","SXMLERROR") VALUES (:1,:2,:3,:4,:5,:6) RETURNING ROWID into :7 
ChildTable: size=’7728’ 
The reason the cursors were not shared is:
reason=Insert Direct Load mismatch(2)
FileName
----------------
FileComment
----------------------

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午11时11分27秒 [Update from Customer]
Upload to gtcr successful for the file hfm_ora_3670970.trc.gz.

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午10时41分17秒 [Update from Customer]
1 the issue is still.
2 report 

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午10时37分52秒 [Update from Customer]
Upload to gtcr successful for the file hfm_ora_9699998.trc.

Oracle 技术支持 - 2012年4月5日 下午09时23分31秒 [ODM Action Plan]
Hello
1. Please trey and flush shared pool
and post us if this relieves the issue
If not please when the isssue shows
2. System state
to see if we can catch some holders:
To diagnose this issue we require some trace gathered at the time of the problem. 
Please provide 2 hanganalyze and 2 systemstate dumps.
Hanganalyze will confirm if the db is really hung or just slow. 
Systemstate dump will shows what each process on the database is doing.
* Beware of taking systemstates on very large systems with large numbers of process. 
Systemstates can be very slow and the trace file can be very large.
Using SQL*Plus connect as SYSDBA using the following command:
sqlplus " / as sysdba"
Do this 2 times in 2 separate windows, creating 2 sqlplus sessions (SQL1 and SQL2)
In SQL1 gather the hanganalyze by executing the following:
SQL1> oradebug setmypid
SQL1> oradebug unlimit;
SQL1> oradebug hanganalyze 3
In SQL2 gather the systemstates by executing the following:
SQL2> oradebug setmypid
SQL2> oradebug unlimit;
SQL2> oradebug dump systemstate 266
Gather a second hang analyze having waited at least 2 minutes to give time 
to identify process state changes. 
In SQL1 execute the following:
SQL1> oradebug hanganalyze 3
In SQL2 execute the following to collect a second systemstate dump:
SQL3> oradebug setmypid
SQL3> oradebug unlimit;
SQL3> oradebug dump systemstate 266
If you are using systemstate level 266 and it is taking much longer than 
expected to generate the dump file, then end this systemstate dump and try level 258.
If connection to the system is not possible in any form, then please refer 
to the following article which describes how to collect systemstates in that situation:
<Note:121779.1> Taking a SYSTEMSTATE dump when you cannot CONNECT to Oracle.
For more general hang diagnostics see:
<Note:452358.1> Database Hangs: What to collect for support.
The trace files produced will be sent to the user_dump_dest
Regards Sijmen

Oracle 技术支持 - 2012年4月5日 下午09时21分59秒 [ODM Data Collection]
Name
--------
=== ODM Data Collection ===
awrrpt_1_1172_1173.html Good Perf
Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 1172 30-Mar-12 09:00:12 810 2.0
End Snap: 1173 30-Mar-12 10:00:14 817 2.0
Elapsed: 60.04 (mins) 
DB Time: 4.64 (mins) 
.
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
DB CPU 164 58.74 
log file sync 17,584 115 7 41.33 Commit
db file scattered read 1,228 3 3 1.12 User I/O
db file sequential read 1,221 2 2 0.88 User I/O
SQL*Net message to client 132,873 0 0 0.05 Network
.
awrrpt_1_1316_1317.html BD
.
Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 1316 05-Apr-12 09:00:46 404 1.9
End Snap: 1317 05-Apr-12 10:00:50 486 2.0
Elapsed: 60.06 (mins) 
DB Time: 3,553.69 (mins) 
.
SQL ordered by Version Count
* Only Statements with Version Count greater than 20 are displayed
Version Count Executions SQL Id SQL Module SQL Text
4,096 frdqd1ydt7cu2 INSERT INTO "HFM_ERRORLOG" ("S...
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
cursor: mutex S 1,227,565,156 166,211 0 77.95 Concurrency
DB CPU 50,504 23.69 
kksfbc child completion 64,124 3,345 52 1.57 Other
library cache lock 35,338 1,593 45 0.75 Concurrency
log file sync 17,543 388 22 0.18 Commit
.
=> there are no Memory Resize Ops
.
SQL ordered by Version Count
* Only Statements with Version Count greater than 20 are displayed
Version Count Executions SQL Id SQL Module SQL Text
667 frdqd1ydt7cu2 INSERT INTO "HFM_ERRORLOG" ("S...
=> very high version_count for sql_id frdqd1ydt7cu2
=> this is not the cause as the good report also has a very high version_count for sql_id frdqd1ydt7cu2
.
ashrpt_1_.html
has no data
.
none of the uploaded ASH reports has any data
=> we didnot catch any waiters to find holders in the ASH reports
.
@ Bug 9591812: INCORRECT WAIT EVENTS IN 11.2 ("CURSOR: MUTEX S" INSTEAD OF "CURSOR: MUTEX X")
FileName
----------------
FileComment
----------------------

Oracle 技术支持 - 2012年4月5日 下午08时53分05秒 [Notes]
I will check the upload and post as soon as I have relevant issues
Regards Sijmen

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午07时56分47秒 [Update from Customer]
i done,
thanks

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午07时56分05秒 [Update from Customer]
Upload to gtcr successful for the file ashrpt_3_.html.

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午07时55分48秒 [Update from Customer]
Upload to gtcr successful for the file ashrpt_2_.html.

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午07时55分35秒 [Update from Customer]
Upload to gtcr successful for the file ashrpt_01_.html.

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午07时52分13秒 [Update from Customer]
Upload to gtcr successful for the file awrrpt_1_1172_1173.html.

Oracle 技术支持 - 2012年4月5日 下午07时22分44秒 [ODM Action Plan]
While we check the upload, please meanwhile send
1.AWR : please send a report of time interval of 1 hour when the performance was ok ( tagged GOOD) 
which we can use as baseline to compare for differences ( we assume the performance was ok some time ago)
For more details on AWR, See:
Note:1363422.1 AWR Reports - Information Center
2. Please also generate and upload a couple (3) of ASH report covering the same bad period ( as awrrpt_1_1316_1317.html ) for interval of 5 minutes. To get the ASH report,
execute:
SQL>@$ORACLE_HOME/rdbms/admin/ashrpt.sql
Regards Sijmen

Oracle 技术支持 - 2012年4月5日 下午07时18分05秒 [Notes]
Hello
I will check the upload and post as soon as I have relevant issues
Regards Sijmen

Oracle 技术支持 - 2012年4月5日 下午07时17分36秒 [ODM Issue Verification]
awrrpt_1_1316_1317.html

Oracle 技术支持 - 2012年4月5日 下午07时17分25秒 [ODM Issue Clarification]
SUMMARY
=========== 
rdbms 11.2.0.2 / IBM AIX on POWER Systems (64-bit)
cursor: mutex S and high cpu

Oracle 技术支持 - 2012年4月5日 下午07时17分01秒 [Call - Outbound]
Call - Outbound
------------------------------
<Record the details of the call>
called 86 13488763914
Agreed : to update the sr

Oracle 技术支持 - 2012年4月5日 下午07时04分06秒 [Notes]
Hello Zijie
Your Service Request has been assigned to me
I am reviewing the information in the Service Request and will get back to you
Regards Sijmen

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午06时47分40秒 [Update from Customer]
Upload to gtcr successful for the file ashrpt_1_.html.

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午06时47分37秒 [Customer Problem Description]
** 客户主管的全天候联系人姓名: liuzijie
** 客户主管的全天候联系电话: 13488763914
** 客户主要联系人姓名: zijie liu
** 当前客户全天候语音电话号码: (134) 887-6391 x4
** 当前客户电子邮件地址: LIUZIJIE@CATTSOFT.COM
** 当前客户寻呼机/传真号码: 13581640145
DB is down?
Yes
Data corruption ? physical or logical?
Yes
System crashes repeatedly?
Yes
Critical functionality is not available?
Yes
Severe performance impact or system hangs?
Yes
Based on the above responses, do you want to proceed with a severity 1 service request?
Yes
1) ### Impact on Business ###
2) Additional Information

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午06时47分13秒 [Update from Customer]
Upload to gtcr successful for the file awrrpt_1_1316_1317.html.

LIUZIJIE@CATTSOFT.COM - 2012年4月5日 下午06时47分37秒 [Customer Problem Description]
问题说明: there is top evetn cursor: mutex S and cpu information as follow
:
CPU User% Kern% Wait% Idle% Reads 668 Rawin 0
ALL 91.4 8.6 0.0 0.0 Writes 659 Ttyout 283
Forks 0 Igets 0
Network KBPS I-Pack O-Pack KB-In KB-Out Execs 0 Namei 33
Total 265.1 735.6 656.6 143.8 121.3 Runqueue 93.5 Dirblk 0
Waitqueue 0.0
Disk Busy% KBPS TPS KB-Read KB-Writ MEMORY
Total 8.0 1630.8 17.0 20.0 1610.8 PAGING Real,MB 126464
Faults 380 % Comp 17
FileSystem KBPS TPS KB-Read KB-Writ Steals 0 % Noncomp 31
Total 3.8 9.5 3.8 0.0 PgspIn 0 % Client 31
PgspOut 0
Name PID CPU% PgSp Owner PageIn 0 PAGING SPACE
oracle 5834746 2.9 11.7 oracle PageOut 0 Size,MB 32768
oracle 5505964 2.9 11.7 oracle Sios 0 % Used 0
oracle 8061986 2.9 11.7 oracle % Free 100
oracle 6816722 2.9 11.7 oracle NFS (calls/sec)
oracle 1574850 2.9 11.7 oracle SerV2 0 WPAR Activ 0
oracle 3146246 2.9 11.7 oracle CliV2 0 WPAR Total 0
oracle 984976 2.9 11.6 oracle SerV3 0 Press: "h"-help
oracle 852794 2.9 11.6 oracle CliV3 0 "q"-quit
oracle 7209388 2.9 11.6 oracle
oracle 2557246 2.9 11.6 oracle
oracle 1770876 2.9 11.6 oracle
oracle 2687246 2.9 11.6 oracle
oracle 7669190 2.9 11.6 oracle
oracle 9503500 2.9 11.6 oracle
oracle 6423588 2.9 11.6 oracle
oracle 8324232 2.9 11.6 oracle
oracle 2228902 2.9 11.6 oracle
oracle 7078532 2.9 11.6 oracle
oracle 4719422 2.9 11.7 oracle
oracle 1377004 2.9 11.6 oracle


Whatever is worth doing is worth doing well.