1 故障现象
cpu 持续接近100%,前端应用几乎无法执行,其cpu状态如下:
CPU User% Kern% Wait% Idle% Reads 668 Rawin 0
ALL 91.4 8.6 0.0 0.0 Writes 659 Ttyout 283
Forks 0 Igets 0
Network KBPS I-Pack O-Pack KB-In KB-Out Execs 0 Namei 33
Total 265.1 735.6 656.6 143.8 121.3 Runqueue 93.5 Dirblk 0
Waitqueue 0.0
Disk Busy% KBPS TPS KB-Read KB-Writ MEMORY
Total 8.0 1630.8 17.0 20.0 1610.8 PAGING Real,MB 126464
Faults 380 % Comp 17
FileSystem KBPS TPS KB-Read KB-Writ Steals 0 % Noncomp 31
Total 3.8 9.5 3.8 0.0 PgspIn 0 % Client 31
PgspOut 0
Name PID CPU% PgSp Owner PageIn 0 PAGING SPACE
oracle 5834746 2.9 11.7 oracle PageOut 0 Size,MB 32768
oracle 5505964 2.9 11.7 oracle Sios 0 % Used 0
oracle 8061986 2.9 11.7 oracle % Free 100
oracle 6816722 2.9 11.7 oracle NFS (calls/sec)
oracle 1574850 2.9 11.7 oracle SerV2 0 WPAR Activ 0
oracle 3146246 2.9 11.7 oracle CliV2 0 WPAR Total 0
oracle 984976 2.9 11.6 oracle SerV3 0 Press: "h"-help
oracle 852794 2.9 11.6 oracle CliV3 0 "q"-quit
oracle 7209388 2.9 11.6 oracle
oracle 2557246 2.9 11.6 oracle
oracle 1770876 2.9 11.6 oracle
oracle 2687246 2.9 11.6 oracle
oracle 7669190 2.9 11.6 oracle
oracle 9503500 2.9 11.6 oracle
oracle 6423588 2.9 11.6 oracle
oracle 8324232 2.9 11.6 oracle
oracle 2228902 2.9 11.6 oracle
oracle 7078532 2.9 11.6 oracle
oracle 4719422 2.9 11.7 oracle
oracle 1377004 2.9 11.6 oracle
通过plsql 查看,得出等待的sql 一直是这条:
"INSERT INTO "HFM_ERRORLOG" ("SGUID","LLOGTYPE","DTIMESTAMP","SSERVERNAME","SAPPNAME","SXMLERROR") VALUES (:1,:2,:3,:4,:5,:6) RETURNING ROWID into :7".
通过以下语句查出
select sql_id,sql_text,version_count from v$sqlarea where version_count>5
order by 3 desc
得出语句“frdqd1ydt7cu2 INSERT INTO "HFM_ERRORLOG" ("SGUID","LLOGTYPE","DTIMESTAMP","SSERVERNAME","SAPPNAME","SXMLERROR") VALUES (:1,:2,:3,:4,:5,:6) RETURNING ROWID into :7 ”的version_count值是8922,可见这条语句是不停的解析而没有共享。
查看一个小时内的awr的报告如下:
Top 5 Timed Foreground Events
Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class |
---|---|---|---|---|---|
cursor: mutex S | 1,227,565,156 | 166,211 | 0 | 77.95 | Concurrency |
DB CPU | 50,504 | 23.69 | |||
kksfbc child completion | 64,124 | 3,345 | 52 | 1.57 | Other |
library cache lock | 35,338 | 1,593 | 45 | 0.75 | Concurrency |
log file sync | 17,543 | 388 | 22 | 0.18 | Commit |
通过以上分析,怀疑是bug,向oracle 提了SR,最终确认是遇到了Bug 10636231 - HIGH VERSION COUNT FOR INSERT STATEMENTS WITH REASON INST_DRTLD_MISMATCH。
得出应该打patch patch 10636231.,升级了opatch到OPatch patch of version 11.2.0.3.0,
地址是:
然后打了patch 10636231。问题解决。
sr (3-5553680601)内容如下:
[email protected] - 2012年4月6日 下午03时12分40秒 [Close Requested By Customer] |
hi i installed it,and it is ok, now. thank you very much |
Oracle 技术支持 - 2012年4月6日 上午11时02分24秒 [Notes] |
Hi, I see that there is no interim patch applied on this Oracle home. Hence there is no possibility for conflicts. Please go ahead and download the patch 10636231 from MOS and install it. Regards, Kathirvel Cluster Technical Coordinator (Performance) |
[email protected] - 2012年4月6日 上午09时58分46秒 [Update from Customer] |
i have upload opatch detail thank you |
[email protected] - 2012年4月6日 上午09时56分16秒 [Update from Customer] |
Upload to gtcr successful for the file opatch.txt. |
[email protected] - 2012年4月6日 上午09时54分20秒 [Update from Customer] |
$ORACLE_HOME/OPatch/opatch lsinventory -detail Invoking OPatch 11.2.0.1.1 Oracle Interim Patch Installer version 11.2.0.1.1 Copyright (c) 2009, Oracle Corporation. All rights reserved. Oracle Home : /u01/app/oracle/product/11.2.0 Central Inventory : /u01/app/oraInventory from : /etc/oraInst.loc OPatch version : 11.2.0.1.1 OUI version : 11.2.0.2.0 OUI location : /u01/app/oracle/product/11.2.0/oui Log file location : /u01/app/oracle/product/11.2.0/cfgtoollogs/opatch/opatch2012-04-06_09-53-14AM.log Patch history file: /u01/app/oracle/product/11.2.0/cfgtoollogs/opatch/opatch_history.txt Lsinventory Output file location : /u01/app/oracle/product/11.2.0/cfgtoollogs/opatch/lsinv/lsinventory2012-04-06_09-53-14AM.txt -------------------------------------------------------------------------------- Installed Top-level Products (1): Oracle Database 11g 11.2.0.2.0 There are 1 products installed in this Oracle Home. Installed Products (136): Agent Required Support Files 10.2.0.4.3 Assistant Common Files 11.2.0.2.0 Bali Share 1.1.18.0.0 Buildtools Common Files 11.2.0.2.0 Character Set Migration Utility 11.2.0.2.0 Cluster Verification Utility Common Files 11.2.0.2.0 Database Configuration and Upgrade Assistants 11.2.0.2.0 Database SQL Scripts 11.2.0.2.0 Database Workspace Manager 11.2.0.2.0 Deinstallation Tool 11.2.0.2.0 Enterprise Edition Options 11.2.0.2.0 Enterprise Manager Agent 10.2.0.4.3 Enterprise Manager Agent Core Files 10.2.0.4.3 Enterprise Manager Common Core Files 10.2.0.4.3 Enterprise Manager Common Files 10.2.0.4.3 Enterprise Manager Database Plugin -- Agent Support 11.2.0.2.0 Enterprise Manager Database Plugin -- Repository Support 11.2.0.2.0 Enterprise Manager Grid Control Core Files 10.2.0.4.3 Enterprise Manager plugin Common Files 11.2.0.2.0 Enterprise Manager Repository Core Files 10.2.0.4.3 Exadata Storage Server 11.2.0.1.0 Expat libraries 2.0.1.0.1 Generic Connectivity Common Files 11.2.0.2.0 HAS Common Files 11.2.0.2.0 HAS Files for DB 11.2.0.2.0 Installation Common Files 11.2.0.2.0 Installation Plugin Files 11.2.0.2.0 Installer SDK Component 11.2.0.2.0 JAccelerator (COMPANION) 11.2.0.2.0 LDAP Required Support Files 11.2.0.2.0 OLAP SQL Scripts 11.2.0.2.0 Oracle 11g Warehouse Builder Required Files 11.2.0.2.0 Oracle Advanced Security 11.2.0.2.0 Oracle Application Express 11.2.0.2.0 Oracle Call Interface (OCI) 11.2.0.2.0 Oracle Clusterware RDBMS Files 11.2.0.2.0 Oracle Code Edito |
Oracle 技术支持 - 2012年4月6日 上午02时34分39秒 [ODM Action Plan] |
Zijie, In order to fix the issue you have to apply the patch for bug#10636231 which is available on MOS for your platform. Before you do that though, upload the current patchlist of your environment so I can check if there are any conflicts with already installed patches. OS>$ORACLE_HOME/OPatch/opatch lsinventory -detail Thank you. Demetre. |
Oracle 技术支持 - 2012年4月6日 上午02时30分18秒 [ODM Research] |
Zijie, The high CPU with very high wait on ’cursor: mutex S’ is caused by the very high child cursors of the following insert( 7728 at times). SQL ordered by Version Count * Only Statements with Version Count greater than 20 are displayed Version Count Executions SQL Id SQL Module SQL Text 4,096 frdqd1ydt7cu2 INSERT INTO "HFM_ERRORLOG" ("S... The root cause of the high child cursors is due to the following bug: Bug 10636231 - HIGH VERSION COUNT FOR INSERT STATEMENTS WITH REASON INST_DRTLD_MISMATCH Demetre. |
Oracle 技术支持 - 2012年4月6日 上午02时24分38秒 [ODM Data Collection] |
Name -------- === ODM Data Collection === SSD --------- System state dump -------------------------------------- 97: Cmd: Insert 98: Cmd: Insert 99: 0: waiting for ’cursor: mutex S’ Cmd: Insert 100:0: waiting for ’cursor: mutex S’ Cmd: Insert 101:0: waiting for ’cursor: mutex S’ Cmd: Insert 102:0: waiting for ’cursor: mutex S’ Cmd: Insert 134:0: waiting for ’cursor: mutex S’ Cmd: Insert 135:0: waiting for ’SQL*Net message from client’ 136:0: waiting for ’cursor: mutex S’ Cmd: Insert 137:0: waiting for ’SQL*Net message from client’ 138: Cmd: Insert 139:0: waiting for ’SQL*Net message from client’ 140:0: waiting for ’cursor: mutex S’ Cmd: Insert 141:0: waiting for ’cursor: mutex S’ Cmd: Insert 142:0: waiting for ’SQL*Net message from client’ 143: Cmd: Insert 144:0: waiting for ’cursor: mutex S’ Cmd: Insert The insert the above processes are running is: INSERT INTO "HFM_ERRORLOG" ("SGUID","LLOGTYPE","DTIMESTAMP","SSERVERNAME","SAPPNAME","SXMLERROR") VALUES (:1,:2,:3,:4,:5,:6) RETURNING ROWID into :7 ChildTable: size=’7728’ The reason the cursors were not shared is: reason=Insert Direct Load mismatch(2) FileName ---------------- FileComment ---------------------- |
[email protected] - 2012年4月5日 下午11时11分27秒 [Update from Customer] |
Upload to gtcr successful for the file hfm_ora_3670970.trc.gz. |
[email protected] - 2012年4月5日 下午10时41分17秒 [Update from Customer] |
1 the issue is still. 2 report |
[email protected] - 2012年4月5日 下午10时37分52秒 [Update from Customer] |
Upload to gtcr successful for the file hfm_ora_9699998.trc. |
Oracle 技术支持 - 2012年4月5日 下午09时23分31秒 [ODM Action Plan] |
Hello 1. Please trey and flush shared pool and post us if this relieves the issue If not please when the isssue shows 2. System state to see if we can catch some holders: To diagnose this issue we require some trace gathered at the time of the problem. Please provide 2 hanganalyze and 2 systemstate dumps. Hanganalyze will confirm if the db is really hung or just slow. Systemstate dump will shows what each process on the database is doing. * Beware of taking systemstates on very large systems with large numbers of process. Systemstates can be very slow and the trace file can be very large. Using SQL*Plus connect as SYSDBA using the following command: sqlplus " / as sysdba" Do this 2 times in 2 separate windows, creating 2 sqlplus sessions (SQL1 and SQL2) In SQL1 gather the hanganalyze by executing the following: SQL1> oradebug setmypid SQL1> oradebug unlimit; SQL1> oradebug hanganalyze 3 In SQL2 gather the systemstates by executing the following: SQL2> oradebug setmypid SQL2> oradebug unlimit; SQL2> oradebug dump systemstate 266 Gather a second hang analyze having waited at least 2 minutes to give time to identify process state changes. In SQL1 execute the following: SQL1> oradebug hanganalyze 3 In SQL2 execute the following to collect a second systemstate dump: SQL3> oradebug setmypid SQL3> oradebug unlimit; SQL3> oradebug dump systemstate 266 If you are using systemstate level 266 and it is taking much longer than expected to generate the dump file, then end this systemstate dump and try level 258. If connection to the system is not possible in any form, then please refer to the following article which describes how to collect systemstates in that situation: <Note:121779.1> Taking a SYSTEMSTATE dump when you cannot CONNECT to Oracle. For more general hang diagnostics see: <Note:452358.1> Database Hangs: What to collect for support. The trace files produced will be sent to the user_dump_dest Regards Sijmen |
Oracle 技术支持 - 2012年4月5日 下午09时21分59秒 [ODM Data Collection] |
Name -------- === ODM Data Collection === awrrpt_1_1172_1173.html Good Perf Snap Id Snap Time Sessions Cursors/Session Begin Snap: 1172 30-Mar-12 09:00:12 810 2.0 End Snap: 1173 30-Mar-12 10:00:14 817 2.0 Elapsed: 60.04 (mins) DB Time: 4.64 (mins) . Top 5 Timed Foreground Events Event Waits Time(s) Avg wait (ms) % DB time Wait Class DB CPU 164 58.74 log file sync 17,584 115 7 41.33 Commit db file scattered read 1,228 3 3 1.12 User I/O db file sequential read 1,221 2 2 0.88 User I/O SQL*Net message to client 132,873 0 0 0.05 Network . awrrpt_1_1316_1317.html BD . Snap Id Snap Time Sessions Cursors/Session Begin Snap: 1316 05-Apr-12 09:00:46 404 1.9 End Snap: 1317 05-Apr-12 10:00:50 486 2.0 Elapsed: 60.06 (mins) DB Time: 3,553.69 (mins) . SQL ordered by Version Count * Only Statements with Version Count greater than 20 are displayed Version Count Executions SQL Id SQL Module SQL Text 4,096 frdqd1ydt7cu2 INSERT INTO "HFM_ERRORLOG" ("S... Top 5 Timed Foreground Events Event Waits Time(s) Avg wait (ms) % DB time Wait Class cursor: mutex S 1,227,565,156 166,211 0 77.95 Concurrency DB CPU 50,504 23.69 kksfbc child completion 64,124 3,345 52 1.57 Other library cache lock 35,338 1,593 45 0.75 Concurrency log file sync 17,543 388 22 0.18 Commit . => there are no Memory Resize Ops . SQL ordered by Version Count * Only Statements with Version Count greater than 20 are displayed Version Count Executions SQL Id SQL Module SQL Text 667 frdqd1ydt7cu2 INSERT INTO "HFM_ERRORLOG" ("S... => very high version_count for sql_id frdqd1ydt7cu2 => this is not the cause as the good report also has a very high version_count for sql_id frdqd1ydt7cu2 . ashrpt_1_.html has no data . none of the uploaded ASH reports has any data => we didnot catch any waiters to find holders in the ASH reports . @ Bug 9591812: INCORRECT WAIT EVENTS IN 11.2 ("CURSOR: MUTEX S" INSTEAD OF "CURSOR: MUTEX X") FileName ---------------- FileComment ---------------------- |
Oracle 技术支持 - 2012年4月5日 下午08时53分05秒 [Notes] |
I will check the upload and post as soon as I have relevant issues Regards Sijmen |
[email protected] - 2012年4月5日 下午07时56分47秒 [Update from Customer] |
i done, thanks |
[email protected] - 2012年4月5日 下午07时56分05秒 [Update from Customer] |
Upload to gtcr successful for the file ashrpt_3_.html. |
[email protected] - 2012年4月5日 下午07时55分48秒 [Update from Customer] |
Upload to gtcr successful for the file ashrpt_2_.html. |
[email protected] - 2012年4月5日 下午07时55分35秒 [Update from Customer] |
Upload to gtcr successful for the file ashrpt_01_.html. |
[email protected] - 2012年4月5日 下午07时52分13秒 [Update from Customer] |
Upload to gtcr successful for the file awrrpt_1_1172_1173.html. |
Oracle 技术支持 - 2012年4月5日 下午07时22分44秒 [ODM Action Plan] |
While we check the upload, please meanwhile send 1.AWR : please send a report of time interval of 1 hour when the performance was ok ( tagged GOOD) which we can use as baseline to compare for differences ( we assume the performance was ok some time ago) For more details on AWR, See: Note:1363422.1 AWR Reports - Information Center 2. Please also generate and upload a couple (3) of ASH report covering the same bad period ( as awrrpt_1_1316_1317.html ) for interval of 5 minutes. To get the ASH report, execute: SQL>@$ORACLE_HOME/rdbms/admin/ashrpt.sql Regards Sijmen |
Oracle 技术支持 - 2012年4月5日 下午07时18分05秒 [Notes] |
Hello I will check the upload and post as soon as I have relevant issues Regards Sijmen |
Oracle 技术支持 - 2012年4月5日 下午07时17分36秒 [ODM Issue Verification] |
awrrpt_1_1316_1317.html |
Oracle 技术支持 - 2012年4月5日 下午07时17分25秒 [ODM Issue Clarification] |
SUMMARY =========== rdbms 11.2.0.2 / IBM AIX on POWER Systems (64-bit) cursor: mutex S and high cpu |
Oracle 技术支持 - 2012年4月5日 下午07时17分01秒 [Call - Outbound] |
Call - Outbound ------------------------------ <Record the details of the call> called 86 13488763914 Agreed : to update the sr |
Oracle 技术支持 - 2012年4月5日 下午07时04分06秒 [Notes] |
Hello Zijie Your Service Request has been assigned to me I am reviewing the information in the Service Request and will get back to you Regards Sijmen |
[email protected] - 2012年4月5日 下午06时47分40秒 [Update from Customer] |
Upload to gtcr successful for the file ashrpt_1_.html. |
[email protected] - 2012年4月5日 下午06时47分37秒 [Customer Problem Description] |
** 客户主管的全天候联系人姓名: liuzijie ** 客户主管的全天候联系电话: 13488763914 ** 客户主要联系人姓名: zijie liu ** 当前客户全天候语音电话号码: (134) 887-6391 x4 ** 当前客户电子邮件地址: [email protected] ** 当前客户寻呼机/传真号码: 13581640145 DB is down? Yes Data corruption ? physical or logical? Yes System crashes repeatedly? Yes Critical functionality is not available? Yes Severe performance impact or system hangs? Yes Based on the above responses, do you want to proceed with a severity 1 service request? Yes 1) ### Impact on Business ### 2) Additional Information |
[email protected] - 2012年4月5日 下午06时47分13秒 [Update from Customer] |
Upload to gtcr successful for the file awrrpt_1_1316_1317.html. |
[email protected] - 2012年4月5日 下午06时47分37秒 [Customer Problem Description] |
问题说明: there is top evetn cursor: mutex S and cpu information as follow : CPU User% Kern% Wait% Idle% Reads 668 Rawin 0 ALL 91.4 8.6 0.0 0.0 Writes 659 Ttyout 283 Forks 0 Igets 0 Network KBPS I-Pack O-Pack KB-In KB-Out Execs 0 Namei 33 Total 265.1 735.6 656.6 143.8 121.3 Runqueue 93.5 Dirblk 0 Waitqueue 0.0 Disk Busy% KBPS TPS KB-Read KB-Writ MEMORY Total 8.0 1630.8 17.0 20.0 1610.8 PAGING Real,MB 126464 Faults 380 % Comp 17 FileSystem KBPS TPS KB-Read KB-Writ Steals 0 % Noncomp 31 Total 3.8 9.5 3.8 0.0 PgspIn 0 % Client 31 PgspOut 0 Name PID CPU% PgSp Owner PageIn 0 PAGING SPACE oracle 5834746 2.9 11.7 oracle PageOut 0 Size,MB 32768 oracle 5505964 2.9 11.7 oracle Sios 0 % Used 0 oracle 8061986 2.9 11.7 oracle % Free 100 oracle 6816722 2.9 11.7 oracle NFS (calls/sec) oracle 1574850 2.9 11.7 oracle SerV2 0 WPAR Activ 0 oracle 3146246 2.9 11.7 oracle CliV2 0 WPAR Total 0 oracle 984976 2.9 11.6 oracle SerV3 0 Press: "h"-help oracle 852794 2.9 11.6 oracle CliV3 0 "q"-quit oracle 7209388 2.9 11.6 oracle oracle 2557246 2.9 11.6 oracle oracle 1770876 2.9 11.6 oracle oracle 2687246 2.9 11.6 oracle oracle 7669190 2.9 11.6 oracle oracle 9503500 2.9 11.6 oracle oracle 6423588 2.9 11.6 oracle oracle 8324232 2.9 11.6 oracle oracle 2228902 2.9 11.6 oracle oracle 7078532 2.9 11.6 oracle oracle 4719422 2.9 11.7 oracle oracle 1377004 2.9 11.6 oracle |