Author: 유건데이타   Posted: 2015-05-18
Subject: Recommended Diagnostic Practices
--------------------------------------------------
The diagnostic procedures outlined here are intended as guidelines
for dealing with various categories of errors. This is by no means
an exhaustive list, as the nature of the specific error invariably dictates
the type and amount of diagnostic tracing that is required.
Based on the information gathered via these guidelines, it may be necessary
to obtain further traces in successive rounds, as required by Oracle Development.
This may be an iterative process.


1. DATA CORRUPTIONS

1.1 Description of Category

This category includes all block format corruptions, invalid index entries,
and corruptions of meta-data (e.g. the data dictionary).

1.2 Example:

ORA-600 [3339] on System datafile

1.3 Typical Diagnostic Actions:

(a) Get trace file(s), if the corruption is reported as an internal error.
(b) For table corruptions, attempt to use an index (if available) to extract
data from uncorrupted blocks.
(c) Obtain redo dumps corresponding to the time of corruption.
(d) If there is reason to suspect a vendor OS problem, complete H/W
diagnostics need to be carried out.
(e) Where appropriate, determine if generic or port-specific issue.
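
As an illustration of step (a), the following is a minimal sketch, not an Oracle-supplied tool, for picking internal errors out of log or trace text so that the matching trace files can be requested. It assumes the error appears either in the short form used in this note ("ORA-600 [3339]") or in the long form "ORA-00600: internal error code, arguments: [...]". The function names are hypothetical.

```python
import re

# Matches both the short form "ORA-600 [3339]" and the long form
# "ORA-00600: internal error code, arguments: [3339], [0], ..."
_ORA600 = re.compile(r"ORA-0*600\b(?P<rest>.*)")
_ARG = re.compile(r"\[([^\]]*)\]")


def ora600_arguments(line):
    """Return the bracketed arguments of an ORA-600 line, or None if absent."""
    match = _ORA600.search(line)
    if match is None:
        return None
    return _ARG.findall(match.group("rest"))


def scan_for_internal_errors(lines):
    """Yield (line_number, arguments) for each line reporting ORA-600."""
    for number, line in enumerate(lines, start=1):
        args = ora600_arguments(line)
        if args is not None:
            yield number, args
```

The first argument returned (for example "3339") is the value to quote when reporting the error.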

2. LOGICAL CORRUPTIONS

2.1 Description of Category

This category refers to cases where the data stored or retrieved by
a query is incorrect, although an error is not necessarily returned
externally.
It also includes data dictionary inconsistencies without any detectable block corruption.

2.2 Example:

Query returns different results under the cost-based optimizer (CBO) vs. the rule-based optimizer

2.3 Typical Diagnostic Actions:

(a) Obtain a reproducible test case or dial-in information.
(b) Record any visible error messages, get trace files.
(c) Where appropriate, determine if generic or port-specific issue.
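
When building a test case for a wrong-results problem such as the example in 2.2, it can help to state exactly which rows differ between the two runs. The sketch below assumes the rows from each run have already been fetched into Python lists by whatever driver is in use. It makes no database calls, and the function name is hypothetical.

```python
from collections import Counter


def diff_result_sets(rows_a, rows_b):
    """Compare two query results as multisets of rows, ignoring row order.

    Returns (only_in_a, only_in_b): rows, with multiplicity, that appear
    in one result but not the other. Both lists are empty when the
    results agree. Column values must be hashable.
    """
    count_a = Counter(tuple(row) for row in rows_a)
    count_b = Counter(tuple(row) for row in rows_b)
    # Sort by repr so rows containing None or mixed types still order safely.
    only_in_a = sorted((count_a - count_b).elements(), key=repr)
    only_in_b = sorted((count_b - count_a).elements(), key=repr)
    return only_in_a, only_in_b
```

Rows are compared as multisets because a query without ORDER BY may legitimately return the same rows in a different order, while a missing or duplicated row is a real discrepancy.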


3. SYSTEM HANGS

3.1 Description of Category

This category includes the cases where:
(i) the database hangs on open after media or instance recovery,
(ii) the system hangs with users unable to log on or execute operations.

3.2 Example:

Database spinning in transaction recovery, on attempting to open.

3.3 Typical Diagnostic Actions:

(a) Obtain stack and process state dumps of all Oracle processes, including hung
processes, using the ORADBX utility.
(b) Obtain complete ALERT.log to establish history of events leading to hang.
(c) In the case of a hang on database open:
(i) Set events to determine which stage of recovery is stuck.
(ii) Obtain dumps of the controlfile, datafile headers, logfile headers, buffers, enqueues and latches.
(iii) Obtain situation-specific dumps of archive/online logfiles.
(d) In the case of a system hang, take multiple systemstate dumps at
intervals.
(e) Monitor CPU and I/O activity during the hang using o/s utilities.
(f) Obtain a reproducible test case or dial-in information for development.
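
Step (d) asks for several dumps spaced over time, so that successive dumps can be compared to see whether any process is making progress. One way to script the timing is sketched below. The `take_dump` callable is a hypothetical placeholder for whatever site-specific action triggers one dump, and the default count and interval are assumptions, not Oracle recommendations.

```python
import time


def take_repeated_dumps(take_dump, count=3, interval_seconds=60.0,
                        sleep=time.sleep, clock=time.time):
    """Call take_dump() `count` times, pausing between calls.

    take_dump is a caller-supplied callable (hypothetical here) that
    triggers one dump and returns an identifier such as a trace file name.
    Returns a list of (timestamp, identifier) pairs in call order.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    results = []
    for attempt in range(count):
        results.append((clock(), take_dump()))
        if attempt < count - 1:  # no pause after the final dump
            sleep(interval_seconds)
    return results
```

Recording a timestamp with each dump makes it easier to line the dumps up against the CPU and I/O observations from step (e).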


4. PERFORMANCE PROBLEMS

4.1 Description of Category

This category comprises (i) general cases of deterioration in response time
or batch completion times, and (ii) performance degradation on increase in concurrent activity.

4.2 Example:

System hangs with more than 96 concurrent users.

4.3 Typical Diagnostic Actions:

(a) Document performance degradation in terms of specific indicators (response
time, batch completion time, number of concurrent logins supported,
efficiency of shared pool management).
(b) Provide a reproducible test case where possible, or document in detail
the environment and factors leading to poor performance (for example,
in cases where reproducibility depends on concurrency in a production
environment, document the circumstances surrounding the degradation, such as
number of logins, average memory usage, typical functionality invoked, I/O
activity, and dynamic statistics on Oracle activity).
(c) Where appropriate, determine if generic or port-specific issue.
(d) If not reproducible in-house, but reproducible with reasonable
frequency at the customer site, provide dial-in information for development.
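
To make step (a) concrete, response-time measurements taken before and during the degradation can be reduced to a few comparable numbers. The following is a sketch under the assumption that response times have been collected as lists of seconds. The choice of median, 95th percentile and maximum is illustrative, and the function names are hypothetical.

```python
import statistics


def summarize_response_times(samples_seconds):
    """Summarize response-time samples as median, 95th percentile, and max."""
    if len(samples_seconds) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_seconds)
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = statistics.quantiles(ordered, n=20, method="inclusive")[-1]
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "p95": p95,
        "max": ordered[-1],
    }


def degradation_ratio(baseline, degraded, key="median"):
    """Ratio of a degraded indicator to its baseline value."""
    return degraded[key] / baseline[key]
```

Reporting a ratio against a baseline ("median response time doubled") is usually more useful to development than an absolute figure on its own.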


5. SYSTEM CRASHES

5.1 Description of Category

This includes all cases where the database crashes, possibly due to
one of the background processes dying.

5.2 Example:

DBWR crashes periodically during heavy activity

5.3 Typical Diagnostic Actions:

(a) Request trace files, ALERT.log and information on circumstances
surrounding the crash.
(b) Construct reproducible test case where possible.
(c) Where relevant, determine if generic or port-specific issue.
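
For step (a), the trace files of interest are usually those written close to the time of the crash. The sketch below selects files by modification time. The directory to search and the window size are assumptions to be adapted per site, and the function name is hypothetical.

```python
import os


def files_modified_near(directory, event_time, window_seconds=900.0):
    """Return paths under `directory` modified within the window around event_time.

    event_time is a POSIX timestamp (seconds since the epoch). Results are
    sorted by modification time, oldest first.
    """
    hits = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if abs(mtime - event_time) <= window_seconds:
                hits.append((mtime, path))
    return [path for _mtime, path in sorted(hits)]
```

The crash time itself would normally be taken from the ALERT.log entry that records the instance termination.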


6. CRITICAL FUNCTIONALITY NOT AVAILABLE

6.1 Description of Category

This refers to all situations where functionality or vital features relied
on by a production application become unavailable, typically because
of a bug in the feature. This includes cases where Oracle utilities core dump,
applications error out due to bugs, or recovery is stuck.

6.2 Example:

ORA-600 [3020] during media recovery application of archive log

6.3 Typical Diagnostic Actions:

(a) Where an error is reported, get any trace files produced and relevant
redo log dumps if necessary. Document completely the circumstances
leading up to the error.
(b) Provide a reproducible test case or dial-in information to development.
(c) Where relevant, determine if generic or port-specific issue.

7. MEMORY CORRUPTIONS

7.1 Description of Category

This includes internal errors signalling memory leaks, corruptions of memory
data structures and cache corruptions.

7.2 Example:

ORA-600 [17271] [INSTANTIATION SPACE LEAK] during PL/SQL execution

7.3 Typical Diagnostic Actions:

(a) Request trace files when the error causes a trace file to be produced.
(b) Provide a reproducible test case where possible; else document the circumstances
of the error, including (i) details of the OCI or Oracle tool, utility or pre-compiler
used in the application, (ii) OS tools or third-party tools used in conjunction
with the application, (iii) triggers fired by the application, and
(iv) packages or procedures executed.