Detection of Excessive Interrupt Disablement
余年寄山水
533次浏览
2020年07月30日 17:23
最佳经验
本文由作者推荐
伟大的意思-玄清
Technote (FAQ)
Problem
Detection of Excessive Interrupt Disablement
Solution
Functional description
Example error log
Controlling disablement detection
Detection threshold
Error disposition
Limiting error logging
Exemption
AIX 5L Version 5.3 ML3 introduces a new feature which can detect a period of excessive interrupt disablement on a CPU, and create an error log record to report it. This allows you to know if privileged code running on a system is unduly (and silently) impacting performance. It also helps to identify and improve such offending code paths before the problems manifest in ways that have proven very difficult to diagnose in the past.
--------------------------------------------------------------------------------
Functional description
Use a kernel profiling approach to detect disabled code that runs for too long. The basic idea is to take advantage of the regularly scheduled clock "ticks" that generally occur every 10 milliseconds, using them to approximately measure continuously disabled stretches of CPU time individually on each logical processor in the configuration.
NOTE: This is a statistical sampling approach, so resolution is limited to avoid excessive false positives.
This approach will alert you to partially disabled code sequences by logging one or more hits within the offending code. It will alert you to fully disabled code sequences by logging the i_enable that terminates them. In the special case of timer request block (trb) callouts, the possible detection is triggered by controlling the disablement state within the clock routine, which invokes registered trb handlers in succession.
NOTE: See the tstart kernel service for more information.
The primary detail data logged is a stack trace for the interrupted context. This will reveal one point from the offending code path and the call sequence that got you there. For example, a heavy user of bzero will be easily identified even though bzero may have received the interrupt. Due to the sampling implementation, it is likely that the same excessively and partially disabled code will log a different IAR and traceback each time it is detected. Judgement is required to determine if two such detection's are representative of the same underlying problem or not.
To get function names displayed in the traceback, it is necessary that the subject code be built with the proper traceback tables. AIX kernel code and extensions are compiled with -q tbtable=full to ensure this.
The most recent Lightweight Memory Trace (LMT) entries have been copied to the error log record.
--------------------------------------------------------------------------------
Example error log
To see how this actually works and what is reported, a deliberate disablement excession was coded up as a kernel extension, dtest_ext. Its busy loop looks like this:
int dtest_ext()
{
...
/* partially disabled loop */
ipri = i_disable(INTOFFL0);
looper(looptime);
i_enable(ipri);
...
}
void looper(int uSec)
...
while( 1 ) {
curtime(&ctime);
if( ntimercmp(ctime, etime, >) )
break;
}
}
which loops until the current time, returned by curtime, passes a predetermined end time, set for this example to cause the loop to run for 550ms. In particular, the culprit code is looper, which calls curtime repeatedly in its busy loop.
Here is what you will see in the error log summary after running this program:
# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
A2205861 P S SYSPROC Excessive interrupt disablement time
The detail looks like this:
LABEL: DELAYED_INTS
IDENTIFIER: A2205861
Date/Time: Tue Sep 20 14:24:22 2005
Sequence Number: 102
Machine Id: 0001288F4C00
Node Id: victim64
Class: S
Type: PERF
Resource Name: SYSPROC
Description
Excessive interrupt disablement time
Probable Causes
SOFTWARE PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
REPORT DETAILED DATA
Detail Data
TICKS 51 (3) decr, IAR CD204 convert_tb+170, LR CD270 curtime+24
convert_tb+170
curtime+20
looper+E8
dtest_ext+A8
[37F4]
--unknown--
LMT
0000 0000 0000 0001 0000 0000 0000 B017 0000 004B 7535 DE29 8000 0028 10C0 0000
0000 0000 0000 0000 0000 0000 0008 B02D 0000 0085 1DA1 F27E 0000 0028 4AF0 0000
When dtest_ext was run and called looper, requesting a 55 tick busy loop, it became eligible for detection after running disabled for more than 500ms. The detector consequently reacted when the busy loop exceeded the default 50 tick threshold, and created an error log record. There are 3 significant parts to this record:
TICKS 51 (3) tells you that excessive disablement was detected on the 51st timer tick, and that the minimum count of total tick interrupts of 3 was reached, revealing that the culprit was only partially disabled.
convert_tb+170 is the start of the traceback, and tells you the convert_tb function was called by curtime which was called by looper which was called by dtest_ext. In this case, it is obvious that looper was the culprit, and that curtime and its subroutine convert_tb were innocent. In general, this may not always be so obvious.
The LMT recent entries, which can be analyzed to gain more information about the path leading up to the detection. The most recent traced event is last in this area, and will always be the 4AF hookid, which is that of disablement detection itself.
In the case where the detected code has been running fully disabled, the ticks information will be slightly different:
The TICKS value will report the (tick-accurate) actual length of the disabled interval, since it could not be detected until it was over.
The (count) value
will be 1.
The top of the traceback chain will likely be i_enable.
When the excessive disablement is detected in a trb handler, an extra line of detail data precedes the traceback to identify the responsible routine. When a trb routine has run fully disabled, this is the only way to identify it. For example:
Detail Data
TICKS 55 (1) decr, IAR 9474 i_enable+174, LR 73A88 clock+1D0
trb_ 3DA65DC trb_looper
i_enable+174
clock+1CC
i_softmod+2BC
[DF64C]
In this case, the busy loop was planted in a trb callout function, trb_looper. Because the excession could not be detected until after the trb routine returned to clock, and clock enabled, there is no direct evidence of it in the traceback, which merely shows clock enabling for interrupts.
--------------------------------------------------------------------------------
Controlling disablement detection
The only externally documented interface to this function is through the standard RAS framework. You can turn error checking persistently off at the system level with
errctrl errcheckoff -c all
or persistently on with
errctrl errcheckon -c all
The disablement detector registers as a child of the proc component, with the name , which can be specified as the component name in place of all. To affect just the disablement detector, for example:
errctrl errcheckoff -c
This level of control is not fully supported, and is not persistent over system reboot. More granular internal-use-only controls are also present as described below.
Detection threshold
You can use the error checking level of the RAS framework to imply the threshold on the number of ticks that must be accumulated to trigger an error report. These values are as follows:
disabled n/a
minimal 50 ticks or 1/2 second
normal 10 ticks or .1 second
detail 4 ticks or .04 second
Since the default error checking level of a debug-off kernel is minimal, the default disablement detection threshold is 50 ticks. One way to override this would be errctrl errchecknormal -c , which would change the threshold to 10 ticks. Any of the errchecknormal, errcheckminimal or errcheckdetail subcommands can be specified to change the threshold as indicated in the table above.
NOTE: A debug-on kernel has a default error checking level of detail, and consequently a default disablement detection threshold of only 4.
A "private" errctrl subcommand can also be used to set the threshold as an alternative to using the values implied by the error checking level.
errctrl threshold=n -c
Valid thresholds are in the range of 3-62 clock ticks (63 is used to indicate that detection is suspended as described below). Results may not be useful if the threshold is set too low, so values less than 3 are rejected. Setting the threshold to 0 will disable checking. The actual threshold will be determined by whichever of the checking level or the explicit
threshold was last set. Higher threshold values will detect disablement excession more accurately, while lower threshold will detect more potential offenders.
NOTE: Ticks are not an absolute measure of CPU time or CPU cycles. They are a measure of elapsed time. For this reason, detection is disabled by default in shared LPAR partitions.
Error disposition
The framework can also communicate an error disposition for each error severity. In this case, there is only one error being detected, whose severity will be considered MEDIUM_SEVERITY. Only the following error dispositions are supported, when set for medium severity with the medsevdisposition subcommand of errctrl:
ERROR_IGNORE - do not log the error, medsevdisposition=48
ERROR_LOG - log the error, medsevdisposition=64
ERROR_SYSTEM_DUMP - abend, medsevdisposition=112
Other dispositions are not applicable here.
If an event is ignored, either because of the error disposition (here), or the maxlog limit (see below), the event will still be recorded using LMT. Only the IAR, LR, and current trb callout address will be traced.
Limiting error logging
Another private subcommand errctrl -c maxlog=n will set a strict limit on the number of errors you will log, PER BOOT. The default will be 1, meaning that, over the life of a system, only one excessive disablement entry will be logged. If more are desired, this private command must be used after each system boot to allow them.
All events will be logged in the normal LMT buffer. This implies that all events are also logged to the system trace, when enabled. Only the binary IAR, LR, and trb callout addresses will be traced.
Exemption
In special cases it may be impractical to immediately rewrite an algorithm that contains excessive disablement. For this reason, two kernel services are exported to kernel extensions.
long disablement_checking_suspend(void)
A call to this service will temporarily disable the detection of excessive disablement, just for the duration of the current critical section. This call should be inserted at the beginning of the exempt critical section, immediately after it disables if this is base level code, or as soon as possible within interrupt handling code. The temporary exemption automatically lapses either when the program re-enables to INTBASE, or when the interrupt handling completes.
To cancel the exemption explicitly, perhaps because the exempt code is one of potentially many interrupt level callout routines, you will also have:
void disablement_checking_resume(long)
This resume function is called after re-enabling at the end of the critical section, not within the critical section. This is necessary because, in the case of an INTMAX critical section, the tick counting will have been deferred by the disablement until the moment of enablement. You want to still be in suspend mode at this instant.
The suspend function returns the previous suspension state t
o the caller. This is later passed to the paired resume function, which will restore that state. This enables nesting of exempted critical sections. As an example, to specifically exempt just the looper function of the previous example:
int ipri, dc;
ipri = i_disable(INTOFFL0);
dc = disablement_checking_suspend();
looper(looptime);
i_enable(ipri);
disablement_checking_resume(dc);
These functions are documented externally in the Kernel Services section of the Kernel and Subsystems Technical Reference:
/pseries/.
Historical Number
isg1pTechnote1489