(23) Complete Specification filed 18 Feb. 1972
(44) Complete Specification published 23 Jan. 1974
(51) International Classification G06F 11/00
(52) Index at acceptance
GIA 18 12D 12N 12P 16G 16H 16 1F 4R
(72) Inventors CHARLES SAMUEL REPTON, PETER CHARLES VENTON and KENNETH JAMES HAMER-HODGES
(54) FAULT DETECTION AND HANDLING ARRANGEMENTS FOR USE IN DATA PROCESSING SYSTEMS
(71) We, THE PLESSEY COMPANY LIMITED, a British Company of Vicarage Lane, Ilford, Essex, do hereby declare the invention, for which we pray that a patent may be granted to us, and the method by which it is to be performed, to be particularly described in and by the following statement:
The present invention relates to fault detection and handling arrangements for use in real-time data processing systems and is more particularly, although not exclusively, concerned with the use of such arrangements in so-called multi-processor systems.
In real-time processor environments, such as multi-processor controlled telecommunication systems, it is vital to ensure that malfunctioning of one of the processor equipments is detected and compensated for as soon as possible. Both hardware and so-called "software" (programming error) faults must be detected and acted upon. However, it is reasonable to suppose that the majority of software faults will be removed before the processor system becomes operational, through thorough and comprehensive testing of the application and supervisor programs of the system prior to its operational cut-over. Those software faults which remain when the system becomes operational must be handled, when detected, in the same way as solid and transient hardware faults.
In many prior art systems the detection of a fault simply causes the equipment in which the fault has been detected to be rejected (i.e. placed off-line) from the on-line system. Hardware faults, however, may be classified as solid or transient, and it is commonly accepted that significantly more transient faults than solid faults occur; indeed, the ratio of transient to solid faults may be of the order of some five transient faults to one solid fault. The simple rejection of a faulty equipment from the operational system has the immediate effect of reducing the operational security of the remaining system by the removal of part or even all of its "fail safe" redundancy. This is particularly relevant in so-called multi-processor systems, where the removal of one of the processors severely restricts the spare capacity of the processor system. The rejection of the faulty equipment leaves the operational system in a critical state until some reconfiguration mechanism is activated to replace the faulty equipment by a spare equipment.
Upon detection of a fault it is vital in any multi-processor system to ensure that the effects of the fault do not spread throughout the rest of the data processing system. The effects of the fault must be confined to as limited an area as possible so that correctly functioning equipment is not corrupted by the effects of the fault. It is therefore an object of the present invention to confine the functions of a faulty device to those functions which will be harmless to the rest of the on-line system when a fault is detected.
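The containment principle can be pictured in software terms. The following is a minimal Python model, not the patented hardware, and every name in it (`Region`, `MEMORY_MAP`, `region_of`, `ProcessorModule`, `on_fault_detected`, `may_access`) is hypothetical. It assumes the common memory is partitioned into application, supervisory and fault check-out regions at illustrative address boundaries, and that each processor module holds its own permission set. When a fault is detected the module's permissions are narrowed to the check-out region only, so the faulty module cannot reach, and therefore cannot corrupt, the on-line programs.

```python
from enum import Enum, auto


class Region(Enum):
    """Hypothetical partition of the common memory."""
    APPLICATION = auto()   # application programs
    SUPERVISORY = auto()   # supervisory programs
    CHECKOUT = auto()      # fault check-out program


# Hypothetical address map of the common memory (illustrative boundaries only).
MEMORY_MAP = {
    Region.APPLICATION: range(0x0000, 0x8000),
    Region.SUPERVISORY: range(0x8000, 0xC000),
    Region.CHECKOUT: range(0xC000, 0x10000),
}


def region_of(address: int) -> Region:
    """Return the region containing an address, or raise if it is unmapped."""
    for region, span in MEMORY_MAP.items():
        if address in span:
            return region
    raise ValueError(f"address {address:#x} is outside the common memory")


class ProcessorModule:
    """Model of a processor module carrying its own memory-protection state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.faulty = False
        # A healthy module may access every region of the common memory.
        self.allowed = set(Region)

    def on_fault_detected(self) -> None:
        # Condition the protection immediately: inhibit access to the
        # application and supervisory regions, permit only the check-out region.
        self.faulty = True
        self.allowed = {Region.CHECKOUT}

    def may_access(self, address: int) -> bool:
        """True if the protection permits this access, False if it is inhibited."""
        return region_of(address) in self.allowed


if __name__ == "__main__":
    p1 = ProcessorModule("P1")
    print(p1.may_access(0x0100))   # healthy: application region reachable
    p1.on_fault_detected()
    print(p1.may_access(0x0100))   # faulty: application region inhibited
    print(p1.may_access(0xC010))   # faulty: check-out region still reachable
```

The model keeps the protection state inside each module, which mirrors the idea that the faulty module is restrained at its own boundary instead of relying on the rest of the system to refuse its accesses.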
According to the invention there is provided a data processing system including a common memory, providing storage for all the information relative to application and supervisory programs together with all the information relative to a fault check-out program, and at least one processor module, characterised in that the or each processor module includes memory protection arrangements and fault detection and handling means arranged, upon detection of a fault condition, to become immediately operative within the processor module to condition said memory protection arrangements (i) to inhibit the faulty processor module from accessing any common memory location storing information relative to said application and supervisory programs and (ii) to permit the faulty processor module to access only those locations in said common memory in which the information relative to said fault check-out program resides.
As stated above, all hardware faults fall into one of two categories (i.e. solid or transient) and therefore the detection of a fault on many occasions will leave the