Computer bugs and the Titanic effect

The severity with which a system fails is directly proportional to the intensity of the designer’s belief that it cannot.

The flow of data within a computer and between computers is the lifeblood of computing. It is therefore extremely important that data not be corrupted in any way. This realization leads to strategies known as error-detecting and error-correcting codes. “My hard disk crashed.” “The file server was down.” “My e-mail went down last night.” Any computing instructor has heard these tales hundreds of times, as they are used to explain (excuse?) late assignments. Of course, if an assignment is started when it is handed out rather than the day it is due, these failures can be overcome. Nevertheless, hardware failures do exist: Disks do crash, file servers do go down, and networks do fail. Such failures bring to mind the Titanic effect, which states:

“The severity with which a system fails is directly proportional to the intensity of the designer’s belief that it cannot.”
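The error-detecting codes mentioned above can be illustrated with the simplest one: a parity bit. The sketch below (with hypothetical helper names; real systems use richer schemes such as Hamming codes or CRCs) appends one bit so the total number of 1s is even, which lets the receiver detect, though not correct, any single flipped bit.

```python
def add_parity(bits):
    """Append an even parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def check_parity(bits):
    """Return True if the word (data + parity bit) has even parity,
    i.e. no single-bit error was detected."""
    return sum(bits) % 2 == 0

word = add_parity([1, 0, 1, 1])  # data 1011 gets parity bit 1
assert check_parity(word)        # arrives intact: parity checks out
word[2] ^= 1                     # flip one bit "in transit"
assert not check_parity(word)    # error detected (but not correctable)
```

Note that two flipped bits cancel out and go unnoticed, which is why stronger codes are used where corruption is likely.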

Hardware failures do occur. The best solution is preventive maintenance. In computing, this means periodic tests to detect problems and the replacement of worn parts. Preventive maintenance also means that the physical environment in which a computer is housed is appropriate. Large mainframe computers often require air-conditioned, dust-free rooms. PCs should not be set up under leak-prone plumbing. Alas, not all situations can be anticipated.

One such situation occurred in pre-integrated-circuit days. A machine that had been working correctly started producing erratic results. The problem was finally traced to a moth that had gotten into the cabinet of the machine. This incident gave us the term bug for a computer error. A more recent incident involved a DSL line that intermittently disconnected itself. The trouble was finally traced to faulty telephone lines that squirrels had enjoyed munching on. Of course, any discussion of component limits assumes that the computer hardware has been thoroughly tested at the design stage and during manufacturing.

A major scandal in 1994 was the circuit flaw in Intel’s Pentium processor. The Pentium chip was installed in millions of computers manufactured by IBM, Compaq, Dell, Gateway 2000, and others. The circuit flaw was a design error in the floating-point unit that caused certain division problems involving more than five significant digits to give the wrong answer. How often would the error affect a calculation? IBM predicted that spreadsheet users would experience an error every 24 days; Intel asserted that it would occur every 27,000 years. The chip was corrected, but Intel did not recall all of the flawed chips. The experience was a public relations disaster for Intel, yet the company remains one of the leading chip manufacturers today.
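The flaw could be demonstrated with a simple identity check that was widely circulated at the time: dividing a number by another and multiplying back should recover the original value. On a correct floating-point unit the residual below is essentially zero; a flawed Pentium famously returned 256 for this particular pair of operands. A minimal sketch:

```python
# Classic public test case for the Pentium FDIV flaw (as widely reported).
# x - (x / y) * y should be (essentially) zero on any correct FPU;
# the flawed Pentium's division error made it come out as 256.
x, y = 4195835.0, 3145727.0
residual = x - (x / y) * y
print(residual)  # essentially 0 on a correct floating-point unit
```

Running this on any modern processor shows a residual within rounding error of zero, because the division lookup table has long since been fixed.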

Source: Privacy Online