ECC memory


Semiconductor memory subsystems using large scale integrated circuit (LSI) techniques have proven to be cost-effective for certain applications of storing digital information. Most semiconductor memory subsystems are made up of a plurality of similar memory storage devices, or bit planes. Each device is organized to contain as many memory storage cells (bits) as feasible in order to reduce per-bit costs, and also contains addressing and read and write circuits in order to minimize the number of connections to each memory storage device.

In many designs, this has resulted in an optimum memory storage device or bit plane that is organized as 2^W words of one bit each. Certain contemporary technologies produce memory storage devices of 2^14 or more bits. Because of the one bit organization of the memory storage device, single bit error correction as described by Hamming in the publication "Error Detecting and Correcting Codes", R.

Because the memory storage devices are quite complex, and because many are used in a semiconductor memory subsystem, they usually represent the predominant component failure in a semiconductor memory subsystem. Consequently, it is common practice to employ some form of single bit error correction along the line described in Hamming. While single bit error correction allows for tolerance of single bit memory storage cell failure, as more malfunctions occur, the statistical chance of finding two of them, i.

While the method to accomplish double bit error correction as suggested by Hamming has been known in the art for some time, the cost of the additional circuitry required has made the technique economically unfeasible for most commercial applications. Recent work in the art has taught methods for the logging of errors for scheduling maintenance before the occurrence of a double bit error as disclosed by Petschauer in U.

The present invention utilizes a read-only memory device (ROM) which contains each combination of two bits to be complemented within the data word to be accessed from the semiconductor memory subsystem. The error status of each data word accessed from the semiconductor memory subsystem is communicated to the memory subsystem timing and control circuitry, which inhibits transfer of any data word containing an uncorrected multiple bit error to the semiconductor memory subsystem interface register.

This causes the double bit patterns stored within the ROM to sequentially complement, within the accessed data word, each combination of two bits. In this manner, each data word accessed from the semiconductor memory subsystem is automatically corrected for multiple bit errors. The basic word width of the semiconductor memory subsystem i. The 2^W total memory storage cells or addressable locations of M data bits plus N-M coding bits may be referenced by an address word of W bits.

For the purpose of further discussion, the values of M, N, R and W will be referenced as logical quantities. The value of W is determined solely by the capacity i. As explained before, it is defined as W wherein the capacity of the semiconductor memory subsystem is 2^W storage cells. As also explained above, this includes the six bits required for SBC (see Hamming) and a parity bit to distinguish between one and two failures.

If no errors are present, the syndrome message is zero. If an error is present, the syndrome message does not equal zero. The parity bit will provide odd parity when no errors or an even number of errors are present. Even parity will be found if an odd number of errors is present. Therefore, the possible error conditions may be described by Table A.
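Table A itself is not reproduced in this text, so the following Python sketch reconstructs the decision it describes from the prose alone. The function name `classify` and the returned labels are hypothetical, and the handling of a zero syndrome with even parity (treated here as a failure of the parity bit itself) is an assumption rather than something the text states.

```python
def classify(syndrome: int, parity_is_odd: bool) -> str:
    """Classify a read word from its syndrome and overall parity.

    Convention taken from the text: odd overall parity means zero or an
    even number of bit errors; even parity means an odd number of errors.
    """
    if syndrome == 0:
        # Assumption: a zero syndrome with even parity means only the
        # overall parity bit flipped. The text does not spell this case out.
        return "no error" if parity_is_odd else "parity bit error"
    # Non-zero syndrome: parity separates one failure from two.
    return "double bit error" if parity_is_odd else "single bit error"
```

A non-zero syndrome with odd parity is the only case that would trigger the ROM-driven correction sequence described later.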

It also provides a requested address word of W bits on line 52 and the M-bit requested write data word on line 51. The semiconductor memory subsystem timing and control circuitry 17 initiates access to MEM 14 via line The requested address, as received via line 52, is enabled into Address Register 18 by timing and control 17 via line 43 and presented to MEM 14 as a W-bit address word via line The requested write data received as an M-bit word via line 51 is enabled by timing and control 17, via line 45, into Write Data Register This N-bit word is transferred to MEM 14 for storage via line

Should the data processing system desire to read the data word previously stored within MEM 14, it transfers an access request to timing and control 17 via line 42 and the W-bit requested address to Address Register 18 via line 52. Timing and Control 17 enables the W-bit requested address into Address Register 18 via line 43 and initiates reference to MEM 14 via line Timing and control 17 enables that N-bit word into MDR 13 via line The accessed word is transferred to the correction selector, SEL 12, via line Timing and control 17 enables the M-bit accessed data word into I. Reg 10 via line Timing and Control 17 notifies the data processing system via line 38 that the M-bit accessed data word is available to it via line The M-bit accessed data word is then made available to the data processing system as explained above.

Therefore, the timing of the data word access with no errors and with one error are identical except that in the latter case, lines 40 and 41 notify timing and control 17 of the single bit error condition.

This material was presented by way of definition only. As shown in Table A, a double bit error will produce a non-zero syndrome message with odd parity.

As shown in FIG. At this point, the timing of the data read access becomes different from the situation wherein no errors or only one are found. Upon being notified of a non-zero syndrome message via line 41 and odd parity via line 40, timing and control 17 inhibits transfer of the load interface register signal via line As illustrated in FIG.

During the time wherein the load interface register signal is inhibited from line 39, a series of advance signal pulses are transmitted via line As stated above, each advance signal pulse received by ROM Address Register 15 causes the address stored therein to be incremented by one.

This incrementation is the addition of a binary one (1) to the address stored within ROM Address Register 15. Therefore, the address presented to ROM 16 via line 53 is increased by one with each additional advance signal pulse. As shown, the first storage cell contains all ones, the second storage cell contains zeroes only in the first two bit positions, and each succeeding storage cell contains only two zeroes arranged in a different combination of bit positions.

Therefore, ROM 16 contains all combinations of zeroes taken two at a time for an N-bit word within its 2^R storage cells. Double-gating is a technique well known in the art wherein both sides of a flip-flop are transferred in a register-to-register transfer.

Its principal advantage is speed in that a clear signal is not required to set each flip-flop of the receiving register to a known state before gating the data from the first register to the second.

The complementing of each pair of bit positions in the accessed read data word is accomplished by SEL 12. The transfer of an opposite state is called complementing. Timing and control 17 will notify the data processing system that the accessed read data word is available by transmitting a request acknowledge signal via line 38. To reset the ROM Address Register 15 to zero, timing and control 17 transmits a clear signal via line 46. As stated above, storage cell zero contains all ones which are transferred to SEL 12 via line Notice that the load interface register signal transferred via line 39, the request acknowledge signal transferred via line 38, and the clear signal transferred via line 46 are delayed for some multiple of the cycle time of the advance signal pulse transferred via line By this technique the present invention automatically corrects double bit errors within the semiconductor memory subsystem.

What is claimed is: In a semiconductor memory subsystem according to claim 1, said method for complementing a different combination of two of said bit positions of said N-bit word comprising: In a semiconductor memory subsystem according to claim 3, said complementing means comprising:


This invention relates to correcting and detecting errors that may occur within a computer system particularly within a memory device, and more particularly to systems where a single bit correction supplemented with familial 1 through 4 bit correction and double bit word-wide detection are preferred, and even more particularly to bit data words stored in 4 bit RAM devices.

It is expensive to dedicate memory to error correction code (ECC) space; therefore, compromises in the desire for perfect error correction and detection are needed. For sustainable commercial viability, one must still provide the largest computer systems particularly, and other RAM-using data storage systems generally, with appropriate compromises in error detection and correction. The path we chose uses ECC to make memory subsystems more reliable by allowing a single multi-bit RAM device to fail and dynamically correcting that failure, by providing the same capability for any 1, 2, 3, or 4 bits within a 4 bit RAM family, and by detecting any 2 bits of non-familial error anywhere in the word.

This capacity will correct all single-bit and in-family 2, 3, or 4 bit errors on the fly, to produce a corrected data word, and will identify as unfixed (unfixable) and corrupted those data words with any other errors or error types. It is our belief that these are the most likely errors and that therefore our selected compromise is valuable.

As RAM device densities and memory subsystem bandwidth requirements increased over time, there was more pressure on the memory subsystem designers to use multi-data-bit RAM devices to meet their requirements.

As RAM device geometries become smaller and device failure rates increase, data words become more susceptible to failures that affect more than one bit in the device. Also, even though single bit errors are still the most predominant failure mode of RAM devices, soft single-bit failure rates are increasing due to the shrinking of the geometries and the reliability characteristics of these devices.

So it becomes more important to at least detect double bit errors from multiple devices, so that data corruption can be detected and safely handled. This invention provides for that protection. Providing enhanced error detection and enhanced error correction without the substantial cost increases that come from a higher ratio of redundant Error Correction Code (ECC) bits to information data bits is an additional goal of this invention.

There were two main methods of handling error correction and detection in the past. The issue with this method is the additional costs of the RAMs to support the extra check bits. For very large memories, the extra cost of that extra RAM is significant if not commercially prohibitive. This method would also need 4 RAM devices to implement the 2 groups of 8 check bits, and therefore would have the same cost.

However, within each of the ECC fields, not all two-bit errors across multiple devices are detected. Therefore the cost is the same, but it doesn't have the same reliability characteristics. The multi-bit adjacent error correction or Chip Kill is merged with double bit nonadjacent error detection. This entails the ability to detect and correct failures within a single RAM device, and to further detect failures that have resulted from soft or hard errors of any single bit in any two RAM devices within the bit word.

No other solution has ever achieved this. A unique ECC table is used in our invention in conjunction with a specific RAM error definition table (for syndrome decode), neither of which is in the prior art.

Prior inventions did not allow for the level of reliability provided by an error correction code that combines single bit error correction and multi-bit adjacent correction with double bit non-adjacent error detection, at least not with a small number of additional ECC-type bits. Thus, there is a need for error correction and detection at low memory cost and high reliability. Familial error correction captures the most likely multi-bit errors within a word: those that occur within a single DRAM or RAM device.

Accordingly, by thinking of the problem in this way, instead of trying to correct every possible error, we have designed an inventive and low cost error detection and correction system as set forth below. There have been similar systems in the art, but these do not have all the advantages or requirements of our invention. Perhaps the closest reference in a U. Compared to either embodiment of Chen ', our invention seems to produce more error checking and also possibly more error correction while requiring fewer ECC bits.

The specific code to support the 12 ECC bit code appears to be described in U. The cost savings related to an additional third of savings over Chen ' will be appreciated by those of experience in these arts. An additional patent of interest includes Blake et al, U. Finally, Adboo et al. However Adboo requires that the check bits be produced by two identical parity trees for each 64 bits, wherein each parity tree has the same number of inputs, and the outputs are paired to correct up to four bit errors within a single DRAM or RAM device.

Perhaps more importantly, Adboo can only detect and correct one single bit error in a word or one two adjacent-bit errors in a word, or four adjacent bit errors in a word. Adboo cannot detect two unrelated single bit errors or a single bit error outside of a familial group having up to 4 bit errors, which our invention can do. As can be clearly seen with reference to Adboo's FIG. For an example of this failing of Adboo, note that the code for bit C 4 is and the code for C 7 is XORing these two values leads to the result , which indicates that bit 0 is in error!

Thus if both C 4 and C 7 are in error, the syndrome will indicate that bit 0 is in error, an unacceptable situation, even if such an occurrence may be a rare event, because it missed two single bit errors.
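The failure mode described above, where the XOR of two columns of the code table equals a third column, can be checked mechanically. The following Python sketch is illustrative only: the function `find_aliases` is hypothetical, and the 4-bit column values in `TOY_COLUMNS` are invented for the example. They are not the codes from Adboo's figure, which are elided in this text.

```python
def find_aliases(columns: dict[str, int]) -> list[tuple[str, str, str]]:
    """Find bit pairs whose combined syndrome mimics a single-bit error.

    `columns` maps a bit name to its syndrome code. A double error in
    bits a and b yields the syndrome code[a] XOR code[b]; if that equals
    the code of some bit c, the decoder would wrongly "correct" c.
    """
    by_code = {code: name for name, code in columns.items()}
    names = sorted(columns)
    aliases = []
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            hit = by_code.get(columns[a] ^ columns[b])
            if hit is not None:
                aliases.append((a, b, hit))
    return aliases


# Hypothetical codes, chosen so that C4 XOR C7 equals the code for D0.
TOY_COLUMNS = {"C4": 0b0110, "C7": 0b0101, "D0": 0b0011}
```

A code that detects all double bit errors must return an empty list from a check of this kind over every pair of bits in different families.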

Accordingly there is a need for stronger detection and correction of errors to improve the reliability of computer system memories and to do so with a minimal amount of data. An error correction system and chip-kill type system together with double bit non-familial error detection will provide a commercially most useful solution to this technical problem.

We describe our invention with reference to the drawings in the summary and detailed description sections below, but limit its scope only by the appended claims. The invention employs 16 such branches in any embodiment as described herein to generate the 16 check bits for a bit word. The invention employs 16 such branches to generate the 16 syndrome code bits employed as described further herein.

A highly complex code sequence has been discovered which provides an opportunity to correct multi-bit errors within a bit family, while at the same time providing an opportunity to also detect all additional single bit errors outside of that bit family, and further providing an opportunity to detect many other multi-bit uncorrectable errors.

The same generator or an identical one organized by the same code regenerates the 16 check bits when a bit memory word is read out of main memory and a comparison with the originally generated check bits is made by XORing the saved check bits with the output of the regenerator to produce a syndrome code. This is the same, mathematically, as putting the data bits through the same XOR tree configuration and adding in the check bit for each branch of the tree, which in practice is how we prefer to produce the syndrome because less cycle time is required.
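The generate-then-compare scheme in the preceding paragraph can be sketched in Python as follows. This is a toy model under stated assumptions: the three masks in `TOY_MASKS` define a small hypothetical code over 4 data bits, not the 16-branch code of the invention, and plain XOR (even) parity is used for each branch, which is an assumption made for simplicity.

```python
def parity(x: int) -> int:
    """XOR of all bits of x (1 if an odd number of bits are set)."""
    return bin(x).count("1") & 1


def check_bits(data: int, masks: list[int]) -> int:
    """One XOR-tree branch per mask; branch k produces check bit k."""
    bits = 0
    for k, mask in enumerate(masks):
        bits |= parity(data & mask) << k
    return bits


def syndrome(data: int, stored_check: int, masks: list[int]) -> int:
    """Regenerate the check bits from the read data and XOR with the
    stored check bits. Zero means no detectable error."""
    return check_bits(data, masks) ^ stored_check


# Hypothetical masks: each data bit feeds a distinct set of branches.
TOY_MASKS = [0b1011, 0b1101, 0b1110]
```

Feeding the stored check bit into each branch alongside the data bits, as the text prefers, yields the same value as `syndrome` because XOR is associative.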

The resulting syndrome is decoded, again employing the code sequence to organize the decode gates, to identify all the correctable errors and most conditions of uncorrectable errors, and to indicate good data if there is no detectable error or corrupted data if errors are detected but are uncorrectable. The preferred component concepts and parts are described first, and then the preferred functioning of the invention is described.

Please refer first to FIG. These four bits are said to be family bits or familial, because they are within a single RAM device. So, for purposes of discussion within this patent, we say that all bits within a DRAM or RAM device will be considered familial bits and those outside are not familial or are not part of the same family.

We label the rows 0 - 16 of FIG. In a 128-bit word there are 32 such devices, RAMs 0 - 31 , and in our preferred inventive system, there would be an additional 4 devices, making 36 such RAM X devices in total per 128-bit-data-plus-16-bit-ECC word.

The column ETC indicates the error type code for each error type. A D 2 indicates one possible two-bit, in-family error state with bits 2 and 1 of RAM X being in error. A T indicates one of the four possible three-bit in-family error states for RAM X, and the Q Q 0 indicates that all four bits are in error. Note that the arrangement of 1's in the table 10 is arbitrary and that one of ordinary skill in this art will be able to place the fifteen 1's in other locations so that a unique table identifying all possible errors but having them in different locations would result.

Any such table could be substituted for this preferred table of FIG. For example, the diagonal of 1's in the first four rows could be reversed so that column 0, row 0 has a 1, column 1, row 1 has a 1, column 2, row 2 has a 1 and column 3, row 3 has a 1, and the remainder of the table could remain the same, thus producing another possible variation of the invention, as will be fully understood with reference to the remainder of this disclosure.

This table is for consideration when assessing each family of bits i. As mentioned in the Summary section, one could modify this invention by shifting a family of bits to another location, or shifting many of the families to different locations and, if one shifted the other components of the invention with reference to the code specified by this shifted table, one could reproduce the invention in another form.

What is meant by that is that if for example, the family of bits - were to have their ECC table rows ECC 0 - 15 for each column - shifted to be under bits 72 - 75 , and the ECC table for the bit pattern of ECC bits currently under columns 72 - 75 were shifted to replace the ECC bits under - , the invention would still work.

The code discovered is not, therefore, unique to the representation in FIGS. Thus each number in the table specifies 4 bits of the syndrome code needed to indicate a particular error within a family. There are 36 families, 0 - 35, since there are 4 families for the check bits, 32 - 35. The 15 possible error codes are specified in the left-most column and the family DRAM number is specified along the top.

This table thus specifies the correctable errors the preferred embodiment of the invention can handle and how they are based on the syndrome generated in accord with the preferred form of the invention.
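The count of 15 error codes per family follows from enumerating every non-empty set of failing bits in a 4 bit device. The Python sketch below illustrates that arithmetic. The function names are hypothetical, and the letter S for single-bit errors is an assumption by analogy with the D, T and Q labels that appear in the text.

```python
def error_type(pattern: int) -> str:
    """Label a 4-bit in-family error pattern by how many bits failed."""
    if not 1 <= pattern <= 15:
        raise ValueError("pattern must be a non-zero 4-bit value")
    # S = single (assumed label), D = double, T = triple, Q = quad
    return {1: "S", 2: "D", 3: "T", 4: "Q"}[bin(pattern).count("1")]


def count_types() -> dict[str, int]:
    """Count the error patterns of each type within one family."""
    counts: dict[str, int] = {}
    for pattern in range(1, 16):
        label = error_type(pattern)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

The enumeration gives 4 single, 6 double, 4 triple and 1 quad pattern, 15 in all, so 36 families yield 36 x 15 = 540 distinct correctable error conditions.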

Diagram 40 shows the path of data into the DRAM or RAM device 31 , in which 16 check bits are generated in block 42 , and both the check bits and the original bits of data are sent to the memory device on lines 43 , and 44 , respectively.

Retrieving the word after it has been stored in memory involves check bit regeneration and comparison 45 , 46 , and, based on the syndrome produced, decoding for correctable errors in the data word and the check bits, 47 , 48 , along with production of a tentative no-error signal. The data word is corrected, if it can be, in data correction block 35 , where one of the syndrome codes is produced to specify which bits need to be corrected.

Also, error status detection is performed in block 36 , generating an indicator signal showing that there was an error, and whether it was correctable and corrected or uncorrectable.

The syndrome can be reported out to indicate which bits are bad if desired. The preferred embodiment works with memory that uses two standard bit DIMMs. These common DIMMs provide a straightforward implementation for having a bit word where there are data bits and 16 check bits.

Utilizing standard DIMMs reduces the cost of the system greatly, adding to the value of this invention. Under this two DIMM organization 16 check bits are generated for every bit word that is sent into the memory. Check bits are the calculated odd parity across a specific pattern of RAM bits. After the 16 check bits are generated, using the error correction code table in the table 20 of FIGS.

Table 20 shows the inventive ECC code that is used in the form described by the preferred embodiment. When retrieving data words the process of generating check bits is repeated with a twist. Check bit regeneration occurs using Read data bits [ These regenerated check bits are compared bit-for-bit to the stored check bits, bits [ The comparison, using an XOR function results in a bit syndrome code. A determination is made of which bits or family of bits in the bit data-word may be in correctable error when the syndrome code is decoded.

Refer now to FIG. Thus, for the first XOR gate branch 51 for the tree 50 , bits 0 , 1 , 4 , 8 , and so on, to bit of the bit data word, as specified by the top line of FIG.

A branch is constructed in this manner for each bit of the 16 ECC bits. For illustrative purposes, only the branches comprising XOR gates 51 and 52 that produce check bits 0 and 15 , respectively, are shown. Thus, from an input line 53 of bits 0 - , 16 output bits are produced on line This accomplishes the function of block 42 of FIG.

As mentioned previously, the code word of FIGS. For this check generation module, such shifting to produce a code of the same effect but different form than the one of the preferred embodiment would be reflected in a changed distribution of the inputs to the 16 branches of the tree corresponding to the swapped values of the code. In all events, the check bit code after being generated, should be stored in RAM memory devices related to the ones used for storing this particular memory data word, thus requiring plus 16 bits per word for storage.

When one wants to retrieve the word from the memory, the process employs the pieces described with reference to FIGS. We illustrate alternate embodiments in FIGS. In 6 A, the parity tree is labeled 60 A, having two gates 61 A and 62 A. Mathematically these are the same but the variation of FIG. Again, as in FIG.