This invention relates to correcting and detecting errors that may occur within a computer system particularly within a memory device, and more particularly to systems where a single bit correction supplemented with familial 1 through 4 bit correction and double bit word-wide detection are preferred, and even more particularly to bit data words stored in 4 bit RAM devices.
It is expensive to dedicate memory to error correction code (ECC) space; therefore, compromises in the desire for perfect error correction and detection are needed. For sustainable commercial viability, one must still provide the largest computer systems particularly, and other RAM-using data storage systems generally, with appropriate compromises in error detection and correction. The path we chose is to use some ECC to make memory subsystems more reliable by allowing a single multi-bit RAM device to fail and dynamically correcting that failure, by providing the same capability for any 1, 2, 3, or 4 bits within a 4 bit RAM family, and by further providing for detection of any 2 bits of non-familial error anywhere in the word.
This capacity will correct all single-bit and in-family 2, 3, or 4 bit errors on the fly, to produce a corrected data word, and will identify as unfixable and corrupted those data words with any other errors or error types. It is our belief that these are the most likely errors and that therefore our selected compromise is valuable.
As RAM device densities and memory subsystem bandwidth requirements increased over time, there was more pressure on the memory subsystem designers to use multi-data-bit RAM devices to meet their requirements.
As RAM device geometries become smaller and device failure rates increase, data words become more susceptible to failures that affect more than one bit in the device. Also, even though single bit errors are still the predominant failure mode of RAM devices, soft single-bit failure rates are increasing due to the shrinking of the geometries and reliability characteristics of these devices.
So it becomes more important to at least detect double bit errors from multiple devices, so that data corruption can be detected and safely handled. This invention provides for that protection. Providing enhanced error detection and enhanced error correction without the substantial cost increases that come from a higher ratio of redundant Error Correction Code (ECC) bits to information data bits is an additional goal of this invention.
There were two main methods of handling error correction and detection in the past. The issue with this method is the additional costs of the RAMs to support the extra check bits. For very large memories, the extra cost of that extra RAM is significant if not commercially prohibitive. This method would also need 4 RAM devices to implement the 2 groups of 8 check bits, and therefore would have the same cost.
However, within each of the ECC fields, not all two-bit errors across multiple devices are detected. Therefore the cost is the same, but it doesn't have the same reliability characteristics. The multi-bit adjacent error correction or Chip Kill is merged with double bit nonadjacent error detection. This entails the ability to detect and correct failures within a single RAM device, and to further detect failures that have resulted from soft or hard errors of any single bit in any two RAM devices within the bit word.
No other solution has ever achieved this. A unique ECC table is used in our invention in conjunction with a specific RAM error definition table for syndrome decode, neither of which is in the prior art.
Prior inventions did not allow for the level of reliability that is present with an error code correction feature which combines single bit error correction and multi-bit adjacent correction with double bit non-adjacent error detection, at least not with a small number of additional ECC-type bits. Thus, there is a need for error correction and detection at low memory cost and high reliability. Providing familial error correction captures the most likely of the multi-bit errors within a word: those that occur within a single DRAM or RAM device.
Accordingly, by thinking of the problem in this way, instead of trying to correct every possible error, we have designed an inventive and low cost error detection and correction system as set forth below. There have been similar systems in the art, but these do not have all the advantages or requirements of our invention. Perhaps the closest reference in a U. Compared to either embodiment of Chen ', our invention seems to produce more error checking and also possibly more error correction while requiring fewer ECC bits.
The specific code to support the 12 ECC bit code appears to be described in U. The cost savings related to an additional third of savings over Chen ' will be appreciated by those of experience in these arts. An additional patent of interest includes Blake et al, U. Finally, Adboo et al. However Adboo requires that the check bits be produced by two identical parity trees for each 64 bits, wherein each parity tree has the same number of inputs, and the outputs are paired to correct up to four bit errors within a single DRAM or RAM device.
Perhaps more importantly, Adboo can only detect and correct one single bit error in a word or one two adjacent-bit errors in a word, or four adjacent bit errors in a word. Adboo cannot detect two unrelated single bit errors or a single bit error outside of a familial group having up to 4 bit errors, which our invention can do. As can be clearly seen with reference to Adboo's FIG. For an example of this failing of Adboo, note that the code for bit C 4 is and the code for C 7 is XORing these two values leads to the result , which indicates that bit 0 is in error!
Thus if both C 4 and C 7 are in error, the syndrome will indicate that bit 0 is in error, an unacceptable situation, even if such an occurrence may be a rare event, because it missed two single bit errors.
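The aliasing failure described above can be illustrated with a minimal Python sketch. The 4-bit column codes here are hypothetical stand-ins, not the values from Adboo's figure; they are chosen only so that two single-bit codes XOR to a third, which is the condition that makes a double error look like a correctable single error.

```python
# Hypothetical per-bit syndrome codes; NOT the values from Adboo's table.
COLUMN_CODES = {
    "bit0": 0b0011,
    "C4": 0b0101,
    "C7": 0b0110,
}

def syndrome_for(error_bits):
    """XOR together the column codes of every bit that is in error."""
    s = 0
    for name in error_bits:
        s ^= COLUMN_CODES[name]
    return s

def decode_single(s):
    """Return the names of single bits whose code equals the syndrome."""
    return [name for name, code in COLUMN_CODES.items() if code == s]

# Two unrelated single-bit errors produce the syndrome of a third bit.
double_error = syndrome_for(["C4", "C7"])
aliased_to = decode_single(double_error)
```

With these assumed codes, a decoder that trusts the syndrome would flip `bit0`, leaving three wrong bits in the word instead of flagging the data as corrupted.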
Accordingly there is a need for stronger detection and correction of errors to improve the reliability of computer system memories and to do so with a minimal amount of data. An error correction system and chip-kill type system together with double bit non-familial error detection will provide a commercially most useful solution to this technical problem.
We describe our invention with reference to the drawings in the summary and detailed description sections below, but limit its scope only by the appended claims. The invention employs 16 such branches in any embodiment as described herein to generate the 16 check bits for a bit word. The invention employs 16 such branches to generate the 16 syndrome code bits employed as described further herein.
A highly complex code sequence has been discovered which provides an opportunity to correct multi-bit errors within a bit family, while at the same time providing an opportunity to also detect all additional single bit errors outside of that bit family, and further providing an opportunity to detect many other multi-bit uncorrectable errors.
The same generator or an identical one organized by the same code regenerates the 16 check bits when a bit memory word is read out of main memory and a comparison with the originally generated check bits is made by XORing the saved check bits with the output of the regenerator to produce a syndrome code. This is the same, mathematically, as putting the data bits through the same XOR tree configuration and adding in the check bit for each branch of the tree, which in practice is how we prefer to produce the syndrome because less cycle time is required.
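As a rough illustration of the two equivalent ways of forming the syndrome described above, the following Python sketch uses a toy code with 8 data bits and 4 check bits. The column codes, the sizes, and all function names are assumptions made for the example and are not the code table of the invention. For simplicity the sketch uses plain XOR (even) parity; an inversion applied identically on write and read would cancel in the comparison.

```python
# Toy sizes for illustration only: 8 data bits, 4 check bits.
# DATA_COLUMNS[j] is a hypothetical syndrome code for data bit j.
DATA_COLUMNS = [0b0011, 0b0101, 0b0110, 0b0111,
                0b1001, 0b1010, 0b1011, 0b1100]
N_CHECK = 4

def parity(x):
    """XOR of all bits of x (1 if an odd number of bits are set)."""
    return bin(x).count("1") & 1

def build_branch_masks():
    """Branch i of the XOR tree takes every data bit whose code has bit i set."""
    masks = []
    for i in range(N_CHECK):
        mask = 0
        for j, col in enumerate(DATA_COLUMNS):
            if (col >> i) & 1:
                mask |= 1 << j
        masks.append(mask)
    return masks

BRANCH_MASKS = build_branch_masks()

def generate_check_bits(data):
    """One check bit per branch: parity over that branch's data bits."""
    check = 0
    for i, mask in enumerate(BRANCH_MASKS):
        check |= parity(data & mask) << i
    return check

def syndrome_by_compare(data, stored_check):
    """Regenerate the check bits, then XOR with the stored check bits."""
    return generate_check_bits(data) ^ stored_check

def syndrome_by_tree(data, stored_check):
    """Feed each stored check bit into its own branch as one more XOR input."""
    s = 0
    for i, mask in enumerate(BRANCH_MASKS):
        bit = parity(data & mask) ^ ((stored_check >> i) & 1)
        s |= bit << i
    return s

word = 0b10110010
check = generate_check_bits(word)
clean = syndrome_by_compare(word, check)                # no error
flipped = syndrome_by_compare(word ^ (1 << 3), check)   # data bit 3 flipped
```

Because the code is linear, flipping one data bit yields that bit's column code as the syndrome, and both formulations give the same result for any input.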
The resulting syndrome is decoded, again employing the code sequence to organize the decode gates, to identify all the correctable errors of them and to identify most conditions of uncorrectable errors, and to indicate good data if there is no detectable error or corrupted data if errors are detected but they are uncorrectable. The preferred component concepts and parts are described first, and then the preferred functioning of the invention is described.
Please refer first to FIG. These four bits are said to be family bits or familial, because they are within a single RAM device. So, for purposes of discussion within this patent, we say that all bits within a DRAM or RAM device will be considered familial bits and those outside are not familial or are not part of the same family.
We label the rows 0 - 16 of FIG. In a bit word there are 32 such devices, RAMs 0 - 31 , and in our preferred inventive system, there would be an additional 4 devices, making 36 such RAM X devices in total per bit-data-plusbit-ECC word.
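The word geometry implied by these device counts can be worked out in a few lines of Python. This is a sketch under assumptions: the data width is inferred from 32 devices of 4 bits each, and the contiguous assignment of four consecutive bit positions to each device, along with the names `family_of` and `is_check_family`, is hypothetical.

```python
BITS_PER_DEVICE = 4   # each RAM device contributes one 4-bit family
DATA_DEVICES = 32     # RAMs 0-31 hold the data bits
CHECK_DEVICES = 4     # additional devices hold the check bits

DATA_BITS = DATA_DEVICES * BITS_PER_DEVICE     # inferred data width
CHECK_BITS = CHECK_DEVICES * BITS_PER_DEVICE   # 16 check bits, as stated
TOTAL_DEVICES = DATA_DEVICES + CHECK_DEVICES   # 36 devices per word
TOTAL_BITS = DATA_BITS + CHECK_BITS

def family_of(bit_index):
    """Assumed mapping: bits 4k..4k+3 of the stored word live in device k."""
    if not 0 <= bit_index < TOTAL_BITS:
        raise ValueError("bit index outside the stored word")
    return bit_index // BITS_PER_DEVICE

def is_check_family(family):
    """Families numbered after the data devices hold check bits."""
    return family >= DATA_DEVICES
```

Under this assumed layout, two failing bits are familial exactly when `family_of` returns the same device number for both.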
The column ETC indicates the error type code for each error type. A D 2 indicates one possible two-bit, in-family error state with bits 2 and 1 of RAM X being in error. A T indicates one of the four possible three-bit in-family error states for RAM X, and the Q Q 0 indicates that all four bits are in error. Note that the arrangement of 1's in the table 10 is arbitrary and that one of ordinary skill in this art will be able to place the fifteen 1's in other locations so that a unique table identifying all possible errors but having them in different locations would result.
Any such table could be substituted for this preferred table of FIG. For example, the diagonal of 1's in the first four rows could be reversed so that column 0, row 0 has a 1, column 1, row 1 has a 1, column 2, row 2 has a 1 and row 3, column 3 has a 1, and the remainder of the table could remain the same, thus producing another possible variation of the inventions, as will be fully understood with reference to the remainder of this disclosure.
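The count of fifteen in-family error states follows from enumerating every non-empty subset of the four bits in one device. The sketch below does this in Python; the label "S" for single-bit errors and the function names are assumptions, while D, T, and Q follow the error type codes mentioned above.

```python
from itertools import combinations

FAMILY_WIDTH = 4
# Error class by number of failing bits; "S" is an assumed label for singles.
CLASS_BY_WEIGHT = {1: "S", 2: "D", 3: "T", 4: "Q"}

def family_error_patterns(width=FAMILY_WIDTH):
    """List every non-empty set of failing bit positions within one family."""
    patterns = []
    for weight in range(1, width + 1):
        for bits in combinations(range(width), weight):
            patterns.append((CLASS_BY_WEIGHT[weight], bits))
    return patterns

def count_by_class(patterns):
    """Tally how many patterns fall in each error class."""
    counts = {}
    for label, _ in patterns:
        counts[label] = counts.get(label, 0) + 1
    return counts

patterns = family_error_patterns()
counts = count_by_class(patterns)
```

Any rearrangement of the table rows changes only which row stands for which subset; the set of fifteen subsets stays the same.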
This table is for consideration when assessing each family of bits i. As mentioned in the Summary section, one could modify this invention by shifting a family of bits to another location, or shifting many of the families to different locations and, if one shifted the other components of the invention with reference to the code specified by this shifted table, one could reproduce the invention in another form.
What is meant by that is that if for example, the family of bits - were to have their ECC table rows ECC 0 - 15 for each column - shifted to be under bits 72 - 75 , and the ECC table for the bit pattern of ECC bits currently under columns 72 - 75 were shifted to replace the ECC bits under - , the invention would still work.
The code discovered is not, therefore, unique to the representation in FIGS. Thus each number in the table specifies 4 bits of the syndrome code needed to indicate a particular error within a family. There are 36 families 0 - 35 since there are 4 families for the check bits 32 - The 15 possible error codes are specified in the left-most column and the family DRAM number is specified along the top.
This table thus specifies the correctable errors the preferred embodiment of the invention can handle and how they are based on the syndrome generated in accord with the preferred form of the invention.
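How such a table drives syndrome decoding can be sketched in Python with a deliberately small toy code: four families of two bits each and a 4-bit syndrome. The per-bit codes, the family sizes, and the function names are all hypothetical. Unlike the code of the invention, this toy does not guarantee that every two-bit non-familial error is detected; it shows only the lookup mechanism and the three possible outcomes.

```python
# Hypothetical per-bit syndrome codes, grouped by family (device).
FAMILY_COLUMNS = {
    0: [0b0101, 0b1010],  # data family 0
    1: [0b0110, 0b1011],  # data family 1
    2: [0b0001, 0b0010],  # check-bit family
    3: [0b0100, 0b1000],  # check-bit family
}

def build_decode_table(family_columns):
    """Map each in-family error syndrome to (family, failing-bit pattern)."""
    table = {}
    for family, cols in family_columns.items():
        for pattern in range(1, 1 << len(cols)):
            s = 0
            for bit, col in enumerate(cols):
                if (pattern >> bit) & 1:
                    s ^= col
            if s in table:
                raise ValueError("two correctable errors share a syndrome")
            table[s] = (family, pattern)
    return table

def classify(syndrome, table):
    """Return the status and, if correctable, which bits to flip."""
    if syndrome == 0:
        return ("no_error", None)
    if syndrome in table:
        return ("corrected", table[syndrome])
    return ("uncorrectable", None)

DECODE_TABLE = build_decode_table(FAMILY_COLUMNS)

both_bits_family0 = classify(0b0101 ^ 0b1010, DECODE_TABLE)
cross_family = classify(0b0101 ^ 0b1011, DECODE_TABLE)
```

A syndrome that matches no table entry is reported as uncorrectable, which corresponds to the corrupted-data indication described in the text.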
Diagram 40 shows the path of data into the DRAM or RAM device 31 , in which 16 check bits are generated in block 42 , and both the check bits and the original bits of data are sent to the memory device on lines 43 , and 44 , respectively.
Retrieving the word after being stored in memory involves check bit regeneration and comparison 45, 46, and, based on the syndrome produced, decoding for correctable errors in the data word and the check bits, 47, 48, along with production of a tentative no-error signal. The bit data word is corrected, if it can be, in data correction block 35, where one of the syndrome codes is produced to specify which bits need to be corrected.
Also, error status detection is performed in block 36 , generating an indicator signal showing that there was an error, and whether it was correctable and corrected or uncorrectable.
The syndrome can be reported out to indicate which bits are bad if desired. The preferred embodiment works with memory that uses two standard bit DIMMs. These common DIMMs provide a straightforward implementation for having a bit word where there are data bits and 16 check bits.
Utilizing standard DIMMs reduces the cost of the system greatly, adding to the value of this invention. Under this two DIMM organization 16 check bits are generated for every bit word that is sent into the memory. Check bits are the calculated odd parity across a specific pattern of RAM bits. After the 16 check bits are generated, using the error correction code table in the table 20 of FIGS.
Table 20 shows the inventive ECC code that is used in the form described by the preferred embodiment. When retrieving data words the process of generating check bits is repeated with a twist. Check bit regeneration occurs using Read data bits [ These regenerated check bits are compared bit-for-bit to the stored check bits, bits [ The comparison, using an XOR function results in a bit syndrome code. A determination is made of which bits or family of bits in the bit data-word may be in correctable error when the syndrome code is decoded.
Refer now to FIG. Thus, for the first XOR gate branch 51 for the tree 50 , bits 0 , 1 , 4 , 8 , and so on, to bit of the bit data word, as specified by the top line of FIG.
A branch is constructed in this manner for each bit of the 16 ECC bits. For heuristic purposes only branches comprising XOR gates 51 and 52 that produce check bits 0 and 15 , respectively, are illustrated. Thus, from an input line 53 of bits 0 - , 16 output bits are produced on line This accomplishes the function of block 42 of FIG.
As mentioned previously, the code word of FIGS. For this check generation module, such shifting to produce a code of the same effect but different form than the one of the preferred embodiment would be reflected in a changed distribution of the inputs to the 16 branches of the tree corresponding to the swapped values of the code. In all events, the check bit code after being generated, should be stored in RAM memory devices related to the ones used for storing this particular memory data word, thus requiring plus 16 bits per word for storage.
When one wants to retrieve the word from the memory, the process employs the pieces described with reference to FIGS. We illustrate alternate embodiments in FIGS. In 6 A, the parity tree is labeled 60 A, having two gates 61 A and 62 A. Mathematically these are the same but the variation of FIG. Again, as in FIG.