Taiwan engineers defeat limits of flash memory

Dec 02, 2012 by Nancy Owano weblog
Credit: MemoTrek

(Phys.org)—Taiwan-based Macronix has found a solution for flash memory's fundamental weakness: fadeout. A limitation of flash memory is simply that it eventually wears out; the more often the cells in a memory chip are written and erased, the less reliably they store data. The write-erase cycles degrade the insulation, and eventually a cell fails. "Flash wears out after being programmed and erased about 10,000 times," according to IEEE Spectrum. Engineers at Macronix have a solution that gives flash memory a new lease on life. They propose a "self-healing" NAND flash memory that can survive more than 100 million cycles.

News of the findings appears in IEEE Spectrum, which discusses flash memory's limitations and the Taiwan company's solution. Macronix is a manufacturer in the non-volatile memory (NVM) market, with NOR flash, NAND flash, and ROM products. Even before this announcement, many engineers inside and outside of Macronix were aware of a life-extending workaround: heat. The snag is that applying heat was never found to be practical. Although subjecting the cells to high heat could restore the memory, the process was problematic: the entire memory chip would need to be heated for hours at around 250 °C. As the Macronix team put it, the "long baking time is impractical for real time operation."


The researchers redesigned a flash memory chip to include onboard heaters that anneal small groups of memory cells. Applying a brief jolt of heat (around 800 °C) to a very restricted area within the chip returns the cells to a "good" state. The process does not have to be run all that often. According to project member Hang-Ting Lue, the annealing can be done infrequently and on one sector at a time while the device is inactive but still connected to power. It would not drain a cellphone battery, he added.
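To make the scheme concrete, here is a minimal sketch, in C, of how a controller might schedule that per-sector anneal only while the device is idle and powered. The function and constant names (anneal_sector, CYCLE_THRESHOLD, and so on) are illustrative assumptions, not details from the Macronix design.

```c
/*
 * Minimal sketch (illustrative names only; not from the Macronix paper) of how
 * a controller could schedule the built-in anneal: one sector at a time, only
 * while the device is idle but still powered.
 */
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_COUNT    1024
#define CYCLE_THRESHOLD 10000   /* anneal once a sector nears the usual wear-out point */

static uint32_t erase_cycles[SECTOR_COUNT];   /* per-sector program/erase count */

/* Assumed hardware hook: pulses the on-chip heater (>800 °C, a few ms) over one sector. */
extern void anneal_sector(unsigned sector);

/* Called by the controller's housekeeping loop when the host is idle. */
void self_heal_when_idle(bool device_idle)
{
    if (!device_idle)
        return;

    for (unsigned s = 0; s < SECTOR_COUNT; s++) {
        if (erase_cycles[s] >= CYCLE_THRESHOLD) {
            anneal_sector(s);        /* brief, localized heat pulse */
            erase_cycles[s] = 0;     /* sector behaves like fresh flash again */
            break;                   /* one sector per idle pass, per the article */
        }
    }
}
```

Because the heat pulse is local and lasts only milliseconds, a housekeeping pass like this could run opportunistically, which is the point Lue makes about it not draining a cellphone battery.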


Macronix estimates that the flash memory cells could beat the 10,000-cycle limit by lasting for as many as 100 million cycles, but a commercial product is not imminent. For now, Macronix will present the approach, very high temperature applied for a very short time, this month at the IEEE International Electron Devices Meeting (IEDM), December 10 to 12 in San Francisco, the forum for presenting breakthroughs in semiconductor and electronic device technology. Lue observed that in coming up with the approach his team could not lay claim to any new physics principle: "We could have done this ten years ago." It took merely a leap of imagination into a different "regime," he said.

For their upcoming IEDM presentation, the authors say they propose and demonstrate a novel flash in which high-temperature (>800 °C), short-time annealing is generated by a built-in heater. "We discover that a BE-SONOS charge-trapping NAND Flash device can be quickly annealed within a few milliseconds," they said. Their presentation is titled "Radically Extending the Cycling Endurance of Flash Memory (to > 100M Cycles) by Using Built-in Thermal Annealing to Self-heal the Stress-Induced Damage." The authors are H.-T. Lue, P.-Y. Du, C.-P. Chen, W.-C. Chen, C.-C. Hsieh, Y.-H. Hsiao, Y.-H. Shih, and C.-Y. Lu.


More information: m.spectrum.ieee.org/semiconduc… s-100-million-cycles
www.his.com/~iedm/program/sessions/s9.html



User comments : 24


eachus
3.6 / 5 (7) Dec 02, 2012
This is a huge breakthrough for SSDs (solid-state disk drives). The drive can use some of the "extra" storage to count the number of times each area is updated, and after a few thousand stores to one area, move the data elsewhere and thermal cycle to refresh that area. Since current SSDs move data around for load balancing, the only new feature is the ability to refresh the memory, area by area.

And now you have an SSD for servers that can be warrantied for decades. Magnetic disks will still be used for bulk storage (if you need to store terabytes or exabytes of data) but there is no reason for consumers to hesitate. (Actually, I have an SSD in both my laptop and my desktop systems. Rather than use SSD RAIDs, I back up to a magnetic RAID. But I am an early adopter. ;-)
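A rough sketch of the bookkeeping eachus is describing, assuming a hypothetical anneal_block() hook into the self-healing hardware; the block sizes and the refresh threshold are made-up illustration values:

```c
/*
 * Rough sketch of eachus's scheme: count updates per area, and after a few
 * thousand stores move the data elsewhere and thermally refresh the old area.
 * anneal_block() is a hypothetical hook; sizes and thresholds are made up.
 */
#include <stdint.h>
#include <string.h>

#define BLOCK_COUNT   64
#define BLOCK_BYTES   4096
#define REFRESH_AFTER 5000          /* "a few thousand stores" */

typedef struct {
    uint32_t writes;                /* how many times this block has been updated */
    uint8_t  data[BLOCK_BYTES];
} block_t;

static block_t blocks[BLOCK_COUNT];

extern void anneal_block(unsigned idx);   /* assumed self-healing hardware hook */

/* Write new contents; if the target block is worn, migrate and refresh it.
   Returns the index of the block actually used. */
unsigned write_block(unsigned idx, unsigned spare_idx, const uint8_t *src)
{
    if (blocks[idx].writes >= REFRESH_AFTER) {
        memcpy(blocks[spare_idx].data, src, BLOCK_BYTES);  /* move data elsewhere */
        blocks[spare_idx].writes++;
        anneal_block(idx);            /* thermal cycle returns the block to a "good" state */
        blocks[idx].writes = 0;
        return spare_idx;
    }
    memcpy(blocks[idx].data, src, BLOCK_BYTES);
    blocks[idx].writes++;
    return idx;
}
```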
nSpectre
1.3 / 5 (3) Dec 02, 2012
...but there is no reason for consumers to hesitate.

There already is no reason to hesitate. My 80GB Intel X25-M already has an MTBF of 1,200,000 hours. That's arguably what?... 50,000 days; 7,142 weeks; 137 years of continuous 24/7/365 duty. ;)
fmfbrestel
4 / 5 (3) Dec 02, 2012
...but there is no reason for consumers to hesitate.

There already is no reason to hesitate. My 80GB Intel X25-M already has an MTBF of 1,200,000 hours. That's arguably what?... 50,000 days; 7,142 weeks; 137 years of continuous 24/7/365 duty. ;)


but it will be constantly losing memory as it goes. Mean Time Before Failure is the time between system failures. System failures are failures not dealt with by the system. The chip has error correction codes and redundant storage which allow it to lose memory sectors without a system failure. But at the end of that 137 years you're going to have a chip that can store only about a kilobyte of data.
nakulD
4.9 / 5 (8) Dec 02, 2012
...but there is no reason for consumers to hesitate.

There already is no reason to hesitate. My 80GB Intel X25-M already has an MTBF of 1,200,000 hours. That's arguably what?... 50,000 days; 7,142 weeks; 137 years of continuous 24/7/365 duty. ;)


I wish MTBF was calculated like that. You think Intel actually tested the device for 137 years before it failed, i.e. had some data loss? No! What they do is e.g. 1000 devices worked flawlessly for 1200 hrs and then one or more failed. Hence MTBF is 1000*1200=1,200,000 hrs. Welcome to deception in the world of storage.
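Spelled out, the arithmetic behind the figures traded in these comments looks like this:

```c
/* The arithmetic behind the numbers in the comments above: total device-hours
   divided by failures, then converted to days and years. Figures match
   nakulD's example and nSpectre's conversion. */
#include <stdio.h>

int main(void)
{
    double devices  = 1000.0;   /* units on test                     */
    double hours    = 1200.0;   /* hours each ran before one failed  */
    double failures = 1.0;

    double mtbf_hours = devices * hours / failures;   /* 1,200,000 h   */
    double mtbf_days  = mtbf_hours / 24.0;            /* 50,000 days   */
    double mtbf_years = mtbf_days / 365.0;            /* ~137 years    */

    printf("MTBF: %.0f hours = %.0f days = %.0f years\n",
           mtbf_hours, mtbf_days, mtbf_years);
    return 0;
}
```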
nSpectre
5 / 5 (1) Dec 02, 2012
Hence, I said "arguably" :)

Yes, I'm somewhat aware how engineers produce the "representative" MTBF. And how marketing (hack*spit) then snows us with those numbers. But still, unless you get the occasional problematic "lemon", your consumer-level device is going to long out-live your use of it.

I still have an old CDC Wren 500MB full-height hard drive from the '90s that works just great. But I only keep it around because its case gets so hot you can't touch it and it makes a delightful scream if you pound on the table next to it. =8-D

So, just sayin', with error correction and a reserve memory pool, as noted above, today's SSDs will generally far outlive their consumer usefulness.

As to the article, I find the solution elegant and "spiffy" for long term industrial applications. At some point mechanical drives will become obsolete and we can look to a successor such as holographic or "quantum" storage. :)
nakulD
3.3 / 5 (3) Dec 02, 2012
Hence, I said "arguably" :)


Agreed, consumer-level devices hardly see any issues with SSDs these days. This research will mostly be targeted at industrial applications. Most consumer-level devices like phones and laptops these days have an average lifetime of 2-3 years, which is quite low, and on top of that their usage of storage is also relatively light compared to 15-20 years back, when internet penetration was way low and everything was served from the hard drive (or f/sloppy drive :p)

What would be interesting to see is how this heating affects the lifespan of the device itself. Nobody would like to see the whole device toasted in an attempt to save some part of the drive. On the other hand, we might see some reduction in storage density because now you have additional components. If some other storage technology catches up, then this research may not have many applications. But again, it took more than 50 years for someone to find an alternative to the HDD; you never know.
Jonseer
3.8 / 5 (5) Dec 02, 2012
What would be interesting to see is how this heating affects the lifespan of the device itself. Nobody would like to see the whole device toasted in attempt to save some part of the drive.....


You might want to read the article a bit more slowly, and this time for comprehension not just the facts :)

The description of how it works, especially the part where you can infer the size of the region of the chip being heated, as well as that it's done in stages and not all at once, makes it pretty clear that your worries are nothing to worry about.
willrandship
2 / 5 (2) Dec 02, 2012
I want this. Badly.

So, it seems like the self-healing happens at off-times, which means that in some situations the chip may still fail by way of never having time to heal (extreme R/W operations). Of course, that's easily avoidable.
jdub
3.2 / 5 (5) Dec 02, 2012
most of these comments really suck

eachus:
Since current SSDs move data around for load balancing, the only new feature is the ability to refresh the memory, area by area.
...
(Actually, I have an SSD in both my laptop and in my desktop systems. Rather than use SSD RAIDs, I backup to a magnetic RAID. But I am an eary adopter. ;-)


1) No, they don't move data to load balance. They just don't store it sequentially. If you were looking at a map of where the OS thinks it is compared to where it is in flash, it is not the same... but an SSD will never move an existing bit cell to cell for the sake of balancing.

It will however write your next save to a different cell, so saving your Word file doesn't burn out a section within a week. (times exaggerated)

2) Congrats on having 2 SSDs in 2012... but early adopters existed 5 years ago and we did things like hang 4 of the first OCZ Vertex drives off a hardware RAID 5 card (if we really wanted to get fancy).

3) You missed the prefix peta.
guiding_light
5 / 5 (1) Dec 02, 2012
So it has to be localized heating? That does not bode well for their 3D NAND.
VendicarD
2 / 5 (4) Dec 03, 2012
No manufacturers will use this process.

First, it adds to the cost of the NAND flash chips themselves, and also the supporting circuitry.

Second, it will reduce their sales of SSDs, since the drives will have practically unlimited lifetimes.

Corporations maximize their profits by herding their customers like cattle.

It is called "market shaping" or "market sculpting", and producing products that are designed to fail is part of the process.
jjd
5 / 5 (1) Dec 03, 2012
No manufacturer [from Detroit] will use this process.
jjd
not rated yet Dec 03, 2012
Manufacturers, in specifying MTBF, tend to state duty cycle; and it is never 100%.
jjd
5 / 5 (1) Dec 03, 2012
3D NAND will benefit from any planar/2D process. The hotspot they introduce is very local; e.g., let's guess 4,096 flip-flops, or maybe 2,048. The relevant 2D subregion will simply have its own hotplate. For the 3D version, just stack 'em, with the odd address pin stuck out per tier, and a data latch per tier.
nakulD
not rated yet Dec 03, 2012
You might want to read the article a bit more slowly, and this time for comprehension not just the facts :)

The description of how it works, especially the part where you can infer the size of the chip being heated as well as that its done in stages not all at once, make it pretty clear that your worries are nothing to worry about.


That's what the paper claims; it's not yet proven, but I hope it proves correct.
Eikka
2.7 / 5 (3) Dec 05, 2012
No! What they do is e.g. 1000 devices worked flawlessly for 1200 hrs and then one or more failed. Hence MTBF is 1000*1200=1,200,000 hrs.


That's not what they do either. That's just silly.

The MTBF and the write endurance of a flash drive are two entirely different things. MTBF is derived from running the device at elevated temperatures to accelerate the failure rates, and then extrapolating how fast it would break down under normal conditions.

The write endurance is a function of the distribution of endurance across individual cells, which is not estimated by the MTBF.

1) no they dont move data to load balance.


Yes they do. Period. That's what the TRIM garbage collection is designed to do.
privacyisnaught
5 / 5 (1) Dec 06, 2012
Yes they do. Period. That's what the TRIM garbage collection is designed to do.


No it's not; TRIM is not the same as garbage collection. TRIM is, however, a function designed to reduce the number of reads/writes caused by garbage collection.

If you delete a file on an SSD, the SSD must 'empty' the cell for it to be reusable (normal hard drives simply tag it as empty and overwrite the cell).
But to empty a cell on an SSD requires the entire cell-cluster to be emptied.
That's what's called garbage collection: to get around the cell-cluster issue, the SSD loads all the other in-use cells in that cell-cluster into cache, clears the cell-cluster, and rewrites them.

This causes too many read-writes, since the cells are only checked when due to be rewritten.

To get around this there is TRIM, what it does is mark a cell as invalid when you delete a file.

Then the SSD knows not to relocate that cell during a garbage collection if another cell in that cluster needs to be re-used.
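A small sketch of that behaviour, with illustrative structures rather than any vendor's actual firmware; it shows why a TRIMmed page drops out of the copying that garbage collection has to do:

```c
/*
 * Sketch of the mechanism described above: erases only work on a whole
 * block/cluster, so garbage collection copies the still-valid pages to a fresh
 * block before erasing, and TRIM flags pages invalid so they are not copied
 * along. Structures and names are illustrative, not any vendor's firmware.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGES_PER_BLOCK 64
#define PAGE_BYTES      4096

typedef struct {
    bool    valid[PAGES_PER_BLOCK];             /* cleared by TRIM or on overwrite */
    uint8_t data[PAGES_PER_BLOCK][PAGE_BYTES];
} flash_block_t;

/* Host deleted a file: TRIM just marks the page invalid; nothing is copied or erased yet. */
void trim_page(flash_block_t *blk, unsigned page)
{
    blk->valid[page] = false;
}

/* Garbage collection: salvage the valid pages into a fresh block, then erase
   the victim. Without TRIM, deleted pages would still look valid and be copied too. */
void garbage_collect(flash_block_t *victim, flash_block_t *fresh)
{
    unsigned dst = 0;
    for (unsigned p = 0; p < PAGES_PER_BLOCK; p++) {
        if (victim->valid[p]) {
            memcpy(fresh->data[dst], victim->data[p], PAGE_BYTES);
            fresh->valid[dst] = true;
            dst++;
        }
    }
    memset(victim, 0, sizeof(*victim));   /* stands in for the block erase */
}
```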
Eikka
1 / 5 (2) Dec 06, 2012
No it's not, TRIM is not that same as garbage collection


It is.

When you delete a file from an SSD, the drive does not know the data is gone. It still treats it like any other data, because it cannot read the filesystem's allocation table to discover what is garbage and what is not. In fact it doesn't even know which part of the data is the file allocation table.

The TRIM command tells the SSD which bits of data are no longer in use, so the drive can then move valid data out of partially filled blocks and compact it into full blocks in order to make empty blocks available before you actually need them to speed things up.

It functions kinda like defragmentation in traditional drives, and issuing the TRIM command starts the whole routine.
Eikka
3 / 5 (2) Dec 06, 2012
Also:
no they dont move data to load balance.


Yes they do. Even without TRIM, when the drive changes some of the contents of a block, it has to re-write the whole block. If it always wrote the block back in the same place, it would be quickly destroyed.

Instead, it picks a random block and swaps the contents between the two so that subsequent changes to the same data would get spread around the entire chip instead of grinding the same spot over and over again. It also helps to refresh cells that have been sitting there for a long time and may be at risk to bleed out their charge and lose their data.

More specifically, a typical SSD has a number of reserve blocks, or over-provision blocks, that amount to around 7-30% of the entire disk space, which it uses to swap things around and replace bad blocks with. That way you can increase chip yields because the chips don't all have to be perfect.
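A minimal sketch of that swap, assuming hypothetical flash_program()/flash_erase() primitives and a small over-provision pool:

```c
/*
 * Minimal sketch of the block swapping Eikka describes: rewritten data goes to
 * a spare from the over-provision pool and the old physical block becomes the
 * new spare. flash_program()/flash_erase() are assumed primitives.
 */
#include <stdint.h>

#define LOGICAL_BLOCKS 100
#define SPARE_BLOCKS   8                     /* ~7% over-provisioning */

static unsigned map[LOGICAL_BLOCKS];         /* logical -> physical block number */
static unsigned spares[SPARE_BLOCKS];        /* currently free physical blocks   */
static unsigned spare_top;

extern void flash_program(unsigned phys, const uint8_t *data);
extern void flash_erase(unsigned phys);

void init_mapping(void)
{
    for (unsigned i = 0; i < LOGICAL_BLOCKS; i++)
        map[i] = i;
    for (unsigned i = 0; i < SPARE_BLOCKS; i++)
        spares[i] = LOGICAL_BLOCKS + i;      /* spare pool sits past the mapped area */
    spare_top = SPARE_BLOCKS;
}

/* Rewriting a logical block never lands on the same physical block twice in a row. */
void rewrite_block(unsigned logical, const uint8_t *data)
{
    unsigned old_phys = map[logical];
    unsigned new_phys = spares[--spare_top];

    flash_program(new_phys, data);
    map[logical] = new_phys;

    flash_erase(old_phys);                   /* old location is recycled... */
    spares[spare_top++] = old_phys;          /* ...and becomes the next spare */
}
```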
Eikka
1 / 5 (1) Dec 06, 2012
TRIM is however a function designed to reduce the number of read/writes from garbage collection.


Without TRIM, there would be no garbage collection as the drive wouldn't know what is garbage and what is not.
Freddog
5 / 5 (1) Dec 06, 2012
This is a quantum step in the flash memory world. Very impressive! What are the CSI TV shows going to do when they can no longer piece together the bits of text and photos the criminals delete?
privacyisnaught
not rated yet Dec 06, 2012
Without TRIM, there would be no garbage collection as the drive wouldn't know what is garbage and what is not.

I thought that before the introduction of the TRIM function, the SSD cached every other cell in a cluster except that one cell that it was trying to overwrite.
If cells #8-#10 in a cluster of 64 cells were going to be rewritten due to being previously deleted, the SSD cached every other cell in that cluster that had any data, regardless of whether those cells also included data that had previously been deleted (making it cache and rewrite the entire cluster once per deleted cell).
This means that using TRIM (before a garbage collection) worked on a cluster scale that emptied every 'deleted' cell in that cluster at once.
Compare that to before TRIM, when the garbage collection only emptied the 'deleted' cells that were currently going to be overwritten (it couldn't check the entire cluster for other deleted cells besides the ones that were going to be used right then).

Good Discovery though.
Eikka
3 / 5 (2) Dec 12, 2012
I thought that before the introduction of the TRIM function, the SSD cached every other cell in a cluster except that one cell that it was trying to overwrite.


A flash chip is typically arranged in blocks, where each byte or even each bit is separately addressable for random-access writing or reading. The block size varies with devices. Clusters are a file system thing, and a flash drive can be formatted with a cluster size different from its block size.

Here's the thing. The file system will always read or write in entire clusters. If the cluster size is set at 4k, and the block size of the flash device is 4k, then it will always re-write an entire block. This is usually the case with common USB thumbdrives and memory cards, so you don't need garbage collection.

With SSDs, the blocks are bigger for cost reasons. Without TRIM, the drive would have to act as if the entire block was re-written because it has to move all the data out to another block to free the one being rewritten
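A quick illustration of that mismatch, with example sizes only (4 KB clusters, 4 KB thumb-drive blocks, 512 KB SSD erase blocks):

```c
/* Quick illustration of the cluster-vs-block mismatch, using example sizes
   only (4 KB clusters, 4 KB thumb-drive blocks, 512 KB SSD erase blocks). */
#include <stdio.h>

int main(void)
{
    const double cluster_kb   = 4.0;     /* what the file system writes           */
    const double usb_block_kb = 4.0;     /* erase block on a simple thumb drive   */
    const double ssd_block_kb = 512.0;   /* erase block on a typical SSD          */

    /* Worst-case data moved per 4 KB cluster write if the whole block is rewritten. */
    printf("thumb drive: %.0fx the cluster size\n", usb_block_kb / cluster_kb); /*   1x */
    printf("SSD:         %.0fx the cluster size\n", ssd_block_kb / cluster_kb); /* 128x */
    return 0;
}
```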
Eikka
1 / 5 (1) Dec 12, 2012
The point is that erasing a block is much much slower than writing to one, so a modern flash chip tries to never erase on write. It finds a new block that is already empty and writes there, leaving the old one as it is.

Then it comes back and empties the blocks that are no longer in use, unless of course there's still something in there. Hence why you need TRIM with garbage collection.

Of course you could have garbage collection without TRIM, but what would be the point? It comes as a bundle, because implementing it just makes sense.