(Presenting Wednesday, January 23rd, at 2pm EST) Attempts to break the memory wall include various efforts to bring memory closer to the processor, or to push processing into the memory stack, generally called Processing in Memory (PIM). If the memory is one of the various "flavors" of emerging non-volatile memory technologies, such as spin-transfer torque RAM (STTRAM), phase-change memory (PCM), resistive RAM (RRAM), memristors, 3D XPoint, etc., limited endurance becomes an important issue in addition to the general challenges common to PIM. Endurance refers to the fact that most non-volatile memories, including all of these emerging technologies as well as more traditional ones like Flash and EEPROM, have a limited lifetime in terms of how many times they can be written and erased. Once the limit is exceeded, the number of faults in the memory increases rapidly and the device can no longer be used reliably. For storage applications (the main use of non-volatile memories until now), one way to deal with limited endurance is to overprovision the device (i.e. leave some of the native capacity unused up front and allocate it later as memory blocks start failing) and to use wear leveling. Wear leveling adds a level of memory virtualization in the form of a Flash Translation Layer (FTL) that dynamically maps logical blocks to physical blocks, so that write/erase cycles are distributed more or less equally across the physical blocks and no single block is overwritten too many times. Although the FTL concept was first introduced for Flash, similar mechanisms will work (and will likely be necessary) for all emerging memory technologies with limited endurance (e.g. although it is not explicitly stated, Intel Optane 3D XPoint likely uses a similar mechanism for wear leveling).
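To make the overprovisioning and wear-leveling idea concrete, here is a minimal Python sketch of a toy translation layer. It is an illustration only, not how any real FTL or Optane controller is implemented: the class name `ToyFTL`, its parameters, and the "always pick the least-worn free block" policy are all assumptions chosen for simplicity, and data movement and garbage collection are ignored.

```python
class ToyFTL:
    """Toy translation layer: logical-to-physical map with wear leveling.

    Hypothetical illustration only; real FTLs also handle pages, erase
    blocks, garbage collection, and data migration.
    """

    def __init__(self, logical_blocks, physical_blocks, endurance):
        if physical_blocks <= logical_blocks:
            raise ValueError("need spare (overprovisioned) physical blocks")
        self.logical_blocks = logical_blocks
        self.endurance = endurance                # max writes per physical block
        self.writes = [0] * physical_blocks       # wear counter per physical block
        self.mapping = {}                         # logical block -> physical block
        self.free = set(range(physical_blocks))   # unmapped, still-usable blocks

    def write(self, logical):
        """Write one logical block, remapping it to the least-worn free block."""
        if not 0 <= logical < self.logical_blocks:
            raise IndexError("logical block out of range")
        # Release the previously used physical block, unless it is worn out,
        # in which case it is retired and never reused.
        old = self.mapping.pop(logical, None)
        if old is not None and self.writes[old] < self.endurance:
            self.free.add(old)
        if not self.free:
            raise RuntimeError("device worn out: no usable physical blocks left")
        # Wear leveling: choose the free block with the fewest writes so far
        # (ties broken by block index).
        target = min(self.free, key=lambda b: (self.writes[b], b))
        self.free.remove(target)
        self.writes[target] += 1
        self.mapping[logical] = target
        return target


# Hammer a single logical block; the writes spread over all physical blocks.
ftl = ToyFTL(logical_blocks=1, physical_blocks=4, endurance=100)
for _ in range(200):
    ftl.write(0)
print(ftl.writes)
```

In this toy model a single logical block written repeatedly survives four times as many writes as one physical block could absorb on its own, at the cost of an extra lookup and remap on every write.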
FTLs are an OK solution for storage applications but are suboptimal (to say the least) for main memory applications, and even more so for processing in memory. They add latency to every access during normal logical-to-physical mapping and, more importantly, incur extra-long delays for moving data whenever a logical block needs to be re-allocated to a new physical block. Because of this, methods that intrinsically compensate for limited endurance would be especially preferable for PIM.
One such method is to take advantage of the recovery mechanisms associated with the stress that leads to the limited endurance in the first place. Since stress and wearout are mechanisms that take a device out of physical equilibrium, simple thermodynamics tends to partially reverse the effect of stress once the stress is removed. This is a general physics argument that has been experimentally demonstrated for several wearout mechanisms, including Flash wearout, but also other more general ones, such as NBTI/PBTI, hot electrons, electromigration, etc. In this talk I will go over several of these mechanisms and explain the source of stress and ways to reverse it. The main idea is to go beyond simple passive recovery (just remove the stress and wait) by reversing the direction of stress (active recovery) and accelerating the process (e.g. by increasing temperature). Such accelerated active recovery can lead to many orders of magnitude improvement in endurance, thus making processing in emerging technology memories practical.
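As a rough illustration of why raising the temperature can speed up recovery so strongly, the sketch below assumes that the recovery process is thermally activated and follows the standard Arrhenius relation. That is a modeling assumption, not a result from the talk, and the activation energy of 1.0 eV and the two temperatures are hypothetical values picked only to show the shape of the dependence.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(activation_energy_ev, temp_base_c, temp_hot_c):
    """Ratio of thermally activated rates at temp_hot_c versus temp_base_c.

    Assumes rate ~ exp(-Ea / (k * T)); temperatures are given in Celsius.
    """
    t_base = temp_base_c + 273.15  # convert to kelvin
    t_hot = temp_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_base - 1.0 / t_hot)
    return math.exp(exponent)


# Hypothetical numbers: 1.0 eV activation energy, recovery at 125 C versus 25 C.
factor = arrhenius_acceleration(1.0, 25.0, 125.0)
print(f"recovery rate speed-up: {factor:.3g}x")
```

Under these assumed values the speed-up of the recovery rate is on the order of 10^4. This number describes only how much faster a thermally activated process runs at the higher temperature; it is not the endurance improvement itself, which also depends on the stress mechanism and on how the active recovery is applied.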