• surph_ninja@lemmy.world

    That’s cool, but does it have the same life cycle limitations as an SSD? Is this RAM gonna burn out any sooner?

  • brucethemoose@lemmy.world

    Yeah, it’s a solution in search of a problem.

    If it turns out to be much cheaper than DRAM? Great! But until then, even if it’s lower power because it doesn’t need refresh, flash is just so cheap that it scales up much better.

    And I dunno what they mean by AI workloads. How would non-volatility help at all, unless it’s starting to approach SRAM performance?

    Some embedded stuff could use it, but that’s not a huge margin market.

    Optane was sorta interesting because it was ostensibly cheaper and higher capacity than DRAM, albeit not enough.

    • just_another_person@lemmy.world

      This type of memory creates a new kind of state capability for HPC and huge core-scaled workloads, so maybe that’s why you’re confused.

      HP basically created the physical use case a while back with something called The Machine. They got to the point of having all the hardware pieces functional and even built a Linux-ish OS, but then needed customers before tackling the memory portion. Hence why this type of memory tech exists.

      We’re in a bit of a weird time right now with computing in general, where we’re straddling the line between continuing projects on traditional computers and spending the time and effort to adapt certain projects to quantum computing. This memory is just one hardware path forward for traditional computing to keep scaling outward.

      Where it makes the most sense: huge HPC clusters. Where it doesn’t: everywhere else.

      I assume the author mentions “AI” because you could load an entire data set into this type of memory and have many NPU cores or clusters working off the same address space without it being changed. Way faster than disk, and it eliminates the context-switching problem if you’re sure its state stays static.
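
      A rough sketch of that “many workers, one read-only address space” idea, assuming (hypothetically) a dataset file exposed by persistent memory at /mnt/pmem/dataset.bin and fixed-size records; none of the names or sizes come from the article:

```python
# Minimal sketch: many workers reading one shared, read-only mapping.
# /mnt/pmem/dataset.bin and RECORD_SIZE are illustrative assumptions.
import mmap
from multiprocessing import Process

DATASET = "/mnt/pmem/dataset.bin"
RECORD_SIZE = 4096  # assumed fixed-size records

def worker(worker_id: int, num_workers: int) -> None:
    with open(DATASET, "rb") as f:
        # Read-only shared mapping: every worker sees the same pages,
        # nothing is copied per worker and nothing can be modified.
        mem = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    num_records = len(mem) // RECORD_SIZE
    for i in range(worker_id, num_records, num_workers):
        record = mem[i * RECORD_SIZE:(i + 1) * RECORD_SIZE]
        # ... hand `record` to an NPU/accelerator here ...
    mem.close()

if __name__ == "__main__":
    procs = [Process(target=worker, args=(i, 4)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

      The point is only that the working set lives in one persistent pool rather than being re-staged from disk for every worker.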

      • brucethemoose@lemmy.world

        How is that any better than DRAM though? It would have to be much cheaper/GB, yet reasonably faster than the top-end SLC/MLC flash Samsung sells.

        Another thing I don’t get: in all the training runs I see, dataset bandwidth needs are pretty small. Streaming images (much less something like 128K tokens of text) is a minuscule drop in the bucket compared to how long a step takes, especially with hardware decoders handling decompression.
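
        Back-of-envelope version of that claim (every number below is an assumption picked for illustration, not a measurement):

```python
# Back-of-envelope: dataset streaming bandwidth during training.
# All figures are illustrative assumptions, not measurements.
batch_size = 1024            # images per global step (assumed)
bytes_per_image = 100_000    # ~100 KB compressed image (assumed)
step_time_s = 0.5            # seconds per optimizer step (assumed)

required_bw = batch_size * bytes_per_image / step_time_s  # bytes/s
nvme_bw = 7e9                # ~7 GB/s, one PCIe 4.0 NVMe drive (ballpark)
dram_bw = 100e9              # ~100 GB/s, commodity DRAM setup (ballpark)

print(f"dataset streaming:        {required_bw / 1e9:.2f} GB/s")
print(f"fraction of one NVMe SSD: {required_bw / nvme_bw:.1%}")
print(f"fraction of DRAM:         {required_bw / dram_bw:.2%}")
```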

        Weights are an entirely different beast, and stuff like Cerebras clusters do stream them into system memory, but they need the speed of DRAM (or better).

        • just_another_person@lemmy.world

          I think you’re stuck in the traditional viewpoint of a computer being CPU+Mem+Storage. That’s fine for a single machine that a regular user would have.

          This type of memory could essentially wipe out the need for traditional deployments in datacenters: banks of this stuff on a bus with many CPUs as clients, no local storage needed, so just CPU+Mem, with everything loaded into a known state via network storage, a state that won’t go away if something loses power or crashes. It would definitely make the current idiotic use of GPUs more cost-effective and less wasteful.

          If you try to take that down to a regular user needing a use case, it’s really only going to matter to developers building things for such a system, because stateful memory is such a new idea. You may just be thinking about it like a single user, which is not what it would be used for at all (at first).

          To your other question about actual speed: current memory only needs to be that fast because of the storage involved and the shuttling of data across a bus between the three parts. Getting this new type of stateful memory to higher speeds than a current storage device would already show a performance benefit, because you’re removing one step in the total transfer path and going from three points down to two. So in theory, something faster than an SSD but slower than current DDR speeds should still show a benefit.
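
          A toy model of that “remove one hop” argument (the latencies are illustrative assumptions, not vendor numbers, and “DRAM+” is just a placeholder name for the hypothetical persistent memory):

```python
# Toy model of the "remove one hop" argument.
# All latencies are illustrative assumptions, not measured figures.
DRAM_NS = 100        # assumed DRAM access latency
SSD_NS = 80_000      # assumed NVMe read latency (~80 us)
DRAMPLUS_NS = 300    # assumed latency of a hypothetical persistent "DRAM+"

# Traditional cold path: data comes off the SSD into DRAM, then to the CPU.
cold_traditional = SSD_NS + DRAM_NS

# Persistent-memory path: the data is already resident; the CPU reads it directly.
cold_dramplus = DRAMPLUS_NS

print(f"traditional cold read: {cold_traditional} ns")
print(f"DRAM+ cold read:       {cold_dramplus} ns")
```

          Of course this only covers the cold path; once data is warm in DRAM the comparison flips, which is where the disagreement in the rest of this thread sits.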

          Overall, this has been a path forward for quite a while, and they’ve obviously still got to get some spec sheets out to explain the performance and efficiency benefits, and it will require a complete rework of how current CPUs and bridge controllers work… it’s quite a ways off from being an everyday product.

          • brucethemoose@lemmy.world

            You’re talking theoreticals.

            A big reason that supercomputers moved to a network of “commodity” hardware is that it’s cost-effective.

            How would one build a giant unified pool of this memory? CXL, presumably, but what does it look like physically? Maybe you get a lot of bandwidth in parallel, but how would it come even close to the latency of “local” DRAM buses on each node? Is that setup truly more power-efficient than banks of DRAM backed by infrequently touched flash? If your particular workload needs fast random access to memory, even at scale the only advantage seems to be some fault tolerance at a huge speed cost; and if you just need bulk high-latency bandwidth, flash has you covered for cheaper.
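
            To put rough numbers on that latency concern, a comparison sketch (every figure is a ballpark assumption for illustration, not a benchmark of any product):

```python
# Ballpark access-latency comparison; every number is an illustrative
# assumption, not a benchmark of any real product.
tiers = {
    "local DRAM":               100,     # ns, assumed
    "CXL-attached memory pool": 350,     # ns, assumed (extra controller/link hops)
    "NVMe flash":               80_000,  # ns, assumed
}

baseline = tiers["local DRAM"]
for name, latency_ns in tiers.items():
    print(f"{name:>24}: {latency_ns:>7} ns  ({latency_ns / baseline:.0f}x local DRAM)")
```

            If the workload is latency-bound random access, a pooled tier sits several times behind local DRAM; if it only needs bulk sequential bandwidth, flash already closes most of the gap per dollar, which is the economics point being made here.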

            …I really like the idea of a nonvolatile single pool backed by caches, especially at scale, but ultimately architectural decisions come down to economics.

            • just_another_person@lemmy.world

              It’s not theoretical, it’s just math. Removing 1/3 of the bus paths, and also removing the need to constantly keep RAM powered… it’s quite a reduction when you’re thinking at large scale. If AWS or Google could reduce their energy needs by 33% on anything, they’d take it in a heartbeat. That’s just assuming this would/could somehow be used as a drop-in replacement, which seems unlikely. Think of an SoC with this on board, or an APU. The premise itself reduces cost while increasing efficiency, but again, they really need to get some spec sheets out and productize it before most companies will do much more than trial runs.
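
              Spelling the claimed math out explicitly (the wattages are made-up example numbers to show the arithmetic, not measurements):

```python
# The "remove one of three bus paths + no DRAM refresh" claim as arithmetic.
# Every wattage is a made-up example to show the reasoning, not a measurement.
bus_power_w = 30.0      # assumed power spent moving data over the three paths
dram_refresh_w = 10.0   # assumed power spent keeping DRAM refreshed

saved_w = bus_power_w / 3 + dram_refresh_w   # one path gone, refresh gone
budget_w = bus_power_w + dram_refresh_w

print(f"saved {saved_w:.0f} W of a {budget_w:.0f} W bus+refresh budget "
      f"({saved_w / budget_w:.0%})")
# Note: this is a fraction of the bus+refresh budget only, not of
# total server power.
```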

              • brucethemoose@lemmy.world

                “It’s not theoretical, it’s just math. Removing 1/3 of the bus paths, and also removing the need to constantly keep RAM powered…”

                And here’s the kicker.

                You’re supposing it’s (given the no-refresh bonus) 1/3 as fast as DRAM, with similar latency, and cheap enough per gigabyte to replace most storage. That is a tall order, and it would be incredible if it hit all three of those. I find that highly improbable.

                Even DRAM is starting to become a bottleneck for APUs specifically, because making the bus wide is so expensive. This applies at the very top (the MI300A) and the bottom (smartphone and laptop APUs).

                Optane, for reference, was a lot slower than DRAM and a lot more expensive/less dense than flash, even with all the work Intel put into it and the buses built into then-top-end CPUs for direct access. And they thought that was pretty good. It was good enough for a niche when used in conjunction with DRAM sticks.
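
                For a sense of the price gap such a cell would have to close, a rough sketch (the $/GB figures and capacity are loose ballpark assumptions, not current market quotes):

```python
# Rough $/GB gap a "replace both DRAM and storage" cell would have to close.
# Prices and capacity are loose ballpark assumptions, not quotes.
dram_usd_per_gb = 3.00       # assumed
tlc_flash_usd_per_gb = 0.06  # assumed

capacity_gb = 100 * 1000     # example: 100 TB per node

print(f"100 TB in DRAM:  ${dram_usd_per_gb * capacity_gb:,.0f}")
print(f"100 TB in flash: ${tlc_flash_usd_per_gb * capacity_gb:,.0f}")
print(f"price gap: {dram_usd_per_gb / tlc_flash_usd_per_gb:.0f}x")
```

                A new memory type has to land near the flash end of that range to displace storage while staying near DRAM speed to displace memory, which is the tall order described above.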

                • just_another_person@lemmy.world

                  No, you misunderstood. A current standard computer is guaranteed to have at least three bus paths between its CPU, RAM, and storage.

                  The amount of energy required to communicate between all three parts varies, but you can be sure that removing just one, PLUS removing the capacitor-refresh requirement for the memory, will reduce power consumption by 1/3 of whatever that total bus power consumption is. That’s ignoring any other additional buses and doing the bare-minimum math.

                  The speed of this memory would matter less if you’re also removing the separate static-storage requirement. What matters is only how fast it can communicate with the CPU, so if you’re not traversing CPU>RAM>SSD and only doing CPU>DRAM+, it’s going to be more efficient.

        • pelya@lemmy.world

          But it’s very convenient! When you have a BSOD, you don’t need your core dumped, you simply unplug your DRAM+ and send it to Microsoft using paper mail.

    • muusemuuse@lemm.ee

      Didn’t Intel do this with 3D XPoint or something like that? Then it failed and was repurposed as Optane, which also flopped?

      • brucethemoose@lemmy.world

        Yes, because ultimately it just wasn’t good enough.

        That’s what I was trying to argue below. Unified memory is great if it’s dense and fast enough, but that’s a massive if.