OR-1 dataflow CPU sketch

Goals

  • dynamic dataflow CPU achievable with discrete logic
  • One approach:
    • Basically: what if some breakthrough had made content-addressable memory chips of reasonable size and cost available before VLSI was practical for CPUs?
    • Likely: memory chips from the 90s/00s, with the rest of the logic built from much older components
  • Potential ways to make more "period-accurate":
    • an additional layer of older components that builds a content-addressable memory system or key-value store out of more conventional SRAM, etc.
  • https://web.archive.org/web/20250726013855/https://morethanmoore.substack.com/p/efficient-computers-electron-e1-cpu - has dataflow 'fabric' tiles with external SRAM, icache, a control core, and other peripherals

Reference Papers

Reference Diagrams

  • ![[Pasted image 20250914104445.png]]
  • ![[Pasted image 20250914105703.png]]
  • ![[Pasted image 20250914105716.png]]

Ideas

  • multiplicative hashing
    • could reasonably, i think, implement something like multiplicative hashing without too much logic: either the naive `unsigned hash(unsigned K) { return (a * K) >> (w - m); }` or, for better distribution, `unsigned hash(unsigned K) { K ^= K >> (w - m); return (a * K) >> (w - m); }` (w = word width in bits, m = number of hash bits, a = odd multiplier).
    • but then you end up with lots of wasted space in the RAM chip, because the keys will be scattered across it.
    • weird and probably dumb idea: combine content and sequence addressing, so that e.g. an array of bytes representing a single value, or a vector used sequentially, sits in a contiguous area of an SRAM chip and each word of it can be accessed very quickly with a simple count-up from the base address.
    • still trying to figure out how to pack efficiently without basically having to implement a memory allocator in hardware.
  • namespace partitioning in hardware
    • least significant bits of the address are the hash/tag, shifted right by the next-highest power of two of the data length; the most significant bits form the index into it. or vice versa.
    • you end up with a bunch of wastage if all your values are tiny and you don't have a lot of them, but equally you can more or less just cache them indefinitely until overwritten.
    • for values that are large relative to the addressing space, you get good usage up to a point (more or less until the hash has too few bits to avoid a ton of collisions).
    • if the address is a tag calculated at least partially by the compiler, rather than a hash, you can get a lot more density for small values.
    • could see a hybrid, where one half of the chip uses precalculated tags and the other half a runtime hash, to accommodate intermediate values, etc.
  • use a hardware implementation of tries to do IP-address-style memory namespacing
  • one other thing to consider is serializing the bus to some extent. our speed limit comes partly from the noise problems of breadboards (during the initial build) and partly from the limits of the logic chips we're restricting ourselves to, so a serial interconnect that's fast enough to keep up could be within the realm of reason, depending on how much we stretch the historical plausibility there.
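
As a rough sketch of the multiplicative hashing idea above, here are the two variants written out as compilable C. The fixed 32-bit word width `W`, the multiplier `A` (an odd constant near 2^32 divided by the golden ratio, a common choice) and the function names are assumptions for illustration, not something the notes settle on; a discrete-logic build would pick its own widths.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 32-bit words and a golden-ratio multiplier. */
enum { W = 32 };                        /* word width in bits */
static const uint32_t A = 2654435769u;  /* odd multiplier, 0x9E3779B9 */

/* Naive variant: keep the top m bits of (A * k) mod 2^W. */
uint32_t hash_naive(uint32_t k, unsigned m) {
    assert(m >= 1 && m < W);
    uint32_t product = (uint32_t)((uint64_t)A * k);  /* low W bits only */
    return product >> (W - m);
}

/* Variant that first folds the high bits of the key into its low bits,
 * so that keys differing only in their upper bits spread out better. */
uint32_t hash_mixed(uint32_t k, unsigned m) {
    assert(m >= 1 && m < W);
    k ^= k >> (W - m);
    uint32_t product = (uint32_t)((uint64_t)A * k);
    return product >> (W - m);
}
```

With `m = 8` the result is an 8-bit slot number; `hash_naive(1, 8)` is simply the top byte of `A`, i.e. 0x9E. In hardware terms this is one multiplier (or a shift-and-add network for a fixed `A`) plus wiring for the shift, which is roughly why it looks feasible without much logic.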
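
One possible reading of the namespace partitioning note, sketched in C: this takes the "vice versa" layout, with the tag (or hash) in the upper address bits and the word index in the lower bits, so that a value occupies a contiguous, power-of-two-sized block and can be walked with a simple count-up. The 16-bit address width `ADDR_BITS` and all function names are hypothetical, chosen only to make the arithmetic concrete.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SRAM address width, for illustration only. */
enum { ADDR_BITS = 16 };
static const uint32_t ADDR_MASK = ((uint32_t)1 << ADDR_BITS) - 1u;

/* Index bits needed for a value of len words: log2 of len rounded up
 * to the next power of two. */
unsigned index_bits(uint32_t len) {
    assert(len >= 1 && len <= ((uint32_t)1 << ADDR_BITS));
    unsigned bits = 0;
    while (((uint32_t)1 << bits) < len) {
        bits++;
    }
    return bits;
}

/* Address bits left over for the tag; larger values leave fewer,
 * which is where collisions start to bite. */
unsigned tag_bits(uint32_t len) {
    return ADDR_BITS - index_bits(len);
}

/* Compose an address: tag in the upper bits, word index in the lower
 * bits. Tag bits that do not fit in the address are dropped. */
uint32_t slot_address(uint32_t tag, uint32_t len, uint32_t index) {
    unsigned ib = index_bits(len);
    assert(index < len);
    return ((tag << ib) | index) & ADDR_MASK;
}
```

For a 5-word value, `index_bits(5)` is 3, so each tag owns an 8-word block (3 words wasted) and 13 bits remain for the tag. Consecutive words sit at consecutive addresses, e.g. `slot_address(0x2A, 5, 0)` is 0x150 and `slot_address(0x2A, 5, 4)` is 0x154.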

Instruction Set