Dynamic Neural Memory for In-Context Learning: SSMs or Transformers?
Can we combine the strengths of SSMs and Transformers to build a graph-like hierarchical memory whose size grows sublinearly with context length? Each node would represent a concept, and edges would link related concepts, much like an associative memory. The graph would be traversed with an attention mechanism, either to retrieve stored information or to insert new information.
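
To make the idea concrete, below is a minimal NumPy sketch of one possible instantiation: nodes store key/value vectors, writes merge near-duplicate concepts into an existing node (which is what would keep growth sublinear in the number of writes), and reads attend over the best-matching node and its one-hop neighbourhood. Everything here — the `GraphMemory` class, its `write`/`read` methods, the merge threshold — is a hypothetical illustration of the mechanism, not a worked-out design.

```python
# A minimal sketch of the proposed graph memory, assuming dot-product
# attention for retrieval and a cosine-similarity threshold for merging.
# All names (GraphMemory, write, read, merge_threshold) are hypothetical.
import numpy as np


def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()


class GraphMemory:
    def __init__(self, dim, merge_threshold=0.9, top_k=2):
        self.dim = dim
        self.keys = np.empty((0, dim))    # one key vector per concept node
        self.values = np.empty((0, dim))  # payload stored at each node
        self.edges = []                   # adjacency lists (associative links)
        self.merge_threshold = merge_threshold
        self.top_k = top_k                # edges added per new node

    def _similarity(self, q):
        # Cosine similarity between a query and all node keys.
        if len(self.keys) == 0:
            return np.empty(0)
        k = self.keys / np.linalg.norm(self.keys, axis=1, keepdims=True)
        return k @ (q / np.linalg.norm(q))

    def write(self, key, value):
        """Add a concept; merge into an existing node if similar enough.

        Merging is what keeps growth sublinear in the number of writes:
        near-duplicate concepts update a node instead of creating one.
        """
        sims = self._similarity(key)
        if sims.size and sims.max() >= self.merge_threshold:
            i = int(sims.argmax())
            self.values[i] = 0.5 * (self.values[i] + value)  # running merge
            return i
        # New node, linked to its nearest neighbours (associative edges).
        neighbours = list(np.argsort(-sims)[: self.top_k]) if sims.size else []
        i = len(self.keys)
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])
        self.edges.append(neighbours)
        for j in neighbours:
            self.edges[j].append(i)
        return i

    def read(self, query):
        """Retrieve by attending over an entry node and its neighbourhood."""
        sims = self._similarity(query)
        if sims.size == 0:
            return np.zeros(self.dim)
        entry = int(sims.argmax())                 # attention picks the entry node
        hood = [entry] + self.edges[entry]         # one-hop graph traversal
        scores = softmax(self.keys[hood] @ query)  # attention within neighbourhood
        return scores @ self.values[hood]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mem = GraphMemory(dim=8)
    for _ in range(20):
        v = rng.normal(size=8)
        mem.write(v, v)
    print(len(mem.keys), "nodes after 20 writes")  # fewer than 20 if merges occurred
    print(mem.read(rng.normal(size=8)))
```

One design choice worth noting: the read path only attends over a local neighbourhood of the graph rather than all nodes, which is what keeps retrieval cost bounded as the memory grows; how to make the traversal deeper (multi-hop) and differentiable is exactly the open question of the proposal.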