Aum Oza

AI and ML Undergraduate

From vectors to graphs, the hidden systems behind LLM memory.

How LLMs Manage Memory: Vector Databases, RAG, and Knowledge Graphs

Large Language Models (LLMs) have amazed us with their ability to generate content, summarize information, and even act like personal assistants. But one question keeps coming up: how do these models actually “remember” things?

In this blog, we’ll explore how memory is managed in LLMs by looking at vector databases, Retrieval-Augmented Generation (RAG), and how they compare with Knowledge Graphs.


A Quick Primer on Vector Databases

Before diving into memory, let’s get clear on vector databases.

How embeddings work

  1. From text to numbers: An embedding model converts unstructured data into high-dimensional numerical vectors (embeddings).

    • Why? Because embeddings capture the semantic meaning of data.
  2. Storage: These embeddings are stored in a vector database (VectorDB).

  3. The bottleneck: If you have millions of embeddings, retrieval slows down. Why?

    • Each query is turned into an embedding.
    • That query embedding is compared with all others in the database.
    • Similarity is measured by cosine similarity or Euclidean distance.
    • A brute-force scan therefore takes O(N) time per query (see the sketch just after this list).
  4. The fix: vector indexes

    • Vector indexes pre-organize vectors in semantic space.
    • They only search the most promising “neighborhoods,” drastically reducing time.
  5. Approximate Nearest Neighbor (ANN) search

    • A strategy to get results that are “close enough” almost instantly.
    • Not 100% exact, but highly efficient.
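
To make the retrieval mechanics concrete, here is a minimal Python sketch of the brute-force path, the O(N) scan that vector indexes and ANN search exist to avoid. The `embed()` helper is a toy stand-in invented for illustration; a real system would call an embedding model, and the stored vectors would carry actual semantic meaning.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a real embedding model: a deterministic random vector per text.
    A real model (e.g. a sentence-transformer) would return semantically meaningful vectors."""
    seed = int.from_bytes(text.encode("utf-8"), "little") % (2**32)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)  # unit length, so dot product == cosine similarity

# The "VectorDB": documents stored alongside their embeddings.
documents = [
    "How to return a damaged order",
    "Warranty policy for running shoes",
    "Store opening hours and locations",
]
vectors = np.stack([embed(d) for d in documents])

def search(query: str, k: int = 2):
    q = embed(query)
    scores = vectors @ q            # compare against EVERY stored vector -> O(N) per query
    top = np.argsort(-scores)[:k]   # highest cosine similarity first
    return [(documents[i], float(scores[i])) for i in top]

print(search("my sneakers broke, can I get a refund?"))
```

A vector index (HNSW, IVF, and similar structures) replaces that full scan with a search over a few promising neighborhoods; the results are approximate rather than exact, which is exactly the ANN trade-off described above.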

Where Vector Embeddings Shine

Vector embeddings unlock powerful use cases:

  • Long-term memory for LLMs – storing and recalling past context.
  • Semantic search – results based on meaning, not just keywords.
  • Recommendation engines – surfacing relevant content (likely used in platforms like YouTube, X, or Instagram).

But Here’s the Catch: Why RAG Struggles as Memory

While RAG (Retrieval-Augmented Generation) is brilliant for knowledge retrieval, it struggles as a true memory system.

Why? Because semantic search can’t reason about how facts change over time.

As @Zep_AI’s founder explains:

RAG can’t track sequence or time. Consider this exchange:

  • User: “Love Adidas shoes.”
  • User (later): “Shoes broke in 2 months.”
  • User: “Switching to Puma.”
  • User: “Which sneakers should I buy?”

The RAG-powered agent still recommends Adidas, ignoring that the user has already switched to Puma.

This happens because RAG retrieves based on semantic similarity, not temporal context.
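
Here is a small sketch of that failure mode, reusing the toy `embed()` helper from the earlier snippet (so it illustrates the ranking logic, not a real agent): the memories are stored with timestamps, but the retrieval score never consults them.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Same toy stand-in as in the earlier sketch.
    seed = int.from_bytes(text.encode("utf-8"), "little") % (2**32)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

# Conversation turns stored with timestamps.
history = [
    (1, "Love Adidas shoes."),
    (2, "Shoes broke in 2 months."),
    (3, "Switching to Puma."),
]
vectors = np.stack([embed(text) for _, text in history])

def retrieve(query: str, k: int = 2):
    q = embed(query)
    scores = vectors @ q
    # The ranking depends only on vector similarity (with a real model, semantic closeness).
    # The timestamps stored above never enter the score, so nothing prefers the newer
    # "Switching to Puma." over the older Adidas turn.
    order = np.argsort(-scores)[:k]
    return [history[i] for i in order]

print(retrieve("Which sneakers should I buy?"))
```

You could bolt a recency weight onto the score, but the retriever still has no notion that the Puma statement supersedes the Adidas one, which is the gap knowledge graphs address next.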


Knowledge Graphs: The Alternative

This is where Knowledge Graphs come in. Unlike RAG, graphs can:

  • Track entities (users, products, brands).
  • Represent relationships (user → switched brand → Puma).
  • Handle time-aware reasoning, ensuring recommendations evolve with context.

By combining vector embeddings (for meaning) and knowledge graphs (for reasoning), we can build memory systems that are both intelligent and context-aware.
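
As a minimal sketch of that idea (hand-rolled for illustration, not the API of any particular graph database): facts are stored as timestamped edges, and asserting a new, contradicting fact invalidates the old edge, so a query for the user’s current brand returns Puma rather than Adidas.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    subject: str
    relation: str
    obj: str
    valid_from: int
    valid_to: Optional[int] = None   # None means the fact is still current

class TinyKnowledgeGraph:
    def __init__(self):
        self.edges: list[Edge] = []

    def assert_fact(self, subject: str, relation: str, obj: str, t: int):
        # Invalidate any earlier, still-current edge with the same subject/relation,
        # then add the new fact. This is the time-aware step similarity search alone skips.
        for e in self.edges:
            if e.subject == subject and e.relation == relation and e.valid_to is None:
                e.valid_to = t
        self.edges.append(Edge(subject, relation, obj, valid_from=t))

    def current(self, subject: str, relation: str) -> Optional[str]:
        for e in self.edges:
            if e.subject == subject and e.relation == relation and e.valid_to is None:
                return e.obj
        return None

kg = TinyKnowledgeGraph()
kg.assert_fact("user", "prefers_brand", "Adidas", t=1)
kg.assert_fact("user", "prefers_brand", "Puma", t=3)   # supersedes the Adidas edge

print(kg.current("user", "prefers_brand"))  # -> "Puma"
```

In a fuller system, the entities and relations would typically be extracted by an LLM, and vector embeddings could still be used to find which nodes a query is about, which is exactly the hybrid combination described above.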


Conclusion

Vector databases and RAG are incredible for retrieval and semantic search. But when it comes to true memory for LLMs, they fall short mainly because they can’t capture temporal evolution.

That’s where Knowledge Graphs shine, bringing structure, relationships, and time awareness into play.

The future likely lies in a hybrid approach: RAG + Knowledge Graphs, enabling LLMs to not just remember but also reason and adapt over time.


Questions for Reflection

  • Can embeddings alone ever be enough for LLM memory?
  • How might hybrid systems (RAG + Knowledge Graphs) shape the future of AI assistants?

Further Reading