Show HN: How we leapfrogged traditional vector based RAG with a 'language map' https://ift.tt/UGCnEI1
Show HN: How we leapfrogged traditional vector based RAG with a 'language map' TL;DR: Vector-based RAG performs poorly for many real-world applications like codebase chats, and you should consider 'language maps'. Part of our mission at Mutable.ai is to make it much easier for developers to build and understand software. One of the natural ways to do this is to create a codebase chat, that answer questions about your repo and help you build features. It might seem simple to plug in your codebase into a state-of-the-art LLM, but LLMs have two limitations that make human-level assistance with code difficult: 1. They currently have context windows that are too small to accommodate most codebases, let alone your entire organization's codebases. 2. They need to reason immediately to answer any questions without thinking through the answer "step-by-step." We built a chat sometime a year ago based on keyword retrieval and vector embeddings. No matter how hard we tried, including training our own dedicated embedding model, we could not get the chat to get us good performance. Here is a typical example: https://ift.tt/ekanVFN... If you ask how to do quantization in llama.cpp the answers were oddly specific and seemed to pull in the wrong context consistently, especially from tests. We could, of course, take countermeasures, but it felt like a losing battle. So we went back to step 1, let’s understand the code, let’s do our homework, and for us, that meant actually putting an understanding of the codebase down in a document — a Wikipedia-style article — called Auto Wiki. The wiki features diagrams and citations to your codebase. Example: https://ift.tt/mM2tCjz This wiki is useful in and of itself for onboarding and understanding the business logic of a codebase, but one of the hopes for constructing such a document was that we’d be able to circumvent traditional keyword and vector-based RAG approaches. It turns out using a wiki to find context for an LLM overcomes many of the weaknesses of our previous approach, while still scaling to arbitrarily large codebases: 1. Instead of context retrieval through vectors or keywords, the context is retrieved by looking at the sources that the wiki cites. 2. The answers are based both on the section(s) of the wiki that are relevant AND the content of the actual code that we put into memory — this functions as a “language map” of the codebase. See it in action below for the same query as our old codebase chat: https://ift.tt/ekanVFN... https://ift.tt/ekanVFN... The answer cites it sources in both the wiki and the actual code and gives a step by step guide to doing quantization with example code. The quality of the answer is dramatically improved - it is more accurate, relevant, and comprehensive. It turns out language models love being given language and not a bunch of text snippets that are nearby in vector space or that have certain keywords! We find strong performance consistently across codebases of all sizes. The results from the chat are so good they even surprised us a little bit - you should check it out on a codebase of your own, at https://wiki.mutable.ai , which we are happy to do for free for open source code, and starts at just $2/mo/repo for private repos. We are introducing evals demonstrating how much better our chat is with this approach, but were so happy with the results we wanted to share with the whole community. Thank you! https://twitter.com/mutableai/status/1813815706783490055 July 19, 2024 at 12:10AM
Comments
Post a Comment