RAG works in two steps: retrieve relevant context (from a vector database, search index, or knowledge base) based on the user's query, then pass that context plus the query to an LLM to generate a grounded response.
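The two steps can be sketched end to end. This is a minimal, self-contained illustration: the bag-of-words "embedding" and the toy document list are stand-ins for a real embedding model and vector database, and the final LLM call is left as a prompt string rather than an API call.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words term-count vector.
    # Real systems use learned dense embeddings from an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Step 1: rank documents by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # Step 2: pass the retrieved context plus the query to the LLM.
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using ONLY this context:\n{ctx}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 7 business days of approval.",
    "Our enterprise plan includes 24/7 phone support.",
    "Password resets expire after 30 minutes.",
]
query = "How long do refunds take?"
context = retrieve(query, docs)
prompt = build_prompt(query, context)
```

In production the `retrieve` step is a vector-database query and `prompt` goes to an LLM API, but the shape of the pipeline is the same.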
Why it works: LLMs hallucinate when they lack the relevant knowledge. RAG grounds them in your actual data: product docs, policy manuals, customer history. With the real source material in the prompt, the model has far less room to make things up.
Indian B2B chatbots use RAG extensively: support bots answer from FAQs and ticket history, sales bots from product specs, internal bots from HR policies. Performance depends more on retrieval quality (embedding model and metadata filtering) than on the choice of LLM.
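The filtering half of retrieval quality deserves its own sketch: scoping the search by metadata before ranking keeps an HR query from pulling in product specs. Everything here is hypothetical, including the `source` tags and the corpus; the toy similarity function stands in for a real vector search.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system uses a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical corpus mixing the three bot use cases, tagged by source.
corpus = [
    {"text": "Leave requests need manager approval in the HR portal.",
     "source": "hr_policy"},
    {"text": "The API rate limit is 100 requests per minute.",
     "source": "product_spec"},
    {"text": "Escalate priority-1 tickets within one hour.",
     "source": "ticket_history"},
]

def retrieve(query: str, source: str, k: int = 1) -> list[str]:
    # Filter by metadata first, then rank only the survivors by similarity.
    candidates = [d for d in corpus if d["source"] == source]
    q = embed(query)
    candidates.sort(key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return [d["text"] for d in candidates[:k]]

hits = retrieve("What is the API rate limit?", source="product_spec")
```

Most vector databases expose this same pattern natively as a metadata filter on the similarity query, so the filter runs inside the index rather than in application code.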