AI Demystified for Executives

#8 RAG Deep Dive: Transforming Business with Retrieval Augmented Generation

Andrew Psaltis


This episode of the AI Demystified for Executives podcast delves into Retrieval-Augmented Generation (RAG) and its strategic importance for businesses. It covers the components of a RAG system, various use cases such as enhancing productivity and speeding up onboarding, and the challenges and steps for implementation. Real-world examples, like Morgan Stanley's use of RAG, illustrate the potential benefits. Key points include vector databases, document processing, and hybrid retrieval approaches, alongside practical tips for evaluating and implementing RAG in enterprises.

Dragonfly Rising's School of Data proudly presents the AI Demystified for Executives podcast. This is the podcast for executives who want to learn how to apply AI to their business.

I'm your host, Andrew Psaltis, and this week's topic is RAG: Retrieval-Augmented Generation, making AI systems business ready. This is our deep-dive episode.

In this episode, we're going to focus on six different things: the strategic importance of RAG for businesses, the components that make up a RAG system, different use cases, the challenges (of course there are challenges), a getting-started guide, and then we'll wrap things up.

So as we get started here, you could think of RAG as your AI's personal research assistant. If you listened to the Quick Bites episode, episode seven, we went over how RAG gets used to provide context for an LLM in gen AI applications, so it becomes an assistant with your enterprise data.

So let's dig into that a little bit more and talk about the strategic importance of RAG for businesses.

Again, a RAG architecture really enables businesses to constrain their generative AI applications and to utilize enterprise-specific content, your domain-specific content. When we talk about constraining them, this is one way of reducing hallucinations, because we're going to give the model the enterprise content that we want it to use to generate its response, along with the prompt. Now, this content could be vectorized documents, it could be images. We talked about vector databases several episodes ago; if you're not familiar with those, I'd encourage you to go back and listen to episodes five and six, where we covered vector databases, so you have an idea as to what a vectorized document is.



Being able to do this, to constrain the model to your domain-specific data by giving it this content, is really crucial for maintaining the relevancy and accuracy of AI-driven content. This is really core to the value proposition of the whole RAG architecture: the ability to dynamically retrieve and utilize relevant information from vast enterprise data repositories.

Now, we'll talk about this in the challenges section: when we say these vast enterprise data repositories, there's some work to do in many cases to make this ready.

But we're really able to take this and make it so that the responses are grounded and contextually aware.

Data is really becoming much more of a critical asset. We've talked for a while, and I'm sure you've heard across the industry, that data is the new oil, that everyone has data, and how important your data is. This really does bring to the surface how critical an asset your data is once you start to use it with a RAG system.

And it's really the static nature of large language models, and their lack of awareness of private and often proprietary information, that limits their usefulness. So just having a ChatGPT subscription for your employees is not enough. It's great that they can use that, but it doesn't have any of your information.

So using RAG is a very practical way to overcome the limitations of these general large language models.

It makes enterprise data and information available to them for processing.

It's pretty simple to understand, but it can be difficult to implement, as it's going to require several different technologies and techniques that may be new to your enterprise.

It's also becoming really a competitive differentiator. 

But soon this is going to be fundamental, a necessary competency for any organization that's using generative AI. So today it's still a competitive differentiator, but it's not going to be too long before this is table stakes. So really understanding this, really working through the challenges in the organization, really getting your organization ready, is going to be very important.

So I would encourage you, as we think through this, now that you're thinking about the strategy, to reflect and ask: is your organization ready?

Do you have content that's readily accessible? And I'm saying "content" here, not "data," because it's really important that it is content that's ready to be used. Not just data that's scattered throughout your organization, but content that's ready to be used to enhance and ground the responses of a model.

So give that some thought and think about how that may be applicable in your environment, and what things you may need to do to start to bring the organization along. Let's talk next about the components that make up a RAG architecture. Before we do that, there's something I'd mentioned in the previous episode.

If you didn't listen to that, you may be thinking: well, I've heard about fine-tuning. Can I just fine-tune the model instead of using RAG? There are trade-offs here to be made. Fine-tuning is the right choice when you have a model, say a vector embedding model or a large language model, that you want to update so it produces more accurate vectors, more accurate results.

It may be applicable when you're updating a model with publicly accessible data that isn't likely to change or need updates very often.

Fine-tuning a model so that it uses the right corporate vernacular is an example that could work. You know that data isn't going to change very much, it isn't time-constrained in that case, and the data set is most likely limited.

You might also be able to use it when a certain task requires accuracy to a degree that justifies throwing a lot of resources at it. Bloomberg did this when it built BloombergGPT.

But when your data needs to be protected, private, and up-to-the-second accurate, it really needs to be in one of your enterprise systems, not in the model.

The reality is that AI in production ends up being personalized AI.

It's a company like SkyPoint AI making medical recommendations. It's Macquarie Bank giving you financial planning advice. It's Priceline helping you plan your travel based on your travel preferences. The content in each of those cases is being updated pretty regularly, needs to be fresh, and is proprietary.

You're not going to want to use that, per se, to fine-tune the model.

So as we now look at the components of a RAG system, one of the most common ones is going to be a vector database.

Remember, a vector is just a very long list of numbers that holds these embeddings of the content. So you could have a word that gets embedded; that embedding is really just a fancy way of saying we're going to convert this word into numbers.

This enables these types of data stores to do natural language search, because the vector database is going to find the similarity between the vector you're searching for and the vectors that it has.
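To make that concrete, here's a minimal sketch of the kind of similarity math a vector database runs under the hood. The three-dimensional vectors below are made-up toy numbers purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: the dot product of the two vectors divided by
    # the product of their lengths. A value near 1.0 means the vectors
    # point in nearly the same direction, i.e. similar meaning.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- invented numbers, just to show the mechanics.
query = [0.9, 0.1, 0.3]
docs = {
    "vacation policy": [0.8, 0.2, 0.4],
    "quarterly revenue": [0.1, 0.9, 0.2],
}

# Rank stored vectors by similarity to the query vector.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]),
                reverse=True)
print(ranked[0])  # the query vector is closest to "vacation policy"
```

A real vector database does exactly this kind of comparison, just at scale and with clever indexing so it never has to compare against every stored vector one by one.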

There are all sorts of different vector databases on the market.

And depending upon what infrastructure and what data stores you use in your enterprise, some of them are now offering vector database capabilities. So it's worth a look, and it's worth talking with your IT team about what you're using. For example, PostgreSQL, which is very popular, has vector database capabilities, and those are getting richer and richer, so it may be an option. There are different vendors in the market, and all the database vendors are adding this capability. It's something to consider, something to talk to your IT team about, and also something to make sure fits inside of your performance envelope. So one component of RAG is a vector database. The other part of it is the document processing: how do you get the data in there?

I mentioned briefly in the previous section the strategic imperative of this, and the document processing is an important part. The vector database holds vectors; it's designed for that. So there's going to need to be processing of documents, business documents, into usable formats: into vectors and embeddings. You'll hear about things like chunking strategies.

With chunking strategies, you can think of it like this: if you have a hundred-page PDF, you want to be able to break it up into what are called chunks. That chunk may be 256 words, maybe it's a thousand words, maybe it's done on characters. There's a variety of algorithms; we're not going to dig into them in detail here, but just remember that there's a way of breaking these documents up into chunks. Therefore, when you do search the vector database for the most similar results, the most similar chunks if you will, you get back just the relevant part of the document. Think about when you do a search in Google and it shows a little abstract underneath your search result.

It shows a couple of sentences, and you can think of that as a chunk; it often highlights what's relevant. It doesn't show you the whole document, just the relevant part. So keep that in mind as you're thinking about this: we need to take our documents, our images, our content across the enterprise, and some of it is going to need to get processed and stored in a vector database. Again, the data store that you're using today may have this capability to vectorize content for you.
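As a rough sketch of what one chunking step can look like, here's a simple word-based chunker. The 200-word chunk size and 20-word overlap are illustrative choices I picked for the example, not recommendations; real chunking strategies also consider sentence and section boundaries.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    # Split a document into overlapping word-based chunks.
    # The overlap helps a sentence that straddles a chunk boundary
    # remain searchable in at least one chunk.
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = ("word " * 450).strip()  # stand-in for a long business document
pieces = chunk_words(doc)
print(len(pieces))  # prints 3: the 450 words become 3 overlapping chunks
```

Each of those chunks would then be run through an embedding model and stored in the vector database as its own searchable vector.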

Okay, so those are two of the pieces: there's the data store, the vector database, and there's the document processing that's going to happen. And then there's the retrieval process.

The retrieval process is the part of the system where you take what the user typed in, their input, their prompt if you will. So they typed something into a chat window. You take that content they typed in, you retrieve the most similar content from the vector database, and then you pass it all to the model.
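Pulled together, the whole flow can be sketched in a few lines. The `embed`, `search`, and `call_llm` functions here are hypothetical stand-ins for whatever embedding model, vector database, and LLM API your team actually picks; this shows only the shape of the pipeline, not any particular vendor's API.

```python
def answer_with_rag(user_question, embed, search, call_llm, top_k=3):
    # 1. Turn the user's question into a vector.
    query_vector = embed(user_question)
    # 2. Retrieve the most similar chunks from the vector database.
    chunks = search(query_vector, top_k=top_k)
    # 3. Build a prompt that constrains the model to that content.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_question}"
    )
    # 4. Pass the grounded prompt to the model.
    return call_llm(prompt)
```

Every RAG system, whatever products it's built from, is some elaboration of these four steps: embed the question, retrieve, assemble a grounded prompt, generate.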

One other thing to keep in mind in this flow, simple as it is from just this handful of components, is the context length. With these models, you'll see documentation of the size of their context window, how much context they can take. That's talking about the number of tokens. A token, you can think of as roughly three quarters of a word. I know that might sound a little strange, to have three quarters of a word, but on average a token represents about three quarters of a word. So the context window is about how many words you can pass to the model as context. That's another reason why:

If I'm going to get data out of the vector database, I want the part of the document that is most relevant to what the user is asking for, not the whole document. Now, the context windows of some of these models today are huge; Google's latest Gemini model can handle 2 million tokens.

That's a tremendous amount of content, but you still want to think about breaking things up and using only what you need, let's put it that way: being prescriptive and really being strategic about it. Besides the fact that you want to constrain the context to make sure it's the most relevant information, there's also a cost side to this.

If you look at the pricing, whatever your pricing arrangement is with any of the vendors, or if you're just paying as you go, you'll see they charge for tokens going in and tokens coming out. The context you pass in is tokens, so there's going to be a cost associated with it. Just think through that as you go.
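Here's a back-of-the-envelope sketch of that cost math, using the three-quarters-of-a-word rule of thumb from above. The price per million tokens and the request volume are made-up placeholders; check your vendor's actual rate card before doing this for real.

```python
def estimate_input_cost(word_count, price_per_million_tokens):
    # Rule of thumb: one token is roughly 3/4 of a word,
    # so words / 0.75 gives an approximate token count.
    tokens = word_count / 0.75
    return tokens / 1_000_000 * price_per_million_tokens

# Hypothetical scenario: 3,000 words of retrieved context per request,
# at an assumed $5 per million input tokens, 100k requests per month.
per_request = estimate_input_cost(3_000, 5.00)
monthly = per_request * 100_000
print(f"${per_request:.4f} per request, ${monthly:,.2f} per month")
```

The point of the arithmetic: passing in twice the context you need roughly doubles that input cost, which is one very practical reason to retrieve tight, relevant chunks rather than whole documents.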

And as you're talking with your IT team about this, give it some thought. Okay, great: is there a chunking strategy? Are we getting the right results? Are we passing in enough context, but not an excessive amount of context? Because one, excess context is a cost, and two, you may be giving the model a lot of information it doesn't need to generate a response.

When you think about these pieces that make up the components, the vector database, document processing, and retrieval, a lot of enterprise data stores are really not suited for delivering this.

So you could end up with all sorts of problems of how to do this, and a lot of enterprises may need to take a different approach than simply using keyword search or vector embeddings. The vector embeddings are great, but it's also possible to add significant amounts of metadata to these data stores. That can be a pretty labor-intensive process, so you have to think about how to structure your data to apply the metadata, and if you want to automate it, think about how to go about automating that process.

Now, there's an interesting way you can start to handle this. Sometimes a vector search alone doesn't bring back everything you want, or it's not rich enough.

There are newer ways of looking at this, doing what's called hybrid retrieval. This can be further enhanced by semantic ranking, which prioritizes the most relevant data to feed to the large language model.

That avoids excessive token fees. So think about it this way: you get all the data back from the vector database you're querying, and then, after that, you do a semantic ranking of it to figure out the most valuable data to use when answering a specific user prompt.

This will improve, or at least can improve, the output generated by the model.

This combined approach is being increasingly employed by different vendors, so you'll see it in some of the products in the market, and it can offer a much more efficient and affordable RAG solution to the enterprises building these systems.

Microsoft Research achieved a higher-quality RAG solution when they combined keyword search and vector embeddings with output prioritized by semantic ranking.

So you could almost think about it as taking the user's prompt, searching the vector database, and getting back the results, and then a post-processing step happens. You take those results, say you got back the top 20 most relevant chunks from the vector database, and you then use semantic ranking to refine that further, maybe ending up with a top 10. Those then get passed to the model. You can think of this as hybrid retrieval with semantic ranking.
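That top-20-then-top-10 flow can be sketched like this. The `vector_search`, `keyword_search`, and `rerank_score` functions are hypothetical stand-ins for whatever your database and re-ranking model actually provide, and the merge step here (keeping the best first-pass score per chunk) is just one simple way vendors combine the two result lists.

```python
def hybrid_retrieve(query, vector_search, keyword_search, rerank_score,
                    first_pass=20, final=10):
    # First pass: pull candidate chunks from both retrieval styles,
    # de-duplicating and keeping each chunk's best score.
    candidates = {}
    for chunk, score in vector_search(query, limit=first_pass):
        candidates[chunk] = max(candidates.get(chunk, 0.0), score)
    for chunk, score in keyword_search(query, limit=first_pass):
        candidates[chunk] = max(candidates.get(chunk, 0.0), score)

    # Second pass: re-rank the merged candidates with a slower,
    # more accurate semantic scorer, and keep only the top few.
    reranked = sorted(candidates, key=lambda c: rerank_score(query, c),
                      reverse=True)
    return reranked[:final]
```

The design point is the two-stage shape: a cheap, broad first retrieval to gather candidates, then an expensive, precise re-rank applied only to that small candidate set.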

Some of the vector databases on the market will have this type of capability. So as you talk to your IT team and think about this, make sure to bring it up: how are you going to handle this? Are you able to do semantic ranking? You may also see it offered by vector databases as re-ranking; think of the first retrieval as ranking the data, and then you re-rank those results. So you may see it called hybrid retrieval, semantic ranking, or re-ranking, but make sure to ask the questions. Make sure to ask: how are we doing this?

Or how are we planning on doing this? 

Let's now move on to some use cases and give some examples of the ways this is being used, and ways you could think about using it. I'd encourage you, as we go through these, to think about how they may be applicable to your business.

The first type of use case you could think of is enhancing productivity through collaboration between human and AI. That sounds fancy, but if you think about it, it's us using AI as our assistant, and using RAG to make it more of our assistant. It's just collaboration.

A great example of this is what Morgan Stanley has been doing, and they have documented use cases of using GPT, working in conjunction with OpenAI, with internal data. Morgan Stanley, as you can imagine, has huge amounts of intellectual capital, because they publish thousands of papers yearly, across topics that include insights on capital markets, asset classes, industry analysis, and economic regions. So there's a huge amount of content that they are publishing all the time, and their team of financial advisors excels at, and is driven by, serving their clients.

So with all this content, they've turned around and, using RAG, trained GPT-4. In an interview with CNBC, Jeff McMillan, the co-president and head of Morgan Stanley Wealth Management, said the tool would allow the bank's 16,000-plus advisors to make sense of its vast research and data collection. He went on to say: you essentially have the knowledge of the most knowledgeable person in wealth management, instantly. Think of it as having our chief investment strategist, chief global economist, global equity strategist, and every other analyst around the globe on call for every advisor, every day.

They believe, and I would absolutely agree that this is a transformative capability for their company. 

How does that apply to your business? Can you think of ways you could use this? Think about being able to take all the content that's created in your enterprise and making it easily accessible to every single knowledge worker. How can you do that? It's no longer searching an intranet site to find a document; this is being able to ask questions, like you've probably experienced with ChatGPT.

If you've used it, you know what that's like: being able to ask questions and get answers back, in this case all about your internal documents. In the case of Morgan Stanley, this is across everything they publish. And you can find other use cases like this from some of the large consulting firms, where they're doing a very similar thing: they're very content-oriented enterprises with knowledge workers who need to consume that information.

I'm guessing there's probably an opportunity to think about this in your organization as well. 

Taking this from another angle, you could think about a different type of use case: speeding up the onboarding process for new employees.

You can think about all the different company documents, the training materials, and the different queries and questions that people have asked over time.

You could use this for new hires, to help them quickly learn all this information about the company and assist them in their learning process as they begin onboarding in your enterprise. When they have a question about a company policy or about project details, a RAG system could dynamically generate responses incorporating all the latest internal documents and any previous, similar queries anyone has had. You can see how this could speed up the onboarding process. It could reduce the load on human trainers while giving a more engaging onboarding experience, and potentially quicker assimilation into the company's culture and workflow as well.

So that example of onboarding, and the Morgan Stanley example of using all the content from the organization in a chat-type tool with RAG, are two examples of use cases where you have the content and you're making consumption of it, use of it, and interactivity with it much better.

But there are other ways you could use this technology that aren't about looking for something, or asking questions of data to get answers out. You could use these technologies for customer feedback analysis. Think about this scenario: you have customers, and they're providing feedback to your business from all different types of sources. This could be surveys that you send out.

You could have an internal customer database. They could have given you reviews. They could be mentioning things about your business on social media. It could be forums, it could be competitor websites, it could be calls that came in. Think of all the different ways there may be signals from customers interacting with your brand, whether it's directly interacting with you or whether it's social, all the different data points that are out there. If you collect all of these, you could build a system that uses RAG to query them. Maybe it's a canned query, if you will, for the executive team.

And it could be for customer service and support; it could be for all different levels of the enterprise, as you can imagine. But imagine a system where you have a canned query that runs every night, uses RAG to grab all the content related to customer inquiries, and asks a model to generate a summary with all the highlights for you. So every morning you get this enriched digest, really understanding the nuanced sentiments and seeing recurring themes accurately. These models are great at spotting patterns. So imagine being able to do that and digest that information on a daily basis.

So now this is using RAG to generate content for you, not to ask questions per se, but to take all this content from all these different sources and produce a report for you.
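That nightly canned-query job can be sketched in a few lines. The `retrieve_feedback` and `call_llm` functions here are hypothetical stand-ins for your retrieval layer and model API; the prompt wording is just one illustrative way to ask for a summary.

```python
def nightly_feedback_digest(sources, retrieve_feedback, call_llm):
    # Gather the latest feedback from every configured source
    # (surveys, reviews, social mentions, call transcripts, ...).
    items = []
    for source in sources:
        items.extend(retrieve_feedback(source))
    # Ask the model to summarize themes and sentiment for the
    # morning report that lands in the executive team's inbox.
    prompt = (
        "Summarize recurring themes and overall sentiment in this "
        "customer feedback:\n\n" + "\n".join(f"- {item}" for item in items)
    )
    return call_llm(prompt)
```

In practice this would be wired to a scheduler (run it every night, or more often) and the retrieval step would be the same RAG machinery described earlier, just driven by a fixed query instead of a live user.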

Okay, so that's three different types of use cases. Two of them are about wanting to query data, interact, and ask questions: from the onboarding process, which can involve a lot of different data, to the Morgan Stanley example, which you'll see at lots of different enterprises, where the question is: we have all these experts, how do we make their knowledge available? In the Quick Bites episode I made reference to using this for field engineers supporting your product in the field. Same thing, right?

It's very similar to what Morgan Stanley wanted to do with its analysts: taking that enterprise content and making it so that someone can easily leverage it.

A more recent place to see this in action: if you use ChatGPT, you'll notice they now have a search capability for your conversations. That is also being done using this type of technology; it comes from a company called Rockset that they purchased, which specialized in RAG and this type of search technology.

So you'll see it everywhere; it's being used to help with retrieval.

So that's a few different use cases. Again, think through what use cases you could start with for your business, and think way outside the box. Pretend there isn't even a box at all. How could you use the content that's in your enterprise and unlock it? How could you use the content being created about your enterprise and really unlock it? Now, this is not without its challenges. There are absolutely challenges with these systems, even though they're easy to understand conceptually. I'm sure right now, if we were sitting together in a coffee shop, you'd be able to quickly draw out on a napkin what this looks like; I have no doubt about that. But as you start to implement, there are going to be challenges. So you need to be aware of some of them and really work with your IT team: help them, ask the questions, and probe. One of the top things that's going to come up is quality assurance.

How do you test the retrieval accuracy?

When I talked about the components and said you'll start to see hybrid retrieval, semantic ranking, and re-ranking, that's to help retrieval accuracy; it's driving toward better retrieval accuracy. So you're going to need to think through: what is the retrieval accuracy? How are you going to measure it? How are you going to monitor it? How do you handle the edge cases? Think through some of those things, and absolutely work with your IT team.
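One common way teams measure this is recall@k: for a set of test questions where you know which chunk should be found, what fraction of the time does that chunk show up in the top-k retrieved results? Here's a minimal sketch; the evaluation questions, chunk IDs, and fake retriever below are invented for illustration.

```python
def recall_at_k(eval_set, retrieve, k=5):
    # eval_set: list of (question, id_of_the_chunk_that_should_be_found)
    # retrieve: function returning a ranked list of chunk ids for a question
    hits = 0
    for question, expected_chunk in eval_set:
        if expected_chunk in retrieve(question)[:k]:
            hits += 1
    return hits / len(eval_set)

# Hypothetical evaluation run against a fake retriever.
eval_set = [("vacation policy?", "hr-012"), ("expense limits?", "fin-044")]
fake_retrieve = lambda q: ["hr-012", "fin-099"] if "vacation" in q else ["fin-003"]
print(recall_at_k(eval_set, fake_retrieve, k=5))  # prints 0.5: one of two found
```

The useful habit here isn't this particular metric so much as having any labeled test set at all: it turns "are we getting the right results?" from a gut feeling into a number you can monitor as documents and chunking strategies change.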

Think through how this will be possible, including the implementation challenges. On quality: you've got the data in there, it's in a vector database, maybe you're using semantic search with it and you're doing re-ranking. An implementation challenge is: how do you keep the data fresh and up to date? How do we make sure that, as data changes, it's getting into the vector database?

How do you get it embedded so you're able to quickly use it? Say you're processing articles that are being published, or you're building a system that helps you understand your brand and you're reacting to all the social content that's out there. I used the example of maybe this runs every night; maybe it runs more often. But how do you get it into the system? And there's security as well.

Security is always a concern, so make sure to think that through.

Then there's scaling this, and performance. If you have a tremendous amount of content, what happens when you put it in a vector database? How does it perform? Really understand what the performance envelope is for your organization. Take the Morgan Stanley example, which could also very easily be your customer service: you're on the phone with a customer, they have a problem, and it's real time.

You're trying to help them. So as you're using that chat application, which may be exposed to customer service, how do you make sure it's fast? How do you make sure it scales and is performant? How do you make sure it's current, that the data is fresh? And then there are the resources.

This is going to take understanding what the IT team needs in order to build this. And there needs to be a discussion of build versus buy: how much of this do you need to, or can you, build, and how much of it is possibly worth buying?

You also need realistic timeline expectations, and you need to figure out how you'll measure the ROI.

So there are certainly challenges, but it's very doable; you just need to think these through and talk with your IT partners about how you're going to overcome them. Now let's think about a getting-started guide. In maybe the first 30 days of doing this, have an assessment: where are your enterprise systems today? Where is the content? What do you need to do to make this ready? Then in days 30 to 60: how do you measure success?

What tools are needed? Where is your IT team; can they pull this off? And then in days 60 to 90, can you have a POC of this? Can you put something together? Pick a simple use case to start. I'm a huge fan of crawl, walk, run: if you don't have this today, think about that crawl step and get simple wins. And think about who needs to be involved in the project.

Really consider: what does success look like? And then how do you measure ROI?

Really think about that, and I'd be happy to talk with you about ideas for measuring ROI; I think we'll do a podcast on that as well. Please reach out to me if you have specific questions on it. As for tools to use, shameless plug: at Dragonfly Rising we're building a comprehensive ROI platform for these capabilities. But really think through how you'll do this.

So in summary, 

Today, RAG is a competitive differentiator. Tomorrow, it's going to be table stakes; everyone's going to be doing this, and it's going to be expected, when we interact with a business, that this context is there in any gen AI solution. Your employees are going to expect it, and you're going to expect it too. We're all going to start to expect it: why wouldn't my interactions with this company include all the interactions I've had? If I'm a new employee, why wouldn't I be able to ask questions of all the internal documents and get up to speed?

Why wouldn't I be able to have the expertise of all of the company's experts right at my fingertips? Think about how much more productive that makes everybody.

So in closing, this is strategically important to think about. We've gone over some of the core components: the vector database, the retrieval process, the document processing that needs to happen,

and the context that goes into the model, along with the evaluation of enterprise systems: are they ready?

Are you able to do hybrid retrieval, RAG with semantic ranking, re-ranking? We talked about some of the different use cases, and I really encourage you to think about how you could use this in your enterprise. Really go outside the box; think wildly about the ways it could possibly be used, and just let your imagination run. We covered some of the challenges,

and working closely with your IT teammates on how to pull this off, supporting them where their skills need it. And it's not just them and their skills; there's a business AI literacy that needs to happen as well. So it's on both sides: there's the business side where IT needs help, and the AI side where the business needs help.

That wraps up our discussion of RAG.

I hope you've enjoyed listening. Please remember to subscribe to this podcast wherever you listen to your podcasts, so you don't miss an episode. That's all for now.