Day 1
In this session we venture into a specific, highly sophisticated field of LLM usage called Retrieval-Augmented Generation (RAG). We will learn in detail what this fancy-sounding term really means, how such a solution can be leveraged in the real world, how a RAG system differs from a plain LLM, what components such a tool consists of, and how to build it. This initial workshop introduces participants to the concept of RAG, its main components, and LangChain, one of the libraries commonly used to build such systems.
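To make the retrieve-then-generate idea concrete before the workshop itself, here is a minimal, framework-free sketch. The bag-of-words "embedding" and the prompt-building step are toy stand-ins for a real embedding model and an LLM call, and the documents are made up for illustration; in the workshop, LangChain provides the real versions of these pieces.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "RAG combines a retriever with a generator.",
    "LangChain is a popular library for building LLM applications.",
    "Vector stores hold document embeddings for similarity search.",
]

def retrieve(query, k=2):
    # Retrieval step: rank documents by similarity to the query.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, context):
    # Generation step: in a real RAG system this prompt goes to an LLM.
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

query = "What does a vector store hold?"
prompt = build_prompt(query, retrieve(query))
```

The key difference from using an LLM alone is visible in `build_prompt`: the model answers from retrieved context rather than from its parametric memory.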
Beginner
The whole workshop has an introductory character, but the knowledge gained in the previous sessions is strongly recommended. Experience with LangChain is not required, although those who have it will find the discussed topics easier to follow. Overall, compared to the earlier sessions, the difficulty sits somewhere between easy and medium.
The entire session is aimed at people who want to learn what RAG is, how to build its main and additional components, how to improve the system and overcome its challenges, and, most importantly, what advantages it offers over using an LLM on its own.
This initial day is the introduction to the topic, which should bring all participants up to speed and allow them to get the most out of the following, more specific workshops in this session.
A characteristic feature of this session is that after the initial introduction, each day teaches something new; while the days are self-contained and participants could benefit from taking them separately, they work best when approached as one elaborate RAG guidebook.
Day 2

After getting to know the basics of Retrieval-Augmented Generation on the first day, we are ready to dive deeper into its architecture. We have already built a demo from pre-prepared parts, but what if those elements don't satisfy our needs? In this workshop we explore what to do when you don't have your own data, introduce a different, more efficient vector store, and show which evaluation metrics are useful when assessing the retrieval module.
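As a taste of the evaluation part, two metrics commonly used for assessing a retrieval module, recall@k and Mean Reciprocal Rank, fit in a few lines of plain Python (the document IDs below are made up for illustration; the workshop covers the formulas in detail):

```python
def recall_at_k(retrieved, relevant, k):
    # Fraction of the relevant documents that appear in the top-k results.
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mean_reciprocal_rank(results):
    # For each (retrieved, relevant) pair, take 1/rank of the first
    # relevant hit (contributing 0 if none is found), then average.
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)
```

For example, with relevant documents {"d1", "d9"} and retrieval results ["d3", "d1", "d7"], recall@3 is 0.5, since only one of the two relevant documents was found.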
Beginner
The difficulty level of the workshop is low. Participants with prior knowledge of the metrics' mathematical formulas or the Weaviate SDK may complete the day quicker, but it is easily manageable for those without such a background. Once again, experience from the second session, on prompt engineering and handling LLMs in general, may come in handy.
As we continue the journey into the RAG domain, we learn more about the retrieval part of the system: an alternative way to obtain training and/or testing data, how to enhance retrieval by leveraging metadata, and finally how to evaluate it separately from the RAG system as a whole.
Day 3

Bi-encoder
– Fine-tuning
– Evaluation
– Hard negatives
More on how to boost our retriever, this time from a different perspective.
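Of the listed topics, hard-negative mining is perhaps the least self-explanatory. A minimal sketch of the idea, assuming we already have query and document vectors plus relevance judgments (the vectors and IDs below are illustrative):

```python
def dot(a, b):
    # Similarity between two dense vectors of equal length.
    return sum(x * y for x, y in zip(a, b))

def mine_hard_negatives(query_vec, corpus, relevant_ids, n=2):
    # Hard negatives are the highest-scoring documents that are NOT
    # relevant: during fine-tuning they teach the bi-encoder far more
    # than randomly sampled negatives do.
    candidates = [
        (doc_id, dot(query_vec, vec))
        for doc_id, vec in corpus.items()
        if doc_id not in relevant_ids
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in candidates[:n]]

corpus = {
    "d1": [1.0, 0.0],  # the relevant document
    "d2": [0.9, 0.1],  # very similar to the query, but not relevant: hard
    "d3": [0.0, 1.0],  # unrelated to the query: an easy negative
}
negatives = mine_hard_negatives([1.0, 0.0], corpus, {"d1"})
```

In practice the candidate scoring is done by the current retriever over the whole corpus, and the mined (query, positive, hard negative) triples feed the fine-tuning loss.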
We start by building a simple embedding-based retriever to serve as a baseline. We then evaluate it on the test set and analyze what kinds of errors it makes, which helps us understand the limitations of both the simple retrieval model and the data we are working with. We also explore how the choice of chunking strategy affects retrieval performance. Next, we go back to basics and learn about lexical search and when it can improve retrieval. We then add another component to the pipeline, the reranker, and learn how to use it and how it improves retrieval performance. Finally, we return to the embedding-based retriever and fine-tune it to push the performance further.
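As a hint of why chunking choices matter, here is the simplest possible fixed-size chunker with overlap. It is character-based for brevity; real pipelines usually chunk by tokens, sentences, or document structure, and the size/overlap values here are arbitrary.

```python
def chunk(text, size=40, overlap=10):
    # Overlapping windows: a sentence cut at one chunk boundary still
    # appears whole in the neighbouring chunk, at the cost of some
    # duplication in the index.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "0123456789" * 10  # 100 characters
chunks = chunk(text)
```

Larger chunks carry more context into the prompt but dilute the embedding; smaller chunks embed crisply but may lose the surrounding information the generator needs. That trade-off is exactly what the error analysis above surfaces.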
Intermediate
From now on, things get more serious, but only just a bit. Once again, the main challenge is the introduction of new terms, concepts, and ideas that participants might find difficult to grasp at first. The coding part is easily manageable for anyone who has participated in the previous workshops of this session. There are also references to the first session of the training, so it will be easier for those who have followed the whole training cycle.
Similarly to the previous day, this workshop broadens the knowledge we have been building since the first workshop of the session. Here we introduce new components and show how to adjust them to our needs. Anyone who wants to customize RAG for their own purposes can benefit greatly from this workshop.
Day 4

This time, we focus on yet another part of the system: the generator.
We start with an overview of the metrics used to evaluate generated responses; in particular, we focus on using Large Language Models (LLMs) to analyze generated responses and compare them to the ground truth. After a short recap, we build a simple RAG system to serve as a baseline, evaluate it on the test set, and analyze what kinds of errors it makes. Next, we examine the context assembled from the documents returned by the retrieval model and analyze how its quality affects generation performance. We then fine-tune the generator model to align it with the expected answers and improve generation quality. Finally, we explore several other extensions to the RAG model.
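LLM-as-judge scoring is covered in the workshop itself, but a cheap, deterministic companion metric for comparing a generated answer to the ground truth is token-overlap F1, in the style of SQuAD answer scoring. A minimal sketch:

```python
from collections import Counter

def token_f1(prediction, reference):
    # Precision/recall over token multisets: 1.0 means an exact
    # bag-of-words match, 0.0 means no shared tokens at all.
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Being purely lexical, it rewards paraphrases poorly, which is precisely the gap an LLM judge fills; using both together gives a fast sanity check plus a semantic assessment.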
Advanced
Due to the usage of LoRA, this might be the most code-heavy workshop of the session. The other, more complex RAG components discussed here also raise the overall difficulty. All in all, it is safe to say this is the hardest workshop in the third session, so its level is estimated as medium bordering on hard.
Yet again, we learn something new about the concept. Of all the workshops in this session, this one is best suited to more experienced RAG users, due to the complexity of the introduced components; while not strictly essential, they bring the system to a whole new level of user-friendliness, usability, and security.
Interested in mastering Large Language Models? Fill out the form below to receive a tailored quotation for a course designed to meet your specific needs and objectives.