Unleashing the Power of Azure OpenAI and Service Embedding for Unstructured Document Search


Are you struggling to find information that is buried deep within unstructured documents? It’s time to unleash the power of Azure OpenAI and Service Embedding! In this blog post, we’ll show you how to harness the latest advancements in AI and cloud-based services to transform your unstructured document search.

With Azure OpenAI’s advanced natural language processing and Service Embedding’s powerful integration capabilities, you’ll be able to extract meaningful insights from your documents and make informed decisions with ease. Whether you’re in healthcare, legal, or any other industry that requires in-depth document analysis, this post will give you the tools you need to unlock the full potential of your data.

So, join us on this exciting journey into the world of unstructured document search and see for yourself how Azure OpenAI and Service Embedding can revolutionize the way you work with information.

Imagine you have a collection of documents with different formats and structures: contracts, letters, emails, social media posts, articles, and more. These unstructured documents can contain various types of data, such as text, images, audio, and video. Finding specific information in them is challenging because they aren’t organized in a consistent way.

One approach to this problem is “Azure OpenAI and Service Embedding”, a combination of services that understands human language and can generate text. By embedding this service into your own tools, such as search engines or chatbots, you can make it easier to search and analyze unstructured documents. This means you can find the information you need more effectively, even when it is expressed in different ways, such as sentences, pictures, or even videos.

Now, let’s go deeper and learn how Azure OpenAI and Service Embedding can help you solve your unstructured document search problem.

What is an LLM?

A Large Language Model (LLM) is an artificial intelligence system designed to understand and generate human-like text. These models are trained on massive datasets that contain a wide range of text sources, such as books, articles, websites, and other written material.

One famous example of an LLM is OpenAI’s GPT series, such as GPT-3.5. These models are very powerful because they have billions of parameters that allow them to understand and create complex and detailed text.

In a Large Language Model (LLM), the task is to predict the probability of the next word in a given sequence of words. The notation is as follows:

Given a sequence of words x1, x2, …, xt, compute the probability distribution of the next word P(xt+1 | x1, x2, …, xt).

For example, if the sequence is “I love India”, x1 is “I”, x2 is “love”, and x3 is “India”. The sequence can be of any length t. The model then calculates the probability distribution of the next word xt+1 given the previous words x1, x2, …, xt. This distribution gives the likelihood of each possible word occurring next in the sequence, given the context of the previous words.

A Large Language Model (LLM) uses its training data to learn patterns and estimate the probabilities of words following a given sequence. After training, it can take a partial sequence as input and generate a probability distribution over possible next words. This allows the model to generate text that makes sense by selecting the most likely word or using sampling techniques.
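To make this concrete, here is a minimal, hypothetical sketch of next-word prediction over a toy vocabulary with hand-picked scores; a real LLM derives these scores from billions of learned parameters rather than a hard-coded table.

import math

# Toy vocabulary with hand-picked scores (logits) for the context "I love ..."
logits = {"India": 4.2, "pizza": 3.1, "rain": 1.5, "the": 0.3}

# Softmax turns the raw scores into a probability distribution P(xt+1 | x1, ..., xt)
total = sum(math.exp(score) for score in logits.values())
probs = {word: math.exp(score) / total for word, score in logits.items()}

for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"P('{word}' | 'I love') = {p:.3f}")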

What is a Prompt and What is its Composition?

A prompt is an input instruction provided to an AI model to generate a response or perform a specific task. In other words, it serves as the initial context from which the model generates text or performs the task.

A prompt can be of various forms depending on the application. It can be a question, a statement, a partial sentence, or any specific instruction that guides the model’s output. For example, in a conversational setting, a prompt could be “Tell me a joke” or “What is the weather today?”.

The composition of the prompt includes:

  1. Instructions
  2. Context
  3. Input Data
  4. Output Indicator

Combining the above composition in a prompt, the user can provide specific instructions, relevant context, necessary input data, and guidance on the desired output.

Instructions: the task that needs to be performed.

For example, the instruction can be “Write a product review for a new coffee maker”, “Summarize the main points of this article” or “Answer the following question”. The instruction should be clear, concise, and unambiguous for the model to understand what is expected from it.

Context: additional information that refines the task.

For example, the context can be “The coffee maker is called ‘SmartBrew’ and it has a programmable timer and a built-in grinder”, “The article is about the latest trends in AI research” or “The question is from a trivia quiz”. The context should be relevant, accurate, and sufficient for the model to perform the task well.

Input Data: the data that the model will process.

For example, the input data can be “I bought this coffee maker last week and I love it” or “Artificial intelligence (AI) is a branch of computer science that deals with creating machines and systems that can perform tasks that normally require human intelligence” or “Who is the author of ‘The Hitchhiker’s Guide to the Galaxy’?”. The input data should be valid, complete, and consistent for the model to generate meaningful output.

Output Indicator: how the output should be generated.

For example, the output indicator can be “The product review should be positive and have at least 100 words”, “The summary should be one paragraph long and highlight the key points of the article” or “The answer should be a single word or phrase”. The output indicator should be specific, measurable, and achievable for the model to generate a satisfactory output.

The sketch below demonstrates how these four parts come together in a single prompt:
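Here is a hypothetical sketch in Python that assembles the four parts into one prompt; it reuses the coffee-maker examples from the sections above, and the layout of the final string is just one possible choice.

instruction = "Write a product review for a new coffee maker."
context = "The coffee maker is called 'SmartBrew' and it has a programmable timer and a built-in grinder."
input_data = "I bought this coffee maker last week and I love it."
output_indicator = "The review should be positive and have at least 100 words."

# Combine instruction, context, input data, and output indicator into a single prompt
prompt = f"{instruction}\n\nContext: {context}\n\nCustomer note: {input_data}\n\n{output_indicator}"
print(prompt)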

Prompting Types

  1. Zero-Shot Prompting: In this, a model is given a task without any additional information and it attempts to generate a response based on its existing knowledge.
  2. One-Shot Prompting: In this, a model is provided with a single example to generate a response, along with instructions, to guide its output.
  3. Few-Shot Prompting: In this, a model is provided with a few examples along with an instruction to guide the model’s output (a short sketch follows this list).
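For instance, a few-shot prompt for sentiment classification might look like the sketch below; the reviews and labels are invented for illustration.

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The coffee maker broke after two days." -> Negative
Review: "Brews a perfect cup every morning." -> Positive
Review: "Setup took five minutes and it works great." ->"""

# A zero-shot version would contain only the instruction and the final review,
# while a one-shot version would include exactly one solved example.
print(few_shot_prompt)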

Chain of Thought

Chain-of-thought is a technique that allows large language models to handle complex tasks involving arithmetic, commonsense, and symbolic reasoning. It enables the models to follow a logical sequence of thoughts or steps to arrive at solutions or answers.

This approach is particularly useful when dealing with intricate problems that require multiple steps or considerations. It helps the models reason through the problem by connecting different pieces of information, drawing upon their knowledge base, and applying logical reasoning.

Let’s go into detail.

What do you do when you want to solve a complicated problem, like a math word problem with multiple steps? You break it down into smaller parts and solve each step before reaching the final answer. For example, Jason has 10 flowers. If Jason gives 4 flowers to his mom and then gives 5 flowers to his dad, you subtract step by step (10 − 4 = 6, then 6 − 5 = 1), so the answer is 1.

So, the main goal is to teach AI language models to do something similar, which is where chain-of-thought prompting comes in. A chain of thought is a coherent series of intermediate reasoning steps that lead to the final answer to a problem. Large language models can generate chains of thought when provided with examples that showcase this kind of reasoning.

With chain-of-thought prompting, the model solves a math word problem by following a chain of intermediate steps. This chain helps the model arrive at the correct answer in cases where standard prompting produces the wrong one. The chain of thought resembles a worked solution, and it gets its name because it imitates the step-by-step thinking process that leads to the correct answer.

Difference between Standard Prompting and COT Prompting

Standard prompting and CoT prompting are two techniques for refining the input of large language models (LLMs) to produce the desired output, without updating the actual weights of the model as you would with fine-tuning.

  1. Examples: Standard prompting provides examples of input-output pairs for a given task; CoT prompting provides examples of input-output pairs with intermediate reasoning steps.
  2. Output: Standard prompting expects the LLM to directly produce the output for a test input; CoT prompting expects the LLM to generate intermediate reasoning steps and then produce the output.
  3. Task complexity: Standard prompting works well for simple or single-step tasks; CoT prompting works well for complex or multi-step tasks.
  4. Model size: Standard prompting does not require a large model size to perform well; CoT prompting requires a large model size (around 100B parameters) to perform well.
  5. Transparency: Standard prompting does not reveal the LLM’s thought process or logic; CoT prompting does.

Standard prompting example:
Question: John has 12 apples. He gives 3 to Mary and 4 to Tom. How many apples does John have left?
Answer: 5

CoT prompting example:
Question: John has 12 apples. He gives 3 to Mary and 4 to Tom. How many apples does John have left?
Answer:
Step 1: Subtract the number of apples John gives away from the number of apples he has.
Step 2: Calculate 12 – 3 – 4.
Step 3: The answer is the result of step 2.
The answer is 5.
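In practice, the CoT side of the comparison above translates into a prompt that simply includes a worked example with its reasoning. A minimal, hypothetical sketch as a Python string:

cot_prompt = """Q: John has 12 apples. He gives 3 to Mary and 4 to Tom. How many apples does John have left?
A: John starts with 12 apples. He gives away 3 + 4 = 7 apples. 12 - 7 = 5. The answer is 5.

Q: Jason has 10 flowers. He gives 4 to his mom and 5 to his dad. How many flowers does Jason have left?
A:"""

# The model is expected to imitate the step-by-step reasoning in the worked example
# before stating the final answer to the new question.
print(cot_prompt)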

What is ReAct (Reason + Act)?

ReAct (Reason + Act) is a framework designed to enhance the capabilities of large language models (LLMs) like GPT-3, GPT-3.5, or GPT-4 by enabling them to perform multimodal reasoning and take actions with the help of various tools and plugins. It allows LLMs to access real-time information, execute code, integrate with external services, and accomplish complex tasks by incorporating external expertise and data sources.

The ReAct framework is built on the following principles:

  1. Prompt with Tool List: The prompt includes a list of tools and a structured format that enables the LLM to execute actions iteratively until the original question is answered.
  2. Completion with Actions: The completion consists of the LLM’s response along with the actions performed by the LLM using the specified tools.
  3. Scratchpad for Iteration: A scratchpad is used to store the prompt and completion for each turn, serving as input for the subsequent iterations of the process.

Here, ReAct is used in conjunction with Azure OpenAI Service embedding models for document search tasks. ReAct can also be combined with Azure OpenAI Service models for other tasks such as content generation, summarization, semantic search, natural language to code translation, and more.

Now, let’s explore how the ReAct framework operates.

To illustrate its functionality, let’s consider a math reasoning problem that we want an AI model to solve. The ReAct framework is employed to tackle this math problem. For example, suppose the user presents the following math problem: “Jason has 10 apples. He gave 2 apples to his sister and 6 to his father. How many apples are left with Jason?”

When the user provides the math problem, the language model reasons about it step by step (the Reason part) and, where needed, chooses an action to execute in an environment, such as a calculator tool (the Act part). The environment returns an observation to the model, and the model uses that observation to continue reasoning until it can give the final answer to the user.
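The loop described above is usually written out as a Thought / Action / Observation scratchpad. Below is a hypothetical trace for the apples problem; the calculator tool and the exact format are illustrative, since real implementations vary.

react_scratchpad = """Question: Jason has 10 apples. He gave 2 apples to his sister and 6 to his father. How many apples are left with Jason?
Thought: I need to subtract the apples Jason gave away from the number he started with.
Action: calculator[10 - 2 - 6]
Observation: 2
Thought: I now know the final answer.
Final Answer: Jason has 2 apples left."""

# Each Thought/Action pair is produced by the LLM; each Observation is returned
# by the chosen tool (here, a hypothetical calculator) and appended to the scratchpad.
print(react_scratchpad)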

What is Text Splitting?

Text Splitting refers to the process of breaking down a given text or document into smaller segments or chunks. This is done to handle longer texts that exceed the maximum input length allowed by the OpenAI models. By splitting the text into smaller parts, each segment or chunk can be processed individually, allowing the model to generate embeddings or perform other operations on the text effectively.
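For example, with the LangChain RecursiveCharacterTextSplitter that the ingestion code later in this post uses, splitting a long string into overlapping chunks looks roughly like this; the tiny chunk size is only for illustration.

from langchain.text_splitter import RecursiveCharacterTextSplitter

long_text = "Harry is the main character of the book. " * 20  # stand-in for a long document

splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = splitter.split_text(long_text)

print(f"Split into {len(chunks)} chunks")
print(chunks[0])  # each chunk is now small enough to embed or pass to the model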

Consider a scenario where the user wants to know the role of Harry in a book. You may be curious how the LLM generates the answer from a huge pile of documents.

The user poses the question, “What was the role of Harry in this book?” The LLM receives this prompt and initiates an API call to retrieve information from groups of text chunks, such as Chunks A, B, C and Chunks C, D. These chunks contain various sections of text, and one of them holds the relevant information about Harry’s role.

To make sense of the text and locate the relevant information, the LLM breaks down the text into smaller parts. Using its language understanding capabilities and knowledge, the AI model analyzes these smaller sections of text. It applies its comprehension skills to identify the chunk that contains the desired information about Harry’s role in the book.

Once the LLM identifies the relevant chunk, it generates a response based on the analyzed text and provides the user with the answer they were seeking. Through this process, the LLM effectively sifts through a large amount of text to locate the specific information and deliver it to the user.

What is Embedding?

An embedding represents data in a form that is suitable for machine learning models and algorithms. It captures the semantic meaning of a text as a set of numbers. These numbers form a vector, and the similarity between two texts can be determined by measuring the distance between their vectors. For example, if two texts have similar meanings, their vector representations will also be similar.

In an embedding model, each object is converted into a numerical representation called a vector, and each object ends up with its own distinct vector.

Now, let’s take one real-world example.

Consider three sentences. Sentence 1: “I want to order a large pizza”, Sentence 2: “I’ll have a huge pizza”, and Sentence 3: “Ich möchte eine große Pizza bestellen” (German for “I would like to order a large pizza”). All three sentences are processed by the encoder or embedding model, and each sentence is transformed into a vector representation based on the semantic content it carries. This means that sentences with similar meanings have similar vector representations, as they sit closer together in the vector space.
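A rough sketch of this idea using the OpenAIEmbeddings class from LangChain (it assumes OPENAI_API_KEY is set in the environment) together with a plain cosine-similarity calculation:

import math
from langchain.embeddings import OpenAIEmbeddings

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment
v1 = embeddings.embed_query("I want to order a large pizza")
v2 = embeddings.embed_query("I'll have a huge pizza")
v3 = embeddings.embed_query("Ich möchte eine große Pizza bestellen")

print(cosine_similarity(v1, v2))  # high: same meaning, same language
print(cosine_similarity(v1, v3))  # still high: same meaning, different language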

Understand the Vector Databases and Pinecone Vector Database

A vector database is a type of database that is specifically designed to store and query high-dimensional vectors efficiently. In a vector database, vectors are used as the primary data structure, and the database is optimized for similarity search and retrieval operations based on vector similarity.

Traditional databases are typically designed for storing structured data such as text, numbers, or relational data. However, vector databases focus on storing and querying vectors, which are mathematical representations of objects in a multi-dimensional space.

Consider the following example:

Query = “Tell me something about Empire State Building”.

Context = (Empty)

When the user provides a query without context, it can be challenging for the Large Language Model (LLM) to generate relevant and accurate answers. The LLM may struggle to determine the specific aspect of the Empire State Building that the user wants to know about. For instance, the user might be interested in learning about its height, location, or establishment date.

However, if the user provides a query with proper context, the LLM will be able to generate a more precise and appropriate answer that aligns with the user’s expectations. For example:

Query: Tell me the height of the Empire State Building.

Context: Height

All the queries and responses are stored in Vector Space. Vector space refers to a mathematical concept used to represent and manipulate textual data within language models. It represents a mathematical space where words or phrases are mapped as vectors, which are numerical representations with specific dimensions.

In this vector space, each word or phrase is assigned a vector, and the distance and direction between vectors indicate their semantic relationships. Words or phrases with similar meanings or contexts are closer to each other in the vector space, while those with different meanings are farther apart.

Vector space representations enable various natural language processing tasks, such as word similarity calculations, language generation, and understanding of semantic relationships between words.

The query and context are processed by the embedding model. The Large Language Model (LLM) uses this embedding model to search for the specific vector that contains the data relevant to the query posed by the user.

The Pinecone vector database provides a managed service for efficient vector storage and similarity search. It is designed to handle large-scale vector data and enable fast, accurate searches based on vector similarity.

With Pinecone, you can store your vectors and perform similarity searches to find the most similar vectors to a given query vector. It uses advanced indexing techniques and similarity search algorithms to efficiently retrieve vectors that are similar to the query vector.
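A rough sketch of such a similarity query with the Pinecone Python client used later in this post; the index name is borrowed from the demo code, and the query vector and environment variables are placeholders.

import os
import pinecone

pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment=os.environ["PINECONE_ENVIRONMENT_REGION"],
)

index = pinecone.Index("langchain-doc-index")  # an existing index

# The query vector would normally come from the same embedding model used at ingestion time
query_vector = [0.01] * 1536  # placeholder 1536-dimensional vector

results = index.query(vector=query_vector, top_k=3, include_metadata=True)
for match in results.matches:
    print(match.id, match.score)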

LangChain Framework

LangChain is a cutting-edge framework designed to facilitate the development of robust applications that harness the power of language models. It operates based on several key principles that drive its functionality and effectiveness.

Below are the two important key principles of the LangChain Framework:

  1. Data-aware: It integrates with external data sources, allowing applications to access and use contextual information from various places. This enhances the system’s ability to provide comprehensive and relevant responses, resulting in a more immersive user experience.
  2. Agentic: It empowers applications to actively interact with users and make informed decisions. Developers can create dynamic applications that understand user input, analyze the context, and generate appropriate responses. This makes the user experience more engaging and opens up possibilities for advanced and intelligent applications.

Now, let’s consider some benefits of LangChain Frameworks:

  1. Comprehensive and Contextually Rich Responses: With LangChain’s data awareness, applications can access and integrate information from external sources. This enriches the responses generated by the language model, ensuring they are comprehensive and contextually relevant. Users can expect more detailed and in-depth answers, leading to a better understanding and satisfaction when using the application.
  2. More Powerful and Differentiated Applications: LangChain empowers developers to create standout applications. Its agentic capabilities enable the development of intelligent systems that actively engage in conversations, adapt to user preferences, and make informed decisions. This results in more powerful, engaging, and efficient applications that effectively address complex user needs with accuracy.

To learn more about LangChain, visit: Data Liberation: Empowering Mankind with Azure OpenAI and Azure SQL

LangChain Demo Code

We have divided the code into three parts:

Recursively downloading the help file

This command downloads the LangChain help files from the URL below and stores them in the local folder you specify.

wget -r -A.html -P C:\Your-folder-location-for-langchain-help-download https://api.python.langchain.com/en/latest/

The -r switch downloads the files recursively, -A.html restricts the download to HTML files, and -P sets the local folder where the files are saved.

Creating Pinecone Vector DB

First, we have to create the Pinecone DB. Register with Pinecone and then create an index. You need to specify the index name, a vector dimension that matches your embedding model (1536 for OpenAI’s text-embedding-ada-002), and a similarity metric (typically cosine).
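If you prefer to create the index from code rather than the Pinecone console, a sketch using the same client as the ingestion script might look like this; the 1536 dimension assumes OpenAI’s text-embedding-ada-002 embedding model.

import os
import pinecone

pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment=os.environ["PINECONE_ENVIRONMENT_REGION"],
)

# Create the index only if it does not already exist
if "langchain-doc-index" not in pinecone.list_indexes():
    pinecone.create_index(
        name="langchain-doc-index",
        dimension=1536,   # dimension of text-embedding-ada-002 vectors
        metric="cosine",  # cosine similarity works well for text embeddings
    )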

Ingestion to Pinecone Vector DB

This code initializes the Pinecone service, defines a function to ingest documents into a Pinecone vector index using the LangChain library, and demonstrates the usage of the function by running it. The LangChain library is used to load, split, and process documents before ingesting them into the Pinecone vector index.

import os
from langchain.document_loaders import ReadTheDocsLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
import pinecone


pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment=os.environ["PINECONE_ENVIRONMENT_REGION"],
)


def ingest_docs() -> None:
    file_path = r"C:\Users\kalpa\OneDrive\doc-helper\langchain-docs\python.langchain.com\en\latest"
    loader = ReadTheDocsLoader(file_path, encoding="utf8")
    raw_documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=100, separators=["\n\n", "\n", " ", ""]
    )
    print(f"loaded {len(raw_documents) } documents")

    documents = text_splitter.split_documents(documents=raw_documents)
    print(f"Split into {len(documents)} chunks")

    for doc in documents:
        old_path = doc.metadata["source"]
        new_url = old_path.replace("langchain-docs", "https:/")
        doc.metadata.update({"source": new_url})

    print(f"Going to insert {len(documents)} to Pinecone")
    embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get("OPENAI_API_KEY"))
    Pinecone.from_documents(documents, embeddings, index_name="langchain-doc-index")
    print("****** Added to Pinecone vectorstore vectors")


if __name__ == "__main__":
    ingest_docs()

Let’s understand this code:

  1. Imports:
    • import os: This imports the os module, which provides a way to interact with the operating system, such as accessing environment variables.
    • from langchain.document_loaders import ReadTheDocsLoader: This imports the ReadTheDocsLoader class from the LangChain library, which is used to load documents from a specified file path.
    • from langchain.text_splitter import RecursiveCharacterTextSplitter: This imports the RecursiveCharacterTextSplitter class from the LangChain library, which is used to split long documents into smaller chunks.
    • from langchain.embeddings import OpenAIEmbeddings: This imports the OpenAIEmbeddings class from the LangChain library, which provides functionality for working with OpenAI language models.
    • from langchain.vectorstores import Pinecone: This imports the Pinecone class from the LangChain library, which provides integration with the Pinecone vector index.
    • import pinecone: This imports the Pinecone library.
  2. Pinecone Initialization:
    • pinecone.init(...): This initializes the Pinecone service using the provided API key and environment region. The values are obtained from the corresponding environment variables (PINECONE_API_KEY and PINECONE_ENVIRONMENT_REGION).
  3. Document Ingestion:
    • The code defines a function named ingest_docs that takes no arguments and returns None.
    • Inside the function, a file path is specified where the documents are located.
    • A ReadTheDocsLoader instance is created with the specified file path and encoding.
    • The documents are loaded using the loader, and the loaded raw documents are stored in the raw_documents variable.
    • A RecursiveCharacterTextSplitter instance is created with certain configuration parameters (chunk_size, chunk_overlap, separators).
    • The splitter is used to split the raw documents into smaller chunks, and the resulting chunks are stored in the documents variable.
    • The code updates the source URLs in the metadata of each document to replace the local file path with a URL.
    • The number of documents and chunks are printed for informational purposes.
    • An OpenAIEmbeddings instance is created using the OpenAI API key obtained from the environment variable.
    • The Pinecone.from_documents method is called with the documents, embeddings, and index name (“langchain-doc-index”) to ingest the documents into the Pinecone vector index.
    • A confirmation message is printed to indicate that the documents have been added to the vector index.
  4. Execution:
    • The code block under if __name__ == "__main__": is executed when the script is run directly (not imported as a module).
    • The ingest_docs function is called.

Backend code

This code initializes the Pinecone service, defines a function to run a conversational retrieval model, and demonstrates the usage of the function by running a sample query. The LangChain library is used to integrate the OpenAI language model and Pinecone document retriever to perform conversational retrieval tasks.

import os
from typing import Any, Dict, List
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.vectorstores import Pinecone
import pinecone

pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment=os.environ["PINECONE_ENVIRONMENT_REGION"],
)

#INDEX_NAME = "langchain-doc-index"

def run_llm(query: str, chat_history: List[Dict[str, Any]] = []):
    embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])
    docsearch = Pinecone.from_existing_index(
        embedding=embeddings,
        index_name="langchain-doc-index",
    )

   
    chat = ChatOpenAI(
        verbose=True,
        temperature=0,
    )

    qa = ConversationalRetrievalChain.from_llm(
        llm=chat,
        retriever=docsearch.as_retriever(),
        return_source_documents=True,
        chain_type="stuff"
    )
    return qa({"question": query, "chat_history": chat_history})


if __name__ == "__main__":
    print(run_llm(query="What are Langchain use cases ?")) 

Let’s understand this code:

  1. Imports:
    • import os: This imports the os module, which provides a way to interact with the operating system, such as accessing environment variables.
    • from typing import Any, Dict, List: This imports various type hints used in function signatures.
    • from langchain.embeddings.openai import OpenAIEmbeddings: This imports the OpenAIEmbeddings class from the LangChain library, which provides functionality for working with OpenAI language models.
    • from langchain.chat_models import ChatOpenAI: This imports the ChatOpenAI class from the LangChain library, which represents a conversational model powered by OpenAI.
    • from langchain.chains import ConversationalRetrievalChain: This imports the ConversationalRetrievalChain class from the LangChain library, which represents a conversational retrieval chain that combines a language model and a document retriever.
    • from langchain.vectorstores import Pinecone: This imports the Pinecone class from the LangChain library, which provides integration with the Pinecone vector index.
    • import pinecone: This imports the Pinecone library.
  2. Pinecone Initialization:
    • pinecone.init(...): This initializes the Pinecone service using the provided API key and environment region. The values are obtained from the corresponding environment variables (PINECONE_API_KEY and PINECONE_ENVIRONMENT_REGION).
  3. Function Definition:
    • def run_llm(query: str, chat_history: List[Dict[str, Any]] = []):: This defines a function named run_llm that takes a query string and an optional chat history list as input. The function returns a conversational retrieval result.
    • Inside the function, an instance of OpenAIEmbeddings is created using the OpenAI API key obtained from the OPENAI_API_KEY environment variable.
    • The Pinecone instance is created using the OpenAIEmbeddings instance and the name of the index (“langchain-doc-index”).
    • A ChatOpenAI instance is created with certain configuration parameters (verbose=True, temperature=0), representing a conversational model powered by OpenAI.
    • Finally, a ConversationalRetrievalChain is created using the ChatOpenAI instance and the Pinecone instance. The chain is configured to return source documents and is assigned a chain type (“stuff”).
    • The chain object (qa) is then called with the query and chat history, and the result is returned.
  4. Execution:
    • The code block under if __name__ == "__main__": is executed when the script is run directly (not imported as a module).
    • The run_llm function is called with a sample query (“What are Langchain use cases?”) and the result is printed.

Streamlit Chatting code

This code creates a simple Streamlit app that allows users to interact with a chatbot. The user enters a prompt, and the chatbot generates a response using the run_llm function. The chat history is displayed in the app interface.

from typing import Set
from backend.core import run_llm
import streamlit as st
from streamlit_chat import message

#This function takes a set of strings (source_urls) as input and returns a string. It creates a formatted string containing the source URLs from the set.
def create_sources_string(source_urls: Set[str]) -> str:
    if not source_urls:
        return ""
    sources_list = list(source_urls)
    sources_list.sort()
    sources_string = "sources:\n"
    for i, source in enumerate(sources_list):
        sources_string += f"{i+1}. {source}\n"
    return sources_string


st.header("LangChain🦜🔗 - Helper Bot")




# The following lines check if certain keys exist in st.session_state (Streamlit's session state) and initialize them as empty lists if they don't exist.
# These lists are used to store the chat history.
if (
    "chat_answers_history" not in st.session_state
    and "user_prompt_history" not in st.session_state
    and "chat_history" not in st.session_state
):
    st.session_state["chat_answers_history"] = []
    st.session_state["user_prompt_history"] = []
    st.session_state["chat_history"] = []

#This line creates a text input field labeled "Prompt" where the user can enter their message.
# If the user clicks the "Submit" button or presses Enter, the value of the input field is assigned to the prompt variable.
prompt = st.text_input("Prompt", placeholder="Enter your message here...") or st.button(
    "Submit"
)
#If prompt has a non-empty value, the code inside the if block is executed.
if prompt:
    #This displays a loading spinner while the response is being generated.
    with st.spinner("Generating response..."):
        generated_response = run_llm(
            query=prompt, chat_history=st.session_state["chat_history"]
        )
#The code then extracts the source URLs from the generated response and creates a formatted response string using the create_sources_string function.
        sources = set(
            [doc.metadata["source"] for doc in generated_response["source_documents"]]
        )
        formatted_response = (
            f"{generated_response['answer']} \n\n {create_sources_string(sources)}"
        )

        st.session_state.chat_history.append((prompt, generated_response["answer"]))
        st.session_state.user_prompt_history.append(prompt)
        st.session_state.chat_answers_history.append(formatted_response)
#If the chat history lists (st.session_state["chat_answers_history"] and st.session_state["user_prompt_history"]) are not empty, the code enters the if block.
if st.session_state["chat_answers_history"]:
    for generated_response, user_query in zip(
        st.session_state["chat_answers_history"],
        st.session_state["user_prompt_history"],
    ):
        #The message function is called in a loop to display each user query and its corresponding generated response.
        message(
            user_query,
            is_user=True,
        )
        message(generated_response)

Let’s understand this code:

  1. Imports:
    • from typing import Set: This imports the Set class from the typing module, which is used to define a set of values with a specific type.
    • from backend.core import run_llm: This imports the run_llm function from the backend.core module.
    • import streamlit as st: This imports the Streamlit library and assigns it an alias st.
    • from streamlit_chat import message: This imports the message function from the streamlit_chat module.
  2. Function Definition:
    • def create_sources_string(source_urls: Set[str]) -> str: This function takes a set of strings (source_urls) as input and returns a string. It creates a formatted string containing the source URLs from the set.
  3. Streamlit Setup:
    • st.header("LangChain🦜🔗 - Helper Bot"): This displays a header in the Streamlit app.
    • The following lines check if certain keys exist in st.session_state (Streamlit’s session state) and initialize them as empty lists if they don’t exist. These lists are used to store the chat history.
  4. User Interaction:
    • prompt = st.text_input("Prompt", placeholder="Enter your message here...") or st.button("Submit"): This line creates a text input field labeled “Prompt” where the user can enter their message. If the user clicks the “Submit” button or presses Enter, the value of the input field is assigned to the prompt variable.
  5. Chatbot Response Generation:
    • If prompt has a non-empty value, the code inside the if block is executed.
    • with st.spinner("Generating response..."):: This displays a loading spinner while the response is being generated.
    • generated_response = run_llm(query=prompt, chat_history=st.session_state["chat_history"]): This calls the run_llm function from the backend.core module with the user’s query (prompt) and the chat history (st.session_state["chat_history"]) as inputs. It assigns the generated response to the generated_response variable.
    • The code then extracts the source URLs from the generated response and creates a formatted response string using the create_sources_string function.
    • Finally, the user’s prompt, generated answer, and formatted response are appended to the chat history lists.
  6. Displaying Chat History:
    • If the chat history lists (st.session_state["chat_answers_history"] and st.session_state["user_prompt_history"]) are not empty, the code enters the if block.
    • The message function is called in a loop to display each user query and its corresponding generated response.

To run the Streamlit UI, execute the following command:

streamlit run main.py

Streamlit will print a local URL; open it in your browser to start chatting with the bot.

Conclusion

In this article, we have demonstrated the effectiveness of Azure OpenAI and Service Embedding for unstructured document searches. By utilizing the ReAct framework, which combines reasoning and action, users can query a knowledge base and extract relevant information using language models. This integration enables efficient retrieval of accurate answers from unstructured documents, and it can also be applied to other tasks such as content generation and semantic search. Overall, Azure OpenAI and Service Embedding provide a powerful solution for handling large amounts of textual data and extracting valuable insights.
