Data Liberation: Empowering Mankind with Azure OpenAI and Azure SQL


In today’s world of endless information, we are on a mission to set data free. LangChain, combined with Azure OpenAI, can comprehend and generate text that closely resembles human language, and this has the potential to transform the way we analyze data. By combining these technologies, organizations gain the ability to harness data for making thoughtful decisions. Are you tired of poring over endless spreadsheets and databases in search of the information you need? Imagine being able to simply ask a chatbot a question and get instant results from your database. It sounds like science fiction, but with Azure OpenAI and Azure SQL, it’s a reality! In this blog, we’ll show you how to unlock the power of conversational AI to make data more accessible and user-friendly.

With Azure OpenAI and Azure SQL data management capabilities, you’ll be able to create a chatbot that can understand your questions and deliver the data you need in real-time. From streamlining customer service to optimizing business processes, the possibilities are endless.

So, if you’re ready to harness the human side of data and revolutionize the way you work with information, this blog is not to be missed!

What is LangChain?

LangChain is a framework specifically designed for developing applications powered by language models. It aims to go beyond just utilizing a language model and focuses on two key principles: being data-aware and agentic.

  1. Data-aware: LangChain enables the connection between a language model and other sources of data. This means that in addition to processing and generating text, the framework allows the language model to interact with and access external data, such as databases or other data repositories. By incorporating data awareness, LangChain enables more comprehensive and contextually rich responses from the language model.
  2. Agentic: LangChain empowers a language model to interact with its environment. This implies that the language model is not limited to being a passive responder but can actively engage with the application’s ecosystem. By being agentic, the language model can understand and process user inputs, make decisions, and take actions based on the application’s defined logic or rules.

The LangChain framework is built around these principles, aiming to provide developers with a platform to create more powerful and differentiated applications that leverage language models in a data-aware and agentic manner.

Fundamentals of LangChain

LangChain works on these fundamental concepts:

  1. Chain of Thought: Imagine you are solving a math problem, and you break it down into smaller steps to find the answer. That’s what Chain of Thought is about. It’s a technique we use to encourage the language model to think step-by-step and generate a series of reasoning steps. It helps the model to provide more detailed and logical responses. For example, if we ask the model, “How do you solve a math problem step-by-step?” it will give us a sequence of smaller steps to follow.
  2. Action Plan Generation: Sometimes, we need the language model to suggest actions to take. Action Plan Generation is a technique that helps the model generate actions or steps to follow. We can then take those actions and ask the model what to do next. It’s like having a virtual assistant that tells us what to do. For example, if we ask the model, “What should I do to bake a cake?” it will give us a list of actions like mixing ingredients, preheating the oven, and so on.
  3. ReAct: ReAct is a combination of a Chain of Thought and Action Plan Generation. It prompts the model to think about what action to take and then take it. It’s like having a conversation with the model where it not only thinks but also acts based on its thinking. For example, if we ask the model, “What steps should I follow to write a story?” it will give us a series of steps to think about and then act upon, like brainstorming ideas, creating characters, and writing the plot.
  4. Self-ask: Self-ask is a method that the model uses to ask follow-up questions. It builds on a Chain of Thought prompting. The model asks a question and then looks for the answer using an external search engine. It’s like having a model that can search for information to answer its own questions. For example, if the model is asked, “What are the benefits of exercise?” it can search the internet for information and provide a more informed answer.
  5. Prompt Chaining: Prompt Chaining is about combining multiple language model calls. The output of one step becomes the input for the next step. It’s like linking different questions or prompts together. For example, if we ask the model, “What is the capital of France?” and then use its answer as the input for the next question, like “What is the population of that capital?”, the model can provide a continuous chain of information (a short code sketch of this follows after this list).
  6. Memetic Proxy: Memetic Proxy is encouraging the language model to respond in a certain way by framing the discussion in a context that the model knows of and that will result in that type of response. For example, let’s imagine a conversation between a student and a teacher. The student asks, “Can you explain the water cycle?” The model, acting as the teacher, responds, “Of course! The water cycle is the continuous movement of water on, above, and below the Earth’s surface.” By framing the conversation as a student-teacher interaction, the model is more likely to provide an educational and informative response.
  7. Self-Consistency: Self Consistency is a decoding strategy that involves sampling a diverse set of reasoning paths and then selecting the most consistent answer. For example, if we ask the model, “Why is exercise important?” it might generate multiple responses like “Exercise keeps us fit,” “Exercise helps us stay healthy,” or “Exercise improves our physical and mental well-being.” The model then evaluates these responses and selects the most consistent and accurate answer, which is “Exercise improves our physical and mental well-being.”
  8. Inception: Inception, also known as First Person Instruction, encourages the language model to think in a certain way by including the start of its own response in the prompt. For example, if we ask the model, “What is your favorite food? Please start your response with ‘My favorite food is…'”, the model might generate the complete response as “My favorite food is pizza.” By including the start of the model’s response in the prompt, we guide its thinking toward providing a specific type of answer.
  9. MemPrompt: MemPrompt involves the language model maintaining a memory of errors and user feedback, using it to prevent a repetition of mistakes. For example, if the model previously provided an incorrect answer to a question, it remembers that mistake. When faced with a similar question in the future, the model learns from the feedback and avoids repeating the same error. This memory helps the model improve its accuracy over time.

These concepts and techniques help us make the language model smarter and more helpful in various tasks. It’s like training the model to think, reason, and respond in a more human-like way, making it a valuable tool for many applications.
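To make Prompt Chaining (concept 5 above) concrete, here is a minimal sketch using LangChain’s LLMChain and SimpleSequentialChain, where the answer to the first prompt feeds the second. The Azure OpenAI endpoint, key, and deployment name are placeholders you would replace with your own values:

import os
import openai
from langchain.llms import AzureOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

# Placeholder Azure OpenAI configuration (replace with your own values).
os.environ["OPENAI_API_KEY"] = "************"
openai.api_type = "azure"
openai.api_base = "https://XYZ.openai.azure.com/"
openai.api_version = "2022-12-01"

llm = AzureOpenAI(temperature=0, deployment_name="text-davinci-003", model_name="text-davinci-003")

# Step 1: find the capital of a country.
capital_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["country"],
    template="What is the capital of {country}? Answer with only the city name."))

# Step 2: the city returned by step 1 becomes the input of this prompt.
population_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["city"],
    template="What is the approximate population of {city}?"))

# Chain the two calls: the output of the first is passed as the input of the second.
chain = SimpleSequentialChain(chains=[capital_chain, population_chain], verbose=True)
print(chain.run("France"))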

What is an Agent in LangChain?

An agent is a special kind of software, like a smart helper that can perform different tasks for you. Depending on what you ask, the agent decides which tools to use and how to use them.

There are two main types of agents: “Action Agents” and “Plan-and-Execute Agents.” Let’s understand what they do:

1. Action Agents: These agents are like helpers who make decisions one step at a time. Here’s how they work:

  • When you give some input to the agent, it thinks about which tool to use (like a calculator or a dictionary) and what information it needs.
  • Then, the agent uses that tool with the required information and gets some results.
  • It remembers what tool it used, what information it gave, and what results it got.
  • Based on these results, the agent decides what to do next. It may choose another tool or give you an answer directly.
  • This process continues until the agent decides it doesn’t need any more tools and gives you a final answer.

2. Plan-and-Execute Agents: These agents plan ahead and then take action step by step. Here’s how they work:

  • When you give some input to the agent, it starts making a plan. It lists all the steps it needs to take to solve your problem.
  • Once the plan is ready, the agent starts executing each step one by one.
  • The steps can be different tools or actions, like finding information or solving puzzles.
  • The agent follows the plan and keeps track of what it’s doing.
  • After completing all the steps, the agent gives you the final answer or result.

Action Agents are good for simple tasks because they decide on the spot and take actions step by step. Plan-and-Execute Agents are better for complex tasks that need careful planning. However, they may take more time and make more decisions.

In many cases, it’s useful to have an Action Agent working together with a Plan-and-Execute Agent. The Action Agent can help in executing the plan made by the Plan-and-Execute Agent.

So, agents are like helpful programs that decide what tools to use and how to use them based on your input. They can solve problems by taking action or by planning and then executing those plans.
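As a rough illustration of an Action Agent, here is a minimal sketch using LangChain’s initialize_agent with the built-in llm-math tool (assuming an llm object configured for Azure OpenAI, as shown later in this post). The agent decides at each step whether to call the calculator tool or answer directly:

from langchain.agents import initialize_agent, load_tools, AgentType

# Load a simple calculator tool; the agent decides when to use it.
tools = load_tools(["llm-math"], llm=llm)

# A zero-shot ReAct agent is an Action Agent: it reasons step by step and picks a tool at each step.
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

agent.run("What is 15% of 240?")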

SQL Database Agent

Imagine an agent for a SQL database that acts as a helper, answering all of our data-related questions in plain English without the user needing to know the SQL syntax for querying the database. We will use the SQL Database Agent for this purpose. Let’s dive into the code.

Create an OpenAI instance in Azure

To get started with Azure OpenAI, you first need to deploy the cognitive services. This involves creating and configuring the necessary resources in Azure. The Cognitive Services provide a range of pre-built AI capabilities that can be easily integrated into your applications.

Follow the below steps to deploy the cognitive service to create Azure OpenAI:

STEP 1: Open Azure Portal and click on “Create a resource“.

STEP 2: Now, Select “Cognitive Services” in Azure Marketplace.

STEP 3: Click on “Create” to begin the process.

Azure OpenAI and Azure SQL (1)

STEP 4: Enter the details such as the subscription, the region where you want to deploy the Cognitive Services resource, and the name of the Azure OpenAI service. Click Next.

Currently, only the Standard pricing tier is available. If other pricing tiers become available in the future, you can select them accordingly.

STEP 5: Now, choose whether the Cognitive Services resource should be reachable from all networks or whether public access should be disabled so it is accessible only through private endpoints. Click Next.

Azure OpenAI and Azure SQL (2)

STEP 6: Enter a tag name and tag value for the Cognitive Services resource. Tags let you categorize resources and view consolidated billing by applying the same tag to multiple resources and resource groups.

Azure OpenAI and Azure SQL (3)

STEP 7: Now, review all the details you have entered so far and click on the “Create” button.

Azure OpenAI and Azure SQL (4)

Azure OpenAI is created.

STEP 8: Now, go to Resource Management => Keys and Endpoint. In the right panel, click on “Show Keys“. You will see the keys and the Endpoint URL. See the image below to better understand.

Now copy one of the keys and the endpoint URL: the key will be used for openai.api_key and the URL for openai.api_base.

Azure OpenAI and Azure SQL (5)

Once you finish setting up the Cognitive Services, your instance will be ready to use. Now, you can take advantage of Azure OpenAI by combining Cognitive Services with your own applications. In this example, I use a Databricks notebook to access the Cognitive Services endpoint, but you can use simple Python code to achieve the same.
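For instance, a quick connectivity check with plain Python (using the pre-1.0 openai package and a hypothetical text-davinci-003 deployment; substitute your own endpoint, key, and deployment name) might look like this:

import openai

openai.api_type = "azure"
openai.api_base = "https://XYZ.openai.azure.com/"   # endpoint copied from Keys and Endpoint
openai.api_version = "2022-12-01"
openai.api_key = "************"                     # key copied from Keys and Endpoint

# Send a tiny prompt to the deployed model to confirm the service is reachable.
response = openai.Completion.create(
    engine="text-davinci-003",    # the name of your Azure OpenAI deployment
    prompt="Say hello in one short sentence.",
    max_tokens=20,
)
print(response["choices"][0]["text"].strip())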

Setting up the Azure SQL DB

Here we will create an Azure SQL DB instance and deploy the AdventureWorks sample database for querying the data. Follow the below steps to do so:

STEP 1: Go to the Azure Portal and click on “Create a resource“.

STEP 2: Now, search for “Azure SQL Database” and click on the “Create” button to create a new Azure SQL Database instance.

STEP 3: In the “Basics” tab, enter the necessary details such as Subscription, Resource Group, database name, and Server. Then configure the “Compute + Storage” and Networking options (see the image below).

Click Next.

Azure OpenAI and Azure SQL (6)

STEP 4: Here you need to configure the firewall rules to allow access to the database. Select “Yes” for the “Add current client IP address” option. If you want to add a private endpoint, click on the “+ Add private endpoint” option.

Click Next for Security settings.

Azure OpenAI and Azure SQL (7)

STEP 5: Here, configure the ledger, which verifies the integrity of your data and helps detect any tampering that might have occurred.

Click “Next: Additional settings”

Azure OpenAI and Azure SQL (8)

STEP 6: Now, select the data source. Choose “Sample” for the “Use existing data” option. This will deploy AdventureWorksLT as the sample database.

Azure OpenAI and Azure SQL (9)

STEP 7: Now, enter the Tag name and Tag value for the SQL database. Then click on “Next: Review+ create”.

Azure OpenAI and Azure SQL (10)

STEP 8: Check all the information you have entered so far. After reviewing, click on “Create”.

Azure OpenAI and Azure SQL (11)

Change the Table Schemas to DBO (for Azure SQL DB or SQL Server)

To ensure compatibility with Azure SQL DB or SQL Server, modify the table schemas to DBO. The Python code relies on tables owned by DBO, thus necessitating the recreation of these tables in the dbo schema.

/* Please run this code when the schema is not set to dbo.
 The code will copy the table into the dbo schema.
 Specifically, it will change the schema of the ProductModelProductDescription table from SalesLT to dbo.
*/

SELECT * INTO dbo.ProductModelProductDescription FROM SalesLT.ProductModelProductDescription

Setting up Chinook DB

Chinook DB is a sample database that is commonly used for SQL queries. It provides a realistic data model and allows users to explore various SQL operations.

Here we are using SQLite and the Chinook DB because they let you quickly set up and manipulate a database environment, making them a good choice for small projects.

Follow the below steps to set up Chinook DB:

STEP 1: Download SQLite and the Chinook DB and place them in a designated folder.

STEP 2: Use the following URI to connect to the database:

db = SQLDatabase.from_uri('sqlite:///C:/sqllite/db/chinook.db')

Please note the three forward slashes after sqlite: and that the path uses forward slashes even though Windows paths normally use backslashes. This is the main trick to accessing the Chinook DB.

STEP 3: Now, install a SQLite UI tool to open the Chinook DB locally and verify that the files are accessible and not corrupted.
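As a quick sanity check (assuming the file path used above), you can also list a few tables from the Chinook DB directly with Python’s built-in sqlite3 module:

import sqlite3

# Open the Chinook database and list the first few tables to confirm the file is readable.
conn = sqlite3.connect("C:/sqllite/db/chinook.db")
for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table' LIMIT 5"):
    print(name)
conn.close()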

SQL Chain

SQL chain allows users to ask questions over a SQL database and get answers using a language model. Here is the SQL Chain code:


import os 
from langchain.llms import AzureOpenAI
import openai
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI
from langchain.agents import AgentExecutor
from langchain.prompts.prompt import PromptTemplate

from sqlalchemy.engine import URL
# key copied from the Azure OpenAI resource (Keys and Endpoint)
os.environ["OPENAI_API_KEY"] = "************"
openai.api_type = "azure"
# endpoint copied from the Azure OpenAI resource (Keys and Endpoint)
openai.api_base = "https://XYZ.openai.azure.com/"
openai.api_version = "2022-12-01"
 

llm = AzureOpenAI( temperature=0,  verbose=True, deployment_name="text-davinci-003", model_name="text-davinci-003")
print(llm)


_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

Only use the following tables:
{table_info}

Question: {input}"""

PROMPT = PromptTemplate(
    input_variables=["input", "table_info", "dialect"], template=_DEFAULT_TEMPLATE
)


# You can use either the Chinook DB, Azure SQL DB, or a SQL Server deployed in a VM.
# We will discuss the URI in the next section.
# This URI is for Azure SQL DB:
#db = SQLDatabase.from_uri("mssql+pymssql://sqladmin1:Password@llmsvr.database.windows.net:1433/llmdb")
# This URI is for the Chinook DB:
db = SQLDatabase.from_uri('sqlite:///C:/sqllite/db/chinook.db')

# This URI is for a SQL Server deployed into an Azure VM:
#db = SQLDatabase.from_uri("mssql+pymssql://sqladmin1:Welcome12345@IPAddress_Of_SQLDB:1433/AdventureWorks2017")

toolkit = SQLDatabaseToolkit(db=db, llm=llm )
# The toolkit already carries the database connection, so db does not need to be passed again.
agent_executor = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True
)
#Here you will enter the real question
agent_executor.run("How many employees are there?")

The above code sets up the necessary imports and configurations for using the LangChain library with the Azure OpenAI model. It initializes an instance of the AzureOpenAI model, defines a default prompt template, and creates an agent executor.

The code also establishes a connection to a SQL database: the Chinook DB, Azure SQL DB, or a SQL Server deployed on an Azure VM. The connection URIs for all three are shown, but only one should be active at a time. The SQLDatabaseToolkit is created to interact with the database, and the agent_executor runs a specified question against the configured SQL database.

Here is the outcome of the code:

> Entering new SQLDatabaseChain chain...
How many employees are there?
SQLQuery:
SELECT COUNT(*) FROM "Employee";
SQLResult: [(8,)]
Answer:There are 8 employees.
> Finished chain.
'There are 8 employees.'
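Incidentally, the from sqlalchemy.engine import URL line in the code above can also be used to build the Azure SQL connection string programmatically instead of hard-coding it. Here is a sketch with the same hypothetical server, database, and credentials as the commented-out URI:

from sqlalchemy.engine import URL

# Build the connection URL from its parts (hypothetical values; prefer reading the
# password from an environment variable or Azure Key Vault in real code).
connection_url = URL.create(
    "mssql+pymssql",
    username="sqladmin1",
    password="Password",
    host="llmsvr.database.windows.net",
    port=1433,
    database="llmdb",
)
# db = SQLDatabase.from_uri(connection_url)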

SQL Chain Options

SQL Chain provides many options and we will discuss each of them here:

Option 1: Return Intermediate Steps

You can return the intermediate steps of the SQLDatabaseChain. This will allow you to access the SQL statement that was generated, as well as the result of running that against the SQL Database.

Option 2: Limit the number of rows returned

If you are querying for several rows of a table you can select the maximum number of results you want to get by using the ‘top_k’ parameter (default is 10). This is useful for avoiding query results that exceed the prompt max length or consume tokens unnecessarily.

Option 3: Verbose Output

We can enable verbose output by enabling the verbose switch. Enabling verbose output means that the system will provide more detailed and extensive information during a process or operation. This additional information can include status updates, progress reports, or other relevant details that can help in understanding and troubleshooting the process.

# Return Intermediate Steps, Verbose Output, and the row limit (top_k) are options of the SQL chain itself,
# so they are passed to SQLDatabaseChain rather than to the toolkit.
from langchain.chains import SQLDatabaseChain
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True, use_query_checker=True, return_intermediate_steps=True, top_k=5)

Essentially, enabling the verbose switch allows for a more thorough and comprehensive output that aids in gaining deeper insights and a clearer understanding of what is happening within the system.
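For example, a minimal sketch (reusing the db_chain defined above) that inspects the intermediate steps looks like this:

# Calling the chain directly returns a dictionary that includes the intermediate steps.
result = db_chain("How many employees are there?")
print(result["intermediate_steps"])   # the generated SQL and its raw result
print(result["result"])               # the final natural-language answer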

Option 4: Adding Example Rows from each Table

Occasionally, the structure of the data may not be immediately apparent, and it can be beneficial to include a subset of rows from the tables in the prompt. This enables the language model to comprehend the data and generate a final query more effectively. In this case, we will utilize this approach by presenting two rows from the “Track” table, which lets the model know that composers’ full names are stored in the dataset.


_DEFAULT_TEMPLATE = """You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most 5 results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

Only use the following tables:

{table_info}

Question: {input}"""

custom_table_info = {
    "Track": """CREATE TABLE Track (
	"TrackId" INTEGER NOT NULL, 
	"Name" NVARCHAR(200) NOT NULL,
	"Composer" NVARCHAR(220),
	PRIMARY KEY ("TrackId")
)
/*
3 rows from Track table:
TrackId	Name	Composer
1	For Those About To Rock (We Salute You)	Angus Young, Malcolm Young, Brian Johnson
2	Balls to the Wall	None
3	My favorite song ever	The coolest composer of all time
*/"""
}

db = SQLDatabase.from_uri('sqlite:///C:/sqllite/db/chinook.db', include_tables=['Track', 'Playlist'], sample_rows_in_table_info=2, custom_table_info=custom_table_info)

Please note these three parameters:

  1. include_tables=['Track', 'Playlist']
  2. sample_rows_in_table_info=2
  3. custom_table_info=custom_table_info
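To check what the model will actually receive, you can print the generated table information, which now reflects the custom definition and sample rows:

# Inspect the table definitions and example rows that get injected into the prompt.
print(db.table_info)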

Option 5: Customize the Prompt

You can customize the prompt. Here is an example that tells the model that emp refers to the Employee table, so whenever the conversation mentions emp, it means Employee.

_DEFAULT_TEMPLATE = """You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most 5 results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

Only use the following tables:

{table_info}
If someone asks for the table emp, they really mean the employee table.

Question: {input}"""

PROMPT = PromptTemplate(
    input_variables=["input", "table_info"], template=_DEFAULT_TEMPLATE
)

db = SQLDatabase.from_uri('sqlite:///C:/sqllite/db/chinook.db')

# The custom prompt must be passed to the chain, otherwise it is never used.
db_chain = SQLDatabaseChain.from_llm(llm, db, prompt=PROMPT, verbose=True, use_query_checker=True, top_k=5)

db_chain.run("How many employees are there in the emp table?")

Option 6: Custom Table Info

There are situations where it can be advantageous to supply customized table information instead of relying on automatically generated table definitions and the initial sample rows. This allows for greater control over the data provided to the model. For instance, if the initial rows of a table lack relevance or are uninformative, it can be beneficial to manually select example rows that are more diverse or contain more pertinent information.

Furthermore, it is possible to restrict the visibility of certain columns to the model if they are deemed unnecessary.

To provide this information, you can use a dictionary format with the table names as keys and table information as values. Let’s illustrate this approach by offering a custom definition and sample rows for the “Track” table, including only a few selected columns:

custom_table_info = {
    "Track": """CREATE TABLE Track (
	"TrackId" INTEGER NOT NULL, 
	"Name" NVARCHAR(200) NOT NULL,
	"Composer" NVARCHAR(220),
	PRIMARY KEY ("TrackId")
)
/*
3 rows from Track table:
TrackId	Name	Composer
1	For Those About To Rock (We Salute You)	Angus Young, Malcolm Young, Brian Johnson
2	Balls to the Wall	None
3	My favorite song ever	The coolest composer of all time
*/"""
}

db = SQLDatabase.from_uri("sqlite:///C:/sqllite/db/chinook.db",include_tables=['Track', 'Playlist'],
sample_rows_in_table_info=2,custom_table_info=custom_table_info)

print(db.table_info)

Note how our custom table definition and sample rows for Track override the sample_rows_in_table_info parameter. Tables that are not overridden by custom_table_info, in this example Playlist, will have their table info gathered automatically as usual.
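With this db object in place, a SQLDatabaseChain can run against the customized table info (a minimal sketch, reusing the llm defined earlier):

from langchain.chains import SQLDatabaseChain

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
db_chain.run("Who composed the track 'For Those About To Rock (We Salute You)'?")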

Option 7: SQLDatabaseSequentialChain

This is useful when the number of tables in the database is large, because the chain first decides which tables are relevant to the query and only sends those tables’ schemas to the SQL database chain:

The chain is as follows:

1. Based on the query, determine which tables to use.
2. Based on those tables, call the normal SQL database chain.

from langchain.chains import SQLDatabaseSequentialChain
db = SQLDatabase.from_uri("sqlite:///C:/sqllite/db/chinook.db")

chain = SQLDatabaseSequentialChain.from_llm(llm, db, verbose=True)

chain.run("How many employees are also customers?")

Here is the outcome of this code.

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Employee', 'Customer']

> Entering new SQLDatabaseChain chain...
How many employees are also customers?
SQLQuery:SELECT COUNT(*) FROM Employee e INNER JOIN Customer c ON e.EmployeeId = c.SupportRepId;
SQLResult: [(59,)]
Answer:59 employees are also customers.
> Finished chain.

> Finished chain.
'59 employees are also customers.'

Conclusion

The integration of Azure OpenAI and SQL through LangChain introduces a new era of data interaction, where the power of natural language processing meets the vast potential of databases. By embracing this technology, organizations can bridge the gap between humans and data, unlocking insights and enabling smarter decision-making processes, and this blog shows how to get started.

Let’s summarize what we have learned so far. In this article, we have explored LangChain and its fundamentals, the role of agents in LangChain (specifically the SQL Database Agent), and the process of setting up LangChain in Azure. Additionally, we delved into various SQL Chain options, such as returning intermediate steps and limiting row count, as well as enhancing query understanding through example rows and prompt customization. We also discussed the importance of custom table information and the integration of SQL database functionality through SQLDatabaseSequentialChain. By covering these topics, we have gained a comprehensive understanding of LangChain’s capabilities for data analysis and query generation.
