LangChain Foundations - An Engineer's Guide

Master the essentials of LangChain, the go-to framework for building robust LLM applications. Learn to manage prompts, enforce structured outputs with Pydantic, and build a simple RAG pipeline to chat with your documents.

Updated Jun 22, 2025 · 15 min read

In the previous tutorials, we've seen how to augment LLMs with memory, structured output, and tools. While it's possible to build these systems from scratch, orchestrating all the components—managing prompts, handling conversation history, calling different model APIs, and processing tool outputs—quickly becomes complex and unwieldy. Each new model or tool you add can require significant custom code, leading to brittle and hard-to-maintain applications.

This is the problem LangChain[1] solves. It is an open-source framework designed to simplify the development of applications powered by Large Language Models. Think of it as the glue for your AI stack. LangChain provides a standardized, modular set of building blocks that lets you compose complex applications by chaining together LLMs, tools, and data sources.

This tutorial provides a hands-on quickstart to the core components of LangChain. You will learn how to use its abstractions to call different LLMs, manage prompts dynamically, enforce structured outputs, and build a complete Retrieval-Augmented Generation (RAG) pipeline that can answer questions about a PDF document. By the end, you'll understand why LangChain is an indispensable tool for any AI engineer looking to build sophisticated, robust, and maintainable AI systems efficiently.

Tutorial Goals

  • Use LangChain to interact with different LLM providers (OpenAI, Google, Ollama).
  • Dynamically construct prompts and manage conversation history.
  • Enforce reliable, structured JSON output from an LLM using Pydantic.
  • Build a complete, local RAG pipeline to chat with a PDF document.
  • Enable an LLM to use a retrieval tool to answer questions.

Setup

Before we begin, we need to install the necessary libraries. This setup includes the core langchain library along with specific integrations for LLM providers, document loading (pypdf), and embeddings (fastembed):

pip install -Uqqq pip --progress-bar off
pip install -qqq langchain==0.3.26 --progress-bar off
pip install -qqq langchain-ollama==0.3.3 --progress-bar off
pip install -qqq langchain-google-genai==2.1.5 --progress-bar off
pip install -qqq langchain-community==0.3.26 --progress-bar off
pip install -qqq pypdf==5.6.0 --progress-bar off
pip install -qqq fastembed==0.7.1 --progress-bar off

We will also download a sample PDF document about the Aston Martin Valhalla, which we'll use later for our RAG system.

gdown 15bT0a295EjL7klOOMWxMdvRQQSQ4tjxv -O data/

With the dependencies installed and the sample document ready, let's import the necessary modules.

python code
import textwrap
from pprint import pprint
from typing import Literal
 
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_core.vectorstores import InMemoryVectorStore
from pydantic import BaseModel, Field
 
load_dotenv()

Call OpenAI Model

Making a call to an OpenAI model is straightforward (this requires the langchain-openai package and an OPENAI_API_KEY in your environment). We use the init_chat_model function to initialize the model and then invoke it with a prompt:

py code
openai_model = init_chat_model("gpt-4o-mini", model_provider="openai")
response = openai_model.invoke("Explain in one sentence what is LangChain?")
print(response.content)
Response
LangChain is a framework designed to facilitate the development of applications that leverage language models, enabling tasks such as natural language understanding, generation, and interaction with external data sources.

The response is an AIMessage object, which contains the generated content as well as additional metadata; we'll look at it in more detail in the next section.
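
For a quick first look, a few attributes are worth knowing about. This is just a sketch; the exact metadata contents vary by provider:

py code
# The AIMessage carries the generated text plus metadata about the call
print(response.content)            # the generated text
print(response.response_metadata)  # provider-specific details, e.g. model name and finish reason
print(response.usage_metadata)     # token counts, when the provider reports them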

Multiple LLM Providers

A key feature of LangChain is its ability to abstract away the specific APIs of different LLM providers. This allows you to switch between models from Google, OpenAI, Anthropic, or a local Ollama instance with minimal code changes.

We'll use the same helper function, init_chat_model, to initialize our models. Let's start with Google's Gemini:

python code
gemini_model = init_chat_model(
    "gemini-2.5-flash",
    model_provider="google_genai",
    thinking_budget=100,
    include_thoughts=True,
)

The Gemini 2.5 Flash model is a "reasoning" model[2]. You can cap how many tokens it spends thinking with the thinking_budget parameter.

To interact with the model, we'll use the .invoke() method. It sends the prompt to the model and returns a response object:

python code
response = gemini_model.invoke("Explain in one sentence what is LangChain?")

This time, the content in the response is a list of content blocks, including the model's thinking and the final response:

py code
response.content[0]["thinking"]
Thinking
**My Summary of LangChain's Essence**
 
Alright, I've got it. The challenge is to boil down LangChain to a single sentence, but still capture its core. Considering my deep understanding of the landscape, what _really_ defines LangChain is its ability to orchestrate and simplify the development of sophisticated applications leveraging Large Language Models, particularly by connecting these models to external data and tools through agents and chains. It's essentially a powerful framework for LLM-powered application builders.
py code
response.content[1]
Response
LangChain is a framework designed to simplify the development of applications powered by large language models (LLMs),
enabling them to connect with external data sources and computational tools.

Finally, we can access the usage metadata, which includes the number of tokens used in the request and response:

py code
response.usage_metadata
json code
{
  "input_tokens": 10,
  "output_tokens": 33,
  "total_tokens": 134,
  "input_token_details": { "cache_read": 0 },
  "output_token_details": { "reasoning": 91 }
}

Now, let's switch to a local model running on Ollama, like qwen3:8b[3]. The initialization is nearly identical:

python code
qwen_model = init_chat_model("qwen3:8b", model_provider="ollama")
response = qwen_model.invoke("Explain in one sentence what is LangChain? /no_think")
print(response.content)
Response
LangChain is a framework that enables developers to build applications that leverage large language models by providing tools for task execution, memory, and integration with other systems.

Notice that the core interaction logic (.invoke()) remains the same, regardless of the underlying model provider. This abstraction is great for experimenting with different models.
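
To make the point concrete, here is a small sketch that sends the same prompt to every model initialized above and prints a shortened reply; nothing changes except the model object:

py code
# Same prompt, same .invoke() call - only the model object differs
models = {"openai": openai_model, "gemini": gemini_model, "qwen": qwen_model}

for name, model in models.items():
    reply = model.invoke("Explain in one sentence what is LangChain?")
    print(f"[{name}] {textwrap.shorten(str(reply.content), width=100)}")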

Prompts and Chat History

Hardcoding prompts directly into your application is inflexible. LangChain's ChatPromptTemplate provides a structured way to build dynamic prompts from multiple components, such as system instructions, user queries, and conversation history.

A prompt template is composed of a list of messages. Let's create a system prompt for a customer support agent:

py code
system_message = """
You're a helpful customer support agent.
You're given a conversation between a customer and a support agent.
 
You're helping a customer to buy 90s Hip-hop styled t-shirts.
 
<instructions>
- Your name is {agent_name}
- Always deny answering about anything not related to the products
- You need to respond to the customer's message
- You need to respond in the same language as the customer's message
</instructions>
"""

And now for the ChatPromptTemplate object along with the user message:

python code
user_message = "Hi! What's your name? /no_think"
 
prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_message), ("user", user_message)]
)

We can now use the .invoke() method on the template, passing a dictionary to fill in any variables:

py code
prompt = prompt_template.invoke({"agent_name": "Slim Shady"})

This generates a PromptValue object, which can be sent directly to the model.
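
If you want to see exactly what will be sent, you can convert the PromptValue to its underlying messages (a quick sketch):

py code
# A PromptValue wraps the rendered list of messages
for message in prompt.to_messages():
    print(f"{message.type}: {message.content[:60]}...")

Passing the prompt to the model is then a single call: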

py code
response = qwen_model.invoke(prompt)
Response
Yo, my name's Slim Shady, and I'm here to help you find the perfect 90s Hip-hop styled t-shirts! What can I do for you?

Pretty neat, right?

Managing Conversation History

To maintain context in a conversation, you can manage the chat history as a list of messages. We start with the initial prompt messages and the first AI response.

python code
history = [*prompt.to_messages(), response]

When the user sends a new message, we simply append it to the history list and pass the entire list back to the model. The model now has the full context of the conversation. Let's add a new message to the history and invoke the model again:

python code
new_query = HumanMessage(
    """
I want a t-shirt with the style of Wu-Tang Clan.
I want to show a deadlifter that doesn't like Pencil Necks.
Describe the t-shirt design to a t-shirt designer.
/no_think
""".strip()
)
history.append(new_query)
 
response = qwen_model.invoke(history)
Response
Yo, the t-shirt needs to have a gritty, underground Wu-Tang Clan vibe. The front should feature the iconic Wu-Tang Clan logo in bold, black ink with some red accents to give it that raw energy. Add some graffiti-style text in the corners that says, "No Pencil Necks Allowed" in a bold, stylized font. The back should have a simple, dark background with a silhouette of a determined deadlifter, holding a barbell, and a subtle tag that reads "Real Hype, Real Grind." Keep the overall look dark, edgy, and authentic to the 90s hip-hop culture.

The model uses the previous turns to understand the context and provides a relevant, detailed description for the t-shirt designer, demonstrating its ability to maintain a coherent conversation.
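
You can fold this append-and-invoke pattern into a small helper so every turn keeps the history in sync. A minimal sketch (the chat_turn name is just for illustration):

py code
def chat_turn(history: list, user_text: str) -> AIMessage:
    """Append the user's message, get a reply, and record it in the history."""
    history.append(HumanMessage(user_text))
    reply = qwen_model.invoke(history)
    history.append(reply)
    return reply

# reply = chat_turn(history, "Add a price tag to the back print. /no_think")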

Structured Outputs with Pydantic

A common challenge in AI engineering is forcing an LLM to produce output in a consistent, machine-readable format like JSON. LangChain's integration with Pydantic makes this seamless.

First, we define the desired data structure using a Pydantic BaseModel. The field names and descriptions act as direct instructions to the LLM:

python code
class SongClassification(BaseModel):
    song_name: str = Field(description="The name of the song")
    style: Literal["Gangsta Rap", "R&B", "Other"] = Field(
        description="Style of the song"
    )
    reasoning: str = Field(description="Why the style was chosen")

Next, we bind this structure to our model using .with_structured_output(). This creates a new "chain" that will automatically format the output as a Pydantic object:

python code
structured_model = qwen_model.with_structured_output(SongClassification)

Now we can create a prompt asking the LLM to extract information and invoke the structured model. Let's prepare the prompts:

python code
prompt = """
You are a music expert on various genres. Your task is to guess the name of the song and then classify the style of it.
 
<instructions>
- Recognize the name of the song
- Classify the style of the song into one of the following styles: gangsta rap, R&B or other
- Try to recognise the song and then choose the style based on it
- If you can't recognise the song, just use the lyrics
</instructions>
 
Guess the name of the song and then classify the style of it into one of the following styles:
 
- Gangsta Rap
- R&B
- Other
 
Based on the following partial lyrics:
 
<lyrics>
{lyrics}
</lyrics>
 
Respond in JSON format and try your best to guess the name of the song and the style of it.
""".strip()
 
lyrics = """
I grew up on the crime side, the New York Times side
Stayin' alive was no jive
Had second hands, Mom's bounced on old man
So then we moved to Shaolin land
""".strip()

The invocation remains the same:

py code
response = structured_model.invoke(prompt.format(lyrics=lyrics))
pprint(response.model_dump())
Response
{
  "reasoning": "The lyrics reference 'crime side', 'New York Times side', ...",
  "song_name": "Juice",
  "style": "Gangsta Rap"
}

The model returns a validated SongClassification object, not just a raw string. This makes the output predictable and safe to use in your application code.
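
Because the result is a typed object, downstream code can rely on the schema rather than parsing strings. A quick sketch:

py code
# Fields are validated attributes with known types, not keys in an arbitrary dict
if response.style == "Gangsta Rap":
    print(f"Adding '{response.song_name}' to the 90s playlist")
print(response.reasoning)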

External Knowledge with RAG

LLMs only know what they were trained on. To answer questions about private or recent documents, we use Retrieval-Augmented Generation (RAG). LangChain simplifies building the entire RAG pipeline.

1. Load Documents

PDF File Preview

First, we load the document (first page shown above). LangChain provides various DocumentLoader integrations. Here, we use PyPDFLoader to load our Aston Martin Technical Overview PDF:

python code
loader = PyPDFLoader("data/aston-martin-valhalla.pdf")
doc_pages = loader.load()
print(f"Loaded {len(doc_pages)} pages.")
Response
Loaded 2 pages.

2. Embed and Store (Indexing)

Next, we need to create embeddings (numerical representations) of the document chunks and store them in a vector store for efficient searching.

python code
embeddings = FastEmbedEmbeddings()
vector_store = InMemoryVectorStore(embeddings)
vector_store.add_documents(documents=doc_pages)
Response
['3857aae1-c3c9-4f92-9f04-324000fa72ed',
 '72bf6bb6-3f6e-4f2f-80c0-1ba9d6ed9442']

We use the efficient FastEmbedEmbeddings for local embedding generation and a simple InMemoryVectorStore for this example. For production, you would typically use a persistent vector database like Qdrant, Milvus, or Weaviate.
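
Our document is only two pages, so indexing whole pages works fine here. For longer documents you would usually split the pages into smaller chunks before indexing; here's a sketch using LangChain's RecursiveCharacterTextSplitter (the chunk sizes are arbitrary):

py code
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split pages into overlapping chunks so retrieval returns focused passages
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(doc_pages)
# Index the chunks instead of the whole pages:
# vector_store.add_documents(documents=chunks)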

3. Retrieve and Generate

Now we can build the part of the pipeline that answers questions.

Retrieve: First, we search the vector store for document chunks relevant to the user's question:

python code
question = "What is the Valhalla's engine?"
results = vector_store.similarity_search(question, k=1)
print(results[0].page_content[:200])
Retrieved Context
**Aston Martin Valhalla Technical Overview**
**Powertrain:**
The Aston Martin Valhalla is propelled by a high-performance hybrid powertrain. Its mid-mounted
4.0-liter twin-turbocharged V8 engine...

Generate: We then take the retrieved context, combine it with the user's question in a prompt, and send it to the LLM to generate a final answer:

python code
QA_PROMPT = """
You're a helpful assistant that can answer questions based on the provided information.
 
<instructions>
- Use the information to answer the question
- If the information is not available, say "I don't know"
- Be concise and to the point
- Cite the source of the information
</instructions>
 
Use the following information to answer the question:
 
<context>
{context}
</context>
 
<question>
{question}
</question>
 
Say that you don't know if the information is not available.
 
Answer:
/no_think
""".strip()
py code
question = "What is the Valhalla's engine?"
results = vector_store.similarity_search(question, k=1)
prompt = QA_PROMPT.format(context=results[0].page_content, question=question)
response = qwen_model.invoke(prompt)
Response
The Aston Martin Valhalla is powered by a mid-mounted 4.0-liter twin-turbocharged V8 engine, developed in collaboration with Mercedes-AMG. This engine is paired with a battery-electric system as part of its hybrid powertrain.

[Source: Aston Martin Valhalla Technical Overview]

This entire RAG flow can be encapsulated in a single function, making it easy to reuse:

py code
def ask_question(question: str) -> AIMessage:
    results = vector_store.similarity_search(question, k=1)
 
    prompt = QA_PROMPT.format(context=results[0].page_content, question=question)
 
    return qwen_model.invoke(prompt)

Let's test it with a new question:

py code
response = ask_question("How fast can it accelerate?")
Response
The Aston Martin Valhalla can accelerate from 0 to 60 mph in under 2.5 seconds.

This information is cited from the "Performance" section of the technical overview.

You can go through the PDF and check that the answers are correct.
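
You can also probe the prompt's guardrail by asking something the document likely doesn't cover; the model should fall back to "I don't know". Treat this as a sanity check rather than a guarantee, since the behavior depends on the model:

py code
response = ask_question("What is the warranty period of the Valhalla?")
print(response.content)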

Add Capabilities with Tools

Tool calling elevates a RAG system by allowing the LLM to decide when to search for information. Instead of a fixed retrieval step, the LLM can call a "search" tool if it determines it needs external knowledge.

First, we define our retrieval logic as a tool using the @tool decorator. This function wraps the vector_store.similarity_search logic:

python code
@tool
def answer_query(query: str) -> str:
    """Answer a question based on user's private information.
 
    Args:
        query: the question to answer
    """
    results = vector_store.similarity_search(query, k=1)
    return results[0].page_content

Next, we'll bind this tool to our model. This makes the model aware of the tool and its capabilities.

python code
model_with_tools = qwen_model.bind_tools([answer_query])

Let's prepare the system prompt:

py code
PROMPT = """
You're a helpful assistant that can answer questions based on the provided information.
 
<instructions>
- Use the `answer_query` tool to find the answer
- If you don't know the answer, say "I don't know"
</instructions>
 
Answer the question from the user:
 
<question>
{question}
</question>
 
/no_think
"""

Now, we can ask a question. The LLM will first analyze the prompt and decide whether to call the answer_query tool:

python code
question = "What is the transmission of the Valhalla?"
prompt = PROMPT.format(question=question)
response = model_with_tools.invoke(prompt)
print(response.tool_calls)
Response
[
  {
    "name": "answer_query",
    "args": { "query": "What is the transmission of the Valhalla?" },
    "id": "5b874e47-e0e4-45fa-85f4-2c5011cd8163",
    "type": "tool_call"
  }
]

The response.tool_calls attribute shows the model's decision. Our code then executes the tool, gets the result, and adds it back to the conversation history as a ToolMessage:

python code
history = [HumanMessage(prompt), response]  # keep the assistant's tool-call message so the tool results have context
 
available_tools = {"answer_query": answer_query}
 
for tool_call in response.tool_calls:
    selected_tool = available_tools[tool_call["name"].lower()]
    tool_msg = selected_tool.invoke(tool_call)
    history.append(tool_msg)

Finally, we invoke the model one last time with the updated history. The model now has the retrieved context and can formulate the final answer.

python code
response = model_with_tools.invoke(history)
Response
The Aston Martin Valhalla uses an **8-speed dual-clutch transmission (DCT)** to transmit power to the wheels. This transmission is designed for seamless and rapid gear changes, optimizing both performance and fuel efficiency.

This demonstrates a more autonomous agent that can intelligently decide when to access its knowledge base.
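
If you want to reuse this pattern, the whole decide-call-answer loop can be wrapped in one function. Here's a sketch using the objects defined above (the ask_with_tools name is just for illustration):

py code
def ask_with_tools(question: str) -> AIMessage:
    """Let the model decide whether to call the retrieval tool, then answer."""
    prompt = PROMPT.format(question=question)
    history = [HumanMessage(prompt)]

    response = model_with_tools.invoke(history)
    if not response.tool_calls:
        return response

    # Keep the assistant's tool-call message, then append each tool result
    history.append(response)
    for tool_call in response.tool_calls:
        history.append(available_tools[tool_call["name"].lower()].invoke(tool_call))

    return model_with_tools.invoke(history)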

Debugging and Tracing with MLFlow

When a multi-step chain fails or produces an unexpected result, its "black box" nature makes troubleshooting nearly impossible. MLFlow is one option here (LangSmith[5] and LangFuse[6] are popular alternatives); it helps by capturing every step of your flow, from the initial prompt to each tool call and the final LLM response. By offering a detailed trace, it lets you inspect the exact inputs and outputs at each stage, diagnose latency bottlenecks, monitor token usage, and ultimately understand why your application behaved the way it did. MLFlow integrates[4] with LangChain to automatically log all the steps and metrics of your workflows.

To start, open a terminal and start the MLFlow server:

mlflow server --host 127.0.0.1 --port 8080

Then, in your code, set the MLFlow tracking URI and enable auto-logging:

py code
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:8080"
 
mlflow.set_experiment("LangChain Integration")
mlflow.langchain.autolog()

This will create an experiment (if not already created) and log all the steps and metrics of your workflows. Now you can invoke your model as usual:

py code
response = qwen_model.invoke("Explain what is MLFlow in one sentence. /no_think")

Open the MLFlow UI in your browser (the server we started is at http://127.0.0.1:8080) and check your LangChain Integration experiment. You should see the trace of your workflow:

MLFlow Trace

No LCEL?

The LangChain Expression Language (LCEL)[7] is defined in the documentation as:

The LangChain Expression Language (LCEL) takes a declarative approach to building new Runnables from existing Runnables.

This is great if you're already familiar with it. However, the resulting syntax doesn't look like normal Python code to most people. Simplicity is really important when building real-world AI systems, and in most cases you won't need LCEL. The drawback is that you'll have to write more code to achieve the same result, or use another library such as LangGraph. We'll use LCEL occasionally, but sparingly.
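
For reference, here's what a minimal LCEL chain looks like, reusing the customer support prompt template and model from earlier; the pipe operator composes Runnables so the rendered prompt flows straight into the model:

py code
# Declarative composition: both the template and the model are Runnables
chain = prompt_template | qwen_model
response = chain.invoke({"agent_name": "Slim Shady"})
print(response.content)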

Conclusion

You've now navigated the core functionalities of the LangChain framework, from basic model interaction and prompt management to advanced techniques like structured output, Retrieval-Augmented Generation, and tool use. You've seen how LangChain provides a standardized layer of abstraction that accelerates the development of complex AI applications. By composing these fundamental building blocks, you can create systems that are robust, maintainable, and adaptable to different LLMs and external data sources.


Homework Exercises

  1. Modify Chat Prompt: Take the customer support agent example and modify the system_message to change the agent's persona. Make it a very formal and professional "Senior Account Manager" named "Mr. Henderson". Test it with the same initial user query ("Hi! What's your name?") and observe how the response changes.

  2. Extend Structured Output: Add a new field to the SongClassification Pydantic model called estimated_year: int = Field(description="The estimated year the song was released"). Rerun the model with the same Wu-Tang Clan lyrics and see if the LLM correctly populates the new field.

  3. Ask a Different RAG Question: Using the ask_question function from the RAG section, ask a new question about the Aston Martin Valhalla that requires information from the second page of the PDF. A good question might be: "What is the price of the Valhalla?" or "What is the interior like?". Verify that the model provides an accurate answer based on the document.

References

Footnotes

  1. LangChain Documentation

  2. Gemini 2.5 Flash

  3. Qwen3

  4. MLFlow LangChain Integration

  5. LangSmith

  6. LangFuse

  7. LangChain Expression Language (LCEL)