LangChain Foundations - An Engineer's Guide

Master the essentials of LangChain, the go-to framework for building robust LLM applications. Learn to manage prompts, enforce structured outputs with Pydantic, and build a simple RAG pipeline to chat with your documents.

Updated Jun 22, 2025 · 15 min read

In the previous tutorials, we've seen how to augment LLMs with memory, structured output, and tools. While it's possible to build these systems from scratch, orchestrating all the components—managing prompts, handling conversation history, calling different model APIs, and processing tool outputs—quickly becomes complex and unwieldy. Each new model or tool you add can require significant custom code, leading to brittle and hard-to-maintain applications.

This is the problem LangChain[1] solves. It is an open-source framework designed to simplify the development of applications powered by Large Language Models. Think of it as the glue for your AI stack. LangChain provides a standardized, modular set of building blocks that lets you compose complex applications by chaining together LLMs, tools, and data sources.

This tutorial provides a hands-on quickstart to the core components of LangChain. You will learn how to use its abstractions to call different LLMs, manage prompts dynamically, enforce structured outputs, and build a complete Retrieval-Augmented Generation (RAG) pipeline that can answer questions about a PDF document. By the end, you'll understand why LangChain is an indispensable tool for any AI engineer looking to build sophisticated, robust, and maintainable AI systems efficiently.

Tutorial Goals

  • Use LangChain to interact with different LLM providers (OpenAI, Google, Ollama).
  • Dynamically construct prompts and manage conversation history.
  • Enforce reliable, structured JSON output from an LLM using Pydantic.
  • Build a complete, local RAG pipeline to chat with a PDF document.
  • Enable an LLM to use a retrieval tool to answer questions.

Setup

Before we begin, we need to install the necessary libraries. This setup includes the core langchain library along with specific integrations for LLM providers, document loading (pypdf), and embeddings (fastembed):

pip install -Uqqq pip --progress-bar off
pip install -qqq langchain==0.3.26 --progress-bar off
pip install -qqq langchain-ollama==0.3.3 --progress-bar off
pip install -qqq langchain-google-genai==2.1.5 --progress-bar off
pip install -qqq langchain-community==0.3.26 --progress-bar off
pip install -qqq pypdf==5.6.0 --progress-bar off
pip install -qqq fastembed==0.7.1 --progress-bar off

We will also download a sample PDF document about the Aston Martin Valhalla, which we'll use later for our RAG system.

gdown 15bT0a295EjL7klOOMWxMdvRQQSQ4tjxv -O data/

With the dependencies installed and the sample document ready, let's import the necessary modules.

python code
import textwrap
from pprint import pprint
from typing import Literal
 
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_core.vectorstores import InMemoryVectorStore
from pydantic import BaseModel, Field
 
load_dotenv()

Call OpenAI Model

Making a call to an OpenAI model is straightforward (this requires the langchain-openai package and an OPENAI_API_KEY in your environment). We use the init_chat_model function to initialize the model and then invoke it with a prompt:

py code
openai_model = init_chat_model("gpt-4o-mini", model_provider="openai")
response = openai_model.invoke("Explain in one sentence what is LangChain?")
print(response.content)
Response
LangChain is a framework designed to facilitate the development of applications that leverage language models, enabling tasks such as natural language understanding, generation, and interaction with external data sources.

The response is an AIMessage object, which contains the generated content as well as additional metadata; we'll look at it in more detail in the next section.
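
For a quick first look, a few attributes are worth knowing about. This is just a sketch; the exact metadata contents vary by provider:

py code
# The AIMessage carries the generated text plus metadata about the call
print(response.content)            # the generated text
print(response.response_metadata)  # provider-specific details, e.g. model name and finish reason
print(response.usage_metadata)     # token counts, when the provider reports them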

Multiple LLM Providers

A key feature of LangChain is its ability to abstract away the specific APIs of different LLM providers. This allows you to switch between models from Google, OpenAI, Anthropic, or a local Ollama instance with minimal code changes.

We'll use the same helper function, init_chat_model, to initialize our models. Let's start with Google's Gemini:

python code
gemini_model = init_chat_model(
    "gemini-2.5-flash",
    model_provider="google_genai",
    thinking_budget=100,
    include_thoughts=True,
)

The Gemini 2.5 Flash model is a "reasoning" model[2]. You can cap how many tokens it spends thinking with the thinking_budget parameter.

To interact with the model, we'll use the .invoke() method. It sends the prompt to the model and returns a response object:

python code
response = gemini_model.invoke("Explain in one sentence what is LangChain?")

This time, the content in the response is a list of content blocks, including the model's thinking and the final response:

py code
response.content[0]["thinking"]
Thinking
**My Summary of LangChain's Essence**
 
Alright, I've got it. The challenge is to boil down LangChain to a single sentence, but still capture its core. Considering my deep understanding of the landscape, what _really_ defines LangChain is its ability to orchestrate and simplify the development of sophisticated applications leveraging Large Language Models, particularly by connecting these models to external data and tools through agents and chains. It's essentially a powerful framework for LLM-powered application builders.
py code
response.content[1]
Response
LangChain is a framework designed to simplify the development of applications powered by large language models (LLMs),
enabling them to connect with external data sources and computational tools.

Finally, we can access the usage metadata, which includes the number of tokens used in the request and response:

py code
response.usage_metadata
json code
{
  "input_tokens": 10,
  "output_tokens": 33,
  "total_tokens": 134,
  "input_token_details": { "cache_read": 0 },
  "output_token_details": { "reasoning": 91 }
}

Now, let's switch to a local model running on Ollama, like qwen3:8b[3]. The initialization is nearly identical:

python code
qwen_model = init_chat_model("qwen3:8b", model_provider="ollama")
response = qwen_model.invoke("Explain in one sentence what is LangChain? /no_think")
print(response.content)
Response
LangChain is a framework that enables developers to build applications that leverage large language models by providing tools for task execution, memory, and integration with other systems.

Notice that the core interaction logic (.invoke()) remains the same, regardless of the underlying model provider. This abstraction is great for experimenting with different models.
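
To make the point concrete, here is a small sketch that sends the same prompt to every model initialized above and prints a shortened reply; nothing changes except the model object:

py code
# Same prompt, same .invoke() call - only the model object differs
models = {"openai": openai_model, "gemini": gemini_model, "qwen": qwen_model}

for name, model in models.items():
    reply = model.invoke("Explain in one sentence what is LangChain?")
    print(f"[{name}] {textwrap.shorten(str(reply.content), width=100)}")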

Prompts and Chat History

Hardcoding prompts directly into your application is inflexible. LangChain's ChatPromptTemplate provides a structured way to build dynamic prompts from multiple components, such as system instructions, user queries, and conversation history.

A prompt template is composed of a list of messages. Let's create a system prompt for a customer support agent:

py code
system_message = """
You're a helpful customer support agent.
You're given a conversation between a customer and a support agent.
 
You're helping a customer to buy 90s Hip-hop styled t-shirts.
 
<instructions>
- Your name is {agent_name}
- Always deny answering about anything not related to the products
- You need to respond to the customer's message
- You need to respond in the same language as the customer's message
</instructions>
"""

And now for the ChatPromptTemplate object along with the user message:

python code
user_message = "Hi! What's your name? /no_think"
 
prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_message), ("user", user_message)]
)

We can now use the .invoke() method on the template, passing a dictionary to fill in any variables:

py code
prompt = prompt_template.invoke({"agent_name": "Slim Shady"})

This generates a PromptValue object, which can be sent directly to the model.
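
If you want to see exactly what will be sent, you can convert the PromptValue to its underlying messages (a quick sketch):

py code
# A PromptValue wraps the rendered list of messages
for message in prompt.to_messages():
    print(f"{message.type}: {message.content[:60]}...")

Passing the prompt to the model is then a single call: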

py code
response = qwen_model.invoke(prompt)
Response
Yo, my name's Slim Shady, and I'm here to help you find the perfect 90s Hip-hop styled t-shirts! What can I do for you?

Pretty neat, right?

Managing Conversation History

To maintain context in a conversation, you can manage the chat history as a list of messages. We start with the initial prompt messages and the first AI response.

python code
history = [*prompt.to_messages(), response]

When the user sends a new message, we simply append it to the history list and pass the entire list back to the model. The model now has the full context of the conversation. Let's add a new message to the history and invoke the model again:

python code
new_query = HumanMessage(
    """
I want a t-shirt with the style of Wu-Tang Clan.
I want to show a deadlifter that doesn't like Pencil Necks.
Describe the t-shirt design to a t-shirt designer.
/no_think
""".strip()
)
history.append(new_query)
 
response = qwen_model.invoke(history)
Response
Yo, the t-shirt needs to have a gritty, underground Wu-Tang Clan vibe. The front should feature the iconic Wu-Tang Clan logo in bold, black ink with some red accents to give it that raw energy. Add some graffiti-style text in the corners that says, "No Pencil Necks Allowed" in a bold, stylized font. The back should have a simple, dark background with a silhouette of a determined deadlifter, holding a barbell, and a subtle tag that reads "Real Hype, Real Grind." Keep the overall look dark, edgy, and authentic to the 90s hip-hop culture.

The model uses the previous turns to understand the context and provides a relevant, detailed description for the t-shirt designer, demonstrating its ability to maintain a coherent conversation.
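
You can fold this append-and-invoke pattern into a small helper so every turn keeps the history in sync. A minimal sketch (the chat_turn name is just for illustration):

py code
def chat_turn(history: list, user_text: str) -> AIMessage:
    """Append the user's message, get a reply, and record it in the history."""
    history.append(HumanMessage(user_text))
    reply = qwen_model.invoke(history)
    history.append(reply)
    return reply

# reply = chat_turn(history, "Add a price tag to the back print. /no_think")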

Structured Outputs with Pydantic

A common challenge in AI engineering is forcing an LLM to produce output in a consistent, machine-readable format like JSON. LangChain's integration with Pydantic makes this seamless.

First, we define the desired data structure using a Pydantic BaseModel. The field names and descriptions act as direct instructions to the LLM:

python code
class SongClassification(BaseModel):
    song_name: str = Field(description="The name of the song")
    style: Literal["Gangsta Rap", "R&B", "Other"] = Field(
        description="Style of the song"
    )
    reasoning: str = Field(description="Why the style was chosen")

Next, we bind this structure to our model using .with_structured_output(). This creates a new "chain" that will automatically format the output as a Pydantic object:

python code
structured_model = qwen_model.with_structured_output(SongClassification)

Now we can create a prompt asking the LLM to extract information and invoke the structured model. Let's prepare the prompts:

python code
prompt = """
You are a music expert on various genres. Your task is to guess the name of the song and then classify the style of it.
 
<instructions>
- Recognize the name of the song
- Classify the style of the song into one of the following styles: gangsta rap, R&B or other
- Try to recognise the song and then choose the style based on it
- If you can't recognise the song, just use the lyrics
</instructions>
 
Guess the name of the song and then classify the style of it into one of the following styles:
 
- Gangsta Rap
- R&B
- Other
 
Based on the following partial lyrics:
 
<lyrics>
{lyrics}
</lyrics>
 
Respond in JSON format and try your best to guess the name of the song and the style of it.
""".strip()
 
lyrics = """
I grew up on the crime side, the New York Times side
Stayin' alive was no jive
Had second hands, Mom's bounced on old man
So then we moved to Shaolin land
""".strip()

The invocation remains the same:

py code
response = structured_model.invoke(prompt.format(lyrics=lyrics))
pprint(response.model_dump())
Response
{
  "reasoning": "The lyrics reference 'crime side', 'New York Times side', ...",
  "song_name": "Juice",
  "style": "Gangsta Rap"
}

The model returns a validated SongClassification object, not just a raw string. This makes the output predictable and safe to use in your application code.
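
Because the result is a typed object, downstream code can rely on the schema rather than parsing strings. A quick sketch:

py code
# Fields are validated attributes with known types, not keys in an arbitrary dict
if response.style == "Gangsta Rap":
    print(f"Adding '{response.song_name}' to the 90s playlist")
print(response.reasoning)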

External Knowledge with RAG

LLMs only know what they were trained on. To answer questions about private or recent documents, we use Retrieval-Augmented Generation (RAG). LangChain simplifies building the entire RAG pipeline.

1. Load Documents

PDF File Preview

First, we load the document (first page shown above). LangChain provides various DocumentLoader integrations. Here, we use PyPDFLoader to load our Aston Martin Technical Overview PDF:

python code
loader = PyPDFLoader("data/aston-martin-valhalla.pdf")
doc_pages = loader.load()
print(f"Loaded {len(doc_pages)} pages.")
Response
Loaded 2 pages.

2. Embed and Store (Indexing)

Next, we need to create embeddings (numerical representations) of the document chunks and store them in a vector store for efficient searching.

python code
embeddings = FastEmbedEmbeddings()
vector_store = InMemoryVectorStore(embeddings)
vector_store.add_documents(documents=doc_pages)
Response
['3857aae1-c3c9-4f92-9f04-324000fa72ed',
 '72bf6bb6-3f6e-4f2f-80c0-1ba9d6ed9442']

We use the efficient FastEmbedEmbeddings for local embedding generation and a simple InMemoryVectorStore for this example. For production, you would typically use a persistent vector database like Qdrant, Milvus, or Weaviate.
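
Our document is only two pages, so indexing whole pages works fine here. For longer documents you would usually split the pages into smaller chunks before indexing; here's a sketch using LangChain's RecursiveCharacterTextSplitter (the chunk sizes are arbitrary):

py code
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split pages into overlapping chunks so retrieval returns focused passages
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(doc_pages)
# Index the chunks instead of the whole pages:
# vector_store.add_documents(documents=chunks)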

3. Retrieve and Generate

Now we can build the part of the pipeline that answers questions.

Retrieve: First, we search the vector store for document chunks relevant to the user's question:

python code
question = "What is the Valhalla's engine?"
results = vector_store.similarity_search(question, k=1)
print(results[0].page_content[:200])
Retrieved Context
**Aston Martin Valhalla Technical Overview**
**Powertrain:**
The Aston Martin Valhalla is propelled by a high-performance hybrid powertrain. Its mid-mounted
4.0-liter twin-turbocharged V8 engine...

Generate: We then take the retrieved context, combine it with the user's question in a prompt, and send it to the LLM to generate a final answer:

python code
QA_PROMPT = """
You're a helpful assistant that can answer questions based on the provided information.
 
<instructions>
- Use the information to answer the question
- If the information is not available, say "I don't know"
- Be concise and to the point
- Cite the source of the information
</instructions>
 
Use the following information to answer the question:
 
<context>
{context}
</context>
 
<question>
{question}
</question>
 
Say that you don't know if the information is not available.
 
Answer:
/no_think
""".strip()
py code
question = "What is the Valhalla's engine?"
results = vector_store.similarity_search(question, k=1)
prompt = QA_PROMPT.format(context=results[0].page_content, question=question)
response = qwen_model.invoke(prompt)
Response
The Aston Martin Valhalla is powered by a mid-mounted 4.0-liter twin-turbocharged V8 engine, developed in collaboration with Mercedes-AMG. This engine is paired with a battery-electric system as part of its hybrid powertrain.

[Source: Aston Martin Valhalla Technical Overview]

This entire RAG flow can be encapsulated in a single function, making it easy to reuse:

py code
def ask_question(question: str) -> AIMessage:
    results = vector_store.similarity_search(question, k=1)
 
    prompt = QA_PROMPT.format(context=results[0].page_content, question=question)
 
    return qwen_model.invoke(prompt)

Let's test it with a new question:

py code
response = ask_question("How fast can it accelerate?")
Response
The Aston Martin Valhalla can accelerate from 0 to 60 mph in under 2.5 seconds.

This information is cited from the "Performance" section of the technical overview.

You can go through the PDF and check that the answers are correct.
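
You can also probe the prompt's guardrail by asking something the document likely doesn't cover; the model should fall back to "I don't know". Treat this as a sanity check rather than a guarantee, since the behavior depends on the model:

py code
response = ask_question("What is the warranty period of the Valhalla?")
print(response.content)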

Add Capabilities with Tools

Tool calling elevates a RAG system by allowing the LLM to decide when to search for information. Instead of a fixed retrieval step, the LLM can call a "search" tool if it determines it needs external knowledge.

First, we define our retrieval logic as a tool using the @tool decorator. This function wraps the vector_store.similarity_search logic:

python code
@tool
def answer_query(query: str) -> str:
    """Answer a question based on user's private information.
 
    Args:
        query: the question to answer
    """
    results = vector_store.similarity_search(query, k=1)
    return results[0].page_content

Next, we'll bind this tool to our model. This makes the model aware of the tool and its capabilities.

python code
model_with_tools = qwen_model.bind_tools([answer_query])

Let's prepare the system prompt:

py code
PROMPT = """
You're a helpful assistant that can answer questions based on the provided information.
 
<instructions>
- Use the `answer_query` tool to find the answer
- If you don't know the answer, say "I don't know"
</instructions>
 
Answer the question from the user:
 
<question>
{question}
</question>
 
/no_think
"""

Now, we can ask a question. The LLM will first analyze the prompt and decide whether to call the answer_query tool:

python code
question = "What is the transmission of the Valhalla?"
prompt = PROMPT.format(question=question)
response = model_with_tools.invoke(prompt)
print(response.tool_calls)
Response
[
  {
    "name": "answer_query",
    "args": { "query": "What is the transmission of the Valhalla?" },
    "id": "5b874e47-e0e4-45fa-85f4-2c5011cd8163",
    "type": "tool_call"
  }
]

The response.tool_calls attribute shows the model's decision. Our code then executes the tool, gets the result, and adds it back to the conversation history as a ToolMessage:

python code
history = [HumanMessage(prompt), response]  # keep the assistant's tool-call message so the tool results have context
 
available_tools = {"answer_query": answer_query}
 
for tool_call in response.tool_calls:
    selected_tool = available_tools[tool_call["name"].lower()]
    tool_msg = selected_tool.invoke(tool_call)
    history.append(tool_msg)

Finally, we invoke the model one last time with the updated history. The model now has the retrieved context and can formulate the final answer.

python code
response = model_with_tools.invoke(history)
Response
The Aston Martin Valhalla uses an **8-speed dual-clutch transmission (DCT)** to transmit power to the wheels. This transmission is designed for seamless and rapid gear changes, optimizing both performance and fuel efficiency.

This demonstrates a more autonomous agent that can intelligently decide when to access its knowledge base.
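
If you want to reuse this pattern, the whole decide-call-answer loop can be wrapped in one function. Here's a sketch using the objects defined above (the ask_with_tools name is just for illustration):

py code
def ask_with_tools(question: str) -> AIMessage:
    """Let the model decide whether to call the retrieval tool, then answer."""
    prompt = PROMPT.format(question=question)
    history = [HumanMessage(prompt)]

    response = model_with_tools.invoke(history)
    if not response.tool_calls:
        return response

    # Keep the assistant's tool-call message, then append each tool result
    history.append(response)
    for tool_call in response.tool_calls:
        history.append(available_tools[tool_call["name"].lower()].invoke(tool_call))

    return model_with_tools.invoke(history)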

Debugging and Tracing with MLFlow

When a multi-step chain fails or produces an unexpected result, its "black box" nature makes troubleshooting nearly impossible. MLFlow is one option here (LangSmith[5] and LangFuse[6] are popular alternatives); it helps by capturing every step of your flow, from the initial prompt to each tool call and the final LLM response. By offering a detailed trace, it lets you inspect the exact inputs and outputs at each stage, diagnose latency bottlenecks, monitor token usage, and ultimately understand why your application behaved the way it did. MLFlow integrates[4] with LangChain to automatically log all the steps and metrics of your workflows.

To start, open a terminal and start the MLFlow server:

mlflow server --host 127.0.0.1 --port 8080

Then, in your code, set the MLFlow tracking URI and enable auto-logging:

py code
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:8080"
 
mlflow.set_experiment("LangChain Integration")
mlflow.langchain.autolog()

This will create an experiment (if not already created) and log all the steps and metrics of your workflows. Now you can invoke your model as usual:

py code
response = qwen_model.invoke("Explain what is MLFlow in one sentence. /no_think")

Open the MLFlow UI in your browser (the server we started is at http://127.0.0.1:8080) and check your LangChain Integration experiment. You should see the trace of your workflow:

MLFlow Trace

No LCEL?

The LangChain Expression Language (LCEL)[7] is defined in the documentation as:

The LangChain Expression Language (LCEL) takes a declarative approach to building new Runnables from existing Runnables.

This is great if you're already familiar with it. However, the resulting syntax doesn't look like normal Python code to most people. Simplicity is really important when building real-world AI systems, and in most cases you won't need LCEL. The drawback is that you'll have to write more code to achieve the same result, or use another library such as LangGraph. We'll use LCEL occasionally, but sparingly.
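
For reference, here's what a minimal LCEL chain looks like, reusing the customer support prompt template and model from earlier; the pipe operator composes Runnables so the rendered prompt flows straight into the model:

py code
# Declarative composition: both the template and the model are Runnables
chain = prompt_template | qwen_model
response = chain.invoke({"agent_name": "Slim Shady"})
print(response.content)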

Conclusion

You've now navigated the core functionalities of the LangChain framework, from basic model interaction and prompt management to advanced techniques like structured output, Retrieval-Augmented Generation, and tool use. You've seen how LangChain provides a standardized layer of abstraction that accelerates the development of complex AI applications. By composing these fundamental building blocks, you can create systems that are robust, maintainable, and adaptable to different LLMs and external data sources.


Homework Exercises

  1. Modify Chat Prompt: Take the customer support agent example and modify the system_message to change the agent's persona. Make it a very formal and professional "Senior Account Manager" named "Mr. Henderson". Test it with the same initial user query ("Hi! What's your name?") and observe how the response changes.

  2. Extend Structured Output: Add a new field to the SongClassification Pydantic model called estimated_year: int = Field(description="The estimated year the song was released"). Rerun the model with the same Wu-Tang Clan lyrics and see if the LLM correctly populates the new field.

  3. Ask a Different RAG Question: Using the ask_question function from the RAG section, ask a new question about the Aston Martin Valhalla that requires information from the second page of the PDF. A good question might be: "What is the price of the Valhalla?" or "What is the interior like?". Verify that the model provides an accurate answer based on the document.

References

Footnotes

  1. LangChain Documentation

  2. Gemini 2.5 Flash

  3. Qwen3

  4. MLFlow LangChain Integration

  5. LangSmith

  6. LangFuse

  7. LangChain Expression Language (LCEL)