Language Models - Working with LLMs, Chat Models & Embeddings
Master the foundation of LangChain - integrating and working with different types of language models for your AI applications
Understanding Language Models in LangChain
Language models are the core intelligence behind your LangChain applications. LangChain provides a unified interface to work with different types of models, making it easy to switch between providers and model types.
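For example, the same .invoke() call works regardless of which provider backs the model. The snippet below is a minimal sketch assuming an OPENAI_API_KEY environment variable and a local Ollama install; the prompt is arbitrary.

from langchain_openai import ChatOpenAI
from langchain_community.llms import Ollama

# Illustrative only: swap providers without changing the calling code
hosted = ChatOpenAI(model="gpt-3.5-turbo")
local = Ollama(model="llama2")

prompt = "Name one use case for text embeddings."
print(hosted.invoke(prompt).content)  # chat model returns a message object
print(local.invoke(prompt))           # completion-style LLM returns a string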
Types of Language Models
LANGCHAIN MODEL ECOSYSTEM
(Different model types & uses)

Base Language Model (common interface)
├── LLM Model: text → text completion, simple input/output
├── Chat Model: conversation-based, role messages
└── Embedding Model: text → vector, similarity search

Use cases: text generation, chatbots, data analysis, Q&A systems,
creative writing, reasoning, summarization, knowledge search

LLM Models (Text Completion)
LLM models are traditional completion models that generate text based on a prompt.
Basic LLM Usage
from langchain_openai import OpenAI
from langchain_community.llms import Ollama
# OpenAI LLM (GPT-3.5 Instruct)
llm = OpenAI(
model="gpt-3.5-turbo-instruct",
temperature=0.7,
max_tokens=500
)
# Simple text completion
response = llm.invoke("The benefits of renewable energy are")
print(response)
# Local LLM with Ollama
local_llm = Ollama(model="llama2")
response = local_llm.invoke("Explain machine learning in simple terms:")
print(response)

LLM Configuration Options
# Detailed LLM configuration
llm = OpenAI(
model="gpt-3.5-turbo-instruct",
temperature=0.7, # Creativity (0-1)
max_tokens=1000, # Response length
top_p=0.9, # Nucleus sampling
frequency_penalty=0.1, # Reduce repetition
presence_penalty=0.1, # Encourage new topics
n=1, # Number of responses
best_of=1, # Best of N generations
streaming=True # Enable streaming
)

Streaming LLM Responses
# Stream responses for better UX
for chunk in llm.stream("Write a story about AI"):
    print(chunk, end="", flush=True)

Chat Models (Conversation-Based)
Chat models operate on structured conversations made up of messages with roles (system, human, assistant).
Basic Chat Model Usage
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
# Initialize chat model
chat = ChatOpenAI(
model="gpt-4",
temperature=0.7,
max_tokens=500
)
# Single message
response = chat.invoke([HumanMessage(content="What is LangChain?")])
print(response.content)
# Conversation with system message
messages = [
SystemMessage(content="You are a helpful Python programming tutor."),
HumanMessage(content="How do I create a list in Python?")
]
response = chat.invoke(messages)
print(response.content)

Message Types and Roles
from langchain_core.messages import (
SystemMessage,
HumanMessage,
AIMessage,
FunctionMessage,
ToolMessage
)
# System message - Sets behavior/context
system_msg = SystemMessage(
content="You are an expert data scientist. Answer questions concisely with examples."
)
# Human message - User input
human_msg = HumanMessage(
content="What's the difference between supervised and unsupervised learning?"
)
# AI message - Assistant response (for conversation history)
ai_msg = AIMessage(
content="Supervised learning uses labeled data, unsupervised finds patterns in unlabeled data."
)
# Complete conversation
conversation = [system_msg, human_msg, ai_msg]
# Continue conversation
new_question = HumanMessage(content="Can you give me examples of each?")
conversation.append(new_question)
response = chat.invoke(conversation)
print(response.content)

Advanced Chat Features
# Chat with function calling (for GPT-4)
from langchain_core.tools import tool
@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return f"The weather in {location} is sunny, 75°F"
# Bind tools to chat model
chat_with_tools = chat.bind_tools([get_weather])
# Use tools in conversation
response = chat_with_tools.invoke([
HumanMessage(content="What's the weather like in New York?")
])
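# The reply is typically an AIMessage whose tool_calls list records which tool
# the model wants to run; a minimal, illustrative way to execute it (assumes
# the get_weather tool defined above):
if response.tool_calls:
    call = response.tool_calls[0]
    print(get_weather.invoke(call["args"]))  # run the requested tool locally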
print(response)

Chat Model Configuration
# Advanced chat configuration
chat = ChatOpenAI(
model="gpt-4",
temperature=0.3, # Lower for more focused responses
max_tokens=2000, # Longer responses
top_p=0.8, # Nucleus sampling
frequency_penalty=0.2, # Reduce repetition
presence_penalty=0.1, # Encourage topic diversity
model_kwargs={
"stop": ["\n\n"], # Stop sequences
"logit_bias": {}, # Token probability bias
}
)

Embedding Models (Vector Representations)
Embedding models convert text into numerical vectors for similarity search and semantic understanding.
Basic Embedding Usage
from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings
# OpenAI embeddings
embeddings = OpenAIEmbeddings(
model="text-embedding-ada-002",
chunk_size=1000 # Process in chunks
)
# Generate embeddings for single text
text = "LangChain is a framework for building AI applications"
vector = embeddings.embed_query(text)
print(f"Vector dimension: {len(vector)}")
print(f"First 5 values: {vector[:5]}")
# Generate embeddings for multiple documents
documents = [
"Machine learning is a subset of AI",
"Natural language processing analyzes text",
"Deep learning uses neural networks"
]
doc_vectors = embeddings.embed_documents(documents)
print(f"Generated {len(doc_vectors)} document vectors")π§ Local Embedding Models β
# Use local HuggingFace models for privacy
local_embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-MiniLM-L6-v2",
model_kwargs={'device': 'cpu'},
encode_kwargs={'normalize_embeddings': True}
)
# Same interface as OpenAI
vector = local_embeddings.embed_query("Local embedding example")
print(f"Local vector dimension: {len(vector)}")π Embedding Similarity Comparison β
import numpy as np
def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors"""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
# Compare text similarity
text1 = "Python programming language"
text2 = "Python coding and development"
text3 = "Cooking with vegetables"
vec1 = embeddings.embed_query(text1)
vec2 = embeddings.embed_query(text2)
vec3 = embeddings.embed_query(text3)
print(f"Python texts similarity: {cosine_similarity(vec1, vec2):.3f}")
print(f"Python vs cooking similarity: {cosine_similarity(vec1, vec3):.3f}")π Model Comparison and Selection β
Model Comparison Table
| Model Type | Best For | Speed | Cost | Privacy |
|---|---|---|---|---|
| GPT-4 | Complex reasoning, accuracy | Slow | High | Cloud |
| GPT-3.5 | General tasks, speed | Fast | Medium | Cloud |
| Claude | Long context, safety | Medium | Medium | Cloud |
| Local LLMs | Privacy, custom domains | Variable | Low | Full |
| Embeddings | Similarity, search | Fast | Low | Configurable |
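The speed column is easy to sanity-check against your own prompts. Below is a small, illustrative latency comparison; the prompt and model names are just examples, and it assumes an OPENAI_API_KEY is configured.

import time
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

def time_invoke(model, prompt: str) -> float:
    """Return wall-clock seconds for a single invoke call."""
    start = time.perf_counter()
    model.invoke([HumanMessage(content=prompt)])
    return time.perf_counter() - start

prompt = "Summarize the benefits of renewable energy in one sentence."
print(f"gpt-3.5-turbo: {time_invoke(ChatOpenAI(model='gpt-3.5-turbo'), prompt):.2f}s")
print(f"gpt-4: {time_invoke(ChatOpenAI(model='gpt-4'), prompt):.2f}s")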
Choosing the Right Model
class ModelSelector:
    def __init__(self):
        self.models = {
            'reasoning': ChatOpenAI(model="gpt-4", temperature=0.1),
            'creative': ChatOpenAI(model="gpt-4", temperature=0.9),
            'fast': ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7),
            'local': Ollama(model="llama2"),
            'embeddings': OpenAIEmbeddings()
        }

    def get_model(self, task_type: str, privacy_required: bool = False):
        """Select appropriate model based on task and requirements"""
        if privacy_required:
            return self.models['local']
        if task_type == 'analysis':
            return self.models['reasoning']
        elif task_type == 'creative':
            return self.models['creative']
        elif task_type == 'general':
            return self.models['fast']
        else:
            return self.models['fast']
# Usage
selector = ModelSelector()
analysis_model = selector.get_model('analysis')
private_model = selector.get_model('general', privacy_required=True)

Performance Optimization
Caching for Efficiency
from langchain.cache import InMemoryCache, SQLiteCache
from langchain.globals import set_llm_cache
# In-memory caching
set_llm_cache(InMemoryCache())
# Persistent caching
# set_llm_cache(SQLiteCache(database_path=".langchain.db"))
# Now all LLM calls are cached
llm = ChatOpenAI()
response1 = llm.invoke("What is Python?") # API call
response2 = llm.invoke("What is Python?") # Cached result

Batch Processing
# Process multiple inputs efficiently
async def batch_process():
    chat = ChatOpenAI()
    # Batch invoke for multiple queries
    queries = [
        "Explain machine learning",
        "What is deep learning?",
        "Define natural language processing"
    ]
    # Convert to messages
    messages_batch = [[HumanMessage(content=q)] for q in queries]
    # Batch processing
    responses = await chat.abatch(messages_batch)
    for query, response in zip(queries, responses):
        print(f"Q: {query}")
        print(f"A: {response.content}\n")
# Run batch processing
# import asyncio
# asyncio.run(batch_process())

Monitoring Model Usage
from langchain.callbacks import get_openai_callback
# Track token usage and costs
with get_openai_callback() as cb:
    response = llm.invoke("Explain quantum computing")
    print(f"Tokens used: {cb.total_tokens}")
    print(f"Cost: ${cb.total_cost:.4f}")

Custom Model Integration
Creating a Custom Model Wrapper
from langchain_core.language_models.llms import LLM
from typing import Optional, List, Any
class CustomLLM(LLM):
    """Custom LLM wrapper example"""

    model_name: str = "custom-model"
    temperature: float = 0.7

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[Any] = None,
        **kwargs: Any,
    ) -> str:
        """Custom model inference logic"""
        # Implement your model calling logic here
        # This could be an API call, local model inference, etc.
        return f"Custom model response to: {prompt}"

    @property
    def _identifying_params(self) -> dict:
        """Get the identifying parameters."""
        return {"model_name": self.model_name, "temperature": self.temperature}
# Use custom model
custom_model = CustomLLM(temperature=0.8)
response = custom_model.invoke("Hello, custom model!")
print(response)

Error Handling and Retries
Robust Model Calls
import time
class RobustModelCaller:
    def __init__(self, model, max_retries=3):
        self.model = model
        self.max_retries = max_retries

    def safe_invoke(self, messages, backoff_factor=2):
        """Invoke model with retry logic"""
        for attempt in range(self.max_retries):
            try:
                return self.model.invoke(messages)
            except Exception as e:
                if attempt == self.max_retries - 1:
                    raise e
                wait_time = backoff_factor ** attempt
                print(f"Attempt {attempt + 1} failed: {e}")
                print(f"Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
        raise Exception("All retry attempts failed")
# Usage
robust_caller = RobustModelCaller(chat)
try:
    response = robust_caller.safe_invoke([
        HumanMessage(content="Explain AI safety")
    ])
    print(response.content)
except Exception as e:
    print(f"Final error: {e}")

Best Practices
Model Selection Guidelines
Task Complexity
- Simple tasks: GPT-3.5-turbo
- Complex reasoning: GPT-4
- Creative tasks: Higher temperature
Performance Requirements
- Real-time: GPT-3.5-turbo
- Accuracy critical: GPT-4
- Batch processing: Consider cost vs. speed
Privacy & Security
- Sensitive data: Local models
- Public data: Cloud models OK
- Compliance: Check provider terms
Cost Optimization
- Cache frequent queries
- Use appropriate model sizes
- Monitor token usage
Security Considerations
import os
from langchain_core.messages import HumanMessage
# Secure API key management
def get_secure_model():
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("API key not found in environment variables")
    return ChatOpenAI(
        api_key=api_key,
        model="gpt-3.5-turbo",
        temperature=0.7
    )
# Input sanitization
def sanitize_input(user_input: str) -> str:
    """Basic input sanitization"""
    # Remove potentially harmful content
    cleaned = user_input.replace("<!--", "").replace("-->", "")
    # Limit length
    return cleaned[:1000]
# Safe model usage
def safe_model_call(user_input: str):
    cleaned_input = sanitize_input(user_input)
    model = get_secure_model()
    try:
        response = model.invoke([HumanMessage(content=cleaned_input)])
        return response.content
    except Exception as e:
        return f"Error processing request: {str(e)}"

Next Steps
Ready to dive deeper into LangChain models? Continue with:
- Model Providers - Compare different AI providers
- Model Configuration - Advanced tuning and optimization
- Prompt Templates - Create dynamic prompts for your models
- LCEL Basics - Chain models together with LCEL
Key Takeaways:
- Three model types: LLMs (completion), Chat (conversation), Embeddings (vectors)
- Unified interface: Switch between providers easily
- Configuration matters: Temperature, tokens, and other parameters affect output
- Performance optimization: Use caching, batching, and appropriate model selection
- Security first: Protect API keys and sanitize inputs