Language Models - Working with LLMs, Chat Models & Embeddings

Master the foundation of LangChain - integrating and working with different types of language models for your AI applications

🎯 Understanding Language Models in LangChain

Language models are the core intelligence behind your LangChain applications. LangChain provides a unified interface to work with different types of models, making it easy to switch between providers and model types.
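
For example, the same .invoke() call works regardless of provider. Here is a minimal sketch of that unified interface, assuming the langchain-openai package and a local Ollama install are available:

python
# Both models expose the same Runnable interface
from langchain_openai import ChatOpenAI
from langchain_community.llms import Ollama

cloud_model = ChatOpenAI(model="gpt-3.5-turbo")
local_model = Ollama(model="llama2")

# Identical call pattern; only the return type differs
print(cloud_model.invoke("Hello!").content)  # chat model returns a message
print(local_model.invoke("Hello!"))          # completion LLM returns a string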

🤖 Types of Language Models

text
                    🤖 LANGCHAIN MODEL ECOSYSTEM 🤖
                       (Different model types & uses)

    ┌─────────────────────────────────────────────────────────────────┐
    │                    BASE LANGUAGE MODEL                          │
    │                   (Common Interface)                            │
    └─────────────────────┬───────────────────────────────────────────┘
                          │
             ┌────────────┼────────────┐
             │            │            │
    ┌────────▼─────┐ ┌────▼───┐ ┌─────▼────────┐
    │   LLM MODEL  │ │  CHAT  │ │  EMBEDDING   │
    │              │ │ MODEL  │ │   MODEL      │
    │ Text → Text  │ │ Conv   │ │ Text → Vec   │
    │ Completion   │ │ Based  │ │ Similarity   │
    │ Simple I/O   │ │ Roles  │ │ Search       │
    └──────────────┘ └────────┘ └──────────────┘
             │            │            │
             ▼            ▼            ▼
    ┌──────────────────────────────────────────┐
    │               USE CASES                  │
    │                                          │
    │ 📝 Text Generation   💬 Chatbots         │
    │ 📊 Data Analysis     🔍 Q&A Systems      │
    │ 🎨 Creative Writing  🧠 Reasoning        │
    │ 📚 Summarization     🔗 Knowledge Search │
    └──────────────────────────────────────────┘

🔀 LLM Models (Text Completion)

LLM models are traditional completion models: they take a text prompt and return a continuation of it.

πŸ“ Basic LLM Usage ​

python
from langchain_openai import OpenAI
from langchain_community.llms import Ollama

# OpenAI LLM (GPT-3.5 Instruct)
llm = OpenAI(
    model="gpt-3.5-turbo-instruct",
    temperature=0.7,
    max_tokens=500
)

# Simple text completion
response = llm.invoke("The benefits of renewable energy are")
print(response)

# Local LLM with Ollama
local_llm = Ollama(model="llama2")
response = local_llm.invoke("Explain machine learning in simple terms:")
print(response)

βš™οΈ LLM Configuration Options ​

python
# Detailed LLM configuration
llm = OpenAI(
    model="gpt-3.5-turbo-instruct",
    temperature=0.7,        # Creativity (0-2 for OpenAI; higher = more random)
    max_tokens=1000,        # Response length
    top_p=0.9,             # Nucleus sampling
    frequency_penalty=0.1,  # Reduce repetition
    presence_penalty=0.1,   # Encourage new topics
    n=1,                   # Number of responses
    best_of=1,             # Best of N generations
    streaming=True         # Enable streaming
)

🔄 Streaming LLM Responses

python
# Stream responses for better UX
for chunk in llm.stream("Write a story about AI"):
    print(chunk, end="", flush=True)

💬 Chat Models (Conversation-Based)

Chat models operate on structured conversations: lists of messages with roles (system, human, AI).

πŸ—£οΈ Basic Chat Model Usage ​

python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage

# Initialize chat model
chat = ChatOpenAI(
    model="gpt-4",
    temperature=0.7,
    max_tokens=500
)

# Single message
response = chat.invoke([HumanMessage(content="What is LangChain?")])
print(response.content)

# Conversation with system message
messages = [
    SystemMessage(content="You are a helpful Python programming tutor."),
    HumanMessage(content="How do I create a list in Python?")
]
response = chat.invoke(messages)
print(response.content)

💭 Message Types and Roles

python
from langchain_core.messages import (
    SystemMessage,
    HumanMessage, 
    AIMessage,
    FunctionMessage,
    ToolMessage
)

# System message - Sets behavior/context
system_msg = SystemMessage(
    content="You are an expert data scientist. Answer questions concisely with examples."
)

# Human message - User input
human_msg = HumanMessage(
    content="What's the difference between supervised and unsupervised learning?"
)

# AI message - Assistant response (for conversation history)
ai_msg = AIMessage(
    content="Supervised learning uses labeled data, unsupervised finds patterns in unlabeled data."
)

# Complete conversation
conversation = [system_msg, human_msg, ai_msg]

# Continue conversation
new_question = HumanMessage(content="Can you give me examples of each?")
conversation.append(new_question)

response = chat.invoke(conversation)
print(response.content)

🔧 Advanced Chat Features

python
# Chat with tool calling (supported by OpenAI chat models)
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return f"The weather in {location} is sunny, 75Β°F"

# Bind tools to chat model
chat_with_tools = chat.bind_tools([get_weather])

# Use tools in conversation
response = chat_with_tools.invoke([
    HumanMessage(content="What's the weather like in New York?")
])
print(response)
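
The response above typically contains the model's tool call rather than a final answer. Here is a minimal sketch of completing the loop with the ToolMessage type imported earlier, assuming the tool_calls attribute available on AI messages in recent LangChain versions:

python
from langchain_core.messages import ToolMessage

query = HumanMessage(content="What's the weather like in New York?")
ai_msg = chat_with_tools.invoke([query])

if ai_msg.tool_calls:
    call = ai_msg.tool_calls[0]
    result = get_weather.invoke(call["args"])  # execute the tool ourselves

    # Feed the tool result back so the model can answer in natural language
    final = chat_with_tools.invoke([
        query,
        ai_msg,  # the AI message containing the tool call
        ToolMessage(content=result, tool_call_id=call["id"]),
    ])
    print(final.content)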

🎨 Chat Model Configuration

python
# Advanced chat configuration
chat = ChatOpenAI(
    model="gpt-4",
    temperature=0.3,           # Lower for more focused responses
    max_tokens=2000,           # Longer responses
    top_p=0.8,                # Nucleus sampling
    frequency_penalty=0.2,     # Reduce repetition
    presence_penalty=0.1,      # Encourage topic diversity
    model_kwargs={
        "stop": ["\n\n"],      # Stop sequences
        "logit_bias": {},      # Token probability bias
    }
)
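
Chat models stream just like LLMs; the difference is that each chunk is a message object, so you read its content attribute. A small sketch using the chat model from above:

python
# Stream a chat response token-by-token
for chunk in chat.stream("Write a haiku about programming"):
    print(chunk.content, end="", flush=True)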

πŸ” Embedding Models (Vector Representations) ​

Embedding models convert text into numerical vectors for similarity search and semantic understanding.

📊 Basic Embedding Usage

python
from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings

# OpenAI embeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    chunk_size=1000  # Process in chunks
)

# Generate embeddings for single text
text = "LangChain is a framework for building AI applications"
vector = embeddings.embed_query(text)
print(f"Vector dimension: {len(vector)}")
print(f"First 5 values: {vector[:5]}")

# Generate embeddings for multiple documents
documents = [
    "Machine learning is a subset of AI",
    "Natural language processing analyzes text",
    "Deep learning uses neural networks"
]
doc_vectors = embeddings.embed_documents(documents)
print(f"Generated {len(doc_vectors)} document vectors")

🧠 Local Embedding Models

python
# Use local HuggingFace models for privacy
local_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

# Same interface as OpenAI
vector = local_embeddings.embed_query("Local embedding example")
print(f"Local vector dimension: {len(vector)}")

📈 Embedding Similarity Comparison

python
import numpy as np

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors"""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Compare text similarity
text1 = "Python programming language"
text2 = "Python coding and development"
text3 = "Cooking with vegetables"

vec1 = embeddings.embed_query(text1)
vec2 = embeddings.embed_query(text2)
vec3 = embeddings.embed_query(text3)

print(f"Python texts similarity: {cosine_similarity(vec1, vec2):.3f}")
print(f"Python vs cooking similarity: {cosine_similarity(vec1, vec3):.3f}")

🔄 Model Comparison and Selection

📊 Model Comparison Table

| Model Type | Best For | Speed | Cost | Privacy |
| ---------- | -------- | ----- | ---- | ------- |
| GPT-4 | Complex reasoning, accuracy | Slow | High | Cloud |
| GPT-3.5 | General tasks, speed | Fast | Medium | Cloud |
| Claude | Long context, safety | Medium | Medium | Cloud |
| Local LLMs | Privacy, custom domains | Variable | Low | Full |
| Embeddings | Similarity, search | Fast | Low | Configurable |

🎯 Choosing the Right Model

python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.llms import Ollama

class ModelSelector:
    def __init__(self):
        self.models = {
            'reasoning': ChatOpenAI(model="gpt-4", temperature=0.1),
            'creative': ChatOpenAI(model="gpt-4", temperature=0.9),
            'fast': ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7),
            'local': Ollama(model="llama2"),
            'embeddings': OpenAIEmbeddings()
        }
    
    def get_model(self, task_type: str, privacy_required: bool = False):
        """Select appropriate model based on task and requirements"""
        if privacy_required:
            return self.models['local']
        
        if task_type == 'analysis':
            return self.models['reasoning']
        elif task_type == 'creative':
            return self.models['creative']
        elif task_type == 'general':
            return self.models['fast']
        else:
            return self.models['fast']

# Usage
selector = ModelSelector()
analysis_model = selector.get_model('analysis')
private_model = selector.get_model('general', privacy_required=True)

⚡ Performance Optimization

🚀 Caching for Efficiency

python
from langchain.cache import InMemoryCache, SQLiteCache
from langchain.globals import set_llm_cache

# In-memory caching
set_llm_cache(InMemoryCache())

# Persistent caching
# set_llm_cache(SQLiteCache(database_path=".langchain.db"))

# Now all LLM calls are cached
llm = ChatOpenAI()
response1 = llm.invoke("What is Python?")  # API call
response2 = llm.invoke("What is Python?")  # Cached result

🔄 Batch Processing

python
# Process multiple inputs efficiently
async def batch_process():
    chat = ChatOpenAI()
    
    # Batch invoke for multiple queries
    queries = [
        "Explain machine learning",
        "What is deep learning?",
        "Define natural language processing"
    ]
    
    # Convert to messages
    messages_batch = [[HumanMessage(content=q)] for q in queries]
    
    # Batch processing
    responses = await chat.abatch(messages_batch)
    
    for query, response in zip(queries, responses):
        print(f"Q: {query}")
        print(f"A: {response.content}\n")

# Run batch processing
# import asyncio
# asyncio.run(batch_process())
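
If you don't need async, Runnables also expose a synchronous batch() method with the same semantics:

python
# Synchronous alternative: batch() over the same message lists
chat = ChatOpenAI()
queries = ["Explain machine learning", "What is deep learning?"]
responses = chat.batch([[HumanMessage(content=q)] for q in queries])
for r in responses:
    print(r.content)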

📊 Monitoring Model Usage

python
from langchain.callbacks import get_openai_callback

# Track token usage and costs
with get_openai_callback() as cb:
    response = llm.invoke("Explain quantum computing")
    print(f"Tokens used: {cb.total_tokens}")
    print(f"Cost: ${cb.total_cost:.4f}")

🔧 Custom Model Integration

πŸ› οΈ Creating Custom Model Wrapper ​

python
from langchain_core.language_models.llms import LLM
from langchain_core.callbacks import CallbackManagerForLLMRun
from typing import Any, List, Optional

class CustomLLM(LLM):
    """Custom LLM wrapper example"""
    
    model_name: str = "custom-model"
    temperature: float = 0.7
    
    @property
    def _llm_type(self) -> str:
        return "custom"
    
    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        """Custom model inference logic"""
        # Implement your model calling logic here
        # This could be an API call, local model inference, etc.
        return f"Custom model response to: {prompt}"
    
    @property
    def _identifying_params(self) -> dict:
        """Get the identifying parameters."""
        return {"model_name": self.model_name, "temperature": self.temperature}

# Use custom model
custom_model = CustomLLM(temperature=0.8)
response = custom_model.invoke("Hello, custom model!")
print(response)

πŸ›‘οΈ Error Handling and Retries ​

🔄 Robust Model Calls

python
import time

class RobustModelCaller:
    def __init__(self, model, max_retries=3):
        self.model = model
        self.max_retries = max_retries
    
    def safe_invoke(self, messages, backoff_factor=2):
        """Invoke model with retry logic"""
        for attempt in range(self.max_retries):
            try:
                return self.model.invoke(messages)
            
            except Exception as e:
                if attempt == self.max_retries - 1:
                    raise e
                
                wait_time = backoff_factor ** attempt
                print(f"Attempt {attempt + 1} failed: {e}")
                print(f"Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
        
        raise Exception("All retry attempts failed")

# Usage
robust_caller = RobustModelCaller(chat)
try:
    response = robust_caller.safe_invoke([
        HumanMessage(content="Explain AI safety")
    ])
    print(response.content)
except Exception as e:
    print(f"Final error: {e}")

🎯 Best Practices

✅ Model Selection Guidelines

  1. Task Complexity
     • Simple tasks: GPT-3.5-turbo
     • Complex reasoning: GPT-4
     • Creative tasks: Higher temperature

  2. Performance Requirements
     • Real-time: GPT-3.5-turbo
     • Accuracy critical: GPT-4
     • Batch processing: Consider cost vs. speed

  3. Privacy & Security
     • Sensitive data: Local models
     • Public data: Cloud models OK
     • Compliance: Check provider terms

  4. Cost Optimization
     • Cache frequent queries
     • Use appropriate model sizes
     • Monitor token usage

🔒 Security Considerations

python
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Secure API key management
def get_secure_model():
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("API key not found in environment variables")
    
    return ChatOpenAI(
        api_key=api_key,
        model="gpt-3.5-turbo",
        temperature=0.7
    )

# Input sanitization
def sanitize_input(user_input: str) -> str:
    """Basic input sanitization"""
    # Remove potentially harmful content
    cleaned = user_input.replace("<!--", "").replace("-->", "")
    # Limit length
    return cleaned[:1000]

# Safe model usage
def safe_model_call(user_input: str):
    cleaned_input = sanitize_input(user_input)
    model = get_secure_model()
    
    try:
        response = model.invoke([HumanMessage(content=cleaned_input)])
        return response.content
    except Exception as e:
        return f"Error processing request: {str(e)}"

🔗 Next Steps

Ready to dive deeper into LangChain models? Keep these takeaways in mind as you continue:


Key Takeaways:

  • Three model types: LLMs (completion), Chat (conversation), Embeddings (vectors)
  • Unified interface: Switch between providers easily
  • Configuration matters: Temperature, tokens, and other parameters affect output
  • Performance optimization: Use caching, batching, and appropriate model selection
  • Security first: Protect API keys and sanitize inputs
