Language Models - Working with LLMs, Chat Models & Embeddings

Master the foundation of LangChain - integrating and working with different types of language models for your AI applications

🎯 Understanding Language Models in LangChain

Language models are the core intelligence behind your LangChain applications. LangChain provides a unified interface to work with different types of models, making it easy to switch between providers and model types.
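
For example, the same .invoke() call works regardless of provider. Here is a minimal sketch of that unified interface, assuming the langchain-openai package and a local Ollama install are available:

python
# Both models expose the same Runnable interface
from langchain_openai import ChatOpenAI
from langchain_community.llms import Ollama

cloud_model = ChatOpenAI(model="gpt-3.5-turbo")
local_model = Ollama(model="llama2")

# Identical call pattern; only the return type differs
print(cloud_model.invoke("Hello!").content)  # chat model returns a message
print(local_model.invoke("Hello!"))          # completion LLM returns a string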

🤖 Types of Language Models

text
                    🤖 LANGCHAIN MODEL ECOSYSTEM 🤖
                       (Different model types & uses)

    ┌─────────────────────────────────────────────────────────────────┐
    │                    BASE LANGUAGE MODEL                          │
    │                   (Common Interface)                            │
    └─────────────────────┬───────────────────────────────────────────┘
                          │
             ┌────────────┼────────────┐
             │            │            │
    ┌────────▼─────┐ ┌────▼───┐ ┌─────▼────────┐
    │   LLM MODEL  │ │  CHAT  │ │  EMBEDDING   │
    │              │ │ MODEL  │ │   MODEL      │
    │ Text → Text  │ │ Conv   │ │ Text → Vec   │
    │ Completion   │ │ Based  │ │ Similarity   │
    │ Simple I/O   │ │ Roles  │ │ Search       │
    └──────────────┘ └────────┘ └──────────────┘
             │            │            │
             ▼            ▼            ▼
    ┌──────────────────────────────────────────┐
    │               USE CASES                  │
    │                                          │
    │ 📝 Text Generation   💬 Chatbots         │
    │ 📊 Data Analysis     🔍 Q&A Systems      │
    │ 🎨 Creative Writing  🧠 Reasoning        │
    │ 📚 Summarization     🔗 Knowledge Search │
    └──────────────────────────────────────────┘

🔀 LLM Models (Text Completion)

LLM models are traditional completion models: they take a text prompt and return a continuation of it.

πŸ“ Basic LLM Usage ​

python
from langchain_openai import OpenAI
from langchain_community.llms import Ollama

# OpenAI LLM (GPT-3.5 Instruct)
llm = OpenAI(
    model="gpt-3.5-turbo-instruct",
    temperature=0.7,
    max_tokens=500
)

# Simple text completion
response = llm.invoke("The benefits of renewable energy are")
print(response)

# Local LLM with Ollama
local_llm = Ollama(model="llama2")
response = local_llm.invoke("Explain machine learning in simple terms:")
print(response)

βš™οΈ LLM Configuration Options ​

python
# Detailed LLM configuration
llm = OpenAI(
    model="gpt-3.5-turbo-instruct",
    temperature=0.7,        # Creativity (0-2 for OpenAI; higher = more random)
    max_tokens=1000,        # Response length
    top_p=0.9,             # Nucleus sampling
    frequency_penalty=0.1,  # Reduce repetition
    presence_penalty=0.1,   # Encourage new topics
    n=1,                   # Number of responses
    best_of=1,             # Best of N generations
    streaming=True         # Enable streaming
)

🔄 Streaming LLM Responses

python
# Stream responses for better UX
for chunk in llm.stream("Write a story about AI"):
    print(chunk, end="", flush=True)

💬 Chat Models (Conversation-Based)

Chat models operate on structured conversations: lists of messages with roles (system, human, AI).

πŸ—£οΈ Basic Chat Model Usage ​

python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage

# Initialize chat model
chat = ChatOpenAI(
    model="gpt-4",
    temperature=0.7,
    max_tokens=500
)

# Single message
response = chat.invoke([HumanMessage(content="What is LangChain?")])
print(response.content)

# Conversation with system message
messages = [
    SystemMessage(content="You are a helpful Python programming tutor."),
    HumanMessage(content="How do I create a list in Python?")
]
response = chat.invoke(messages)
print(response.content)

💭 Message Types and Roles

python
from langchain_core.messages import (
    SystemMessage,
    HumanMessage, 
    AIMessage,
    FunctionMessage,
    ToolMessage
)

# System message - Sets behavior/context
system_msg = SystemMessage(
    content="You are an expert data scientist. Answer questions concisely with examples."
)

# Human message - User input
human_msg = HumanMessage(
    content="What's the difference between supervised and unsupervised learning?"
)

# AI message - Assistant response (for conversation history)
ai_msg = AIMessage(
    content="Supervised learning uses labeled data, unsupervised finds patterns in unlabeled data."
)

# Complete conversation
conversation = [system_msg, human_msg, ai_msg]

# Continue conversation
new_question = HumanMessage(content="Can you give me examples of each?")
conversation.append(new_question)

response = chat.invoke(conversation)
print(response.content)

🔧 Advanced Chat Features

python
# Chat with tool calling (supported by OpenAI chat models)
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return f"The weather in {location} is sunny, 75Β°F"

# Bind tools to chat model
chat_with_tools = chat.bind_tools([get_weather])

# Use tools in conversation
response = chat_with_tools.invoke([
    HumanMessage(content="What's the weather like in New York?")
])
print(response)
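
The response above typically contains the model's tool call rather than a final answer. Here is a minimal sketch of completing the loop with the ToolMessage type imported earlier, assuming the tool_calls attribute available on AI messages in recent LangChain versions:

python
from langchain_core.messages import ToolMessage

query = HumanMessage(content="What's the weather like in New York?")
ai_msg = chat_with_tools.invoke([query])

if ai_msg.tool_calls:
    call = ai_msg.tool_calls[0]
    result = get_weather.invoke(call["args"])  # execute the tool ourselves

    # Feed the tool result back so the model can answer in natural language
    final = chat_with_tools.invoke([
        query,
        ai_msg,  # the AI message containing the tool call
        ToolMessage(content=result, tool_call_id=call["id"]),
    ])
    print(final.content)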

🎨 Chat Model Configuration

python
# Advanced chat configuration
chat = ChatOpenAI(
    model="gpt-4",
    temperature=0.3,           # Lower for more focused responses
    max_tokens=2000,           # Longer responses
    top_p=0.8,                # Nucleus sampling
    frequency_penalty=0.2,     # Reduce repetition
    presence_penalty=0.1,      # Encourage topic diversity
    model_kwargs={
        "stop": ["\n\n"],      # Stop sequences
        "logit_bias": {},      # Token probability bias
    }
)
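
Chat models stream just like LLMs; the difference is that each chunk is a message object, so you read its content attribute. A small sketch using the chat model from above:

python
# Stream a chat response token-by-token
for chunk in chat.stream("Write a haiku about programming"):
    print(chunk.content, end="", flush=True)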

πŸ” Embedding Models (Vector Representations) ​

Embedding models convert text into numerical vectors for similarity search and semantic understanding.

📊 Basic Embedding Usage

python
from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings

# OpenAI embeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    chunk_size=1000  # Process in chunks
)

# Generate embeddings for single text
text = "LangChain is a framework for building AI applications"
vector = embeddings.embed_query(text)
print(f"Vector dimension: {len(vector)}")
print(f"First 5 values: {vector[:5]}")

# Generate embeddings for multiple documents
documents = [
    "Machine learning is a subset of AI",
    "Natural language processing analyzes text",
    "Deep learning uses neural networks"
]
doc_vectors = embeddings.embed_documents(documents)
print(f"Generated {len(doc_vectors)} document vectors")

🧠 Local Embedding Models

python
# Use local HuggingFace models for privacy
local_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

# Same interface as OpenAI
vector = local_embeddings.embed_query("Local embedding example")
print(f"Local vector dimension: {len(vector)}")

📈 Embedding Similarity Comparison

python
import numpy as np

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors"""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Compare text similarity
text1 = "Python programming language"
text2 = "Python coding and development"
text3 = "Cooking with vegetables"

vec1 = embeddings.embed_query(text1)
vec2 = embeddings.embed_query(text2)
vec3 = embeddings.embed_query(text3)

print(f"Python texts similarity: {cosine_similarity(vec1, vec2):.3f}")
print(f"Python vs cooking similarity: {cosine_similarity(vec1, vec3):.3f}")

🔄 Model Comparison and Selection

📊 Model Comparison Table

| Model Type | Best For | Speed | Cost | Privacy |
| ---------- | -------- | ----- | ---- | ------- |
| GPT-4 | Complex reasoning, accuracy | Slow | High | Cloud |
| GPT-3.5 | General tasks, speed | Fast | Medium | Cloud |
| Claude | Long context, safety | Medium | Medium | Cloud |
| Local LLMs | Privacy, custom domains | Variable | Low | Full |
| Embeddings | Similarity, search | Fast | Low | Configurable |

🎯 Choosing the Right Model

python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.llms import Ollama

class ModelSelector:
    def __init__(self):
        self.models = {
            'reasoning': ChatOpenAI(model="gpt-4", temperature=0.1),
            'creative': ChatOpenAI(model="gpt-4", temperature=0.9),
            'fast': ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7),
            'local': Ollama(model="llama2"),
            'embeddings': OpenAIEmbeddings()
        }
    
    def get_model(self, task_type: str, privacy_required: bool = False):
        """Select appropriate model based on task and requirements"""
        if privacy_required:
            return self.models['local']
        
        if task_type == 'analysis':
            return self.models['reasoning']
        elif task_type == 'creative':
            return self.models['creative']
        elif task_type == 'general':
            return self.models['fast']
        else:
            return self.models['fast']

# Usage
selector = ModelSelector()
analysis_model = selector.get_model('analysis')
private_model = selector.get_model('general', privacy_required=True)

⚡ Performance Optimization

🚀 Caching for Efficiency

python
from langchain.cache import InMemoryCache, SQLiteCache
from langchain.globals import set_llm_cache

# In-memory caching
set_llm_cache(InMemoryCache())

# Persistent caching
# set_llm_cache(SQLiteCache(database_path=".langchain.db"))

# Now all LLM calls are cached
llm = ChatOpenAI()
response1 = llm.invoke("What is Python?")  # API call
response2 = llm.invoke("What is Python?")  # Cached result

🔄 Batch Processing

python
# Process multiple inputs efficiently
async def batch_process():
    chat = ChatOpenAI()
    
    # Batch invoke for multiple queries
    queries = [
        "Explain machine learning",
        "What is deep learning?",
        "Define natural language processing"
    ]
    
    # Convert to messages
    messages_batch = [[HumanMessage(content=q)] for q in queries]
    
    # Batch processing
    responses = await chat.abatch(messages_batch)
    
    for query, response in zip(queries, responses):
        print(f"Q: {query}")
        print(f"A: {response.content}\n")

# Run batch processing
# import asyncio
# asyncio.run(batch_process())
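
If you don't need async, Runnables also expose a synchronous batch() method with the same semantics:

python
# Synchronous alternative: batch() over the same message lists
chat = ChatOpenAI()
queries = ["Explain machine learning", "What is deep learning?"]
responses = chat.batch([[HumanMessage(content=q)] for q in queries])
for r in responses:
    print(r.content)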

📊 Monitoring Model Usage

python
from langchain.callbacks import get_openai_callback

# Track token usage and costs
with get_openai_callback() as cb:
    response = llm.invoke("Explain quantum computing")
    print(f"Tokens used: {cb.total_tokens}")
    print(f"Cost: ${cb.total_cost:.4f}")

🔧 Custom Model Integration

πŸ› οΈ Creating Custom Model Wrapper ​

python
from langchain_core.language_models.llms import LLM
from langchain_core.callbacks import CallbackManagerForLLMRun
from typing import Any, List, Optional

class CustomLLM(LLM):
    """Custom LLM wrapper example"""
    
    model_name: str = "custom-model"
    temperature: float = 0.7
    
    @property
    def _llm_type(self) -> str:
        return "custom"
    
    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        """Custom model inference logic"""
        # Implement your model calling logic here
        # This could be an API call, local model inference, etc.
        return f"Custom model response to: {prompt}"
    
    @property
    def _identifying_params(self) -> dict:
        """Get the identifying parameters."""
        return {"model_name": self.model_name, "temperature": self.temperature}

# Use custom model
custom_model = CustomLLM(temperature=0.8)
response = custom_model.invoke("Hello, custom model!")
print(response)

πŸ›‘οΈ Error Handling and Retries ​

🔄 Robust Model Calls

python
import time

class RobustModelCaller:
    def __init__(self, model, max_retries=3):
        self.model = model
        self.max_retries = max_retries
    
    def safe_invoke(self, messages, backoff_factor=2):
        """Invoke model with retry logic"""
        for attempt in range(self.max_retries):
            try:
                return self.model.invoke(messages)
            
            except Exception as e:
                if attempt == self.max_retries - 1:
                    raise e
                
                wait_time = backoff_factor ** attempt
                print(f"Attempt {attempt + 1} failed: {e}")
                print(f"Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
        
        raise Exception("All retry attempts failed")

# Usage
robust_caller = RobustModelCaller(chat)
try:
    response = robust_caller.safe_invoke([
        HumanMessage(content="Explain AI safety")
    ])
    print(response.content)
except Exception as e:
    print(f"Final error: {e}")

🎯 Best Practices

✅ Model Selection Guidelines

  1. Task Complexity
     • Simple tasks: GPT-3.5-turbo
     • Complex reasoning: GPT-4
     • Creative tasks: Higher temperature

  2. Performance Requirements
     • Real-time: GPT-3.5-turbo
     • Accuracy critical: GPT-4
     • Batch processing: Consider cost vs. speed

  3. Privacy & Security
     • Sensitive data: Local models
     • Public data: Cloud models OK
     • Compliance: Check provider terms

  4. Cost Optimization
     • Cache frequent queries
     • Use appropriate model sizes
     • Monitor token usage

🔒 Security Considerations

python
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Secure API key management
def get_secure_model():
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("API key not found in environment variables")
    
    return ChatOpenAI(
        api_key=api_key,
        model="gpt-3.5-turbo",
        temperature=0.7
    )

# Input sanitization
def sanitize_input(user_input: str) -> str:
    """Basic input sanitization"""
    # Remove potentially harmful content
    cleaned = user_input.replace("<!--", "").replace("-->", "")
    # Limit length
    return cleaned[:1000]

# Safe model usage
def safe_model_call(user_input: str):
    cleaned_input = sanitize_input(user_input)
    model = get_secure_model()
    
    try:
        response = model.invoke([HumanMessage(content=cleaned_input)])
        return response.content
    except Exception as e:
        return f"Error processing request: {str(e)}"

🔗 Next Steps

Ready to dive deeper into LangChain models? Keep these takeaways in mind as you continue:


Key Takeaways:

  • Three model types: LLMs (completion), Chat (conversation), Embeddings (vectors)
  • Unified interface: Switch between providers easily
  • Configuration matters: Temperature, tokens, and other parameters affect output
  • Performance optimization: Use caching, batching, and appropriate model selection
  • Security first: Protect API keys and sanitize inputs
