Model Providers - OpenAI, Anthropic, Hugging Face & Local Models
Complete guide to integrating different AI model providers with LangChain - compare features, setup, and choose the best provider for your needs
LangChain Provider Ecosystem
LangChain supports dozens of model providers, giving you flexibility to choose the best models for your specific needs, budget, and requirements.
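Every chat model in LangChain exposes the same message-based invoke interface, so swapping providers is usually a one-line change. A minimal sketch (assuming the relevant API keys are already set as environment variables):
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

messages = [HumanMessage(content="Summarize the benefits of unit testing.")]

# Same messages, same .invoke() call - only the model class changes
openai_model = ChatOpenAI(model="gpt-3.5-turbo")
claude_model = ChatAnthropic(model="claude-3-haiku-20240307")

print(openai_model.invoke(messages).content)
print(claude_model.invoke(messages).content)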
Provider Categories
Cloud providers (API-based, scalable):
- OpenAI - GPT-4/3.5, most popular, great documentation
- Anthropic Claude - safety-focused, long context
- Google Gemini - multimodal, free tier, Google integration
Local providers (privacy, control, cost):
- Ollama - easy setup, no API costs, private by default
- Hugging Face - model hub, open source, large community
- Transformers - direct integration, full control, custom training
OpenAI - The Standard Bearer
OpenAI provides the most popular and well-documented models, making it the go-to choice for most applications.
Setup and Configuration
# Installation
pip install langchain-openai
import os
from langchain_openai import ChatOpenAI, OpenAI, OpenAIEmbeddings
# Set API key (recommended: use environment variables)
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
# Chat model (recommended for most use cases)
chat_model = ChatOpenAI(
model="gpt-4",
temperature=0.7,
max_tokens=1000,
api_key=os.getenv("OPENAI_API_KEY")
)
# Completion model (for simple text generation)
completion_model = OpenAI(
model="gpt-3.5-turbo-instruct",
temperature=0.7
)
# Embeddings model
embeddings = OpenAIEmbeddings(
model="text-embedding-ada-002"
)
OpenAI Model Options
| Model | Best For | Context Length | Cost (per 1K tokens) |
|---|---|---|---|
| GPT-4 | Complex reasoning, accuracy | 8K-32K | $0.03-0.06 |
| GPT-3.5-turbo | General tasks, speed | 16K | $0.001-0.002 |
| GPT-4-turbo | Long context, multimodal | 128K | $0.01-0.03 |
| text-embedding-ada-002 | Embeddings | 8K | $0.0001 |
OpenAI Advanced Features
# Function calling (tool use)
from langchain_core.tools import tool
@tool
def calculator(expression: str) -> str:
"""Calculate mathematical expressions safely"""
try:
return str(eval(expression))
except:
return "Invalid expression"
# Bind tools to model
chat_with_tools = ChatOpenAI(model="gpt-4").bind_tools([calculator])
# Vision capabilities (GPT-4V)
from langchain_core.messages import HumanMessage
vision_model = ChatOpenAI(model="gpt-4-vision-preview")
response = vision_model.invoke([
HumanMessage(content=[
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
])
])
Cost Optimization for OpenAI
# Cost-aware model selection
class CostOptimizedOpenAI:
def __init__(self):
self.models = {
'cheap': ChatOpenAI(model="gpt-3.5-turbo", max_tokens=500),
'balanced': ChatOpenAI(model="gpt-4", max_tokens=300),
'premium': ChatOpenAI(model="gpt-4", max_tokens=2000)
}
def get_model(self, complexity: str, budget: str):
if budget == 'low':
return self.models['cheap']
elif complexity == 'high':
return self.models['premium']
else:
return self.models['balanced']
optimizer = CostOptimizedOpenAI()
model = optimizer.get_model(complexity='medium', budget='medium')
Anthropic Claude - Safety and Long Context
Anthropic's Claude models excel in safety, nuanced reasoning, and handling very long contexts.
Setup and Configuration
# Installation
pip install langchain-anthropic
import os
from langchain_anthropic import ChatAnthropic
# Set API key
os.environ["ANTHROPIC_API_KEY"] = "your-api-key-here"
# Claude model
claude = ChatAnthropic(
model="claude-3-sonnet-20240229",
temperature=0.7,
max_tokens=1000,
api_key=os.getenv("ANTHROPIC_API_KEY")
)
# Use Claude
from langchain_core.messages import HumanMessage
response = claude.invoke([
HumanMessage(content="Explain the ethical implications of AI development")
])
print(response.content)
Claude Model Comparison
| Model | Context Length | Best For | Relative Cost |
|---|---|---|---|
| Claude-3 Haiku | 200K | Speed, simple tasks | Low |
| Claude-3 Sonnet | 200K | Balanced performance | Medium |
| Claude-3 Opus | 200K | Complex reasoning | High |
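To pick a tier programmatically, a small hypothetical helper can map task complexity to the model IDs from the table above (a sketch; Anthropic retires model versions, so check their current list):
from langchain_anthropic import ChatAnthropic

# Map task complexity to a Claude tier (model IDs as of this writing)
CLAUDE_TIERS = {
    "low": "claude-3-haiku-20240307",      # speed, simple tasks
    "medium": "claude-3-sonnet-20240229",  # balanced performance
    "high": "claude-3-opus-20240229",      # complex reasoning
}

def pick_claude(complexity: str) -> ChatAnthropic:
    model_name = CLAUDE_TIERS.get(complexity, CLAUDE_TIERS["medium"])
    return ChatAnthropic(model=model_name, temperature=0.7)

fast_claude = pick_claude("low")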
Claude Safety Features
# Claude excels at handling sensitive topics safely
safety_prompt = """
Please provide information about AI safety concerns
while being balanced and educational.
"""
safety_response = claude.invoke([HumanMessage(content=safety_prompt)])
print(safety_response.content)
# Long context handling (up to 200K tokens)
long_document = "..." * 10000 # Very long text
long_context_response = claude.invoke([
HumanMessage(content=f"Summarize this document: {long_document}")
])
Google Gemini - Multimodal and Free Tier
Google's Gemini models offer strong performance with generous free tiers and multimodal capabilities.
Setup and Configuration
# Installation
pip install langchain-google-genai
import os
from langchain_google_genai import ChatGoogleGenerativeAI
# Set API key
os.environ["GOOGLE_API_KEY"] = "your-api-key-here"
# Gemini model
gemini = ChatGoogleGenerativeAI(
model="gemini-pro",
temperature=0.7,
convert_system_message_to_human=True # Gemini-specific setting
)
# Use Gemini
response = gemini.invoke([
HumanMessage(content="Explain quantum computing in simple terms")
])
print(response.content)
Gemini Multimodal Capabilities
# Gemini Vision for image analysis
gemini_vision = ChatGoogleGenerativeAI(model="gemini-pro-vision")
# Analyze images (when available)
# response = gemini_vision.invoke([
# HumanMessage(content=[
# {"type": "text", "text": "Describe this image"},
# {"type": "image_url", "image_url": {"url": "image_url_here"}}
# ])
# ])
Gemini Free Tier Benefits
# Free tier monitoring
class GeminiUsageTracker:
def __init__(self):
self.requests_today = 0
        self.daily_limit = 60  # Example quota; check Google's current free-tier limits
def can_make_request(self):
return self.requests_today < self.daily_limit
def make_request(self, prompt):
if not self.can_make_request():
return "Daily limit reached. Please try tomorrow."
self.requests_today += 1
return gemini.invoke([HumanMessage(content=prompt)])
tracker = GeminiUsageTracker()
Local Models - Privacy and Control
Run models locally for complete privacy, customization, and cost control.
Ollama - Easy Local Setup
# Install Ollama
# macOS/Linux: curl -fsSL https://ollama.ai/install.sh | sh
# Windows: Download from ollama.ai
# Pull models
ollama pull llama2
ollama pull codellama
ollama pull mistral
from langchain_community.llms import Ollama
# Initialize local model
local_llm = Ollama(
model="llama2",
temperature=0.7,
num_predict=256, # Max tokens to generate
top_k=40, # Top-k sampling
top_p=0.9 # Nucleus sampling
)
# Use local model
response = local_llm.invoke("Explain the benefits of local AI models")
print(response)
# List available models
# ollama list
Hugging Face Models
# Installation
pip install langchain-huggingface transformers torch
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
# Local Hugging Face model
model_name = "microsoft/DialoGPT-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Create pipeline
pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
max_length=200,
temperature=0.7,
do_sample=True,
device=-1 # Use CPU, set to 0 for GPU
)
# LangChain wrapper
hf_llm = HuggingFacePipeline(pipeline=pipe)
# Use model
response = hf_llm.invoke("The future of AI is")
print(response)
Custom Local Model Integration
from langchain_core.language_models.llms import LLM
from typing import Optional, List, Any
class CustomLocalLLM(LLM):
model_path: str
    def __init__(self, model_path: str, **kwargs: Any):
        # LLM is a Pydantic model, so declared fields must be passed to super().__init__()
        super().__init__(model_path=model_path, **kwargs)
        # Initialize your model here, e.g.:
        # self.model = load_model(model_path)
@property
def _llm_type(self) -> str:
return "custom_local"
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[Any] = None,
**kwargs: Any,
) -> str:
# Your custom inference logic
# return self.model.generate(prompt)
return f"Custom local model response to: {prompt[:50]}..."
# Usage
custom_model = CustomLocalLLM(model_path="/path/to/your/model")
Embedding Providers Comparison
Embedding Provider Options
# OpenAI Embeddings (most popular)
from langchain_openai import OpenAIEmbeddings
openai_embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
# Hugging Face Embeddings (free, local)
from langchain_community.embeddings import HuggingFaceEmbeddings
hf_embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-MiniLM-L6-v2"
)
# Cohere Embeddings (good for search)
from langchain_community.embeddings import CohereEmbeddings
cohere_embeddings = CohereEmbeddings(
model="embed-english-v2.0",
cohere_api_key="your-cohere-key"
)
# Local Sentence Transformers
from langchain_community.embeddings import SentenceTransformerEmbeddings
st_embeddings = SentenceTransformerEmbeddings(
model_name="all-MiniLM-L6-v2"
)
Embedding Performance Comparison
| Provider | Dimension | Languages | Speed | Quality | Cost |
|---|---|---|---|---|---|
| OpenAI ada-002 | 1536 | 100+ | Fast | High | Low |
| Sentence-BERT | 384-768 | 50+ | Fast | Good | Free |
| Cohere | 4096 | 100+ | Fast | High | Medium |
| Local ST | Variable | Variable | Medium | Good | Free |
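Whichever provider you choose, the embeddings interface is the same: embed_query returns a vector you can compare with cosine similarity. A minimal sketch using the local Sentence Transformers model from above (the similarity function is written by hand here to avoid extra dependencies):
import math
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_a = embeddings.embed_query("How do I reset my password?")
vec_b = embeddings.embed_query("Steps to change an account password")
print(f"Similarity: {cosine_similarity(vec_a, vec_b):.3f}")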
Provider Switching and Fallbacks
Multi-Provider Strategy
class MultiProviderLLM:
def __init__(self):
self.providers = {
'primary': ChatOpenAI(model="gpt-4"),
'secondary': ChatAnthropic(model="claude-3-sonnet-20240229"),
'fallback': Ollama(model="llama2")
}
self.current_provider = 'primary'
def invoke(self, messages, max_retries=3):
providers_to_try = ['primary', 'secondary', 'fallback']
for provider_name in providers_to_try:
try:
provider = self.providers[provider_name]
return provider.invoke(messages)
except Exception as e:
print(f"{provider_name} failed: {e}")
continue
raise Exception("All providers failed")
# Usage
multi_llm = MultiProviderLLM()
response = multi_llm.invoke([HumanMessage(content="Hello")])
Load Balancing
import random
from typing import List
class LoadBalancedLLM:
def __init__(self, providers: List[Any]):
self.providers = providers
self.usage_count = {i: 0 for i in range(len(providers))}
def invoke(self, messages):
        # Pick the least-used provider to keep usage roughly balanced
provider_idx = min(self.usage_count, key=self.usage_count.get)
provider = self.providers[provider_idx]
try:
response = provider.invoke(messages)
self.usage_count[provider_idx] += 1
return response
except Exception as e:
            # Skip the failed provider for this call and try the others
return self._fallback_invoke(messages, exclude=[provider_idx])
def _fallback_invoke(self, messages, exclude):
available_providers = [
(i, p) for i, p in enumerate(self.providers)
if i not in exclude
]
for idx, provider in available_providers:
try:
return provider.invoke(messages)
            except Exception:
continue
raise Exception("All providers failed")
# Setup load balancer
providers = [
ChatOpenAI(model="gpt-3.5-turbo"),
ChatAnthropic(model="claude-3-haiku-20240307"),
Ollama(model="llama2")
]
load_balancer = LoadBalancedLLM(providers)
Cost Comparison and Optimization
Cost Analysis Tool
class CostAnalyzer:
def __init__(self):
        # Approximate USD per 1K tokens; provider prices change, so verify current rates
        self.pricing = {
'gpt-4': {'input': 0.03, 'output': 0.06},
'gpt-3.5-turbo': {'input': 0.001, 'output': 0.002},
'claude-3-sonnet': {'input': 0.003, 'output': 0.015},
'gemini-pro': {'input': 0.00025, 'output': 0.0005},
'local': {'input': 0.0, 'output': 0.0}
}
def estimate_cost(self, model: str, input_tokens: int, output_tokens: int):
if model not in self.pricing:
return 0.0
input_cost = (input_tokens / 1000) * self.pricing[model]['input']
output_cost = (output_tokens / 1000) * self.pricing[model]['output']
return input_cost + output_cost
def compare_costs(self, input_tokens: int, output_tokens: int):
costs = {}
for model in self.pricing:
costs[model] = self.estimate_cost(model, input_tokens, output_tokens)
return sorted(costs.items(), key=lambda x: x[1])
# Usage
analyzer = CostAnalyzer()
cost_comparison = analyzer.compare_costs(1000, 500)
print("Cost comparison (cheapest first):")
for model, cost in cost_comparison:
print(f"{model}: ${cost:.4f}")π‘οΈ Security and Privacy Considerations β
Provider Security Comparison
| Provider | Data Retention | Privacy Policy | Compliance | Audit Logs |
|---|---|---|---|---|
| OpenAI | 30 days (API) | Public | SOC 2 | Available |
| Anthropic | Not used for training | Strong | SOC 2 | Available |
| Google | Varies by plan | Public | ISO 27001 | Available |
| Local | Full control | Your choice | Your setup | Your logs |
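Whichever cloud provider you use, keep API keys out of source code. A minimal sketch using python-dotenv (an assumed extra dependency, installed with pip install python-dotenv) to load keys from a local .env file that stays out of version control:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# Load OPENAI_API_KEY (and other secrets) from a .env file
load_dotenv()

chat_model = ChatOpenAI(
    model="gpt-3.5-turbo",
    api_key=os.getenv("OPENAI_API_KEY"),
)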
Secure Provider Configuration
class SecureProviderManager:
def __init__(self):
self.secure_configs = {
'openai': {
'timeout': 30,
'max_retries': 3,
'request_timeout': 60
},
'anthropic': {
'timeout': 30,
'max_retries': 3
}
}
def get_secure_model(self, provider: str, sensitive_data: bool = False):
if sensitive_data:
# Force local model for sensitive data
return Ollama(model="llama2")
if provider == 'openai':
return ChatOpenAI(
model="gpt-3.5-turbo",
timeout=self.secure_configs['openai']['timeout'],
max_retries=self.secure_configs['openai']['max_retries']
)
# Add other providers...
return None
secure_manager = SecureProviderManager()
Provider Selection Guide
Decision Matrix
Use this guide to choose the right provider:
def recommend_provider(requirements: dict) -> str:
"""
Recommend provider based on requirements
Args:
requirements: {
'budget': 'low'|'medium'|'high',
'privacy': 'low'|'medium'|'high',
'complexity': 'low'|'medium'|'high',
'speed': 'low'|'medium'|'high',
'accuracy': 'low'|'medium'|'high'
}
"""
if requirements.get('privacy') == 'high':
return 'local (Ollama/HuggingFace)'
if requirements.get('budget') == 'low':
if requirements.get('accuracy') == 'high':
return 'Google Gemini'
else:
return 'local (Ollama)'
if requirements.get('complexity') == 'high':
return 'OpenAI GPT-4'
if requirements.get('speed') == 'high':
return 'OpenAI GPT-3.5-turbo'
# Balanced option
return 'Anthropic Claude'
# Example usage
requirements = {
'budget': 'medium',
'privacy': 'medium',
'complexity': 'high',
'speed': 'medium',
'accuracy': 'high'
}
recommended = recommend_provider(requirements)
print(f"Recommended provider: {recommended}")π Next Steps β
Ready to configure your chosen models? Continue with:
- Model Configuration - Advanced tuning and optimization
- Prompt Templates - Create effective prompts for any provider
- LCEL Basics - Chain models together regardless of provider
Provider Selection Summary:
- OpenAI: Best overall, great docs, most features
- Anthropic: Safety-focused, long context, nuanced reasoning
- Google: Free tier, multimodal, good performance
- Local: Privacy, control, no API costs, customizable
- Mix & Match: Use different providers for different tasks