Monitoring & Observability - LangChain in Production

Learn how to monitor, log, and observe LangChain applications for reliability, performance, and troubleshooting

📈 Monitoring Overview

Monitoring is essential for production LangChain systems to ensure reliability, detect issues, and optimize performance. This guide covers metrics, logging, tracing, alerting, and observability patterns.


📊 Key Metrics to Track

  • LLM Latency: time taken per LLM call (see the instrumentation sketch below)
  • Chain Throughput: requests processed per second
  • Error Rate: share of requests that fail or raise exceptions
  • Resource Usage: CPU, memory, and GPU utilization
  • Vector DB Performance: query latency and index health
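
A minimal instrumentation sketch for the first three metrics, assuming the prometheus_client library (any metrics backend works the same way); run_llm and the metric names are placeholders:

```python
import time

from prometheus_client import Counter, Histogram, start_http_server

# Assumed metric names; adjust to your naming conventions.
LLM_LATENCY = Histogram("llm_latency_seconds", "Time per LLM call")
REQUESTS = Counter("chain_requests_total", "Requests processed")
ERRORS = Counter("chain_errors_total", "Failed requests")

def run_llm(prompt: str) -> str:
    """Placeholder for your real LLM invocation."""
    return "ok"

def timed_llm_call(prompt: str) -> str:
    """Record latency, throughput, and errors around one LLM call."""
    REQUESTS.inc()
    start = time.perf_counter()
    try:
        return run_llm(prompt)
    except Exception:
        ERRORS.inc()
        raise
    finally:
        LLM_LATENCY.observe(time.perf_counter() - start)

start_http_server(8000)  # expose /metrics for Prometheus to scrape
```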

📝 Logging Best Practices

  • Use structured logging (JSON or key-value pairs)
  • Log requests, responses, errors, and performance data such as latency
  • Integrate with log aggregators (ELK, Azure Monitor, AWS CloudWatch)
```python
import json
import logging

# Without a configured handler, INFO records fall through to logging's
# WARNING-level "last resort" handler and are dropped.
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("langchain")

# Structured log example: one JSON object per request
def log_request(request: str, response: str, latency: float) -> None:
    log_entry = {
        "event": "llm_request",
        "request": request,
        "response": response,
        "latency": latency,
    }
    logger.info(json.dumps(log_entry))
```
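
To collect these fields automatically instead of logging by hand, a LangChain callback handler can hook the LLM lifecycle. A sketch assuming langchain_core is installed; it reuses log_request from above:

```python
import time

from langchain_core.callbacks import BaseCallbackHandler

class RequestLogger(BaseCallbackHandler):
    """Log each LLM call as one structured JSON entry via log_request."""

    def on_llm_start(self, serialized, prompts, **kwargs):
        # Note: per-instance state; key by kwargs["run_id"] for concurrent runs.
        self._start = time.perf_counter()
        self._prompt = prompts[0]

    def on_llm_end(self, response, **kwargs):
        latency = time.perf_counter() - self._start
        text = response.generations[0][0].text
        log_request(self._prompt, text, latency)

# Usage: llm.invoke("Hello", config={"callbacks": [RequestLogger()]})
```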

🔍 Distributed Tracing

  • Use tracing tools (OpenTelemetry, Jaeger, Azure Application Insights)
  • Trace LLM calls, retrieval, and chain execution
  • Visualize traces for bottleneck analysis
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Install a provider that batches spans and prints them to stdout;
# swap ConsoleSpanExporter for an OTLP exporter in production.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Example trace: wrap each LLM call in a span
with tracer.start_as_current_span("llm_call"):
    # LLM call logic here
    pass
```
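
Nesting spans around each stage of a chain makes per-stage latency visible for bottleneck analysis. A sketch reusing the tracer configured above; the span and attribute names are illustrative:

```python
# One parent span per chain run, one child span per stage
with tracer.start_as_current_span("rag_chain") as chain_span:
    chain_span.set_attribute("chain.name", "qa")  # example attribute
    with tracer.start_as_current_span("retrieval"):
        pass  # vector store query goes here
    with tracer.start_as_current_span("llm_call"):
        pass  # model invocation goes here
```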

🚨 Alerting & Incident Response

  • Set up alerts for high error rates, elevated latency, and resource exhaustion (a minimal threshold check is sketched below)
  • Use cloud alerting services (Azure Monitor Alerts, AWS CloudWatch Alarms)
  • Automate incident response and escalation
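
Alert rules normally live in the monitoring platform, but the core logic is just a threshold check. A minimal in-process sketch; the 5% threshold and webhook URL are assumptions:

```python
import json
import urllib.request

ERROR_RATE_THRESHOLD = 0.05                 # assumed: alert above 5% errors
WEBHOOK_URL = "https://example.com/alert"   # assumed incident webhook

def check_error_rate(errors: int, total: int) -> None:
    """POST an alert to the incident webhook when the error rate is too high."""
    if total == 0:
        return
    rate = errors / total
    if rate > ERROR_RATE_THRESHOLD:
        payload = json.dumps({"alert": "high_error_rate", "rate": rate}).encode()
        request = urllib.request.Request(
            WEBHOOK_URL,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)
```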

🛠️ Observability Patterns

  • Health checks and readiness probes
  • Real-time dashboards (Grafana, Azure Dashboards)
  • Automated anomaly detection (a rolling-statistics sketch follows)
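
For the last point, a rolling z-score over latency samples is a simple way to flag sudden slowdowns. A sketch, not a production-grade detector:

```python
import statistics
from collections import deque

class LatencyAnomalyDetector:
    """Flag latencies more than k standard deviations above the rolling mean."""

    def __init__(self, window: int = 100, k: float = 3.0):
        self.samples: deque[float] = deque(maxlen=window)
        self.k = k

    def observe(self, latency: float) -> bool:
        is_anomaly = False
        if len(self.samples) >= 10:  # wait for a minimal baseline
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            is_anomaly = stdev > 0 and latency > mean + self.k * stdev
        self.samples.append(latency)
        return is_anomaly
```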

🧩 Example: FastAPI Health Check Endpoint

```python
from fastapi import FastAPI

app = FastAPI()

# Liveness endpoint for load balancers and orchestrators
@app.get("/health")
def health():
    return {"status": "ok"}
```

Key Monitoring Takeaways:

  • Track latency, throughput, errors, and resource usage
  • Use structured logging and distributed tracing
  • Set up alerts and automate incident response
  • Build dashboards for real-time observability
  • Continuously improve monitoring coverage
