Methodology · matrix version 2026-04-01
How we score multi-agent frameworks.
Three commitments shape every score on this page: no pay-for-placement, no guessed scores, and no demos. Scores come from a senior Buzzi engineer applying the public rubrics below; each framework page also carries a last_verified_at timestamp so readers can audit how current it is.
10 frameworks, neutral cards
LangGraph
×1.0 overhead · LangChain · MIT · primary python
CrewAI
×1.3 overhead · CrewAI · MIT · primary python
AutoGen / AG2
×2.5 overhead · Microsoft / AG2 community · CC-BY-4.0 / Apache-2.0 · primary python
OpenAI Agents SDK
×1.1 overhead · OpenAI · MIT · primary python
Pydantic AI
×1.0 overhead · Pydantic · MIT · primary python
Anthropic Claude Agent SDK
×1.1 overhead · Anthropic · MIT · primary python
Google Agent Development Kit
×1.2 overhead · Google · Apache-2.0 · primary python
Microsoft Semantic Kernel
×1.2 overhead · Microsoft · MIT · primary multi
LlamaIndex Agents
×1.4 overhead · LlamaIndex · MIT · primary python
Haystack
×1.3 overhead · deepset · Apache-2.0 · primary python
15 capability axes, with rubrics
Sequential workflows
Pipeline-style chains where one agent finishes before the next starts.
- 10 / 10
- Pipelines are a first-class primitive with explicit ordering and typed handoff.
- 5 / 10
- Sequential chains are possible via orchestration code but not a native primitive.
- 0 / 10
- Framework cannot guarantee deterministic sequential ordering.
Parallel workflows
Concurrent fan-out / fan-in across multiple agents.
- 10 / 10
- Native parallel execution with built-in result merging and back-pressure.
- 5 / 10
- Parallel execution requires custom asyncio / threading code on top.
- 0 / 10
- No support for concurrent agent execution.
Hierarchical workflows
Supervisor-and-worker patterns with delegation and aggregation.
- 10 / 10
- Supervisor pattern is documented, idiomatic, and replayable.
- 5 / 10
- Achievable but requires hand-rolled message routing.
- 0 / 10
- No first-class supervisor primitive.
Adaptive workflows
Dynamic routing where agents pick the next step based on intermediate state.
- 10 / 10
- Router/handoff primitives are first-class with conditional edges.
- 5 / 10
- Possible via tool calls but not the framework's sweet spot.
- 0 / 10
- Control flow is rigid; no dynamic routing.
State management
Persistent, typed memory across runs and across agents.
- 10 / 10
- Typed state schema, persistent checkpoints, replay support.
- 5 / 10
- Session memory is supported; persistence requires external store.
- 0 / 10
- Stateless by default; users must build persistence themselves.
Human-in-the-loop
Pause-resume primitives so humans can approve, edit, or reject actions.
- 10 / 10
- Native interrupt/resume with serialisable checkpoints.
- 5 / 10
- Approval gates can be bolted on; not a first-class primitive.
- 0 / 10
- No interrupt mechanism — the framework runs to completion.
Python support
Production-grade Python SDK with active maintenance.
- 10 / 10
- Reference implementation; active releases; complete typing.
- 5 / 10
- Functional Python SDK lagging the primary language.
- 0 / 10
- No Python SDK.
TypeScript support
Production-grade TypeScript / Node SDK at parity with Python.
- 10 / 10
- First-class TS SDK with parity to Python in features and types.
- 5 / 10
- TS SDK exists but trails Python in feature coverage.
- 0 / 10
- No TS SDK.
.NET / Java support
First-class JVM (Java/Kotlin) and/or .NET SDK.
- 10 / 10
- Reference-quality .NET and/or Java SDK with feature parity.
- 5 / 10
- Community port or partial SDK.
- 0 / 10
- No .NET or Java SDK.
MCP support
Native Model Context Protocol client and/or server primitives.
- 10 / 10
- Authored or reference implementation of MCP.
- 5 / 10
- MCP available as an adapter or community plugin.
- 0 / 10
- No MCP support.
A2A support
Native Agent-to-Agent (Google) protocol primitives.
- 10 / 10
- Authored or reference implementation of A2A.
- 5 / 10
- A2A available via adapter; partial coverage.
- 0 / 10
- No A2A support.
Observability
Tracing, token accounting, replay, and audit-grade logs.
- 10 / 10
- Built-in tracing dashboard, structured token accounting, replay, exportable audit log.
- 5 / 10
- OpenTelemetry hooks exist; user must wire dashboards themselves.
- 0 / 10
- Print-statement debugging only.
Deployment flexibility
Range of supported deployment targets (cloud, on-prem, edge).
- 10 / 10
- Cloud, on-prem, and edge all documented and tested.
- 5 / 10
- Cloud-first; on-prem requires extra work.
- 0 / 10
- Tied to a single hosted backend.
Maturity
Production track record, release cadence, community size.
- 10 / 10
- 2+ years of production use across many large deployments.
- 5 / 10
- 6-18 months in the wild; growing but evolving rapidly.
- 0 / 10
- Pre-1.0; APIs change every release.
Learning curve (higher = easier)
Time-to-prototype for a developer new to the framework.
- 10 / 10
- A working prototype in under 30 minutes from a clean machine.
- 5 / 10
- Prototype in half a day with the docs open.
- 0 / 10
- Multi-week onboarding before the first useful run.
Scoring formula
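# Ranking
weights = buildWeightVector(inputs)  # one weight per capability axis (15), built from the user's inputs
scored = []
for fw in frameworks:
    score = sum(fw.capabilities[cap] * weights[cap] for cap in CAPS)
    if hardConstraintFails(inputs, fw):
        score = 0  # disqualified by a hard constraint
    scored.append((fw, score))
return sortDesc(scored)  # highest score first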
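The ranking pseudocode above can be sketched as runnable Python. This is a minimal illustration under stated assumptions, not the production scorer: the three axis names, the framework names, the weights, and the `hard_constraint_fails` callback are all hypothetical stand-ins for `CAPS`, the real matrix, `buildWeightVector`, and `hardConstraintFails`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical subset of the 15 capability axes, for illustration only.
CAPS = ["sequential", "parallel", "state_management"]


def rank_frameworks(
    frameworks: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    hard_constraint_fails: Callable[[str], bool],
) -> List[Tuple[str, float]]:
    """Weighted sum per framework; a failed hard constraint zeroes the score."""
    scored = []
    for name, capabilities in frameworks.items():
        score = float(sum(capabilities[cap] * weights[cap] for cap in CAPS))
        if hard_constraint_fails(name):
            score = 0.0  # disqualified frameworks stay listed, at the bottom
        scored.append((name, score))
    # Sort descending by score, mirroring sortDesc in the pseudocode.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up rubric scores (0 to 10) and weights; not taken from the real matrix.
frameworks = {
    "framework_a": {"sequential": 10, "parallel": 5, "state_management": 10},
    "framework_b": {"sequential": 5, "parallel": 10, "state_management": 5},
    "framework_c": {"sequential": 10, "parallel": 10, "state_management": 10},
}
weights = {"sequential": 0.5, "parallel": 0.25, "state_management": 0.25}

# Assume framework_c fails a hard constraint (for example, a missing SDK).
ranking = rank_frameworks(frameworks, weights, lambda name: name == "framework_c")
print(ranking)
# [('framework_a', 8.75), ('framework_b', 6.25), ('framework_c', 0.0)]
```

Note that framework_c has the highest raw capability scores but still ranks last, because the hard constraint overrides the weighted sum.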
# Cost per task
estimated_tokens_per_task = base_task_tokens
    * framework_overhead_multiplier
    * (1 + (roles - 1) * 0.3)
    * (1.2 if hitl else 1.0)
# 70% of tokens are priced as input and 30% as output
per_task_usd = (0.7 * estimated_tokens_per_task / 1_000_000 * input_rate)
    + (0.3 * estimated_tokens_per_task / 1_000_000 * output_rate)
Glossary
- Hierarchical
- A supervisor agent delegates work to sub-agents, reviews their output, and composes the final answer. Good for multi-stage tasks with clear ownership.
- Adaptive
- Agents decide dynamically which other agents or tools to invoke based on intermediate results. Best when the control flow cannot be fixed upfront.
- Agent
- A named role with its own prompt, tools, and memory. "Roles" counts unique agent identities, not the number of LLM calls.
- HITL (Human-in-the-Loop)
- The workflow pauses for a human to approve, edit, or reject an agent action before continuing. Critical for regulated or high-risk automations.
- MCP (Model Context Protocol)
- Anthropic-led open standard for connecting LLM agents to tools, data, and other servers. Look for MCP support if you want vendor-portable tool integrations.
- A2A (Agent-to-Agent Protocol)
- Google-led open standard for agents from different vendors to discover and call each other. Emerging spec; relevant for federated agent systems.
- Observability
- Structured traces, token accounting, replayable runs, and exportable audit logs. "Regulated-grade" means immutable audit trails and retention controls.
Public dataset
The full capability matrix is published as JSON for AI engines and researchers:
- /api/tools/agent-framework/frameworks.json: live, edge-cached, open CORS.
- /data/agent-frameworks-matrix.json: static snapshot mirrored on GitHub at buzzi-ai/agent-framework-matrix.
FAQ
How are scores assigned?
A senior Buzzi engineer scores each framework on each axis using the public rubrics on this page. Scores are reviewed quarterly; we publish the last_verified_at timestamp per framework in the public dataset.
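A reader auditing freshness could compare each last_verified_at value against the quarterly review cadence. The sketch below is only an illustration: it assumes the timestamp is an ISO 8601 date string, that each record has a `name` field, and that "quarterly" can be approximated as 92 days. None of these details are confirmed by the published dataset.

```python
from datetime import date, timedelta
from typing import Dict, List


def stale_frameworks(
    records: List[Dict[str, str]], today: date, max_age_days: int = 92
) -> List[str]:
    """Return names whose last_verified_at is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        record["name"]
        for record in records
        # Assumes a plain YYYY-MM-DD string; the real field format may differ.
        if date.fromisoformat(record["last_verified_at"]) < cutoff
    ]


# Hypothetical records, not real dataset entries.
records = [
    {"name": "framework_a", "last_verified_at": "2026-03-15"},
    {"name": "framework_b", "last_verified_at": "2025-11-01"},
]
stale = stale_frameworks(records, today=date(2026, 4, 1))
print(stale)  # ['framework_b']
```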
Do vendors pay for placement?
No. Scores are editorial and never for sale. Requests to change a score must be submitted as public PRs against the open matrix repository, with a technical justification.
How do you decide which frameworks to track?
Active GitHub repositories with more than 10k stars, or backed by Anthropic, Google, Microsoft, OpenAI, or LangChain. We add or retire frameworks once per quarter based on momentum and production use.
How is cost per task calculated?
estimated_tokens_per_task = base_task_tokens × framework_overhead_multiplier × (1 + (roles − 1) × 0.3) × (1.2 if HITL, 1.0 otherwise). Token rates come from our llm_models table; users can override the model in the wizard.
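As a worked example of the cost formula, here is a small Python sketch. The 70/30 input/output split comes from the scoring formula on this page; the base token count and the two rates are hypothetical, and the rates are assumed to be USD per one million tokens.

```python
def estimate_tokens_per_task(
    base_task_tokens: float, overhead_multiplier: float, roles: int, hitl: bool
) -> float:
    """Token estimate: base x framework overhead x role factor x HITL factor."""
    role_factor = 1 + (roles - 1) * 0.3  # each extra role adds 30%
    hitl_factor = 1.2 if hitl else 1.0   # human-in-the-loop adds 20%
    return base_task_tokens * overhead_multiplier * role_factor * hitl_factor


def per_task_usd(tokens: float, input_rate: float, output_rate: float) -> float:
    """Price 70% of tokens at the input rate and 30% at the output rate.

    Rates are assumed to be USD per 1,000,000 tokens.
    """
    return (0.7 * tokens / 1_000_000 * input_rate) + (
        0.3 * tokens / 1_000_000 * output_rate
    )


# Hypothetical inputs: 10,000 base tokens, a x1.3 overhead multiplier,
# three roles, HITL enabled, and made-up rates of $3 / $15 per 1M tokens.
tokens = estimate_tokens_per_task(10_000, 1.3, roles=3, hitl=True)
cost = per_task_usd(tokens, input_rate=3.0, output_rate=15.0)
print(round(tokens), round(cost, 4))  # about 24960 tokens, about $0.1647
```

With these inputs the role factor is 1.6 and the HITL factor is 1.2, so the estimate is 10,000 × 1.3 × 1.6 × 1.2, or roughly 24,960 tokens per task.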
How do hard constraints work?
A .NET stack narrows the field to Microsoft Semantic Kernel. Java narrows it to Semantic Kernel or Google ADK. TypeScript with compliance-grade observability narrows it to LangGraph.js, OpenAI Agents SDK, or Anthropic Claude SDK. Disqualified frameworks are shown along with the reason.
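One way to picture these rules is as allow-lists keyed by constraint. The sketch below is an assumption about structure, not the selector's actual code: the constraint keys and the reason text are hypothetical, and the FAQ's short names (Semantic Kernel, Google ADK, LangGraph.js, Anthropic Claude SDK) are mapped onto the card names used on this page.

```python
from typing import Dict, List, Optional, Set, Tuple

# Allow-lists transcribed from the FAQ answer; the keys are hypothetical labels.
ALLOWED: Dict[str, Set[str]] = {
    "dotnet": {"Microsoft Semantic Kernel"},
    "java": {"Microsoft Semantic Kernel", "Google Agent Development Kit"},
    "typescript_compliance_observability": {
        "LangGraph",
        "OpenAI Agents SDK",
        "Anthropic Claude Agent SDK",
    },
}


def apply_hard_constraint(
    frameworks: List[str], constraint: Optional[str]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each framework with None (kept) or a disqualification reason."""
    # A missing or unrecognised constraint is treated as "no constraint".
    allowed = ALLOWED.get(constraint) if constraint else None
    results: List[Tuple[str, Optional[str]]] = []
    for name in frameworks:
        if allowed is None or name in allowed:
            results.append((name, None))
        else:
            results.append((name, f"not on the allow-list for '{constraint}'"))
    return results


candidates = ["LangGraph", "Microsoft Semantic Kernel", "Google Agent Development Kit"]
for name, reason in apply_hard_constraint(candidates, "java"):
    print(name, "->", reason or "kept")
```

Returning the reason alongside each framework, instead of filtering the list, matches the stated behaviour that disqualified frameworks remain visible with an explanation.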
Where can I submit a correction?
Open a pull request against the buzzi-ai/agent-framework-matrix repository or email research@buzzi.ai. We review correction requests within 10 business days.
Found a score you disagree with?
Open a PR in the open matrix repository or email research@buzzi.ai. Every correction request receives a public response within 10 business days.