Error Handling
Error Handling
Section titled “Error Handling”Debug, recover from, and prevent common agent team errors. Includes hooks for quality enforcement and known limitations.
Related skills:
- Orchestrating - Primitives overview and quick reference
- Team Management - Shutdown and cleanup procedures
- Task System - Task status issues
- Messaging - Message debugging
- Spawn Backends - Backend troubleshooting
Common Errors
Section titled “Common Errors”| Error | Cause | Solution |
|---|---|---|
| ”Cannot cleanup with active members” | Teammates still running | Shutdown all teammates first, wait for approval |
| ”Already leading a team” | Team already exists | TeamDelete() first, or use different team name |
| ”Agent not found” | Wrong teammate name | Read config.json for actual names |
| ”Team does not exist” | No team created | Call TeamCreate() first |
| ”team_name is required” | Missing team context | Provide team_name parameter |
| ”Agent type not found” | Invalid subagent_type | Check available agents with proper prefix |
Quality Gate Hooks
Section titled “Quality Gate Hooks”Use hooks to enforce rules when teammates finish work or tasks complete.
TeammateIdle Hook
Section titled “TeammateIdle Hook”Runs when a teammate is about to go idle. Exit with code 2 to send feedback and keep the teammate working.
{ "hooks": { "TeammateIdle": [ { "matcher": "", "hooks": [ { "type": "command", "command": "python3 check_teammate_quality.py" } ] } ] }}Use cases:
- Verify teammate completed all assigned tasks before going idle
- Run linting or tests on teammate’s changes
- Enforce documentation requirements
Exit codes:
0- Allow teammate to go idle normally2- Send feedback to teammate, keep them working
TaskCompleted Hook
Section titled “TaskCompleted Hook”Runs when a task is being marked complete. Exit with code 2 to prevent completion and send feedback.
{ "hooks": { "TaskCompleted": [ { "matcher": "", "hooks": [ { "type": "command", "command": "python3 validate_task_completion.py" } ] } ] }}Use cases:
- Verify tests pass before marking a task complete
- Ensure code quality standards are met
- Validate documentation was updated
Exit codes:
0- Allow task completion2- Prevent completion, send feedback to teammate
Known Limitations
Section titled “Known Limitations”Agent teams are experimental. Current limitations:
-
No session resumption with in-process teammates:
/resumeand/rewinddo not restore in-process teammates. After resuming, the lead may try to message teammates that no longer exist. Tell the lead to spawn new teammates. -
Task status can lag: Teammates sometimes fail to mark tasks as completed, which blocks dependent tasks. Check whether work is done and update status manually, or tell the lead to nudge the teammate.
-
Shutdown can be slow: Teammates finish their current request or tool call before shutting down.
-
One team per session: A lead can only manage one team at a time. Clean up the current team before starting a new one.
-
No nested teams: Teammates cannot spawn their own teams or teammates. Only the lead can manage the team.
-
Lead is fixed: The session that creates the team is the lead for its lifetime. You cannot promote a teammate or transfer leadership.
-
Permissions set at spawn: All teammates start with the lead’s permission mode. You can change individual modes after spawning, but cannot set per-teammate modes at spawn time.
-
Split panes require tmux or iTerm2: Default in-process mode works in any terminal. Split-pane mode isn’t supported in VS Code’s integrated terminal, Windows Terminal, or Ghostty.
Graceful Shutdown Sequence
Section titled “Graceful Shutdown Sequence”See Team Management for the full shutdown procedure. In summary:
// 1. Request shutdown for all teammatesSendMessage({ type: "shutdown_request", recipient: "worker-1", content: "Done" })SendMessage({ type: "shutdown_request", recipient: "worker-2", content: "Done" })
// 2. Wait for shutdown approvals
// 3. Verify no active members
// 4. Only then cleanupTeamDelete()Handling Crashed Teammates
Section titled “Handling Crashed Teammates”Teammates have a 5-minute heartbeat timeout. If a teammate crashes:
- They are automatically marked as inactive after timeout
- Their tasks remain in the task list
- Another teammate can claim their tasks
- Cleanup will work after timeout expires
Recovery Strategies
Section titled “Recovery Strategies”Teammate Stops on Error
Section titled “Teammate Stops on Error”Teammates may stop after encountering errors instead of recovering.
Recovery:
- Check their output using Shift+Up/Down (in-process) or click pane (split mode)
- Give them additional instructions directly
- Or spawn a replacement teammate to continue the work
Lead Starts Implementing Instead of Delegating
Section titled “Lead Starts Implementing Instead of Delegating”The lead sometimes starts doing work itself instead of waiting for teammates.
Recovery: Tell it to wait:
Wait for your teammates to complete their tasks before proceedingOr enable delegate mode to restrict the lead to coordination-only tools.
Lead Shuts Down Prematurely
Section titled “Lead Shuts Down Prematurely”The lead may decide the team is finished before all tasks are complete.
Recovery: Tell it to keep going. You can also tell the lead to wait for teammates to finish before proceeding.
Task Appears Stuck
Section titled “Task Appears Stuck”A task stays in pending even though its dependencies are done.
Recovery:
- Check if the blocking task was actually marked completed
- If work is done but status wasn’t updated, update it manually
- Tell the lead to nudge the teammate
Too Many Permission Prompts
Section titled “Too Many Permission Prompts”Teammate permission requests bubble up to the lead.
Recovery: Pre-approve common operations in your permission settings before spawning teammates.
Orphaned tmux Sessions
Section titled “Orphaned tmux Sessions”A tmux session persists after the team ends.
Recovery:
tmux lstmux kill-session -t <session-name>Debugging Commands
Section titled “Debugging Commands”# Check team configcat ~/.claude/teams/{team}/config.json | jq '.members[] | {name, agentType, backendType}'
# Check teammate inboxescat ~/.claude/teams/{team}/inboxes/{agent}.json | jq '.'
# List all teamsls ~/.claude/teams/
# Check task statescat ~/.claude/tasks/{team}/*.json | jq '{id, subject, status, owner, blockedBy}'
# Watch for new messagestail -f ~/.claude/teams/{team}/inboxes/team-lead.jsonBest Practices for Error Prevention
Section titled “Best Practices for Error Prevention”Handle Worker Failures
Section titled “Handle Worker Failures”- Workers have 5-minute heartbeat timeout
- Tasks of crashed workers can be reclaimed
- Build retry logic into worker prompts
Avoid File Conflicts
Section titled “Avoid File Conflicts”Two teammates editing the same file leads to overwrites. Break work so each teammate owns a different set of files.
Monitor and Steer
Section titled “Monitor and Steer”Check in on teammate progress, redirect approaches that aren’t working, and synthesize findings as they come in. Letting a team run unattended too long increases risk of wasted effort.