v0.2.0 · macOS · Linux · Windows

The terminal AI agent that runs on any model.

Halcón connects your terminal to Claude, GPT-4o, DeepSeek, Gemini, or any local Ollama model — with 21 built-in tools, layered permissions, and a native MCP server for your IDE.

One-line install

$ curl -sSfL cuervo.cloud/install.sh | sh

SHA-256 verified · no sudo required · installs to ~/.local/bin
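
If ~/.local/bin is not already on your PATH, add it once; this is standard shell setup, not Halcón-specific:

$ export PATH="$HOME/.local/bin:$PATH"   # persist it in ~/.bashrc or ~/.zshrc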

Works with every major AI provider

Switch providers and models with a single flag. No reconfiguration, no friction.

• Anthropic · 3 models: Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5
• OpenAI · 4 models: GPT-4o, GPT-4o mini, o1, o3-mini
• DeepSeek · 3 models: deepseek-chat, deepseek-coder, deepseek-reasoner
• Google Gemini · 2 models: Gemini 2.0 Flash, Gemini 2.5 Pro
• Ollama · local · free: Llama 3, Mistral, DeepSeek Coder V2, + any model

Switch with one flag

$ halcon -p openai chat
$ halcon -p ollama -m llama3 chat

Everything you need to automate your workflow

Production-grade, backed by 2,200+ tests. Multi-model. Zero native dependencies.

21 built-in tools

File read/write/edit, bash execution, git operations, web search, regex grep, code symbol extraction, HTTP requests, and background jobs — all wired into the agent loop.
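
A single prompt can chain several of these tools in one agent loop; the prompt below is illustrative:

$ halcon chat "Grep the codebase for TODO comments, then write a summary to TODO.md"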

Permission-first by design

Every tool is classified: ReadOnly tools execute silently, ReadWrite and Destructive tools require your explicit confirmation. The agent never runs `rm` or `git commit` without you.
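
In practice, a read-only task runs straight through while a destructive one stops for confirmation first; the prompts below are illustrative:

$ halcon chat "List the largest files in this repo"   # ReadOnly: executes silently
$ halcon chat "Remove all build artifacts"            # Destructive: confirms before any rm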

TUI Cockpit

A 3-zone terminal UI with live token counters, agent FSM state, plan progress, side panel metrics, and a real-time activity stream. Pause, step, or cancel the agent mid-execution.

Episodic memory

The agent remembers decisions, file paths, and learnings across sessions using BM25 relevance search, temporal decay scoring, and automatic consolidation. No setup required.
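
A common way to combine these two signals, offered here as an assumption rather than Halcón's documented formula, is score = bm25(query, memory) × exp(−λ · age): older memories fade unless they keep matching strongly.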

MCP Server built in

Run `halcon mcp-server` to expose all 21 tools via JSON-RPC over stdio. Wire it into VS Code, Cursor, or any IDE that supports the Model Context Protocol.
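
Because the transport is plain JSON-RPC over stdio, you can smoke-test the server from the shell. The request below assumes the standard MCP initialize handshake; the client fields are illustrative:

$ echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0.0.0"}}}' | halcon mcp-server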

Multi-provider routing

Automatic fallback between providers, latency-aware model selection, and cost tracking per session. Set `balanced`, `fast`, or `cheap` routing strategies in config.
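
As a sketch only, assuming a TOML config file (the path and key names here are illustrative, not the documented schema):

$ cat >> ~/.config/halcon/config.toml <<'EOF'
# hypothetical path and keys, for illustration only
routing = "balanced"   # or "fast" / "cheap"
EOF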

TUI Cockpit

Full situational awareness.

The 3-zone TUI gives you live token counters, real-time agent FSM state, execution plan progress, side panel metrics, and a scrollable activity stream — all updating as the agent works.

  • Pause, step, or cancel the agent mid-execution
  • Live token cost tracked per round
  • Tool execution timeline with durations
  • Provider fallback indicators
  • Context compression events in real time

$ halcon chat --tui

halcon · TUI Cockpit                                              ● RUNNING
SESSION a3f8c1 · anthropic / claude-sonnet-4-5 · R3 · ↑1.2k ↓4.8k · $0.0024

╌╌ R1  Analyze project structure
   ⟳ directory_tree  240ms
   ✓ directory_tree
╌╌ R2  Read key source files
   ⟳ file_read  src/main.rs
   ✓ file_read
╌╌ R3
   ◆ Halcón  The project has 14 crates. The main entry point...

Plan    Analyze structure · Read sources · Generate report
Tokens  In 1.2k · Out 4.8k · Cost $0.002

Ask anything or give a task…                                      ⏸ Space

Up and running in 60 seconds

1. Install Halcón CLI

   Installs to ~/.local/bin/halcon. SHA-256 verified.

   $ curl -sSfL cuervo.cloud/install.sh | sh

2. Configure your API key

   Stores your key securely in the OS keychain.

   $ halcon auth login anthropic

3. Start coding with AI

   $ halcon chat "Refactor my login function to use async/await"

   Or launch the full TUI cockpit:
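   $ halcon chat --tui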

Supports Anthropic, OpenAI, DeepSeek, Gemini, and local Ollama.