TradeLoop Research: One Gateway, Three Engines, No Per-User Keys
The Problem with "Research" as an Agent Tool
Serious due diligence on a ticker needs three different research modes:
- Synthesized web research with citations — Perplexity. Best when the question requires pulling an answer across many sources ("what's the bull and bear case on this name?").
- Page extraction / structured crawl — Firecrawl. Best when you want clean markdown out of a specific filing page, investor-relations site, or a list of URLs.
- Real-time social search — Grok over X. Best when freshness matters: what is FinTwit saying in the last hour.
Each has its own API, billing model, key registration, and rate limits. A naive setup makes the trader wire up all three separately, paste keys, and somehow remember which research action maps to which engine.
What TradeLoop Research Ships
A single gateway endpoint per engine, all behind one Bearer token. Premium-tier users don't register anything — TradeLoop's shared keys are pre-funded and metered against your tier quota. BYOK works too, for power users who want to bill providers directly.
agent → /v1/perplexity/chat/completions ┐
        /v1/firecrawl/crawl             ├─→ ai-gateway → upstream
        /v1/grok/chat/completions       ┘   (shared key, quota check)

The gateway is a single Cloudflare Worker (~3000 lines TS). Each provider proxy is an isolated module, 80-200 lines each. Adding a new research engine is one file: src/providers/<name>.ts that handles auth swap and request rewriting.
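To make "auth swap and request rewriting" concrete, here is a minimal sketch of what one provider module might contain. It is not the gateway's actual code: the `ProxyRequest` and `UpstreamRequest` shapes, the `rewritePerplexity` function, and the upstream base URL are illustrative assumptions, and plain objects stand in for the Worker's request types.

```typescript
// Hypothetical sketch of a module like src/providers/perplexity.ts.
// All type and function names here are assumptions for illustration.

interface ProxyRequest {
  path: string;                    // e.g. "/v1/perplexity/chat/completions"
  headers: Record<string, string>; // header names assumed lower-cased
  body: string;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

const GATEWAY_PREFIX = "/v1/perplexity";
const UPSTREAM_BASE = "https://api.perplexity.ai"; // assumed upstream base URL

function rewritePerplexity(req: ProxyRequest, sharedKey: string): UpstreamRequest {
  // Request rewriting: drop the gateway prefix, keep the rest of the path.
  const upstreamPath = req.path.slice(GATEWAY_PREFIX.length);
  return {
    url: UPSTREAM_BASE + upstreamPath,
    // Auth swap: the user's Bearer token is overwritten by the shared provider key.
    headers: { ...req.headers, authorization: `Bearer ${sharedKey}` },
    body: req.body,
  };
}
```

The point of keeping each provider this small is that nothing in it knows about tiers or quotas; those checks happen before the module is called.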
How Quotas Work
Every request flows through the same six steps:
1. JWT verify — Supabase user JWT, ES256, 30s clock-skew tolerance, JWKS-based.
2. Tier lookup — single Supabase row read per request. Sub-10ms; a short TTL cache is on the list.
3. Quota precheck — has the user exceeded this provider's tier limit this calendar month?
4. Rate limit — CF Rate Limiting API, key = <user_id>:<provider>, per-tier limiter binding.
5. Forward — request goes upstream with the shared provider key.
6. Record — usage incremented in Supabase usage_counters on success.
Steps 1-4 are shared infra in src/auth.ts + src/quota.ts + src/rate_limit.ts.
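The six steps above can be sketched as one function with its dependencies injected. This is a simplified, synchronous illustration rather than the Worker's real handler: the `GatewayDeps` interface, the `handleRequest` name, and the 401/429 status codes are assumptions (the source only states that an upstream outage yields a 502), and treating "used >= limit" as exhausted is one possible reading of the quota precheck.

```typescript
type Tier = "free" | "premium" | "ultra";

// Hypothetical dependency surface; in the real gateway these would be async
// calls into Supabase, the Cloudflare rate limiter, and the upstream provider.
interface GatewayDeps {
  verifyJwt(token: string): string | null; // user_id, or null if invalid
  lookupTier(userId: string): Tier;
  usedThisMonth(userId: string, provider: string): number;
  monthlyLimit(tier: Tier, provider: string): number;
  rateLimitOk(key: string, tier: Tier): boolean;
  forward(provider: string, body: string): { ok: boolean; units: number; body: string };
  recordUsage(userId: string, provider: string, units: number): void;
}

interface GatewayResult {
  status: number;
  body: string;
}

function handleRequest(
  deps: GatewayDeps,
  token: string,
  provider: string,
  body: string,
): GatewayResult {
  // 1. JWT verify
  const userId = deps.verifyJwt(token);
  if (userId === null) return { status: 401, body: "invalid token" };

  // 2. Tier lookup
  const tier = deps.lookupTier(userId);

  // 3. Quota precheck against this calendar month's usage
  if (deps.usedThisMonth(userId, provider) >= deps.monthlyLimit(tier, provider)) {
    return { status: 429, body: "monthly quota exhausted" };
  }

  // 4. Rate limit, keyed by <user_id>:<provider>
  if (!deps.rateLimitOk(`${userId}:${provider}`, tier)) {
    return { status: 429, body: "rate limited" };
  }

  // 5. Forward upstream with the shared provider key (hidden inside deps.forward)
  const upstream = deps.forward(provider, body);
  if (!upstream.ok) return { status: 502, body: "upstream error" };

  // 6. Record usage only on success
  deps.recordUsage(userId, provider, upstream.units);
  return { status: 200, body: upstream.body };
}
```

Because usage is recorded only after a successful forward, a failed upstream call never counts against the user's quota.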
Why a Custom Gateway and Not LiteLLM / OpenRouter
LiteLLM and OpenRouter solve the same shape of problem for LLM completions. We considered both and built our own for three reasons:
Per-skill quotas. A user's "embed" usage isn't the same shape as their "perplexity" usage. Model-billing APIs don't generalize cleanly. Our PROVIDERS array and per-provider UNIT_NAME (queries / pages / tokens) keep the accounting honest.
Audit logging. Every call lands in our Tinybird tool_calls_v1 table. We can query "how often did the average user hit Perplexity for due diligence this week?" and tune limits with real data — telemetry that doesn't exist in third-party gateways.
Webhook handling. Firecrawl crawl jobs return results via webhook. The gateway receives, HMAC-verifies, and stores them in Supabase Storage. A pure routing gateway can't do this.
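As an illustration of the per-provider unit accounting mentioned under per-skill quotas, a sketch along these lines shows how a `PROVIDERS` array and `UNIT_NAME` map might drive metering. The names `PROVIDERS` and `UNIT_NAME` come from the text; everything else is assumed, including the `UpstreamSummary` shape, the `unitsConsumed` helper, and the choice to meter embed in queries.

```typescript
const PROVIDERS = ["perplexity", "firecrawl", "grok", "embed"] as const;
type Provider = (typeof PROVIDERS)[number];
type Unit = "tokens" | "pages" | "queries";

// Tokens and pages follow the tier table; "queries" for embed is an assumption.
const UNIT_NAME: Record<Provider, Unit> = {
  perplexity: "tokens",
  firecrawl: "pages",
  grok: "tokens",
  embed: "queries",
};

// Assumed, simplified summary of an upstream response.
interface UpstreamSummary {
  totalTokens?: number;   // reported by chat-style providers
  pagesReturned?: number; // reported by crawl-style providers
}

// Convert one upstream response into the number of units to meter.
function unitsConsumed(provider: Provider, summary: UpstreamSummary): number {
  switch (UNIT_NAME[provider]) {
    case "tokens":
      return summary.totalTokens ?? 0;
    case "pages":
      return summary.pagesReturned ?? 0;
    case "queries":
      return 1; // one unit per request, regardless of payload size
  }
}
```

Keeping the unit in a lookup table means a new provider only has to declare what it is billed in; the metering path itself does not change.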
The skill_search Composite
A fourth endpoint, /v1/skill-search, is a composite rather than a proxy. It takes a query, calls /v1/embed internally for a 1024-d bge-m3 vector, then queries a Tinybird ANN pipe over the skill catalog and returns ranked skill_ids. The daemon's discover tool uses it to pick which skills to surface for a given request. The whole chain (embed query → vector ANN → results) runs in one round trip, at about 200ms p50.
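The embed-then-rank chain can be illustrated with a small in-memory stand-in. This sketch is not the Tinybird pipe: it swaps the ANN index for an exact, brute-force cosine ranking over a toy catalog, and the `SkillVector` type, the `skillSearch` function, and the injected `embedFn` are all hypothetical names.

```typescript
interface SkillVector {
  skillId: string;
  vector: number[]; // 1024-d in the real system; any length works here
}

// Cosine similarity; returns 0 for a zero vector to avoid dividing by zero.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Embed the query, score every skill, return the topK skill_ids by similarity.
function skillSearch(
  query: string,
  embedFn: (text: string) => number[], // stands in for the internal /v1/embed call
  catalog: SkillVector[],
  topK: number,
): string[] {
  const queryVector = embedFn(query);
  return catalog
    .map((s) => ({ skillId: s.skillId, score: cosine(queryVector, s.vector) }))
    .sort((l, r) => r.score - l.score)
    .slice(0, topK)
    .map((s) => s.skillId);
}
```

A brute-force scan like this is linear in catalog size; an approximate index trades a little recall for much lower latency, which is presumably why the real endpoint uses one.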
What This Looks Like as a Trader
Limits are accounted in each provider's natural unit (tokens / pages), not "queries":
| Tier | Perplexity tokens | Firecrawl pages | Grok tokens |
|---------|-------------------|-----------------|-------------|
| Free | 100K | 100 | 50K |
| Premium | 2M | 5K | 1M |
| Ultra | 20M | 50K | 10M |
Embed quota is 6K / 150K / 1.5M per month respectively (cache hits free). Numbers are starting guesses; we adjust based on real usage telemetry. You hit the same code paths whether you're Free or Ultra — the only difference is one row in Supabase.
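Expressed as configuration, the limits above might look like the following sketch. The numbers are copied from the table and the embed sentence, assuming K means 1,000 and M means 1,000,000; the `TIER_LIMITS` constant and the `remainingQuota` helper are hypothetical names, not the gateway's actual identifiers.

```typescript
type Tier = "free" | "premium" | "ultra";
type MeteredProvider = "perplexity" | "firecrawl" | "grok" | "embed";

const K = 1000;    // assumed meaning of "K" in the table
const M = 1000000; // assumed meaning of "M" in the table

// Monthly limits per tier, each in the provider's own unit.
const TIER_LIMITS: Record<Tier, Record<MeteredProvider, number>> = {
  free:    { perplexity: 100 * K, firecrawl: 100,    grok: 50 * K, embed: 6 * K },
  premium: { perplexity: 2 * M,   firecrawl: 5 * K,  grok: 1 * M,  embed: 150 * K },
  ultra:   { perplexity: 20 * M,  firecrawl: 50 * K, grok: 10 * M, embed: 1.5 * M },
};

// Units left this month; never negative, even if usage overshot the limit.
function remainingQuota(
  tier: Tier,
  provider: MeteredProvider,
  usedThisMonth: number,
): number {
  return Math.max(0, TIER_LIMITS[tier][provider] - usedThisMonth);
}
```

Changing a user's tier then changes only which row of the table is read, which matches the point that every tier runs through the same code paths.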
Tradeoffs We Made
No real-time stream proxy. Some providers expose SSE for streaming completions. We don't proxy them: the agent gets the full response, not a stream. We accepted that in exchange for simpler auth and quota accounting, and may revisit it when an agent UI wants progressive rendering of a long research answer.
KV cache only on /v1/embed. Other providers' responses aren't deterministic enough to cache safely. Embed is.
No fallback between providers. If Perplexity is down, the gateway returns 502 — it doesn't auto-route to Firecrawl. The agent makes that call; we don't want gateway behavior to be magic.
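The embed-only cache from the tradeoffs above can be sketched as a thin wrapper around the upstream call. This is an illustration under stated assumptions: a plain object stands in for Workers KV, the raw text is used as the cache key where a real key would more likely be a hash, and `makeCachedEmbed`, `EmbedResult`, and `billedUnits` are hypothetical names. Billing zero units on a hit reflects the "cache hits free" rule.

```typescript
interface EmbedResult {
  vector: number[];
  cacheHit: boolean;
  billedUnits: number; // 0 on a cache hit, 1 otherwise (assumed unit: one query)
}

// Wrap an upstream embed function with a deterministic, text-keyed cache.
function makeCachedEmbed(
  upstreamEmbed: (text: string) => number[],
  cache: Record<string, number[]>, // stand-in for a KV namespace
): (text: string) => EmbedResult {
  return (text: string): EmbedResult => {
    // Same model + same text always yields the same vector, so caching is safe.
    const key = `bge-m3:${text}`;
    const cached = cache[key];
    if (cached !== undefined) {
      return { vector: cached, cacheHit: true, billedUnits: 0 };
    }
    const vector = upstreamEmbed(text);
    cache[key] = vector;
    return { vector, cacheHit: false, billedUnits: 1 };
  };
}
```

The same wrapper would be unsafe around a chat or search endpoint, where two identical requests can legitimately return different answers.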
The gateway is the "infrastructure" layer of TradeLoop's research stack. The skills (/skills/perplexity/main.py etc.) are what the agent sees as MCP tools. The gateway makes those skills usable for deep due diligence without per-user provider accounts.