
TradeLoop Research: One Gateway, Three Engines, No Per-User Keys

TradeLoop Engineering · April 29, 2026 · 7 min read

The Problem with "Research" as an Agent Tool

Serious due diligence on a ticker needs three different research modes:

Synthesized web research with citations (Perplexity). Best when the question requires pulling an answer together from many sources ("what's the bull and bear case on this name?").
Page extraction / structured crawl (Firecrawl). Best when you want clean markdown out of a specific filing page, investor-relations site, or a list of URLs.
Real-time social search (Grok over X). Best when freshness matters: what FinTwit has been saying over the last hour.

Each has its own API, billing model, key registration, and rate limits. A naive setup makes the trader wire up all three separately, paste keys, and somehow remember which research action maps to which engine.

What TradeLoop Research Ships

One gateway endpoint per engine, all behind a single Bearer token. Premium-tier users don't register anything: TradeLoop's shared keys are pre-funded and metered against your tier quota. Bring-your-own-key (BYOK) works too, for power users who want the providers to bill them directly.

agent → /v1/perplexity/chat/completions  ┐
        /v1/firecrawl/crawl              ├─→ ai-gateway → upstream
        /v1/grok/chat/completions        ┘   (shared key, quota check)

The gateway is a single Cloudflare Worker (roughly 3,000 lines of TypeScript). Each provider proxy is an isolated module of 80 to 200 lines. Adding a new research engine takes one file, src/providers/<name>.ts, which handles the auth swap and request rewriting.
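To make "auth swap and request rewriting" concrete, here is a minimal sketch of what a provider module could look like. The post does not show the real module interface, so `ProxyRequest`, `UpstreamRequest`, `ProviderModule`, `makeProvider`, and the `https://upstream.example` base URL are all hypothetical names chosen for illustration.

```typescript
// Hypothetical shapes: the real src/providers/<name>.ts interface is not shown in the post.
interface ProxyRequest {
  path: string; // gateway path, e.g. "/v1/grok/chat/completions"
  headers: { [name: string]: string };
  body: string;
}

interface UpstreamRequest {
  url: string;
  headers: { [name: string]: string };
  body: string;
}

interface ProviderModule {
  name: string;
  rewrite(req: ProxyRequest, sharedKey: string): UpstreamRequest;
}

// Build a provider module from its name and an upstream base URL.
function makeProvider(name: string, upstreamBase: string): ProviderModule {
  const prefix = "/v1/" + name + "/";
  return {
    name: name,
    rewrite(req: ProxyRequest, sharedKey: string): UpstreamRequest {
      if (req.path.indexOf(prefix) !== 0) {
        throw new Error("path " + req.path + " does not belong to provider " + name);
      }
      // Request rewriting: strip the gateway prefix, keep the rest of the path.
      const url = upstreamBase + "/" + req.path.slice(prefix.length);
      // Auth swap: drop the caller's own token and attach the provider key.
      const headers: { [name: string]: string } = {};
      for (const key of Object.keys(req.headers)) {
        if (key.toLowerCase() !== "authorization") {
          headers[key.toLowerCase()] = req.headers[key];
        }
      }
      headers["authorization"] = "Bearer " + sharedKey;
      return { url: url, headers: headers, body: req.body };
    },
  };
}

// Placeholder upstream URL, not a real provider endpoint.
const grokProvider = makeProvider("grok", "https://upstream.example");
```

In this sketch the user's JWT never leaves the gateway: the only credential the upstream sees is the shared (or BYOK) provider key passed in as `sharedKey`.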

How Quotas Work

Every request flows through six steps:

1. JWT verify — Supabase user JWT, ES256, 30s clock-skew tolerance, JWKS-based.

2. Tier lookup — single Supabase row read per request. Sub-10ms; a short TTL cache is on the list.

3. Quota precheck — has the user exceeded this provider's tier limit this calendar month?

4. Rate limit — CF Rate Limiting API, key = <user_id>:<provider>, per-tier limiter binding.

5. Forward — request goes upstream with the shared provider key.

6. Record — usage incremented in Supabase usage_counters on success.

Steps 1-4 are shared infra in src/auth.ts + src/quota.ts + src/rate_limit.ts.
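The ordering of the six steps can be sketched as a short-circuiting pipeline. This is an illustration under stated assumptions, not the gateway's code: `RequestContext`, `Outcome`, `runPipeline`, the 401 and 429 status codes, the UTC month bucket, and treating "at the limit" as exceeded are all assumptions. Only the `<user_id>:<provider>` key format and the record-on-success rule come from the list above.

```typescript
// Hypothetical types: none of these names come from the TradeLoop codebase.
type Provider = "perplexity" | "firecrawl" | "grok";

interface RequestContext {
  userId: string | null; // null when JWT verification failed (step 1)
  provider: Provider;
  monthlyLimit: number;  // from the tier lookup (step 2)
  usedThisMonth: number; // current usage counter for this provider
  rateLimited: boolean;  // verdict of the rate limiter (step 4)
  units: number;         // cost of this request in the provider's unit
}

interface Outcome {
  status: number;
  recordedUnits: number; // what step 6 would add to the usage counter
}

// Rate-limit key format from step 4: <user_id>:<provider>.
function rateLimitKey(userId: string, provider: Provider): string {
  return userId + ":" + provider;
}

// Calendar-month bucket for the quota precheck (UTC is an assumption).
function monthBucket(now: Date): string {
  const month = now.getUTCMonth() + 1;
  return now.getUTCFullYear() + "-" + (month < 10 ? "0" : "") + month;
}

// Steps 1-4 reject early; step 5 forwards; step 6 records only on success.
function runPipeline(ctx: RequestContext, forward: () => number): Outcome {
  if (ctx.userId === null) return { status: 401, recordedUnits: 0 };
  if (ctx.usedThisMonth >= ctx.monthlyLimit) return { status: 429, recordedUnits: 0 };
  if (ctx.rateLimited) return { status: 429, recordedUnits: 0 };
  const upstreamStatus = forward();
  const ok = upstreamStatus >= 200 && upstreamStatus < 300;
  return { status: upstreamStatus, recordedUnits: ok ? ctx.units : 0 };
}
```

Because the cheap checks run before the forward, a request that fails any of them costs nothing upstream and nothing against the user's quota.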

Why a Custom Gateway and Not LiteLLM / OpenRouter

LiteLLM and OpenRouter solve the same shape of problem for LLM completions. We considered both and built our own for three reasons:

Per-skill quotas. A user's "embed" usage isn't the same shape as their "perplexity" usage. Model-billing APIs don't generalize cleanly. Our PROVIDERS array and per-provider UNIT_NAME (queries / pages / tokens) keep the accounting honest.

Audit logging. Every call lands in our Tinybird tool_calls_v1 table. We can query "how often did the average user hit Perplexity for due diligence this week?" and tune limits with real data — telemetry that doesn't exist in third-party gateways.

Webhook handling. Firecrawl crawl jobs return results via webhook. The gateway receives, HMAC-verifies, and stores them in Supabase Storage. A pure routing gateway can't do this.
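The HMAC check on incoming webhooks can be sketched as follows. The post does not state the header name or signature encoding, so the `sha256=<hex digest>` format, the function names `signBody` and `verifyWebhook`, and the use of Node-compatible `node:crypto` are assumptions made for illustration.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Hex-encoded HMAC-SHA256 of the raw (unparsed) request body.
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Assumed signature header format: "sha256=<hex digest>".
function verifyWebhook(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string
): boolean {
  const scheme = "sha256=";
  if (!signatureHeader || signatureHeader.indexOf(scheme) !== 0) return false;
  const received = Buffer.from(signatureHeader.slice(scheme.length), "hex");
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Two details matter regardless of the exact header format: the digest is computed over the raw body bytes before any JSON parsing, and the comparison is constant-time so the check does not leak how many leading bytes matched.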

The skill_search Composite

There's a fourth endpoint, /v1/skill-search, that's an interesting case. It takes a query, calls /v1/embed internally for a 1024-d bge-m3 vector, then queries a Tinybird ANN pipe over the skill catalog and returns ranked skill_ids. The daemon's discover tool uses it to pick which skills to surface for a given request. The whole chain — embed query → vector ANN → results — runs in one round-trip, ~200ms p50.
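The ranking stage of that chain can be illustrated with a toy example. This sketch swaps the Tinybird ANN pipe for an exact brute-force cosine-similarity scan, which returns the same kind of ranked list on a small catalog; the `SkillVector` shape, the function names, and the 3-d vectors (standing in for 1024-d bge-m3 embeddings) are hypothetical.

```typescript
// Hypothetical catalog entry: a skill id plus its embedding.
interface SkillVector {
  skillId: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal dimension.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Exact nearest-neighbour search: score every skill, return the top K ids.
function rankSkills(query: number[], catalog: SkillVector[], topK: number): string[] {
  return catalog
    .map((s) => ({ skillId: s.skillId, score: cosine(query, s.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((s) => s.skillId);
}

// Toy 3-d vectors; a real query vector would come from /v1/embed.
const toyCatalog: SkillVector[] = [
  { skillId: "firecrawl", vector: [0, 1, 0] },
  { skillId: "perplexity", vector: [0.9, 0.1, 0] },
  { skillId: "grok", vector: [0.5, 0.5, 0] },
];
const topSkills = rankSkills([1, 0, 0], toyCatalog, 2);
```

A brute-force scan is linear in catalog size, which is why a production path would use an approximate index; the interface (query vector in, ranked skill_ids out) stays the same.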

What This Looks Like as a Trader

Limits are accounted in each provider's natural unit (tokens / pages), not "queries":

| Tier | Perplexity (tokens) | Firecrawl (pages) | Grok (tokens) |
|---------|-------------------|-----------------|-------------|
| Free | 100K | 100 | 50K |
| Premium | 2M | 5K | 1M |
| Ultra | 20M | 50K | 10M |

Embed quota is 6K / 150K / 1.5M per month respectively (cache hits free). Numbers are starting guesses; we adjust based on real usage telemetry. You hit the same code paths whether you're Free or Ultra — the only difference is one row in Supabase.
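As a worked example of the "cache hits free" rule, the sketch below computes remaining embed quota for a month. The three limits come from the paragraph above; the `Tier` and `EmbedCall` types, the function names, and the idea of charging each call a number of `units` are assumptions, since the post does not say what unit the embed quota is counted in.

```typescript
type Tier = "free" | "premium" | "ultra";

// Monthly embed quotas quoted in the post (6K / 150K / 1.5M).
const EMBED_QUOTA: { [T in Tier]: number } = {
  free: 6000,
  premium: 150000,
  ultra: 1500000,
};

// Hypothetical usage record for one /v1/embed call.
interface EmbedCall {
  units: number;     // cost of the call in quota units (unit is an assumption)
  cacheHit: boolean; // true when the response came from the KV cache
}

// Cache hits are free, so only cache misses count against the quota.
function billableUnits(calls: EmbedCall[]): number {
  let total = 0;
  for (const call of calls) {
    if (!call.cacheHit) total += call.units;
  }
  return total;
}

// Remaining quota for the month, floored at zero.
function remainingEmbedQuota(tier: Tier, calls: EmbedCall[]): number {
  return Math.max(0, EMBED_QUOTA[tier] - billableUnits(calls));
}
```

Changing a user's tier in this model only changes which limit is looked up; the accounting path is identical for Free and Ultra.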

Tradeoffs We Made

No real-time stream proxy. Some providers expose SSE for streaming completions. We don't proxy those streams: the agent gets the full response in one piece. That is the tradeoff for simpler auth and quota accounting; we might revisit it when an agent UI wants progressive rendering of a long research answer.

KV cache only on /v1/embed. Other providers' responses aren't deterministic enough to cache safely. Embed is.

No fallback between providers. If Perplexity is down, the gateway returns 502 — it doesn't auto-route to Firecrawl. The agent makes that call; we don't want gateway behavior to be magic.

The gateway is the "infrastructure" layer of TradeLoop's research stack. The skills (/skills/perplexity/main.py etc.) are what the agent sees as MCP tools. The gateway makes those skills usable for deep due diligence without per-user provider accounts.

