Lite
Defaultmake demo API → Redis Lua. No Flink. Side projects & <100K eps.
- <10ms budget check
- Rollup worker
- Stripe Meters export
Open source · v2.5.0
Call GET /budget/{id}/check before every LLM request — sub-second enforcement, self-hostable, 1M+ events/sec.
git clone https://github.com/10kshuaizhang/fluxmeter.git
cd fluxmeter
make demoToken usage can run away much faster than traditional metering can react.
The problem
An agent loop burned through about $200 of tokens in under a minute because usage was only checked periodically. By the time the system noticed, the budget was already gone.
If your customers prepay for tokens and you need to cut them off the instant they run out — not 30 seconds later — you need pre-request enforcement, not batch analytics.
The FluxMeter answer
Check budget before each LLM call. Ingest usage after. For streaming, reserve an estimate, then reconcile when the stream ends.
available = balance − held
Effective balance accounts for in-flight streaming holds — so customers are stopped before the next call, not after a delayed batch query catches up.
Pre-request check blocks spend before tokens burn. Post-window deduction keeps balances accurate.
| Layer | Latency | What it does |
|---|---|---|
| Pre-request check | <10ms | GET /budget/{id}/check — blocks before tokens burn |
| Post-window deduction | 10–15s | Flink aggregates → atomic Lua deduction → kill signals |
# Set $50 budget, alert at $5 remaining
curl -X POST localhost:8000/budget/cust_123 \
-H 'Content-Type: application/json' \
-d '{"balance_usd": 50.0, "alert_threshold_usd": 5.0, "max_rpm": 100}'
# Pre-request check — call BEFORE every LLM request
curl "localhost:8000/budget/cust_123/check?estimated_cost_usd=0.05"
# → {"allowed": true, "balance_usd": 47.23, ...}
# → {"allowed": false, "reason": "budget_exhausted"}Start with Lite in one minute. Scale to Flink when volume demands it.
make demo API → Redis Lua. No Flink. Side projects & <100K eps.
make demo-full Kafka → Flink → Redis. 100K–1M eps with span attribution.
make start-saas Control plane on :8001. Tenant CRUD + plan limits scaffold.
Python SDK, HTTP API, or direct Kafka — pick what fits your stack.
from fluxmeter import FluxMeter
meter = FluxMeter(kafka_brokers="localhost:9094")
meter.track_openai("cust_123", openai_response, latency_ms=1200)Full API reference on GitHub docs · OpenAPI at spec/openapi/openapi.yaml
Incremental aggregation, atomic Lua deduction, exactly-once sinks.
[Your App] → [Kafka] → [Flink: aggregation] → [Redis] → [API]
│ │ │ │
SDK/HTTP budget-alerts keyed by Budget check
ingest ← kill signals (customer,model) (3-layer cache)
10s windows O(keys) memory, not O(events)
Microdollar precision — no float drift
SHA-256 + SET NX — no double-billing on replay
Cache → Redis → fail policy — never blocks hot path
Design rationale in docs/DESIGN.md
make load-test runs staged benchmarks from 10K to 1M events/sec.
| Environment | 10K eps | 50K eps | 500K+ target |
|---|---|---|---|
| Local docker-compose | ~9K avg / ~18K peak | ~49K avg / ~92K peak | ~40–45K avg |
| Reference cluster (2 TM) | Stable | Stable | 500K indefinite; 1M bursts |
Methodology and profiles in docs/load-testing.md
Spec and SDKs are the product surface. Engine is the reference implementation.
| Layer | Path | Purpose |
|---|---|---|
| Spec | spec/ | Event schema, OpenAPI, semantic conventions |
| SDKs | sdk/ | Python (PyPI) + JS clients |
| Community | contrib/ | Provider mappings, pricing, connectors |
| Engine | src/ | Flink reference implementation |