AI compute
Analysis
- 2024-03: AI and Memory Wall
Cloud GPU
Cloud Training Compute
Cloud LLM Routers & Inference Providers
- OpenRouter (open and closed models, no Enterprise tier)
- LiteLLM (closed models, Enterprise tier)
- CentML (open models, Enterprise tier)
- Fireworks AI (open models, Enterprise tier)
- Abacus AI (open and closed models, Enterprise tier)
- Portkey (open? and closed models, Enterprise tier)
- Together AI (open models, Enterprise tier)
- Hyperbolic AI (open models, Enterprise tier)
- Huggingface Inference Providers Hub
- AskSage
- Opencode Zen (for coding agents)
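Most of the routers above expose an OpenAI-compatible chat-completions API, so switching providers is largely a matter of changing the base URL and model name. A minimal sketch, assuming OpenRouter's documented endpoint; the model ID is illustrative:

<syntaxhighlight lang="python">
# Minimal sketch: one chat completion through a router's OpenAI-compatible API.
# Uses OpenRouter's documented endpoint; most providers listed above work the
# same way with a different base URL. The model ID below is illustrative.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",  # any model ID the router serves
        "messages": [{"role": "user", "content": "In one sentence, what is the memory wall?"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
</syntaxhighlight>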
Multi-model with Model Selection
Multi-model Web Chat Interfaces
Multi-model Web Playground Interfaces
Local Router
Acceleration Hardware
- Nvidia GPUs
- Google TPU
- Etched: Transformer ASICs
- Cerebras
- Untether AI
- Graphcore
- SambaNova Systems
- Groq
- Tesla Dojo
- Deep Silicon: Combined hardware/software solution for accelerated AI (e.g. ternary math)
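As a toy illustration of the ternary-math idea mentioned above (weights restricted to {-1, 0, +1} times a per-tensor scale), and not Deep Silicon's or any vendor's actual scheme:

<syntaxhighlight lang="python">
# Toy sketch of ternary weight quantization: weights become {-1, 0, +1} times a scale.
# The ~0.7 * mean|w| threshold follows the common Ternary Weight Networks heuristic;
# illustration only, not Deep Silicon's production method.
import numpy as np

def ternarize(w: np.ndarray, threshold: float = 0.7) -> tuple[np.ndarray, float]:
    scale = float(np.mean(np.abs(w)))                         # per-tensor scale factor
    t = np.where(np.abs(w) > threshold * scale, np.sign(w), 0.0)
    return t, scale

w = np.random.randn(4, 4).astype(np.float32)
t, scale = ternarize(w)
print(t)                                                      # entries are -1, 0, or +1
print("mean |w - scale*t| =", float(np.mean(np.abs(w - scale * t))))
</syntaxhighlight>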
Energy Use
- 2021-04: Carbon Emissions and Large Neural Network Training
- 2023-10: From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference
- 2024-01: Electricity 2024: Analysis and forecast to 2026
- 2024-02: The carbon emissions of writing and illustrating are lower for AI than for humans
- 2025-04: Why using ChatGPT is not bad for the environment - a cheat sheet
- A single LLM response uses only ~3 Wh = 11 kJ (~10 Google searches; examples of 3 Wh energy usage)
- Reading an LLM-generated response (a computer running for a few minutes) typically uses more energy than the LLM used to generate the text.
- 2025-07: Mistral: Our contribution to a global environmental standard for AI
- 2025-08: Measuring the environmental impact of delivering AI at Google Scale (blog)
Examples
- LLM query
- 3 kW * 4s = 3.3 Wh = 12 kJ
- Human brain
- 20 W * 8h = 160 Wh
- 20 W * 1h = 20 Wh
- 20 W * 10m = 3.3 Wh = 12 kJ
- Human brain excess thinking
- 2 W * 8h = 16 Wh
- 2 W * 1.7h = 3.4 Wh
- Regular computer
- 200 W * 8h = 1,600 Wh = 5,760 kJ
- 200 W * 1m = 3.3 Wh = 12 kJ
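The figures in the Examples list above are plain power-times-time conversions (1 W = 1 J/s; 1 Wh = 3.6 kJ); a short sketch that reproduces the ~3 Wh comparisons:

<syntaxhighlight lang="python">
# Reproduce the power-times-time conversions from the Examples list above.
def to_wh_kj(power_w: float, seconds: float) -> tuple[float, float]:
    joules = power_w * seconds                 # 1 W = 1 J/s
    return joules / 3600.0, joules / 1000.0    # watt-hours, kilojoules

for label, watts, seconds in [
    ("LLM query (3 kW server, 4 s)", 3000.0, 4.0),
    ("Human brain, 10 min",            20.0, 10 * 60.0),
    ("Regular computer, 1 min",       200.0, 60.0),
    ("Regular computer, 8 h",         200.0, 8 * 3600.0),
]:
    wh, kj = to_wh_kj(watts, seconds)
    print(f"{label}: {wh:,.1f} Wh = {kj:,.0f} kJ")
</syntaxhighlight>

The first three cases each come to about 3.3 Wh (12 kJ), i.e. roughly one LLM query's worth of energy.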
Water Use
- The AI water issue is fake. On the national, local, and personal level. (https://andymasley.substack.com/p/the-ai-water-issue-is-fake)