AI compute

From GISAXS
=Cloud GPU=
* [https://lambdalabs.com/ Lambda]
* [https://vast.ai/ Vast AI]
* [https://lightning.ai/ Lightning AI]
* [https://www.runpod.io/ RunPod]
* [https://hpc-ai.com/ HPC-AI]

=Cloud Training Compute=
* [https://nebius.ai/ Nebius AI]
* [https://glaive.ai/ Glaive AI]

=Cloud LLM Routers & Inference Providers=
* [https://openrouter.ai/ OpenRouter] (open and closed models, no Enterprise tier)
* [https://www.litellm.ai/ LiteLLM] (closed models, Enterprise tier)
* [https://centml.ai/ CentML] (open models, Enterprise tier)
* [https://fireworks.ai/ Fireworks AI] (open models, Enterprise tier)
* [https://abacus.ai/ Abacus AI] (open and closed models, Enterprise tier)
* [https://portkey.ai/ Portkey] (open? and closed models, Enterprise tier)
* [https://www.together.ai/ Together AI] (open models, Enterprise tier)
* [https://hyperbolic.xyz/ Hyperbolic AI] (open models, Enterprise tier)
* Hugging Face [https://huggingface.co/blog/inference-providers Inference Providers Hub]
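Most of the routers and inference providers above expose an OpenAI-compatible chat-completions endpoint, so switching providers usually only means changing the base URL and API key. A minimal stdlib-only sketch, using OpenRouter's documented base URL; the model name and environment variable below are illustrative assumptions, not part of the original list:

```python
import json
import os
import urllib.request

def chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Assumed model slug; any model listed by the provider works the same way.
req = chat_request(
    "https://openrouter.ai/api/v1",
    os.environ.get("OPENROUTER_API_KEY", ""),
    "meta-llama/llama-3.1-8b-instruct",
    "Hello",
)
# Send with urllib.request.urlopen(req) once an API key is set.
```

Pointing the same helper at another provider's OpenAI-compatible base URL (e.g. Together AI or Fireworks AI) is the usual migration path.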

==Multi-model with Model Selection==
* [https://www.notdiamond.ai/ Not Diamond ¬⋄]
* [https://withmartian.com/ Martian]

==Multi-model Web Chat Interfaces==
* [https://simtheory.ai/ SimTheory]
* [https://abacus.ai/ Abacus AI] [https://chatllm.abacus.ai/ ChatLLM]
* [https://poe.com/about Poe]

==Multi-model Web Playground Interfaces==
* [https://www.together.ai/ Together AI]
* [https://hyperbolic.xyz/ Hyperbolic AI]

=Local Router=
* [https://ollama.com/ Ollama]
* [https://github.com/mudler/LocalAI LocalAI]
* [https://github.com/AK391/ai-gradio ai-gradio]: unified model interface (based on [https://www.gradio.app/ gradio])
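Local routers like Ollama serve models over a local REST API, so the same request pattern as the cloud routers applies on localhost. A minimal sketch against Ollama's documented <code>/api/generate</code> endpoint on its default port; the model name "llama3" is an assumption and must already be pulled (<code>ollama pull llama3</code>) before a request is sent:

```python
import json
import urllib.request

def build_generate_request(prompt, model="llama3",
                           host="http://localhost:11434"):
    """Build (but do not send) a request for Ollama's /api/generate endpoint."""
    # stream=False asks for one complete JSON response instead of a stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("Why is the sky blue?")
# With a local Ollama daemon running, fetch the completion with:
#   json.loads(urllib.request.urlopen(req).read())["response"]
```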

=Acceleration Hardware=
* [https://www.nvidia.com/ Nvidia] GPUs
* Google [https://en.wikipedia.org/wiki/Tensor_Processing_Unit TPU]
* [https://www.etched.com/ Etched]: Transformer ASICs
* [https://cerebras.ai/ Cerebras]
* [https://www.untether.ai/ Untether AI]
* [https://www.graphcore.ai/ Graphcore]
* [https://sambanova.ai/ SambaNova Systems]
* [https://groq.com/ Groq]
* Tesla [https://en.wikipedia.org/wiki/Tesla_Dojo Dojo]
* [https://deepsilicon.com/ Deep Silicon]: combined hardware/software solution for accelerated AI ([https://x.com/sdianahu/status/1833186687369023550 e.g.] ternary math)

Latest revision as of 20:25, 5 March 2025
