AI compute

From GISAXS
 
** Reading an LLM-generated response (a computer running for a few minutes) typically uses more energy than the LLM's generation of that text.
 
* 2025-07: Mistral: [https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai Our contribution to a global environmental standard for AI]
 
* 2025-08: [https://services.google.com/fh/files/misc/measuring_the_environmental_impact_of_delivering_ai_at_google_scale.pdf Measuring the environmental impact of delivering AI at Google Scale] ([https://cloud.google.com/blog/products/infrastructure/measuring-the-environmental-impact-of-ai-inference blog])
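The reading-vs-generation claim above can be sanity-checked with a rough calculation. All figures below are illustrative assumptions (laptop power draw, reading time, and per-query generation energy), not measurements from the linked reports:

```python
# Back-of-envelope comparison of reading energy vs. generation energy.
# All numbers are assumptions for illustration only.

laptop_power_w = 50     # assumed laptop draw while the reader has it on
reading_minutes = 3     # assumed time spent reading the response

# Energy used by the reader's computer, in watt-hours
reading_wh = laptop_power_w * reading_minutes / 60

# Assumed per-query generation energy, roughly the sub-watt-hour
# order of magnitude discussed in recent provider disclosures
inference_wh = 0.3

print(f"reading:   {reading_wh:.2f} Wh")
print(f"inference: {inference_wh:.2f} Wh")
print(f"ratio:     {reading_wh / inference_wh:.1f}x")
```

Under these assumptions the reader's computer uses several times the energy of the generation itself, which is the point the bullet above is making; the conclusion is sensitive to the assumed per-query figure.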

Latest revision as of 14:34, 21 August 2025

Cloud GPU

Cloud Training Compute

Cloud LLM Routers & Inference Providers

Multi-model with Model Selection

Multi-model Web Chat Interfaces

Multi-model Web Playground Interfaces

Local Router

Acceleration Hardware

Energy Use