China's AI Upstart Moonshot Stuns Silicon Valley Again With a $4.6 Million Wonder
Oops.
Moonshot AI's open-source Kimi K2 Thinking model claims GPT-5-class benchmarks for a $4.6 million training bill, forcing U.S. rivals to reconsider their moats.
https://entropytown.com/articles/2025-11-07-kimi-k2-thinking/
Moonshot AI, the two-year-old Beijing lab backed by Alibaba, just released Kimi K2 Thinking, an open-source reasoning model it says can match or beat OpenAI's GPT-5 and Anthropic's Claude Sonnet 4.5, while costing just $4.6 million to train. (CNBC, ZDNet) The reported bill, sourced to an anonymous insider quoted by CNBC and amplified by open-source communities, is roughly equivalent to the fully loaded cost of a small Silicon Valley engineering pod. (CNBC, Reddit)
Built on top of the July Kimi K2 release, the new model leans on the same DeepSeek-derived architecture but layers on heavier tool use and autonomous planning. Benchmarks reported by observers show Kimi K2 Thinking hitting GPT-5-class scores on BrowseComp and Humanity's Last Exam, positioning it as the first fully open model to challenge the Western frontier. (Interconnects, X/Twitter)
snip
Why This Moment Matters
Kimi K2 Thinking vaults Moonshot from a long-context specialist to a frontier competitor. The model posts 44.9% on Humanity's Last Exam with tools, compared with GPT-5's 41.7%, and 60.2% on BrowseComp versus GPT-5's 54.9% and Claude Sonnet 4.5's 24.1%. (36Kr, Cybernews, LinkedIn) If the $4.6 million training tab holds up, it signals a dramatic efficiency gap powered by architecture reuse, data curation, and cheaper Chinese compute. Industry watchers still peg U.S. frontier training runs at hundreds of millions to billions of dollars. (Reddit, Gelonghui, Ifeng Tech, ZDNet)
The combination of open-weight access and near-frontier scores is forcing U.S. incumbents to reassess their moats. As one analyst noted after testing the agentic workflows, the old assumption that Chinese labs trailed by years now looks dangerously outdated. (ZDNet, LinkedIn)
Introduction:
Introducing Kimi K2 Thinking
https://moonshotai.github.io/Kimi-K2/thinking.html
Built as a thinking agent, it reasons step by step while using tools, achieving state-of-the-art performance on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks, with major gains in reasoning, agentic search, coding, writing, and general capabilities.
Kimi K2 Thinking can execute up to 200–300 sequential tool calls without human interference, reasoning coherently across hundreds of steps to solve complex problems.
It marks our latest efforts in test-time scaling, by scaling both thinking tokens and tool calling steps.
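To make the agentic pattern concrete, here is a minimal sketch of that kind of loop: the model alternates reasoning with tool requests until it produces a final answer or exhausts a step budget of a few hundred calls. The `call_model` and `run_tool` helpers are hypothetical stand-ins, not Moonshot APIs.

```python
# Minimal sketch of a sequential tool-calling agent loop.
# call_model() and run_tool() are hypothetical placeholders, not Moonshot APIs.

from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class ModelTurn:
    thinking: str                                  # interleaved reasoning tokens
    tool_calls: list = field(default_factory=list)  # tools the model wants to run
    final_answer: str | None = None


def call_model(messages: list) -> ModelTurn:
    """Hypothetical stand-in for one inference step of a thinking model."""
    # A real call would return reasoning content plus zero or more tool calls;
    # this stub simply finishes immediately so the sketch runs end to end.
    return ModelTurn(thinking="(reasoning)", final_answer="(answer)")


def run_tool(call: ToolCall) -> str:
    """Hypothetical tool executor (search, code runner, browser, ...)."""
    return f"result of {call.name}"


def agent_loop(task: str, max_steps: int = 300) -> str | None:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):              # bounded by the step budget
        turn = call_model(messages)
        if not turn.tool_calls:             # model decided it is done
            return turn.final_answer
        for call in turn.tool_calls:        # execute each requested tool
            messages.append({"role": "tool", "name": call.name,
                             "content": run_tool(call)})
    return None                             # budget exhausted without an answer
```

Scaling `max_steps` is the "tool calling steps" axis of test-time scaling described above; the "thinking tokens" axis lives inside each `call_model` step.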
K2 Thinking is now live on kimi.com under the chat mode [1], with its full agentic mode available soon. It is also accessible through the Kimi K2 Thinking API.
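For the API route, a hedged sketch using an OpenAI-compatible client is below. The base URL and the `kimi-k2-thinking` model id are assumptions; check Moonshot's platform documentation for the exact values.

```python
# Hedged sketch of calling the model through an OpenAI-compatible client.
# The base_url and model id are assumptions, not confirmed by this article.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",            # placeholder credential
    base_url="https://api.moonshot.ai/v1",      # assumed endpoint
)

response = client.chat.completions.create(
    model="kimi-k2-thinking",                   # assumed model id
    messages=[{"role": "user", "content": "Summarize today's AI news."}],
)
print(response.choices[0].message.content)
```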
Comparison (meme)

Enormous discussion on Hacker News.
https://news.ycombinator.com/item?id=45836070