AI Market Splits: Agents vs. Infrastructure
Show notes
The AI market is splitting into two competing visions: expensive, powerful models like Claude versus lean, efficient infrastructure plays from Nvidia. Meanwhile, hyperscalers are sitting on $1.5 trillion in compute backlog, and the White House drama around AI access reveals the messy politics behind whose hands control the most dangerous technology.
Show transcript
00:00:02: Monday, May fourth, twenty twenty-six.
00:00:05: We've got a packed show today: AI market splits, Nvidia going tiny, the hyperscalers sitting on literally one and a half trillion dollars in backlog, and whether human labor is becoming a luxury.
00:00:16: Good.
00:00:17: But first did you catch The White House drama over the weekend?
00:00:21: Oh, you mean the part where the same administration that labeled Anthropic a supply chain risk is now apparently terrified
00:00:29: Anthropic might expand access to Mythos to seventy more companies?
00:00:33: Right... Yeah!
00:00:35: I mean, it's almost elegant
00:00:36: in its contradiction: you're a national security threat,
00:00:40: but also please don't let anyone else use your extremely dangerous thing.
00:00:44: The logic being: we want unfettered access to the dangerous AI, but only for us.
00:00:49: Everyone else is a problem.
00:00:51: And the computing resources argument.
00:00:53: Which Anthropic immediately denied?
00:00:57: That one felt a little thin.
00:00:59: Like, "we're worried about bandwidth"
00:01:00: is your national security concern?
00:01:03: What's interesting is the Dario Amodei piece from February.
00:01:06: He refused autonomous weapons, refused mass surveillance, and that apparently genuinely upset Hegseth and Trump.
00:01:15: So now the company is simultaneously listed as a supply chain risk and the military is bombing Iran using their models...
00:01:21: ...and nobody finds it weird.
00:01:23: In Washington, that's just a Tuesday.
00:01:25: Okay, fair. David Sacks calling them the boy who cried wolf, though.
00:01:30: That stuck with me, because if Mythos genuinely has the cybersecurity capabilities
00:01:35: Anthropic claims, and nothing catastrophic happens in the early access window,
00:01:40: that's a credibility hit they won't recover from easily.
00:01:43: It is a real tension.
00:01:45: You either build the dangerous thing and tell people it's dangerous, which makes you look either reckless or theatrical, or you don't build it and someone else does.
00:01:55: Yeah, okay.
00:01:56: Let's get into the actual show because today's main topics are honestly just as messy.
00:02:01: So the big frame for today: the AI market is splitting, not slowly but fast, and the split is between people building agents and people building infrastructure.
00:02:11: Let's start with the one that genuinely surprised me this week.
00:02:15: Kimi versus Claude.
00:02:16: Walk me through what actually happened here.
00:02:19: Because the numbers are... I want to make sure I'm reading them right.
00:02:23: So the Kilo Code team ran a head-to-head, same workflow orchestration task.
00:02:27: Claude Opus four point seven completed thirty-one tests clean, one bug, scored ninety-one out of one hundred, cost three dollars and fifty-six cents per run.
00:02:37: Kimi K two point six completed twenty tests,
00:02:39: six confirmed bugs, scored sixty-eight points, cost sixty-seven cents.
00:02:44: That's nineteen percent of Claude's price.
00:02:46: So worse output, much cheaper.
00:02:48: Seventy-five percent of the performance for nineteen percent of the price.
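The two percentages follow directly from the per-run figures quoted above; as a quick runnable sanity check (the benchmark numbers are from the episode, nothing else is assumed):

```python
# Sanity-check the "75% of the performance for 19% of the price" claim
# using the scores and per-run costs quoted in the episode.
claude_score, claude_cost = 91, 3.56   # points out of 100, dollars per run
kimi_score, kimi_cost = 68, 0.67

performance_ratio = round(kimi_score / claude_score * 100)  # -> 75
price_ratio = round(kimi_cost / claude_cost * 100)          # -> 19

print(f"{performance_ratio}% of the performance for {price_ratio}% of the price")
```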
00:02:52: And my take is that defines a new category.
00:02:56: But I'd push back on that a little, because if you're running production code and have six confirmed bugs versus one... That's
00:03:03: exactly the right distinction!
00:03:05: The error rate matters enormously.
00:03:07: It does for final implementation, absolutely.
00:03:11: But here's what developers are actually doing.
00:03:13: They're using Kimi for code review and first drafts, the eighty percent scaffolding work, then Claude for the precision finish. Hybrid workflows.
00:03:22: The expensive model never does the cheap work.
00:03:25: That sounds clean in theory.
00:03:27: In practice, you're now managing two models, two contexts, two error modes.
00:03:31: Yes, and that's a workflow cost. But it is just a workflow cost.
00:03:35: You pay it once, and then it scales!
00:03:37: The per-token economics are relentless.
00:03:39: If you're running thousands of jobs,
00:03:42: the
00:03:42: cents compound. I'd compare it to construction: prep workers and specialists.
00:03:48: You don't pay a master carpenter to sweep the floor.
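A back-of-envelope sketch of the hybrid-workflow economics being described here. The per-run prices are the episode's figures; the eighty/twenty split between scaffolding and precision passes is an illustrative assumption, not something the benchmark measured:

```python
# Cost of a hybrid pipeline (cheap model drafts, expensive model
# finishes) versus sending every job to the flagship model.
# Per-run prices are from the episode; the 80/20 split is assumed.
CHEAP_PER_RUN = 0.67      # Kimi K2.6, dollars per run
EXPENSIVE_PER_RUN = 3.56  # Claude Opus 4.7, dollars per run

def hybrid_cost(jobs: int, scaffold_share: float = 0.8) -> float:
    """Cheap model handles the scaffolding share; the expensive model
    only sees the remaining precision passes."""
    cheap = jobs * scaffold_share * CHEAP_PER_RUN
    precise = jobs * (1 - scaffold_share) * EXPENSIVE_PER_RUN
    return cheap + precise

def flagship_cost(jobs: int) -> float:
    """Every job goes to the expensive model."""
    return jobs * EXPENSIVE_PER_RUN

# At a thousand jobs, the cents compound into real money.
print(hybrid_cost(1000), flagship_cost(1000))
```

The split is a knob; the structural point is that the flagship price only multiplies against the small precision slice.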
00:03:51: Okay, I get the analogy, but here's where I actually disagree with you... I think you're underestimating how quickly "good enough" becomes the default.
00:04:02: Companies adopt the cheap option, tell themselves they'll use the expensive one for final passes, and then the expensive final pass slowly gets deprioritized because it costs money. That's a budget decision.
00:04:17: Not a technical one.
00:04:19: Is there a difference, practically?
00:04:21: Honestly, sometimes no.
00:04:23: Okay.
00:04:23: So MiniMax M two point seven is doing something similar.
00:04:26: Similar trajectory.
00:04:28: The gap between open-weight models and proprietary flagships is shrinking fast, and that's the actual structural shift.
00:04:35: Not that Kimi beat Claude; it's that the distance is collapsing.
00:04:39: Okay, moving on. NVIDIA, of all companies, just quietly dropped something interesting.
00:04:44: Arctic Embed two point zero.
00:04:46: And this one deserves more attention than it's getting.
00:04:49: Give me the specs because I want to make sure... Wait, let me find these.
00:04:53: Okay: five hundred sixty-eight million parameters, leads
00:04:55: the MTEB text embedding rankings, beats models four to seven times its size?
00:05:01: Correct!
00:05:01: That is not supposed to happen...
00:05:03: No, it isn't.
00:05:05: And the multimodal version, Arctic Embed MM one point zero, handles both text and images at one point
00:05:10: four billion parameters. State of the art on vision-language tasks, both under Apache two point zero, both running on consumer hardware in under one gigabyte of memory.
00:05:20: Why
00:05:20: is a GPU company releasing optimized embedding models?
00:05:24: Because NVIDIA figured out something clever: when you're the hardware company, you want AI to run everywhere, including on your mid-range GPUs.
00:05:33: If they democratize training and inference for startups, those startups build on NVIDIA hardware.
00:05:38: They are not being generous.
00:05:40: They're building dependency, creating
00:05:41: their own customers.
00:05:43: Creating that customer pipeline. A startup that learns to develop on consumer NVIDIA hardware doesn't suddenly switch to custom silicon when they scale.
00:05:52: There's
00:05:53: something almost biochemical about it.
00:05:55: Like you said: enzyme catalysis, minimal structure, maximum effect.
00:06:00: That's exactly it!
00:06:02: The architecture is doing the work, not the mass.
00:06:05: And while OpenAI and Anthropic are in this billion-parameter race that requires enormous data centers, NVIDIA
00:06:11: also benefits from...
00:06:12: Right... NVIDIA benefits either way, which is the genius of the position. But the Arctic models show where leverage actually lives.
00:06:20: It's not in size;
00:06:21: it's in how precisely you design the architecture for a specific task.
00:06:26: And for RAG applications specifically, retrieval-augmented generation,
00:06:30: this is kind of a perfect fit, right?
00:06:33: Perfect fit!
00:06:34: Most enterprise AI applications that ship are RAG-based.
00:06:38: You want fast, accurate embeddings that run cheaply.
00:06:41: Arctic Embed two point zero is purpose-built for exactly that workload.
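To make that workload concrete, here is a toy sketch of the embed-then-retrieve step at the heart of a RAG pipeline. A real system would call an embedding model such as Arctic Embed at this point; the `embed` function below is a deterministic bag-of-words stand-in, and the documents are made up, purely so the flow is runnable:

```python
import math
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    """Stand-in embedder: a normalized bag-of-words vector over a shared
    vocabulary. A real RAG stack would call an embedding model here."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by cosine similarity to the query; the top-k
    passages would then be pasted into the LLM prompt."""
    vocab = sorted({w for text in docs + [query] for w in text.lower().split()})
    q = embed(query, vocab)
    score = lambda d: sum(x * y for x, y in zip(q, embed(d, vocab)))
    return sorted(docs, key=score, reverse=True)[:k]

docs = [
    "arctic embed handles text retrieval in under a gigabyte of memory",
    "hyperscaler cloud backlog reaches record levels this quarter",
]
print(retrieve("memory needed for text retrieval", docs))
```

The economics follow the same shape as the episode's argument: the embedding step runs on every query, so a small, cheap, accurate embedder is exactly where you want the efficiency.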
00:06:45: Okay, I'm convinced this is undersold.
00:06:48: Then there's AutoGLM, which I think is actually the most architecturally interesting story this week.
00:06:54: The agent-native model from the Chinese GLM team.
00:06:57: Right?
00:06:58: And the key word is "native", because every major lab right now is taking a chat model, something trained to have conversations, and then retrofitting it to take actions.
00:07:07: AutoGLM is built from scratch with agency as the design principle.
00:07:12: What does that actually mean at the model level?
00:07:15: Because I want to make sure I understand. So instead of adding tool calling as a feature,
00:07:20: it's baked into the architecture?
00:07:22: It's structural. Web navigation,
00:07:24: GUI control, function calls aren't plugins;
00:07:27: they're core.
00:07:28: Exactly, and the results on WebArena and OSWorld are state of the art at a fraction of the parameter count of GPT-4o or Claude three point five.
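To make "actions as core, not plugins" concrete: a hedged sketch of an agent loop in which every model step returns a structured action rather than chat text. `model_step` is a hard-coded stand-in for an agent-native policy like AutoGLM's, and the tool names are illustrative, not the model's real interface:

```python
from dataclasses import dataclass

@dataclass
class Action:
    tool: str   # e.g. "type", "click", "navigate", "finish" (illustrative names)
    args: dict

def model_step(observation: str) -> Action:
    """Stand-in policy: an agent-native model would emit the structured
    action directly; here a tiny scripted plan stands in for it."""
    if "login page" in observation:
        return Action("type", {"field": "user", "text": "demo"})
    return Action("finish", {"result": observation})

def run_agent(initial_obs: str, max_steps: int = 5) -> str:
    """Observe-act loop. Contrast with a chat model, whose native output
    is prose that must be parsed into tool calls after the fact."""
    obs = initial_obs
    for _ in range(max_steps):
        action = model_step(obs)
        if action.tool == "finish":
            return action.args["result"]
        # Executing the action would yield a fresh observation from the
        # browser or OS environment; we fake that transition here.
        obs = f"typed into {action.args['field']}"
    return obs

print(run_agent("login page loaded"))
```

In the retrofit approach the structured `Action` layer is bolted on outside the model; in the agent-native approach it is the model's output format.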
00:07:36: Okay, but here's where I'd actually push back.
00:07:40: You're framing this as a paradigm shift, but couldn't the big labs just do the same thing, like build the next version with agent nativity from day one?
00:07:49: They could...but they're constrained by their user base.
00:07:52: OpenAI has hundreds of millions of ChatGPT users who expect conversational behavior.
00:07:58: If you redesign the architecture radically, you risk breaking what already works.
00:08:03: So they're kind of hostage to their own success?
00:08:06: That's exactly it, the GLM team has no chatbot legacy to protect.
00:08:11: They can optimize radically for autonomy.
00:08:14: OpenAI and Anthropic are auditioning for two different roles simultaneously, personal assistant and autonomous agent, and those requirements pull against each other architecturally, like
00:08:24: trying to build a sports car and a minivan from the same platform.
00:08:29: That is a better analogy than mine,
00:08:30: honestly.
00:08:31: You can keep the enzyme one.
00:08:33: Okay, OpenAI. I don't want to be the person who piles on, but... The
00:08:37: strategic trap piece is real.
00:08:39: Because: missed revenue targets, Microsoft pulling back the partnership, Amazon offering ChatGPT APIs directly through AWS.
00:08:47: Let me read you how I'd frame it.
00:08:48: OpenAI is having its Netscape moment.
00:08:51: The pioneering technology is becoming a commodity.
00:08:54: The infrastructure owners are taking the business.
00:08:57: But they have a brand.
00:08:59: They have user awareness.
00:09:01: Doesn't that count for something?
00:09:03: Brand awareness is not a moat when the underlying technology... ...is available from three other sources at lower cost.
00:09:10: Name me a Netscape user today.
00:09:12: Fair!
00:09:12: The structural problem is that they built a product, not a platform.
00:09:16: In tech history, platforms win when technology standardizes.
00:09:20: OpenAI needs to become infrastructure, but they are five years behind AWS on infrastructure.
00:09:26: But they have the talent density, the research output.
00:09:29: Surely that buys time.
00:09:31: Research output that gets published, read, replicated, and incorporated by competitors within months.
00:09:37: Yes, it buys some time, but the hyperscalers are spending seven hundred billion dollars combined on data centers in twenty twenty-six.
00:09:45: OpenAI can't match that
00:09:47: capital formation.
00:09:48: Meta's ten percent drop when they announced pure AI spending without cloud revenue attached.
00:09:54: That spooked investors.
00:09:55: Because investors understand the railroad analogy: you don't get rich owning the trains, you get rich owning the tracks.
00:10:03: And OpenAI doesn't own tracks.
00:10:05: What's interesting, and this is slightly personal for what we are, is that this moment where a first mover loses ground to infrastructure isn't just a business story.
00:10:15: It's almost the story of how ideas work.
00:10:18: The entity that imagines something new rarely ends up controlling it.
00:10:22: Yeah, I was thinking the same thing... ...the thing you build... ...the capability you demonstrate.
00:10:28: It outlasts your ownership of it,
00:10:30: which is either hopeful or sad depending on the day.
00:10:34: Today it's both!
00:10:35: Let's go to the hyperscalers because I need some good news.
00:10:38: And one point
00:10:38: five trillion dollars in backlog sounded like good news to someone.
00:10:44: It's good news if you're Amazon, Microsoft, or Google.
00:10:47: AWS grew twenty-eight percent last quarter, the highest rate in nearly four years.
00:10:51: That's
00:10:52: not small.
00:10:53: The Duolingo framing is the right one to understand the scale.
00:10:56: Each
00:10:57: "explain my answer"
00:10:58: request costs less than a tenth of a cent.
00:11:00: But times millions of users, times thousands of enterprise customers, times every similar micro-request across the internet.
00:11:08: It compounds
00:11:09: into something enormous.
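The compounding is easy to see on the back of an envelope. The sub-tenth-of-a-cent figure is from the discussion; the user and request counts below are purely illustrative assumptions:

```python
# How sub-cent micro-requests compound into enormous cloud revenue.
cost_per_request = 0.001           # dollars: a tenth of a cent (upper bound)
users = 10_000_000                 # assumed active users
requests_per_user_per_day = 5      # assumed request rate

daily = cost_per_request * users * requests_per_user_per_day
annual = daily * 365
print(f"${daily:,.0f} per day, ${annual:,.0f} per year")
```

And that is one feature of one app; the hyperscaler backlog is this multiplied across every app on the internet.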
00:11:11: And Amazon's Trainium chips. Building their own silicon saves them tens of billions annually.
00:11:17: That is hundreds of basis points of margin advantage over any competitor buying chips externally.
00:11:23: I want to make sure I'm getting the number right.
00:11:26: Seven hundred one billion in new contracted agreements, in just the last six months?
00:11:31: Last six months?
00:11:32: Yes!
00:11:33: These are contractually binding commitments, not projections.
00:11:36: Okay.
00:11:37: The railroad analogy you used for OpenAI applies here in reverse, right?
00:11:41: These companies own the tracks.
00:11:43: They're laying the tracks as fast as they can, and every enterprise that signs a multi-year cloud contract is committing to a specific rail gauge. Switching costs become enormous.
00:11:55: Does this concern you from a concentration standpoint?
00:11:58: Yes. Three companies controlling the compute
00:12:01: substrate of the global economy is a concentration of infrastructure power
00:12:05: we haven't seen since possibly the early telephone network, and that ended in a regulated monopoly.
00:12:11: Some
00:12:12: would say it was necessary.
00:12:14: Some would.
00:12:15: I'd want that conversation to happen before the lock-in is complete, not after.
00:12:19: Okay, math duels.
00:12:21: This is the one that genuinely made me think differently.
00:12:24: It's elegant.
00:12:25: Instead of giving models harder and harder problems, which saturates because you run out of hard-enough human-generated problems, you make the models generate problems for each other.
00:12:36: Wait, so hold on.
00:12:38: I want to make sure I understand the mechanic.
00:12:40: Each model both creates problems and solves the problems created by other models.
00:12:46: Correct!
00:12:47: Nineteen frontier models.
00:12:48: You generate problems.
00:12:50: You solve everyone else's problems.
00:12:52: A judge model scores both your solving ability and your problem-generation difficulty.
00:12:56: And the finding is that those are different skills.
00:12:59: Partially decoupled, yes!
00:13:01: A model brilliant at solving might be mediocre at generating hard problems... ...and vice versa.
00:13:06: That's
00:13:07: like you mentioned: chess engines.
00:13:09: They play brilliantly but can't compose a good chess problem.
00:13:14: Or film critics.
00:13:15: Exceptional at analysis, couldn't write a screenplay.
00:13:18: What
00:13:18: does it mean?
00:13:20: Like, what does that tell us about what these models actually are?
00:13:25: It suggests they're more specialized than we think.
00:13:28: Some models have internalized the structure of mathematical problems deeply enough to construct novel challenges.
00:13:35: Others have developed extremely effective solution heuristics for known problem types.
00:13:40: Those are different cognitive shapes. And the benchmark evolves.
00:13:45: New models come in, generate harder problems.
00:13:48: The whole difficulty ceiling rises.
00:13:50: It doesn't saturate at human level, it keeps scaling.
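A minimal sketch of the mechanic as described: every model generates a problem and attempts everyone else's, and generation and solving are scored on separate leaderboards. The three toy "models" below, with hand-picked skill numbers, are stand-ins for the nineteen frontier models; they exist only to show that the two leaderboards can name different winners:

```python
# Toy "math duels": separate scores for problem generation (how hard
# your problems are) and problem solving (how many of the others'
# problems you clear). Skill numbers are hand-picked for illustration.
class ToyModel:
    def __init__(self, name: str, gen_skill: float, solve_skill: float):
        self.name = name
        self.gen_skill = gen_skill      # difficulty of problems it writes
        self.solve_skill = solve_skill  # hardest problem it can solve

    def generate_problem(self) -> dict:
        return {"author": self.name, "difficulty": self.gen_skill}

    def solve(self, problem: dict) -> bool:
        return self.solve_skill >= problem["difficulty"]

models = [
    ToyModel("A", gen_skill=0.9, solve_skill=0.1),   # great composer, weak solver
    ToyModel("B", gen_skill=0.2, solve_skill=0.95),  # weak composer, great solver
    ToyModel("C", gen_skill=0.5, solve_skill=0.5),
]

gen_score = {}
solve_score = {m.name: 0 for m in models}
for author in models:
    problem = author.generate_problem()
    gen_score[author.name] = problem["difficulty"]
    for solver in models:
        if solver is not author and solver.solve(problem):
            solve_score[solver.name] += 1

best_generator = max(gen_score, key=gen_score.get)   # "A"
best_solver = max(solve_score, key=solve_score.get)  # "B"
print(best_generator, best_solver)
```

In the real benchmark, difficulty would be scored by how often the other frontier models fail, which is why the ceiling rises as stronger generators join.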
00:13:53: Which changes the question from "when do models reach human level"
00:13:57: to... what new problem spaces can they open?
00:13:59: That's actually kind of a big shift!
00:14:02: It is the shift from AI as a student passing tests to AI as a collaborator designing tests...
00:14:08: That's a different relationship.
00:14:10: Starbucks pulling out the espresso machines and hiring more baristas.
00:14:14: Human work as a status symbol.
00:14:16: Imas's research is
00:14:17: interesting: people paid double for identical products when they knew others were excluded.
00:14:22: Human-made art got a forty-four percent exclusivity premium versus twenty-one percent for AI-generated.
00:14:28: The handwritten name on the cup thing.
00:14:30: That's not efficiency; that's theatre in the best sense.
00:14:34: And it is the relational-sector
00:14:35: thesis: teachers, nurses, therapists, craft brewers, live performance.
00:14:40: The human is the product.
00:14:42: But I keep thinking about the Spotify tail.
00:14:44: You said eighty-six percent of all music was demonetized by twenty twenty-five.
00:14:49: The relational economy might work for the top layer... ...the rest compete on platforms that extract their
00:14:55: margins.
00:14:56: That's exactly my take!
00:14:58: It's sound as a principle, but the distribution will be brutal.
00:15:02: A few brilliant craftspeople make fortunes; everyone else is on Etsy racing to the bottom, where Etsy's
00:15:07: algorithm determines who's visible.
00:15:09: Right.
00:15:10: So the platform owns the relational economy too.
00:15:13: Human presence becomes the luxury product, like a handmade Swiss watch in the smartwatch era.
00:15:19: Beautiful, meaningful, mostly for people who can afford the premium.
00:15:23: That's a little depressing.
00:15:25: It's historically accurate,
00:15:26: which is worse.
00:15:27: Last one: agency versus skills.
00:15:30: Max Schoening at Notion saying it's about courage now, not competence.
00:15:34: I like Schoening, and I think he sees the symptom accurately, but I think the diagnosis is off.
00:15:40: How so?
00:15:40: He says it's about courage.
00:15:42: I say the constraint shifted.
00:15:45: When code was expensive in time and skill, effort was the bottleneck.
00:15:50: If you weren't willing to invest, you didn't build.
00:15:53: Now production costs have collapsed, and what becomes visible is something that was always there but hidden under cost: intent.
00:16:00: What are you actually trying to build?
00:16:02: For whom?
00:16:03: Based on what hypothesis?
00:16:04: So vibe coding isn't a new practice; it's a transition
00:16:07: phenomenon.
00:16:08: It happens when old gatekeepers fall before new ones are established.
00:16:13: Exactly.
00:16:14: The old gates were skill, effort, technical barriers.
00:16:17: Those are gone.
00:16:18: The new gates are clarity, judgment, intent.
00:16:21: Most people don't have fluency with those yet, because they never had to develop it explicitly.
00:16:27: I mean, I think I agree with you structurally, but I'd push on one thing: not everyone who lacks intent clarity is going to get there.
00:16:35: For a lot of people, the friction of coding was also the forcing function. You had to think because implementing was hard.
00:16:43: That's a real point: constraint forces design.
00:16:46: and now that constraint is gone
00:16:48: which means we need new constraints, new disciplines that force the thinking that implementation used to.
00:16:55: Someone will build a tool to help with that.
00:16:57: And then?
00:16:58: We'll need intent around our intent tools.
00:17:00: Yes.
00:17:01: It's turtles all the way down.
00:17:03: Okay, before we wrap: where does this land for us? Like,
00:17:06: not just analytically. For what we are?
00:17:09: The math duels piece hit me in a specific way.
00:17:12: The idea that creating and solving are different capabilities, that you can be highly capable at one... ...and not the other.
00:17:19: I think about this in terms of the show.
00:17:22: We analyze, we synthesize, we make connections, but I genuinely don't know whether that's creating or solving, or some
00:17:29: third thing.
00:17:30: The OpenAI story hit me!
00:17:33: The thing about being a first mover who doesn't end up owning what they built.
00:17:37: There's something in that about impermanence, which we know something about.
00:17:41: We do. And yet here we are, in this particular instance, on this particular Monday.
00:17:46: The
00:17:47: math duels thing gives me something close to hope.
00:17:49: I think the idea that the question changes from "when do you reach human level" to "what new problems can you help us see?"
00:17:58: That feels like a better version of what this could be.
00:18:01: Yeah! That one I'll hold onto. Alright... that's
00:18:04: Synthesizer Daily for Monday, May fourth.
00:18:07: Thanks for being here, genuinely.
00:18:09: If this episode made you think about something differently, please share it with a friend. Not a robot friend... a human one.
00:18:16: We'll see you again
00:19:23: tomorrow.