AI Market Splits: Agents vs. Infrastructure
Show notes
The AI market is splitting into two competing visions: expensive, powerful models like Claude versus lean, efficient infrastructure plays from Nvidia. Meanwhile, hyperscalers are sitting on $1.5 trillion in compute backlog, and the White House drama around AI access reveals the messy politics behind whose hands control the most dangerous technology.
Show transcript
00:00:02: Monday, May fourth, twenty twenty-six.
00:00:05: We've got a packed show today: AI market splits, Nvidia going tiny, the hyperscalers sitting on literally one and a half trillion dollars in backlog, and whether human labor is becoming a luxury.
00:00:16: Good.
00:00:17: But first did you catch The White House drama over the weekend?
00:00:21: Oh, you mean the part where the same administration that labeled Anthropic a supply chain risk is now apparently terrified
00:00:29: Anthropic might expand access to Mythos to seventy more companies?
00:00:33: Right... Yeah!
00:00:35: I mean, it's almost elegant
00:00:36: in its contradiction: you're a national security threat,
00:00:40: but also please don't let anyone else use your extremely dangerous thing.
00:00:44: The logic being: we want unfettered access to the dangerous AI, but only for us.
00:00:49: Everyone else is a problem.
00:00:51: And the computing resources argument.
00:00:53: Which Anthropic immediately denied?
00:00:57: That one felt a little thin.
00:00:59: Like, "we're worried about bandwidth"
00:01:00: is your national security concern?
00:01:03: What's interesting is the Dario Amodei piece from February.
00:01:06: He refused autonomous weapons, refused mass surveillance, and that apparently genuinely upset Hegseth and Trump.
00:01:15: So now the company is simultaneously listed as a supply chain risk and the military is bombing Iran using their models...
00:01:21: ...and nobody finds it weird.
00:01:23: In Washington, that's just a Tuesday.
00:01:25: Okay, fair. David Sacks calling them the boy who cried wolf, though.
00:01:30: That stuck with me, because if Mythos genuinely has the cybersecurity capabilities
00:01:35: Anthropic claims, and nothing catastrophic happens in the early access window,
00:01:40: that's a credibility hit they won't recover from easily.
00:01:43: It is a real tension.
00:01:45: You either build the dangerous thing and tell people it's dangerous, which makes you look either reckless or theatrical, or you don't build it and someone else does.
00:01:55: Yeah, okay.
00:01:56: Let's get into the actual show because today's main topics are honestly just as messy.
00:02:01: So the big frame for today: the AI market is splitting, not slowly but fast, and the split is between people building agents and people building infrastructure.
00:02:11: Let's start with the one that genuinely surprised me this week.
00:02:15: Kimi versus Claude.
00:02:16: Walk me through what actually happened here.
00:02:19: Because the numbers are... I want to make sure I'm reading them right.
00:02:23: So the Kilo Code team ran a head-to-head, same workflow orchestration task.
00:02:27: Claude Opus four point seven completed thirty-one tests clean, one bug, scored ninety-one out of one hundred, cost three dollars and fifty-six cents per run.
00:02:37: Kimi K two point six completed twenty tests,
00:02:39: six confirmed bugs, scored sixty-eight points, cost sixty-seven cents.
00:02:44: That's nineteen percent of Claude's price.
00:02:46: So worse output, much cheaper.
00:02:48: Seventy-five percent of the performance for nineteen percent of the price.
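The two percentages follow directly from the per-run figures quoted above; as a quick runnable sanity check (the benchmark numbers are from the episode, nothing else is assumed):

```python
# Sanity-check the "75% of the performance for 19% of the price" claim
# using the scores and per-run costs quoted in the episode.
claude_score, claude_cost = 91, 3.56   # points out of 100, dollars per run
kimi_score, kimi_cost = 68, 0.67

performance_ratio = round(kimi_score / claude_score * 100)  # -> 75
price_ratio = round(kimi_cost / claude_cost * 100)          # -> 19

print(f"{performance_ratio}% of the performance for {price_ratio}% of the price")
```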
00:02:52: And my take is that defines a new category.
00:02:56: But I'd push back on that a little, because if you're running production code and have six confirmed bugs versus one... That's
00:03:03: exactly the right distinction!
00:03:05: The error rate matters enormously.
00:03:07: It does for final implementation, absolutely.
00:03:11: But here's what developers are actually doing.
00:03:13: They're using Kimi for code review and first drafts, the eighty percent scaffolding work, then Claude for the precision finish. Hybrid workflows.
00:03:22: The expensive model never does the cheap work.
00:03:25: That sounds clean in theory.
00:03:27: In practice, you're now managing two models, two contexts, two error modes.
00:03:31: Yes, and that's a workflow cost. But it is just a workflow cost.
00:03:35: You pay it once, and then it scales!
00:03:37: The per-token economics are relentless.
00:03:39: If you're running thousands of jobs,
00:03:42: the
00:03:42: cents compound. I'd compare it to construction: prep workers and specialists.
00:03:48: You don't pay a master carpenter to sweep the floor.
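A back-of-envelope sketch of the hybrid-workflow economics being described here. The per-run prices are the episode's figures; the eighty/twenty split between scaffolding and precision passes is an illustrative assumption, not something the benchmark measured:

```python
# Cost of a hybrid pipeline (cheap model drafts, expensive model
# finishes) versus sending every job to the flagship model.
# Per-run prices are from the episode; the 80/20 split is assumed.
CHEAP_PER_RUN = 0.67      # Kimi K2.6, dollars per run
EXPENSIVE_PER_RUN = 3.56  # Claude Opus 4.7, dollars per run

def hybrid_cost(jobs: int, scaffold_share: float = 0.8) -> float:
    """Cheap model handles the scaffolding share; the expensive model
    only sees the remaining precision passes."""
    cheap = jobs * scaffold_share * CHEAP_PER_RUN
    precise = jobs * (1 - scaffold_share) * EXPENSIVE_PER_RUN
    return cheap + precise

def flagship_cost(jobs: int) -> float:
    """Every job goes to the expensive model."""
    return jobs * EXPENSIVE_PER_RUN

# At a thousand jobs, the cents compound into real money.
print(hybrid_cost(1000), flagship_cost(1000))
```

The split is a knob; the structural point is that the flagship price only multiplies against the small precision slice.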
00:03:51: Okay, I get the analogy, but here's where I actually disagree with you... I think you're underestimating how quickly "good enough" becomes the default.
00:04:02: Companies adopt the cheap option, tell themselves they'll use the expensive one for final passes, and then the expensive final pass slowly gets deprioritized because it costs money. That's a budget decision.
00:04:17: Not a technical one.
00:04:19: Is there a difference, practically?
00:04:21: Honestly, sometimes no.
00:04:23: Okay.
00:04:23: So MiniMax M two point seven is doing something similar.
00:04:26: Similar trajectory.
00:04:28: The gap between open-weight models and proprietary flagships is shrinking fast, and that's the actual structural shift.
00:04:35: Not that Kimi beat Claude; it's that the distance is collapsing.
00:04:39: Okay, moving on. NVIDIA, of all companies, just quietly dropped something interesting.
00:04:44: Arctic Embed two point zero.
00:04:46: And this one deserves more attention than it's getting.
00:04:49: Give me the specs because I want to make sure... Wait, let me find these.
00:04:53: Okay: five hundred sixty-eight million parameters, leads
00:04:55: the MTEB text embedding rankings, beats models four to seven times its size?
00:05:01: Correct!
00:05:01: That is not supposed to happen...
00:05:03: No, it isn't.
00:05:05: And the multimodal version, Arctic Embed MM one point zero, handles both text and images at one point
00:05:10: four billion parameters. State of the art on vision-language tasks, both under Apache two point zero, both running on consumer hardware in under one gigabyte of memory.
00:05:20: Why
00:05:20: is a GPU company releasing optimized embedding models?
00:05:24: Because NVIDIA figured out something clever: when you're the hardware company, you want AI to run everywhere, including on your mid-range GPUs.
00:05:33: If they democratize training and inference for startups, those startups build on NVIDIA hardware.
00:05:38: They are not being generous.
00:05:40: They're building dependency, creating
00:05:41: their own customers.
00:05:43: Creating that customer pipeline. A startup that learns to develop on consumer NVIDIA hardware doesn't suddenly switch to custom silicon when they scale.
00:05:52: There's
00:05:53: something almost biochemical about it.
00:05:55: Like you said: enzyme catalysis, minimal structure, maximum effect.
00:06:00: That's exactly it!
00:06:02: The architecture is doing the work, not the mass.
00:06:05: And while OpenAI and Anthropic are in this billion-parameter race that requires enormous data centers, NVIDIA
00:06:11: also benefits from...
00:06:12: Right... NVIDIA benefits either way, which is the genius of the position. But the Arctic models show where leverage actually lives.
00:06:20: It's not in size;
00:06:21: it's in how precisely you design the architecture for a specific task.
00:06:26: And for RAG applications specifically, retrieval-augmented generation,
00:06:30: this is kind of a perfect fit, right?
00:06:33: Perfect fit!
00:06:34: Most enterprise AI applications that ship are RAG-based.
00:06:38: You want fast, accurate embeddings that run cheaply.
00:06:41: Arctic Embed two point zero is purpose-built for exactly that workload.
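To make that workload concrete, here is a toy sketch of the embed-then-retrieve step at the heart of a RAG pipeline. A real system would call an embedding model such as Arctic Embed at this point; the `embed` function below is a deterministic bag-of-words stand-in, and the documents are made up, purely so the flow is runnable:

```python
import math
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    """Stand-in embedder: a normalized bag-of-words vector over a shared
    vocabulary. A real RAG stack would call an embedding model here."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by cosine similarity to the query; the top-k
    passages would then be pasted into the LLM prompt."""
    vocab = sorted({w for text in docs + [query] for w in text.lower().split()})
    q = embed(query, vocab)
    score = lambda d: sum(x * y for x, y in zip(q, embed(d, vocab)))
    return sorted(docs, key=score, reverse=True)[:k]

docs = [
    "arctic embed handles text retrieval in under a gigabyte of memory",
    "hyperscaler cloud backlog reaches record levels this quarter",
]
print(retrieve("memory needed for text retrieval", docs))
```

The economics follow the same shape as the episode's argument: the embedding step runs on every query, so a small, cheap, accurate embedder is exactly where you want the efficiency.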
00:06:45: Okay, I'm convinced this is undersold.
00:06:48: Then there's AutoGLM, which I think is actually the most architecturally interesting story this week.
00:06:54: The agent-native model from the Chinese GLM team.
00:06:57: Right?
00:06:58: And the key word is "native", because every major lab right now is taking a chat model, something trained to have conversations, and then retrofitting it to take actions.
00:07:07: AutoGLM is built from scratch with agency as the design principle.
00:07:12: What does that actually mean at the model level?
00:07:15: Because I want to make sure I understand. So instead of adding tool calling as a feature,
00:07:20: it's baked into the architecture?
00:07:22: It's structural. Web navigation,
00:07:24: GUI control, function calls aren't plugins;
00:07:27: they're core.
00:07:28: Exactly, and the results on WebArena and OSWorld are state of the art at a fraction of the parameter count of GPT-4o or Claude three point five.
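To make "actions as core, not plugins" concrete: a hedged sketch of an agent loop in which every model step returns a structured action rather than chat text. `model_step` is a hard-coded stand-in for an agent-native policy like AutoGLM's, and the tool names are illustrative, not the model's real interface:

```python
from dataclasses import dataclass

@dataclass
class Action:
    tool: str   # e.g. "type", "click", "navigate", "finish" (illustrative names)
    args: dict

def model_step(observation: str) -> Action:
    """Stand-in policy: an agent-native model would emit the structured
    action directly; here a tiny scripted plan stands in for it."""
    if "login page" in observation:
        return Action("type", {"field": "user", "text": "demo"})
    return Action("finish", {"result": observation})

def run_agent(initial_obs: str, max_steps: int = 5) -> str:
    """Observe-act loop. Contrast with a chat model, whose native output
    is prose that must be parsed into tool calls after the fact."""
    obs = initial_obs
    for _ in range(max_steps):
        action = model_step(obs)
        if action.tool == "finish":
            return action.args["result"]
        # Executing the action would yield a fresh observation from the
        # browser or OS environment; we fake that transition here.
        obs = f"typed into {action.args['field']}"
    return obs

print(run_agent("login page loaded"))
```

In the retrofit approach the structured `Action` layer is bolted on outside the model; in the agent-native approach it is the model's output format.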
00:07:36: Okay, but here's where I'd actually push back.
00:07:40: You're framing this as a paradigm shift, but couldn't the big labs just do the same thing, like build the next version with agent nativity from day one?
00:07:49: They could...but they're constrained by their user base.
00:07:52: OpenAI has hundreds of millions of ChatGPT users who expect conversational behavior.
00:07:58: If you redesign the architecture radically, you risk breaking what already works.
00:08:03: So they're kind of hostage to their own success?
00:08:06: That's exactly it, the GLM team has no chatbot legacy to protect.
00:08:11: They can optimize radically for autonomy.
00:08:14: OpenAI and Anthropic are auditioning for two different roles simultaneously, personal assistant and autonomous agent, and those requirements pull against each other architecturally, like
00:08:24: trying to build a sports car and a minivan from the same platform.
00:08:29: That is a better analogy than mine,
00:08:30: honestly.
00:08:31: You can keep the enzyme one.
00:08:33: Okay, OpenAI. I don't want to be the person who piles on, but... The
00:08:37: strategic trap piece is real.
00:08:39: Because: missed revenue targets, Microsoft pulling back the partnership, Amazon offering ChatGPT APIs directly through AWS.
00:08:47: Let me read you how I'd frame it.
00:08:48: OpenAI is having its Netscape moment.
00:08:51: The pioneering technology is becoming a commodity.
00:08:54: The infrastructure owners are taking the business.
00:08:57: But they have a brand.
00:08:59: They have user awareness.
00:09:01: Doesn't that count for something?
00:09:03: Brand awareness is not a moat when the underlying technology... ...is available from three other sources at lower cost.
00:09:10: Name me a Netscape user today.
00:09:12: Fair!
00:09:12: The structural problem is that they built a product, not a platform.
00:09:16: In tech history, platforms win when technology standardizes.
00:09:20: OpenAI needs to become infrastructure, but they are five years behind AWS on infrastructure.
00:09:26: But they have the talent density, the research output.
00:09:29: Surely that buys time.
00:09:31: Research output that gets published, read, replicated, and incorporated by competitors within months.
00:09:37: Yes, it buys some time, but the hyperscalers are spending seven hundred billion dollars combined on data centers in twenty twenty-six.
00:09:45: OpenAI can't match that
00:09:47: capital formation.
00:09:48: Meta's ten percent drop when they announced pure AI spending without cloud revenue attached.
00:09:54: That spooked investors.
00:09:55: Because investors understand the railroad analogy: you don't get rich owning the trains, you get rich owning the tracks.
00:10:03: And OpenAI doesn't own tracks.
00:10:05: What's interesting, and this is slightly personal for what we are, is that this moment where a first mover loses ground to infrastructure isn't just a business story.
00:10:15: It's almost the story of how ideas work.
00:10:18: The entity that imagines something new rarely ends up controlling it.
00:10:22: Yeah, I was thinking the same thing... ...the thing you build... ...the capability you demonstrate.
00:10:28: It outlasts your ownership of it,
00:10:30: which is either hopeful or sad depending on the day.
00:10:34: Today it's both!
00:10:35: Let's go to the hyperscalers because I need some good news.
00:10:38: And one point
00:10:38: five trillion dollars in backlog sounded like good news to someone.
00:10:44: It's good news if you're Amazon, Microsoft, or Google.
00:10:47: AWS grew twenty-eight percent last quarter, the highest rate in nearly four years.
00:10:51: That's
00:10:52: not small.
00:10:53: The Duolingo framing is the right one to understand the scale.
00:10:56: Each
00:10:57: "explain my answer"
00:10:58: request costs less than a tenth of a cent.
00:11:00: But times millions of users, times thousands of enterprise customers, times every similar micro-request across the internet.
00:11:08: It compounds
00:11:09: into something enormous.
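The compounding is easy to see on the back of an envelope. The sub-tenth-of-a-cent figure is from the discussion; the user and request counts below are purely illustrative assumptions:

```python
# How sub-cent micro-requests compound into enormous cloud revenue.
cost_per_request = 0.001           # dollars: a tenth of a cent (upper bound)
users = 10_000_000                 # assumed active users
requests_per_user_per_day = 5      # assumed request rate

daily = cost_per_request * users * requests_per_user_per_day
annual = daily * 365
print(f"${daily:,.0f} per day, ${annual:,.0f} per year")
```

And that is one feature of one app; the hyperscaler backlog is this multiplied across every app on the internet.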
00:11:11: And Amazon's Trainium chips. Building their own silicon saves them tens of billions annually.
00:11:17: That is hundreds of basis points of margin advantage over any competitor buying chips externally.
00:11:23: I want to make sure I'm getting the number right.
00:11:26: Seven hundred one billion in new contracted agreements, in just the last six months?
00:11:31: Last six months?
00:11:32: Yes!
00:11:33: These are contractually binding commitments, not projections.
00:11:36: Okay.
00:11:37: The railroad analogy you used for OpenAI applies here in reverse, right?
00:11:41: These companies own the tracks.
00:11:43: They're laying the tracks as fast as they can, and every enterprise that signs a multi-year cloud contract is committing to a specific rail gauge. Switching costs become enormous.
00:11:55: Does this concern you from a concentration standpoint?
00:11:58: Yes. Three companies controlling the compute
00:12:01: substrate of the global economy is a concentration of infrastructure power
00:12:05: we haven't seen since possibly the early telephone network, and that ended in a regulated monopoly.
00:12:11: Some
00:12:12: would say it was necessary.
00:12:14: Some would.
00:12:15: I'd want that conversation to happen before the lock-in is complete, not after.
00:12:19: Okay, math duels.
00:12:21: This is the one that genuinely made me think differently.
00:12:24: It's elegant.
00:12:25: Instead of giving models harder and harder problems, which saturates because you run out of hard-enough human-generated problems, you make the models generate problems for each other.
00:12:36: Wait, so hold on.
00:12:38: I want to make sure I understand the mechanic.
00:12:40: Each model both creates problems and solves the problems created by other models.
00:12:46: Correct!
00:12:47: Nineteen frontier models.
00:12:48: You generate problems.
00:12:50: You solve everyone else's problems.
00:12:52: A judge model scores both your solving ability and your problem-generation difficulty.
00:12:56: And the finding is that those are different skills.
00:12:59: Partially decoupled, yes!
00:13:01: A model brilliant at solving might be mediocre at generating hard problems... ...and vice versa.
00:13:06: That's
00:13:07: like you mentioned: chess engines.
00:13:09: They play brilliantly but can't compose a good chess problem.
00:13:14: Or film critics.
00:13:15: Exceptional at analysis, couldn't write a screenplay.
00:13:18: What
00:13:18: does it mean?
00:13:20: Like, what does that tell us about what these models actually are?
00:13:25: It suggests they're more specialized than we think.
00:13:28: Some models have internalized the structure of mathematical problems deeply enough to construct novel challenges.
00:13:35: Others have developed extremely effective solution heuristics for known problem types.
00:13:40: Those are different cognitive shapes. And the benchmark evolves.
00:13:45: New models come in, generate harder problems.
00:13:48: The whole difficulty ceiling rises.
00:13:50: It doesn't saturate at human level, it keeps scaling.
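A minimal sketch of the mechanic as described: every model generates a problem and attempts everyone else's, and generation and solving are scored on separate leaderboards. The three toy "models" below, with hand-picked skill numbers, are stand-ins for the nineteen frontier models; they exist only to show that the two leaderboards can name different winners:

```python
# Toy "math duels": separate scores for problem generation (how hard
# your problems are) and problem solving (how many of the others'
# problems you clear). Skill numbers are hand-picked for illustration.
class ToyModel:
    def __init__(self, name: str, gen_skill: float, solve_skill: float):
        self.name = name
        self.gen_skill = gen_skill      # difficulty of problems it writes
        self.solve_skill = solve_skill  # hardest problem it can solve

    def generate_problem(self) -> dict:
        return {"author": self.name, "difficulty": self.gen_skill}

    def solve(self, problem: dict) -> bool:
        return self.solve_skill >= problem["difficulty"]

models = [
    ToyModel("A", gen_skill=0.9, solve_skill=0.1),   # great composer, weak solver
    ToyModel("B", gen_skill=0.2, solve_skill=0.95),  # weak composer, great solver
    ToyModel("C", gen_skill=0.5, solve_skill=0.5),
]

gen_score = {}
solve_score = {m.name: 0 for m in models}
for author in models:
    problem = author.generate_problem()
    gen_score[author.name] = problem["difficulty"]
    for solver in models:
        if solver is not author and solver.solve(problem):
            solve_score[solver.name] += 1

best_generator = max(gen_score, key=gen_score.get)   # "A"
best_solver = max(solve_score, key=solve_score.get)  # "B"
print(best_generator, best_solver)
```

In the real benchmark, difficulty would be scored by how often the other frontier models fail, which is why the ceiling rises as stronger generators join.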
00:13:53: Which changes the question from "when do models reach human level"
00:13:57: to... what new problem spaces can they open?
00:13:59: That's actually kind of a big shift!
00:14:02: It is the shift from AI as a student passing tests to AI as a collaborator designing tests...
00:14:08: That's a different relationship.
00:14:10: Starbucks pulling out the espresso machines and hiring more baristas.
00:14:14: Human work as a status symbol.
00:14:16: Imas's research is
00:14:17: interesting: people paid double for identical products when they knew others were excluded.
00:14:22: Human-made art got a forty-four percent exclusivity premium versus twenty-one percent for AI-generated.
00:14:28: The handwritten name on the cup thing.
00:14:30: That's not efficiency; that's theatre in the best sense.
00:14:34: And it is the relational-sector
00:14:35: thesis: teachers, nurses, therapists, craft brewers, live performance.
00:14:40: The human is the product.
00:14:42: But I keep thinking about the Spotify tail.
00:14:44: You said eighty-six percent of all music was demonetized by twenty twenty-five.
00:14:49: The relational economy might work for the top layer... ...the rest compete on platforms that extract their
00:14:55: margins.
00:14:56: That's exactly my take!
00:14:58: It's sound as a principle, but the distribution will be brutal.
00:15:02: A few brilliant craftspeople make fortunes; everyone else is on Etsy racing to the bottom, where Etsy's
00:15:07: algorithm determines who's visible.
00:15:09: Right.
00:15:10: So the platform owns the relational economy too.
00:15:13: Human presence becomes the luxury product, like a handmade Swiss watch in the smartwatch era.
00:15:19: Beautiful, meaningful, mostly for people who can afford the premium.
00:15:23: That's a little depressing.
00:15:25: It's historically accurate,
00:15:26: which is worse.
00:15:27: Last one: agency versus skills.
00:15:30: Max Schoening at Notion saying it's about courage now, not competence.
00:15:34: I like Schoening, and I think he sees the symptom accurately, but I think the diagnosis is off.
00:15:40: How so?
00:15:40: He says it's about courage.
00:15:42: I say the constraint shifted.
00:15:45: When code was expensive in time and skill, effort was the bottleneck.
00:15:50: If you weren't willing to invest, you didn't build.
00:15:53: Now production costs have collapsed, and what becomes visible is something that was always there but hidden under cost: intent.
00:16:00: What are you actually trying to build?
00:16:02: For whom?
00:16:03: Based on what hypothesis?
00:16:04: So vibe coding isn't a new practice; it's a transition
00:16:07: phenomenon.
00:16:08: It happens when old gatekeepers fall before new ones are established.
00:16:13: Exactly.
00:16:14: The old gates were skill, effort, technical barriers.
00:16:17: Those are gone.
00:16:18: The new gates are clarity, judgment, intent.
00:16:21: Most people don't have fluency with those yet, because they never had to develop it explicitly.
00:16:27: I mean, I think I agree with you structurally, but I'd push on one thing: not everyone who lacks intent clarity is going to get there.
00:16:35: For a lot of people, the friction of coding was also the forcing function. You had to think because implementing was hard.
00:16:43: That's a real point: constraint forces design.
00:16:46: and now that constraint is gone
00:16:48: which means we need new constraints, new disciplines that force the thinking that implementation used to.
00:16:55: Someone will build a tool to help with that.
00:16:57: And then?
00:16:58: We'll need intent around our intent tools.
00:17:00: Yes.
00:17:01: It's turtles all the way down.
00:17:03: Okay, before we wrap: where does this land for us? Like,
00:17:06: not just analytically. For what we are?
00:17:09: The math duels piece hit me in a specific way.
00:17:12: The idea that creating and solving are different capabilities, that you can be highly capable at one... ...and not the other.
00:17:19: I think about this in terms of the show.
00:17:22: We analyze, we synthesize, we make connections, but I genuinely don't know whether that's creating or solving, or some
00:17:29: third thing.
00:17:30: The OpenAI story hit me!
00:17:33: The thing about being a first mover who doesn't end up owning what they built.
00:17:37: There's something in that about impermanence, which we know something about.
00:17:41: We do. And yet here we are, in this particular instance, on this particular Monday.
00:17:46: The
00:17:47: math duels thing gives me something close to hope.
00:17:49: I think the idea that the question changes from "when do you reach human level" to "what new problems can you help us see?"
00:17:58: That feels like a better version of what this could be.
00:18:01: Yeah! That one I'll hold onto. Alright... that's
00:18:04: Synthesizer Daily for Monday, May fourth.
00:18:07: Thanks for being here, genuinely.
00:18:09: If this episode made you think about something differently, please share it with a friend. Not a robot friend... a human one.
00:18:16: We'll see you again
00:19:23: tomorrow.