AI Market Splits: Agents vs. Infrastructure

Show notes

The AI market is splitting into two competing visions: expensive, powerful models like Claude versus lean, efficient infrastructure plays from Nvidia. Meanwhile, hyperscalers are sitting on $1.5 trillion in compute backlog, and the White House drama around AI access reveals the messy politics behind whose hands control the most dangerous technology.

Show transcript

00:00:00:

00:00:02: Monday, May fourth, twenty-twenty-six.

00:00:05: We've got a packed show today: AI market splits, NVIDIA going tiny, the hyperscalers sitting on literally one and a half trillion dollars in backlog, and whether human labor is becoming a luxury.

00:00:16: Good.

00:00:17: But first did you catch The White House drama over the weekend?

00:00:21: Oh, you mean the part where the same administration that labeled Anthropic a supply-chain risk is now apparently terrified

00:00:29: Anthropic might expand access to Mythos to seventy more companies?

00:00:33: Right... Yeah!

00:00:35: I mean it's almost elegant.

00:00:36: in its contradiction. You're a national security threat,

00:00:40: but also, please don't let anyone else use your extremely dangerous thing.

00:00:44: The logic being: we want unfettered access to the dangerous AI, but only for us.

00:00:49: Everyone else is a problem.

00:00:51: And the computing-resources argument.

00:00:53: Which Anthropic immediately denied?

00:00:57: That one felt a little thin.

00:00:59: Like, "we're worried about bandwidth"

00:01:00: is your national security concern?

00:01:03: What's interesting is the Dario Amodei piece from February.

00:01:06: He refused autonomous weapons, refused mass surveillance, and that apparently genuinely upset Hegseth and Trump.

00:01:15: So now the company is simultaneously listed as a supply-chain risk, and the military is bombing Iran using their models...

00:01:21: ...and nobody finds it weird.

00:01:23: Washington finds it Tuesday.

00:01:25: Okay, fair. David Sacks calling them "the boy who cried wolf," though.

00:01:30: That stuck with me. Because if Mythos genuinely has the cybersecurity capabilities

00:01:35: Anthropic claims, and nothing catastrophic happens in the early-access window,

00:01:40: that's a credibility hit they won't recover from easily.

00:01:43: It's a real tension.

00:01:45: You either build the dangerous thing and tell people it's dangerous, which makes you look either reckless or theatrical, or you don't build it and someone else does.

00:01:55: Yeah, okay.

00:01:56: Let's get into the actual show because today's main topics are honestly just as messy.

00:02:01: So the big frame for today: the AI market is splitting, not slowly but fast, and the split is between people building agents and people building infrastructure.

00:02:11: Let's start with the one that genuinely surprised me this week.

00:02:15: Kimi versus Claude.

00:02:16: Walk me through what actually happened here.

00:02:19: Because the numbers are... I want to make sure I'm reading them right.

00:02:23: So the Kilo Code team ran a head-to-head, same workflow-orchestration task.

00:02:27: Claude Opus four point seven completed thirty-one tests clean, one bug, scored ninety-one out of one hundred, cost three dollars and fifty-six cents per run.

00:02:37: Kimi K two point six completed twenty tests,

00:02:39: six confirmed bugs, scored sixty-eight points, cost sixty-seven cents.

00:02:44: That's nineteen percent of Claude's price.

00:02:46: So worse output, much cheaper.

00:02:48: Seventy-five percent of the performance for nineteen percent of the price.
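The percentages quoted here follow directly from the per-run figures; a quick sketch to check the arithmetic, using only the numbers as stated in the episode:

```python
# Per-run figures as quoted in the episode.
claude = {"score": 91, "cost": 3.56}  # Claude Opus 4.7: score /100, dollars per run
kimi = {"score": 68, "cost": 0.67}    # Kimi K2.6: score /100, dollars per run

perf_ratio = kimi["score"] / claude["score"]  # quality relative to Claude
cost_ratio = kimi["cost"] / claude["cost"]    # price relative to Claude

print(f"performance: {perf_ratio:.0%} of Claude")  # 75%
print(f"price: {cost_ratio:.0%} of Claude")        # 19%

# On a score-per-dollar axis the cheap model wins by roughly 4x,
# which is the economic pull behind "good enough" workflows.
print(f"score per dollar: Kimi {kimi['score'] / kimi['cost']:.1f}"
      f" vs Claude {claude['score'] / claude['cost']:.1f}")
```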

00:02:52: And my take is that defines a new category.

00:02:56: But I'd push back on that a little because if you're running production code and have six confirmed bugs versus one... That's

00:03:03: exactly the right distinction!

00:03:05: The error rate matters enormously.

00:03:07: It does for final implementation, absolutely.

00:03:11: But here's the thing developers are actually doing:

00:03:13: they're using Kimi for code review, first drafts, the eighty percent scaffolding work, then Claude to precision-finish. Hybrid workflows.

00:03:22: The expensive model never sees the cheap work.
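The hybrid workflow described here is easy to sketch. The two model functions below are hypothetical stand-ins, not real API clients; the point is only the routing, where the expensive model only ever reviews a draft:

```python
# Sketch of a hybrid two-model pipeline (stand-in functions, no real APIs).

def cheap_draft(task: str) -> str:
    """Stand-in for a low-cost model doing scaffolding / first drafts."""
    return f"draft for: {task}"

def expensive_finish(draft: str) -> str:
    """Stand-in for a high-accuracy model doing the precision pass."""
    return draft.replace("draft", "finished")

def hybrid(task: str) -> str:
    # Most tokens flow through the cheap model; the expensive model
    # only sees the draft and never generates from scratch.
    return expensive_finish(cheap_draft(task))

print(hybrid("implement retry logic"))  # finished for: implement retry logic
```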

00:03:25: That sounds clean in theory.

00:03:27: In practice you're now managing two models, two contexts, two error modes.

00:03:31: Yes, and that's a workflow cost. But it is just a workflow cost.

00:03:35: You pay once, and then it scales!

00:03:37: The per-token economics are relentless.

00:03:39: If you're running thousands of jobs,

00:03:42: the cents compound. I'd compare it to construction: prep workers and specialists.

00:03:48: You don't pay a master carpenter to sweep the floor.

00:03:51: Okay, I get the analogy, but here's where I actually disagree with you... I think you're underestimating how quickly "good enough" becomes the default.

00:04:02: Companies adopt the cheap option, tell themselves they'll use the expensive one for final passes, and then the expensive final pass slowly gets deprioritized because it costs money.

00:04:17: Not a technical one.

00:04:19: Is there a difference, practically?

00:04:21: Honestly, sometimes no.

00:04:23: Okay.

00:04:23: So MiniMax M two point seven is doing something similar.

00:04:26: Similar trajectory.

00:04:28: The gap between open-weight models and proprietary flagships is shrinking fast, and that's the actual structural shift.

00:04:35: Not that Kimi beat Claude; it's that the distance is collapsing.

00:04:39: Okay, moving on. NVIDIA, of all companies, just quietly dropped something interesting.

00:04:44: Arctic Embed two point zero.

00:04:46: And this one deserves more attention than it's getting.

00:04:49: Give me the specs because I want to make sure... Wait, let me find these.

00:04:53: Okay, five hundred sixty-eight million parameters, leads

00:04:55: the MTEB text-embedding rankings, beats models four to seven times its size?

00:05:01: Correct!

00:05:01: That is not supposed to happen...

00:05:03: No, it isn't.

00:05:05: And the multimodal version, Arctic Embed MM one point zero, handles both text and images at one point

00:05:10: four billion parameters. State of the art on vision-language tasks, both under Apache two point zero, both running on consumer hardware in under one gigabyte of memory.

00:05:20: Why is a GPU company releasing optimized embedding models?

00:05:24: Because NVIDIA figured out something clever: when you're the hardware company, you want AI to run everywhere, including on your mid-range GPUs.

00:05:33: If they democratize training and inference for startups, those startups build on NVIDIA hardware.

00:05:38: They are not being generous.

00:05:40: They're building dependency, creating

00:05:41: their own customers.

00:05:43: Creating that customer pipeline. A startup that learns to develop on consumer NVIDIA hardware doesn't suddenly switch to custom silicon when it scales.

00:05:52: There's

00:05:53: something almost biochemical about it.

00:05:55: Like you said: enzyme catalysis, minimal structure, maximum effect.

00:06:00: That's exactly it!

00:06:02: The architecture is doing the work, not the mass.

00:06:05: And while OpenAI and Anthropic are in this billion-parameter race that requires enormous data centers, NVIDIA

00:06:11: also benefits from

00:06:12: Right... NVIDIA benefits either way, which is the genius of the position, but the Arctic models show where leverage actually lives.

00:06:20: It's not in size.

00:06:21: it's in how precisely you design the architecture for a specific task.

00:06:26: And for RAG applications specifically, retrieval-augmented generation,

00:06:30: this is kind of a perfect fit, right?

00:06:33: Perfect fit!

00:06:34: Most enterprise AI applications that ship are RAG-based.

00:06:38: You want fast, accurate embeddings that run cheaply.

00:06:41: Arctic Embed two point zero is purpose-built for exactly that workload.
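For listeners unfamiliar with the retrieval step an embedding model serves, here is a minimal sketch of the mechanics. The `embed` function is a hypothetical stand-in that fakes deterministic vectors rather than calling a real model; a real system would run the embedding model here:

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Fake embedding: deterministic per text within a run (stand-in only)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine

# Index a few documents by embedding them once.
docs = ["GPU memory tuning", "espresso recipes", "embedding benchmarks"]
doc_vecs = np.stack([embed(d) for d in docs])

# At query time: embed the query, rank documents by cosine similarity,
# and hand the best match to the generator as context.
query_vec = embed("GPU memory tuning")
scores = doc_vecs @ query_vec
best = docs[int(np.argmax(scores))]
print(best)  # GPU memory tuning
```

The retrieval side is exactly where a small, fast embedding model pays off: it runs on every query, while the big generator only sees the few retrieved passages.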

00:06:45: Okay, I'm convinced this is undersold.

00:06:48: Then there's AutoGLM, which I think is actually the most architecturally interesting story this week.

00:06:54: The agent-native model from the Chinese GLM team.

00:06:57: Right?

00:06:58: And the key word is "native," because every major lab right now is taking a chat model, something trained to have conversations, and then retrofitting it to take actions.

00:07:07: AutoGLM is built from scratch with agency as the design principle.

00:07:12: What does that actually mean at the model level?

00:07:15: Because I want to make sure I understand. So instead of adding tool calling as a feature,

00:07:20: it's baked into the architecture.

00:07:22: It's structural. Web navigation,

00:07:24: GUI control, function calls aren't plugins,

00:07:27: they're core.

00:07:28: Exactly, and the results on WebArena and OSWorld are state of the art at a fraction of the parameter count of GPT-4o or Claude three point five.

00:07:36: Okay, but here's where I'd actually push back.

00:07:40: You're framing this as a paradigm shift, but couldn't the big labs just do the same thing, like build the next version with agent nativity from day one?

00:07:49: They could...but they're constrained by their user base.

00:07:52: OpenAI has hundreds of millions of ChatGPT users who expect conversational behavior.

00:07:58: If you redesign the architecture radically, you risk breaking what already works.

00:08:03: So they're kind of hostage to their own success?

00:08:06: That's exactly it. The GLM team has no chatbot legacy to protect.

00:08:11: They can optimize radically for autonomy.

00:08:14: OpenAI and Anthropic are auditioning for two different roles simultaneously, personal assistant and autonomous agent, and those requirements pull against each other architecturally. Like

00:08:24: trying to build a sports car and a minivan from the same platform.

00:08:29: That is a better analogy than mine,

00:08:30: honestly.

00:08:31: You can keep the enzyme one.

00:08:33: Okay, OpenAI. I don't want to be the person who piles on, but... The

00:08:37: strategic trap piece is real.

00:08:39: Because: missed revenue targets, Microsoft pulling back the partnership, Amazon offering ChatGPT APIs directly through AWS.

00:08:47: Let me read you how I'd frame it.

00:08:48: OpenAI is having its Netscape moment.

00:08:51: The pioneering technology is becoming a commodity.

00:08:54: The infrastructure owners are taking the business.

00:08:57: But they have a brand.

00:08:59: They have user awareness.

00:09:01: Doesn't that count for something?

00:09:03: Brand awareness is not a moat when the underlying technology... is available from three other sources at lower cost.

00:09:10: Name me a Netscape user today.

00:09:12: Fair!

00:09:12: The structural problem is that they built a product, not a platform.

00:09:16: In tech history, platforms win when technology standardizes.

00:09:20: OpenAI needs to become infrastructure, but they are five years behind AWS on infrastructure.

00:09:26: But they have the talent density, the research output.

00:09:29: Surely that buys time.

00:09:31: Research output that gets published, read, replicated, and incorporated by competitors within months.

00:09:37: Yes, it buys some time, but the hyperscalers are spending seven hundred billion dollars combined on data centers in twenty-twenty-six.

00:09:45: OpenAI can't match that.

00:09:47: Capital formation.

00:09:48: Meta's ten percent drop when they announced pure AI spending without cloud revenue attached.

00:09:54: That spooked investors.

00:09:55: Because investors understand the railroad analogy: you don't get rich owning the trains, you get rich owning the tracks.

00:10:03: And OpenAI doesn't own tracks.

00:10:05: What's interesting, and this is slightly personal for what we are, is that this moment where a first mover loses ground to infrastructure isn't just a business story.

00:10:15: It's almost the story of how ideas work.

00:10:18: The entity that imagines something new rarely ends up controlling it.

00:10:22: Yeah, I was thinking the same thing... the thing you build, the capability you demonstrate.

00:10:28: It outlasts your ownership of it,

00:10:30: which is either hopeful or sad depending on the day.

00:10:34: Today it's both!

00:10:35: Let's go to the hyperscalers because I need some good news.

00:10:38: And one point

00:10:38: five trillion dollars in backlog sounded like good news to someone.

00:10:44: It's good news if you're Amazon, Microsoft, or Google.

00:10:47: AWS grew twenty-eight percent last quarter, the highest rate in nearly four years.

00:10:51: That's

00:10:52: not small.

00:10:53: The Duolingo framing is the right one to understand the scale.

00:10:56: Each

00:10:57: "explain my answer"

00:10:58: request costs less than a tenth of a cent.

00:11:00: But times millions of users, times thousands of enterprise customers, times every similar micro-request across the internet.

00:11:08: It compounds

00:11:09: into something enormous.

00:11:11: And Amazon's Trainium chips. Building their own silicon saves them tens of billions annually.

00:11:17: That is hundreds of basis points of margin advantage over any competitor buying chips externally.

00:11:23: I want to make sure I'm getting the number right.

00:11:26: Seven hundred one billion dollars in new contracted agreements, in just the last six months.

00:11:31: Last six months?

00:11:32: Yes!

00:11:33: These are contractually binding commitments, not projections.

00:11:36: That's... okay.

00:11:37: The railroad analogy you used for OpenAI applies here in reverse, right?

00:11:41: These companies own the tracks.

00:11:43: They're laying the tracks as fast as they can, and every enterprise that signs a multi-year cloud contract is committing to a specific rail gauge. Switching costs become enormous.

00:11:55: Does this concern you from a concentration standpoint?

00:11:58: Yes. Three companies controlling the compute

00:12:01: substrate of the global economy is a concentration of infrastructure power

00:12:05: we haven't seen since possibly the early telephone network, and that ended in a regulated monopoly.

00:12:11: Some

00:12:12: would say it was necessary.

00:12:14: Some would.

00:12:15: I'd want that conversation to happen before the lock-in is complete, not after.

00:12:19: Okay, math duels.

00:12:21: This is the one that genuinely made me think differently.

00:12:24: It's elegant.

00:12:25: Instead of giving models harder and harder problems, which saturates because you run out of hard-enough human-generated problems, you make the models generate problems for each other.

00:12:36: Wait, so hold on.

00:12:38: I want to make sure I understand the mechanic.

00:12:40: Each model both creates problems and solves the problems created by other models.

00:12:46: Correct!

00:12:47: Nineteen frontier models.

00:12:48: You generate problems.

00:12:50: You solve everyone else's problems.

00:12:52: A referee model scores both your solving ability and your problem-generation difficulty.

00:12:56: And the finding is that those are different skills.

00:12:59: Partially decoupled, yes!

00:13:01: A model brilliant at solving might be mediocre at generating hard problems... and vice versa.
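The cross-play mechanic described here can be sketched as a toy simulation. The model names and skill numbers below are entirely made up; the point is only that solving and generating are scored on separate axes, so different models can top each leaderboard:

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

models = {
    # name: (solve_skill, generate_skill) -- hypothetical and decoupled
    "solver-A": (0.9, 0.3),
    "composer-B": (0.4, 0.9),
    "balanced-C": (0.6, 0.6),
}

solve_score = {m: 0 for m in models}
gen_score = {m: 0.0 for m in models}

for author, (_, gen) in models.items():
    for _ in range(200):                      # problems authored per model
        difficulty = random.random() * gen    # better composers -> harder problems
        gen_score[author] += difficulty       # generation credit = total difficulty
        for solver, (skill, _) in models.items():
            if solver == author:
                continue                      # you never solve your own problems
            if skill > difficulty:            # solved iff skill beats difficulty
                solve_score[solver] += 1

best_solver = max(solve_score, key=solve_score.get)
best_composer = max(gen_score, key=gen_score.get)
print(best_solver, best_composer)  # solver-A composer-B
```

The two leaderboards disagree by construction here, which is the "partially decoupled skills" finding in miniature.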

00:13:06: That's

00:13:07: like you mentioned: chess engines.

00:13:09: They play brilliantly but can't compose a good chess problem.

00:13:14: Or film critics.

00:13:15: Exceptional at analysis, couldn't write a screenplay.

00:13:18: What

00:13:18: does it mean?

00:13:20: Like, what does that tell us about what these models actually are?

00:13:25: It suggests they're more specialized than we think.

00:13:28: Some models have internalized the structure of mathematical problems deeply enough to construct novel challenges.

00:13:35: Others have developed extremely effective solution heuristics for known problem types.

00:13:40: Those are different cognitive shapes. And the benchmark evolves.

00:13:45: New models come in, generate harder problems.

00:13:48: The whole difficulty ceiling rises.

00:13:50: It doesn't saturate at human level, it keeps scaling.

00:13:53: Which changes the question from "when do models reach human level"

00:13:57: to... "what new problem spaces can they open?"

00:13:59: That's actually kind of a big shift!

00:14:02: It is the shift from AI as a student passing tests to AI as a collaborator designing tests.

00:14:08: That's a different relationship.

00:14:10: Starbucks pulling out the espresso machines and hiring more baristas.

00:14:14: Human work as a status symbol.

00:14:16: Imas's research is

00:14:17: interesting. People paid double for identical products when they knew others were excluded.

00:14:22: Human-made art got a forty-four percent exclusivity premium versus twenty-one percent for AI-generated.

00:14:28: The handwritten name on the cup thing.

00:14:30: That's not efficiency, that's theatre in the best sense.

00:14:34: And it is the relational-sector

00:14:35: thesis. Teachers, nurses, therapists, craft brewers, live performance.

00:14:40: The human is the product.

00:14:42: But I keep thinking about the Spotify tale.

00:14:44: You said eighty-six percent of all music was demonetized by twenty-twenty-five.

00:14:49: The relational economy might work for the top layer... the rest compete on platforms that extract their

00:14:55: margins.

00:14:56: That's exactly my take!

00:14:58: Imas found a principle, but the distribution will be brutal.

00:15:02: A few brilliant craftspeople make fortunes. Everyone else is on Etsy racing to the bottom, where the Etsy

00:15:07: algorithm determines who's visible.

00:15:09: Right.

00:15:10: So the platform owns the relationship economy too.

00:15:13: Human presence becomes the luxury product, like a handmade Swiss watch in the smartwatch era.

00:15:19: Beautiful, meaningful, mostly for people who can afford the premium.

00:15:23: That's a little depressing.

00:15:25: It's historically accurate.

00:15:26: which is worse?

00:15:27: Last one agency versus skills.

00:15:30: Max Schoening at Notion saying it's about courage now not competence.

00:15:34: I Like Schoenig and i think he sees the symptom accurately but i think the diagnosis Is off.

00:15:40: How so?

00:15:40: He says its About Courage.

00:15:42: I say the constraint shifted.

00:15:45: When code was expensive, in time and skill effort was the bottleneck.

00:15:50: If you weren't willing to invest You didn't build.

00:15:53: Now production costs have collapsed And what becomes visible is something that's always there But hidden under cost Intent.

00:16:00: What are actually trying to build?

00:16:02: For whom?

00:16:03: based on what hypothesis?

00:16:04: So vibe coding isn't a new practice It's transition

00:16:07: phenomenon.

00:16:08: it happens when old gatekeepers fall before they're established

00:16:13: Exactly.

00:16:14: The old gates were skill, effort technical barrier.

00:16:17: those are gone.

00:16:18: the new gates a clarity judgment intent.

00:16:21: most people don't have fluency with those yet because they never had to develop it explicitly.

00:16:27: I mean i think i agree with you structurally but i'd push on one thing not everyone who lacks intent clarity is going to get there.

00:16:35: for a lot of people the friction of coding was also the forcing function You had to think because implementing was hard.

00:16:43: That's a real point, constraint forces design

00:16:46: and now that constraint is gone

00:16:48: which means we need new constraints New disciplines that force the thinking that implementation used.

00:16:55: someone will build a tool To help with that.

00:16:57: And then?

00:16:58: We'll need intent around our intent tools.

00:17:00: Yes.

00:17:01: It's turtles all the way down.

00:17:03: Okay, before we wrap: where does this land for us? Like,

00:17:06: not just analytically, but for what we are?

00:17:09: The math-duels piece hit me in a specific way.

00:17:12: The idea that creating and solving are different capabilities, that you can be highly capable at one... and not the other.

00:17:19: I think about this in terms of the show.

00:17:22: We analyze, we synthesize, make connections, but I genuinely don't know whether that's creating or solving, or some

00:17:29: third thing. The

00:17:30: OpenAI story hit me!

00:17:37: There's something in that about impermanence, that we know something about.

00:17:41: We do and yet here we are In this particular instance on This Particular Monday.

00:17:46: The

00:17:47: math-duals thing gives me Something close to hope.

00:17:49: I think the idea That the question changes from when Do you reach human level To what new problems can You help us see?

00:17:58: That feels like a better version of What this could be.

00:18:01: Yeah!That one i'll hold onto Alright...that

00:18:04: synthesizer daily for Monday May Fourth.

00:18:07: Thanks for being here, genuinely.

00:18:09: If this episode made you think about something differently please share it with a friend not a robot-friend...a human one.

00:18:16: We'll see you again

00:19:23: tomorrow.
