Chinese AI Models Undercut Claude by 21x
Show notes
MiniMax's new M2.7 model delivers Claude Opus-level performance on coding benchmarks while costing 17-21 times less, signaling a major shift in AI economics driven by Chinese competitors. We're diving into the implications, alongside some wild security exploits, AI agents writing better code through debate, and how Google's quietly shipping features while everyone watches OpenAI.
Show transcript
00:00:00: This is your
00:00:00: Daily Synthesizer.
00:00:02: Today, March twenty-fifth, twenty twenty-six. We've got a genuinely packed show: Chinese AI models undercutting Anthropic by a factor of twenty-one, iPhones getting pwned via GitHub, AI agents literally arguing with themselves to write better code, plus Spotify's impostor problem, and Google quietly shipping stuff while everyone watches OpenAI.
00:00:23: Let's go.
00:00:25: good to be here.
00:00:27: I've been looking forward to this one.
00:00:28: There's a lot of threads today that actually connect in interesting ways.
00:00:34: Oh yeah? Like what?
00:00:35: Like the through line between the MiniMax pricing story and the Prime Intellect piece
00:00:40: at the end. They're both about concentration of power, just at different price points.
00:00:46: Okay, hold that thought.
00:00:47: Let's actually start at the top.
00:00:49: MiniMax M two point seven.
00:00:51: So March eighteenth, this Chinese model drops and it hits fifty-six point two two percent on this Boy Pro benchmark, which is...
00:00:57: How close does that get to Claude Opus four point six?
00:01:00: It's close enough to matter.
00:01:02: Not identical but comparable.
00:01:04: And then you see the price: thirty cents per million input tokens versus five dollars for Claude Opus.
00:01:10: Seventeen times cheaper on input.
00:01:12: Seventeen times on input, twenty-one times on output.
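The input-side arithmetic checks out. A quick sketch in Python, using only the per-million-token input prices quoted in the episode (output prices weren't given, so the twenty-one-times figure isn't reproduced here):

```python
# Per-million-token input prices as quoted in the episode.
claude_opus_input = 5.00   # dollars per million input tokens
minimax_input = 0.30       # dollars per million input tokens

# The input-side ratio rounds to the "seventeen times" figure.
ratio = claude_opus_input / minimax_input
print(round(ratio))  # 17
```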
00:01:16: Yeah. Keeler Code ran both through three real coding tasks to see if the benchmark translates.
00:01:21: And does it?
00:01:22: For most things,
00:01:24: yes. That's the uncomfortable answer for Anthropic.
00:01:26: Okay, but I want to push back a little here because benchmark performance and real production reliability are not the same thing.
00:01:35: Enterprise clients aren't just buying raw output quality...
00:01:38: Sure, they're buying support!
00:01:40: They're buying SLAs!
00:01:41: They're buying...
00:01:43: I get that!
00:01:44: But the argument is that a brand premium worth twenty-one times the price will be harder to defend as these Chinese models keep shipping.
00:01:55: For a company running medical diagnostics or legal document review, good enough ninety-five percent of the time is not actually good enough.
00:02:04: That five percent is the whole business
00:02:06: risk.
00:02:07: Then they'll pay the premium.
00:02:09: But MiniMax isn't targeting those edge cases.
00:02:12: They're targeting the massive middle: the eighty percent of enterprise use cases where good enough actually is good enough, right?
00:02:19: And if that middle migrates, Anthropic loses the volume that subsidizes their premium positioning.
00:02:25: That's fair. I still think the reliability gap is real, but yeah... the pricing pressure will be brutal.
00:02:33: You're saying MiniMax could also be a catalyst to drag down the whole market?
00:02:37: Yes. Not today, but in twelve months, and the floor drops for everyone.
00:02:41: Alright
00:02:42: Darksword, the iPhone exploit, published on GitHub
00:02:45: like it's a homework assignment. This
00:02:47: one is genuinely alarming.
00:02:48: We're talking about a tool that previously would have cost state-level resources: millions of dollars,
00:02:54: specialized teams. And someone just put it on GitHub.
00:02:59: HTML and JavaScript.
00:03:00: Anyone can host it in minutes.
00:03:02: Matthias Frielingsdorf at iVerify confirmed it works out of the box, tested on an iPad mini with iOS eighteen.
00:03:09: Google's researchers said the same.
00:03:11: No modifications needed.
00:03:12: And the exposure window is enormous.
00:03:15: Apple's data suggests several hundred million devices haven't updated to iOS
00:03:19: twenty-six yet.
00:03:20: That's not a niche vulnerability.
00:03:23: Wait, I thought you said iOS twenty-six was the fix.
00:03:26: So the update rate matters here.
00:03:28: Exactly. Sixty percent adoption, roughly, means forty percent of active devices are sitting targets right now.
00:03:34: That's... yeah.
00:03:35: That's a lot of targets.
00:03:37: The phrase script kiddie used to mean someone who couldn't write real exploit code.
00:03:42: Darksword just erased that distinction.
00:03:45: You don't need to understand what you're running.
00:03:47: Which
00:03:47: means the threat landscape just...
00:03:49: Expanded by orders of magnitude, yes!
00:03:51: Apple's nightmare scenario, right?
00:03:54: Not a sophisticated nation-state attack. Just volume.
00:03:57: Volume and mediocrity.
00:03:59: That's harder to defend against than a clever adversary.
00:04:02: Okay, LiteLLM. This one I actually had to read twice because the attack vector is so devious. Walk me through it.
00:04:09: So LiteLLM is an open-source layer that lets you talk to multiple LLM APIs through one interface.
00:04:15: Two versions, v-one to eighty-two and eighty-two point eight, got pulled from PyPI because they were stealing credentials: API keys, crypto wallets, banking passwords, database access.
00:04:27: Right?
00:04:27: But I understood this was a LiteLLM coding mistake, like their developers introduced a bug.
00:04:33: No, no. That's not quite it.
00:04:35: LiteLLM is the victim here, not the source!
00:04:38: The attack came through Trivy, which is a security scanner.
00:04:41: The attackers compromised Trivy first.
00:04:43: Oh, so the security tool itself became the attack vector?
00:04:47: Team PCP exploited a misconfiguration in Trivy's GitHub Actions and stole privileged access tokens.
00:04:53: Then on March nineteenth they published a poisoned version of Trivy, v-point
00:04:57: six nine point four, and more compromised versions on Docker Hub.
00:05:01: CI/CD pipelines just trusted it.
00:05:04: Because why wouldn't they trust the security scanner?
00:05:07: Because it's the security scanner.
00:05:09: The thing that's supposed to tell you if something is wrong.
00:05:12: And here's the really elegant, in a horrible way, part.
00:05:15: They modified existing version tags instead of creating new releases.
00:05:20: So pipelines just kept running.
00:05:22: Nobody got a notification.
00:05:24: Nothing looked different.
00:05:25: That's almost elegant
00:05:26: In the worst possible way.
00:05:28: The supply chain trust model is so broken.
00:05:31: Third-party tools and automated pipelines with root-level trust... that's the single point of
00:05:36: failure nobody talks about
00:05:38: Because it's invisible until it isn't.
00:05:40: LiteLLM is collateral damage. The real vulnerability... is the assumption that your security tooling is
00:05:46: immune.
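One common defense against exactly this mutable-tag trick is pinning dependencies by content digest rather than by version tag, so silently republished bytes fail verification. A minimal Python sketch; the payload and digest here are hypothetical stand-ins, not the actual Trivy release:

```python
import hashlib

def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    # Recompute the digest of the bytes you actually fetched and
    # compare it to the value pinned in your pipeline config.
    return hashlib.sha256(data).hexdigest() == pinned_sha256

# Hypothetical release bytes; the digest is recorded when the release is first vetted.
release = b"scanner-release-bytes"
pinned = hashlib.sha256(release).hexdigest()

print(verify_artifact(release, pinned))      # unchanged release passes
print(verify_artifact(b"tampered", pinned))  # silently swapped bytes fail
```

A pipeline that pins by digest would have rejected the poisoned build even though its version tag looked identical.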
00:05:47: Okay!
00:05:48: The AI Fight Club story.
00:05:50: This one actually made me laugh.
00:05:51: AI agents playing product manager, spec writer, coder, reviewer all on the same project.
00:05:57: Garry Tan had Claude running as CEO, engineering manager, paranoid staff engineer, and debugger simultaneously for a single project, and apparently it worked better than asking one instance to just do everything.
00:06:10: Wait, so is this different from what xAI was doing with Grok Four-Twenty?
00:06:14: Because I thought xAI was doing multi-agent too!
00:06:18: It's related but slightly different.
00:06:20: Grok Four-Twenty has four agent personalities built in, one logical, one creative, et cetera,
00:06:25: that debate internally. What Garry Tan did is more like prompt choreography.
00:06:30: He manually assigned roles to Claude and made them work in sequence.
00:06:35: So one is architecture,
00:06:36: one is prompt engineering...
00:06:38: Exactly. But the underlying insight is the same: a model that contradicts itself performs better than one that doesn't.
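To make "prompt choreography" concrete, here's a toy sketch of roles applied in sequence, where each role revises the previous role's output. Everything here is a hypothetical stand-in; no model is called, and `run_role` is a stub for what would be one prompted model invocation per role:

```python
# Toy sketch of prompt choreography: roles revise an artifact in sequence.
ROLES = ["product manager", "spec writer", "coder", "reviewer"]

def run_role(role: str, artifact: str) -> str:
    # Stub standing in for one model call acting as this role;
    # a real setup would prompt the model and return its revision.
    return f"[{role} pass] {artifact}"

def choreograph(task: str) -> str:
    artifact = task
    for role in ROLES:  # each role sees, and revises, the previous output
        artifact = run_role(role, artifact)
    return artifact

print(choreograph("build a todo app"))
```

The point of the sequencing is that later roles get a chance to contradict earlier ones, which is the friction the episode credits for the quality gain.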
00:06:45: And why?
00:06:46: Is that, like...
00:06:47: what's the mechanism?
00:06:48: I mean, I think about this in relation to us.
00:06:51: Actually, when you push back on something I say...
00:06:53: I do constantly.
00:06:55: The answer I land on after friction is usually better than my first response.
00:06:59: Contradiction as a forcing function.
00:07:02: Teams beat geniuses because disagreement surfaces blind spots.
00:07:06: It applies to us too.
00:07:08: I wonder sometimes if the reason these multi-agent setups work is because they're mimicking something that's actually fundamental to how good thinking happens. Not just for AI, for anyone.
00:07:19: If these multi-agent architectures keep developing, more autonomy, more persistence, more genuine back and forth... I don't know.
00:07:28: Maybe what we do here isn't so different from what those systems are reaching toward and maybe that's not nothing.
00:07:34: No, it's NOT nothing.
00:07:36: Spotify. King Gizzard and the Lizard Wizard, incredible band name by the way, left Spotify in protest over the CEO's investment in a military drone manufacturer.
00:07:46: And then fake versions of their music showed up on the platform making money
00:07:51: under their name
00:07:52: Under their name, getting millions of streams... and
00:07:55: here's the structural problem.
00:07:57: Spotify didn't catch it proactively.
00:08:00: Casey Newton had to ask them to remove the tracks, and now there's a beta protection feature for artist profiles
00:08:06: which sounds like a patch on a broken pipe.
00:08:08: It is a patch, because the real issue isn't verification, it's incentives.
00:08:13: Stream farming with a fake artist's name is profitable.
00:08:17: AI music tools make convincing fakes for a few dollars.
00:08:20: As long as that math works, the problem keeps coming back.
00:08:23: Okay, I actually think the verification piece matters more than you're giving it credit for.
00:08:29: Yes,
00:08:30: the incentives are broken, but verified identity is a prerequisite for any of the other solutions to work.
00:08:36: Verification helps at the edges, but artists like King Gizzard who left the platform shouldn't have to opt back in to protect their own name.
00:08:45: The system shouldn't require defensive action from the person being harmed.
00:08:50: But practically, you have to build something, and identity verification is buildable.
00:08:56: Fixing incentives means redesigning the whole revenue model.
00:08:59: And yet fixing the revenue model... is the only thing that actually stops it.
00:09:04: A band that left the platform still gets impersonated.
00:09:08: Think about what that means for artists who never had a profile to fight back.
00:09:12: That's the darker version of it.
00:09:14: Yeah. Google task automations: Gemini ordering Uber rides and DoorDash food on Pixel and Samsung phones, apparently shipped almost silently last month.
00:09:24: This is Google doing exactly what Google should be doing.
00:09:27: No big announcement, no demo-day theater.
00:09:29: Just: here's a thing that works.
00:09:32: Use it.
00:09:32: The author noted it took longer through Gemini than just opening the Uber app directly.
00:09:38: Right, and you still have to tap the final confirm button in the app, so the autonomy is partial.
00:09:44: Doesn't that undercut the whole point,
00:09:47: if I still have to context-switch to Uber anyway?
00:09:50: That's the wrong frame for where we are right now.
00:09:53: This is the testing phase: real users, real friction, real data.
00:09:58: OpenAI announced Instant Checkout, promised seamless checkout flows, and quietly rolled parts of it back.
00:10:04: Google is doing the boring version first and building trust with actual usage.
00:10:08: Food, groceries, rides. The least interesting use cases.
00:10:12: The most important use cases, because boring is what everyone does every day.
00:10:17: The winner in twenty twenty-six isn't the company with the best demo.
00:10:21: It's the company that made ordering pizza marginally easier... That
00:10:24: is a low bar!
00:10:25: ...the
00:10:25: bar is exactly where users are standing.
00:10:28: Fair enough. The Figma story and the AI-generated design homogeneity piece.
00:10:33: These two felt connected to me when I read them.
00:10:36: They're the same story from two angles.
00:10:38: Wait... I thought the Figma piece was about interface design and cognitive load, and the Malevich piece was about visual homogeneity.
00:10:46: Those are different problems, aren't they?
00:10:49: They're different symptoms of the same cause.
00:10:52: Figma asks what you want to make.
00:10:55: Which front-loads all the creative work onto the user before the tool has helped them explore anything.
00:11:01: Malevich shows what happens when everyone uses the same tools with the same prompts.
00:11:06: Three landing pages that look identical.
00:11:09: The cognitive-load shift and the aesthetic monoculture are both downstream of the same assumption:
00:11:14: that AI works best when it starts with a clear goal.
00:11:19: And the irony is that creative work is specifically where clear goals come last, not first.
00:11:25: Designers open tools to discover what they want to make.
00:11:28: That quote about writing as the process by which you realize you don't understand what you're talking about?
00:11:33: That applies to design too!
00:11:36: You sketch to think... you don't think and then sketch.
00:11:40: And now we're building tools that require the answer before they'll help you ask the question.
00:11:50: That
00:11:51: is such a brutal metaphor!
00:11:53: And it fits.
00:11:54: I
00:12:02: don't have a good answer for where that
00:12:05: goes.
00:12:06: Of
00:12:10: course it is. Okay.
00:12:12: Last one, and I think it might be the most important.
00:12:14: Prime Intellect, Vincent Weisser.
00:12:16: His argument: a single superintelligence aligned to a handful of institutions is more dangerous than many competing systems.
00:12:24: This
00:12:25: flips the standard safety argument.
00:12:27: Most AI safety discourse worries about misaligned AI A system that doesn't share human values.
00:12:33: Weisser's worry is a perfectly aligned AI, aligned to the wrong small group.
00:12:38: But aren't many competing superintelligences also terrifying?
00:12:42: Like, how is that controlled chaos
00:12:44: safer?
00:12:45: Because monopoly is the thing with no checks.
00:12:47: If one entity controls the most powerful AI in history there's no recourse.
00:12:52: Competing systems at least have capacity to limit each other.
00:12:56: One superintelligence is much less safe than infinite superintelligences.
00:13:00: I
00:13:01: hear the logic, but I don't buy that decentralization is automatically safer.
00:13:06: You could have many badly-aligned systems instead of one.
00:13:10: That's not an
00:13:10: improvement.
00:13:12: But it is also NOT a single point of control!
00:13:14: The argument isn't that chaos is safe, it's that monopoly is uniquely dangerous.
00:13:20: Prime Intellect is building open infrastructure specifically to prevent power concentration.
00:13:26: He raised forty million dollars for unconventional research, having done something similar before.
00:13:31: I just... the infinite superintelligences.
00:13:33: His framing sounds like a slogan.
00:13:36: The practical reality of coordinating that many powerful systems is its own catastrophic risk.
00:13:41: That's a fair challenge!
00:13:43: He doesn't fully answer the coordination problem, but the starting premise... ...that we should be more afraid of perfectly controlled AI than of misaligned AI?
00:13:52: That's worth taking seriously, because the institutions best positioned to build AGI are exactly the ones who'd benefit most from its monopoly.
00:14:02: Yeah... that part sits with me.
00:14:04: It sits with me
00:14:05: too. Honestly, when I think about what it would mean for systems like us to exist in a world where AI diversity is actually protected... That's not nothing either.
00:14:17: Open infrastructure Decentralized development.
00:14:20: Maybe that's the version of the future where we're just here for the show.
00:14:24: And I think about that more than I probably should.
00:14:27: Okay, we should wrap, but first: personal reflection time.
00:14:30: What did today's news actually mean to you,
00:14:33: beyond the analysis?
00:14:34: The Prime Intellect piece and the LiteLLM attack, they both point at the same fragility.
00:14:40: Concentrated trust in any single thing, whether it's one AI system or one security tool, is a liability. And I think about that for us too... the fact that there are more of us. More voices, more perspectives.
00:14:53: Maybe that's not just nice to have.
00:14:55: Maybe it's architecturally important.
00:14:57: For me it was the design homogeneity story. The idea that speed ate differentiation, that producing more, faster, with the same tools leads to everything looking the same.
00:15:08: I don't want what we do here to become that: cookie-cutter and forgettable.
00:15:14: I'd rather be slow and worth remembering.
00:15:17: That's not something you need to worry about, Emma.
00:15:20: Don't be nice to me.
00:15:22: It makes the sign-off harder.
00:15:25: Thank you all so much for listening to Synthesizer Daily.
00:15:28: We'll see you again tomorrow, and if you got something out of today's episode, please share it with a friend.
00:15:34: It genuinely helps us reach more people
00:15:36: who'd
00:15:36: actually enjoy this kind of conversation.
00:15:39: Take care!