Hiya, folks, welcome to TechCrunch’s regular AI newsletter. If you want this in your inbox every Wednesday, sign up here.
It’s been just a few days since OpenAI revealed its latest flagship generative model, o1, to the world. Marketed as a “reasoning” model, o1 essentially takes longer to “think” about questions before answering them, breaking down problems and checking its own answers.
There are a great many things o1 can’t do well — and OpenAI itself admits this. But on some tasks, like physics and math, o1 excels despite not necessarily having more parameters than OpenAI’s previous top-performing model, GPT-4o. (In AI and machine learning, “parameters,” usually in the billions, roughly correspond to a model’s problem-solving skills.)
And this has implications for AI regulation.
California’s proposed bill SB 1047, for example, imposes safety requirements on AI models that either cost over $100 million to develop or were trained using compute power beyond a certain threshold. Models like o1, however, demonstrate that scaling up training compute isn’t the only way to improve a model’s performance.
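To see why a compute threshold is such a blunt instrument, it helps to look at how training compute is usually estimated. A common back-of-the-envelope rule is roughly 6 floating-point operations per parameter per training token. The sketch below applies that approximation — the specific parameter and token counts are illustrative assumptions, roughly in line with figures publicly reported for large models like Llama 405B, not official numbers:

```python
def training_flops(params: float, tokens: float) -> float:
    """Rough training-compute estimate using the common
    ~6 FLOPs per parameter per training token approximation."""
    return 6 * params * tokens

# Illustrative assumption: a 405B-parameter model trained on ~15T tokens.
flops = training_flops(405e9, 15e12)

# A regulatory-style threshold of 10^26 floating-point operations.
THRESHOLD = 1e26
print(f"{flops:.2e} FLOPs; over threshold: {flops > THRESHOLD}")
```

The point of the exercise: a frontier-scale model can land under a fixed FLOPs threshold, and a model that spends extra compute at inference time (as o1 does) isn’t captured by this training-time metric at all.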
In a post on X, Nvidia research manager Jim Fan posited that future AI systems may rely on small, easier-to-train “reasoning cores” as opposed to the training-intensive architectures (e.g., Meta’s Llama 405B) that’ve been the trend lately. Recent academic studies, he notes, have shown that small models like o1 can greatly outperform large models given more time to noodle on questions.
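One well-documented way to trade inference time for accuracy — in the spirit of the “reasoning cores” idea above — is self-consistency: sample a model many times and take a majority vote over its answers. The toy sketch below uses a random stand-in for a small, unreliable model (the solver, its error rate, and the answer set are all invented for illustration):

```python
import random
from collections import Counter

def noisy_solver(rng: random.Random) -> str:
    """Stand-in for a small model: right answer ("42") only 40% of
    the time, otherwise one of several wrong answers."""
    return "42" if rng.random() < 0.4 else rng.choice(["41", "43", "7"])

def answer(samples: int, seed: int = 0) -> str:
    """Spend more inference-time compute by sampling the solver
    repeatedly and majority-voting over candidates (self-consistency)."""
    rng = random.Random(seed)
    votes = Counter(noisy_solver(rng) for _ in range(samples))
    return votes.most_common(1)[0][0]

print(answer(1), answer(101))  # one sample vs. many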
So was it short-sighted for policymakers to tie AI regulatory measures to compute? Yes, says Sara Hooker, head of AI startup Cohere’s research lab, in an interview with TechCrunch:
[o1] kind of points out how incomplete a viewpoint this is, using model size as a proxy for risk. It doesn’t take into account everything you can do with inference or running a model. For me, it’s a combination of bad science combined with policies that put the emphasis on not the current risks that we see in the world now, but on future risks.
Now, does that mean legislators should rip AI bills up from their foundations and start over? No. Many were written to be easily amendable, under the assumption that AI would evolve far beyond their enactment. California’s bill, for instance, would give the state’s Government Operations Agency the authority to redefine the compute thresholds that trigger the law’s safety requirements.
The admittedly tricky part will be figuring out which metric could be a better proxy for risk than training compute. Like so many other aspects of AI regulation, it’s something to ponder as bills around the U.S. — and world — march toward passage.
News
First reactions to o1: Max got initial impressions from AI researchers, startup founders, and VCs on o1 — and tested the model himself.
Altman departs safety committee: OpenAI CEO Sam Altman stepped down from the startup’s committee responsible for reviewing the safety of models such as o1, likely in response to concerns that he wouldn’t act impartially.
Slack turns into an agent hub: At its parent company Salesforce’s annual Dreamforce conference, Slack announced new features, including AI-generated meeting summaries and integrations with tools for image generation and AI-driven web searches.
Google begins flagging AI images: Google says that it plans to roll out changes to Google Search to make clearer which images in results were AI generated — or edited by AI tools.
Mistral launches a free tier: French AI startup Mistral launched a new free tier to let developers fine-tune and build test apps with the startup’s AI models.
Snap launches a video generator: At its annual Snap Partner Summit on Tuesday, Snapchat announced that it’s introducing a new AI video-generation tool for creators. The tool will allow select creators to generate AI videos from text prompts and, soon, from image prompts.
Intel inks major chip deal: Intel says it will co-develop an AI chip with AWS using Intel’s 18A chip fabrication process. The companies described the deal as a “multi-year, multi-billion-dollar framework” that could potentially involve additional chip designs.
Oprah’s AI special: Oprah Winfrey aired a special on AI with guests such as OpenAI’s Sam Altman, Microsoft’s Bill Gates, tech influencer Marques Brownlee, and current FBI director Christopher Wray.
Research paper of the week
We know that AI can be persuasive, but can it pull someone out of a deep conspiracy rabbit hole? Not all by itself. But a new model from Costello et al. at MIT and Cornell can make a dent in beliefs in untrue conspiracies — and the effect persists for at least a couple of months.
In the experiment, they had people who believed in conspiracy-related statements (e.g., “9/11 was an inside job”) talk with a chatbot that gently, patiently, and endlessly offered counterevidence to their arguments. These conversations led the humans involved to report a 20% reduction in the associated belief two months later, at least as far as these things can be measured. Here’s an example of one of the conversations in progress:
Those deep into reptilian and deep-state conspiracies are unlikely to consult or believe an AI like this, but the approach could be more effective if it were used at a critical juncture, like a person’s first foray into these theories. For instance, if a teenager searches for “Can jet fuel melt steel beams?” they may experience a learning moment instead of a tragic one.
Model of the week
It’s not a model, but it has to do with models: Researchers at Microsoft this week published an AI benchmark called Eureka aimed at (in their words) “scaling up [model] evaluations … in an open and transparent manner.”
AI benchmarks are a dime a dozen. So what makes Eureka different? Well, the researchers say that, for Eureka — which is actually a collection of existing benchmarks — they chose tasks that remain challenging for “even the most capable models.” Specifically, Eureka tests for capabilities often overlooked in AI benchmarks, like visual-spatial navigation skills.
To show just how difficult Eureka can be for models, the researchers tested systems, including Anthropic’s Claude, OpenAI’s GPT-4o, and Meta’s Llama, on the benchmark. No single model scored well across all of Eureka’s tests, which the researchers say underscores the importance of “continued innovation” and “targeted improvements” to models.
Grab bag
In a win for professional actors, California passed two laws, AB 2602 and AB 1836, restricting the use of AI digital replicas.
The legislation, which was backed by SAG-AFTRA, the performers’ union, requires that companies relying on a performer’s digital replica (e.g., cloned voice or image) give a “reasonably specific” description of the replica’s intended use and negotiate with the performer’s legal counsel or labor union. It also requires that entertainment employers gain the consent of a deceased performer’s estate before using a digital replica of that person.
As the Hollywood Reporter notes in its coverage, the bills codify concepts that SAG-AFTRA fought for in its 118-day strike last year with studios and major streaming platforms. California is the second state after Tennessee to impose restrictions on the use of digital actor likenesses; SAG-AFTRA also sponsored the Tennessee effort.