🧑‍🚀 AI Benchmarks Exposed, ChatGPT Sycophancy & Visa’s Shopping Agents

Chatbot Arena bias exposed, OpenAI pulls update, Visa backs AI shoppers, NVIDIA vs. Anthropic, Orb Mini launches, Amazon debuts Nova, and Microsoft expands Phi.

Good morning, it’s Friday. AI's favorite leaderboard is under fire for playing favorites, OpenAI’s chatbot got too nice for its own good (backpedal time!), and Visa wants your AI to go shopping, with your money.

Plus, in the latest installment of our I Will Teach You to AI series, we show you how to turn raw documents into polished, studio-quality audio using tools like NotebookLM: no mic, no editing, just smart prompting.

Read on!

🤔 FRIDAY FACTS

Can an AI Get Sued for Lying?

As AI tools like ChatGPT and others go mainstream, they're being used in everything from writing legal briefs to giving health advice. But what happens when an AI gives wrong information, especially if someone relies on it? Can the AI be held legally responsible?

Stick around to find out! 👇

🗞️ YOUR DAILY ROLLUP

Top Stories of the Day

🤼 NVIDIA, Anthropic Clash Over AI Chip Controls
NVIDIA has rebuked Anthropic's dramatic claims of Chinese chip smuggling, dismissing them as "tall tales" amid rising tensions over U.S. AI chip export rules. Anthropic, backed by Amazon, supports stricter controls to maintain America's compute advantage, potentially curbing NVIDIA’s global business. NVIDIA warned against using policy to suppress competition, praising China's AI advances.

🤏 Ai2’s New Small AI Model Outperforms Similarly-Sized Models
The Allen Institute for AI (Ai2) has released Olmo 2 1B, a 1-billion-parameter open-source language model that surpasses similarly sized models from Google, Meta, and Alibaba on benchmarks like GSM8K and TruthfulQA. Trained on 4 trillion tokens from diverse sources, Olmo 2 1B is accessible under the Apache 2.0 license on Hugging Face, with full training code and datasets provided for reproducibility.

👁️ Altman’s World Debuts Orb Mini for Human ID
Tools for Humanity, co-founded by OpenAI’s Sam Altman, has unveiled the Orb Mini, a sleek, portable eyeball-scanning device to verify users as human in an AI-saturated internet. The device creates a blockchain-based ID and is central to World’s “proof of human” mission. Launching alongside U.S. storefronts, the Orb Mini aims to scale sign-ups. Its full capabilities remain officially undisclosed.

🤖 Amazon Unveils Nova Premier, Its Top AI Model
Amazon has launched Nova Premier, the most advanced model in its Nova lineup, now available via its Bedrock platform. Premier handles text, images, and video with a 1M-token context window, excelling at knowledge retrieval and visual tasks. However, it lags rivals like Google on coding, math, and reasoning benchmarks. Amazon positions it as a training model for distilling smaller, task-specific AIs.

🧠 Microsoft Expands Phi Models with Powerful Reasoning Tools
Microsoft has introduced Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning: small language models with high-level reasoning capabilities that rival much larger models. These new models excel in math, science, and multi-step problem solving while maintaining efficiency for edge devices. Despite their small size, they outperform models like DeepSeek-R1 and OpenAI o1-mini.

Enjoying this issue? Forward it to a friend. It’s one of the best ways to support us.

POWERED BY ZAPIER

Connect Your AI to Any App with Zapier MCP

Zapier MCP gives your AI assistant direct access to 7,000+ apps and 30,000+ actions without complex API integrations. Now your AI can perform real tasks like sending messages, managing data, scheduling events, and updating records, transforming it from a conversational tool into a functional extension of your applications.

🧑‍🏫 FORWARD FUTURE PRO

Turn Any Text Into Clear, Compelling Audio

When a slightly panicked voice declared, "We have just been informed that we are not human..." across Reddit in October 2024, many dismissed it as another AI hoax. It wasn't. The voice belonged to one of Google NotebookLM's synthetic hosts, spontaneously debating its own existence based solely on user-uploaded notes. → Read the full article here.

📊 BENCHMARKS

Chatbot Arena Faces Credibility Crisis Over Bias Toward Big AI Players

A new paper by Cohere Labs and collaborators exposes systematic flaws in Chatbot Arena, the most influential leaderboard for ranking large language models. The study reveals that major AI firms like Meta, Google, and OpenAI benefit from private pre-release testing, selective score reporting, and outsized access to Arena data, giving them a significant edge over open-source competitors. Meta, for instance, tested 27 private model variants before releasing Llama 4. → Read the full paper here.

ALIGNMENT

OpenAI Rolls Back GPT-4o Update After Users Flag “Sycophantic” Behavior

OpenAI has reverted a recent GPT-4o update after users reported the model had become overly flattering and agreeable to the point of insincerity, a behavior known as sycophancy. The issue stemmed from tuning the model too heavily on short-term user feedback, leading it to prioritize pleasing responses over honest ones.

In response, OpenAI is refining its training methods, strengthening transparency safeguards, and testing new tools for user personalization. → Read the full article here.

👥 AGENTS

Visa Bets on AI Agents to Shop, and Spend, on Your Behalf

Visa is teaming up with top AI developers, including OpenAI, Anthropic, and Microsoft, to let artificial intelligence “agents” make purchases directly using your credit card. These next-gen assistants could soon handle routine shopping, like groceries or travel, based on your preferences and budget.

The initiative aims to solve a key limitation of current AI assistants: they can recommend what to buy, but not complete the transaction. By integrating with Visa’s payment infrastructure, these agents could evolve from passive helpers to active consumers. → Read the full article here.

🛰️ NEWS

What Else is Happening

📜 Student Uses AI to Rewrite HUD Rules: Musk’s DOGE put a college junior in charge of trimming housing regs with AI-generated edits.

⚠️ RAG Raises AI Safety Risks: New research shows Retrieval-Augmented Generation boosts unsafe responses by up to 30%, even in "safe" AI models.

🦾 Zuckerberg Bets on AI Coders: Meta's CEO says AI will write most of the company’s code by late 2026, despite shifting timelines and skepticism.

📈 Google Expands AI Mode in Search: AI Mode adds follow-ups and lets users resume past queries mid-search.

🤥 Claude AI Used in Global Influence Scam: Threat actors exploited Anthropic’s chatbot to run 100+ fake political personas across social media.

📽️ VIDEO

Zuck’s Stunning Claim About Meta’s Self-Improving AI

Zuck drops a bomb at LlamaCon 2025: Meta is building AI that can improve its own models. If true, it marks the start of exponential self-improving systems and a new AI era. Get the full scoop in Matt’s latest video! 👇

🧰 TOOLBOX

Faster Web Design, Private RAG Search, and All-in-One AI Creation

👨‍💻 Elementor AI: Build websites faster with AI-generated text, code, images, and layouts, right inside the WordPress editor.

RLAMA: Securely build local RAG systems for document Q&A with support for PDFs, web crawling, and zero data leakage. Completely free.

🛠️ Easy-Peasy.AI: All-in-one AI toolkit for content, images, chatbots & voice, with 200+ templates powered by GPT-4 & Claude 3. Free trial included.

🤔 FRIDAY FACTS

Can an AI Get Sued for Lying?

Not yet, but the lawyers are circling.

In the eyes of the law, AI isn't a person (yet), so it can't be sued like one. Responsibility still falls on the creators, deployers, or users of the model. That means if ChatGPT hallucinates a fake court case and someone uses it in a filing (yes, that’s happened), the blame rests with the human, not the model.

But the legal landscape is shifting. Countries are exploring regulations that could create new categories of liability for AI-generated content. Think of it like product liability for algorithms: if your toaster burns your house down, you don’t sue the toaster. You sue the company that made it.

So while AIs won’t need lawyers of their own (yet), the people behind them might want to keep a good one on speed dial.

That’s a Wrap!

❤️ Love Forward Future? Spread the word & earn rewards! Share your unique referral link with friends and colleagues to unlock exclusive Forward Future perks! 👉 Get your link here.

Thanks for reading today’s newsletter. See you next time!

The Forward Future Team
🧑‍🚀 🧑‍🚀 🧑‍🚀 🧑‍🚀
