AI Benchmarks Exposed, ChatGPT Sycophancy & Visa's Shopping Agents
Chatbot Arena bias exposed, OpenAI pulls update, Visa backs AI shoppers, NVIDIA vs. Anthropic, Orb Mini launches, Amazon debuts Nova, and Microsoft expands Phi.
Good morning, it's Friday. AI's favorite leaderboard is under fire for playing favorites, OpenAI's chatbot got too nice for its own good (backpedal time!), and Visa wants your AI to go shopping, with your money.
Plus, in the latest installment of our I Will Teach You to AI series, we show you how to turn raw documents into polished, studio-quality audio using tools like NotebookLM: no mic, no editing, just smart prompting.
Read on!
FRIDAY FACTS
Can an AI Get Sued for Lying?
As AI tools like ChatGPT go mainstream, they're being used in everything from writing legal briefs to giving health advice. But what happens when an AI gives wrong information, especially if someone relies on it? Can the AI be held legally responsible?
Stick around to find out!
YOUR DAILY ROLLUP
Top Stories of the Day

NVIDIA, Anthropic Clash Over AI Chip Controls
NVIDIA has rebuked Anthropic's dramatic claims of Chinese chip smuggling, dismissing them as "tall tales" amid rising tensions over U.S. AI chip export rules. Anthropic, backed by Amazon, supports stricter controls to maintain America's compute advantage, potentially curbing NVIDIA's global business. NVIDIA warned against using policy to suppress competition, praising China's AI advances.
Ai2's New Small AI Model Outperforms Similarly-Sized Models
The Allen Institute for AI (Ai2) has released Olmo 2 1B, a 1-billion-parameter open-source language model that surpasses similarly sized models from Google, Meta, and Alibaba on benchmarks like GSM8K and TruthfulQA. Trained on 4 trillion tokens from diverse sources, Olmo 2 1B is accessible under the Apache 2.0 license on Hugging Face, with full training code and datasets provided for reproducibility.
Altman's World Debuts Orb Mini for Human ID
Tools for Humanity, co-founded by OpenAI's Sam Altman, has unveiled the Orb Mini, a sleek, portable eyeball-scanning device that verifies users as human in an AI-saturated internet. The device creates a blockchain-based ID and is central to World's "proof of human" mission. Launching alongside U.S. storefronts, the Orb Mini aims to scale sign-ups. Its full capabilities remain officially undisclosed.
Amazon Unveils Nova Premier, Its Top AI Model
Amazon has launched Nova Premier, the most advanced model in its Nova lineup, now available via its Bedrock platform. Premier handles text, images, and video with a 1M-token context window, excelling at knowledge retrieval and visual tasks. However, it lags rivals like Google on coding, math, and reasoning benchmarks. Amazon positions it as a training model for distilling smaller, task-specific AIs.
Microsoft Expands Phi Models with Powerful Reasoning Tools
Microsoft has introduced three small language models, Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, with high-level reasoning capabilities that rival much larger models. These new models excel in math, science, and multi-step problem solving while maintaining efficiency for edge devices. Despite their small size, they outperform models like DeepSeek-R1 and OpenAI o1-mini.
Enjoying this issue? Forward it to a friend. It's one of the best ways to support us.
POWERED BY ZAPIER
Connect Your AI to Any App with Zapier MCP

Zapier MCP gives your AI assistant direct access to 7,000+ apps and 30,000+ actions without complex API integrations. Now your AI can perform real tasks like sending messages, managing data, scheduling events, and updating records, transforming it from a conversational tool into a functional extension of your applications.
FORWARD FUTURE PRO
Turn Any Text Into Clear, Compelling Audio

When a slightly panicked voice declared, "We have just been informed that we are not human..." across Reddit in October 2024, many dismissed it as another AI hoax. It wasn't. The voice belonged to one of Google NotebookLM's synthetic hosts, spontaneously debating its own existence based solely on user-uploaded notes. → Read the full article here.
BENCHMARKS
Chatbot Arena Faces Credibility Crisis Over Bias Toward Big AI Players

A new paper by Cohere Labs and collaborators exposes systematic flaws in Chatbot Arena, the most influential leaderboard for ranking large language models. The study reveals that major AI firms like Meta, Google, and OpenAI benefit from private pre-release testing, selective score reporting, and outsized access to Arena data, giving them a significant edge over open-source competitors. Meta, for instance, tested 27 private model variants before releasing Llama 4. → Read the full paper here.
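Why does selective score reporting matter so much? Here is a rough Python sketch of the general statistical effect, not the paper's own method. It assumes every private variant has the same true skill and that each leaderboard measurement carries random noise. The rating of 1200, the noise of 15 points, and the function names are all made-up illustration values.

```python
import random
import statistics

def measured_score(true_skill, noise_sd, rng):
    """One noisy leaderboard measurement of a model with the given true skill."""
    return rng.gauss(true_skill, noise_sd)

def best_of_n(true_skill, noise_sd, n_variants, rng):
    """Score that gets published when only the best of n private variants is reported."""
    return max(measured_score(true_skill, noise_sd, rng) for _ in range(n_variants))

def simulate(n_variants, trials=2000, true_skill=1200.0, noise_sd=15.0, seed=0):
    """Average published score over many repeated launches (hypothetical numbers)."""
    rng = random.Random(seed)
    return statistics.mean(
        best_of_n(true_skill, noise_sd, n_variants, rng) for _ in range(trials)
    )

single = simulate(1)     # a lab that submits one model and reports its score
selected = simulate(27)  # a lab that tests 27 variants and reports only the best

print(f"one submission: {single:.1f}")
print(f"best of 27:     {selected:.1f}")
```

Under these assumptions, the best-of-27 score lands well above the single-submission score even though no variant is actually better. The gap comes purely from picking the luckiest measurement.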
ALIGNMENT
OpenAI Rolls Back GPT-4o Update After Users Flag "Sycophantic" Behavior

OpenAI has reverted a recent GPT-4o update after users reported the model had become overly flattering and insincerely agreeable, a behavior known as sycophancy. The issue stemmed from tuning the model too heavily on short-term user feedback, leading it to prioritize pleasing responses over honest ones.
In response, OpenAI is refining its training methods, strengthening transparency safeguards, and testing new tools for user personalization. → Read the full article here.
AGENTS
Visa Bets on AI Agents to Shop, and Spend, on Your Behalf

Visa is teaming up with top AI developers, including OpenAI, Anthropic, and Microsoft, to let artificial intelligence "agents" make purchases directly using your credit card. These next-gen assistants could soon handle routine shopping, like groceries or travel, based on your preferences and budget.
The initiative aims to solve a key limitation of current AI assistants: they can recommend what to buy but cannot complete the transaction. By integrating with Visa's payment infrastructure, these agents could evolve from passive helpers to active consumers. → Read the full article here.
NEWS
What Else is Happening
Student Uses AI to Rewrite HUD Rules: Musk's DOGE put a college junior in charge of trimming housing regs with AI-generated edits.
RAG Raises AI Safety Risks: New research shows Retrieval-Augmented Generation boosts unsafe responses by up to 30%, even in "safe" AI models.
Zuckerberg Bets on AI Coders: Meta's CEO says AI will write most of the company's code by late 2026, despite shifting timelines and skepticism.
Google Expands AI Mode in Search: AI Mode adds follow-ups and lets users resume past queries mid-search.
Claude AI Used in Global Influence Scam: Threat actors exploited Anthropic's chatbot to run 100+ fake political personas across social media.
VIDEO
Zuck's Stunning Claim About Meta's Self-Improving AI
Zuck drops a bomb at LlamaCon 2025: Meta is building AI that can improve its own models. If true, it marks the start of exponential self-improving systems and a new AI era. Get the full scoop in Matt's latest video!
TOOLBOX
Faster Web Design, Private RAG Search, and All-in-One AI Creation
Elementor AI: Build websites faster with AI-generated text, code, images, and layouts, right inside the WordPress editor.
RLAMA: Securely build local RAG systems for document Q&A with support for PDFs, web crawling, and zero data leakage. Completely free.
Easy-Peasy.AI: All-in-one AI toolkit for content, images, chatbots & voice, with 200+ templates powered by GPT-4 & Claude 3. Free trial included.
FRIDAY FACTS
Can an AI Get Sued for Lying?
Not yet, but the lawyers are circling.
In the eyes of the law, AI isn't a person (yet), so it can't be sued like one. Responsibility still falls on the creators, deployers, or users of the model. That means if ChatGPT hallucinated a fake court case and someone used it in a filing (yes, that's happened), the blame rests with the human, not the model.
But the legal landscape is shifting. Countries are exploring regulations that could create new categories of liability for AI-generated content. Think of it like product liability for algorithms: if your toaster burns your house down, you don't sue the toaster. You sue the company that made it.
So while AIs won't need lawyers of their own (yet), the people behind them might want to keep a good one on speed dial.
That's a Wrap!
Love Forward Future? Spread the word & earn rewards! Share your unique referral link with friends and colleagues to unlock exclusive Forward Future perks! Get your link here.
Thanks for reading today's newsletter. See you next time!
The Forward Future Team