👾 The Second Half of 2025: What Can We Expect in the Area of AI?

From assistants to autonomous agents: how AI's evolving capabilities could reshape work and compliance by 2026.

Obviously, many people are skeptical that powerful AI will be built soon and some are skeptical that it will ever be built at all. I think it could come as early as 2026, though there are also ways it could take much longer.

Dario Amodei (Machines of Loving Grace)

It's December 2025: between a four-column PDF study, a database with 800,000 documents and an ongoing video conference, an invisible AI orchestrates the workflow, answers queries in real time, writes the final report and simultaneously books the train ticket for the next appointment. What still counts as a tech-savvy demonstration today could become everyday office life within a few months. The dynamics of the first half of 2025 certainly point in this direction: OpenAI has presented GPT-4.1, a language model that loads entire specialist libraries into a single session thanks to a context length of one million tokens, while Google is already looking to exceed the two-million mark with Gemini 2.5.

But size alone is no longer the decisive factor. In parallel, research teams are experimenting with agentic architectures in which specialized models cooperate, operate complete software stacks and autonomously refer decisions back to humans. Initial field tests such as OpenAI's “Operator” or Amazon's planned multi-agent framework “Kiro” indicate that this approach is more than a PR maneuver; at the same time, experts warn of an “office full of algorithmic interns” who act extremely quickly, but not necessarily reliably.

Against this backdrop, the following article examines which breakthroughs appear realistic in the second half of 2025, where gradual improvements are to be expected and which promises must be considered speculative for the time being. The guiding question is whether the next six months will actually mark the transition from assisting to acting artificial intelligence - or whether the technology will remain stuck at regulatory, economic and methodological hurdles.

The Race for Context

OpenAI sets a new mark with GPT-4.1: one million tokens with no decay in response quality and 26% lower costs per token - a signal that model efficiency and context depth are no longer at odds.

All three models can process up to one million tokens of context — the text, images, or videos included in a prompt. That’s far more than GPT-4o’s 128,000-token limit. “We trained GPT‑4.1 to reliably attend to information across the full 1 million context length,” OpenAI says in a post announcing the models.

“We’ve also trained it to be far more reliable than GPT‑4o at noticing relevant text, and ignoring distractors across long and short context lengths.” GPT-4.1 is also 26 percent cheaper than GPT-4o, a metric that has become more important following the debut of DeepSeek’s ultra-efficient AI model.

The Verge

Google is following up with Gemini 2.5 and openly mentions “2 million tokens soon”, with early beta testers already reporting test runs with 1.2 million.

The increasingly long windows are shifting product boundaries: instead of retrieval pipelines, huge knowledge bases will in future have to fit directly into the session. At the same time, there is growing pressure to control hallucinations in long dialogs - a field that recent studies describe as a “fragmentation problem”. The industry is countering with hierarchy prompts, distributed memories and semantic compression sampling, techniques that are likely to be incorporated into commercial SDKs by the end of the year.
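
What such compression could look like in practice is easiest to show in code. The sketch below is a deliberately simplified form of hierarchical compression, not any vendor's SDK feature: older conversation turns are condensed into a summary layer so that the live prompt stays inside the window. The model name, the character-per-token heuristic and the budget are illustrative assumptions.

```python
# Illustrative sketch of hierarchical context compression (not a vendor SDK feature).
# Assumes the openai Python SDK >= 1.x; model name and token budget are placeholders.
from openai import OpenAI

client = OpenAI()
TOKEN_BUDGET = 900_000    # stay below a 1M-token window, leaving room for the answer
CHARS_PER_TOKEN = 4       # rough heuristic; a real system would use a tokenizer

def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def compress_history(turns: list[str], keep_recent: int = 20) -> list[str]:
    """Summarize older turns into one 'memory layer' while keeping recent turns verbatim."""
    if sum(estimate_tokens(t) for t in turns) <= TOKEN_BUDGET:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = client.chat.completions.create(
        model="gpt-4.1-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Compress the following dialog into a dense factual summary."},
            {"role": "user", "content": "\n".join(old)},
        ],
    ).choices[0].message.content
    return [f"[Summary of earlier conversation]\n{summary}"] + recent
```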

Agentic Systems: From Assistant to Executor

In January, OpenAI demonstrated “Operator” - an agent that controls browsers and desktops, buys concert tickets and fills out forms. Amazon counters with “Kiro”, a multi-agent framework for real-time code production; internal documents state the end of June as the earliest preview.

Microsoft, in turn, is expanding Copilot Studio into an agent platform by September, in which developers can package their own micro-agents as reusable building blocks. In a much-cited analysis, IBM warns that autonomous agents are “not a license for complete automation” and must remain embedded in workflows with clear handover points. Practical experience confirms this: pilot projects in finance departments show that human supervision initially reduces costs because escalations are recognized early on.
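
What such a handover point can look like is sketched below: a minimal agent loop that executes low-risk actions autonomously but escalates anything above a risk threshold to a human reviewer. The tools, risk scores and approval mechanism are hypothetical; only the pattern is the point.

```python
# Minimal sketch of an agent loop with explicit human handover points
# (hypothetical tools and risk thresholds, not a specific vendor's framework).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    name: str
    payload: dict
    risk: float          # 0.0 = harmless, 1.0 = irreversible or costly

RISK_THRESHOLD = 0.5     # above this, a human must approve the step

def run_agent(plan: list[Action], tools: dict[str, Callable[[dict], str]]) -> list[str]:
    log = []
    for action in plan:
        if action.risk >= RISK_THRESHOLD:
            answer = input(f"Approve '{action.name}' with {action.payload}? [y/N] ")
            if answer.strip().lower() != "y":
                log.append(f"SKIPPED (human veto): {action.name}")
                continue
        log.append(tools[action.name](action.payload))
    return log

# Example: drafting a report runs autonomously, the payment escalates to a human.
tools = {
    "draft_report": lambda p: f"report drafted for {p['client']}",
    "send_payment": lambda p: f"paid {p['amount']} EUR",
}
plan = [
    Action("draft_report", {"client": "ACME"}, risk=0.1),
    Action("send_payment", {"amount": 12_000}, risk=0.9),
]
print(run_agent(plan, tools))
```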


And while Hay is hopeful about the potential for agentic development in 2025, he sees a problem in another area: “Most organizations aren't agent-ready. What's going to be interesting is exposing the APIs that you have in your enterprises today. That's where the exciting work is going to be. And that's not about how good the models are going to be. That's going to be about how enterprise-ready you are.”

IBM

Multimodal Media Revolution

The text-to-video boundary is shifting. Research prototypes generate one-minute, consistent animated film sequences from a single prompt for the first time.

Open-source projects such as FramePack show that 60-second clips now run on consumer GPUs with 6 GB of VRAM. Newly launched 13-billion-parameter models such as Lightricks' LTX reduce rendering times by a factor of 30. The leap into widespread use now depends less on computing power than on licensing and copyright issues - points that the EU regulation addresses emphatically.

Tool Cognition and Open Protocols

OpenAI is merging its Chat, Assistants and function-calling APIs into a single “Responses API” - including integrated web search, code interpreter and table analysis. In parallel, Anthropic is driving forward the Model Context Protocol (MCP) as an open standard through which agents connect to shared tools and context sources. This convergence reduces development effort; start-ups report 40% shorter iteration cycles since the introduction of multi-tool calls.
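
From a developer's perspective, the convergence roughly looks like the sketch below - a single Responses API call that combines a prompt with a built-in web search tool. It assumes the openai Python SDK's responses endpoint as documented at launch; the exact model name and tool identifiers may differ by account and SDK version.

```python
# Hedged sketch of one Responses API call combining a prompt with a built-in tool.
# Assumes the openai Python SDK's responses endpoint; tool type follows the launch docs.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    tools=[{"type": "web_search_preview"}],  # code interpreter / file search work the same way
    input="Summarize the governance obligations of the EU AI Act for general-purpose models.",
)
print(response.output_text)
```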

Regulatory Realities

On August 2, 2025, the governance obligations of the EU AI Act for general-purpose models come into force. Providers must disclose a summary of their training data and sign a code of practice - the “central proof of compliance”, according to the Commission. Although tech companies are lobbying to soften individual passages, Brussels has so far been unmoved. The result: from H2 2025, every European product launch will be accompanied by transparency reports; some US providers are considering shipping EU models with a separate parameter seal.

Personalization: From Long Context to Permanent Memory

Perhaps the most radical change in the coming months will not concern parameter sizes, but the ability of language models to “remember” individual users. In April, OpenAI introduced an “extended memory” that goes beyond explicitly stored facts to analyze the entire chat history and incorporate it into new answers - including the option to rewrite these memories for web searches and thus refine search queries.

This shifts personalization from static custom instructions to a dynamic memory that functions like a private vector archive: relevant sentences from previous sessions are vectorized, weighted by semantic proximity and fed into each prompt as additional context. In combination with context windows beyond the million-token mark, this creates something like an ongoing, individually curated knowledge base - a difference already evident in tests, where ChatGPT automatically picks up on personal style preferences or past projects without users having to mention them again.
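
Reduced to its core, such a private vector archive needs only a handful of lines: sentences from earlier sessions are embedded, stored, and the semantically closest ones are retrieved and prepended to each new prompt. The sketch below illustrates that retrieval pattern; it is not a description of how OpenAI's memory actually works, and the embedding model and fixed top-k are assumptions.

```python
# Illustrative sketch of a retrieval-style personal memory (not any vendor's implementation).
import numpy as np
from openai import OpenAI

client = OpenAI()
memory: list[tuple[str, np.ndarray]] = []  # (sentence, embedding) pairs from earlier sessions

def embed(text: str) -> np.ndarray:
    vec = client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding
    return np.array(vec)

def remember(sentence: str) -> None:
    memory.append((sentence, embed(sentence)))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    scored = sorted(
        memory,
        key=lambda m: -np.dot(m[1], q) / (np.linalg.norm(m[1]) * np.linalg.norm(q)),
    )
    return [sentence for sentence, _ in scored[:k]]

remember("The user prefers concise answers in German.")
remember("The user is working on a compliance report about the EU AI Act.")
prompt = "Draft an outline for my current project."
context = "\n".join(recall(prompt))  # injected as additional context ahead of the actual prompt
```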

The competition is also accelerating: Google has added a “Recall” function to Gemini Advanced, which summarizes previous conversations and continues them automatically if required. Anthropic takes a more conservative approach and allows the storage of “work results”, but deliberately does not allow tacit personality profiling - a compromise that is particularly popular in regulated industries. Meta, on the other hand, relies on open weights: with Llama 3.2, a personal language style can be trained on AWS Bedrock or local GPUs via a LoRA adapter within a few hours - at a tenth of the cost of classic fine-tuning runs.
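
In outline, such a LoRA run looks like the sketch below, using the Hugging Face transformers and peft libraries. The model ID, rank and target modules are common defaults rather than Meta's or AWS Bedrock's recommended settings, and access to the gated Llama 3.2 weights is assumed.

```python
# Sketch of attaching a LoRA adapter to Llama 3.2 for style personalization
# (illustrative hyperparameters; assumes transformers + peft and access to the gated weights).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections are the usual LoRA targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # typically well under 1% of the base model
# From here, a standard Trainer run on a few hundred personal writing samples adapts the style.
```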

It is therefore likely that two models will emerge - a cloud memory with granular opt-ins for open markets and client-side mini-models that keep sensitive preferences encrypted on the end device. Both variants ultimately pursue the same goal: an AI that not only understands what is being said, but also who is speaking - and thus develops the dialog from a one-off session to ongoing, personal collaboration.

Preliminary Results at the End of the Quarter

The dynamics indicate that context expansion and agentic systems will determine the pace of innovation in the second half of the year, while multimodal video rendering remains spectacular but still struggles with infrastructure and the legal framework. At the same time, EU regulation is forcing large labs to be transparent without noticeably slowing the pace of development - at least until financial penalties take full effect in 2026.

  • Launch of GPT-5 with fully integrated voice, image and audio processing in fall: ≈ 95% (most of it already announced)

  • Public 2-million-token context windows in cloud offerings by December: ≈ 85% (source: blog.google)

  • Widely available, semi-autonomous office agents (Operator, Kiro, Copilot) in productive environments: ≈ 65% (still many hurdles, but solid progress)

  • One-minute AI videos with consistent characters for marketing campaigns: ≈ 55%

  • On-device LLMs with < 10B parameters in high-end smartphones: ≈ 40% (indirect trend from chip roadmaps, no official announcement)

  • Standardized tool APIs (Responses API / MCP) as an industry default: ≈ 85%

Conclusion

The next six months will be decisive in determining whether AI matures from a dialog-oriented assistant into a cooperative project partner. If million-token contexts can be stabilized, research, planning and execution will merge into a single interaction. At the same time, the wave of agent-based systems is pushing into everyday work processes, from code commits to financial bookings. Regulation creates clarity without throttling the innovation engine.

Will the second half of the year bring the big breakthrough? The indications are that we will see the first scalable, multimodal agents that learn, decide and act in real time - albeit under the watchful eye of compliance teams and a public that demands data transparency. The most exciting open question is therefore: will 2026 be the year in which people not only commission agents, but also give them real responsibility?

Ready for more content from Kim Isenberg? Subscribe to FF Daily for free!

Kim Isenberg

Kim studied sociology and law at a university in Germany and has been fascinated by technology for many years. Since the breakthrough of OpenAI's ChatGPT, Kim has been examining the influence of artificial intelligence on our society from a scientific perspective.
