OpenAI DevDay 2025: Apps SDK, Agent Kit, MCP, and Why Prompting Remains Critical for AI Success


Tags: AI Agents, Developer Tools, OpenAI, Prompting

Introduction

OpenAI’s DevDay 2025 marked a significant milestone in the evolution of AI development infrastructure. The event showcased three major technological announcements that are reshaping how developers build, deploy, and scale AI applications: the Apps SDK, Agent Kit, and the adoption of the Model Context Protocol (MCP). Beyond these technical launches, a critical theme emerged throughout the conference—the realization that prompting is more important than ever in the age of autonomous AI agents. This comprehensive guide explores each of these developments, their implications for developers, and why mastering the art of prompting has become a fundamental skill for anyone building with modern AI systems.

Understanding the Evolution of AI Developer Tools

The journey from simple API endpoints to sophisticated agentic systems represents a fundamental shift in how artificial intelligence is deployed and distributed. When OpenAI first launched its API, the company made a deliberate choice to open its technology to developers worldwide, recognizing that no single organization could bring the benefits of advanced AI to every corner of the globe. This philosophy has remained consistent throughout OpenAI’s evolution, but the mechanisms for achieving this distribution have become increasingly sophisticated. The original API model allowed developers to call specific endpoints and receive responses, but it was fundamentally reactive—developers had to orchestrate the entire workflow themselves. Today’s landscape is dramatically different, with developers expecting tools that enable autonomous agents, seamless integrations, and rich user experiences that feel native to the platforms where they’re deployed.

The growth metrics tell a compelling story about this evolution. OpenAI now serves over 800 million weekly active ChatGPT users, making chatgpt.com one of the most visited websites in the world. More importantly for developers, the platform now supports 4 million developers building applications, up from 3 million the previous year. This growth reflects not just increased adoption, but a fundamental shift in how developers view AI—no longer as a novelty feature to add to existing products, but as a core capability that can transform entire business models. The infrastructure supporting this ecosystem has had to evolve accordingly, moving from simple API calls to complex orchestration systems that can handle tool calling, context management, and sophisticated user interactions.

What is the Model Context Protocol and Why It Matters

The Model Context Protocol represents a watershed moment in AI infrastructure development. Rather than OpenAI building proprietary solutions for every integration challenge, the company recognized that an open standard would benefit the entire ecosystem. MCP is essentially a standardized way for applications to provide context and tools to large language models, functioning like a universal connector that works across different AI platforms and applications. The protocol was originally developed by Anthropic, but OpenAI’s decision to embrace and integrate it demonstrates a commitment to open standards that transcends individual company interests. This is particularly significant because it means developers can build integrations once and deploy them across multiple AI platforms, rather than creating separate implementations for each system.

The beauty of MCP lies in its simplicity and generality. Rather than requiring developers to learn platform-specific integration patterns, MCP provides a consistent interface that works whether you’re connecting to Claude, ChatGPT, or other AI systems. OpenAI’s integration of MCP into its Agent SDK in March 2025 was a pivotal moment, signaling that the company viewed this open protocol as the natural evolution of how AI systems should connect to external tools and data sources. The protocol handles everything from simple tool definitions to complex context management, allowing developers to focus on building valuable integrations rather than wrestling with integration mechanics. By having team members like Nick Cooper participate in the MCP steering committee, OpenAI ensures that the protocol continues to evolve in ways that serve the broader developer community while meeting the specific needs of different AI platforms.
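The protocol's core operations are simple enough to sketch. The following stdlib-only Python toy mimics the two central MCP interactions—listing tools and calling one—using plain dictionaries in place of a real transport. The tool name `add_note` and the dispatcher itself are illustrative; a production server would use an official MCP SDK rather than this hand-rolled handler.

```python
import json

# Toy in-memory "server": the tool definition follows the MCP shape
# (name, description, JSON Schema for inputs), but the transport and
# session handling of a real MCP server are omitted.
TOOLS = {
    "add_note": {
        "name": "add_note",
        "description": "Store a short note for the user.",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    }
}

NOTES = []

def handle(request: dict) -> dict:
    """Dispatch a JSON-RPC-style request to the toy server."""
    method = request["method"]
    if method == "tools/list":
        return {"tools": list(TOOLS.values())}
    if method == "tools/call":
        name = request["params"]["name"]
        args = request["params"]["arguments"]
        if name == "add_note":
            NOTES.append(args["text"])
            return {"content": [{"type": "text", "text": f"Saved note #{len(NOTES)}"}]}
    return {"error": {"code": -32601, "message": f"Unknown method {method}"}}

if __name__ == "__main__":
    print(json.dumps(handle({"method": "tools/list"}), indent=2))
    print(handle({"method": "tools/call",
                  "params": {"name": "add_note", "arguments": {"text": "hello"}}}))
```

The point of the sketch is that any MCP-capable client—Claude, ChatGPT, or otherwise—speaks this same small vocabulary, which is why an integration written once can be reused across platforms.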

The Apps SDK: Inverting the AI Integration Model

For years, the standard approach to integrating AI into applications followed a predictable pattern: you had a website or application, and somewhere in the corner was a chatbot powered by AI. The Apps SDK fundamentally inverts this relationship. Now, ChatGPT becomes the primary interface, and applications are embedded within it as rich, interactive experiences. This inversion is more than cosmetic—it represents a profound shift in how users interact with AI and how developers think about distribution. Instead of trying to drive users to your website or application, developers can now meet users where they already are: in ChatGPT, which has become a primary destination for millions of people seeking information, assistance, and solutions.

The Apps SDK builds directly on MCP, allowing developers to create applications that feel native to ChatGPT while maintaining complete control over the user experience. This is a crucial distinction from earlier plugin systems, which were criticized for limiting developer control. With the Apps SDK, companies like Canva can create experiences that look and feel like Canva, complete with custom UI components and brand-consistent design, while still being accessible directly within ChatGPT. Users can chat with the AI, get recommendations, and then interact with the embedded application without ever leaving the ChatGPT interface. This seamless integration is possible because the Apps SDK provides developers with the tools to define custom UI components, manage state, and create experiences that feel like natural extensions of ChatGPT rather than bolted-on features.
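In practice this surfaces as MCP tool metadata that points the ChatGPT host at a UI component to render. The sketch below shows only the rough shape of such a descriptor as plain data; the specific key names (`_meta`, `outputTemplate`, the `ui://` scheme) are assumptions for illustration, not a verified Apps SDK schema.

```python
# Illustrative shape of an Apps SDK-style tool descriptor: a normal
# MCP tool plus metadata linking it to a custom UI component.
# Key names here are assumptions, not a verified schema.
tool_descriptor = {
    "name": "create_design",
    "description": "Start a new design from a text brief.",
    "inputSchema": {
        "type": "object",
        "properties": {"brief": {"type": "string"}},
        "required": ["brief"],
    },
    "_meta": {
        # Points the host at an HTML component to render the result.
        "outputTemplate": "ui://widget/design-editor.html",
    },
}

def render_target(descriptor: dict) -> str:
    """Return the UI resource the host should render, falling back to text."""
    return descriptor.get("_meta", {}).get("outputTemplate", "inline-text")

print(render_target(tool_descriptor))
```

The design choice this illustrates is the Apps SDK's key departure from plugins: the tool call and the rendered experience are described together, so the developer, not the host, decides what the user sees.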

The learning from previous iterations is evident in the Apps SDK’s design. When OpenAI launched plugins in March 2023, developers provided feedback that they wanted more control over how their integrations appeared and functioned within ChatGPT. The company listened, and the Apps SDK represents the culmination of that feedback loop. Developers can now own the entire experience, from how their application appears to how it functions within the ChatGPT environment. This shift from tool-based integration to experience-based integration is particularly important for companies that have invested heavily in their brand and user experience—they no longer have to compromise on their identity to reach ChatGPT’s massive user base.

Agent Kit: Democratizing Autonomous AI Development

Agent Kit represents OpenAI’s most ambitious attempt yet to democratize the development of autonomous AI systems. Launched at DevDay 2025, Agent Kit provides developers with a comprehensive toolkit for building agents that can perform complex, multi-step tasks with minimal human intervention. The toolkit includes APIs specifically designed for agentic applications, evaluation capabilities for testing agent behavior, and integration with MCP for connecting to external tools and data sources. What makes Agent Kit particularly significant is that it reduces the barrier to entry for building sophisticated agents—developers no longer need to be AI researchers or have deep expertise in prompt engineering to create agents that work effectively.

The Agent Kit includes several critical components that work together to enable agent development. The Agents API allows developers to define how agents should behave, what tools they have access to, and how they should handle different scenarios. Evaluation capabilities enable developers to test their agents systematically, using datasets and trace grading to understand where agents succeed and where they fail. Automated prompt optimization helps developers refine their system prompts without manual trial and error. Third-party integrations mean developers can connect their agents to existing tools and services, creating workflows that span multiple systems. Together, these components create an environment where developers can focus on defining what they want their agents to do, rather than wrestling with the technical details of how to make agents work.
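The loop such a toolkit manages for you can be sketched in plain Python: an agent holds instructions and a tool registry, and a runner alternates between a model decision and tool execution. Everything here (`Agent`, `run`, the `lookup_order` tool, and the stubbed model) is an illustrative stand-in, not the Agents API itself.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    instructions: str                       # the system prompt guiding behavior
    tools: dict[str, Callable] = field(default_factory=dict)

def fake_model(agent: Agent, user_msg: str, history: list) -> dict:
    """Stand-in for a model call: decides to use a tool or answer.
    A real toolkit would send instructions + history to an LLM."""
    if "order" in user_msg and not history:
        return {"action": "tool", "name": "lookup_order", "args": {"id": "A123"}}
    return {"action": "final", "text": f"[{agent.name}] Done: {history or user_msg}"}

def run(agent: Agent, user_msg: str, max_steps: int = 5) -> str:
    """The agent loop: call the model, execute requested tools, repeat."""
    history: list = []
    for _ in range(max_steps):
        decision = fake_model(agent, user_msg, history)
        if decision["action"] == "final":
            return decision["text"]
        result = agent.tools[decision["name"]](**decision["args"])
        history.append({"tool": decision["name"], "result": result})
    return "Step limit reached"

support = Agent(
    name="support-bot",
    instructions="Help users with their orders. Use tools before answering.",
    tools={"lookup_order": lambda id: {"id": id, "status": "shipped"}},
)

print(run(support, "Where is my order?"))
```

What Agent Kit abstracts away is everything this toy hides: real model calls, streaming, error handling, handoffs between agents, and the tracing needed to evaluate each step.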

The significance of Agent Kit extends beyond just technical capabilities. By providing a standardized toolkit, OpenAI is essentially saying that building autonomous agents should be as accessible as building traditional applications. This democratization has profound implications for how AI will be deployed across industries. Companies that previously would have needed to hire specialized AI talent can now use Agent Kit to build agents that handle customer service, data analysis, content creation, and countless other tasks. The toolkit abstracts away much of the complexity, allowing developers to focus on the business logic and user experience rather than the underlying AI mechanics.

FlowHunt and the Future of AI Workflow Automation

In this evolving landscape of AI development tools and frameworks, platforms like FlowHunt are emerging as essential infrastructure for developers and teams building with these new capabilities. FlowHunt recognizes that while tools like the Apps SDK, Agent Kit, and MCP provide the building blocks for AI applications, developers still need a unified platform to orchestrate, monitor, and optimize these workflows. FlowHunt integrates with modern AI tools and protocols, allowing developers to build complex AI workflows without managing multiple disconnected systems. By providing a centralized platform for workflow management, FlowHunt enables developers to focus on creating value rather than managing infrastructure.

The platform’s approach aligns perfectly with the philosophy behind the Apps SDK and Agent Kit—providing developers with tools that abstract away complexity while maintaining flexibility and control. FlowHunt allows teams to define workflows that span multiple AI models, integrate with external services through MCP, and monitor performance across their entire AI application portfolio. This is particularly valuable as organizations scale their AI initiatives, moving from single-use cases to enterprise-wide AI deployment. FlowHunt’s integration with these emerging standards ensures that developers can build on solid foundations while maintaining the flexibility to adapt as the AI landscape continues to evolve.

Why Prompting is More Important Than Ever

Perhaps the most important insight from DevDay 2025 is the recognition that prompting—the art and science of instructing AI systems—has become more critical than ever. As AI agents become more autonomous and capable, the quality of the prompts that guide them directly determines their effectiveness, reliability, and alignment with user intentions. This represents a fundamental shift in how developers should think about AI development. In the early days of large language models, prompting was often treated as a secondary concern, something you could figure out through trial and error. Today, prompting is a first-class concern that deserves the same rigor and attention as traditional software engineering.

The reason prompting has become so critical is rooted in how modern AI agents operate. Unlike traditional software, which follows explicit instructions encoded in code, AI agents interpret natural language instructions and make decisions based on their understanding of those instructions. The quality of that interpretation depends almost entirely on the clarity, specificity, and completeness of the prompt. A well-crafted system prompt can guide an agent to make consistently good decisions, handle edge cases gracefully, and maintain alignment with user intentions even in novel situations. Conversely, a poorly crafted prompt can lead to unpredictable behavior, hallucinations, and failures that are difficult to debug because they emerge from the agent’s interpretation of ambiguous instructions.

Effective prompting for AI agents requires thinking about several key dimensions. First, clarity is paramount—system prompts should use simple, direct language that presents ideas at the right level of abstraction for the agent. Rather than trying to be comprehensive, effective prompts focus on the most important constraints and behaviors. Second, context matters enormously. Agents need to understand not just what they should do, but why they should do it and what constraints they should operate within. Third, examples are invaluable. Providing concrete examples of desired behavior helps agents understand patterns and apply them to novel situations. Finally, iterative refinement is essential. Even well-crafted prompts can be improved through systematic testing and evaluation, using tools like those provided in Agent Kit to understand where agents succeed and where they fail.
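Those dimensions—role clarity, explicit constraints, and concrete examples—can be made tangible with a small template helper. This is an illustrative stdlib sketch; the section names and example format are one reasonable convention, not a prescribed format from any SDK.

```python
def build_system_prompt(role: str, constraints: list[str],
                        examples: list[tuple[str, str]]) -> str:
    """Assemble a system prompt from a clear role, explicit
    constraints, and concrete examples of desired behavior."""
    lines = [f"You are {role}.", "", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines.append("")
    lines.append("Examples of desired behavior:")
    for user, reply in examples:
        lines.append(f'User: "{user}" -> Respond: "{reply}"')
    return "\n".join(lines)

prompt = build_system_prompt(
    role="a refund-support agent for an online store",
    constraints=[
        "Only discuss orders, refunds, and shipping.",
        "If information is missing, ask one clarifying question.",
    ],
    examples=[("Can I get a refund?",
               "I can help with that. What is your order number?")],
)
print(prompt)
```

Keeping the prompt assembled from named parts like this also makes iterative refinement easier: each constraint or example can be added, removed, and evaluated independently.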

The importance of prompting extends beyond just technical correctness. System prompts are also the mechanism through which developers can encode ethical guidelines, safety constraints, and values into AI agents. By carefully crafting prompts, developers can define processes that ensure AI is used thoughtfully and responsibly, rather than simply optimizing for narrow metrics that might lead to unintended consequences. This makes prompting not just a technical skill, but a critical responsibility for anyone building AI systems. As AI agents become more autonomous and capable, the prompts that guide them become increasingly important for ensuring that AI systems behave in ways that are beneficial and aligned with human values.

Building Effective AI Agents: Practical Insights

The practical implications of these developments are significant for developers at all levels. Building effective AI agents requires a systematic approach that combines technical understanding with careful attention to prompting and evaluation. The first step is to clearly define what you want your agent to do. This might seem obvious, but many developers jump into implementation without fully thinking through the agent’s objectives, constraints, and success criteria. Taking time to write clear specifications for your agent’s behavior makes everything that follows easier. What decisions should the agent make? What tools should it have access to? What should it do if it encounters ambiguous situations? These questions should be answered before you write a single line of code.

Once you have clear specifications, the next step is to craft your system prompt. This is where the art of prompting becomes critical. Your system prompt should clearly communicate the agent’s role, its objectives, and the constraints it should operate within. It should provide examples of desired behavior and explain how the agent should handle edge cases. Rather than trying to be exhaustive, focus on the most important behaviors and let the agent’s training handle the rest. Many developers make the mistake of writing overly long, complex prompts that try to cover every possible scenario. In practice, shorter, more focused prompts often work better because they’re easier for the agent to understand and apply consistently.

The third step is systematic evaluation. Agent Kit provides tools for this, but the principle applies regardless of what tools you’re using. You should test your agent against a variety of scenarios, including both typical cases and edge cases. Use datasets to systematically evaluate performance, and use trace grading to understand where the agent succeeds and where it fails. This evaluation process is not a one-time activity—it should be ongoing as you refine your agent and as the world changes. By treating evaluation as a first-class concern, you can catch problems early and continuously improve your agent’s performance. This iterative approach to agent development is fundamentally different from traditional software development, where you might write code once and then maintain it. With AI agents, continuous refinement based on evaluation is essential for maintaining quality.
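The evaluation loop described above can be sketched as a small harness: run the agent over a dataset of cases and score each output against a per-case check. The `stub_agent` and the cases here are illustrative stand-ins; Agent Kit's dataset and grading tooling plays this role in practice.

```python
from typing import Callable

def evaluate(agent_fn: Callable[[str], str], dataset: list[dict]) -> dict:
    """Run agent_fn over every case and report pass rate plus failures."""
    failures = []
    for case in dataset:
        output = agent_fn(case["input"])
        if not case["check"](output):
            failures.append({"input": case["input"], "output": output})
    total = len(dataset)
    return {"pass_rate": (total - len(failures)) / total, "failures": failures}

# Stub agent: refuses anything that is not about orders.
def stub_agent(msg: str) -> str:
    return "Let me check that order." if "order" in msg else "I can only help with orders."

dataset = [
    {"input": "Where is my order?", "check": lambda o: "order" in o},
    {"input": "Tell me a joke",     "check": lambda o: "only help" in o},  # edge case
    {"input": "Cancel my order",    "check": lambda o: o.startswith("Let me")},
]

report = evaluate(stub_agent, dataset)
print(f"pass rate: {report['pass_rate']:.0%}, failures: {len(report['failures'])}")
```

Because the report keeps the failing inputs and outputs, each evaluation run feeds directly into the next round of prompt refinement rather than ending at a single accuracy number.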

The Developer Ecosystem at Scale

The growth to 4 million developers represents a fundamental shift in how AI is being deployed. This is no longer a niche community of AI researchers and early adopters—it’s a mainstream developer ecosystem spanning every industry and geography. This scale brings both opportunities and challenges. On the opportunity side, the large developer community means that best practices are being shared, tools are being built to address common problems, and the ecosystem is becoming increasingly sophisticated. On the challenge side, this scale means that the quality bar for developer tools has risen dramatically. Developers expect tools that are easy to use, well-documented, and reliable at scale.

The Apps SDK and Agent Kit are designed with this scale in mind. They provide abstractions that make it easy for developers to build sophisticated applications without needing to understand all the underlying complexity. At the same time, they provide enough flexibility for advanced developers to customize behavior and optimize for their specific use cases. This balance between simplicity and flexibility is crucial for tools that need to serve a diverse developer community. The adoption of MCP as an open standard is also important for scale—it means that developers can build integrations that work across multiple platforms, rather than being locked into a single vendor’s ecosystem.

The implications of this scale extend beyond just technical considerations. With 4 million developers building on OpenAI’s platform, the company has a responsibility to ensure that these developers have the tools, documentation, and support they need to succeed. This is why DevDay 2025 included not just technical announcements, but also a focus on developer experience. The podcast studio at the event, the arcade games, and the art installations were all designed to create an engaging environment where developers could learn, network, and feel valued. These might seem like small details, but they reflect a recognition that developer experience is as important as technical capabilities for building a thriving ecosystem.

The Inversion of AI Integration: From Chatbot to Platform

One of the most profound insights from DevDay 2025 is the recognition that the relationship between applications and AI has fundamentally inverted. For years, the model was: you have an application, and you add a chatbot to it. Now, the model is: you have ChatGPT, and you embed applications within it. This inversion has massive implications for how developers should think about building AI-powered products. Rather than trying to drive users to your application, you can now meet users where they already are. ChatGPT has become a primary destination for millions of people, and the Apps SDK makes it possible to create rich, interactive experiences within that platform.

This inversion is enabled by the combination of the Apps SDK and MCP. The Apps SDK provides the mechanism for creating rich experiences within ChatGPT, while MCP provides the standardized way to connect those experiences to external tools and data. Together, they create an environment where developers can build applications that feel native to ChatGPT while maintaining complete control over the user experience. This is fundamentally different from earlier approaches, where integrations felt like they were bolted onto ChatGPT rather than being integral to it. The Canva example from the keynote perfectly illustrates this—users can chat with ChatGPT about design ideas, and then interact with Canva directly within the ChatGPT interface, all without leaving the platform.

The implications of this inversion extend to how developers should think about distribution and user acquisition. Traditionally, getting users to your application required marketing, SEO, and other acquisition strategies. With the Apps SDK, distribution becomes a function of building a great experience that users want to use. If your application provides value within ChatGPT, users will discover it and use it. This doesn’t eliminate the need for marketing, but it changes the nature of the challenge. Rather than trying to drive traffic to your website, you’re trying to build an experience that users will want to use within ChatGPT. This is a more direct path to users, but it also means that the quality of your experience becomes even more critical.

Evaluating and Optimizing AI Agents

As developers build more sophisticated agents, the ability to evaluate and optimize them becomes increasingly important. Agent Kit includes several tools for this purpose, but the principles apply regardless of what tools you’re using. Evaluation should be systematic, ongoing, and focused on the metrics that matter for your use case. Rather than just measuring accuracy, you should measure things like user satisfaction, task completion rate, and the quality of the agent’s reasoning. Different applications will have different success metrics, so it’s important to think carefully about what you’re actually trying to optimize for.

One of the most valuable features in Agent Kit is automated prompt optimization. This tool uses systematic evaluation to suggest improvements to your system prompt, helping you refine your agent’s behavior without manual trial and error. This is particularly valuable because prompt optimization can be tedious and time-consuming if done manually. By automating this process, Agent Kit allows developers to focus on higher-level concerns while the tool handles the details of prompt refinement. However, it’s important to remember that automated optimization is a tool to assist human judgment, not a replacement for it. Developers should still understand what their agents are doing and why, even if they’re using automated tools to optimize performance.

The evaluation process should also include testing for edge cases and failure modes. What happens when your agent encounters a situation it wasn’t trained for? How does it handle ambiguous requests? What does it do when it doesn’t have enough information to make a decision? By systematically testing these scenarios, you can identify problems before they affect users. This is particularly important for agents that will be deployed in production environments where failures can have real consequences. The trace grading feature in Agent Kit is valuable for this purpose—it allows you to examine exactly what your agent did in specific scenarios and understand why it made the decisions it made.
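The idea behind trace grading can be approximated even without Agent Kit: record each step the agent takes, then apply rubric checks to the whole trace rather than only the final answer. Everything below—the trace format and the two rubric rules—is an illustrative convention, not Agent Kit's actual trace schema.

```python
# A trace is the ordered list of steps an agent took for one request.
trace = [
    {"step": "model", "decision": "call lookup_order"},
    {"step": "tool",  "name": "lookup_order", "result": {"status": "shipped"}},
    {"step": "model", "decision": "final answer", "text": "Your order has shipped."},
]

def grade_trace(trace: list[dict]) -> dict:
    """Apply rubric checks to a full trace, not just the final output."""
    checks = {
        # Did the agent consult a tool before answering?
        "used_tool_before_answer": any(s["step"] == "tool" for s in trace[:-1]),
        # Is the final answer grounded in a tool result?
        "answer_grounded": any(
            s["step"] == "tool"
            and s["result"].get("status", "") in trace[-1].get("text", "")
            for s in trace
        ),
    }
    checks["passed"] = all(checks.values())
    return checks

print(grade_trace(trace))
```

Grading the trace instead of the answer is what surfaces failure modes like a plausible-sounding reply produced without ever consulting the relevant tool.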

The Future of AI Development Infrastructure

Looking forward, the trajectory is clear: AI development infrastructure will continue to become more sophisticated, more accessible, and more standardized. The adoption of MCP as an open standard is a signal that the industry is moving toward interoperability and away from vendor lock-in. This is good for developers because it means they can build on solid foundations without worrying that their investments will become obsolete if a particular vendor changes direction. The Apps SDK and Agent Kit represent the current state of the art in making AI development accessible to mainstream developers, but they’re not the end of the story. As the ecosystem matures, we can expect to see even more sophisticated tools that make it easier to build, deploy, and scale AI applications.

One area that will likely see significant development is the tooling around prompting and evaluation. As more developers build agents, the need for better tools to manage prompts, test agents, and optimize performance will become increasingly acute. We’re already seeing the beginning of this with features like automated prompt optimization in Agent Kit, but this is just the start. In the future, we can expect to see more sophisticated tools that help developers understand agent behavior, identify problems, and optimize performance. These tools will likely incorporate machine learning themselves, using data from millions of agents to suggest improvements and identify best practices.

Another area of development will be around safety and alignment. As AI agents become more autonomous and capable, ensuring that they behave in ways that are safe and aligned with human values becomes increasingly important. This will likely drive development of better tools for specifying constraints, testing for unintended behaviors, and monitoring agents in production. The emphasis on prompting as a mechanism for encoding values and constraints is a step in this direction, but more sophisticated approaches will likely emerge as the field matures. This is an area where developers have a responsibility to think carefully about the implications of the systems they’re building and to use the tools available to ensure that their agents behave responsibly.

Practical Steps for Developers Getting Started

For developers looking to take advantage of these new tools and capabilities, there are several practical steps to get started. First, familiarize yourself with the Apps SDK and Agent Kit documentation. These tools are designed to be accessible, but they do require some learning. Take time to understand the core concepts, work through the tutorials, and build a simple application to get hands-on experience. Second, think carefully about what you want to build. Rather than trying to build the most sophisticated agent possible, start with a clear, well-defined use case. This will make it easier to evaluate whether your agent is working correctly and to iterate on improvements.

Third, invest time in crafting your system prompt. This is where the art of prompting becomes critical. Write a clear, focused prompt that communicates your agent’s role and objectives. Test it against a variety of scenarios and refine it based on the results. Don’t try to make your prompt perfect on the first try—treat it as an iterative process where you continuously improve based on evaluation. Fourth, use the evaluation tools available in Agent Kit to systematically test your agent. Create datasets that cover typical scenarios and edge cases, and use trace grading to understand where your agent succeeds and where it fails. This evaluation process is essential for building agents that work reliably in production.

Finally, engage with the developer community. There are now millions of developers building with these tools, and many of them are sharing their experiences, best practices, and solutions to common problems. Participate in forums, read blog posts, and learn from others’ experiences. The AI development community is still relatively young, and there’s a lot of learning happening in real-time. By engaging with this community, you can accelerate your own learning and contribute to the collective knowledge that will help the entire ecosystem mature.

Conclusion

OpenAI’s DevDay 2025 announcements represent a significant milestone in the evolution of AI development infrastructure. The Apps SDK, Agent Kit, and adoption of MCP collectively create an environment where developers can build sophisticated AI applications without needing to be AI researchers or have deep expertise in machine learning. The inversion of the AI integration model—from chatbot-in-application to application-in-ChatGPT—opens new possibilities for how AI can be distributed and accessed. Most importantly, the recognition that prompting is more important than ever reflects a fundamental shift in how developers should approach AI development. As AI agents become more autonomous and capable, the quality of the prompts that guide them becomes the primary lever for ensuring they behave effectively and responsibly. For developers building in this space, the combination of powerful tools, clear standards, and a thriving community creates unprecedented opportunities to build valuable AI applications that reach millions of users.

Supercharge Your Workflow with FlowHunt

Experience how FlowHunt automates your AI content and SEO workflows — from research and content generation to publishing and analytics — all in one place.

Frequently asked questions

What is the Model Context Protocol (MCP)?

The Model Context Protocol is an open specification that standardizes how applications provide context to large language models. Think of it like a USB-C port for AI applications—it enables seamless integration between LLM clients and external tools and resources.

How does the Apps SDK differ from previous plugin systems?

The Apps SDK gives developers significantly more control over the user experience compared to earlier plugin systems. Developers can now create custom UI components, preserve their brand identity, and steer the entire experience within ChatGPT, rather than being limited to simple tool calls.

Why is prompting more important than ever for AI agents?

As AI agents become more autonomous and capable of performing complex tasks, the quality of system prompts directly determines agent behavior, reliability, and effectiveness. Clear, well-structured prompts are essential for defining processes, ensuring ethical use, and achieving consistent results.

How many developers are now building with OpenAI tools?

OpenAI reported that 4 million developers are actively building with their platform, up from 3 million the previous year. This growing ecosystem reflects the increasing adoption of AI-powered applications across industries.

Arshia is an AI Workflow Engineer at FlowHunt. With a background in computer science and a passion for AI, he specializes in creating efficient workflows that integrate AI tools into everyday tasks, enhancing productivity and creativity.

Arshia Kahani
AI Workflow Engineer

Build Smarter AI Workflows with FlowHunt

Leverage advanced AI agent capabilities and automation to streamline your development process. FlowHunt integrates seamlessly with modern AI tools and protocols.

Learn more
