- Many AI pilots fail in real-world operations, Salesforce says, citing MIT research that 95% of GenAI pilots don’t reach production
- CRMArena-Pro lets enterprises stress-test their AI agents with digital twins
- Two new benchmarks, MCP-Eval and MCP-Universe, measure agent performance
Salesforce says enterprises are struggling with AI pilots that fail in real-world operations, and has launched CRMArena-Pro, a new service that lets businesses create a digital twin of their operations to stress-test AI agents before they are deployed.
The company cited recent MIT research which found 95% of generative AI pilots don’t even reach the production stage.
CRMArena-Pro evaluates AI agents on real tasks, like customer service, sales forecasting and supply chain disruptions, but using synthetic data that’s been validated by experts.
Salesforce lets you stress-test AI agents using digital twins
“CRMArena-Pro creates a rigorous, context-rich simulated enterprise environment framework with synthetic data, where it can safely evaluate API calls to relevant systems, as well as the ability to safeguard PII data,” the company wrote in an announcement.
By adding real-world noise into the test environment, CRMArena-Pro can better evaluate performance, strengthen resilience and bridge the gap between pre- and post-deployment.
“The result is AI agents that are capable, consistent, trustworthy, and agentic enterprise-ready.”
Companies can also see how AI agents handle real-world challenges like messy data, legacy systems and complex workflows.
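To make the idea of testing against "real-world noise" concrete, here is a minimal Python sketch. It is not Salesforce's code and does not use any CRMArena-Pro API; the ticket data, the `add_noise` helper and both toy agents are hypothetical, invented purely for illustration. It scores an agent once on clean synthetic records and once on copies with messy formatting, so the gap between the two numbers shows how brittle the agent is.

```python
import random

# Hypothetical synthetic tickets, each with an expected action (illustrative data only).
TICKETS = [
    {"id": 1, "priority": "high", "expected": "escalate"},
    {"id": 2, "priority": "low", "expected": "queue"},
    {"id": 3, "priority": "high", "expected": "escalate"},
    {"id": 4, "priority": "low", "expected": "queue"},
]


def add_noise(ticket, rng, rate):
    """Return a copy of the ticket; with probability `rate`, mangle the priority field."""
    noisy = dict(ticket)
    if rng.random() < rate:
        # Simulate messy data: inconsistent casing plus stray whitespace.
        noisy["priority"] = " " + noisy["priority"].upper()
    return noisy


def brittle_agent(ticket):
    """Toy agent that only recognises the exact string 'high'."""
    return "escalate" if ticket["priority"] == "high" else "queue"


def robust_agent(ticket):
    """Toy agent that normalises the field before deciding."""
    return "escalate" if ticket["priority"].strip().lower() == "high" else "queue"


def accuracy(agent, tickets):
    """Fraction of tickets where the agent's action matches the expected one."""
    hits = sum(agent(t) == t["expected"] for t in tickets)
    return hits / len(tickets)


def evaluate(agent, rate, seed=0):
    """Return (clean accuracy, noisy accuracy) for one agent at a given noise rate."""
    rng = random.Random(seed)
    noisy = [add_noise(t, rng, rate) for t in TICKETS]
    return accuracy(agent, TICKETS), accuracy(agent, noisy)


if __name__ == "__main__":
    for agent in (brittle_agent, robust_agent):
        clean, noisy = evaluate(agent, rate=1.0)
        print(f"{agent.__name__}: clean={clean:.2f} noisy={noisy:.2f}")
```

With every record mangled (`rate=1.0`), the brittle agent misses both high-priority tickets while the normalising agent is unaffected. A real evaluation would use far richer data and tasks than this toy.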
Salesforce noted that part of the complexity comes from the vast array of models available to choose from today, and that knowing which specific model or combination of models to use isn’t simple.
To that end, the company has published two new benchmarks to measure agent performance: MCP-Eval, which evaluates agents through synthetic tasks, and MCP-Universe, which adds real-world tasks and execution-based evaluators to stress-test agents in complex scenarios.
In a previous post, Salesforce noted that CRMArena-Pro “lays the groundwork for the next frontier: Enterprise General Intelligence” – and for now, users can expect “safe, capable and impactful” AI for all organizations.