Crusoe launches Managed Inference, delivering breakthrough speed for production AI

SAN FRANCISCO – Nov 20, 2025 – Crusoe, the industry’s first vertically integrated AI infrastructure provider, today announced the general availability of Crusoe Managed Inference, a new service designed to run leading model inference on Crusoe Cloud with ultra-low latency, breakthrough time-to-first-token (TTFT) speed, and resilient scaling. The service is optimized for the most demanding inference workloads, including large-context and long-form text generation. AI developers can use Crusoe Managed Inference to rapidly deploy and automatically scale production-ready models, instantly enabling new capabilities like AI agents and complex task automation.

The new service is powered by Crusoe's proprietary inference engine, the only inference engine with MemoryAlloy technology. Crusoe MemoryAlloy is a proprietary cluster-native memory fabric that provides a cluster-wide KV cache: GPUs fetch prefix caches from local and remote nodes instead of recomputing them, which eliminates duplicate prefills and enables persistent sessions, contextual continuity, and seamless scaling across an entire cluster. The result is faster and more cost-effective inference for AI developers.
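
To illustrate the general idea of a cluster-wide prefix cache, here is a minimal Python sketch. It is not Crusoe's implementation and does not reflect how MemoryAlloy works internally; the class names, the block size, and the placeholder "kv" value are all hypothetical. It only shows how a prefill that finds a matching prefix on any node in the cluster can skip recomputing those tokens.

```python
from dataclasses import dataclass, field

BLOCK = 4  # tokens per cache block (hypothetical value, chosen for the demo)


@dataclass
class Node:
    name: str
    # Maps a block-aligned token prefix to a placeholder for its KV tensors.
    kv_blocks: dict = field(default_factory=dict)


class ClusterPrefixCache:
    """Toy model of a KV prefix cache visible to every node in a cluster."""

    def __init__(self, nodes):
        self.nodes = nodes

    def longest_cached_prefix(self, tokens):
        """Return (hit_length, owning_node) for the longest cached prefix."""
        usable = len(tokens) - len(tokens) % BLOCK
        for end in range(usable, 0, -BLOCK):
            key = tuple(tokens[:end])
            for node in self.nodes:  # local or remote, any node may own it
                if key in node.kv_blocks:
                    return end, node
        return 0, None

    def prefill(self, local, tokens):
        """Simulate a prefill on `local`; return the number of tokens computed."""
        hit, _owner = self.longest_cached_prefix(tokens)
        usable = len(tokens) - len(tokens) % BLOCK
        # Record every block-aligned prefix on the local node, whether its
        # KV data was fetched from another node or freshly computed.
        for end in range(BLOCK, usable + 1, BLOCK):
            local.kv_blocks[tuple(tokens[:end])] = "kv"
        return len(tokens) - hit


# Two requests share an 8-token system prompt but land on different nodes.
node_a, node_b = Node("node-a"), Node("node-b")
cache = ClusterPrefixCache([node_a, node_b])
system_prompt = list(range(8))

first = cache.prefill(node_a, system_prompt + [100, 101])
second = cache.prefill(node_b, system_prompt + [200, 201, 202])
print(first, second)
```

In this toy run the first request computes all 10 of its tokens, while the second computes only its 3 new tokens because the shared prefix is already cached on the other node.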

“Developers today are forced to choose between blazing fast inference speed, throughput, and manageable infrastructure costs – a trade-off that throttles innovation,” said Erwan Menard, SVP of Product, Crusoe. “With Crusoe Managed Inference, we are not just hosting models; we are solving the most complex parts of the inference stack for AI developers. Crusoe MemoryAlloy, our inference engine’s cluster-native memory fabric, allows us to deliver unmatched time-to-first-token and throughput, accelerating our customers’ ability to deliver complex, large-scale AI applications cost-effectively.”

Crusoe Managed Inference is designed for AI developers who need to move from model to production without managing complex infrastructure. The service delivers quantifiable performance gains that directly impact user experience, as well as flexible pricing:

- Breakthrough speed: Achieve up to 9.9x faster TTFT* with our inference engine featuring MemoryAlloy, a cluster-wide KV cache with intelligent routing that eliminates duplicate prefills.

- Superior throughput: Process up to 5x the tokens per second* for workloads with frequent prefix re-use and benefit from dynamic batching.

- Seamless scaling: Meet changing workload demands with scaling that is managed for you, offering pay-per-token and provisioned throughput pricing options for production-scale deployments.

*Compared to vLLM for Llama 3.3 70B model; see our blog for more details.
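
For readers who want to compare TTFT and tokens per second on their own workloads, the following Python sketch shows one common way to time a token stream. It is an assumption-laden illustration, not Crusoe's benchmark methodology: `measure_stream` and `fake_stream` are hypothetical helpers, and `fake_stream` merely stands in for a real streaming endpoint.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, tokens_per_second, token_count) for a stream."""
    start = time.perf_counter()
    first_at = None
    count = 0
    for _ in tokens:
        if first_at is None:
            first_at = time.perf_counter()  # first token has arrived
        count += 1
    end = time.perf_counter()
    if first_at is None:
        raise ValueError("stream produced no tokens")
    return first_at - start, count / (end - start), count


def fake_stream(prefill_s: float, per_token_s: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming inference endpoint."""
    time.sleep(prefill_s)  # prefill happens before the first token
    for i in range(n):
        yield f"tok{i}"
        time.sleep(per_token_s)  # simulated decode time per token


ttft, tps, n = measure_stream(fake_stream(0.05, 0.001, 20))
print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.1f} tok/s, tokens: {n}")
```

A shorter prefill, for example one that reuses a cached prefix, shows up directly as a lower TTFT in a measurement like this.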

Introducing the Crusoe Intelligence Foundry

Crusoe Managed Inference is accessible through the new Crusoe Intelligence Foundry, a unified hub designed to provide AI developers with a fast path to production. The foundry accelerates model discovery and experimentation, allowing users to generate API keys in minutes.

Key features include:

- Leading open-source models: Run the world's top open-source models including Kimi-K2, Llama 3.3 70B Instruct, Gemma 3 12B, Gpt-oss-120b, Qwen3 235B A22B Instruct 2507, DeepSeek V3 0324, and DeepSeek-R1 0528; plus experiment with unique models available exclusively on Crusoe Cloud from cutting-edge labs like Decart.

- Managed endpoints: Fully managed endpoints powered by Crusoe’s proprietary inference engine with MemoryAlloy technology, tuned specifically to each model for maximum optimization.

- Production-scale deployments: Users can monitor performance metrics and enable provisioned throughput for production-scale deployments.

- Unified interface: A single, integrated environment allowing teams to easily switch between the Crusoe Intelligence Foundry for inference tasks and the Crusoe Cloud Console for IaaS resources.

Trusted by Customers

“Our mission at Wonderful is to enable enterprises to transform their operating model with AI agents that actually work in production. The challenge is always doing that at scale without compromising speed - something which MemoryAlloy tackles. Its cluster-wide KV cache capability uniquely addresses the biggest bottlenecks in large-scale inference,” said Roey Lalazar, co-founder and CTO at Wonderful.ai. “This is the kind of foundational technology that will enable our customers to build and deploy far more powerful and responsive AI agents with confidence.”

“Yutori's Scouts are always-on AI agents that monitor the web; they are powered by in-house models for autonomously navigating websites on a browser. Optimizing for throughput and price is critical for our product experience,” said Dhruv Batra, Co-founder and Chief Scientist at Yutori. “We're excited to explore the performance benefits that Crusoe's Inference Engine provides, and are looking forward to serving our models through the service.”

“The demands of clinical deployment in healthcare are unforgiving – we need to process complex records instantly. Crusoe Managed Inference helps us meet that challenge,” said Grant Jensen, Co-Founder & CEO at Oaklet. “It provides a reliable path to production at a pace we haven’t seen on other platforms. This allows us to focus entirely on refining our EHR system, utilizing Crusoe’s breakthrough speed to support clinicians in real-time.”

AI developers can get started today

Crusoe Managed Inference is now available. AI developers can sign up for Crusoe Intelligence Foundry to get started with a library of leading models.

About Crusoe

As the AI factory company, Crusoe is on a mission to accelerate the abundance of energy and intelligence. The company provides a reliable, scalable, cost-effective, and environmentally friendly solution for AI infrastructure. By harnessing large-scale clean energy, building AI-optimized data centers, and delivering an AI cloud platform, Crusoe empowers its customers to build the future faster.
