Why the Arrival of DeepSeek Calls for a Serverless Approach to Enterprise AI Architecture


By outsourcing AI infrastructure to the cloud provider, serverless inference frees enterprises to focus on the application layer of their AI deployments.

Just two years after ChatGPT first shook the world, DeepSeek’s R1 model has arrived with a technological earthquake of its own. An open-source model rivaling the world’s best closed models at a fraction of the cost, R1 underscores how rapidly progress in generative AI has accelerated and how quickly the industry can be upended.

DeepSeek’s emergence should serve as an inflection point, prompting enterprise leaders to pause and re-evaluate their AI adoption strategies. The last two years have seen generative AI quickly become the top digital transformation priority for enterprises across industries. However, DeepSeek illustrates the risk of investing too deeply in an enterprise AI architecture that can quickly become obsolete.

For example, API pricing for DeepSeek’s R1 is roughly one-thirtieth of OpenAI’s. Distilled versions of the open-source model are also lightweight enough that, for simple use cases, they can run locally on a mobile device without the pricey GPUs that other top-of-the-line models depend on.

This is not necessarily an endorsement of DeepSeek — every enterprise will have different needs, and other models may be more suitable depending on the use case. Rather, it is a cautionary tale against overinvesting in AI infrastructure, particularly GPUs. In contrast, by adopting a serverless approach, in which AI infrastructure is managed by the cloud provider, enterprises will have more flexibility to adapt to new innovations like DeepSeek’s R1.

See also: DeepSeek Explodes on the Scene

Rethinking AI Infrastructure from Training to Inference

When considering AI infrastructure, it’s important to keep in mind that model training and inference have drastically different computing requirements. Model training is by far the most compute-intensive and costly stage of the AI lifecycle. AI inference, on the other hand, where end users ultimately interact with trained AI models, requires significantly less compute power but comes with greater operational demands for low-latency performance.

Using the same compute infrastructure for both training and inference unnecessarily inflates the cost of the latter. To optimize costs, enterprises should reserve high-powered GPU clusters for training and adopt a serverless approach for inference that scales compute resources up or down as the workload demands.

When it comes to model training, AIOps best practices prescribe a centralized operating model with an AI Center of Excellence (CoE). This centralized CoE leverages advanced GPU clusters, typically featuring high-end GPUs like NVIDIA’s H100 or AMD’s MI300X, optimized for the parallel processing demands of training large language models (LLMs) and other complex AI systems. From here, local data scientists can pull trained models from a central repository, fine-tune them on regional data, and deploy them locally for inference at the edge.
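As a rough sketch of that hand-off, the snippet below pulls an open-source model from a central hub, fine-tunes it on regional data, and saves the result for edge deployment. The base model shown is one of DeepSeek’s distilled R1 variants, but the dataset path and hyperparameters are illustrative placeholders, not a prescribed workflow.

```python
# Illustrative sketch only: pull a pre-trained open-source model from a central
# hub, fine-tune it on regional data, and save the result for edge deployment.
# The dataset path and hyperparameters are hypothetical placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # example open model
REGIONAL_DATA = "regional_support_tickets.jsonl"           # hypothetical dataset

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Load region-specific text and tokenize it for causal language modeling.
dataset = load_dataset("json", data_files=REGIONAL_DATA, split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="r1-distill-regional",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Save the fine-tuned model so the regional team can deploy it at the edge.
trainer.save_model("r1-distill-regional")
tokenizer.save_pretrained("r1-distill-regional")
```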

Running AI inference at the edge comes with several advantages over a centralized approach. The low latency unlocked by inference at the edge allows end users to gain real-time insights from streaming data. It also improves data governance, protecting sensitive data with local data controls. And with a serverless approach to inference at the edge, compute resources can be scaled up or down depending on the workload to optimize costs.

See also: Enterprise AI Planning Faces Three Crucial Blind Spots

A Serverless Approach to AI Inference at the Edge

Inference at the edge is where the value of a serverless approach really comes into play. Rather than requiring costly GPU clusters at every edge data center where AI models are deployed, serverless inference provisions only the compute resources required for the task at hand, optimizing for both cost and performance.

By outsourcing AI infrastructure to the cloud provider, serverless inference frees enterprises to focus on the application layer of their AI deployments, where they can apply their specialized expertise to tailor AI applications to their operational needs and desired business outcomes. This approach also prevents enterprises from being locked in with expensive GPUs that may become obsolete with the arrival of new technologies.

The responsibility for keeping up with advances in hardware, absorbing the capital expense of procuring advanced chips, and choosing the right compute configuration for a given AI workload shifts to cloud providers, who manage infrastructure for thousands of AI deployments every day.
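To make that concrete, here is a minimal sketch of what the application layer can look like once inference is consumed as a serverless, OpenAI-compatible API. The endpoint URL, model name, and environment variable are illustrative assumptions, not any particular provider’s interface.

```python
# Illustrative sketch: with serverless inference, the application layer is
# reduced to an API call against a managed, OpenAI-compatible endpoint.
# The endpoint URL, model name, and env var below are hypothetical.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example-cloud.com/v1",  # hypothetical serverless endpoint
    api_key=os.environ["INFERENCE_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-r1-distill",  # whichever open model the provider hosts
    messages=[
        {"role": "system", "content": "You are a concise assistant for field technicians."},
        {"role": "user", "content": "Summarize the last hour of sensor alerts."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Capacity planning, GPU selection, and scaling all sit behind that endpoint; the application simply makes the call.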

Key Considerations for Implementing Serverless Inference

Open AI Ecosystem

When considering cloud providers that offer serverless inference, look for one that offers access to an open AI ecosystem. For most enterprises, model training will involve taking a pre-trained open-source model and fine-tuning it on proprietary data. As DeepSeek’s R1 model demonstrates, new open-source models are launched all the time, and the pace of advances in AI technology only seems to be accelerating.

An open AI ecosystem ensures enterprises will have access to the latest open-source models as they come out, mitigating the risk of vendor lock-in with closed AI models. Open-source models like R1 are typically also more cost-efficient than closed models and can be more easily customized for specific use cases.

Data Sovereignty and Governance

Another important consideration is data governance. The real-time insights unlocked by inference at the edge typically depend on a foundation of sensitive or proprietary data. This data is often protected by strict data governance controls, frequently as a jurisdictional requirement. Depending on who has access to the inference application, training models directly on this sensitive data could expose it to unauthorized users.

To protect this data, look for a serverless inference provider that offers retrieval-augmented generation (RAG). Rather than fine-tuning models directly on sensitive data, RAG keeps that data in a secure vector store, from which the model retrieves relevant information only when prompted by the user.
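A minimal sketch of that pattern is shown below, using an in-memory list as a stand-in for a governed vector store; the embedding model, sample records, endpoint URL, and model name are all illustrative assumptions.

```python
# Illustrative RAG sketch: sensitive records stay in a (here, in-memory) vector
# store and are retrieved per query instead of being baked into model weights.
# The embedding model, documents, endpoint, and model name are placeholders.
import os
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Stand-in for a secure, locally governed vector store.
documents = [
    "Contract 1042: renewal due 2025-09-01, region EU, data residency: Frankfurt.",
    "Contract 2208: renewal due 2025-11-15, region APAC, data residency: Singapore.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "When is the EU contract up for renewal?"
context = "\n".join(retrieve(query))

client = OpenAI(
    base_url="https://inference.example-cloud.com/v1",  # hypothetical endpoint
    api_key=os.environ["INFERENCE_API_KEY"],
)
answer = client.chat.completions.create(
    model="deepseek-r1-distill",  # whichever open model the provider hosts
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(answer.choices[0].message.content)
```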

In addition to preventing data leaks, keeping sensitive data separate from training data has other advantages. Fine-tuning costs are lower, since data can simply be updated in the vector store without retraining the entire model. A RAG approach also makes it easier to deploy models across geographies with varying data sovereignty requirements.

Finally, most real-time insights use cases for AI inference at the edge depend on ultra-low latency processing of streaming data. For this, look for a cloud provider that offers access to Apache Kafka, an open-source streaming data platform ideal for supporting real-time insights with AI inference.
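The sketch below shows one way these pieces might fit together: a Kafka consumer reads streaming events and passes each one to a serverless inference endpoint for real-time scoring. The topic name, broker address, endpoint, and model name are hypothetical.

```python
# Illustrative sketch: consume streaming events from Kafka and pass each one to
# a serverless inference endpoint for real-time scoring at the edge.
# Topic name, broker address, endpoint, and model name are hypothetical.
import json
import os
from kafka import KafkaConsumer
from openai import OpenAI

consumer = KafkaConsumer(
    "sensor-alerts",                      # hypothetical topic
    bootstrap_servers="edge-kafka:9092",  # hypothetical broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

client = OpenAI(
    base_url="https://inference.example-cloud.com/v1",  # hypothetical endpoint
    api_key=os.environ["INFERENCE_API_KEY"],
)

for message in consumer:
    event = message.value
    result = client.chat.completions.create(
        model="deepseek-r1-distill",  # whichever open model the provider hosts
        messages=[{
            "role": "user",
            "content": f"Classify the severity of this alert as low/medium/high: {event}",
        }],
        max_tokens=8,
    )
    print(event.get("id"), "->", result.choices[0].message.content)
```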

Looking Ahead

As enterprises scale up their AI deployments across their global footprints, it will only become more important to maintain flexibility with their infrastructure. AI technology will no doubt continue to evolve and improve, and enterprises that overinvest in AI infrastructure or align themselves with closed model providers may find themselves locked out of new leaps in efficiency, as evidenced by DeepSeek’s R1.

The need for efficient infrastructure for AI inference at the edge will only magnify as generative AI evolves into agentic AI: autonomous AI agents that leverage LLMs to execute tasks independently. By some estimates, there could soon be more AI agents than people. The agent swarms that will come out of agentic AI will impose a much more significant burden on engineering teams to configure and maintain complex infrastructure wherever agents are needed at the edge.

The only viable infrastructure approach is serverless inference. It offers scalability, cost-efficiency, and agility, enabling enterprises to focus on building robust AI applications that will define their competitive advantage in the AI age.


About Kevin Cochrane

Kevin Cochrane is the CMO at Vultr and a 25+ year pioneer in the digital experience space. He co-founded his first start-up, Interwoven, in 1996, pioneered open source content management at Alfresco in 2006, and built a global leader in digital experience management as CMO of Day Software and later Adobe. Kevin has also held senior executive positions at OpenText, Bloomreach, and SAP. At Vultr, he is working to build the company's global brand presence as a leader in the independent cloud platform market.
