Red Hat to provide AI inference on AWS
Red Hat has expanded its relationship with Amazon Web Services to cover the delivery of enterprise-grade generative AI services on AWS with Red Hat AI.
The collaboration is aimed at helping IT decision-makers run high-performance AI inferencing at scale, regardless of the underlying hardware. Under the agreement, Red Hat AI Inference Server will be enabled to run with AWS AI chips to provide a common interface layer capable of supporting any generative AI model.
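In practice, that common interface layer takes the form of an OpenAI-compatible HTTP API, which vLLM-based servers such as Red Hat AI Inference Server expose regardless of the accelerator underneath. The sketch below is illustrative only; the endpoint URL, port and model name are assumptions rather than details from the announcement.

```python
# Hedged sketch: querying a vLLM-style OpenAI-compatible completions endpoint.
# The localhost URL, port and model identifier are placeholders for illustration.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed local inference server
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
        "prompt": "Summarise AI inference in one sentence.",
        "max_tokens": 64,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

Because the API shape stays the same whether the model is served from GPUs or AWS AI chips, client code like this does not need to change when the underlying hardware does.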
Red Hat has also worked with AWS to develop an AWS Neuron operator for Red Hat OpenShift, Red Hat OpenShift AI and Red Hat OpenShift Service on AWS. The operator provides native support for AWS AI chips, automating the deployment and management of AWS Neuron devices on OpenShift clusters.
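To make that concrete, once a device plugin advertises Neuron hardware to the cluster, workloads request it the same way they would any extended resource. The following is a minimal sketch using the Kubernetes Python client, not the operator itself; the resource name matches AWS's Neuron device plugin, while the pod name and container image are hypothetical placeholders.

```python
# Hedged sketch: scheduling a pod onto an AWS Neuron device on an
# OpenShift/Kubernetes cluster. The image and pod names are placeholders.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="neuron-inference-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="registry.example.com/inference-server:latest",  # placeholder
                resources=client.V1ResourceRequirements(
                    # "aws.amazon.com/neuron" is the resource name exposed by
                    # AWS's Neuron device plugin; request one device here.
                    limits={"aws.amazon.com/neuron": "1"}
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```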
Red Hat has also recently released a collection of infrastructure-as-code content for the Red Hat Ansible Automation Platform. The amazon.ai Certified Ansible Collection is designed to support orchestrating AI services on AWS. Meanwhile, the two companies are collaborating to optimise an AWS AI chip plug-in upstreamed to the vLLM open source library, which is used to serve and perform inference with large language models.
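For readers unfamiliar with vLLM, the library's core Python API is compact: a model is loaded once and prompts are batched through it. The snippet below is a minimal offline-inference sketch using vLLM's public API; the model name and sampling values are illustrative choices, not from the announcement, and hardware-specific plug-ins such as the AWS AI chip one would be selected at engine start-up rather than in this code.

```python
# Hedged sketch: offline inference with the vLLM library, the engine the
# article says Red Hat AI Inference Server builds on.
from vllm import LLM, SamplingParams

# Load any Hugging Face-format model; this model id is a placeholder.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain AI inference in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```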
Joe Fernandes, VP and GM of Red Hat’s AI Business Unit, said the collaboration is aimed at addressing the evolving needs of organisations as they integrate AI into their hybrid cloud strategies.
“By enabling our enterprise-grade Red Hat AI Inference Server, built on the innovative vLLM framework, with AWS AI chips, we’re empowering organisations to deploy and scale AI workloads with enhanced efficiency and flexibility,” he said. “Building on Red Hat’s open source heritage, this collaboration aims to make generative AI more accessible and cost-effective across hybrid cloud environments.”
AWS VP for Annapurna Labs Colin Brace added that today’s enterprises are demanding solutions capable of providing the performance, cost efficiency and operational choice needed for mission-critical AI workloads.
“AWS designed its Trainium and Inferentia chips to make high-performance AI inference and training more accessible and cost-effective,” he said. “Our collaboration with Red Hat provides customers with a supported path to deploying generative AI at scale, combining the flexibility of open source with AWS infrastructure and purpose-built AI accelerators to accelerate time-to-value from pilot to production.”
