Category: Artificial Intelligence
-
An AI journey continues – configure the scheduler
Now that the infrastructure had been deployed (software-defined network, OKE, H100 GPUs, storage, etc.), it was time to configure the scheduler (run:ai). The first question posed post-installation: “do we need any special configuration for the network operator in order for the scheduler pods to leverage RDMA?” Would we need Single Root I/O Virtualization…
-
An AI journey continues – GPU Deployment!
With our OKE cluster successfully deployed, it was time to start working on the GPU node deployment. Our GPU nodes must run Ubuntu 22.04 because of its support for the NVIDIA GPU Operator, which the run:ai scheduler requires. For optimal performance between the GPU worker node instances, we needed to…
-
An AI journey continues – Network design
I wish I had been brilliant enough to plan out the network deployment without issue, but as the saying goes…we live and we learn. Here are some key networking decisions that will need to be considered. After running through all of the prerequisites for the run:ai cluster installation, we made sure that we had an…
-
An AI journey continues – storage
In the last blog entry, we left off with the scheduler and Kubernetes cluster decisions in place. Our focus quickly turned to storage options. Since we would have two GPU worker nodes, we required a shared storage option. The throughput requirement we were given was 50 Gbps. OCI AI Architecture documentation lists Lustre, BeeGFS,…
-
An AI journey begins – choosing a scheduler
As a veteran of the technology industry, I have experienced the ebbs and flows of the “next big thing”: e-commerce, blockchain, cloud computing, IoT, edge computing, quantum computing, big data, and so on. The current buzz, or “next big thing,” is Artificial Intelligence (AI). I recently had an opportunity to deploy an AI architecture. I thought I…
