Summary
Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconsf.com with any comments or concerns.
The talk focuses on the development and management of an AI as a service platform by SS&C Technologies, which supports the real-time and batch processing of GPU workloads in a financial sector setting.
Key Points:
- AI Cloud as a Service: The platform is designed for efficient processing using GPU resources within SS&C's data center, addressing both economic and technological aspects.
- Infrastructure: The platform builds on components like Kubernetes, Kafka, and NiFi, and supports AI services such as RAG (retrieval-augmented generation) and language model inference.
- Business Needs: The system was crafted to meet specific SLAs while minimizing GPU costs, managing real-time and batch ingestion for diverse use cases.
System Implementation:
- The platform supports around 80 GPUs across multiple regions and zones, accommodating over 1,000 use cases and 3,000 users.
- Emphasizes virtual queuing and rate limiting to utilize GPU resources efficiently, ensuring production traffic has priority access over development traffic.
- Employs Lua scripts for atomic operations and for managing back pressure in system workloads (see the sketch after this list).
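The talk does not include code, but the pattern the two points above describe (a Lua script executed atomically inside Redis to enforce per-tier rate limits, so production traffic is admitted ahead of development traffic) can be sketched as follows. The key names, tier labels, and limits are illustrative assumptions, not the platform's actual values; the point is that the whole read-refill-consume sequence runs as one atomic step, so concurrent gateway nodes cannot double-spend a token.

```python
# Minimal sketch (not SS&C's actual implementation): a Redis token bucket
# evaluated as a Lua script so the check-and-decrement is atomic, with a
# separate bucket per traffic tier so production outranks development.
import time

import redis  # assumes the redis-py client is available

# The Lua body runs atomically inside Redis: there is no race between
# reading the bucket state and writing back the consumed token.
TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill   = tonumber(ARGV[2])  -- tokens added per second
local now      = tonumber(ARGV[3])

local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now

-- Refill for the elapsed time, capped at capacity.
tokens = math.min(capacity, tokens + (now - ts) * refill)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 3600)
return allowed
"""

# Illustrative tiers and limits: production gets most of the admission
# budget, development gets a small slice. (capacity, refill per second)
TIER_LIMITS = {"prod": (100, 50.0), "dev": (10, 2.0)}

r = redis.Redis()
token_bucket = r.register_script(TOKEN_BUCKET_LUA)

def admit(tier: str) -> bool:
    """Atomically consume one token for the tier; False means throttle."""
    capacity, refill = TIER_LIMITS[tier]
    return token_bucket(keys=[f"ratelimit:{tier}"],
                        args=[capacity, refill, time.time()]) == 1
```

Because every gateway node evaluates the same script against shared Redis state, the limits hold globally, not per node, which is what makes the prioritization meaningful across regions.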
Challenges and Solutions:
- Handling Overload: Uses an OpenAI-compatible API and an async LLM engine to manage incoming job requests and prevent system overload (first sketch below).
- Time Optimization: Implements scheduling windows to shift deferrable batch work into periods of low GPU utilization (second sketch below).
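A minimal sketch of the overload-handling idea, assuming an asyncio-based gateway in front of the engine: a semaphore caps in-flight generations, and callers that cannot get a slot within a deadline fail fast instead of piling up. `MAX_IN_FLIGHT`, `generate`, and the timeout are hypothetical placeholders, not the platform's real API or numbers.

```python
# Minimal sketch (assumed, not the platform's code): cap the number of
# in-flight generation requests so the async LLM engine behind the
# OpenAI-compatible endpoint is never driven past its concurrency budget.
import asyncio

MAX_IN_FLIGHT = 32                       # illustrative concurrency budget
_slots = asyncio.Semaphore(MAX_IN_FLIGHT)

async def generate(prompt: str) -> str:
    # Placeholder for the real call, e.g. a POST to /v1/completions or
    # a generate() call on a vLLM-style async engine.
    await asyncio.sleep(0.1)             # simulate GPU inference latency
    return f"completion for {prompt!r}"

async def handle_request(prompt: str, timeout_s: float = 30.0) -> str:
    """Admit the request if a slot frees up in time; otherwise fail fast."""
    try:
        # Waiting here is the virtual queue: excess requests park on the
        # semaphore instead of overloading the engine.
        await asyncio.wait_for(_slots.acquire(), timeout=timeout_s)
    except asyncio.TimeoutError:
        raise RuntimeError("busy: admission queue timed out")
    try:
        return await generate(prompt)
    finally:
        _slots.release()

# Example: 100 concurrent callers, at most 32 ever hit the engine at once.
async def main():
    results = await asyncio.gather(
        *(handle_request(f"prompt {i}") for i in range(100)),
        return_exceptions=True)
    print(sum(isinstance(r, str) for r in results), "completed")

if __name__ == "__main__":
    asyncio.run(main())
```

Bounding the wait turns the semaphore into a queue with back pressure: excess load is shed at the gateway rather than on the GPUs, which keeps tail latencies inside the SLA.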
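And a sketch of the scheduling-window idea; the 10 PM to 6 AM window is an assumed example, since the summary does not state the actual hours, and a real implementation would likely derive the window from observed GPU utilization rather than a fixed clock range.

```python
# Minimal sketch (window boundaries are assumptions, not the talk's
# actual schedule): defer batch jobs into an overnight window when
# interactive GPU demand is typically low.
from datetime import datetime, time

WINDOW_START = time(22, 0)  # assumed 10 PM start of the quiet window
WINDOW_END   = time(6, 0)   # assumed 6 AM end; this window wraps midnight

def in_batch_window(now: datetime) -> bool:
    """True if 'now' falls inside the low-utilization batch window."""
    t = now.time()
    if WINDOW_START <= WINDOW_END:
        return WINDOW_START <= t < WINDOW_END
    return t >= WINDOW_START or t < WINDOW_END  # wrapped past midnight

def dispatch(job_id: str, realtime: bool) -> str:
    """Run real-time work immediately; hold batch work for the window."""
    if realtime or in_batch_window(datetime.now()):
        return f"run {job_id} now"
    return f"defer {job_id} until {WINDOW_START}"
```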
Future Directions:
- Continue enhancing system virtualization to further improve the efficiency and reliability of GPU workload processing.
- Develop comprehensive governance policies to ensure compliance and secure usage of AI technologies.
This is the end of the AI-generated content.
Abstract
SS&C Technologies runs 47 trillion dollars of assets on our global private cloud. We have the primitives for infrastructure as well as platforms as a service like Kubernetes, Kafka, NiFi, databases, etc. A year ago we broke ground and went live with AI as a service, providing RAG and inference for embeddings, LLM text, image, and voice, and we needed an efficient, low-TCO platform to power the needs of the business. Our centralized AI Gateway has a prioritized job scheduler that we wrote, and we will discuss how over 300 production use cases run workloads in a way that provides the required SLAs while keeping GPU costs down. We also run on AWS around the globe and will discuss how the platform works in a multi-cloud environment, keeping costs down in different ways in AWS while still meeting SLAs.
Speaker
Joseph Stein
Principal Architect of Research & Development @SS&C Technologies, Former Apache Kafka Committer and PMC Member
Joe Stein is an Architect, Developer and Security Professional with over 25 years of experience. He has worked on production environments (mostly running Apache Kafka at the core, most often within a containerized environment) at Bloomberg, Verizon, EMC, CrowdStrike, Cisco, Bridgewater Associates, MUFG Union Bank and US Bank. He was also an Apache Kafka Committer and PMC member from Jan 2012 to Aug 2016. Currently he is the Principal Architect of Research & Development at SS&C Technologies.