Summary
Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconsf.com with any comments or concerns.
The talk focuses on the development and management of an AI as a service platform by SS&C Technologies, which supports the real-time and batch processing of GPU workloads in a financial sector setting.
Key Points:
- AI Cloud as a Service: The platform is designed for efficient processing using GPU resources within SS&C's data center, addressing both economic and technological aspects.
- Infrastructure: The platform builds on components like Kubernetes, Kafka, and NiFi, and supports AI services such as RAG (retrieval-augmented generation) and language model inference.
- Business Needs: The system was crafted to meet specific SLAs while minimizing GPU costs, managing real-time and batch ingestion for diverse use cases.
System Implementation:
- The platform supports around 80 GPUs across multiple regions and zones, accommodating over 1,000 use cases and 3,000 users.
- Emphasizes virtual queuing and rate limiting to utilize GPU resources efficiently, ensuring production traffic has priority access over development traffic.
- Employs Lua scripts for atomic operations and for managing back pressure in system workloads (see the sketch after this list).
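The talk does not include code, but the pattern the two points above describe (a Lua script executed atomically inside Redis to enforce per-tier rate limits, so production traffic is admitted ahead of development traffic) can be sketched as follows. The key names, tier labels, and limits are illustrative assumptions, not the platform's actual values; the point is that the whole read-refill-consume sequence runs as one atomic step, so concurrent gateway nodes cannot double-spend a token.

```python
# Minimal sketch (not SS&C's actual implementation): a Redis token bucket
# evaluated as a Lua script so the check-and-decrement is atomic, with a
# separate bucket per traffic tier so production outranks development.
import time

import redis  # assumes the redis-py client is available

# The Lua body runs atomically inside Redis: there is no race between
# reading the bucket state and writing back the consumed token.
TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill   = tonumber(ARGV[2])  -- tokens added per second
local now      = tonumber(ARGV[3])

local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now

-- Refill for the elapsed time, capped at capacity.
tokens = math.min(capacity, tokens + (now - ts) * refill)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 3600)
return allowed
"""

# Illustrative tiers and limits: production gets most of the admission
# budget, development gets a small slice. (capacity, refill per second)
TIER_LIMITS = {"prod": (100, 50.0), "dev": (10, 2.0)}

r = redis.Redis()
token_bucket = r.register_script(TOKEN_BUCKET_LUA)

def admit(tier: str) -> bool:
    """Atomically consume one token for the tier; False means throttle."""
    capacity, refill = TIER_LIMITS[tier]
    return token_bucket(keys=[f"ratelimit:{tier}"],
                        args=[capacity, refill, time.time()]) == 1
```

Because every gateway node evaluates the same script against shared Redis state, the limits hold globally, not per node, which is what makes the prioritization meaningful across regions.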
Challenges and Solutions:
- Handling Overload: Uses an OpenAI-compatible API and an async LLM engine to manage incoming job requests and prevent system overload (first sketch below).
- Time Optimization: Implements scheduling windows to shift deferrable batch work into periods of low GPU utilization (second sketch below).
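A minimal sketch of the overload-handling idea, assuming an asyncio-based gateway in front of the engine: a semaphore caps in-flight generations, and callers that cannot get a slot within a deadline fail fast instead of piling up. `MAX_IN_FLIGHT`, `generate`, and the timeout are hypothetical placeholders, not the platform's real API or numbers.

```python
# Minimal sketch (assumed, not the platform's code): cap the number of
# in-flight generation requests so the async LLM engine behind the
# OpenAI-compatible endpoint is never driven past its concurrency budget.
import asyncio

MAX_IN_FLIGHT = 32                       # illustrative concurrency budget
_slots = asyncio.Semaphore(MAX_IN_FLIGHT)

async def generate(prompt: str) -> str:
    # Placeholder for the real call, e.g. a POST to /v1/completions or
    # a generate() call on a vLLM-style async engine.
    await asyncio.sleep(0.1)             # simulate GPU inference latency
    return f"completion for {prompt!r}"

async def handle_request(prompt: str, timeout_s: float = 30.0) -> str:
    """Admit the request if a slot frees up in time; otherwise fail fast."""
    try:
        # Waiting here is the virtual queue: excess requests park on the
        # semaphore instead of overloading the engine.
        await asyncio.wait_for(_slots.acquire(), timeout=timeout_s)
    except asyncio.TimeoutError:
        raise RuntimeError("busy: admission queue timed out")
    try:
        return await generate(prompt)
    finally:
        _slots.release()

# Example: 100 concurrent callers, at most 32 ever hit the engine at once.
async def main():
    results = await asyncio.gather(
        *(handle_request(f"prompt {i}") for i in range(100)),
        return_exceptions=True)
    print(sum(isinstance(r, str) for r in results), "completed")

if __name__ == "__main__":
    asyncio.run(main())
```

Bounding the wait turns the semaphore into a queue with back pressure: excess load is shed at the gateway rather than on the GPUs, which keeps tail latencies inside the SLA.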
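And a sketch of the scheduling-window idea; the 10 PM to 6 AM window is an assumed example, since the summary does not state the actual hours, and a real implementation would likely derive the window from observed GPU utilization rather than a fixed clock range.

```python
# Minimal sketch (window boundaries are assumptions, not the talk's
# actual schedule): defer batch jobs into an overnight window when
# interactive GPU demand is typically low.
from datetime import datetime, time

WINDOW_START = time(22, 0)  # assumed 10 PM start of the quiet window
WINDOW_END   = time(6, 0)   # assumed 6 AM end; this window wraps midnight

def in_batch_window(now: datetime) -> bool:
    """True if 'now' falls inside the low-utilization batch window."""
    t = now.time()
    if WINDOW_START <= WINDOW_END:
        return WINDOW_START <= t < WINDOW_END
    return t >= WINDOW_START or t < WINDOW_END  # wrapped past midnight

def dispatch(job_id: str, realtime: bool) -> str:
    """Run real-time work immediately; hold batch work for the window."""
    if realtime or in_batch_window(datetime.now()):
        return f"run {job_id} now"
    return f"defer {job_id} until {WINDOW_START}"
```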
Future Directions:
- Continue enhancing system virtualization to further improve the efficiency and reliability of GPU workload processing.
- Develop comprehensive governance policies to ensure compliance and secure usage of AI technologies.
This is the end of the AI-generated content.
Abstract
SS&C Technologies runs 47 trillion dollars of assets on our global private cloud. We have the primitives for infrastructure as well as platforms as a service like Kubernetes, Kafka, NiFi, databases, etc. A year ago we broke ground and went live with AI as a service, providing RAG and inference for embeddings, LLM text, image, and voice, and we needed an efficient, low-TCO platform to power the needs of the business. Our centralized AI Gateway has a prioritized job scheduler that we wrote, and we will discuss how over 300 production use cases run workloads in a way that provides the required SLAs while keeping GPU costs down. We also run on AWS around the globe and will discuss how the platform works in a multi-cloud environment, keeping costs down in different ways in AWS while still meeting SLAs.
Speaker
Joseph Stein
Principal Architect of Research & Development @SS&C Technologies, Former Apache Kafka Committer and PMC Member
Joe Stein is an Architect, Developer and Security Professional with over 25 years of experience. He has worked on production environments (mostly running Apache Kafka at the core, most often within a containerized environment) at Bloomberg, Verizon, EMC, CrowdStrike, Cisco, Bridgewater Associates, MUFG Union Bank and US Bank. He was also an Apache Kafka Committer and PMC member from Jan 2012 to Aug 2016. Currently he is the Principal Architect of Research & Development at SS&C Technologies.