How the Record-Breaking, Cloud Native AI Supercomputer Was Built - Peter Salanki, CoreWeave

Science and Technology

MLCommons released the latest MLPerf results in June, announcing a new record for AI performance by a supercomputer running on Kubernetes. In this session, we'll cover what these benchmarks mean for the AI/ML industry and how CoreWeave and NVIDIA worked together to achieve this world-record-breaking result. Software and hardware engineers will discuss:
- How leveraging Kubernetes and other CNCF technologies helped build massive GPU clusters for generative AI at breakneck speed
- How the team leveraged Argo Workflows to automate health checks, testing, and lifecycle management
- How Prometheus, Grafana, Mimir, and Loki are used to track bare metal and network health and performance
- Lessons learned from running a record-breaking MLPerf submission with Slurm on Kubernetes
