


JAN
22
Thu, 22 Jan
Online



How do you choose the right serving strategy for your model? This presentation goes beyond introducing Dynamo: it takes a practical approach, walking through techniques and real-world scenarios for serving AI workloads in production, such as disaggregated serving and optimizing deployments against memory-bound bottlenecks.
Key learning objectives: Attendees will gain an intuitive understanding of how Dynamo addresses memory-bound operations to get the best performance out of GPUs. They will also learn the architectural patterns for serving AI workloads in distributed environments and come away with a framework for choosing a deployment based on technical and business constraints.
Target level: Beginner to Advanced







