NVIDIA GRID support for AI/ML workloads with GPU services

As modern applications proliferate, Cloud Providers need to address increasing customer demand for accelerated computing, which typically requires large volumes of simultaneous computation that GPUs are well suited to deliver.

Cloud Providers can now leverage vSphere support for vGPU on NVIDIA hardware, together with vMotion, from within Cloud Director to deliver multi-tenant GPU services, which is key to reducing the cost of GPU projects. Customers can self-serve, manage and monitor their GPU-accelerated hosts and virtual machines within Cloud Director. Cloud Providers can monitor vGPU allocation and usage per VDC and per VM (through the vCloud API and a UI dashboard) to optimize utilization, and can meter vGPU usage averaged over a unit of time per tenant (through the vCloud API) for tenant billing.
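As an illustration of how a provider might consume this usage data programmatically, the sketch below polls a Cloud Director endpoint for per-VDC vGPU usage. It is a minimal example only: the base URL, API version header and the vGPU usage endpoint path are assumptions made for illustration and should be checked against the vCloud API documentation for 10.3.2.

```python
import requests

# Assumptions for illustration: base URL, API version and endpoint path
# are placeholders -- consult the vCloud API reference for the exact paths.
VCD_BASE = "https://vcd.example.com"   # hypothetical Cloud Director endpoint
TOKEN = "<bearer token from /cloudapi/1.0.0/sessions>"

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/json;version=36.2",  # assumed API version for 10.3.2
}

# Hypothetical query for vGPU allocation in an org VDC -- the path is illustrative.
resp = requests.get(
    f"{VCD_BASE}/cloudapi/1.0.0/vdcs/urn:vcloud:vdc:example-id/vgpuUsage",
    headers=headers,
)
resp.raise_for_status()

for entry in resp.json().get("values", []):
    # Expected shape (assumed): one record per VM with its vGPU profile and usage.
    print(entry.get("vmName"), entry.get("vgpuProfile"), entry.get("usage"))
```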

Cloud Director delivers NVIDIA GRID GPU virtualization through preconfigured NVIDIA ESXi hosts and virtual machines with NVIDIA drivers, using vSphere 7 Update 2 GPU features and compatible NVIDIA hardware; the NVIDIA driver technology virtualizes the GPU hardware, allowing multiple virtual machines to share the same vGPU resources. The solution also takes advantage of NVIDIA MIG (Multi-Instance GPU), which enforces multi-tenancy boundaries between workloads at the physical level inside a single device, a significant benefit for multi-tenant environments that drives considerably more optimization and margin. Cloud Director relies on host pre-configuration for GPU services, including NVIDIA GRID deployment/configuration and GPU profiles.

In the initial phase of this release, Cloud Director supports vMotion and High Availability of GPU workloads, and GPU profiles and policies are supported only in Flex-based tenant organizations. The GPU capability is focused on general-purpose GPU computing, targeting machine learning, artificial intelligence and high-performance compute workloads. Cloud Providers can offer vApp templates pre-configured with the necessary NVIDIA/Quadro drivers, placement policies and GPU profiles assigned, and with the VM and guest OS enabled for GPU.
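Placement policies such as these surface through Cloud Director's VDC compute policy API. The sketch below simply lists the policies visible to a tenant so a GPU placement policy could be chosen at instantiation time; the endpoint version and the field names printed are assumptions for illustration, not confirmed schema names.

```python
import requests

VCD_BASE = "https://vcd.example.com"   # hypothetical endpoint
TOKEN = "<bearer token>"

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/json;version=36.2",  # assumed API version for 10.3.2
}

# List VDC compute policies; the exact CloudAPI path/version may differ per release.
resp = requests.get(f"{VCD_BASE}/cloudapi/2.0.0/vdcComputePolicies", headers=headers)
resp.raise_for_status()

for policy in resp.json().get("values", []):
    # Field names here are illustrative -- inspect the actual payload in your environment.
    print(policy.get("name"), policy.get("policyType"))
```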

Networking services: NSX-T Segment Profiles

To simplify operational onboarding and configuration of essential services, Cloud Director now allows system administrators to assign custom NSX-T segment profiles to organization virtual data center networks. Segment profile capabilities include:

  • Spoof Guard: Enable/disable Port Bindings based on an IP or MAC address
  • IP Discovery: Configure ARP and/or DHCP snooping
  • MAC Discovery: Set up MAC change and MAC learning rules
  • Segment Security: BPDU and DHCP Filter, Rate Limits, etc.
  • QoS: DSCP (trusted or untrusted), CoS, Bandwidth limitations

This allows providers or tenant administrators to apply a profile encapsulating multiple networking settings to a network segment. This can save considerable configuration time, and using profiles helps ensure a consistent approach to networking and security, minimizing the scope for manual configuration error.
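The sketch below illustrates what assigning pre-created segment profiles to an existing organization VDC network might look like through the Cloud Director API. The `/segmentProfiles` sub-path and the payload field names are assumptions made for illustration; the UI workflow accomplishes the same result.

```python
import requests

VCD_BASE = "https://vcd.example.com"   # hypothetical endpoint
TOKEN = "<bearer token>"
NETWORK_ID = "urn:vcloud:network:example-id"   # org VDC network URN (placeholder)

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/json;version=36.2",   # assumed API version
    "Content-Type": "application/json",
}

# Illustrative payload: reference pre-created NSX-T segment profiles by id.
# Field names are assumptions; check the 10.3.2 OpenAPI schema for the real ones.
payload = {
    "ipDiscoveryProfile": {"id": "ip-discovery-profile-id"},
    "macDiscoveryProfile": {"id": "mac-discovery-profile-id"},
    "spoofGuardProfile": {"id": "spoof-guard-profile-id"},
    "qosProfile": {"id": "qos-profile-id"},
    "segmentSecurityProfile": {"id": "segment-security-profile-id"},
}

resp = requests.put(
    f"{VCD_BASE}/cloudapi/1.0.0/orgVdcNetworks/{NETWORK_ID}/segmentProfiles",
    headers=headers,
    json=payload,
)
resp.raise_for_status()
print("Segment profiles applied:", resp.status_code)
```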

Networking services: NSX-T Edge Gateway Rate Limiting

Customers need quality of service to ensure the performance of critical applications where network capacity is limited. The primary goal of Quality of Service (QoS) is to manage packet loss and reduce latency and jitter on a network connection.

VMware Cloud Director can now apply a preconfigured QoS profile to the customer gateway for both ingress and egress traffic. The QoS profiles themselves are defined in NSX-T, typically infrequently, as an operational configuration or managed service. The more frequent workflow, assigning these profiles to an NSX-T Data Center edge gateway, is accomplished in Cloud Director.

Cloud Director provides a simple mechanism for the provider and/or tenant to specify the desired QoS profiles when configuring NSX-T Data Center edge gateways. The QoS profiles must already exist on the target NSX-T Manager, configured as a day-two operation or as a managed service delivered by the provider, which offers an additional upsell opportunity.
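As a rough illustration, the sketch below assigns pre-existing NSX-T gateway QoS profiles to the ingress and egress directions of an edge gateway through the Cloud Director API. The `/qos` sub-path and the payload field names are assumed for illustration purposes.

```python
import requests

VCD_BASE = "https://vcd.example.com"   # hypothetical endpoint
TOKEN = "<bearer token>"
EDGE_ID = "urn:vcloud:gateway:example-id"   # edge gateway URN (placeholder)

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/json;version=36.2",   # assumed API version
    "Content-Type": "application/json",
}

# The referenced QoS profiles must already exist in the backing NSX-T Manager.
# Payload shape is illustrative only.
payload = {
    "ingressProfile": {"id": "gateway-qos-profile-ingress-id"},
    "egressProfile": {"id": "gateway-qos-profile-egress-id"},
}

resp = requests.put(
    f"{VCD_BASE}/cloudapi/1.0.0/edgeGateways/{EDGE_ID}/qos",
    headers=headers,
    json=payload,
)
resp.raise_for_status()
print("Rate limiting profiles applied:", resp.status_code)
```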

Networking services: vApp move with networking

The move vApp API has been extended to carry the vApp's entire network configuration with it, removing the need for re-configuration after the move (apart from the connection to the new parent network or any compatibility issues in the new parent environment).
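For context, the move operation is invoked against the target VDC in the legacy vCloud API. The sketch below shows roughly what such a request could look like; the XML body is a trimmed illustration rather than a complete MoveVAppParams document, so treat the element names and media type as assumptions to verify against the API reference.

```python
import requests

VCD_BASE = "https://vcd.example.com"     # hypothetical endpoint
TOKEN = "<bearer token>"
TARGET_VDC_ID = "example-target-vdc-id"  # placeholder
SOURCE_VAPP_HREF = f"{VCD_BASE}/api/vApp/vapp-example-id"  # placeholder

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/*+xml;version=36.2",                          # assumed version
    "Content-Type": "application/vnd.vmware.vcloud.moveVAppParams+xml",  # assumed media type
}

# Trimmed, illustrative request body -- a real MoveVAppParams document carries
# additional elements; with 10.3.2 the vApp's network configuration moves with it.
body = f"""<?xml version="1.0" encoding="UTF-8"?>
<MoveVAppParams xmlns="http://www.vmware.com/vcloud/v1.5">
  <Source href="{SOURCE_VAPP_HREF}"/>
</MoveVAppParams>"""

resp = requests.post(
    f"{VCD_BASE}/api/vdc/{TARGET_VDC_ID}/action/moveVApp",
    headers=headers,
    data=body,
)
resp.raise_for_status()
print("Move task started:", resp.status_code)
```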

Networking services: New Distributed Routing Toggle for Org VDC

Cloud Director now allows tenants to configure their NSX-T routed organization virtual data center networks as "distributed" or "not distributed", in much the same way that NSX-V routed organization virtual data center networks could be configured with a "distributed routing" flag. For NSX-T org VDC networks that are not distributed, VCD attaches the network to a Tier-1 service interface port (that is, directly to the Service Router component of the Tier-1 gateway).

This gives tenants the option of using the edge gateway firewall to control East-West traffic between organization virtual data center networks connected to the same edge gateway. Prior to 10.3.2 this was not possible, since East-West traffic was always distributed, bypassing the NSX-T Service Router altogether. By forcing all East-West traffic between networks through the edge gateway, the more expensive Distributed Firewall (DFW) solution is no longer necessary.
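The toggle corresponds to a property on the routed network's gateway connection. The sketch below switches an existing NSX-T routed org VDC network to "not distributed" via the Cloud Director API; the connection field name and its value are assumptions made for illustration, so verify them against the 10.3.2 OpenAPI schema.

```python
import requests

VCD_BASE = "https://vcd.example.com"   # hypothetical endpoint
TOKEN = "<bearer token>"
NETWORK_ID = "urn:vcloud:network:example-id"   # routed org VDC network URN (placeholder)

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/json;version=36.2",   # assumed API version
    "Content-Type": "application/json",
}

# Fetch the current network definition, change the routing mode, and write it back.
url = f"{VCD_BASE}/cloudapi/1.0.0/orgVdcNetworks/{NETWORK_ID}"
network = requests.get(url, headers=headers).json()

# Field name and value are illustrative: "not distributed" attaches the network
# to a Tier-1 service interface so East-West traffic crosses the edge gateway.
network.setdefault("connection", {})["connectionTypeValue"] = "NON_DISTRIBUTED"

resp = requests.put(url, headers=headers, json=network)
resp.raise_for_status()
print("Network updated:", resp.status_code)
```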

Automation capability: vRealize Orchestrator update and NSX-T support

Cloud Director now supports vRealize Orchestrator versions 8.6, 8.5 and 8.4, with orchestration support for the Cloud Director 10.3.2.1 REST API schema, compiled against the latest JRE/JDK for Java 11. For organisations that use the vRealize suite this is a long-awaited update that enables them to upgrade to the latest Cloud Director version. Support has also been added for basic NSX-T workflows: NSX-T Manager, Geneve network pools, NSX-T backed provider VDCs, Tier-0 gateways, NSX-T backed external networks and NSX-T backed edge gateways.

To find out more about VMware Cloud Director 10.3.2, please head over to the release notes and documentation.
