Now in its third generation, the technology to deliver a great graphics experience in virtual desktops and virtual (published) applications is becoming more common, but the three primary hypervisors (Citrix XenServer, Microsoft Hyper-V, and the segment leader, VMware ESX) do not all have the same capabilities.
Hyper-V is Just Passing Through
With Microsoft’s server virtualization platform, Hyper-V, you can only use passthrough mode, which Microsoft has confusingly (or maybe purposely) called vGPU. This is completely different from NVIDIA’s vGPU, which actually allows you to virtually carve up one or more graphics cards and share the physical GPUs on them between multiple VMs.
Citrix XenServer and VMware ESX each support both NVIDIA vGPU and passthrough modes.
What is Passthrough Mode?
Passthrough mode, supported by Microsoft Hyper-V, Citrix XenServer, and VMware ESX, simply hands the entire GPU through the hypervisor to a single VM; no carving or segmentation is possible. The good news is that if you are on the virtual machine (VM) that happens to be the beneficiary of this passthrough, you get all of the GPU’s processing capability and memory. If you are on any other VM on that server, your experience will not be as good.
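On XenServer, for instance, passthrough is assigned with the same `xe` tooling used for vGPU: a GRID card exposes a special passthrough type that dedicates a whole physical GPU to one VM. A hedged sketch (the UUIDs are placeholders; the exact type list depends on the card and driver):

```shell
# The type list for a GRID card includes a "passthrough" entry
# alongside the shared vGPU profiles.
xe vgpu-type-list

# Hand an entire physical GPU to a single VM by selecting the
# passthrough type (UUIDs below are placeholders).
xe vgpu-create vm-uuid=<vm-uuid> \
    gpu-group-uuid=<gpu-group-uuid> \
    vgpu-type-uuid=<passthrough-type-uuid>
```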
This mode is more common for Citrix XenApp environments where a few VMs will be shared by many users.
Benefit: If an application is extremely GPU-core- or GPU-memory-intensive, passthrough still allows the workload to be virtualized and hosted centrally.
What is vGPU?
vGPU mode is supported on Citrix XenServer and VMware ESX only. For all of its cosmic power, Microsoft Hyper-V does not support this.
vGPU mode gives each VM a dedicated, configurable portion of video RAM. The vGPUs do not share RAM in any way; the video RAM on the card is divided up, with each portion dedicated to a particular VM.
What is a CUDA core?
CUDA, which originally stood for Compute Unified Device Architecture, is a parallel computing platform and programming model created by NVIDIA, with an API that both simplifies and enables leveraging the computing power of graphics processors for graphical and non-graphical applications alike.
CUDA cores are not dedicated to individual VMs; they are time-sliced between VMs, similar to how CPU cores are time-sliced on a hypervisor. So if an NVIDIA GRID GPU has 768 CUDA cores, each virtual desktop gets all 768 cores for a split second, then the next virtual desktop gets them for a split second, and so on. This is more common in Citrix XenDesktop environments, where you are trying to deliver 1:1 VMs with a higher density of VMs per card.
Benefit: This allows GPUs to be split up into more VM instances and shared more efficiently, so more people and workloads can benefit from having access to vGPUs. These splits, which determine how much GPU each individual is allocated, can be configured in a variety of ways via vGPU profiles. The NVIDIA GRID vGPU user guide (pages 4-7) provides more detail on this topic.
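On XenServer, those profile splits are applied per VM with the `xe` CLI. A hedged sketch (the UUIDs are placeholders, and the available profiles depend on which GRID card is installed):

```shell
# List the vGPU profiles (types) the installed GRID card offers;
# profiles differ in framebuffer size and maximum VMs per GPU.
xe vgpu-type-list

# Identify the GPU group containing the physical GPUs.
xe gpu-group-list

# Bind one VM to a chosen profile; the VM gets that profile's
# dedicated slice of video RAM (UUIDs below are placeholders).
xe vgpu-create vm-uuid=<vm-uuid> \
    gpu-group-uuid=<gpu-group-uuid> \
    vgpu-type-uuid=<profile-type-uuid>
```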
Drawback: If every user allocated to the NVIDIA GRID card were simultaneously hammering away at it, they would start to feel a bit of delay. The maximum each instance can use is dictated by the profile assigned to each physical GPU. NVIDIA incorporates a scheduler, which can introduce a small overall "hit" on performance (again, think of CPU sharing on a hypervisor for a good analogy).
Field experience and real-world production implementations demonstrate that NVIDIA handles resource scheduling very well; it is rare to find an environment where everyone is rendering simultaneously, so in practice time-sharing seldom becomes a real issue.
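One simple way to check whether users really are contending for the GPU is to watch utilization on the host itself, where NVIDIA's driver installs the `nvidia-smi` tool (a sketch; the exact output format varies by driver version):

```shell
# Show per-GPU utilization and memory use once.
nvidia-smi

# Refresh every 5 seconds to watch for sustained contention.
nvidia-smi -l 5

# Query just the utilization counters in detail.
nvidia-smi -q -d UTILIZATION
```

If GPU utilization rarely pins at 100% across a sampling window, the scheduler has headroom and users are unlikely to notice the time-slicing.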
From simply providing a better computing experience for regular office workers using virtual desktops to delivering stunning visual displays with ridiculous polygon counts for engineers and designers using Bentley, Autodesk Revit, McNeel Rhino, and Adobe Creative Suite, among other graphically demanding applications, GPU makes desktop and application virtualization better.
Virtual desktops and applications just feel better when underpinned with GPU.
images courtesy of NVIDIA