
Short-Take: VMware View, What’s Up with PCoIP?

March 21, 2011

Isn’t it time you looked at what VMware View and PCoIP have to offer? Now that there is a server off-load card supporting View PCoIP virtual machines, off-loading the display-processing overhead opens up opportunities for denser View servers (or does it?). Here’s what VMware says about PCoIP in the “VMware View Architecture Planning Guide, View 4.6”:

VMware View with PCoIP
PCoIP is a new high-performance remote display protocol provided by VMware. This protocol is available for View desktops that are sourced from virtual machines, Teradici clients, and physical machines that have Teradici-enabled host cards.

PCoIP can compensate for an increase in latency or a reduction in bandwidth, to ensure that end users can remain productive regardless of network conditions. PCoIP is optimized for delivery of images, audio, and video content for a wide range of users on the LAN or across the WAN. PCoIP provides the following features:

  • You can use up to 4 monitors and adjust the resolution for each monitor separately, up to 2560 x 1600 resolution per display.
  • You can copy and paste text between the local system and the View desktop, but you cannot copy and paste system objects such as folders and files between systems.
  • PCoIP supports 32-bit color.
  • PCoIP supports 128-bit encryption.
  • PCoIP supports Advanced Encryption Standard (AES) encryption, which is turned on by default.
  • For users outside the corporate firewall, you can use this protocol with your company’s virtual private network or with View security servers.
  • MMR is not supported on Windows 7 clients or virtual desktops.
    • Although MMR is not supported on Windows 7 virtual desktops, if the Windows 7 desktop has 1GB of
      RAM and 2 virtual CPUs, you can use PCoIP to play 480p- and 720p-formatted videos at native resolutions.
      For 1080p, you might need to make the window smaller than full screen size.

If you use PCoIP, the display protocol from VMware, you can adjust the display resolution and rotation
separately for each monitor. PCoIP allows a true multiple-monitor session rather than a span mode session.

  • The maximum number of monitors that you can use to display a View desktop is 10 if you use the RDP display protocol and 4 if you use PCoIP.

RAM Sizing for Specific Monitor Configurations When Using PCoIP
If you use PCoIP, the display protocol from VMware, the amount of extra RAM that the ESX host requires depends in part on the number of monitors configured for end users and on the display resolution. Table 4-1 lists the amount of overhead RAM required for various configurations. The amounts of memory listed in the columns are in addition to the amount of memory required for other PCoIP functionality.

RAM sizing for Multi-Monitor PCoIP sessions

When you consider these requirements, note that virtual machine configuration of allocated RAM does not change. That is, you do not need to allocate 1GB of RAM for applications and another 31MB for dual 1080p monitors. Instead, consider the overhead RAM when calculating the total physical RAM required for each ESX server. Add the guest operating system RAM to the overhead RAM and multiply by the number of virtual machines.
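
To make that rule concrete, here’s a minimal back-of-the-napkin sketch. The 31MB dual-1080p overhead figure is the one cited above; the 1GB guest and the 64-VM count are purely hypothetical example inputs, not recommendations:

```python
# Host RAM estimate per the sizing rule above:
#   physical RAM = (guest OS RAM + PCoIP display overhead) x number of VMs
guest_ram_mb = 1024      # hypothetical 1GB Windows 7 guest
pcoip_overhead_mb = 31   # dual 1080p monitors (figure cited in the excerpt)
vms_per_host = 64        # hypothetical consolidation target

host_ram_mb = (guest_ram_mb + pcoip_overhead_mb) * vms_per_host
print(f"Desktop RAM to budget per host: {host_ram_mb / 1024:.1f} GB")
# roughly 66GB, before ESX's own overhead or any memory over-commit savings
```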

  • Software developers or other power users with high-performance needs might have much higher CPU requirements than knowledge workers and task workers. Dual virtual CPUs are recommended for compute-intensive tasks or for Windows 7 desktops that need to play 720p video using the PCoIP display protocol.

Maximum Connections for View Connection Server
Table 4-7 provides information about the tested limits regarding the number of simultaneous connections that a VMware View deployment can accommodate.

This example assumes that you are using VMware View with vSphere 4.1 and vCenter Server 4.1. It also assumes that View Connection Server is running on a 64-bit Windows Server 2008 R2 Enterprise operating system.

Maximum Connections for View Connection Server

PCoIP Secure Gateway connections are required if you use security servers for PCoIP connections from outside the corporate network. Tunnelled connections are required if you use security servers for RDP connections from outside the corporate network and for USB and multimedia redirection (MMR) acceleration with a PCoIP Secure Gateway connection.

Network Bandwidth Considerations
For display traffic, many elements can affect network bandwidth, such as protocol used, monitor resolution and configuration, and the amount of multimedia content in the workload. Concurrent launches of streamed applications can also cause usage spikes.

Because the effects of these issues can vary widely, many companies monitor bandwidth consumption as part of a pilot project. As a starting point for a pilot, plan for 150 to 200Kbps of capacity for a typical knowledge worker.
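
To turn that per-user figure into a pilot-sized aggregate, the arithmetic is trivial; here’s a quick sketch (the 150-200Kbps planning figure is from the guide, the 100-user population is hypothetical):

```python
# Aggregate display-protocol bandwidth for a pilot, using the guide's
# planning figure of 150-200 Kbps per typical knowledge worker.
users = 100              # hypothetical pilot population
low_kbps, high_kbps = 150, 200

low_mbps = users * low_kbps / 1000
high_mbps = users * high_kbps / 1000
print(f"Plan for {low_mbps:.0f} to {high_mbps:.0f} Mbps, then refine with monitoring data")
```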

With the PCoIP display protocol, if you have an enterprise LAN with 100Mb or a 1Gb switched network, your end users can expect excellent performance under the following conditions:

  • Two monitors (1920×1080)
  • Heavy use of Microsoft Office applications
  • Heavy use of Flash-embedded Web browsing
  • Frequent use of multimedia with limited use of full screen mode
  • Frequent use of USB-based peripherals
  • Network-based printing

This information was excerpted from the information guide called PCoIP Display Protocol: Information and Scenario-Based Network Sizing Guide.

WAN Support and PCoIP
For wide-area networks (WANs), you must consider bandwidth constraints and latency issues. The PCoIP display protocol provided by VMware adapts to varying latency and bandwidth conditions.
If you use the RDP display protocol, you must have a WAN optimization product to accelerate applications for users in branch offices or small offices. With PCoIP, many WAN optimization techniques are built into the base protocol.

  • WAN optimization is valuable for TCP-based protocols such as RDP because these protocols require many handshakes between client and server. The latency of these handshakes can be quite large. WAN accelerators spoof replies to handshakes so that the latency of the network is hidden from the protocol. Because PCoIP is UDP-based, this form of WAN acceleration is unnecessary.
  • WAN accelerators also compress network traffic between client and server, but this compression is usually limited to 2:1 compression ratios. PCoIP is able to provide compression ratios of up to 100:1 for images and audio.

The following examples show how PCoIP can be expected to perform in various WAN scenarios:

Work from home
A user with a dedicated cable or DSL connection with 4-8Mbps download and less than 300ms latency can expect excellent performance under the following conditions:

  • Two monitors (1920×1080)
  • Microsoft Office applications
  • Light use of Flash-embedded Web browsing
  • Periodic use of multimedia
  • Light printing with a locally connected USB printer

Mobile user
A user with a dedicated 3G connection with 5-500Kbps download and less than 300ms latency can expect adequate bandwidth and tolerable latency under the following conditions:

  • Single monitor
  • Microsoft Office applications
  • Light use of Flash-embedded Web browsing
  • Light printing with a locally connected USB printer

Encourage mobile users to use local applications to access multimedia content.

Branch or remote office
Plan for 3 concurrent active users per 1Mbps of bandwidth (a quick sizing sketch follows the list below). Users at an office that has a 20Mbps dedicated site-to-site UDP-based VPN with less than 200ms latency can expect acceptable performance under the following conditions:

  • Two monitors (1920×1080)
  • Microsoft Office applications
  • Light use of Flash-embedded Web browsing
  • Light printing with a locally connected USB printer
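
Applying the 3-users-per-1Mbps rule of thumb from the paragraph above to that 20Mbps branch link is simple arithmetic; this sketch just restates it (the 45-seat headcount is a hypothetical example):

```python
# Branch-office sizing using the guide's rule of thumb:
# about 3 concurrent active PCoIP users per 1 Mbps of WAN bandwidth.
users_per_mbps = 3
link_mbps = 20                       # the 20Mbps site-to-site VPN above

print(f"{link_mbps} Mbps supports ~{users_per_mbps * link_mbps} concurrent active users")

# Or size the link for a known headcount (hypothetical 45-seat office):
office_users = 45
print(f"{office_users} concurrent users need ~{office_users / users_per_mbps:.0f} Mbps")
```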

This information was excerpted from the information guide called PCoIP Display Protocol: Information and Scenario-Based Network Sizing Guide.
For information about setting up VPNs for using PCoIP, see the following solutions overviews, available on the VMware Web site:

  • VMware View and Juniper Networks SA Servers SSL VPN Solution
  • VMware View and F5 BIG-IP SSL VPN Solution
  • VMware View and Cisco Adaptive Security Appliances (ASA) SSL VPN Solution

Client Connections Using the PCoIP Secure Gateway
When clients connect to a View desktop with the PCoIP display protocol from VMware, View Client can make a second connection to the PCoIP Secure Gateway component on a View Connection Server instance or a security server. This connection provides the required level of security and connectivity when accessing View desktops from the Internet.

As of View 4.6, security servers include a PCoIP Secure Gateway component. The PCoIP Secure Gateway connection offers the following advantages:

  • The only remote desktop traffic that can enter the corporate data center is traffic on behalf of a strongly authenticated user.
  • Users can access only the desktop resources that they are authorized to access.
  • This connection supports PCoIP, which is an advanced remote desktop protocol that makes more efficient use of the network by encapsulating video display packets in UDP instead of TCP.
  • PCoIP is secured by AES-128 encryption.
  • No VPN is required, as long as PCoIP is not blocked by any networking component. For example, someone trying to access their View desktop from inside a hotel room might find that the proxy the hotel uses is not configured to allow inbound traffic on TCP port 4172 and both inbound and outbound traffic on UDP port 4172.
    For more information, see “Firewall Rules for DMZ-Based Security Servers,” on page 60.

Security servers with PCoIP support run on Windows Server 2008 R2 and take full advantage of the 64-bit architecture. This security server can also take advantage of Intel processors that support AES New Instructions (AESNI) for highly optimized PCoIP encryption and decryption performance.

Tunneled Client Connections with Microsoft RDP
When users connect to a View desktop with the Microsoft RDP display protocol, View Client can make a second HTTPS connection to the View Connection Server host. This connection is called the tunnel connection because it provides a tunnel for carrying RDP data.

Clients that use the PCoIP display protocol can use the tunnel connection for USB redirection and multimedia redirection (MMR) acceleration, but for all other data, PCoIP uses the PCoIP Secure Gateway on a security server.

Clients that use [only] the PCoIP or HP RGS display protocols do not use the tunnel connection.

Direct Client Connections
Administrators can configure View Connection Server settings so that View desktop sessions are established directly between the client system and the View desktop virtual machine, bypassing the View Connection Server host. This type of connection is called a direct client connection.

With direct client connections, an HTTPS connection can still be made between the client and the View Connection Server host for users to authenticate and select View desktops, but the second HTTPS connection (the tunnel connection) is not used.

Direct PCoIP connections include the following built-in security features:

  • PCoIP supports Advanced Encryption Standard (AES) encryption, which is turned on by default.
  • The hardware implementation of PCoIP uses both AES and IP Security (IPsec).
  • PCoIP works with third-party VPN clients.

Front-End Firewall Rules
To allow external client devices to connect to a security server within the DMZ, the front-end firewall must allow traffic on certain TCP and UDP ports. Table 5-1 summarizes the front-end firewall rules.

Front-end Firewall Rules, Ports Needed
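
Since Table 5-1 isn’t reproduced in this excerpt, the sketch below only lists the ports the quoted text itself calls out (HTTPS on TCP 443 for login and tunneled connections, plus TCP and UDP 4172 for PCoIP through the Secure Gateway); treat the actual table in the planning guide as the authoritative list:

```python
# Front-end firewall summary, limited to ports mentioned in the excerpts above.
# The complete rule set is in Table 5-1 of the Architecture Planning Guide.
front_end_rules = [
    ("TCP", 443,  "HTTPS login and tunneled connections (RDP, USB, MMR)"),
    ("TCP", 4172, "PCoIP via the PCoIP Secure Gateway (inbound)"),
    ("UDP", 4172, "PCoIP via the PCoIP Secure Gateway (inbound and outbound)"),
]

for proto, port, use in front_end_rules:
    print(f"External client -> security server  {proto} {port:<5} {use}")
```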

Back-End Firewall Rules
To allow a security server to communicate with each View Connection Server instance that resides within the internal network, the back-end firewall must allow inbound traffic on certain TCP ports. Behind the back-end firewall, internal firewalls must be similarly configured to allow View desktops and View Connection Server instances to communicate with each other. Table 5-2 summarizes the back-end firewall rules.

Back-end Firewall Rules, Ports Needed

Understanding VMware View Communications Protocols
VMware View components exchange messages by using several different protocols.
Figure 5-5 illustrates the protocols that each component uses for communication when a security server is not configured. That is, the secure tunnel for RDP and the PCoIP secure gateway are not turned on. This configuration might be used in a typical LAN deployment.

View Components and Protocols when using Security Server

 

View Components and Protocols when not using Security Server

PCoIP Secure Gateway
As of View 4.6, security servers include a PCoIP Secure Gateway component. When the PCoIP Secure Gateway is enabled, after authentication, View clients that use PCoIP can make another secure connection to a security server. This connection allows remote clients to access View desktops from the Internet.

When you enable the PCoIP Secure Gateway component, PCoIP traffic is forwarded by a security server to View desktops. If clients that use PCoIP also use the USB redirection feature or multimedia redirection (MMR) acceleration, you can enable the View Secure Gateway component in order to forward that data.

When you configure direct client connections, PCoIP traffic and other traffic goes directly from a View client to a View desktop.

When end users such as home or mobile workers access desktops from the Internet, security servers provide the required level of security and connectivity so that a VPN connection is not necessary. The PCoIP Secure Gateway component ensures that the only remote desktop traffic that can enter the corporate data center is
traffic on behalf of a strongly authenticated user. End users can access only the desktop resources that they are authorized to access.

– Architecture Planning Guide, Pages 17, 21, 32-33, 41, 45-46, 50-51, 60-64

Okay, that’s a lot of stuff, and it says nothing about performance tuning PCoIP for WAN/3G environments. For that, we need to review the PCoIP Zero Client to VMware View 4 Performance Optimization and LAN/WAN Optimization Guides from Teradici. I’ll follow up with relevant excerpts from those in a couple of days.

While some contend that the PCoIP protocol is hard on VMware ESX servers, a product is on the horizon that allows “up to 2X consolidation” ratios over current server-based PCoIP encoding. This “magic bullet” from Teradici comes in the form of a PCIe x8 “PCoIP off-load” card that has support “already baked into” VMware View 4.5+ – all that’s needed is the card and enablement from View Manager.

PCoIP Server Off-Load Card with View Enablement (inset)

Teradici’s off-load card offers some impressive claims, but its 4.736″ x 6.6″ (full-height) form factor will make it unfriendly in the dense, multi-node boxes popular with small enterprise. This card looks ideal for the “in-a-box” deployments making the rounds (hypervisor with VSA) if it is able to play nice with others. Teradici quotes the following specs:

  • Based on TERA2800 processor
  • ESX/ESXi 4+, VMware View 4.5+
  • 2GB on-board DDR3 SDRAM w/ECC
  • Up to 32 displays at 2560×1600 resolution per card
  • Up to 64 displays at 1920×1200 resolution per card
  • Additional cards per server supported for more displays
  • Matched with PCoIP Zero Clients for highest possible performance

SOLORI’s Take: Depending on the unit cost, this could be a game changer, enabling more “thread anaemic” (but incredibly cheap and power-smart) processors like AMD’s 4100 series to drive significant workstation loads. Unfortunately, there are no systems (that I know of) capable of exploiting the compute/power density of that combination. Given the systems targeted for the 4100 series, we’ll likely see no dense-node candidates, and here’s why (math to follow).

Let’s look at the math and see what combinations make sense, eh? Given the 64-display figure (taking 1920×1200 as the baseline) and an average of 1.2 displays per VM, that works out to roughly 54 VMs per card (64 ÷ 1.2 ≈ 53.3, rounding up). Given VMware View’s 16 VM/core target, the math points to a single quad-core processor.
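
For anyone who wants to fiddle with the assumptions, here’s the same napkin math as a tiny script. The 64-display, 1.2-displays-per-VM and 16 VM/core numbers are the assumptions stated above, not measurements:

```python
import math

# The napkin math above as a script; all inputs are stated assumptions.
displays_per_card = 64     # TERA2800 claim at 1920x1200
displays_per_vm = 1.2      # ~20% of VMs running dual displays
vms_per_core = 16          # VMware View's rule-of-thumb target

vms_per_card = math.ceil(displays_per_card / displays_per_vm)   # ~54
cores_to_fill_card = vms_per_card / vms_per_core                 # ~3.4

print(f"VMs per off-load card: ~{vms_per_card}")
print(f"Cores needed to saturate one card: ~{cores_to_fill_card:.1f} (a quad-core)")
```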

In reality, you would likely allow 1-2 cores for network and storage handling, and since the VM/core target is an average, more cores per chassis will help that target be realized (i.e. the same ratio, but with 8-12 cores). That means a modern 2P system with 8-12 cores and two Teradici off-load cards would be the perfect combination for PCoIP/View nirvana. That calculus also means a 24-core system would require four off-load slots (and 32 PCI Express lanes) to pull off a match (384 VMs).

Chart of PCoIP off-load cards to CPU cores, 16 VM/core and 1 vCPU per View VM - 20% VMs are dual-display (one core reserved for system use)

Chart of PCoIP off-load cards to CPU cores, 1 vCPU in 75% of VMs, 2 vCPU in 25% - 20% VMs are dual-display (one core reserved for system use)

Editor’s note: These tables are NOT recommended usage profiles, as Andre rightly points out in the comments below. The assumptions that enable 16:1 vCPU-to-core ratios break down long before the 1:1 ratio (of VM displays to off-load queues) can be achieved. There is no “generic” ratio that can be used to “predict” consolidation effectiveness without performing due diligence on target workload characteristics, and the design use of the TERA2800 is not targeted at 1:1 scenarios. The charted numbers are meaningless outside the (unlikely) context where PCoIP overhead is THE limiting performance factor for a set of View pools.

In a View environment, storage bottlenecks can kill a project. Therefore – absent a 10G iSCSI or FCoE environment – you’re unlikely to see View architectures that can play the density game without local storage “accelerators” to off-load the demand. This is especially true for smaller deployments (i.e. ROBO). Getting the right hardware combination will be a challenge – even with only two Teradici cards per chassis. It would be fun trying, though, and certainly worth a deeper look in the future…

[Update: updated chart to disclose the one-core reservation for system use in the estimate. Added link to the PCoIP Secure Gateway announcement. Addressed Andre’s comments and clarified the context of the charted examples.]

2 comments

  1. Your calculations are not correct. Those cards were never intended for a 1:1 ratio (display:VM). The idea is that all VMs will share the TERA2800 processor when required. It’s easier to understand if you imagine a pool of processors that is assigned and unassigned to VMs as needed.

    If you read about the Teradici PCoIP Offload Card that you mentioned in your blog post, you will see the following:

    “By constantly monitoring the CPU and graphic encoding demands of each virtual machine, the Server Offload Card dynamically and seamlessly offloads the most demanding PCoIP protocol image encoding tasks from the CPU.

    The Server Offload Card automatically determines – in real time – which displays will benefit the most from hardware acceleration. The transitions to and from the CPU and Server Offload Card happen instantly and transparently, protecting the users’ experience even as loads change.

    This optimal use of resources ensures that the graphical demands of each virtual machine are met and that performance scales as new virtual machines are added.”



    • Andre:

      Great point and an important distinction with respect to display-to-VM ratios! It is important to look at the use case, but in principle, treating the card’s capabilities as a pooled resource of display off-load queues is the right way to approach deployment planning. However, while pooling makes the upper limit far less constrained, it doesn’t change the math for the lower-bound case.

      Obviously, the priority mechanism for the off-load-enabled pool determines which VM from a competing pool may grab the available off-load processor queues. As you (and Teradici) point out, the driver determines which portion of the display needs to be off-loaded – on the fly. When pool demand outstrips the card’s potential, we’re back to the familiar fail-back condition of just consuming system CPU resources… Note that this same fail-back mechanism allows for VMs to reside on pool members that do not contain off-load cards.

      And this gets us back to my point: the “1:1” scenario where a pool (or set of pools) begins to degrade the end-user experience (i.e. noticeable drag on system CPU) solely due to PCoIP overhead. For my purposes I’m thinking of an educational scenario where management and maintenance costs may drive a View adoption strategy. In “computer lab” or “1 PC per child” initiatives, it’s common to have all students in a pool running the same set of applications, and those applications tend to have multimedia components (say “fun and engaging” please.)

      Take a look at the chart: even at 1:1 display off-load, a single 2P/8C system is shown driving 112 “full experience” VMs. While the TERA2800 is handling PCoIP protocols, it is NOT handling the STANDARD display protocols (let alone local multimedia off-load). This means that a high-demand, “full experience” pool is more likely to be CPU-bound than display-bound – even with the TERA2800.

      I especially like that Teradici’s claims are bounded by an existing deployment’s CPU utilization being “reduced by 30-50%” when using the TERA2800 versus no off-load. This quote establishes two things: 1) that PCoIP can potentially be a significant source of “drag” on ESX CPU resources, and 2) that this drag represents 30-50% of lost system capacity.

      Now back to your point: the TERA2800 solution is NOT meant to be deployed in a 1:1 display ratio. This aligns well to the idea that vCPUs are not allocated in a 1:1 core ratio. In a “typical and hypothetical” use case scenario where mixed workloads are distributed across a pool of (no more than 8) ESX hosts, many View VMs will be effectively “idle” in both CPU demand and PCoIP processing demand.

      In my “real world” 75:25 example, the 2P system might be expected to support upwards of 90 VMs (core bound), requiring a pair of TERA2800 cards if we assume 1:1 off-load. I maintain that in a use case where 1:1 off-load might be appropriate, CPU resources would be consumed at a much greater rate – driving effective consolidation ratios down much more quickly than PCoIP overhead does.

      In more practical applications – say a conservative 2:1 ratio of VM-to-display queue – a modern 2P system would likely never drive a single TERA2800 card to its full capacity. That is to say, the off-load card is really a way to achieve and protect design consolidation ratios, not expand them.

      As in all things virtualization, use case will be the driving consolidation factor. For me, I would like to see the same facilities that allow the driver to “intelligently” determine the off-load “appropriateness” of the PCoIP demand leveraged to predict the effectiveness of adding a TERA2800 (i.e. quantify the PCoIP overhead in a View pool).




