How to Compare Remoting Protocols

If you want to learn about a good way of comparing remoting protocols such as Microsoft RDP/RemoteFX, Citrix ICA/HDX, VMware/Teradici PCoIP and Quest EOP, this article is for you. As you may already know, Shawn Bass and I have invested substantial time into testing and comparing the most popular Windows remoting protocols in our vendor-independent labs over the last months and years. Presenting our results at major industry events such as Microsoft Tech-Ed, Citrix Synergy, VMware VMworld and BriForum created some buzz in the market. Occasionally, vendors like Citrix and Teradici made some controversial public comments about our findings and conclusions.

Now that Shawn and I are getting ready to start phase 3 of our comparison tests, we thought it’s time to shed some light on the details of our test setup. This will allow you to reproduce our tests in your own environment, and it will help us to further optimize our methodology, minimize errors and avoid inaccurate conclusions. We are fully aware that no test can totally reproduce the real-world experience and that things may look different in test environments that are not 100% identical to ours. Still, we think that the vendor-independent test results we produce have substantial value for customers and users.

The story behind our remoting protocol comparison project started in 2008, when I thought it was time to educate BriForum attendees about the different graphics and multimedia formats used in Microsoft Windows and their behavior when used with remote desktop technologies. At that time the session was rather academic, and attendees suggested that I beef it up with some real visual test results. At the same time, BriForum presenter and virtualization expert Shawn Bass also started looking at remoting protocol details, using this information for his customer projects. So we decided to join forces, as setting up good, down-to-earth tests and deriving adequate results was too much work for one person. In addition, it was clear that every now and then vendors would not be happy about our findings, and it is better to have two experts for planning, testing, analyzing and reviewing, always following the four-eye principle. Ever since, Shawn and I have been working together on this, using the results for customer projects, for public presentations and for the expert training classes we teach throughout the world.

Our test methodology strictly follows the rules I learned while working in research at CERN in Geneva and at the Fraunhofer Institute for Computer Graphics in Darmstadt between 1988 and 1999. The rules are fairly simple: know the goal and scope of your experiment, document all details of your test setup, never change your methodology within one test phase, and make sure that identical results can be reproduced in another, completely independent lab. These rules are generally accepted in research groups, ensuring that results are sufficiently accurate and reproducible.

Okay, so let’s start with the goal and scope. It is all about installing and comparing various popular virtual desktop products on reference hardware. Instead of judging the results ourselves, our goal was to record multiple predefined test sequences on video, allowing viewers of our presentations and training classes to see and compare the results for themselves. If possible, all tests were done with out-of-the-box settings, with no tuning tips applied. I suggested calling our methodology the “ShaBy Method”, but Shawn thought that this is not a good name ;-)

On the hardware side Shawn is using a Whitebox Reference Server with Intel i7-920 2.66 GHz CPU (4 cores, SLAT-enabled), 8GB RAM, NVIDIA Quadro FX 3800 graphics adapter with 1GB video RAM and several identical 500 GB SATA2 disks (7200 rpm). Shawn’s client device is a Viewsonic VOT530 with Intel Core2Duo 2.4GHz CPU, 2GB RAM and a 320GB SATA2 disk.

My server is a Shuttle Barebone with an Intel i7-930 2.8 GHz CPU (4 cores, SLAT-enabled), 8GB RAM, an NVIDIA Quadro FX 3800 graphics adapter with 1GB video RAM, several identical 500GB SATA2 disks (7200 rpm) and an Intel Pro/1000 PT NIC. On the client side I used a Dell Latitude D830 laptop (Mobile Core2 Duo T7300 CPU, 4GB RAM, Mobile Intel 965 Express graphics adapter) in phase 1, and switched to an HP Pavilion m8180 (Intel Core2 Quad Q6600 2.4 GHz CPU, 8GB RAM, ATI FirePro V5800 graphics adapter with 1GB video RAM, 500GB SATA2 disk at 7200 rpm) for phase 2.

Even though the hardware setups look different, both Shawn and I used the identical graphics hardware and driver combination on the server side. This allowed us to reproduce test results in a second lab and see whether they were in the expected range or showed the same user experience. But all side-by-side visual comparisons between individual tests of any given scenario were only done with sequences recorded in the same environment.

On the network side we used dedicated 100Mbit/s or 1Gbit/s networks for LAN tests. We also introduced a way to limit network bandwidth and add latency to emulate wide-area network conditions. In phase 1, we decided to use Shunra VE Desktop Pro to limit the bandwidth to 2Mbit/s (6Mbit/s for HD video) and set latency to 50ms or 200ms round trip time. Wireshark allowed us to do some network monitoring during the tests. In phase 2 we started using the Apposite Linktropy Mini2 appliance for WAN emulation and monitoring. For more details on this device, see http://www.apposite-tech.com/products/mini2.html.
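If you want to reproduce the WAN scenarios without such an appliance or a Shunra license, a Linux box bridged between client and server can approximate them with tc/netem. The following is only a rough sketch of that alternative, not the setup we used; the interface names and values are placeholders you would adapt to your own lab.

```python
# Rough netem-based alternative to a commercial WAN emulator: run this on a
# Linux machine bridged between client and server. Interface names, rate,
# delay and loss values are placeholders for one of the scenarios mentioned
# above (2 Mbit/s, ~200 ms RTT, 0.01% packet loss).
import subprocess

# Assumption: these are the two bridge ports; applying netem on both adds
# the delay in each direction (100 ms per direction ~= 200 ms round trip).
IFACES = ["eth0", "eth1"]

def emulate_wan(rate="2mbit", delay="100ms", loss="0.01%"):
    """Apply bandwidth limit, per-direction delay and packet loss with tc/netem."""
    for iface in IFACES:
        subprocess.run(
            ["tc", "qdisc", "replace", "dev", iface, "root", "netem",
             "rate", rate, "delay", delay, "loss", loss],
            check=True,
        )

def reset():
    """Remove the emulation and return to clean LAN conditions."""
    for iface in IFACES:
        subprocess.run(["tc", "qdisc", "del", "dev", iface, "root"], check=True)

if __name__ == "__main__":
    emulate_wan()
```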

For the tests, we installed the virtual desktop products with their respective remoting protocols one after the other on the server and on the client, using brand new physical disks if necessary. For this purpose we bought stacks of identical 500GB SATA2 disks, allowing us to install and preserve various versions of Microsoft Remote Desktop Session Host (RDP, RemoteFX), Microsoft Remote Desktop Virtualization Host on Hyper-V (RDP, RemoteFX), Citrix XenDesktop on XenServer (ICA/HDX), VMware View on vSphere (PCoIP), Quest vWorkspace on Hyper-V (EOP), Ericom PowerTerm WebConnect on Hyper-V (BLAZE) and HP Remote Graphics on physical hardware (RGS).

On the server, we installed publicly available applications or media files for testing the different graphics and multimedia formats within a remote user session: Graphics Device Interface (GDI), WMV, QuickTime, Direct3D, OpenGL, Windows Presentation Foundation, Silverlight and Flash. For each application or media format we created an AutoIt script simulating predefined user interactivity, like scrolling through a document or starting media playback.
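To give you an idea of what such a script does, here is a minimal sketch in Python using pyautogui instead of AutoIt (our actual scripts are AutoIt and are not reproduced here); the document path and timings are made up for the example.

```python
# Illustration only: the real test scripts are AutoIt. This pyautogui sketch
# mimics the same idea for a "GDI with WordPad" style scenario: open a
# document, then scroll at a fixed pace for the length of the recording.
import subprocess
import time
import pyautogui

DOCUMENT = r"C:\tests\sample.rtf"   # assumption: a prepared test document
DURATION = 45                       # seconds, matching the phase 2 clip length

subprocess.Popen(["write.exe", DOCUMENT])   # launch WordPad with the document
time.sleep(5)                               # give the window time to appear

start = time.time()
while time.time() - start < DURATION:
    pyautogui.press("pagedown")             # scroll one page at a steady rate
    time.sleep(0.5)
```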

After logging on to a user session from the client device, an individual test sequence was started by launching the AutoIt script assigned to a graphics or media format. We did this for all remote desktop products with their remoting protocols and for all media formats – GDI with WordPad, GDI with PDF, WMV video, HD WMV video, QuickTime video, Flash demo, HD Flash demo, Silverlight, DirectX 9 Rollercoaster, DirectX 9 Google Earth, DirectX 10, OpenGL Software Rendering, OpenGL Hardware Rendering and Windows Presentation Foundation. Running each test scenario with multiple network bandwidth and latency settings resulted in a repository of more than 1,500 raw videos during phase 2!
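To illustrate how the numbers multiply, here is a small sketch that enumerates the test matrix. The protocol and scenario lists are the ones given above; the set of network conditions is simplified (in reality the 6Mbit/s setting only applied to HD video, and many sequences were recorded more than once), so the computed total is illustrative rather than exact.

```python
# Back-of-the-envelope view of how the raw video count adds up: every
# product/protocol is crossed with every media scenario and every network
# condition. The network list is a simplification for the example.
from itertools import product

protocols = [
    "RDP (Session Host)", "RemoteFX (Session Host)", "RDP (Hyper-V)",
    "RemoteFX (Hyper-V)", "ICA/HDX", "PCoIP", "EOP", "BLAZE", "RGS",
]
scenarios = [
    "GDI WordPad", "GDI PDF", "WMV", "HD WMV", "QuickTime", "Flash",
    "HD Flash", "Silverlight", "DX9 Rollercoaster", "DX9 Google Earth",
    "DX10", "OpenGL software", "OpenGL hardware", "WPF",
]
networks = ["LAN", "2Mbit/50ms", "2Mbit/200ms", "6Mbit/50ms", "6Mbit/200ms"]

tests = list(product(protocols, scenarios, networks))
print(len(tests), "combinations")   # 9 * 14 * 5 = 630 per recording pass
```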

But how did we record these videos? We decided not to install recording software on the client, as this might have influenced the test scenario performance and produced inaccurate results. Instead, we fed the monitor output signal of the test client device into an Epiphan DVI2USB Solo frame grabber connected to a dedicated PC used for video recording. The Epiphan box converts the DVI output signal into a standard USB video data stream. You can find more details about the Epiphan device at http://www.epiphan.com/products/dvi-frame-grabbers/dvi2usb-solo/.

We recorded a full screen raw video with the Epiphan software and the Microsoft MPEG4 v2 codec for each test sequence, like “GDI Wordpad on Citrix XenDesktop 5.5 and XenServer 6.0 at 2Mbit/s bandwidth with 200ms latency and 0.01% packet loss”. In phase 1 each raw video was 30 seconds at a resolution of 1280×720 and 15-20 frames per second. In phase 2 we recorded 45 second videos at a resolution of 1024×768 and 15-20 frames per second.
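As an aside, if your frame grabber enumerates as a regular video capture device, you could also record the stream yourself, for example with OpenCV. This is only a sketch of that alternative (we used Epiphan’s own recording software); the device index, resolution and codec support depend on your system.

```python
# Sketch of an alternative capture path, assuming the DVI2USB grabber shows
# up as an ordinary video capture device. Resolution, frame rate and clip
# length mirror the phase 2 settings described above.
import cv2

DEVICE_INDEX = 0                    # assumption: the grabber is capture device 0
WIDTH, HEIGHT, FPS = 1024, 768, 15
DURATION_SEC = 45

cap = cv2.VideoCapture(DEVICE_INDEX)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT)

# 'MP42' = Microsoft MPEG-4 v2, the codec used for the raw recordings;
# whether it is available for encoding depends on your OpenCV/FFmpeg build.
fourcc = cv2.VideoWriter_fourcc(*"MP42")
writer = cv2.VideoWriter("raw_test.avi", fourcc, FPS, (WIDTH, HEIGHT))

for _ in range(FPS * DURATION_SEC):
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(frame)

cap.release()
writer.release()
```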

All raw videos were re-encoded at quarter size (640×480 or 512×384) to ensure that resolution, data rate and encoding parameters were identical, as needed for the visual side-by-side comparison at a later stage. The video post-processing was done with Microsoft Expression Encoder 3, using the VC-1 Advanced Windows Media format at a fixed bitrate of 1045 Kbps and a key frame interval of 5 seconds. Finally, predefined sets of four re-encoded videos were imported into a Microsoft Expression Blend 3 template project to build a Silverlight application for each four-up side-by-side comparison. The good thing about embedding the videos into Silverlight applications was that they run on both Microsoft Windows and Apple Mac platforms.
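If you want to batch-process recordings without Expression Encoder, a rough equivalent can be scripted around ffmpeg. Note that ffmpeg has no VC-1 encoder, so the sketch below substitutes WMV2 while keeping the quarter-size scaling, fixed bitrate and 5-second keyframe interval from our settings; the paths are placeholders.

```python
# Hedged alternative to Expression Encoder 3: drive ffmpeg from Python to
# re-encode all raw clips at quarter size with a fixed bitrate and a
# keyframe every 5 seconds (75 frames at 15 fps). WMV2 stands in for VC-1,
# which ffmpeg cannot encode.
import subprocess
from pathlib import Path

RAW_DIR, OUT_DIR = Path("raw_videos"), Path("encoded_videos")
OUT_DIR.mkdir(exist_ok=True)

for raw in sorted(RAW_DIR.glob("*.avi")):
    out = OUT_DIR / (raw.stem + ".wmv")
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(raw),
         "-vf", "scale=512:384",            # quarter size of 1024x768
         "-c:v", "wmv2", "-b:v", "1045k",   # fixed bitrate as in the article
         "-g", "75",                        # keyframe every 5 s at 15 fps
         str(out)],
        check=True,
    )
```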

Bottom line: Shawn and I aren’t perfect. We know for certain there are at least two or three inaccuracies in the testing data. This protocol comparison project of ours is a work in progress. We welcome any and all assistance from vendors telling us where we are wrong – as long as the remedy doesn’t amount to “tweak this” or “tune that”. And again, we want to make clear that we are opposed to tuning any one protocol to give it an advantage in a given test case scenario. We only publish what we see during our tests – any final interpretation and conclusion is left to the viewers.

What’s up next? Since all popular remoting protocols have improved significantly over the last two years, Shawn and I came to the conclusion that it’s time to retire our phase 2 methodology and move on to new things. All protocols are performing great under LAN conditions, and performance is also very good for GDI under WAN conditions, so there is no need for further tests here. We think it’s better to plan for a phase 3 which may shift the focus to topics like audio/video synchronization (lip sync), voice and video conferencing, user interaction response times, graphics quality analysis, tuning and tweaking tips (myth busting) or the impact of WAN acceleration. So stay tuned, new remoting protocol comparison results will be coming up soon…
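As one example of what the graphics quality analysis in phase 3 could look like, a structural similarity (SSIM) score between a locally rendered reference frame and the same frame captured over a remoting protocol is a candidate metric. This is just an idea sketch with hypothetical file names, not a method we have settled on.

```python
# Possible metric for the "graphics quality analysis" idea mentioned above:
# compare a frame captured over the remoting protocol against the same frame
# rendered locally, using SSIM. File names are hypothetical placeholders.
import cv2
from skimage.metrics import structural_similarity

reference = cv2.imread("frame_local.png", cv2.IMREAD_GRAYSCALE)    # locally rendered frame
captured = cv2.imread("frame_remote.png", cv2.IMREAD_GRAYSCALE)    # frame grabbed from the client

score, _ = structural_similarity(reference, captured, full=True)
print(f"SSIM: {score:.3f}  (1.0 = pixel-perfect reproduction)")
```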

2 comments

  1. Great idea.

    I’ve done a few tests like this in the past; however, tests don’t show the reality of using a protocol as a user…
    My recommendations are as follows:

    1. Best bet is to get yourself an SSD drive/better storage for your VMs, use them daily, and do tests in parallel.
    2. From a client perspective it would be a good idea to combine this with thin client testing, especially for PCoIP. (Don’t get just anything; it’s worthwhile talking to the Teradici guys.)
    3. For bandwidth throttling the best bet is to use some appliance in between, like WANem. Why:
    a) It does not impact any client or server device, which could skew the results.
    b) You can record bandwidth utilization. Alternatively, use Wireshark and capture packets/bandwidth there if you have a more complex network.

    Comment by Andriy Kidanov on November 17, 2011 at 10:58 pm

  2. Andriy,

    Shawn and I are using one or the other remoting technology on a daily basis. My work laptop hosts multiple Hyper-V VMs installed on an SSD, as you suggest. Remoting into these VMs gives me a good idea of what you can do and what the user experience is. But this does not provide any data we can use to compare with others; it’s “only” our personal experience with one or two remoting protocols in such an environment. What we wanted is an unbiased comparison.

    Doing thin client testing is a good idea. We have such test scenarios on our to-do list.

    We are using the Apposite Linktropy Mini2 appliance for WAN emulation, as you suggest.

    Comment by Benny on November 18, 2011 at 8:21 am
