A bit of troubleshooting

I posted a few months ago regarding my move to using a virtual machine as my “mail” machine. I’m still a huge fan of that approach, but in my new job (the really cool one working on the Xbox team!), I’m running Win7 rather than Server as my main machine. I looked into using an internally hosted VM server solution, but since the solutions I found are used primarily for testing, performance wasn’t quite what I wanted (or what I was used to).

Let me outline the problem. I don’t look at email or Twitter or my RSS feed all of the time – in fact, most of my electronic communication time occurs when I’m compiling / building (I could use that time to have a sword-fight, but I find it better to at least pretend I’m productive). I love to group the “non-essential” stuff together, and when I’m waiting on a build, it’s the perfect time to write email, documents, tweet (or sometimes blog).

The problem is that when I’m compiling, my 12 cores are pretty busy. This makes reading email, or doing pretty much anything else that needs more than a few time slices of CPU power, sort of suck.

I have two computers (a desktop and a laptop), so I have an option of using my laptop for those tasks while the compiler elves are taking over my computer, but given the choice of using a 24” monitor and a full size ergonomic keyboard over a 13” monitor and compressed keyboard, which would you choose? To be fair, I adore my laptop (Lenovo X200s) – but I adore it because it’s small and I can use it anywhere (including on an airline tray table while the monster man in front of me leans his seat all the way back for 8 hours straight). But it’s not a computer I like to write large documents or presentations on.

The solution, as many have already guessed, is to connect to my laptop via terminal server and use it from my desktop machine. The overhead of TS is small enough that I barely notice a perf problem even at peak memory and CPU usage. The TS session even stretches my desktop to 1920×1080, so it’s a pretty sweet setup. All that is good (and somewhat obvious), but few things in life ever go without a hitch.

Despite how much I wanted to love the setup, I noticed that my connection would drop frequently – and then it would take a minute until I could reconnect. I powered through it for a day or two (I tell myself that I was subconsciously gathering clues). Then, finally, I found the clue that led me in the right direction. I listen to music on Zune quite a bit while I work. I configured terminal server to play audio on the server machine (my laptop), so I select songs from the TS session on my desktop and listen through headphones connected to my laptop. So…I noticed that whenever my computer disconnected, I lost audio a few seconds later. The first few times this happened, I (ignorantly) assumed that because I lost the TS connection, I also lost the audio. Then it hit me that since the audio is playing on my laptop, I should never lose audio…unless the network connection was lost.

Losing the TS connection and losing audio were both symptoms of the same root cause (lost network connection) – the only thing left to figure out was why two computers on the same subnet were losing their connection.
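A check like this can also be scripted rather than noticed by ear. What follows is a minimal Python sketch under stated assumptions, not something I actually ran: it probes the laptop’s Remote Desktop port at a fixed interval and collapses the results into outage windows. The host name `my-laptop`, the helper names, and the 5-second interval are hypothetical; 3389 is the default RDP port.

```python
import socket
import time
from typing import Callable, List, Tuple


def can_connect(host: str, port: int = 3389, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failed
        return False


def monitor(probe: Callable[[], bool], count: int, interval: float = 5.0,
            clock: Callable[[], float] = time.time,
            sleep: Callable[[float], None] = time.sleep) -> List[Tuple[float, bool]]:
    """Call probe() count times, interval seconds apart; return (timestamp, ok) samples."""
    samples = []
    for _ in range(count):
        samples.append((clock(), probe()))
        sleep(interval)
    return samples


def find_outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse samples into (first_failure, first_recovery) outage intervals."""
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # an outage begins
        elif ok and start is not None:
            outages.append((start, ts))   # the outage ended at this sample
            start = None
    if start is not None:                 # still down when sampling stopped
        outages.append((start, samples[-1][0]))
    return outages


if __name__ == "__main__":
    # "my-laptop" is a placeholder host name (assumption for illustration).
    results = monitor(lambda: can_connect("my-laptop"), count=120)
    for began, ended in find_outages(results):
        print(f"unreachable from {time.ctime(began)} to {time.ctime(ended)}")
```

If the outage windows line up with the moments the audio cut out, that points at the network rather than at terminal server itself.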

I took a look at power management settings to ensure that my laptop wasn’t suspending, but all was good. My next step was to look at my adapter settings to see if the network adapter was powering down for some reason. Sure enough, there was a setting named “System Idle Power Saver” that was enabled. I disabled the setting, and I’ve only had one dropped connection all week.

This example may sound more like yak-shaving than testing, but I’ve always liked the troubleshooting aspect of testing. Going beyond “here’s a problem” to “here’s what (probably) caused the problem” is (in my opinion) an important skill for testers.

Oops – my build is done – time to get this posted.

2 Comments

  1. Great write-up on the troubleshooting aspects. I like the procedural troubleshooting process and the recording of each step, like any athlete or researcher noting down the observation at every step of exploring options.

    Reading this, I get a sense of:
    1. the desktop machine compiling a program and not letting any other processes through. I used to run into this situation often on 32-bit Win7 on a Lenovo with 8 GB of RAM, which could still only use 2-3 GB because I hadn’t installed a 64-bit OS. Now that I have rebuilt it as 64-bit, the frequency has come down, but I still occasionally run into it.

    2. the network connectivity loss due to having the music rechanneled or TS’d from the other machine, troubleshooting what’s going on in this context, and thereby fixing the battery/power options on the Lenovo to ensure no network connectivity issues. Glad to know that TS’ing from the busy desktop was still an option; it is something I can do from the laptop. I use my Lenovo laptop the most, and desktop machines are for automation if the laptop doesn’t work out well.

    However, at the root-cause level, the earlier part of the post makes me think about what is going on in the compile process to make a 12-core machine eat up all the CPU and memory (~100%). An opportunity for a compiler review?

    • I think the assumption is that, by default, the build process gets as many system resources as it needs to finish as quickly as possible.

      Now – it’s possible to throttle back the build process, but other processes would still be in contention with the hosted VM (and the overhead of the hypervisor service). The end solution is probably the most efficient use of resources across the two computers – once I figured out how to make it work.
