Voice over IP (VoIP) troubleshooting with PingPlotter

Introduction

Using an IP network (like the internet) to conduct a voice conversation (VoIP) is a growing trend, and it keeps getting easier to do. It can be inexpensive and relatively reliable. It can also be challenging - with poor voice quality, the inability to hear and communicate, delays and other problems.

The underlying technology for VoIP is extremely network dependent. If you're having voice quality problems, the cause is often the network - maybe your internet provider, or maybe some other component between you and the called party. This article covers some basic troubleshooting techniques for locating where the problem is occurring, so you can make good decisions about how to solve it.

Network-related VoIP Symptoms

Many symptoms of VoIP problems are network related (although certainly not all of them). Here are some examples of symptoms that are often network related:

Other symptoms might not be network related. In particular, if the symptom *always* happens, any time of the day, any day of the week, then there's a decent chance it's not a network problem.

Using PingPlotter to identify the source of network problems

PingPlotter has some unique capabilities that help you track down the source of network problems. What you really want to know is:

PingPlotter can offer a lot of insight into all of these questions.

Collecting data with PingPlotter

Before we can do much analysis, we need some data to analyze. We cover some of these topics in our Getting Started Guide, so we will not cover the *details* of how to set things up here.

First, we need a target server to monitor. Ideally, this would be the actual VoIP server of your VoIP provider, or something in the same area of the network. If you called your VoIP provider and they asked you to collect PingPlotter data, they may have given you a server to use. In many cases any server will work, but this will only identify problems with your ISP - not with your VoIP provider. The good news, though, is that the vast majority of VoIP problems are caused by front-line service providers (like your ISP).

If you don't know what address to use and you have no way of finding out what address your VoIP hardware or software is using, try using the web site of your provider. For this discussion, we will be using our web server - www.nessoft.com - as the server we're monitoring and using for troubleshooting.

Use the following settings in the main PingPlotter screen:

Address to trace: (the server just discussed - www.nessoft.com in our case)

Now, hit the "Trace" button. You should see a picture appear that looks something like this:

The upper graph should show a full route, including the "Round Trip". If you don't get a Round Trip, check the troubleshooting section of this document for some ideas.

Now, let this run for at least 30 minutes - preferably during a period when you're making a voice call. Ideally, you'll capture one voice call that's good and one that's bad, but that might not be possible. If nothing else, just let it run long enough to get a good sample of your network conditions.

A great thing to do while you're collecting data is to make notes in the PingPlotter data about what you're experiencing. You can see instructions on how to do this in our Getting Started Guide's chapter on collecting data, in the "Creating a comment or note" section.

The data we collected covers several days. PingPlotter works great when left running over a long period of time, so you get a good idea of what network conditions look like - during good times and bad.

Examining data with PingPlotter

Once you've collected some data, it's time to have a look at what might be the problem. We cover some of the PingPlotter commands for zooming, focusing and digging in the "Finding the source of the problem" section of our Getting Started Guide.

One of the key things to know here is that we're looking for problems at the last hop only - and then using the other hops to determine where the problem starts. Packet loss or latency that shows up only at an intermediate hop is not a problem!

Let's look at the graph above. Notice how hop 15 has a full 100% packet loss, and hop 11 has 27% packet loss? The final destination looks rock-solid, though - no packet loss, and the latency is really nice and smooth. This is, in general, what you want to see - a solid, flat line at the final destination and no packet loss (packet loss shows up as red lines in the time graph).
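
The rule above (judge the route by its final hop, then look back to see where the trouble starts) can be sketched in a few lines of Python. This is only an illustration and not something PingPlotter exposes: the function name `first_problem_hop`, the 2% loss threshold, the "at least half the final hop's loss" test, the 16-hop route length and the second data set are all assumptions made for the example.

```python
from typing import Optional, Sequence


def first_problem_hop(loss_pct: Sequence[float],
                      threshold: float = 2.0) -> Optional[int]:
    """Return the 1-based hop where loss seen at the final hop first appears.

    loss_pct[i] is the packet loss percentage measured at hop i + 1; the
    last entry is the final destination. Returns None when the final
    destination is below the (assumed) threshold, because loss that shows
    up only at an intermediate hop is not treated as a problem.
    """
    if not loss_pct:
        return None
    final_loss = loss_pct[-1]
    if final_loss < threshold:
        return None
    # Walk back from the destination while each hop shows loss comparable
    # to the final hop (assumed here: at least half as much).
    start = len(loss_pct)
    for hop in range(len(loss_pct), 0, -1):
        if loss_pct[hop - 1] >= final_loss / 2:
            start = hop
        else:
            break
    return start


# Intermediate-only loss, as in the graph discussed above: hop 11 at 27%,
# hop 15 at 100%, final destination clean (16 hops is an assumption).
clean_route = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 0, 100, 0]
print(first_problem_hop(clean_route))   # None: nothing wrong end to end

# Hypothetical route where loss starts at hop 6 and carries to the end.
lossy_route = [0, 0, 0, 0, 1, 8, 9, 8, 8, 7, 30, 8, 9, 8, 100, 8]
print(first_problem_hop(lossy_route))   # 6
```

Note that a noisy intermediate hop (27% or 100% loss) never triggers a result on its own; only loss that survives all the way to the final destination does.
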

An analogy - network traffic is a bit like freeway traffic

A network is a bit like a freeway - it works great when everyone is going the speed limit and the freeway is carrying 50% of the traffic it was designed for. As we add more traffic, at some point there is no capacity for more. We start to have problems as people merge onto the freeway. People already on the freeway sometimes slow down and cause traffic jams. Sometimes, when it gets too bad, people give up and decide not to continue the journey. On a freeway, this might be called "congestion".

This "congestion" happens on networks too - and it works pretty much the same way. Packet loss, latency and jitter (the 3 enemies of call quality) are all symptoms of congestion - when there's too much traffic for the network to handle.

PingPlotter sends out packets that go all the way "there" and back again. We measure the time each one takes, and we also measure how often a packet never comes back (because a router along the way gave up on it).

Let's continue the freeway analogy a bit. Let's say that between here and our destination there are 15 offramps with turnaround points off the freeway. We'll send out 15 cars and assign each car to one of these intersections, with instructions to turn around and come back when they reach it. Then we'll measure the time it takes to get from us to each intersection and back again.

The most important time is the one for the car that goes all the way to our target - that 15th car. If it makes it there and back again in the expected time, then we know the freeway traffic is running pretty well - everything made it through just fine. If, however, the 15th car takes longer than expected (or it never returns!), we can look at our records for the other intersections to find a likely place where problems are occurring. Maybe all the cars out through intersection 9 had no problems and returned quickly, but the cars going to intersection 10 and beyond started getting delays.

From this, we can see that there is some kind of traffic problem past intersection 9.

Just for the sake of taking this analogy too far, let's look at one other aspect. Let's say that intersection 5's turnaround spot is in a small town where the police are of a disposition to pull people over for no reason at all. Each of our cars going to intersection 5 has to use that turnaround, and 20% of the time it gets pulled over there. Another 15% of the time, someone else is pulled over and our car has to wait while that car moves off the road. Meanwhile, traffic is whizzing by on the freeway, unrestricted. This situation can happen on a network with PingPlotter as well - the packets going to hop 5 might get waylaid by some local rules and show packet loss, latency and jitter that are not being experienced by packets destined for other places.

So, what are we looking for when it comes to problems? The first place to look is the final destination. If the time graph looks like the graph above (straight line, no red), then PingPlotter is not finding network problems. If you do find a problem at the final destination, then look back until you find the first hop showing similar symptoms - that's who we probably need to contact to get the problem corrected.

Examples and Analysis

Note: Most of this data is fictional - based on truth, but not real. You should certainly not make any decisions about the respective companies indicated in this data - as that would be very, very wrong. Some of this data is 8 years old!

Example: Distributed packet loss

Let's look at an example, this time one with problems:

Here, we see 8% packet loss at hop 16 (the final destination). This would result in poor voice quality, dropped "bits" from words and hard-to-understand conversation. Notice that the latency is still pretty good - it's just the packet loss that's a problem (packet loss is all of the red in the time graphs and the red bars in the trace graph). With a pattern like this, voice quality would be consistently "iffy" - not unusable all the time, but not very good either.

Notice how the packet loss is happening at all hops from hop 6 onward, while hop 5 looks relatively good. The packet loss percentage is similar all the way down - around 8% (statistically, it would be just about impossible for all hops to have identical packet loss percentages with this kind of loss). To turn time graphs like this on and off, just double-click on the hop number in the trace graph.

Hop 11 has high latency and higher packet loss - but see how hop 12 goes back to results similar to hop 10? So hop 11 probably isn't introducing any new badness into the traffic, and we should ignore hop 11 here. The same thing goes for hop 15 - it's not sending any data back, but it's passing data through just fine to hop 16.

So, in this situation, the problem looks to be between hop 5 and hop 6. It's pretty likely that Qwest knows about this problem - it's in the "middle" of their network - and it's all owned by Qwest (we can see that from the DNS Name column). In this case, since we're subscribed to Qwest, it's a pretty easy decision to call Qwest and complain. The picture above is pretty compelling and would be a good communication tool to use with them.

Example: Local bandwidth saturation

Note: We have another example of this in our tutorial / manual.

Here, notice the big latency jumps - you have a nice flat line, then a jump in latency - including some packet loss. This pattern is almost always a bandwidth saturation issue (which is the same as congestion). In the case we have here, hop 1 is inside our network (our DSL modem, actually) and hop 2 is inside Qwest. This is a case where we were transferring too much during this period - and we were using all of our available bandwidth.

A VoIP call would suffer significantly during these periods - there is a lot of jitter (the "ragged" line is an easy way to see jitter - where packets take different amounts of time to arrive), higher latency and some packet loss. The voice quality would be bad, there would be additional lag, and it would probably have audio drops.

There are a few options for solving this one, but none of them involve complaining to anyone else:

Example: Border congestion

Congestion often happens at network borders - where one network hands off to another. This is relatively common for small, growing ISPs - where they just do not subscribe to enough bandwidth to handle everything. Let's have a look at what this condition might look like. We're going to use a different network for this picture.

This one isn't *quite* as simple, as there are a few factors. The symptoms of conditions like this would be:

If we look at the network conditions with PingPlotter, we see a couple of problems.

First off, there's some serious packet loss starting at hop 9. This packet loss is carried down through the rest of the route to the final destination. This is a border - between rr.com and alter.net. Having problems at borders like this is pretty common - that's where one company pays another to handle traffic - and if a company is growing, it might be "oversubscribed" - using more bandwidth than is available.

An interesting part of this is how, during heavy load times for home users (i.e., evening hours), the packet loss and latency are worse. During early morning hours, it goes back to being OK again. This is a big sign that the problem is load related - and that this link is having congestion problems at "rush hour". Time to add some lanes!

Another problem, though, is that inside the rr.com network there is significant latency and jitter. There are some slight symptoms at hop 2 (which is the border between our internal network and rr.com - so that's the cable modem), but starting at hop 6 there are some real bumps in latency, and the jitter (latency variation) is also a big cause for concern.

In this case, both problems are inside the rr.com network, and since we are an rr.com customer, we would call them for help on this.

Example: 802.11b network near its range limit

Here's an example where we're connected using a computer-based VoIP service (like Skype). Our computer is hooked up to our DSL modem via a wireless 802.11b network. Hop 1 is our DSL modem.

Here, we see a little bit of packet loss being added to every hop - our wireless network is losing a few packets (about 1 to 2 percent, it looks like), and this impacts everything this computer does - including our VoIP connection. The call quality would be generally good here (probably better than acceptable - up to the "good" range, really). The latency is fine and there is very little jitter, but there is a little packet loss.

There is a problem, though - at 8:53pm, our call was interrupted - it looks like hop 1 lost a bunch of packets all together, and during that period we were unable to hear anything. Let's zoom in on that a bit.

See the period at 8:53 / 8:54? We start getting a lot more packet loss, and then all the hops show a big block of lost packets - a period where it's likely no packets were getting through. Here, the solution might be to move the wireless access point, or to switch to a wired connection on that computer.

Reporting problems, when you find them

If you're using PingPlotter, it's almost certainly because you're experiencing some kind of problem - and when you find something that you think might be the cause of that problem, you need to communicate that to the right party. We cover this topic in some depth in our Getting Started Guide.

The piece we want to stress here is that the data in PingPlotter doesn't really mean anything unless you correlate it with a network problem (like poor VoIP quality). It's of paramount importance that your complaints include a description of how this problem is affecting you. Don't just send a graph from PingPlotter and expect them to figure out what was wrong.

One great way of doing this is to put comments in the PingPlotter graph itself using the "Create Comment" feature of the time graphs. Make comments every time your VoIP quality is bad. Make comments when you give up on a conversation because they can't hear you at all (but you can hear them just fine - how frustrating!).

Troubleshooting PingPlotter

If you get "Destination Unreachable" at something beyond hop 3 or so, but can access that site via a web browser

Some sites do not respond to ICMP echo requests. See our knowledge base article for instructions on how to configure PingPlotter to use TCP packets instead of ICMP.

If you get "Destination Unreachable" at hop 1 for all targets

Make sure your software firewall (e.g., ZoneAlarm) is configured to allow PingPlotter to have access to the network.

If you only have the final hop visible - and all intermediate hops are empty

We cover this in this knowledgebase article.
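
For the first case above (a site that answers web requests but not ICMP echo requests), a quick check outside PingPlotter can confirm that the target accepts TCP connections before you reconfigure anything. The sketch below is an illustration only and is unrelated to how PingPlotter itself sends TCP packets; the function name `tcp_connect_ms`, the default port 80 and the 2 second timeout are assumptions.

```python
import socket
import time
from typing import Optional


def tcp_connect_ms(host: str, port: int = 80,
                   timeout: float = 2.0) -> Optional[float]:
    """Time a TCP connection to host:port.

    Returns the connect time in milliseconds, or None if the connection
    could not be opened (refused, timed out, or the name did not resolve).
    """
    start = time.perf_counter()
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None
```

Calling `tcp_connect_ms("www.nessoft.com")` from the machine that runs PingPlotter returns a number when the web server is reachable over TCP; if it does while ICMP traces report "Destination Unreachable", that suggests the target is simply not answering ICMP.
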

Other questions

Jitter

Jitter is the amount of variation in latency. If one packet takes 100ms and the next one takes 200ms, there's 100ms of jitter there. PingPlotter Pro offers jitter calculations and graphs, but PingPlotter Standard (and the 2.x line) still give you an easy way to see the jitter: look at the smoothness of the time graph, zoomed in a little. Here's an example:

This is zoomed in enough for us to see the individual samples - and we can see that none of them come in with the same latency. Adjacent samples here often have latency variations of 100ms - and just about every one has a latency variation of 30ms or higher. Just looking at this graph, we can see a lot of jitter. Compare that to the first picture in this article - where the line was completely flat. We're looking for flat lines, not big variations with red stuck in everywhere.

Other resources

This article introduces some concepts and ideas about VoIP troubleshooting. There are other resources online that provide more depth (albeit not within the context of PingPlotter). www.voiptroubleshooter.com is a great site that has an enormous amount of content on which symptoms relate to what kinds of network problems. This site has a relationship with Telchemy, a leading provider of VoIP call quality monitoring tools.
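
Returning to the jitter definition given earlier (one packet at 100ms, the next at 200ms, so 100ms of jitter), here is a minimal sketch of how that number could be computed from a list of latency samples. It assumes jitter is taken as the mean absolute difference between consecutive replies, which is one common convention; PingPlotter Pro's own calculation may differ. The function name `jitter_ms` and the choice to skip lost packets (marked as `None`) are assumptions for the example.

```python
from statistics import mean
from typing import Optional, Sequence


def jitter_ms(latencies_ms: Sequence[Optional[float]]) -> float:
    """Mean absolute latency difference between consecutive replies.

    None marks a lost packet; lost packets are skipped (a simplification),
    so the samples on either side of a gap are compared with each other.
    """
    replies = [x for x in latencies_ms if x is not None]
    if len(replies) < 2:
        return 0.0
    return float(mean(abs(b - a) for a, b in zip(replies, replies[1:])))


print(jitter_ms([100, 200]))        # 100.0, the example from the text
print(jitter_ms([50, 50, 50, 50]))  # 0.0, a completely flat line
```

A flat time graph corresponds to a value near zero, while the ragged graphs discussed above would produce values in the tens of milliseconds or more.
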










