I must tell you this. If someone had explained to me what I am about to explain here, my introduction to the VoIP world several years ago would not have been as hard as it was. The hardest part of starting out with VoIP was following other people's terminology, some of which was invented rather than technical. I read a lot, and only after that reading could I understand what people meant, even when they used the wrong term.

In this article I will try to explain how VoIP works in terms of players and information flows, and I will also explain some very common terms. By the end, I hope you will have a picture of who does what. I won't go into technical specifics such as SIP and RTP protocol details.

What is VoIP? What is IP Telephony?

VoIP stands for Voice over IP. It is the general term for communications carried over the IP protocol; when someone refers to VoIP, they are referring to a set of technologies, not a specific one. Nowadays, VoIP covers not only voice but video as well. VoIP is present in many places:

  • WhatsApp voice and video calls,
  • Facebook Messenger voice and video calls,
  • Skype-to-Skype voice and video conferences,
  • Intercoms in buildings,
  • Google Duo, etc.

The Internet protocols most commonly used to make VoIP happen are SIP and RTP. There are others, such as IAX. SIP is the signalling protocol that tells the phone to ring (for example); RTP is the media protocol, the one that carries the voice and/or video.

IP Telephony

You can see IP Telephony as a subset of VoIP: a very specific use case of VoIP, and maybe the most successful one. IP Telephony tries to mirror the concepts of the classic telephone system:

  • Phone number: usually called DID or DDI
  • Lines: usually called channels.

IP Telephony also brings more features. Before discussing them, I need to explain the common scenario.

IP Telephony Elements

[Figure: VoIP flow]

End-users are the customers, in other words, those who actually use the phone. End-users connect to the IP Telephony system through endpoints. An endpoint could be an IP phone such as a SNOM or Grandstream (actual hardware with a dedicated purpose) or a softphone such as Linphone or Bria (software installed on your computer or smartphone).

The endpoints connect to a class 5 PBX. In general terms, a class 5 PBX is a PBX that gives service to endpoints. Class 5 PBXes provide useful functionalities such as voicemail, call forwarding, follow me, time conditions, IVR (auto-attendant), call center, queues, broadcasting (paging), call recording, eavesdropping, transcoding and many others.

A class 5 PBX usually needs routes to connect to the world. Those routes are provided by class 4 PBXes. A class 4 PBX does not serve endpoints directly; instead, it interconnects other class 5 and class 4 PBXes, so it does not offer end-user functionality as a class 5 does. Its focus is on being fast and routing as many calls per second (CPS) as possible.

For example, FreeSWITCH (a soft-switch) can work as class 4 or class 5 depending on its configuration.

Some functionalities are not exclusive to class 5 or class 4 PBXes. Transcoding is one of them. It would be very bandwidth hungry to pass raw sound/video over the Internet; hence, codecs were invented. A codec is nothing more than an algorithm that compresses the audio/video. There are many codecs, and the quality they deliver varies. The two most common voice codecs are G.711 and G.729. G.711 uses 64 kbps per channel and the audio is really clear; G.729 uses about 8 kbps (before packet overhead), but its quality is not as good as G.711's. Many newer and more advanced codecs are proprietary or under patent; G.729's patents recently expired, so we expect more endpoints to support it out of the box.
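To put rough numbers on the bandwidth difference, here is a minimal Python sketch. It assumes the nominal codec bitrates (64 kbps for G.711, 8 kbps for G.729), IPv4 transport and a 20 ms packetization interval, which is a common default but not a universal one. The function name is made up for illustration.

```python
# Per-packet header cost when RTP travels over UDP over IPv4 (no link-layer overhead).
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def call_bandwidth_kbps(codec_kbps, packet_ms=20):
    """Estimate one-way IP bandwidth for a call, in kbps.

    codec_kbps: nominal codec bitrate (assumption: 64 for G.711, 8 for G.729).
    packet_ms:  how much audio each RTP packet carries (assumption: 20 ms).
    """
    packets_per_second = 1000 / packet_ms
    overhead_kbps = IP_UDP_RTP_HEADER_BYTES * 8 * packets_per_second / 1000
    return codec_kbps + overhead_kbps


for name, bitrate in (("G.711", 64), ("G.729", 8)):
    print(name, call_bandwidth_kbps(bitrate), "kbps per direction")
```

Under these assumptions the headers add the same fixed cost to every call, so the relative saving of a low-bitrate codec on the wire is smaller than the raw codec figures suggest.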

Because each end of a call could use a different codec, a PBX may need transcoding: the ability to translate the audio from one codec to another. Transcoding is CPU hungry; it should be offered only by class 5 PBXes (although some class 4 PBXes offer it as well), and it should be the last resort for connecting a call.

Another functionality you will find in both class 4 and class 5 PBXes is CNAM, the translation service that identifies who is calling. Instead of seeing an unknown caller with a number such as 1 613 800 7370, you could see OKay Inc. Many services are available, and they generally bill by the query. One exception is SuperCNAM, which bills a flat quarterly rate with unlimited queries. CNAM is most commonly found in class 5 servers; however, a few class 4 PBXes do offer it.
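The decision of whether a call needs transcoding can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular PBX implements it; the function names and codec lists are hypothetical.

```python
def common_codecs(caller_codecs, callee_codecs):
    """Return the codecs both ends support, in the caller's preference order."""
    supported = set(callee_codecs)
    return [codec for codec in caller_codecs if codec in supported]


def needs_transcoding(caller_codecs, callee_codecs):
    """True when the two ends share no codec, so the PBX would have to translate."""
    return not common_codecs(caller_codecs, callee_codecs)


# Both ends speak G.711, so the PBX can pass the audio through untouched.
print(common_codecs(["G.729", "G.711"], ["G.711"]))
# No codec in common: transcoding is the only way to connect the call.
print(needs_transcoding(["G.729"], ["G.711"]))
```

Checking for a shared codec first is what keeps transcoding as the last resort: the CPU cost is only paid when the list of common codecs is empty.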

Commercial Classification

Speaking in commercial terms, IP Telephony covers the endpoints and the class 5 PBXes. SIP Trunking or Wholesale covers class 4 PBXes. As you see, I have labelled some servers as tier 1, 2 or 3. The tier indicates how close the PBX is to the PSTN. A tier 1 PBX is usually owned by big telecom companies such as Bell; a tier 2 PBX is also commonly owned by big telecom companies; and a tier 3 PBX usually belongs to a small to medium company that gives service to the public.

You may think that it is better to connect to a tier 1 class 4 PBX. The answer is yes and no. Because tier 1 PBXes are very close to the PSTN, the speed and quality of the calls are usually flawless. They also give the lowest rates in the market; however, here is the catch: they usually require a minimum usage. This means they expect to bill you for a minimum of hundreds of thousands of minutes per month, which translates to thousands of dollars. In short, you will need big capital to connect to them and big traffic to pass to them.

Tier 2 PBXes are similar to tier 1 ones. Because they are not as close to the PSTN, their rates are a little higher (everyone needs to make a profit) or the increments are different. They are more affordable for medium-size companies: they still require a minimum charge to give you good rates, but they are more flexible in commercial terms.

The tier 3 market is quite different. Tier 3 PBXes let you pay by the minute (you only pay for what you use) with no minimum monthly charge, but the rate is not the cheapest on the market. While a tier 3 PBX may sell the USA48 minute at a rate of 0.005 USD, a tier 1 could sell it at 0.0001 USD or less. However, here you are not forced into a minimum charge. For example, To Connect Me is considered to be a tier 3 PBX, but it offers a mixed pre-paid/post-paid scheme that makes it very attractive for those who want to provide IP Telephony service.
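The trade-off between a low rate with a minimum charge and a higher pay-as-you-go rate can be sketched in Python. The two per-minute rates are the ones quoted above; the 2,000 USD monthly minimum is a made-up figure for illustration only, as are the function names.

```python
TIER3_RATE = 0.005       # USD per minute, pay as you go (figure from the article)
TIER1_RATE = 0.0001      # USD per minute (figure from the article)
TIER1_MINIMUM = 2000.0   # USD per month, hypothetical minimum commitment


def monthly_cost(minutes, rate_per_minute, minimum_charge=0.0):
    """You pay for your usage or the minimum commitment, whichever is larger."""
    return max(minutes * rate_per_minute, minimum_charge)


def cheaper_tier(minutes):
    """Compare both offers for a given monthly volume."""
    tier1 = monthly_cost(minutes, TIER1_RATE, TIER1_MINIMUM)
    tier3 = monthly_cost(minutes, TIER3_RATE)
    return "tier 1" if tier1 < tier3 else "tier 3"


for minutes in (50_000, 1_000_000):
    print(minutes, "minutes/month ->", cheaper_tier(minutes))
```

With these assumed numbers, the tier 1 offer only wins once your traffic is large enough that the tier 3 bill would exceed the minimum commitment.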

If you are starting an IP Telephony business such as To Call Me, you will need to interconnect to tier 3 carriers such as To Connect Me.

Conversational vs Call-Center (AKA Short Duration)

The kind of traffic you route is also classified:

  • Conversational: calls that have a high probability of being answered and of turning into long conversations. Technically speaking, ASR (answer-seizure ratio) >= 60% and ACD (average call duration) >= 30 seconds.
  • Call-center: calls that have a high probability of failing (not answered, or a non-existent number) and, when answered, of being hung up almost right away. Technically speaking, ASR <= 30% and ACD <= 6 seconds.

Please note that these thresholds are just rules of thumb; some carriers use different metrics.
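As a rough illustration, the two metrics and the classification can be computed in Python like this. ASR is the answer-seizure ratio (answered calls over attempts) and ACD is the average call duration of the answered calls. The thresholds are the ones given above; the function names and the "mixed" label for everything in between are my own assumptions.

```python
def asr_acd(talk_seconds_per_attempt):
    """Compute (ASR in percent, ACD in seconds).

    talk_seconds_per_attempt: one entry per call attempt, 0 for unanswered calls.
    """
    attempts = len(talk_seconds_per_attempt)
    answered = [seconds for seconds in talk_seconds_per_attempt if seconds > 0]
    asr = 100 * len(answered) / attempts if attempts else 0.0
    acd = sum(answered) / len(answered) if answered else 0.0
    return asr, acd


def classify_traffic(asr, acd):
    """Apply the rule-of-thumb thresholds from the article."""
    if asr >= 60 and acd >= 30:
        return "conversational"
    if asr <= 30 and acd <= 6:
        return "call-center"
    return "mixed"  # hypothetical label for traffic that fits neither profile


asr, acd = asr_acd([0, 0, 120, 45, 0])
print(asr, acd, classify_traffic(asr, acd))
```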

Pricing Schemas in the SIP Trunking or Wholesale Market

If you are planning to start an IP Telephony service, one of the keys to any successful business is to optimize costs. In this industry, most price optimization is done by selecting the right routes. When looking for routes, you will see people quoting things such as:

0.002$ 1/1 CLI

What does that mean? The 0.002$ is the rate. Unless stated otherwise, the default currency is USD and the default time unit for pricing is the minute. The 1/1 is the increment: the first increment and the subsequent increments. The first increment is usually called the connection increment. Unless a time unit is stated, increments are in seconds. CLI stands for Calling Line Identification; in other words, a CLI route respects your caller ID number, while an NCLI (non-CLI) route does not guarantee it.

The most common increments are 1/1, 6/6 and 60/60.
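A quote in this shape is regular enough to parse mechanically. The following Python sketch assumes quotes always follow the "rate, increments, CLI/NCLI" layout of the example above; real-world quotes vary, and the names used here are hypothetical.

```python
import re
from typing import NamedTuple


class RouteQuote(NamedTuple):
    rate_per_minute: float   # USD per minute unless stated otherwise
    first_increment: int     # seconds
    next_increment: int      # seconds
    cli: bool                # True when the route respects your caller ID


# Assumed layout: "<rate>$ <first>/<next> <CLI|NCLI>"
QUOTE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\$?\s+(\d+)/(\d+)\s+(CLI|NCLI)\s*$", re.IGNORECASE
)


def parse_quote(text):
    match = QUOTE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized quote: {text!r}")
    rate, first, following, kind = match.groups()
    return RouteQuote(float(rate), int(first), int(following), kind.upper() == "CLI")


print(parse_quote("0.002$ 1/1 CLI"))
```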

Also, please note that a few carriers (this is unusual) may have different rates for each increment. This allows VoIP companies to build some interesting charges, for example:

  • 0.0088/0 1/1: could be read as 0.0088/60 USD ≈ 0.00015 USD for the first second, with subsequent seconds free. In effect, you are billed by call, not by time.
  • 0.99 60/60: each minute is billed at 0.99 USD. Note that if your call lasted 61 seconds, because of the increments it will be billed at 1.98 USD (two minutes).

As you see, playing with the increments is another way VoIP companies make money. You may get a very good rate, but with a big increment.
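The effect of the increments on a bill can be worked out with a short Python sketch. It assumes a single per-minute rate for all increments (so it does not cover the unusual per-increment rates) and that unanswered calls are not billed; the function names are made up for illustration.

```python
import math


def billed_seconds(duration, first_increment, next_increment):
    """Round a call duration (in seconds) up to the billing increments."""
    if duration <= 0:
        return 0  # assumption: unanswered calls are not billed
    if duration <= first_increment:
        return first_increment
    extra = duration - first_increment
    return first_increment + math.ceil(extra / next_increment) * next_increment


def call_cost(duration, rate_per_minute, first_increment, next_increment):
    """Cost of one call in the rate's currency, rounded to 6 decimals."""
    seconds = billed_seconds(duration, first_increment, next_increment)
    return round(seconds / 60 * rate_per_minute, 6)


# The same 61-second call at 0.99 per minute under the three common increments.
for first, following in ((1, 1), (6, 6), (60, 60)):
    print(f"{first}/{following}:", billed_seconds(61, first, following), "s billed,",
          call_cost(61, 0.99, first, following), "USD")
```

Running the same call through several increments makes it easy to see when a cheaper rate with a bigger increment ends up costing more.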

The Color of the Route

A white route is a route where both source and destination are legal terminations in the telecommunications business. This is opposed to a black route, which is a route that is illegal on both ends. Also common in telecom (especially VoIP) is the term grey route, which defines a route that is legal for one country or the party on one end, but illegal on the other end.

Additionally, you will find that white routes give the best quality in terms of technical metrics such as ASR. The higher the ASR, the more likely your calls are to connect.

Beware of the Ring Tone

Good and reliable routes won't do this. But if you look for routes from unknown or not well established companies, some of them start billing during the ring tone. For example, if a call rings for 10 seconds and the conversation lasts 30 seconds, you are going to be billed for 40 seconds.

Note that this is wrong and I do not encourage this practice. Unfortunately, there is no easy way to detect it. You should place a test call, time it yourself and compare your figures against what the system bills.
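Once you have timed a test call yourself, the comparison can be automated. This Python sketch assumes you know the route's billing increment and that the first and subsequent increments are equal; the function names are hypothetical.

```python
import math


def rounded_up(seconds, increment=1):
    """Round a duration up to the route's billing increment."""
    return math.ceil(seconds / increment) * increment


def ring_time_billed(ring_seconds, talk_seconds, billed_seconds, increment=1):
    """True when the bill matches ring + talk time rather than talk time alone."""
    fair = rounded_up(talk_seconds, increment)
    padded = rounded_up(ring_seconds + talk_seconds, increment)
    return billed_seconds > fair and billed_seconds >= padded


# The example from the article: 10 s of ringing, 30 s of talk, 40 s on the bill.
print(ring_time_billed(ring_seconds=10, talk_seconds=30, billed_seconds=40))
```

A single test call is weak evidence, so it is worth repeating the check over a few calls with different ring times before drawing conclusions about a carrier.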

Protocols in the VoIP: SIP & RTP

VoIP in general needs two protocols: a signalling one and a media one. The most common pair for these functions is SIP & RTP.

SIP, the signalling protocol, lets the servers and endpoints know when someone is calling, when a call is being transferred and when a call is being received. It carries important information about the caller, the callee and which servers are involved in the media. RTP is just the flow of the sound or video.
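To make the split between signalling and media more concrete, here is a trimmed, hypothetical SIP INVITE with its SDP body, and a small Python sketch that pulls out the caller, the callee and the address and port where the RTP stream is expected. The message is simplified (several mandatory headers are omitted, and real SIP uses CRLF line endings), and all hosts, tags and addresses are invented.

```python
# Hypothetical, trimmed INVITE; "\n" is used instead of CRLF for readability.
SAMPLE_INVITE = """\
INVITE sip:bob@example.com SIP/2.0
Via: SIP/2.0/UDP pbx.example.com:5060;branch=z9hG4bK776asdhds
From: "Alice" <sip:alice@example.com>;tag=1928301774
To: <sip:bob@example.com>
Call-ID: a84b4c76e66710@pbx.example.com
CSeq: 314159 INVITE
Content-Type: application/sdp

v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 18
"""


def parse_invite(message):
    """Extract who is calling whom and where the media (RTP) should be sent."""
    head, _, body = message.partition("\n\n")  # headers, blank line, SDP body
    lines = head.splitlines()
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    media = {}
    for line in body.splitlines():
        if line.startswith("c="):      # connection line: media IP address
            media["address"] = line.split()[-1]
        elif line.startswith("m="):    # media line: kind, port, profile, payloads
            kind, port, *_ = line[2:].split()
            media["kind"] = kind
            media["port"] = int(port)
    return {"request": lines[0], "from": headers.get("from"),
            "to": headers.get("to"), "media": media}


print(parse_invite(SAMPLE_INVITE))
```

In the media line, payload types 0 and 18 are the static RTP numbers for G.711 µ-law and G.729, so the same message also shows which codecs the caller offers.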

SIP and RTP are the most common but not the only ones.

Security in the VoIP

Both signalling and media protocols can be encrypted. Let's pretend we are talking about SIP & RTP (the most common ones).

SIP can be encrypted with TLS; this is commonly called SIPS. SIPS runs on TCP and needs an extra port, commonly 5061/TCP.

RTP can also be encrypted; this is commonly called sRTP. sRTP runs on UDP, and no extra port needs to be opened.

You can use the following combinations:

  • SIP and RTP: clear communications, easy to listen to if you place a sniffer in the right place. Because the SIP payload announces the RTP port, it is trivial to pick out a specific conversation. This is the de facto setup of any telecom player.
  • SIPS and RTP: encrypted signalling with clear media. Calls are easy to listen to but somewhat difficult to find. Because SIPS is encrypted, there is no easy way to know whether a given RTP flow belongs to a specific call. This combo is rarely found in the industry, but technically speaking you can have it.
  • SIPS and sRTP: encrypted communications. The only way to intercept this is to have the private key (and it also depends on the crypto suite in use, which I won't discuss here).

Are SIPS and sRTP safe enough for my call? Yes and no. The only thing you can be sure of is that the communication between the endpoint and the class 5 PBX is secured; from the class 5 PBX onwards, you cannot know. So, if you are paranoid and want to be 100% sure about the secrecy of your call, keep reading: there is another option.

zRTP is point-to-point RTP encryption. The key exchange is done directly between the endpoints, and the PBX just acts as a dumb proxy that passes along traffic it cannot understand. This kind of implementation works only between private extensions. Because of the way the technology works, you won't find an IP Telephony service that routes calls to the PSTN with zRTP; however, you may find PBXes that support zRTP for extension-to-extension calls. Just to be clear, because of the end-to-end encryption, some functionalities such as call recording or transcoding won't be available; therefore, both endpoints need to agree on the codec they will use.


Soft-switches

Also known as software switches. These are programs that run on a general-purpose server. The most popular implementations are:

  • Asterisk
  • FreeSWITCH

I love FreeSWITCH, and I will be thrilled to help you start a VoIP business with FreeSWITCH.

I think that covers the basics. Let me know if something is not clear. I will write more about it.

Good luck!
