Tuesday, January 31, 2017

MsIgnite BRK4007 - Troubleshoot media flows in Skype for Business across online, server and hybrid

One of the better sessions from Ignite 2016 if you ask me. A recording of the session can be found on YouTube.
 
Presented by Thomas Binder
 
Glossary
  • Candidate - A combination of an IP address and port to be used for a media channel.
  • ICE - Interactive Connectivity Establishment, a technique (RFC 5245) that combines client-side techniques with server support to find the most appropriate way of sending media to another endpoint; uses STUN and TURN. The Skype for Business A/V Edge Server is a STUN/TURN server.
  • STUN - Simple Traversal of UDP through NAT or Session Traversal Utilities for NAT
  • TURN - Traversal Using Relays around NAT
  • MRAS - Media Relay Authentication Service, a service on the Edge Server that provides clients with the credentials they need to request ports and establish media sessions through the Edge Server. Without credentials, clients cannot include Edge Server candidates in their candidate list when trying to establish a media session.
  • SDP - Session Description Protocol
  • RTP - Real-time Transport Protocol - sending the media
  • RTCP - RTP Control Protocol (also called Real-time Transport Control Protocol) - controlling the media during transfer and used for reporting.
  • NAT - Network address translation - a method of remapping one IP address space into another by modifying network address information in Internet Protocol (IP) datagram packet headers while they are in transit across a traffic routing device.
 
Problem / Solution
 
The problem: sending media over NAT devices and through firewalls.
The solution: ICE, STUN, TURN
There are five phases of ICE: one that happens only occasionally, and four that are processed every time a call is made.

1. Sign-in - MRAS Request
When a client signs in, it requests a token from MRAS. This is done once at sign-in and again after 8 hours by default. This is how the client learns that an edge server exists and how it can be used, e.g. which addresses to use. The MRAS request and response can be seen in the SIP traces.

2. Candidate Discovery - a gathering of local, proxy and reflexive addresses (nothing is sent in this phase)

3. Candidate Exchange - the caller sends a list of candidates (in a SIP message) to the callee, and the callee runs its own candidate discovery and sends back a list of candidates.

4. Connectivity Checks - a run-through of "all" candidates, each trying to connect to the other side's list of candidates to find the optimal media path. (These connectivity checks use STUN packets, so they are not seen in a SIP trace but are visible using Wireshark.)

5. Candidate Promotion - When checks are done the optimal media path and/or optimal candidate media pairs are selected. The final candidate promotion can take up to 10 seconds to happen, so if you are tracing on a test call and want to make sure you get the complete picture, make sure your test call lasts at least 10 seconds. A second invite (re-invite) and OK with only the final candidates will come eventually.
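As a rough illustration of phases 4 and 5, here is a minimal sketch of how a check list could be built and ordered. It uses the candidate pair priority formula from RFC 5245; Skype for Business uses its own ICE implementation, so the exact ordering in a real trace may differ. The function names and the (label, priority, component) tuple layout are made up for this example.

```python
from itertools import product


def pair_priority(controlling: int, controlled: int) -> int:
    """Pair priority per RFC 5245: 2^32*min(G,D) + 2*max(G,D) + (1 if G>D else 0)."""
    g, d = controlling, controlled
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def build_check_list(local, remote):
    """Pair every local candidate with every remote candidate of the same component.

    local/remote are lists of (label, priority, component) tuples (hypothetical layout).
    Assumes the local side is the controlling agent.
    Returns (pair_priority, local_label, remote_label) tuples, best pair first.
    """
    pairs = [
        (pair_priority(l_prio, r_prio), l_label, r_label)
        for (l_label, l_prio, l_comp), (r_label, r_prio, r_comp) in product(local, remote)
        if l_comp == r_comp
    ]
    return sorted(pairs, reverse=True)


local = [("host", 2130706431, 1), ("relay", 16777215, 1)]
remote = [("host", 2130706431, 1)]
checks = build_check_list(local, remote)
```

The pair at the top of the list is the one that would be tried (and, if it works, promoted) first; pairs involving relay candidates end up at the bottom, which matches the idea that relaying through the edge is the fallback.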

Candidates can be of different types, such as:
  • host - local IP of a client computer
  • srflx - server reflexive, the public address and port that the NAT maps the client to, as seen by the STUN server
  • relay - external IP and port on the A/V edge
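To show why a host candidate is preferred over a reflexive one, and a reflexive one over a relay, here is a small sketch of the candidate priority formula recommended in RFC 5245. The type preference values (126, 100, 0) are the RFC's recommendations; whether Skype for Business uses exactly these values is an assumption, so treat the numbers as illustrative.

```python
# Recommended type preferences from RFC 5245 (assumed here, not confirmed for Skype for Business).
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, component: int = 1, local_pref: int = 65535) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (
        (1 << 24) * TYPE_PREFERENCE[cand_type]
        + (1 << 8) * local_pref
        + (256 - component)
    )


priorities = {t: candidate_priority(t) for t in TYPE_PREFERENCE}
```

With these inputs the host candidate gets the highest priority and the relay candidate the lowest, so a direct path wins whenever its connectivity checks succeed.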

In SDP we will also find TCP-PASS / TCP-ACT which means TCP passive or active. This is because even if we can send from a candidate (IP:port) we are not 100% sure that we can receive on that same IP:port, and that is why we list both active and passive candidates for TCP.

Candidates traditionally come in pairs, where one candidate is used for RTP and the other for RTCP. If both clients support multiplexing of RTCP (a=rtcp-mux in SDP, which newer clients can do), a single candidate can be used for both RTP and RTCP.
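When reading a SIP trace it can help to pull the candidates out of the SDP body. The sketch below parses a=candidate lines in the general RFC 5245 layout (foundation, component, transport, priority, address, port, typ, type) and checks for a=rtcp-mux. The sample SDP is invented for illustration with documentation IP addresses; real Skype for Business SDP contains more attributes and may vary in detail.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    foundation: str
    component: int  # 1 = RTP, 2 = RTCP
    transport: str  # e.g. UDP, TCP-PASS, TCP-ACT
    priority: int
    address: str
    port: int
    cand_type: str  # host, srflx or relay


def parse_sdp(sdp: str):
    """Return (list of Candidate, rtcp_mux flag) from an SDP body."""
    candidates = []
    rtcp_mux = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line == "a=rtcp-mux":
            rtcp_mux = True
        elif line.startswith("a=candidate:"):
            f = line[len("a=candidate:"):].split()
            if len(f) < 8 or f[6] != "typ":
                continue  # skip lines that do not follow the expected layout
            candidates.append(
                Candidate(f[0], int(f[1]), f[2], int(f[3]), f[4], int(f[5]), f[7])
            )
    return candidates, rtcp_mux


# Invented sample, not taken from a real trace.
SAMPLE_SDP = """
m=audio 50020 RTP/AVP 0
a=rtcp-mux
a=candidate:1 1 UDP 2130706431 192.0.2.10 50020 typ host
a=candidate:2 1 UDP 1694498815 198.51.100.7 50021 typ srflx raddr 192.0.2.10 rport 50020
a=candidate:3 1 TCP-PASS 6556159 203.0.113.5 52000 typ relay raddr 198.51.100.7 rport 50021
"""

cands, mux = parse_sdp(SAMPLE_SDP)
```

Here every candidate has component 1 only, which is what you would expect to see when rtcp-mux is in use; without it, each candidate would normally appear twice, once with component 1 (RTP) and once with component 2 (RTCP).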


High ports in the external firewall
 
Do we need to open the high 50,000 - 59,999 TCP ports outbound?
This has been in the documentation for a long time, and it has confused a lot of people.
If two edge servers talk to each other, the high ports are not used as destination ports. For UDP the traffic flows from port 3478 to port 3478, and multiple sessions can be handled. TCP is connection-oriented, so there can be only one connection from one IP address:port to another IP address:port. The edge will therefore use different source ports, but the destination port will always be 443.

If your firewall only filters on the destination port, then forget about the 50,000 - 59,999 port range; but if your firewall requires you to configure source ports, use the "source port" column below.
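The edge-to-edge pattern described above can be written down as a small check, for example to classify flows from a firewall log. This is a sketch only: the function name is made up, and it assumes that the TCP source ports are taken from the 50,000 - 59,999 range.

```python
HIGH_PORTS = range(50000, 60000)  # 50,000 - 59,999 inclusive


def is_edge_to_edge_flow(protocol: str, src_port: int, dst_port: int) -> bool:
    """True if a flow matches the edge-to-edge pattern from the session notes.

    UDP: source 3478 -> destination 3478.
    TCP: source in the high port range (assumption) -> destination 443.
    """
    protocol = protocol.upper()
    if protocol == "UDP":
        return src_port == 3478 and dst_port == 3478
    if protocol == "TCP":
        return src_port in HIGH_PORTS and dst_port == 443
    return False
```

Note that the high ports only ever show up on the source side in this check, which is why a firewall that filters on destination port alone does not need the range for this scenario.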

However, if we have two external users connected to two different edge servers, and they cannot establish a media path client to client, we will again benefit if the 50,000 - 59,999 port range is open, since we can then establish media using only a single edge server. If the high ports are blocked we can still connect edge to edge and the call will go through, but this consumes more resources and uses more hops (added latency).

The final scenario is an edge pool with DNS load balancing. Here, an external user connected to one edge server tries to set up media to an internal user connected to another edge server. In this case the external firewall must allow public-to-public IP hairpinning or the call will fail (or the 50,000 - 59,999 port range could be opened to avoid this).

Changes to ports for Skype for Business Online
 
If we look at the documentation found at Office 365 URLs and IP address ranges, we will see that UDP ports 3478, 3479, 3480 and 3481 should be opened, but 3479 - 3481 are not used yet in Skype for Business Online. Going forward, firewall openings for Skype for Business Online will be simplified and UDP 3479 - 3481 will be used, but this has not happened just yet.


References
 
Understanding how Lync establishes audio/video paths using ICE
Microsoft Lync Server 2010 Resource Kit (Chapter 9)