Why is traffic encryption on the rise?
Encryption on the public Internet is rising steadily, with current estimates showing that over 70% of traffic will be encrypted by the end of 2016*. A few content providers (e.g. Facebook, YouTube, and Netflix) are responsible for most of the encrypted traffic. Overall, this is a positive development for privacy on the Internet, and the trend has accelerated since Snowden’s revelations about NSA interception activities.
Similar encryption trends can be observed in data centers, with Yahoo, Google, and Microsoft encrypting all their data center traffic. In the enterprise, more than 25% of traffic is now encrypted, both for in-house traffic (email, Web apps) and for cloud-based applications**.
How to classify encrypted traffic
It is important to remember that encryption does not mean that the traffic is undetectable; it just means that the content remains private. Advanced techniques can still classify encrypted traffic, enabling service providers to continue to perform policy enforcement, optimize traffic and ensure a good user experience. Here are a few examples of encrypted traffic classification techniques, with accuracy and limitations.
Example 1: Classifying traffic encrypted with SSL/TLS (e.g. HTTPS)
Typical protocols: Google, Facebook, WhatsApp
Classification method: Read name of service in SSL/TLS certificate or in Server Name Indication (SNI)
Accuracy: Deterministic method - 100% accurate
Limitations: If SNI doesn’t appear at the start of the handshake, the SSL/TLS certificate may only be available after 5 or 6 packets, which can cause a slight delay. Depending on the content provider, the same certificate may be used for different services (such as email and news).
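To make the SNI method in Example 1 concrete, here is a minimal Python sketch that reads the server name from a TLS ClientHello. It assumes the whole ClientHello arrives in a single TLS record and that the SNI is sent in the clear. The names extract_sni, build_client_hello, classify_sni, and the SERVICE_BY_SUFFIX table are hypothetical and only serve the illustration; a production classifier would also handle fragmentation and certificate parsing.

```python
import struct
from typing import Optional

# Hypothetical suffix-to-service table, for illustration only.
SERVICE_BY_SUFFIX = {
    "facebook.com": "Facebook",
    "whatsapp.net": "WhatsApp",
    "google.com": "Google",
}


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from one TLS ClientHello record, or None."""
    try:
        if record[0] != 0x16:          # record type must be "handshake"
            return None
        pos = 5                        # skip record header: type, version, length
        if record[pos] != 0x01:        # handshake type must be ClientHello
            return None
        pos += 4                       # skip handshake header: type, 3-byte length
        pos += 2 + 32                  # skip client version and random
        pos += 1 + record[pos]         # skip session id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len          # skip cipher suites
        pos += 1 + record[pos]         # skip compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:          # extension 0 is server_name
                # layout: list length (2), name type (1), name length (2), name
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                start = pos + 5
                if name_type != 0 or start + name_len > len(record):
                    return None
                return record[start:start + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def classify_sni(host: str) -> Optional[str]:
    """Map a host name to a service using the hypothetical table above."""
    for suffix, service in SERVICE_BY_SUFFIX.items():
        if host == suffix or host.endswith("." + suffix):
            return service
    return None


def build_client_hello(host: str) -> bytes:
    """Build a minimal ClientHello carrying only an SNI extension (demo helper)."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni_data = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", 0, len(sni_data)) + sni_data
    body = (
        b"\x03\x03" + bytes(32) + b"\x00"        # version, random, empty session id
        + struct.pack("!H", 2) + b"\x13\x01"     # one cipher suite
        + b"\x01\x00"                            # one compression method (null)
        + struct.pack("!H", len(ext)) + ext
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


if __name__ == "__main__":
    hello = build_client_hello("www.facebook.com")
    print(classify_sni(extract_sni(hello) or ""))
```

Because several services can sit behind one host name or certificate, a lookup of this kind identifies the provider more reliably than the individual service, which matches the limitation noted above.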
Example 2: Classifying encrypted P2P traffic
Typical protocols: BitTorrent, µTorrent, Vuze
Classification method: Use IP addresses of known P2P peers
In a P2P session, the initialization phase is not encrypted. During this phase, IP addresses of peers can be identified. All flows from those IP addresses are identified as P2P (e.g. BitTorrent). Statistical protocol identification increases classification accuracy by measuring divergence from a traffic matching engine.
Accuracy: Typically more than 90% of P2P sessions are identified
Additional info: IP addresses are stored in a fixed-size L3-4 cache, with the most frequent hits maintained at the top of the list.
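The peer cache described in Example 2 could look roughly like the following Python sketch. It is an illustration under assumptions, not the actual engine: the PeerCache class, its method names, and the eviction rule (drop the entry with the fewest hits when the cache is full) are all hypothetical, and the cache is keyed on IP address only.

```python
from typing import Dict, List, Optional


class PeerCache:
    """Fixed-size cache of peer IP addresses, ranked by hit count."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._hits: Dict[str, int] = {}

    def learn(self, ip: str) -> None:
        """Record a peer seen during the unencrypted initialization phase."""
        if ip not in self._hits and len(self._hits) >= self.capacity:
            # Assumed eviction policy: remove the entry with the fewest hits.
            coldest = min(self._hits, key=self._hits.get)
            del self._hits[coldest]
        self._hits.setdefault(ip, 0)

    def classify(self, src_ip: str, dst_ip: str) -> Optional[str]:
        """Label a flow as P2P if either endpoint is a known peer."""
        for ip in (src_ip, dst_ip):
            if ip in self._hits:
                self._hits[ip] += 1
                return "p2p"
        return None

    def ranked(self) -> List[str]:
        """Return cached peers with the most frequent hits first."""
        return sorted(self._hits, key=self._hits.get, reverse=True)


if __name__ == "__main__":
    cache = PeerCache(capacity=2)
    cache.learn("10.0.0.1")
    cache.learn("10.0.0.2")
    print(cache.classify("10.0.0.2", "192.0.2.9"))
    cache.learn("10.0.0.3")   # cache is full, so the least-hit peer is evicted
    print(cache.ranked())
```

Keeping the cache at a fixed size bounds memory use, and ranking by hits means the busiest peers are the last to be evicted.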
Example 3: Classifying Skype
Classification method: Search for binary patterns in traffic flows
This pattern is usually found in the first 2 or 3 packets
Accuracy: 90–95% accurate
Additional info: A statistical method is also used to identify different services within Skype, such as Skype voice, Skype video, and Skype chat. This method uses a combination of jitter, delay, length of packets, spacing of packets, etc.
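The statistical step in Example 3 can be pictured as computing a few per-flow features and feeding them to a classifier. The Python sketch below is a simplified illustration: flow_features and guess_service are hypothetical names, jitter is approximated as the mean absolute change between consecutive packet gaps, and the thresholds in guess_service are placeholders chosen for the demo, not measured values from any real Skype classifier.

```python
from statistics import mean
from typing import Dict, List, Tuple


def flow_features(packets: List[Tuple[float, int]]) -> Dict[str, float]:
    """Compute simple features from (arrival time in s, length in bytes) pairs."""
    if len(packets) < 3:
        raise ValueError("need at least three packets")
    times = [t for t, _ in packets]
    lengths = [n for _, n in packets]
    # Spacing of packets: gap between consecutive arrivals.
    gaps = [b - a for a, b in zip(times, times[1:])]
    # Assumed jitter definition: mean absolute change between consecutive gaps.
    jitter = mean([abs(b - a) for a, b in zip(gaps, gaps[1:])])
    return {
        "mean_length": mean(lengths),
        "mean_gap": mean(gaps),
        "jitter": jitter,
    }


def guess_service(features: Dict[str, float]) -> str:
    """Toy rule set with placeholder thresholds (illustrative assumptions only)."""
    if features["mean_length"] > 600:
        return "video"   # large packets
    if features["mean_gap"] < 0.05:
        return "voice"   # small packets at a steady, fast rate
    return "chat"        # small, sparse packets


if __name__ == "__main__":
    voice_like = [(0.00, 160), (0.02, 160), (0.04, 160), (0.06, 160)]
    print(guess_service(flow_features(voice_like)))
```

A real engine would typically learn such decision boundaries from labelled traffic rather than hard-code them, and would combine more features, including delay.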
Thanks to advanced classification techniques, traffic optimization, policy enforcement, and user experience are largely unaffected by encryption. This means that communication service providers can continue to leverage network intelligence to ensure service quality and manage resource utilization, while respecting subscriber privacy!
*Sandvine Global Internet Phenomena, Feb 2016
**ESG research Feb 2015