
YES! Encrypted Traffic Can Be Classified

Image credit: Qosmos

Why is traffic encryption on the rise?

Encryption on the public Internet is rising steadily, with current estimates showing that over 70% of traffic will be encrypted by the end of 2016*. A few content providers (e.g. Facebook, YouTube, and Netflix) are responsible for most of the encrypted traffic. Overall, this is a positive development for privacy on the Internet, and the trend has accelerated since Snowden’s revelations about NSA interception activities.

Similar encryption trends can be observed in data centers, with Yahoo, Google, and Microsoft encrypting all their data center traffic. In the enterprise, more than 25% of traffic is now encrypted, both for in-house traffic (email, Web apps) and for cloud-based applications**.

How to classify encrypted traffic

It is important to remember that encryption does not mean that the traffic is undetectable; it just means that the content remains private. Advanced techniques can still classify encrypted traffic, enabling service providers to continue to perform policy enforcement, optimize traffic and ensure a good user experience. Here are a few examples of encrypted traffic classification techniques, with accuracy and limitations.

Example 1: Classifying traffic encrypted with SSL/TLS (e.g. HTTPS)

Typical protocols: Google, Facebook, WhatsApp

Classification method: Read the name of the service in the SSL/TLS certificate or in the Server Name Indication (SNI) field

Accuracy: Deterministic method - 100% accurate

Limitations: If the SNI doesn’t appear at the start of the handshake, the classifier has to wait for the SSL/TLS certificate, which may only be available after 5 or 6 packets and can cause a slight delay. Depending on the content provider, the same certificate may be used for different services (such as email and news), in which case the certificate alone cannot tell them apart.
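
To make the SNI lookup concrete, here is a minimal Python sketch that pulls the host name out of a TLS ClientHello and maps it to a service. It is an illustration only, not the Qosmos implementation: the function names, the `SERVICES` suffix table, and the `build_client_hello` test helper are all assumptions introduced for this example, and the parser assumes the whole ClientHello fits in a single TLS record.

```python
import struct
from typing import Optional

# Hypothetical suffix-to-service table, for illustration only.
SERVICES = {"facebook.com": "Facebook", "whatsapp.net": "WhatsApp", "google.com": "Google"}


def extract_sni(record: bytes) -> Optional[str]:
    """Return the host name in the SNI extension of a TLS ClientHello, or None."""
    try:
        if record[0] != 0x16:              # record type 22 = handshake
            return None
        pos = 5                            # skip record header: type, version, length
        if record[pos] != 0x01:            # handshake type 1 = ClientHello
            return None
        pos += 4                           # skip handshake type and 3-byte length
        pos += 2 + 32                      # skip client version and random
        pos += 1 + record[pos]             # skip session ID
        (cipher_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cipher_len              # skip cipher suites
        pos += 1 + record[pos]             # skip compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:         # server_name extension
                # Layout: list length (2), name type (1), name length (2), name
                if record[pos + 2] != 0:   # name type 0 = host_name
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5:pos + 5 + name_len]
                return name.decode("ascii") if len(name) == name_len else None
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                        # truncated or malformed input
    return None


def classify_host(host: str) -> Optional[str]:
    """Map a host name to a service using the hypothetical suffix table."""
    for suffix, service in SERVICES.items():
        if host == suffix or host.endswith("." + suffix):
            return service
    return None


def build_client_hello(host: str) -> bytes:
    """Build a minimal ClientHello carrying only an SNI extension (test helper)."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", 0, len(sni)) + sni
    body = (b"\x03\x03" + bytes(32) + b"\x00"      # version, random, empty session ID
            + struct.pack("!H", 2) + b"\x13\x01"   # one cipher suite
            + b"\x01\x00"                          # one compression method (null)
            + struct.pack("!H", len(ext)) + ext)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


hello = build_client_hello("www.facebook.com")
print(extract_sni(hello), classify_host(extract_sni(hello)))
```

Because the host name travels in clear text in the first packet the client sends, a lookup like this can label the flow before any application data is exchanged. When `extract_sni` returns nothing, the certificate-based fallback described above applies.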

Example 2: Classifying encrypted P2P traffic

Typical protocols: BitTorrent, µTorrent, Vuze

Classification method: Use IP addresses of known P2P peers

In a P2P session, the initialization phase is not encrypted. During this phase, IP addresses of peers can be identified. All flows from those IP addresses are identified as P2P (e.g. BitTorrent). Statistical protocol identification increases classification accuracy by measuring divergence from a traffic matching engine.

Accuracy: Typically more than 90% of P2P sessions are identified

Additional info: IP addresses are stored in a fixed size L3-4 cache, with the most frequent hits maintained at the top of the list.
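
The peer cache can be sketched as a small fixed-size table ranked by hit count. The Python below is one possible reading of that description, not the actual product code: the `PeerCache` class, its method names, and the choice to evict the least frequently hit entry when the table is full are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional


class PeerCache:
    """Fixed-size table of known peer IPs, ranked by number of hits."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.hits: Counter = Counter()

    def learn(self, ip: str) -> None:
        """Record a peer IP seen during the unencrypted initialization phase."""
        if ip not in self.hits and len(self.hits) >= self.capacity:
            # Assumed policy: drop the entry with the fewest hits to make room.
            coldest = min(self.hits, key=self.hits.get)
            del self.hits[coldest]
        self.hits[ip] += 1

    def classify(self, src_ip: str, dst_ip: str) -> Optional[str]:
        """Label a flow as P2P if either endpoint is a known peer."""
        for ip in (src_ip, dst_ip):
            if ip in self.hits:
                self.hits[ip] += 1         # a match counts as another hit
                return "p2p"
        return None

    def ranked(self) -> List[str]:
        """Return cached IPs with the most frequent hits first."""
        return [ip for ip, _ in self.hits.most_common()]


cache = PeerCache(capacity=2)
cache.learn("10.0.0.1")
cache.learn("10.0.0.2")
print(cache.classify("10.0.0.2", "192.0.2.9"))   # known peer on one side
print(cache.ranked())
```

Keeping the table at a fixed size bounds memory use, and ranking by hits means the most active peers are the last to be evicted.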

Example 3: Classifying Skype

Classification method: Search for binary patterns in traffic flows

The pattern is usually found in the first 2 or 3 packets of a flow.

Accuracy: 90–95% accurate
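
As a rough sketch of binary pattern matching, the Python below scans the payloads of the first few packets of a flow for known byte sequences. The byte pattern in `SIGNATURES` is an invented placeholder and not a real Skype signature, and the `match_flow` name and the 3-packet limit are assumptions chosen to mirror the description above.

```python
from typing import Dict, Iterable, Optional

# Placeholder byte pattern invented for illustration; NOT a real protocol signature.
SIGNATURES: Dict[str, bytes] = {
    "example-app": bytes.fromhex("17f003"),
}

MAX_PACKETS = 3  # only inspect the start of the flow


def match_flow(payloads: Iterable[bytes],
               signatures: Dict[str, bytes] = SIGNATURES,
               max_packets: int = MAX_PACKETS) -> Optional[str]:
    """Return the label of the first signature found in the first packets, or None."""
    for index, payload in enumerate(payloads):
        if index >= max_packets:
            break                          # stop once the inspection window is over
        for label, pattern in signatures.items():
            if pattern in payload:         # plain substring search on raw bytes
                return label
    return None


print(match_flow([b"\x00\x01", b"\xaa\x17\xf0\x03\xbb"]))
```

Limiting the search to the first packets keeps the per-flow cost small, which matters when every new flow has to be inspected.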

Additional info: A statistical method is also used to identify different services within Skype, such as Skype voice, Skype video, and Skype chat. This method uses a combination of jitter, delay, length of packets, spacing of packets, etc.
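
The statistical step starts from per-flow features such as those listed above. The Python below is a minimal sketch of computing three of them from packet timestamps and lengths. The `FlowFeatures` and `flow_features` names are assumptions, and jitter is approximated here as the standard deviation of the gaps between packets, which is a simplification rather than the method the product actually uses.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


@dataclass
class FlowFeatures:
    mean_length: float   # average packet length in bytes
    mean_gap: float      # average spacing between packets in seconds
    jitter: float        # spread of the spacing (simplified definition)


def flow_features(packets: Sequence[Tuple[float, int]]) -> FlowFeatures:
    """Compute features from (timestamp_seconds, length_bytes) pairs in arrival order."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to measure spacing")
    times: List[float] = [t for t, _ in packets]
    lengths: List[int] = [n for _, n in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return FlowFeatures(
        mean_length=mean(lengths),
        mean_gap=mean(gaps),
        jitter=pstdev(gaps),
    )


print(flow_features([(0.0, 100), (1.0, 200), (3.0, 300)]))
```

A classifier would then compare such features against profiles learned for each service. Voice, video, and chat plausibly differ in packet size and pacing, but any thresholds would have to come from measured traffic, so none are given here.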

Thanks to advanced classification techniques, traffic optimization, policy enforcement, and user experience are largely unaffected by encryption. This means that communication service providers can continue to leverage network intelligence to ensure service quality and manage resource utilization, while respecting subscriber privacy!

*Sandvine Global Internet Phenomena, Feb 2016

**ESG Research, Feb 2015

Author

Erik Larsson, Head of Marketing, DPI & Traffic Intelligence, Enea

Erik works with cybersecurity and networking use cases for Enea’s Qosmos DPI and traffic intelligence software. He has extensive experience from marketing, business development and strategy at high-growth private companies and publicly listed technology vendors.
