Communications service providers (CSPs) around the world are looking for new technologies and services to meet ever-increasing customer expectations and deliver the best possible experience. The emergence of 5G is creating real excitement, but it is also adding further complexity to ongoing network operations. That complexity is compounded by the sheer volume of data from disparate services and legacy network systems, all of which must be intelligently analysed and processed to provide actionable insights across multiple network layers.
To stay profitable, CSPs must find ways to extract this actionable insight and intelligence from their network operations as cost-effectively as possible. Traditionally, operators have been largely reactive: locating a network issue and then taking the necessary steps to isolate it and minimise its impact. With the advent of artificial intelligence and machine learning (AI/ML), however, CSPs now have the power to be more proactive: to predict events and then prescriptively take remediation action or re-route traffic as conditions change.
There are many use cases in which AI/ML, when combined with telecom expertise, can add significant value to an operator’s network operations. Here are some of the key areas:
Detecting network faults
AI/ML is commonly used to detect network faults in real time or near real time. In more advanced cases, this includes diagnosing the root cause of alarms raised across multiple event sources, while filtering out noise alarms whose analysis would waste precious resources. AI/ML can then be applied to find solutions for those root causes. This is especially valuable with 5G, where disparate network technologies, including NFV, further complicate the analysis.
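To make root-cause diagnosis and noise filtering more concrete, here is a minimal Python sketch. It uses a simple topology heuristic in place of a trained model. An alarm is treated as a probable root cause if no element it depends on is also alarming. Alarms that raise and clear within a few seconds (flapping) are discarded as noise. The `Alarm` type, the `depends_on` map, the element names and the 5-second flap threshold are all hypothetical illustrations, not part of any specific product.

```python
from dataclasses import dataclass
from typing import Optional

FLAP_THRESHOLD_S = 5.0  # assumption: alarms that clear within 5 s are treated as noise


@dataclass
class Alarm:
    element: str                        # hypothetical network element name
    raised_at: float                    # seconds since epoch
    cleared_at: Optional[float] = None  # None while the alarm is still active


def filter_noise(alarms):
    """Drop transient (flapping) alarms that cleared almost immediately."""
    return [
        a for a in alarms
        if a.cleared_at is None or a.cleared_at - a.raised_at > FLAP_THRESHOLD_S
    ]


def upstream(element, depends_on):
    """Return every element that `element` depends on, directly or transitively."""
    seen, stack = set(), list(depends_on.get(element, []))
    while stack:
        e = stack.pop()
        if e not in seen:
            seen.add(e)
            stack.extend(depends_on.get(e, []))
    return seen


def probable_root_causes(alarms, depends_on):
    """Keep alarming elements none of whose upstream dependencies are also alarming."""
    alarming = {a.element for a in filter_noise(alarms)}
    return sorted(e for e in alarming if not (upstream(e, depends_on) & alarming))


# Hypothetical topology: two cells hang off one router, which hangs off a transport link.
depends_on = {"cell-A": ["router-1"], "cell-B": ["router-1"], "router-1": ["link-1"]}
alarms = [
    Alarm("link-1", 100.0),
    Alarm("router-1", 101.0),
    Alarm("cell-A", 102.0),
    Alarm("cell-B", 102.5),
    Alarm("cell-C", 103.0, cleared_at=104.0),  # flapped for 1 s, so treated as noise
]
print(probable_root_causes(alarms, depends_on))  # ['link-1']
```

Four of the five alarms are symptoms of one transport failure or are noise, so only `link-1` is left for investigation. An ML-based system would learn these dependencies and thresholds from data rather than having them hard-coded.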
Looking for anomalies
AI/ML algorithms can help operators look for correlations between performance KPIs and faults. Humans can track a handful of KPIs and try to detect their interrelationships through dashboards and advanced visualisations. AI/ML lets operators perform the same task at scale, handling thousands of KPIs in real time and discovering nontrivial KPI relationships and abnormal behaviour that would otherwise stay hidden “under the radar”. By leveraging AI/ML, operators benefit from automation and autonomy, bringing new levels of efficiency to network operations without human involvement.
Assessing the damage: which alarm bells are ringing?
Network operations staff are bombarded with so many alarms that they can struggle to see the forest for the trees. Every day they face a number of common incidents that generate trouble tickets. AI/ML can help by classifying alarms based on historical behaviour and identifying their root cause. Operators can then suppress symptomatic alarms and focus on the root cause alone, eliminating redundant trouble tickets and their associated costs. Through adaptive mechanisms, AI/ML can also locate the source of problems more quickly and automatically suggest alarm groupings, tagging potential root-cause alarms.
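One simple way to classify alarms from historical behaviour is to measure how often one alarm type follows another. The sketch below uses a co-occurrence heuristic in place of a trained model. It marks alarm type B as a symptom of A when most historical occurrences of B appear within a short window after A. It then suppresses B whenever A is also active. The alarm type names, the 60-second window and the 0.75 confidence threshold are hypothetical.

```python
from collections import defaultdict

WINDOW_S = 60          # assumption: a symptom follows its cause within 60 s
MIN_CONFIDENCE = 0.75  # assumption: share of B's occurrences that must follow A


def learn_symptoms(history):
    """history: (timestamp, alarm_type) tuples sorted by time.
    Returns {symptom_type: cause_type} for pairs that reliably co-occur."""
    counts = defaultdict(int)   # total occurrences of each alarm type
    follows = defaultdict(int)  # (cause, symptom) -> times symptom came after cause
    for i, (t_b, b) in enumerate(history):
        counts[b] += 1
        # Earlier alarm types inside the window; each cause type is counted once.
        causes = {a for t_a, a in history[:i] if t_b - t_a <= WINDOW_S and a != b}
        for a in causes:
            follows[(a, b)] += 1
    best = {}  # symptom -> (cause, confidence), keeping the most confident cause
    for (a, b), n in follows.items():
        confidence = n / counts[b]
        if confidence >= MIN_CONFIDENCE and confidence > best.get(b, (None, 0.0))[1]:
            best[b] = (a, confidence)
    return {b: a for b, (a, _) in best.items()}


def suppress(active, symptom_of):
    """Drop active alarms whose learned cause is also active."""
    active_types = set(active)
    return [a for a in active if symptom_of.get(a) not in active_types]


# Hypothetical history: cell outages usually follow a link failure within 5 s.
history = [
    (0, "LINK_DOWN"), (5, "CELL_UNAVAILABLE"),
    (1000, "LINK_DOWN"), (1005, "CELL_UNAVAILABLE"),
    (2000, "LINK_DOWN"), (2005, "CELL_UNAVAILABLE"),
    (3000, "LINK_DOWN"), (3005, "CELL_UNAVAILABLE"),
    (4000, "HIGH_TEMP"),
    (5000, "CELL_UNAVAILABLE"),  # one occurrence with no preceding link failure
]
symptom_of = learn_symptoms(history)
print(symptom_of)  # {'CELL_UNAVAILABLE': 'LINK_DOWN'}
print(suppress(["LINK_DOWN", "CELL_UNAVAILABLE", "HIGH_TEMP"], symptom_of))
# ['LINK_DOWN', 'HIGH_TEMP']
```

If `CELL_UNAVAILABLE` fires on its own, it is not suppressed, because its learned cause is not active and it may be a genuine root cause. The nested scan is quadratic in history length, which is fine for a sketch but not for production volumes.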
Traditional rule-based alarm correlation imposes a heavy burden of rule development and maintenance, and many alarms still go undetected. Using AI/ML, operators can instead train a model to learn the correlation logic and priorities. Focusing attention on the most important alarms means resolving the most critical issues faster. As the volume and nature of alarms change over time, AI/ML can evaluate these trends and make intelligent fault predictions.
From trends to predicting network faults
Predictive failure recognition can also provide insight into anomalies and support causality analysis. According to a recent Heavy Reading survey, predictive maintenance was the top use case for AI/ML in telecoms, with 92% of CSPs saying it was critical, ahead of security, network management and fraud/revenue assurance. What is less clear is how operators will act on these predictions. If, for example, a system predicts with 95% probability that a particular fault will occur within the next month, should the element be replaced immediately? Should a spare unit be kept in stock? Or should the operator wait until the failure occurs and then scramble to fix it? The first option ensures the greatest reliability, but it is also the most capex-intensive. As the technology matures, network operations and culture will need to evolve with it.
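The replace, stock-a-spare or wait question can be framed as an expected-cost comparison over the prediction horizon. The sketch below does this for a given failure probability. Every cost figure and restoration time in it is a made-up assumption, including unit price, site visits, spare holding cost and outage cost per hour. A real operator would substitute its own figures and probably add SLA penalties and the chance that a new unit also fails.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    unit: float             # price of a replacement unit (hypothetical)
    visit: float            # planned site visit (hypothetical)
    emergency_visit: float  # unplanned, urgent site visit (hypothetical)
    spare_holding: float    # keeping a spare in local stock for a month (hypothetical)
    outage_per_hour: float  # revenue/SLA impact while the element is down (hypothetical)


def expected_costs(p_fail, c, hours_with_spare, hours_without_spare):
    """Expected one-month cost of each strategy, given failure probability p_fail."""
    return {
        # Replace now: always pay for a unit and a planned visit; no outage assumed.
        "replace_now": c.unit + c.visit,
        # Keep a spare: pay to hold it; on failure, swap it in quickly.
        "keep_spare": c.spare_holding
        + p_fail * (c.unit + c.emergency_visit + hours_with_spare * c.outage_per_hour),
        # Wait: pay nothing up front; on failure, suffer a longer outage.
        "wait": p_fail
        * (c.unit + c.emergency_visit + hours_without_spare * c.outage_per_hour),
    }


def best_strategy(p_fail, c, hours_with_spare=2, hours_without_spare=24):
    """Pick the strategy with the lowest expected cost."""
    costs = expected_costs(p_fail, c, hours_with_spare, hours_without_spare)
    return min(costs, key=costs.get)


assumed = Costs(unit=2000, visit=300, emergency_visit=800,
                spare_holding=150, outage_per_hour=1000)
for p in (0.95, 0.05, 0.001):
    print(p, best_strategy(p, assumed))
# 0.95 replace_now
# 0.05 keep_spare
# 0.001 wait
```

Under these assumed figures, the best choice shifts as the predicted probability falls. Replacing immediately wins at 95%, a spare wins at 5%, and waiting wins only when failure is very unlikely. With a lower assumed outage cost, the capex of replacing early would dominate.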
AI/ML is not a one-time fix
AI/ML is still finding its feet in telecom networks and is suited to narrowly defined tasks within a specified context and with consistent input variables. We are still a long way from the kind of general intelligence that lets humans learn from one situation and apply that learning to another, addressing multiple problems and reacting to different inputs and changing scenarios.
AI/ML should be viewed not as a one-time fix for all our network ills, but as a tool embedded in the network operations workflow. It needs to augment personnel and in-house expertise, not replace them. These technologies should be considered part of an ongoing process of continuous improvement in network management, one that improves the customer experience, keeps costs under control and ensures profitable growth. As operators increasingly rely on AI/ML systems to predict incidents and make recommendations, these systems will one day operate with full autonomy, applying their recommended changes without the need for human intervention.