#1: THE STATE OF AFFAIRS IN DATA ANALYTICS
Data and analytics have permeated to the core of decision making and value creation at a wide variety of businesses. A recent survey measuring the value created by big data analysis indicated revenue increases across retail industries, along with a host of other benefits for manufacturing, such as lower costs, better customer insights, improved operations management, and new product ideas.
The excitement generated by such improvements in business value has prodded businesses to invest in data analytics resources, both human and machine. While new hiring and training have proceeded at a frenetic pace, analyses of productivity problems in data science and analytics teams, and discussions of how to improve that productivity, have only recently begun to emerge.
Our thesis is that businesses that choose accelerated staffing over productivity enhancements for the data analytics team are likely to meet with less success in scaling their analytics practices. We will look at alternative approaches in this article.
#2: SCALING DATA SCIENCE: THE CHALLENGES
A recent Teradata survey of business decision makers reveals that 48% of them call for more training in data and analytics, and 42% need more staff dedicated to data and analytics. Although data scientists are harder to hire than other data analytics personnel, their share of the 2020 data analytics job market is only about 3%. We believe that this apparent skew is partly due to the varying (and ever-expanding) definition of the data scientist’s role, and partly due to the other personnel that data scientists depend on to do their jobs.
Now let us see how the industry views the productivity of its current data and analytics workforce, particularly its data scientists. The Data Science Report 2017 is a survey of data scientists’ experiences in their present organizations. The survey found that 78% of data scientists use data generated from internal systems, which in principle means they should not face challenges with data acquisition or with running machine learning workloads. Surprisingly, however, 80% of data scientists report that their biggest challenge is either improving the quality of, or access to, training data, or deploying machine learning into production.
This extreme gap between demand and supply raises the question: how do we improve the productivity of data scientists? In general, we agree that top billing goes to quality data sets and production-ready models. In addition, however, we believe that interpretable and actionable models are just as important for reducing delivery effort, and should be ranked among the productivity enhancements considered for a data analytics team.
#3: SOLUTION: THE EFFICIENCY GOAL
Hiring more of the brightest people is not always the best solution, especially when resources are limited. We urge business leaders to consider enhancing the productivity of their existing data science and analytics resources before they design aggressive hiring plans, and before they set out to hire the next “10X Data Scientist” who knows it all. Businesses should start this journey by examining their top-down goals. In customer engagement, for example, we hear our clients articulate some very well-defined goals, like “increase the probability of a second purchase on the app” and “correlate the causes of service churn to predict preventive actions.” This way, the prominence of analytics in the value delivery chain is known to the data and analytics team, which then works with a defined end goal in mind.
Next, we advise business leaders to focus on the critical paths in their analytics projects. Our experience with clients, supported by our reading of recent surveys, shows that the following are most likely to be on an analytics project’s critical path: high-quality data, packaged production-grade models, and interpretation of, and action on, results. Once the critical paths of current projects are identified, the emphasis should be on choosing the right tools and the right skills.
Tools that help package analytics products, from collecting data to delivering the required actions, meet all three requirements of a project’s critical path. First, since data is integrated as a product input, a standardized data model is established, which reduces the complexity of cleaning, cataloguing, and labelling data. Second, leaders must emphasize production-grade model delivery; deploying models and monitoring their execution is possible today with commercial analytics platforms. Lastly, interpreting analytics outcomes and enabling third-party applications to consume them are just as important. This capability is crucial for letting business users of traditional applications, like CRM and ERP, experience the analytics outcomes directly.
We reiterate that realizing packaged analytics products is the single most important driver of productivity in data science and analytics, and that business leaders must immediately prioritize it over skill augmentation.
About The Author:
Dr. Prateek manages Flytxt’s technology roadmap, with a special strategic focus on Data Science R&D and IP creation and protection. His 15 years of experience span mobile network and application innovation, wireless data network management, and cryptography applications. In his last appointment, with the Center for Excellence in Telecom at IIT-Bombay, Dr. Prateek contributed to the IEEE 802.16m standard and co-invented a mobile social network application platform. Previously, he held the technology lead position for packet data services at Reliance Communications and developed cryptographic software at Algorithmic Research Ltd (Israel). Dr. Prateek is an alumnus of the Indian Institute of Technology, Mumbai.