2017 Big Data and Analytics Forecast: Usage Scenarios

Over a 10-year period, big data analytics technology will mature to support three major usage scenarios: Data Lake applications, massively scalable Web and mobile applications, and autonomous applications that manage an IoT ecosystem.

Three Usage Scenarios Driving Big Data Analytics Growth

Wikibon sees three major usage scenarios generating spending on the software categories forecast in Figure 1: Data Lake applications, massively scalable Web and mobile applications, and autonomous applications that manage an Internet of Things ecosystem. A critical skills shortage in managing big data infrastructure and massively scalable applications is driving these workloads to the public cloud, where enterprises find operational skills embedded in the technology that manages the services. The cloud providers are also simplifying machine learning: by embedding data science expertise into both untrained and pre-trained machine learning models, they let developers consume models via straightforward APIs with less need for scarce data scientists.
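To make that last point concrete, here is a minimal sketch of what consuming a cloud-hosted model as a simple API call looks like from the developer's side. The endpoint URL, token, and request/response fields are hypothetical placeholders, not any specific provider's interface.

```python
import requests

response = requests.post(
    "https://ml.example-cloud.com/v1/models/churn:predict",  # hypothetical endpoint
    headers={"Authorization": "Bearer <access-token>"},      # placeholder credential
    json={"instances": [{"tenure_months": 3, "monthly_spend": 30.0}]},
)
response.raise_for_status()
print(response.json())  # e.g. {"predictions": [{"churn_probability": 0.42}]}
```

The developer supplies raw inputs and receives a prediction; the feature engineering and model training stay behind the service boundary.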

Data Lake applications

Hadoop-driven Data Lake applications have been about unbundling, commoditizing, and scaling many of the roles of traditional data warehouses, with a new emphasis on the requirement to handle machine data. As the unbundled elements continue to mature, the big data ecosystem, increasingly broader than Hadoop alone, will continue to assume more of the roles that data warehouses have performed and still aspire to perform. Even as applications incorporate more real-time analysis, Data Lakes will provide a foundation of rich context to inform the analytics.

  • Big data started out as defined by Hadoop. Now it applies more broadly to high-volume, high-velocity, and/or complex data processed on scale-out commodity hardware and software, where compute comes to the data.
  • Mainstream uses started with a larger repository complementing data warehouses. Hadoop Data Lakes cost as little as one-tenth of traditional data warehouses in $/TB, so Data Lakes served first as a repository for data previously too expensive to store for later analysis.
  • ETL offload was cheap and freed capacity on the data warehouse. As processing capabilities matured, Data Lakes offloaded ETL processes from data warehouses, where they consumed as much as 40% of very expensive capacity (see the sketch after this list).
  • Traditional business intelligence is now moving to Data Lakes. More recently, traditional production reporting and interactive visualization and analysis have started to emerge as mainstream usage scenarios. Tools such as Tableau, Qlik, and Power BI need MPP SQL DBMSs on HDFS or other commodity storage such as S3 to support them, and these tools are finally beginning to drive adoption of such server databases. The front-end tools weren't designed to handle result sets with hundreds of millions of records, so they currently have to work with samples or aggregates of the data. Newer tools such as Zoomdata can avoid the limitations of sampling by progressively rendering the result set as it emerges from the back-end server.
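As an illustration of the ETL-offload pattern described above, the following is a minimal PySpark sketch: raw machine data landed cheaply in the Data Lake is cleansed and written as partitioned Parquet that an MPP SQL engine on HDFS or S3 (and BI tools through it) can query directly. The paths, column names, and schema are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-offload-sketch").getOrCreate()

# Raw machine/log data landed cheaply in the Data Lake (path is hypothetical).
raw = spark.read.json("s3a://datalake/raw/clickstream/")

# Cleanse and conform -- the transform work that once consumed expensive
# data warehouse capacity.
conformed = (raw
             .filter(F.col("event_type").isNotNull())
             .withColumn("event_date", F.to_date("event_ts"))
             .select("user_id", "event_type", "event_date", "payload"))

# Write partitioned Parquet on commodity storage for MPP SQL engines to query.
(conformed.write
 .mode("overwrite")
 .partitionBy("event_date")
 .parquet("s3a://datalake/conformed/clickstream/"))
```

The key economics: the heavy transform runs on cheap scale-out capacity, and only curated results need ever reach the warehouse.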

Massively scalable Web and mobile applications

Web and mobile applications have driven the rise of NoSQL application databases to support the scale requirements of B2C populations. While analytics based on Data Lakes help inform interactions and decisions with historical data, the immature state of machine learning is limiting the growth of packaged applications built on live, continuously updated predictive analytics.

  • Web and mobile applications typically rely on NoSQL databases that can scale out to data volumes and ingest throughput not possible with traditional databases. Large-scale Internet vendors pioneered these applications.
  • The information in Data Lakes is beginning to inform user interactions in Web and mobile applications in real time. Data warehouses, by contrast, traditionally informed operational applications with historical data collected offline. Increasingly, the rich information in Data Lakes forms the basis for predictive analytics based on machine learning. These models anticipate and influence user interactions, and they provide analytics that inform an increasing share of operational applications.
  • The immature mechanics of creating, deploying, and maintaining machine learning models are slowing adoption. Making the machine learning pipeline accessible to developers and admins, as opposed to data scientists, will require additional maturity. Developers need to be able to select and consume models in the form of APIs, not algorithms that need feature engineering (see the sketch after this list). And admins need more repeatability at the operational end of the ML tool chain for activities such as model deployment and the continual data feedback that keeps models up to date.
  • Machine learning will define a new generation of packaged applications. Predictive models will anticipate and influence an ever greater share of user interactions and automated decisions across all applications.
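To illustrate the "models as APIs" point above, here is a minimal sketch of a trained model exposed behind an HTTP endpoint, so developers consume predictions rather than algorithms. The toy model, route, and JSON fields are illustrative assumptions, not a particular vendor's service.

```python
from flask import Flask, request, jsonify
from sklearn.linear_model import LogisticRegression
import numpy as np

# A toy model standing in for one a data scientist would train and hand off.
X = np.array([[1.0, 20.0], [2.0, 35.0], [8.0, 5.0], [9.0, 2.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    body = request.get_json()  # e.g. {"features": [3.0, 30.0]}
    score = model.predict_proba([body["features"]])[0][1]
    return jsonify({"probability": round(float(score), 3)})

if __name__ == "__main__":
    app.run(port=8080)
```

The operational gaps the bullet list names sit around exactly this boundary: versioning the model behind the route, and feeding fresh outcome data back into retraining.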

Autonomous applications that manage an Internet of Things ecosystem

Traditionally, applications have served people. With IoT, they serve physical things. For example, a hydroponic agriculture system can operate autonomously, sensing soil and plant conditions and applying the optimal amount of water and fertilizer via an ecosystem of emitters controlled by actuators.
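As a toy illustration of such an autonomous sense-decide-actuate loop (the sensor and actuator functions below are hypothetical stand-ins for real device drivers):

```python
import random
import time

TARGET_MOISTURE = 0.60  # illustrative setpoint, as a fraction of saturation

def read_moisture_sensor():
    # Stand-in for a real soil-moisture probe reading.
    return random.uniform(0.4, 0.8)

def set_emitter(open_valve):
    # Stand-in for an actuator command to a drip emitter.
    print("emitter", "OPEN" if open_valve else "CLOSED")

for _ in range(5):  # a few iterations of the sense-decide-actuate loop
    moisture = read_moisture_sensor()
    set_emitter(moisture < TARGET_MOISTURE)  # water only when below target
    time.sleep(1)
```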

  • IoT applications will emerge as a fundamentally new class of self-tuning applications, growing beyond the customer and employee interactions and the physical back-office processes embodied by prior generations of applications.
  • IoT applications embody and optimize the physical and digital behavior of how products evolve over their lifecycles. These lifecycles are increasingly captured in Digital Twins. In addition, these applications increasingly work autonomously with predictive models informing how the Digital Twins should operate. Examples include platforms such as GE Predix and IBM’s Watson IoT.
  • The rise of IoT applications will drive the need for continuous processing and analytics. Data is generated in flows, and the requirement for real-time decisions will drive growth in streaming analytics (see the sketch below).
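Below is a minimal Spark Structured Streaming sketch of the kind of continuous analytics this implies: a rolling per-minute average over a stream of sensor readings. Spark's built-in rate source stands in for a real sensor feed, and the column names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("iot-streaming-sketch").getOrCreate()

# The built-in "rate" source emits (timestamp, value) rows continuously;
# here it stands in for an ingest stream of sensor readings.
readings = (spark.readStream
            .format("rate")
            .option("rowsPerSecond", 10)
            .load()
            .withColumnRenamed("value", "sensor_reading"))

# A continuously updated one-minute windowed average -- the kind of
# low-latency aggregate an autonomous application would act on.
per_minute = (readings
              .groupBy(F.window("timestamp", "1 minute"))
              .agg(F.avg("sensor_reading").alias("avg_reading")))

query = (per_minute.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```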


Action Item

Big data pros must maintain a focus on acquiring the skills and technology required to build high-impact applications, even while the initial usage scenarios focus on data warehouse-related infrastructure efficiency. The supporting technology needs to enable ever lower-latency decisions driven by advanced analytics, and the relevant skills should leverage ever more integrated technology building blocks.

Addendum

This 2017 version of the Wikibon big data and analytics forecast is one of several related forecast reports. The others are:

  • Forecast
  • Definitions
  • Market Share
  • Continuous Applications
