Wikibon conducted a web survey of 300 US enterprises that had either deployed or were evaluating a Big Data Solution in May 2014. The following is a first-pass frequency analysis of the survey responses. The size and breadth of the sample base will enable us to do further segmentations to identify key insights, but these are not provided in this document. Such analysis will be conducted in the coming weeks and made available to Wikibon clients.
Methodology and Respondent Profile
For the purpose of this study, we described Big Data Analytics projects as those that:
- Leverage non-traditional data management tools and technologies such as Hadoop, NoSQL, or MPP analytic databases and/or
- Involve the analysis of multi-structured and/or unstructured data such as clickstream, text, log file, and social media data. An example of such a use case would be the use of Hadoop to store, transform and analyze mobile sensor data.
- Big Data projects do not include projects solely involving the use of relational databases to analyze traditional structured data (e.g., data associated with CRM, ERP, Finance, etc.)
We obtained a broad distribution of enterprise types, led by IT Technology Providers, Healthcare, and Banking & Finance. Figure 1 shows the industry distribution of the sample.
All enterprises in our sample had at least $10M in annual revenue. The median company fell in the $100M-$500M revenue range and had between 1,000 and 5,000 employees.
Respondents’ levels of responsibility ranged from C-level to manager. 73% of respondents described themselves as being “Very familiar” with Big Data, with 27% being “Somewhat familiar”. In terms of their role in relation to Big Data, respondents fell evenly across the five major categories in Figure 2, representing different perspectives on the importance of, and barriers to, Big Data deployments.
State of Big Data Analytics Deployments
We asked respondents, all of whom were at least “somewhat familiar” with Big Data analytics technology, to identify at which stage their organization was in the Big Data adoption and deployment process (Figure 3).
41% of respondents reported that their organization was currently evaluating Big Data analytics technologies and use cases. 28% said there was at least one Big Data analytics pilot or proof-of-concept project underway. 31% said there was at least one Big Data analytics project in production at their organization.
The level of Big Data analytics adoption and deployment maturity of the survey’s respondents is important to consider when analyzing the full survey results. Based on this data, combined with their level of knowledge about Big Data (see above), it is apparent that the 303 respondents represent what might be called early adopters and early majority adopters of Big Data analytics technologies and methods.
Regardless of level of adoption and deployment maturity, a vast majority of respondents reported that Big Data analytics are an important element of their organizations’ data management and business intelligence programs (Figure 4).
Most respondents reported positive results, at least initially, related to Big Data Analytics projects (Figure 5). 41% reported they have realized the full value of their investment in Big Data technology, services and staff, with 58% reporting just partial return on Big Data investment but moving in the right direction. This high proportion of positive results is likely due in part to the relatively low level of investment made to date in Big Data technologies, services and staff, but it is nonetheless encouraging that most practitioners feel positively about such investments.
As for financial drivers of Big Data projects (Figure 6), 20% of respondents reported that saving money was the primary driver, with 23% saying increasing revenue was the primary driver. A majority, 55%, reported both saving money and increasing revenue were primary drivers of their organizations’ Big Data Analytics projects.
The 20% of respondents who said saving money is the primary driver are at risk of overlooking the revenue realization potential of Big Data Analytics, which manifests itself in a number of ways: identifying cross-sell and upsell opportunities, better targeting of advertising and marketing campaigns, and developing net new products and lines of business.
Data Types and Volume, Technologies in Use in Big Data Analytics
As part of the survey, we asked a series of questions to get a better understanding of the types of data being analyzed and the tools and technologies in use.
Regarding types of data involved in Big Data Analytics projects (Figure 7), social media data was the most common type. Practitioners are likely exploring social media data such as Tweets and Facebook posts to better understand customer sentiment and to identify customer service-related issues and complaints.
Structured data from internal systems was the second most common type reported by respondents (50%). While most think of unstructured data when thinking of Big Data, it is logical that many practitioners begin new projects looking at data they already have under management in relational systems.
Other data types reportedly involved in Big Data Analytics projects were log data, mobile device data, clickstream data, sensor data and freely available data from public sources.
In a related question, we asked respondents if they had integrated two or more types of disparate data together for analysis (Figure 8). Merging disparate data sets can provide new and unexpected analytic insights and represents the real promise of Big Data Analytics.
Encouragingly, over 60% of respondents reported they had indeed integrated two or more disparate data sets, with 32% planning to do so. However, as noted later in this frequency analysis, data integration is also one of the top Big Data Analytics-related challenges according to respondents.
As for data volume (Figure 9), the sweet spot, according to respondents, for Big Data Analytics projects is between 50 and 99 terabytes of data (33%). While Big Data is often equated with petabytes or more of data, just 1% of respondents reported deployments of petabyte scale.
Which technologies and tools qualify as Big Data is a hot topic of debate. What is clear from the survey respondents is that “traditional” technologies and tools are being applied to Big Data workloads in conjunction with emerging technologies such as Hadoop and NoSQL databases (Figure 10).
While Hadoop is an important technology in Big Data Analytics deployments, traditional data warehouses continue to play a role, at least for the time being. Not surprisingly, data integration technologies are also critical for moving data between systems when required and to assist in data transformations.
Professional services played a role in 72% of Big Data Analytics projects, according to respondents (Figure 11). This is not surprising considering the relative immaturity and complexity of related Big Data technology. It is also in line with Wikibon’s Big Data market sizing, which found professional services to constitute the largest of the three market segments at 40% of Big Data revenue (the other two being hardware at 38% and software at 22%).
Respondents were asked about the role, if any, the public cloud played in Big Data Analytics deployments (Figure 12). 58% of respondents said their organization is using the public cloud and related services for at least some portion of their Big Data Analytics deployment, with 26% saying they plan to tap the cloud in the future.
Further analysis is needed to identify the nuances of this finding, but it is clear the public cloud plays, and will continue to play, an increasingly important role in supporting Big Data Analytics projects.
Hadoop in the Enterprise
As part of the survey, we asked just those respondents that said they have already deployed Hadoop (36%) about their use and planned use of the open source Big Data framework.
One topic getting significant attention from vendors and practitioners alike is the relationship between Hadoop and the data warehouse. As highlighted above, a significant proportion of practitioners consider the data warehouse an important part of Big Data Analytics projects. Still, of those respondents using Hadoop, 61% said they had shifted at least one workload from a data warehouse or mainframe to Hadoop. Another 34% said they plan to shift workloads in the future (Figure 13).
These findings have important implications for both Hadoop and data warehouse vendors.
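Because the workload-shift percentages apply only to the Hadoop sub-sample, it can help to restate them as approximate shares of the full sample. The short Python sketch below does this arithmetic using only the percentages quoted above; it assumes the 36% figure is measured against all survey respondents, and the function and variable names are illustrative rather than part of the survey methodology.

```python
# Percentages quoted in the text. The assumption here is that the 36%
# Hadoop-deployed figure is a share of ALL survey respondents.
HADOOP_DEPLOYED = 0.36   # respondents that have already deployed Hadoop
SHIFTED = 0.61           # of Hadoop users: shifted at least one workload
PLAN_TO_SHIFT = 0.34     # of Hadoop users: plan to shift workloads


def share_of_full_sample(subgroup_share: float, within_subgroup_share: float) -> float:
    """Convert a percentage of a subgroup into a share of the whole sample."""
    return subgroup_share * within_subgroup_share


shifted_overall = share_of_full_sample(HADOOP_DEPLOYED, SHIFTED)
planning_overall = share_of_full_sample(HADOOP_DEPLOYED, PLAN_TO_SHIFT)

print(f"Shifted a workload to Hadoop: {shifted_overall:.1%} of all respondents")
print(f"Plan to shift a workload:     {planning_overall:.1%} of all respondents")
```

Under that assumption, roughly 22% of all respondents have already moved a workload to Hadoop and about 12% plan to, which puts the 61% and 34% figures in the context of the overall sample.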
We also asked Hadoop practitioners about how they sourced the technology (Figure 14). Only 25% of Hadoop practitioners are paying customers of one or another Hadoop vendor. 24% use a free distribution provided by a vendor, but the majority, 51%, roll their own Hadoop downloaded from the Apache Software Foundation. This finding indicates that these practitioners believe they are sophisticated enough to make use of Hadoop without vendor support, at least initially. As PoCs move to production, however, these findings suggest there is a major opportunity for vendors to provide enterprise-level support services.
Another hot topic in the Hadoop community is SQL-on-Hadoop (or SQL-in-Hadoop) (Figure 15). The basic premise behind SQL-on-Hadoop is to provide users the ability to manipulate Hadoop-based data with a language they know well (SQL) and without having to move data out of Hadoop for analysis.
While SQL-on-Hadoop tools are still raw, a majority of survey respondents (84%) reported the use of the technology by Data Scientists and analysts working with Hadoop. A sub-segment of these respondents are using other methods in conjunction with SQL-on-Hadoop.
To get a sense of their commitment to Hadoop, we asked respondents if their Hadoop deployments spanned multiple data centers (Figure 16). A staggering 70% reported multiple data center Hadoop deployments, with 17% planning to expand to multiple data centers in the future. It is important to note that respondents were not asked how or if these deployments were linked, meaning many are likely isolated deployments within the same organization. Still, deployments spanning data centers will likely become more common and more sophisticated over time.
Big Data Analytics Barriers and Challenges
As part of this survey, we asked respondents to identify the biggest barriers to success when it comes to Big Data Analytics projects. We asked about both technology-related barriers and non-technology related barriers.
The top technology-related barrier to Big Data Analytics success as identified by survey respondents was difficulty transforming data into a suitable form for analysis (Figure 17*). This finding is consistent with anecdotal evidence that data transformation takes up as much as 80% of a Data Scientist’s time prior to actually analyzing data for insights. Difficulty integrating Big Data technology with existing infrastructure and difficulty merging disparate data sets were the second and third most cited technology-related barriers to success, respectively.
As for non-technology-related barriers (Figure 18*), getting stakeholders to agree on data definitions and “selling” the value of Big Data Analytics to the consumers of insights were the top two as reported by survey respondents. Concerns regarding the security and privacy implications of Big Data Analytics were third.
These particular findings require further analysis, but what comes through in the data is that non-technology challenges such as these are equally if not more important than the technology-related barriers. Successfully navigating political and cultural issues is key to Big Data Analytics success.
Specific to Hadoop, the most influential concerns preventing practitioners from moving Hadoop PoCs to production deployments all focused on enterprise-grade issues (Figure 19). Concerns about a lack of enterprise-grade backup and recovery topped the list, followed by high availability and security concerns.
Finally, we asked a series of questions regarding challenges to specific parts of the Big Data Analytics deployment and management process. These findings are represented in Figures 20 through 23. The top data visualization-related challenge was integrating data visualizations into existing applications. Understanding and complying with regulations was the top security and privacy challenge. The top data integration challenge was getting source data into analytics environments in real time. Monitoring and reporting compliance was the top data governance challenge.
It is clear from survey respondents that Big Data Analytics plays, and will continue to play, a critical role in enterprises across industries. The technologies in use and the varied use cases indicate that Big Data is not a passing fad and that enterprises are setting the stage for significant investment in related products and services. Further analysis of this survey data is required to identify top use cases, level of maturity, and top challenges by industry and practitioner role. Such analysis will be provided to Wikibon clients in the form of a detailed report and presentation by analysts in Q3 2014.