Analytic database vendors are quick to tout the real-time analytics capabilities of their products. By real-time, however, most of these vendors mean what is more accurately called near-real-time interactive query response: response times fast enough to let human end users perform iterative analytic problem solving.
To truly deserve the label “real time,” analytic databases must not only provide interactive query capabilities; the data being analyzed must also be complete and up-to-date. The data is often sourced from multiple heterogeneous operational databases, each with its own data integration challenges. Complicating matters further, destination analytic databases are increasingly deployed in the cloud.
Traditional extract, transform and load (ETL) approaches to data integration are not suitable for this type of use case. Attunity, a Burlington, Mass.-based software maker, applies change data capture (CDC) to this Big Data problem. While ETL is a batch-oriented data integration method, CDC methods detect changes in source database systems and update target analytic databases with only those changes.
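The difference between the two approaches can be illustrated with a minimal Python sketch. It is not Attunity's implementation: the in-memory tables, the `batch_reload` and `apply_changes` helpers, and the shape of the change records are all hypothetical, chosen only to show that a batch refresh rewrites every row while a CDC-style refresh touches only the rows that changed.

```python
from typing import Any

Row = dict[str, Any]
Table = dict[int, Row]  # hypothetical table: primary key -> row


def batch_reload(source: Table) -> Table:
    """ETL-style refresh: copy every source row, changed or not."""
    return {key: dict(row) for key, row in source.items()}


def apply_changes(target: Table, changes: list[dict[str, Any]]) -> int:
    """CDC-style refresh: apply only the captured changes, in order."""
    for change in changes:
        if change["op"] == "delete":
            target.pop(change["key"], None)
        else:  # "insert" or "update" carries the full new row (an assumption)
            target[change["key"]] = dict(change["row"])
    return len(changes)


# Initial load: the target starts as a full copy of the source.
source: Table = {
    1: {"name": "Ann", "total": 10},
    2: {"name": "Bob", "total": 20},
    3: {"name": "Cy", "total": 30},
}
target = batch_reload(source)

# The source system changes; each change is also recorded in a change log.
source[2] = {"name": "Bob", "total": 25}
del source[3]
source[4] = {"name": "Dee", "total": 40}
change_log = [
    {"op": "update", "key": 2, "row": {"name": "Bob", "total": 25}},
    {"op": "delete", "key": 3},
    {"op": "insert", "key": 4, "row": {"name": "Dee", "total": 40}},
]

# Only the three changed rows are sent to the target.
rows_touched = apply_changes(target, change_log)
```

In this toy case the target ends up identical to the source after three row operations, whereas a second `batch_reload` would have copied the whole table again. In a real deployment the change log would typically come from the source database's transaction log rather than being written by hand.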
Attunity has made a number of enhancements to its software to adapt CDC to Big Data workloads. Among them, Attunity’s TurboStream CDC feature performs a limited amount of pre-processing on the data from source systems and consolidates it before updating target databases. This differs from traditional CDC technology that uses a serial-update model. Attunity also employs in-memory streaming technology to maximize performance and relies on the power of target databases to perform the remaining data transformations.
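The general idea of consolidating changes before a bulk update, as opposed to applying each change serially, can be sketched as follows. The internals of TurboStream CDC are not described here, so this is only an illustration under stated assumptions: the `consolidate` and `apply_batch` helpers are hypothetical, and every insert or update is assumed to carry the full new row.

```python
from typing import Any, Optional

Row = dict[str, Any]
Change = dict[str, Any]  # hypothetical change record: op, key, optional row


def consolidate(changes: list[Change]) -> dict[int, Optional[Row]]:
    """Collapse an ordered change stream to one net outcome per key.

    A value of None means the net effect for that key is a delete.
    """
    net: dict[int, Optional[Row]] = {}
    for change in changes:
        if change["op"] == "delete":
            net[change["key"]] = None
        else:  # "insert" or "update": the latest row image wins
            net[change["key"]] = dict(change["row"])
    return net


def apply_batch(target: dict[int, Row], net: dict[int, Optional[Row]]) -> int:
    """Apply the consolidated changes in one pass; return the write count."""
    for key, row in net.items():
        if row is None:
            target.pop(key, None)
        else:
            target[key] = row
    return len(net)


# Five captured changes that touch only two distinct keys.
stream: list[Change] = [
    {"op": "insert", "key": 1, "row": {"total": 10}},
    {"op": "update", "key": 1, "row": {"total": 15}},
    {"op": "update", "key": 1, "row": {"total": 18}},
    {"op": "insert", "key": 2, "row": {"total": 5}},
    {"op": "delete", "key": 2},
]

target: dict[int, Row] = {}
writes = apply_batch(target, consolidate(stream))
```

A serial-update model would issue five operations against the target here; the consolidated batch issues two. That reduction matters for analytic databases, which generally handle a few large bulk operations better than many small row-level updates.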
The result is near-real-time updates to target analytic databases. This gives business analysts the most up-to-date data available at any given time and also improves the timeliness of dashboards and other business intelligence applications used to monitor and analyze operational environments.
It is important to note that Attunity’s software also applies data replication techniques to assist with the initial loading of large volumes of data into target analytic databases. Its CDC capabilities complement initial data loading and provide continuous real-time data integration, deployed and managed via point-and-click GUIs rather than complex hand coding.
A key advantage for Attunity over its rivals in this space is the company’s independence. Attunity is agnostic about both data sources and data targets, which allows the company to partner closely with most of the major database vendors. Attunity has developed optimized replication and CDC technology for Actian Vectorwise, HP Vertica, IBM Netezza, Teradata and Amazon Redshift, among others. Attunity also recently acquired Hayes Technology Group, whose data replication technology is optimized for SAP HANA.
Competitors to Attunity include IBM InfoSphere Data Replication and Oracle GoldenGate. Both IBM and Oracle lack the database agnosticism of Attunity, but offer a significantly wider breadth of data integration and related data management/Big Data capabilities. Another competitor is Informatica with its PowerExchange Change Data Capture product, but the company’s data integration portfolio is heavily focused on ETL, which garners the majority of Informatica’s R&D budget.
Action Item: Big Data practitioners cannot afford to overlook the data integration challenges associated with large-scale data analytics deployments. Such deployments require a blend of data integration methods, including ETL, data replication and change data capture, depending on the particular analytic use cases involved.