Streamlining Real-Time Data Integration and Multi-Cloud Management
Data Integration and real-time data are crucial to any organization that does business online and is working toward digital transformation. The success of Big Data programs depends on how efficiently an organization can collect data, process and integrate it, and then analyze it. Building flexible Data Management architectures that can support the volume and variety of data streaming into organizations today is key. Data-driven businesses need tools for both Big Data Integration and Cloud Data Integration. Dan Potter, the Vice President of Product Marketing at Attunity, remarked in a recent DATAVERSITY® interview that:
“Data Integration has been around as long as data’s been around. It’s evolved a lot over the years and when you look at the data infrastructure changes, relational databases were the thing back in the ’80s. Then the rise of data warehouses in the ’90s. Then all of a sudden we start to move to more distributed computing, and now cloud computing.”
The Data Management infrastructure evolved as well, said Potter, with the advances in Hadoop. The big cloud vendors, specifically Google, had to figure out a way to manage their own data and to create and query data quickly, which led to the rise of Hadoop and then to commercial Hadoop distributions. “You had different ways to manage the data,” he commented. “Different integration requirements for Hadoop, different challenges with Hadoop, and now lately it’s all about the cloud, and Hadoop still sits on the cloud. Data warehouses have found a new home in the cloud. The cloud has delivered both the elasticity and cost advantages.”
Data Integration is the process of extracting data from different sources, and then filtering and transforming it into a unified view that can support actionable, valuable decisions. It can include replication, and it makes data available throughout the organization, supporting organizational reports and analytics. A business wanting to maximize sales and supply chain efficiency, for example, will need to bring together warehouse data, point-of-sale data, and shipping data.
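The extract, filter, and transform steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three in-memory lists stand in for real warehouse, point-of-sale, and shipping systems, and every field name (`sku`, `on_hand`, `qty`, `in_transit`) is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records standing in for three separate source systems.
warehouse = [{"sku": "A1", "on_hand": 40}, {"sku": "B2", "on_hand": 0}]
point_of_sale = [
    {"sku": "A1", "qty": 3},
    {"sku": "A1", "qty": 2},
    {"sku": "B2", "qty": 1},
    {"sku": "B2", "qty": 0},  # empty line item, filtered out below
]
shipping = [{"sku": "B2", "in_transit": 25}]


def integrate(warehouse, point_of_sale, shipping):
    """Merge the three sources into one unified record per SKU."""
    unified = defaultdict(lambda: {"on_hand": 0, "units_sold": 0, "in_transit": 0})
    for row in warehouse:
        unified[row["sku"]]["on_hand"] += row["on_hand"]
    for row in point_of_sale:
        if row["qty"] <= 0:  # filter: skip rows that carry no sale
            continue
        unified[row["sku"]]["units_sold"] += row["qty"]  # transform: aggregate
    for row in shipping:
        unified[row["sku"]]["in_transit"] += row["in_transit"]
    return dict(unified)


view = integrate(warehouse, point_of_sale, shipping)
for sku, record in sorted(view.items()):
    print(sku, record)
```

With the unified view in hand, a single lookup per SKU shows stock, sales, and inbound shipments together, which is the kind of question that is hard to answer when the three data sets stay in separate systems.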
A business wanting to train and use Machine Learning or Artificial Intelligence will need a steady flow of data through these systems (the more data they process, the more effective they become). This flow of data can come from a variety of sources, including the Internet of Things (IoT). Use of the IoT often requires integrating databases, devices, and business systems. However, integration has been described as one of the leading and most expensive barriers to embracing IoT analytics. According to Gartner, half the expense of implementing an IoT program will be spent on integration.
Real-Time Data
Real-time data streaming involves processing massive amounts of data quickly enough that an organization’s decision-makers can react, in real time, to changing conditions, said Potter. Stream processing at this scale allows organizations to respond immediately to potential security threats or fraudulent activity, and to act on revenue opportunities as they arise. Data from the IoT arrives in significant volume, and dealing with it efficiently requires real-time data streaming. Potter commented:
“If you think about what’s happening in the analytics world, you’ve got AI and Machine Learning. The more data that you can feed into those models the more effective they are. You’ve got IoT data, which is lots of volume. You’ve got predictive and prescriptive analytics. You’ve got automation happening in terms of decision-making. The faster I can have an insight and make a good decision to lower my risk, the lower my cost, the greater the opportunities I’m finding. There’s a lot on the analytics side that’s driving the need for the new data infrastructure, and the cloud is helping to address those challenges.”
There is also a lot of disruption, and it is being caused by startups that are more agile. “It’s the fast eating the slow,” he said. “It’s also the fast eating the big. It’s really changed, and that puts a premium on real-time data, for real-time analytics, for real-time decisioning so they can be much more competitive.”
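To make the idea of reacting to a stream concrete, here is a minimal sketch of one common style of check: flagging a card that produces too many transactions inside a sliding time window. It is an assumption-laden toy, not a description of any vendor's product. The class name `VelocityCheck`, the 60-second window, and the limit of three events are all hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class VelocityCheck:
    """Flags a card when it exceeds max_events inside a sliding time window."""

    def __init__(self, window_seconds=60, max_events=3):
        self.window_seconds = window_seconds
        self.max_events = max_events
        self._recent = defaultdict(deque)  # card id -> timestamps still in window

    def observe(self, card_id, timestamp):
        """Process one event as it arrives; return True if it should raise an alert."""
        recent = self._recent[card_id]
        recent.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_events


check = VelocityCheck(window_seconds=60, max_events=3)
events = [
    ("card-1", 0), ("card-1", 10), ("card-2", 12),
    ("card-1", 20), ("card-1", 30), ("card-1", 200),
]
# Each event is evaluated immediately, without waiting for a nightly batch.
alerts = [(card, ts) for card, ts in events if check.observe(card, ts)]
print(alerts)
```

The point of the design is that state is kept per card and updated on every event, so a decision is available the moment the suspicious transaction arrives rather than after a later batch run.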
Attunity and Big Data Integration
In supporting Business Intelligence, organizations rely on data warehouses as a central repository for data collected from a variety of sources. Normally, ETL (Extract, Transform, and Load) tools are used to pull data from the different sources, which is then cleansed and transformed into a usable format.
Because Big Data Integration requires communicating with a variety of data sources, some organizations use several data movement tools. The movement itself is usually done in batches at night, and can be a cumbersome, disruptive, and time-consuming process, even with an ETL automation tool. The most promising Change Data Capture (CDC) solutions provide filters designed to reduce the amount of data transferred. This minimizes resource requirements and maximizes speed and efficiency.
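A filter of the kind mentioned above can be pictured as a small function that drops change events for tables nobody downstream needs and trims the remaining rows to selected columns. The sketch below is hypothetical throughout: the event shape (`table`, `op`, `row`) and the function name `filter_changes` are invented for illustration and do not reflect any particular product's interface.

```python
def filter_changes(changes, tables, keep_columns):
    """Yield only changes for the selected tables, trimmed to the selected columns.

    tables:       set of table names to pass through
    keep_columns: dict mapping a table name to the set of columns to keep;
                  tables without an entry keep all of their columns
    """
    for change in changes:
        if change["table"] not in tables:
            continue  # not needed downstream, so never transferred
        row = change["row"]
        columns = keep_columns.get(change["table"])
        if columns is not None:
            row = {name: value for name, value in row.items() if name in columns}
        yield {"table": change["table"], "op": change["op"], "row": row}


# Hypothetical change events.
changes = [
    {"table": "orders", "op": "update",
     "row": {"id": 7, "status": "shipped", "notes": "gift wrap"}},
    {"table": "audit_log", "op": "insert", "row": {"id": 99, "msg": "login"}},
    {"table": "customers", "op": "insert", "row": {"id": 3, "name": "Ada"}},
]

selected = list(filter_changes(
    changes,
    tables={"orders", "customers"},
    keep_columns={"orders": {"id", "status"}},
))
print(selected)
```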
Big Data Analytics can influence and reshape a business’s sales, operations, and strategy. For example, processing customer data in real time can offer new revenue opportunities. Use of the Internet of Things can improve operational efficiency, provide new insights, and reduce risk. Machine Learning has the potential to accelerate and improve business predictions. As a Data Integration platform, Attunity Replicate makes it fairly easy to create and integrate Big Data, with no need for manual coding or a deep technical understanding of system interfaces.
Change Data Capture
Change Data Capture (CDC) describes software that is used to read (and track) data that has changed. CDC is an approach to Data Integration based on identifying, capturing, and delivering the changes made to data sources, rather than replacing a block of data in its entirety. It is generally faster and more efficient than reloading the full data set.
A primary goal of all Change Data Capture programs is greater efficiency. This is accomplished by keeping the amount of data moved and processed to a minimum. It is wasteful to transfer an entire data set when only specific changes need to be captured. Potter commented, “That’s the heart of what we do, and what makes us different. Our Change Data Capture moves just the changes.”
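The idea of moving just the changes can be sketched as follows. Many CDC tools read the source database’s transaction log; this example swaps in a much simpler technique, comparing two snapshots keyed by primary key, purely to show what a stream of change events looks like and how a target can be kept in sync without a full reload. The function names and the event shape are hypothetical.

```python
def capture_changes(before, after):
    """Compare two snapshots keyed by primary key and return only the changes."""
    changes = []
    for key, row in after.items():
        if key not in before:
            changes.append({"op": "insert", "key": key, "row": row})
        elif before[key] != row:
            changes.append({"op": "update", "key": key, "row": row})
    for key in before:
        if key not in after:
            changes.append({"op": "delete", "key": key})
    return changes


def apply_changes(target, changes):
    """Bring a target copy up to date by replaying the captured changes."""
    for change in changes:
        if change["op"] == "delete":
            target.pop(change["key"], None)
        else:
            target[change["key"]] = change["row"]
    return target


# Hypothetical source table at two points in time.
before = {1: {"name": "Ann"}, 2: {"name": "Bob"}, 3: {"name": "Cy"}}
after = {1: {"name": "Ann"}, 2: {"name": "Robert"}, 4: {"name": "Di"}}

changes = capture_changes(before, after)
replica = apply_changes(dict(before), changes)  # replica started as a copy of "before"
print(changes)
print(replica == after)
```

Row 1 is unchanged and never appears in `changes`, so only three small events cross the wire instead of the whole table.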
While CDC solutions are typically used in data warehouse environments, they can be applied to any database or data repository system. To be competitive, a business must continue to acquire the most up-to-date data available. CDC helps in acquiring that data and in streamlining the data correction process.
Attunity and The Cloud
There are two basic types of cloud: private and public. Hybrid cloud and multi-cloud describe setups in which the same organization or individual uses both, often with several vendors mixed in. Using different clouds gives access to different tools. Some organizations use multiple public clouds exclusively, which cuts IT and maintenance costs as well as initial installation costs, and cloud clients pay only for the time that is used. As people continue to move to the cloud, they are looking more and more for real-time data, followed by real-time decision-making. Potter commented:
“The Cloud has brought both the elasticity and cost advantages, but it has also fundamentally separated the concepts of storage and processing. That has really ushered in a new era of data infrastructure technologies, in support of modern analytics and modern operational needs.”
Change Data Capture, as a foundational technology, has found its place in this movement to the cloud, he said. A change may happen on any transactional system, even a 30-year-old legacy system such as a mainframe, the kind that runs most banking and insurance transactions today and is not going to change for a long time. “Most every time those transactions happen on those core systems we route them to cloud now. We route them to other systems. We route them where and when you need them,” said Potter of Attunity’s platform. That style of real-time Data Integration, done in a way that does not intrude on the strict requirements of those production systems, is what sets Attunity apart from others in the space.
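Routing each captured change to wherever it is needed is, in essence, a fan-out pattern. The sketch below shows that pattern in its simplest form and makes no claim about how Attunity’s platform is built: the `ChangeRouter` class, the target names, and the event shape are all hypothetical, and plain Python lists stand in for a cloud warehouse and a downstream service.

```python
class ChangeRouter:
    """Fans each change event out to every target subscribed to its table."""

    def __init__(self):
        # table name -> list of (target name, deliver callable)
        self._subscriptions = {}

    def subscribe(self, table, target, deliver):
        self._subscriptions.setdefault(table, []).append((target, deliver))

    def publish(self, change):
        """Deliver one change to all subscribers; return the target names reached."""
        reached = []
        for target, deliver in self._subscriptions.get(change["table"], []):
            deliver(change)
            reached.append(target)
        return reached


# Lists stand in for real targets such as a cloud warehouse or a fraud service.
cloud_warehouse, fraud_service = [], []

router = ChangeRouter()
router.subscribe("transactions", "cloud_warehouse", cloud_warehouse.append)
router.subscribe("transactions", "fraud_service", fraud_service.append)
router.subscribe("accounts", "cloud_warehouse", cloud_warehouse.append)

reached = router.publish(
    {"table": "transactions", "op": "insert", "row": {"id": 1, "amount": 250}}
)
router.publish({"table": "accounts", "op": "update", "row": {"id": 9, "tier": "gold"}})
print(reached, len(cloud_warehouse), len(fraud_service))
```

Because the source system only emits each change once and the router handles distribution, adding a new target later does not require touching the production system again.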
Potter said that they are able to say to customers:
“We can address all your needs today and in the future. We’re agnostic. Whatever data mart you want to move to, whether it’s stay on Teradata or move it to Snowflake in the cloud, we got you covered. You want to start on Amazon, but two years from now move to Azure? We can do that, and we can do all of that.”