A new bulk data uploader from Talend seeks to boost Azure data integration, specifically for users of Microsoft’s Azure SQL Data Warehouse cloud service.
The Talend tool’s immediate targets are the large data volumes now held in Azure Blob Storage or Azure Data Lake Storage repositories. Getting such data into Azure SQL Data Warehouse to support analytics uses can take considerable effort. Offered as part of the Talend Cloud Summer ’18 release, the software handles data in bulk and streamlines workflows via an interface that allows users to select data preparation steps.
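Talend has not published what its uploader generates under the hood, but the manual workflow a tool like this streamlines typically follows the PolyBase pattern Microsoft documents for Azure SQL Data Warehouse: stage the blob files as an external table, then load them into a distributed table with CREATE TABLE AS SELECT. The sketch below assembles that T-SQL; all table, column, data-source, and file-format names are hypothetical.

```python
def build_polybase_load(table, columns, location, data_source, file_format,
                        distribution_col):
    """Return the two T-SQL statements of a PolyBase-style bulk load:
    stage blob files as an external table, then CTAS the rows into a
    hash-distributed warehouse table."""
    ext = f"ext_{table}"
    col_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    stage = (
        f"CREATE EXTERNAL TABLE {ext} ({col_list}) "
        f"WITH (LOCATION = '{location}', "
        f"DATA_SOURCE = {data_source}, FILE_FORMAT = {file_format});"
    )
    load = (
        f"CREATE TABLE {table} "
        f"WITH (DISTRIBUTION = HASH({distribution_col})) "
        f"AS SELECT * FROM {ext};"
    )
    return [stage, load]


# Hypothetical example: load CSV click data staged in Blob Storage.
statements = build_polybase_load(
    table="clicks",
    columns=[("user_id", "BIGINT"), ("url", "NVARCHAR(2048)")],
    location="/clickdata/2018/",
    data_source="AzureBlobSource",  # assumed EXTERNAL DATA SOURCE name
    file_format="CsvFormat",        # assumed EXTERNAL FILE FORMAT name
    distribution_col="user_id",
)
for stmt in statements:
    print(stmt)
```

The two-step shape is the point: a cross-loading tool's value is in generating and sequencing statements like these, plus the data preparation steps around them, rather than leaving users to hand-write them per source.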
Although integration specialist Talend calls the new technology an uploader, a common first use case is for what might be described more as a cross-loader — that is, a tool for moving data around once it is in the Microsoft Azure cloud.
Such tools are useful, according to IDC analyst Stewart Bond, because some fundamental data-handling rules apply whether the data resides in a data center or in the cloud.
“The cloud is [akin to] a data center that data will move across,” Bond said. “That is why we’re seeing the major cloud providers build, partner with or acquire data movement and integration capabilities.”
Be aware of the blob
The shift to cloud-based object storage — Azure Blob Storage, in Microsoft’s case — changes the details, but not the overall tenor of data integration work, Bond added.
“Having data in a cloud data lake or blob storage that needs to be moved into a cloud data warehouse is no different than having data in staging areas in the data center prior to moving it into an on-premises data warehouse,” he said. “Putting data in the cloud doesn’t make it any cleaner or less messy than it would be anywhere else.”
For Microsoft cloud users, tools like Talend’s help move data “from Azure source to Azure destination,” so it can be more easily accessed for analytical querying, according to Vincent Lam, head of cloud product marketing at Talend, based in Redwood City, Calif. “You have all this data that has been aggregated [in the Azure cloud], which is terrific. But it’s hard to use,” he said.
Initially, when it comes to the cloud and data, “the data warehouse is the killer use case,” Lam continued. He said global bulk data moves such as those supported by the Talend uploader are key to improving Azure data integration performance.
Gen2 of Azure data warehouse arrives
These days, the target for Azure data integration jobs is often Azure SQL Data Warehouse, Microsoft’s competitive answer to Amazon Redshift and other cloud data warehouses.
The data warehouse technologies themselves are at the center of a cloud computing arms race, with updates coming regularly. For example, Microsoft released a second-generation version of its cloud data warehouse in late April; last week, it made the Azure SQL Data Warehouse Compute Optimized Gen2 service tier available in France and Australia, increasing the number of Azure regions where the updated software can be used to 22.
Gen2 introduces unlimited columnar storage capacity and boosts both computing power and query performance by five times over the first incarnation of Azure SQL Data Warehouse, according to Microsoft. The company also claimed that Gen2 can run up to 128 concurrent queries — four times more than the initial version, which Microsoft still offers as a lower-end option now known as Gen1.
Cloud integration tools in context
For Microsoft, specialized tools like the Talend bulk data uploader expand the scope of Azure data integration. That is important because filling cloud data warehouses with data for analytics applications is a first step for nascent cloud efforts in many organizations.
IDC’s Bond said tools for moving data within clouds such as AWS and Azure are being incorporated into integration platform-as-a-service (iPaaS) offerings. He named Informatica, Tibco, SnapLogic and Talend among the iPaaS vendors, along with other providers.
Moving data into the cloud in the first place is complicated, too, Bond said. He cited Talend, Attunity and others as providers of technologies that can alleviate some throughput constraints and accelerate data movement to the cloud.