Containers key for Hortonworks alliance on big data hybrid

NEW YORK — Hortonworks forged a deal with IBM and Red Hat to produce the Open Hybrid Architecture Initiative. The goal of the Hortonworks alliance is to build a common architecture for big data workloads running both on the cloud and in on-premises data centers.

Central to the Hortonworks alliance initiative is the use of containers orchestrated by Kubernetes. Such container-based data schemes for cloud increasingly appear to set the tone for big data architecture in organizations’ future data centers.

Hortonworks’ deal was discussed as part of the Strata Data Conference here, where computing heavyweight Dell EMC also disclosed an agreement with data container specialist BlueData Software to give users a reference architecture that brings cloud-style containers on premises.

Big data infrastructure shifts

Both deals indicate changes are afoot in infrastructure for big data. Container-based data schemes for cloud are starting to show the way that data will be handled in the future within organizations.

The Hortonworks alliance hybrid initiative, like the reference architectures from Dell EMC and others, reflects changes spurred by the multitude of analytics engines now available to handle data workloads and by the move of big data applications to the cloud, said Gartner analyst Arun Chandrasekaran in an interview.

“Historically, big data was about coupling compute and storage together. That worked pretty well when MapReduce was the sole engine. Now, there are multiple processing engines working on the same data lake,” Chandrasekaran said. “That means, in many cases, customers are thinking about decoupling compute and storage.”

De-linking computing and storage

Essentially, modern cloud deployments decouple compute and storage, Chandrasekaran said. This approach is driving greater interest in containerizing big data workloads for portability, he noted.

The shifts in architecture toward container orchestration show people want to use their infrastructure more efficiently, Chandrasekaran said.

The Hortonworks alliance with Red Hat and IBM shows a basic change is underway for the Hadoop-style open source distributed data processing framework. Cloud and on-premises architectural schemes are blending.

“We are decoupling storage and compute again,” said Arun Murthy, chief product officer and co-founder of Hortonworks, based in Santa Clara, Calif., in an interview. “As a result, the architecture will be consistent whether processing is on premises or on cloud or on different clouds.”

The elastic cloud

This style of architecture borrows from elastic cloud methods, in which resources run only when they are needed.

Strata Data Conference 2018
This week’s Strata Data Conference in New York included a focus on Hortonworks’ deal with IBM and Red Hat, an agreement between Dell EMC and BlueData, and more.

“In public cloud, you don’t keep the architecture up and running if you don’t have to,” Murthy said.

That’s compared with what Hadoop has done traditionally in the data center, where clusters were often configured and sitting ready for high-peak loads.

For Lars Herrmann, general manager of the integrated solutions business unit at Red Hat, based in Raleigh, N.C., the Hortonworks alliance project is a step toward bringing in a large class of data applications to run natively on the OpenShift container platform. It’s also about deploying applications more quickly.

“The idea of containerization of applications allows organizations to be more agile. It is part of the trend we see of people adopting DevOps methods,” Herrmann said.

Supercharging on-premises applications

For its part, Dell EMC sees spinning up data applications more quickly on premises as an important part of the reference architecture it has created with help from BlueData.

“With the container approach, you can deploy different software on demand to different infrastructure,” Kevin Gray, director of product marketing at Dell EMC, said in an interview at Strata Data.

The notion of multi-cloud support for containers is popular, and Hadoop management and deployment software providers are moving to support various clouds. At Strata, BlueData made its EPIC software available on Google Cloud Platform and Microsoft Azure. EPIC was already available on AWS.

Big data evolves to singular architecture

Tangible benefits will accrue as big data shops evolve toward a more singular architecture for data processing on the cloud and in the data center, said Mike Matchett, analyst and founder of Small World Big Data, in an interview at the conference.

“Platforms need to be built such that they can handle distributed work and deal with distributed data. They will be the same on premises as on the cloud. And, in most cases, they will be hybridized, so the data and the processing can flow back and forth,” Matchett said.

There still will be some special optimizations for performance, Matchett added. IT managers will decide, workload by workload, where particular analytics processing will occur.