Tag Archives: frameworks

Google adds single-tenant VMs for compliance, license concerns

Google’s latest VM runs counter to standard public cloud frameworks, but its added flexibility checks off another box for enterprise clients.

Google Cloud customers can now access sole-tenant nodes on Google Compute Engine. The benefits for these single-tenant VMs, currently in beta, are threefold: They reduce the “noisy neighbor” issue that can arise on shared servers; add another layer of security, particularly for users with data residency concerns; and make it easier to migrate certain on-premises workloads with stringent licensing restrictions.

The public cloud model was built on the concept of multi-tenancy, which lets providers squeeze more than one account onto the same physical host and thus operate at economies of scale. Early customers happily gave up the advantages of dedicated hardware in exchange for less infrastructure management and the ability to scale out quickly.

But as more traditional corporations adopt public cloud, providers have added isolation capabilities to approximate what’s inside enterprises’ own data centers, such as private networks, virtual private clouds and bare-metal servers. Single tenancy applies that approach down to the hardware level, while maintaining a virtualized architecture. AWS was the first to offer single-tenant VMs with its Dedicated Instances.

Customers access Google's single-tenant VMs the same way as its other compute instances, except they're placed on a dedicated server. That node's location is either auto-selected by a placement algorithm or chosen manually at launch. These instances are customizable in size and are charged per second for vCPU and system memory, plus a 10% sole-tenancy premium.
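The billing model described above (per-second charges for vCPU and memory, plus a 10% sole-tenancy premium) can be sketched with a rough cost estimator. The hourly rates below are placeholders for illustration, not Google's published prices:

```python
# Rough comparison of sole-tenant vs. standard multi-tenant cost.
# The vCPU and memory rates are placeholder values, NOT published prices.
VCPU_RATE_PER_HOUR = 0.031611   # hypothetical $/vCPU-hour
MEM_RATE_PER_HOUR = 0.004237    # hypothetical $/GB-hour
SOLE_TENANCY_PREMIUM = 0.10     # the 10% premium on vCPU + memory charges

def hourly_cost(vcpus: int, mem_gb: float, sole_tenant: bool = False) -> float:
    """Estimate hourly cost; sole tenancy adds a flat percentage premium."""
    base = vcpus * VCPU_RATE_PER_HOUR + mem_gb * MEM_RATE_PER_HOUR
    return base * (1 + SOLE_TENANCY_PREMIUM) if sole_tenant else base

shared = hourly_cost(16, 64)
dedicated = hourly_cost(16, 64, sole_tenant=True)
print(f"shared: ${shared:.4f}/h, sole-tenant: ${dedicated:.4f}/h")
```

Because the premium applies only to the vCPU and memory charges, the dedicated figure is always exactly 10% above the equivalent shared instance in this sketch.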

Single-tenant VMs another step for Google Cloud’s enterprise appeal

Google still lags behind AWS and Microsoft Azure in public cloud capabilities, but it has added services and support in recent months to shake its image as a cloud valued solely for its engineering. Google must expand its enterprise customer base, especially with large organizations in which multiple stakeholders sign off on use of a particular cloud, said Fernando Montenegro, a 451 Research analyst.

Not all companies will pay the premium for this functionality, but it could be critical to those with compliance concerns, including those that must prove they're on dedicated hardware in a specific location. For example, a DevOps team may want to build a CI/CD pipeline that releases into production, but a risk-averse security team might balk. With sole tenancy, the DevOps team has the flexibility to spin instances up and down, while the security team can sign off because the setup meets an internal or external requirement.

“I can see security people being happy that we can meet our DevOps team halfway, so they can have their DevOps cake and we can have our security compliance cake, too,” Montenegro said.


A less obvious benefit of dedicated hardware involves the lift and shift of legacy systems to the cloud. A traditional ERP contract may require a specific set of sockets or hosts, and it can be a daunting task to ensure a customer complies with licensing stipulations on a multi-tenant platform because the requirements aren’t tied to the VM.

In a bring-your-own-license scenario, these dedicated hosts can optimize customers’ license spending and reduce the cost to run those systems on a public cloud, said Deepak Mohan, an IDC analyst.

“This is certainly an important feature from an enterprise app migration perspective, where security and licensing are often top priority considerations when moving to cloud,” he said.

The noisy neighbor problem arises when a user is concerned that high CPU or IO usage by another VM on the same server will impact the performance of its own application, Mohan said.

“One of the interesting customer examples I heard was a latency-sensitive function that needed to compute and send the response within as short a duration as possible,” he said. “They used dedicated hosts on AWS because they could control resource usage on the server.”

Still, don’t expect this to be the type of feature that a ton of users rush to implement.

“[A single-tenant VM] is most useful where strict compliance/governance is required, and you need it in the public cloud,” said Abhi Dugar, an IDC analyst. “If operating under such strict criteria, it is likely easier to just keep it on prem, so I think it’s a relatively niche use case to put dedicated instances in the cloud.”

Databricks platform additions unify machine learning frameworks

SAN FRANCISCO — Open source machine learning frameworks have multiplied in recent years, as enterprises pursue operational gains through AI. Along the way, the situation has formed a jumble of competing tools, creating a nightmare for development teams tasked with supporting them all.

Databricks, which offers managed versions of the Spark compute platform in the cloud, is making a play for enterprises that are struggling to keep pace with this environment. At Spark + AI Summit 2018, which was hosted by Databricks here this week, the company announced updates to its platform and to Spark that it said will help bring the diverse array of machine learning frameworks under one roof.

Unifying machine learning frameworks

MLflow is a new open source framework on the Databricks platform that integrates with Spark, scikit-learn, TensorFlow and other open source machine learning tools. It lets data scientists package machine learning code into reproducible modules, conduct and compare parallel experiments, and deploy production-ready models.
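The tracking idea at the heart of MLflow, logging each run's parameters and metrics so parallel experiments can be compared side by side, can be sketched in plain Python. This is a concept sketch only, not the MLflow API; the `Tracker` and `Run` names are hypothetical:

```python
# Minimal experiment tracker illustrating the "compare parallel runs" idea.
# Concept sketch only; this is not the MLflow API.
from dataclasses import dataclass, field

@dataclass
class Run:
    params: dict                                  # hyperparameters for this run
    metrics: dict = field(default_factory=dict)   # results logged by the run

class Tracker:
    def __init__(self):
        self.runs = []

    def log_run(self, params: dict, metrics: dict) -> Run:
        """Record one experiment's settings and its measured results."""
        run = Run(params=dict(params), metrics=dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric: str, maximize: bool = True) -> Run:
        """Compare all logged runs on one metric, e.g. validation accuracy."""
        key = lambda r: r.metrics[metric]
        return max(self.runs, key=key) if maximize else min(self.runs, key=key)

tracker = Tracker()
tracker.log_run({"lr": 0.1}, {"val_acc": 0.81})
tracker.log_run({"lr": 0.01}, {"val_acc": 0.88})
print(tracker.best_run("val_acc").params)  # hyperparameters of the winning run
```

MLflow layers persistence and a UI on top of this kind of bookkeeping, which is what makes runs reproducible and comparable across a team.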

Databricks also introduced a new product on its platform, called Runtime for ML. This is a preconfigured Spark cluster that comes loaded with distributed machine learning frameworks commonly used for deep learning, including Keras, Horovod and TensorFlow, eliminating the integration work data scientists typically have to do when adopting a new tool.

Databricks’ other announcement, a tool called Delta, is aimed at improving data quality for machine learning modeling. Delta sits on top of data lakes, which typically contain large amounts of unstructured data. Data scientists can specify a schema they want their training data to match, and Delta will pull in all the data in the data lake that fits the specified schema, leaving out data that doesn’t fit.
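The keep-or-drop behavior described above can be illustrated with a small sketch. This shows the concept only and is not Delta's actual API; the `SCHEMA` value and `matches_schema` helper are hypothetical:

```python
# Concept sketch of schema enforcement over a "data lake" of mixed records.
# Not the Delta API; it only illustrates the keep/drop behavior described.
SCHEMA = {"user_id": int, "amount": float}   # hypothetical training schema

def matches_schema(record: dict, schema: dict) -> bool:
    # A record conforms if every schema field is present with the right type.
    return all(
        name in record and isinstance(record[name], typ)
        for name, typ in schema.items()
    )

data_lake = [
    {"user_id": 1, "amount": 9.99},        # conforms, kept
    {"user_id": "two", "amount": 5.00},    # wrong type, dropped
    {"user_id": 3},                        # missing field, dropped
]

training_data = [r for r in data_lake if matches_schema(r, SCHEMA)]
print(training_data)
```

The payoff is that models train only on records of a known shape, rather than on whatever happens to land in the lake.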

MLflow includes a tracking interface for logging the results of machine learning jobs.

Users want everything under one roof

Each of the new tools is either in a public preview or alpha test stage, so few users have had a chance to get their hands on them. But attendees at the conference were broadly happy about the approach of stitching together disparate frameworks more tightly.

Saman Michael Far, senior vice president of technology at the Financial Industry Regulatory Authority (FINRA) in Washington, D.C., said in a keynote presentation that he brought in the Databricks platform largely because it already supports several languages, including R, Python and SQL. Integrating those languages more closely with machine learning frameworks will help FINRA use more machine learning in its goal of spotting potentially illegal financial trades.


“It’s removed a lot of the obstacles that seemed inherent to doing machine learning in a business environment,” Far said.

John Gole, senior director of business analysis and product management at Capital One, based in McLean, Va., said the financial services company has implemented Spark throughout its operational departments, including marketing, accounts management and business reporting. The platform is being used for tasks that range from extract, transform and load jobs to SQL querying for ad hoc analysis and machine learning. It’s this unified nature of Spark that made it attractive, Gole said.

Going forward, he said he expects this kind of unified platform to become even more valuable as enterprises bring more machine learning to the center of their operations.

“You have to take a unified approach,” Gole said. “Pick technologies that help you unify your data and operations.”

Bringing together a range of tools

Engineers at ride-sharing platform Uber have already built integrations similar to what Databricks unveiled at the conference. In a presentation, Atul Gupte, a product manager at Uber, based in San Francisco, described a data science workbench his team created that brings together a range of tools — including Jupyter, R and Python — into a web-based environment that’s powered by Spark on the back end. The platform is used for all the company’s machine learning jobs, like training models to cluster rider pickups in Uber Pool or forecast rider demand so the app can encourage more drivers to get out on the roads.

As the company grew from a startup into a large enterprise, Gupte said, the old way of working, in which everyone operated in a silo with their own tool of choice, didn't scale. That's why it was important to take a more standardized approach to data analysis and machine learning.

“The power is that everyone is now working together,” Gupte said. “You don’t have to keep switching tools. It’s a pretty foundational change in the way teams are working.”

What are the options for OpenStack-supported hypervisors?

A private cloud lets an enterprise deliver on-demand resources while retaining control over its own infrastructure and data. Cloud frameworks don't provide the underlying virtualization for an enterprise private cloud, which makes it critical that a framework support as many major hypervisors as possible. There is a wide range of OpenStack-supported hypervisors, and you should carefully consider the level of support each provides and how that matches your particular needs.

Together, VMware and Microsoft currently hold the majority of the hypervisor market. Microsoft Hyper-V can run Windows, Linux and FreeBSD VMs under OpenStack, while VMware vSphere 5.1.0 and later supports VMware-based Linux and Windows images through vCenter Server. XenServer and Xen Cloud Platform can run Linux or Windows VMs, though the Nova compute service must be installed in a paravirtualized VM. Nova also supports bare-metal provisioning and control through the Ironic driver.

Use libvirt with Linux-based hypervisors


Many OpenStack-supported hypervisors are Linux-based and typically require the open source libvirt API for virtualization and management. For example, libvirt enables the Kernel-based Virtual Machine (KVM) under OpenStack, and KVM builds are available for conventional x86 processors as well as PowerPC and Power Architecture processors and IBM System/390 mainframes. The Xen Project hypervisor runs under libvirt to support Linux, Windows, FreeBSD and NetBSD VMs under OpenStack Nova. Libvirt also supports Virtuozzo 7.0.0 and later for containers and KVM-based VMs.
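In an OpenStack deployment, the libvirt-managed hypervisor is typically selected in Nova's configuration file. A minimal sketch, using real Nova option names but deployment-specific placeholder values:

```ini
# Hypothetical nova.conf excerpt. The option names are real Nova settings,
# but the values must match your own deployment.
[DEFAULT]
compute_driver = libvirt.LibvirtDriver

[libvirt]
# virt_type selects which libvirt-managed hypervisor Nova drives,
# e.g. kvm, qemu, lxc or parallels (Virtuozzo).
virt_type = kvm
```

Switching the backing hypervisor is then largely a matter of changing this setting and ensuring the corresponding packages are installed on the compute nodes.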

Generally, OpenStack will also use libvirt to support Linux Containers (LXC), QEMU (Quick EMUlator) and User-mode Linux, though these platforms are rarely used outside of legacy application maintenance.

It's important to remember that not all hypervisors are created equal: OpenStack-supported hypervisors might not receive the same level of support, stability, performance or interoperability. Private cloud adopters should invest time in due-diligence tests and experiments to verify compatibility between the chosen hypervisor and cloud framework, and to confirm the results meet the enterprise's specific needs.