
Containers key for Hortonworks alliance on big data hybrid

NEW YORK — Hortonworks forged a deal with IBM and Red Hat to produce the Open Hybrid Architecture Initiative. The goal of the Hortonworks alliance is to build a common architecture for big data workloads running both on the cloud and in on-premises data centers.

Central to the Hortonworks alliance initiative is the use of containers orchestrated by Kubernetes. Such cloud-proven, container-based data schemes increasingly appear to set the tone for the big data architectures organizations will run in their own data centers.

Hortonworks’ deal was discussed as part of the Strata Data Conference here, where computing heavyweight Dell EMC also disclosed an agreement with data container specialist BlueData Software to offer users a reference architecture that brings cloud-style containers on premises.

Big data infrastructure shifts

Both deals indicate changes are afoot in big data infrastructure, with cloud-style, container-based schemes starting to shape how organizations will handle data in the future.

The Hortonworks alliance hybrid initiative — along with Dell’s and other vendors’ reference architectures — reflects changes spurred by the multitude of analytics engines now available to handle data workloads and by the movement of big data applications to the cloud, said Gartner analyst Arun Chandrasekaran in an interview.

“Historically, big data was about coupling compute and storage together. That worked pretty well when MapReduce was the sole engine. Now, there are multiple processing engines working on the same data lake,” Chandrasekaran said. “That means, in many cases, customers are thinking about decoupling compute and storage.”

De-linking computing and storage

Essentially, modern cloud deployments decouple compute and storage, Chandrasekaran said. That approach, in turn, is driving greater interest in containerizing big data workloads for portability, he noted.

We are decoupling storage and compute again.
Arun Murthy, chief product officer and co-founder, Hortonworks

The shifts in architecture toward container orchestration show people want to use their infrastructure more efficiently, Chandrasekaran said.

The Hortonworks alliance with Red Hat and IBM shows a basic change is underway for the Hadoop-style open source distributed data processing framework. Cloud and on-premises architectural schemes are blending.

“We are decoupling storage and compute again,” said Arun Murthy, chief product officer and co-founder of Hortonworks, based in Santa Clara, Calif., in an interview. “As a result, the architecture will be consistent whether processing is on premises or on cloud or on different clouds.”
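
In practice, decoupling compute from storage usually means pointing a containerized processing engine at shared object storage rather than at disks local to the cluster. The following minimal PySpark sketch illustrates that pattern; the bucket name and column are hypothetical, and it assumes object-store credentials are already configured.

    from pyspark.sql import SparkSession

    # The same job definition can run on an on-premises Kubernetes cluster
    # or on a managed cloud service.
    spark = SparkSession.builder.appName("decoupled-analytics").getOrCreate()

    # Read from shared object storage (hypothetical bucket) instead of
    # node-local HDFS, so compute can scale or move independently of the data.
    events = spark.read.parquet("s3a://my-data-lake/events/")

    events.groupBy("event_type").count().show()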

The elastic cloud

This style of architecture takes its cue from elastic cloud methods.


“In public cloud, you don’t keep the architecture up and running if you don’t have to,” Murthy said.

That contrasts with how Hadoop has traditionally run in the data center, where clusters were often configured and kept running, ready for peak loads.

For Lars Herrmann, general manager of the integrated solutions business unit at Red Hat, based in Raleigh, N.C., the Hortonworks alliance project is a step toward bringing a large class of data applications to run natively on the OpenShift container platform. It’s also about deploying applications more quickly.

“The idea of containerization of applications allows organizations to be more agile. It is part of the trend we see of people adopting DevOps methods,” Herrmann said.

Supercharging on-premises applications

For its part, Dell EMC sees spinning up data applications more quickly on premises as an important part of the reference architecture it has created with help from BlueData.

“With the container approach, you can deploy different software on demand to different infrastructure,” Kevin Gray, director of product marketing at Dell EMC, said in an interview at Strata Data.

The notion of multi-cloud support for containers is popular, and Hadoop management and deployment software providers are moving to support various clouds. At Strata, BlueData made its EPIC software available on Google Cloud Platform and Microsoft Azure. EPIC had previously been available on AWS.

Big data evolves to singular architecture

Tangible benefits will accrue as big data shops evolve toward a more unified architecture for data processing in the cloud and in the data center, said Mike Matchett, analyst and founder of Small World Big Data, in an interview at the conference.

“Platforms need to be built such that they can handle distributed work and deal with distributed data. They will be the same on premises as on the cloud. And, in most cases, they will be hybridized, so the data and the processing can flow back and forth,” Matchett said.

There will still be some special optimizations for performance, Matchett added, and IT managers will decide, based on the workload, where particular analytics processing will occur.

Matt Wood talks AWS’ AI platform, ethical use

NEW YORK — AWS spotlighted its evolving AI offerings at AWS Summit this week, with a variety of features and upgrades.

The company incorporated one emerging technology, Amazon Rekognition, into the event’s registration process, scanning consenting attendees’ faces and comparing them against photos they had submitted previously during registration.

But despite guidelines for customer use, the AWS AI platform is not immune to growing concerns over potentially unethical uses of these advanced systems. Civil rights advocacy groups worry that technology providers’ breakneck pace to provide AI capabilities, such as Rekognition, could lead to abuses of power in the public sector and law enforcement, among others.

Matt Wood, AWS general manager of deep learning and AI, discussed advancements to the AWS AI platform, adoption trends, customer demands and ethical concerns in this interview.

AWS has added a batch transform feature to its SageMaker machine learning platform to process data sets for non-real-time inferencing. How does that capability apply to customers trying to process larger data files?

Matt Wood: We support the two major ways you’d want to run predictions. You want to run predictions against fresh data as it arrives in real time; you can do that with SageMaker-hosted endpoints. But there are tons of cases in which you want to be able to apply predictions to large amounts of data, either that just arrives or gets exported from a data warehouse, or that is just too large in terms of the raw data size to process one by one. These two things are highly complementary.

We see a lot of customers that want to run billing reports or forecasting. They want to look at product sales at the end of a quarter or the end of a month [and] predict the demand going forward. Another really good example is [to] build a machine learning model and test it out on a data set you understand really well, which is really common in oil and gas, medicine and medical imaging.
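
For illustration, a batch transform job like the one Wood describes can be launched from the SageMaker Python SDK roughly as follows; the container image, role and S3 paths are hypothetical placeholders, and exact parameter names can vary by SDK version.

    import sagemaker
    from sagemaker.model import Model

    session = sagemaker.Session()

    # Wrap an already-trained model artifact (hypothetical image and paths).
    model = Model(
        image_uri="<inference-container-image>",
        model_data="s3://my-bucket/models/model.tar.gz",
        role="<execution-role-arn>",
        sagemaker_session=session,
    )

    # Create a transformer for offline, non-real-time inference.
    transformer = model.transformer(
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/batch-predictions/",
    )

    # Score an entire data set, such as an export from a data warehouse, in one job.
    transformer.transform(
        data="s3://my-bucket/batch-input/",
        content_type="text/csv",
        split_type="Line",
    )
    transformer.wait()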

In the keynote, you cited 100 new machine learning features or services [AWS has developed] since re:Invent last year. What feedback do you get from customers for your current slate [of AI services]?

Wood: What we heard very clearly was a couple things. No. 1, customers really value strong encryption and strong network isolation. A lot of that has to do with making sure customers have good encryption integrated with Key Management Service inside SageMaker. We also recently added PrivateLink support, which means you can connect up your notebooks and training environment directly to DynamoDB, Redshift or S3 without that data ever flowing out over the public internet. And you can put your endpoints over PrivateLink as well. [Another] big trend is around customers using multiple frameworks together. You’ll see a lot of focus on improving TensorFlow, improving Apache MXNet, adding Chainer support, adding PyTorch support and making sure ONNX [Open Neural Network Exchange] works really well across those engines so that customers can take models trained in one and run them in a different engine.
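
As a rough sketch of the encryption and network isolation options Wood mentions, a SageMaker training job can be pointed at a customer-managed KMS key and confined to private subnets; the ARNs, IDs and image below are placeholders, and parameter names may differ slightly across SDK versions.

    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="<training-container-image>",   # hypothetical image
        role="<execution-role-arn>",
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        # Encrypt output artifacts and attached volumes with a customer-managed key.
        output_kms_key="<kms-key-arn>",
        volume_kms_key="<kms-key-arn>",
        # Keep training traffic inside the VPC so data reaches storage privately.
        subnets=["subnet-aaaa1111"],
        security_group_ids=["sg-bbbb2222"],
        output_path="s3://my-bucket/training-output/",
    )

    estimator.fit({"train": "s3://my-bucket/training-data/"})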

Matt Wood, AWS general manager of deep learning and AI, speaks during the AWS Summit keynote address (Source: AWS).

What do you hear from enterprises that are reluctant or slow to adopt AI technologies? And what do you feel that you have to prove to those customers?

Wood: It’s still early for a lot of enterprises, and particularly for regulated workloads, there’s a lot of due diligence to do — around HIPAA [Health Insurance Portability and Accountability Act], for example, getting HIPAA compliance in place. The question is: ‘How can I move more quickly?’ That’s what we hear all the time.

There’s two main pathways that we see [enterprises take] today. The first is: They try and look at the academic literature, [which] is very fast-moving, but also very abstract. It’s hard to apply it to real business problems. The other is: You look around on the web, find some tutorials and try to learn it that way. That often gives you something which is up and running that works, but again, it glosses over the fundamentals of how do you collect training data, how do you label that data, how do you build and define a neural network, how do you train that neural network.

To help developers learn, you want a very fast feedback loop. You want to be able to try something out, learn from it, what worked and what didn’t work, then make a change. It’s kick-starting that flywheel, which is very challenging with machine learning.

What are some usage patterns or trends you’ve seen from SageMaker adopters that are particularly interesting?

Wood: A really big one is sports analytics. Major League Baseball selected SageMaker and the AWS AI platform to power their production stats that they use in their broadcasts and on [their] app. They’ve got some amazing ideas about how to build more predictive and more engaging visuals and analytics for their users. [It’s the] same thing with Formula 1 [F1]. They’re taking 65 years’ worth of performance data from the cars — they have terabytes of the stuff — to model different performance of different cars but also to look at race prediction and build an entirely new category of visuals for F1 fans. The NFL [is] doing everything from computer vision to using player telemetry, using their position on the field to do route prediction and things like that. Sports analytics drives such an improvement in the experience for fans, and it’s a big area of investment for us.

Another is healthcare and medical imaging. We see a lot of medical use cases — things like disease prediction, such as how likely are you to have congestive heart failure in the next 12 months, do outpatient prediction, readmittance prediction, those sorts of things. We can actually look inside an X-ray and identify very early-stage lung cancer before the patient even knows that they’re sick. [And] you can run that test so cheaply. You can basically run it against any chest X-ray.

You partnered with Microsoft on Gluon, the deep learning library. What’s the status of that project? What other areas might you collaborate with Microsoft or another major vendor on an AI project?

Wood: Gluon is off to a great start. Celgene, a biotech that’s doing drug toxicity prediction, is trying to speed up clinical trials to get drugs to market more quickly. All of that runs in SageMaker, and they use Gluon to build models. That’s one example; we have more.

Other areas of collaboration we see is around other engines. For example, we were a launch partner for PyTorch 1.0 [a Python-based machine learning library, at Facebook’s F8 conference]. PyTorch has a ton of interest from research scientists, particularly in academia, [and we] bring that up to SageMaker and work with Facebook on the development.

Microsoft President Bradford Smith recently called on Congress to consider federal regulation for facial recognition services. What is Amazon’s stance on AI regulation? How much should customers determine ethical use of AI, facial recognition or other cloud services, and what is AWS’ responsibility?

Wood: Our approach is that Rekognition, like all of our services, falls under our Acceptable Use Policy, [which] is very clear with what it allows and what it does not allow. One of the things that it does not allow is anything unconstitutional; mass surveillance, for example, is ruled out. We’re very clear that customers need to take that responsibility, and if they fall outside our Acceptable Use [Policy], just like anyone else on AWS, they will lose access to those services, because we won’t support them. They need to be responsible with how they test, validate and communicate their use of these technologies because they can be hugely impactful.

Amazon Rekognition kiosks scan the faces of attendees and print identification badges (Source: David Carty).

The American Civil Liberties Union, among others, has asked AWS to stop selling Rekognition to law enforcement agencies. Will you comply with that request? If not, under what circumstances might that decision change?

Wood: Again, that’s covered under our Acceptable Use Policy. If any customer in any domain is using any of our services in a way which falls outside of acceptable use, then they will lose access to that service.

Certainly, the Acceptable Use Policy covers lawful use, but do you think that also covers ethical use? That’s a thornier question.

Wood: It is a thornier question. I think it’s part of a broader dialogue that we need to have, just as we’ve had with motor cars and any large-scale technology which provides a lot of opportunity, but which also needs a public and open discussion.

Debugging data: Microsoft researchers look at ways to train AI systems to reflect the real world – The AI Blog

Hanna Wallach is a senior researcher in Microsoft’s New York City research lab. Photo by John Brecher.

Artificial intelligence is already helping people do things like type faster texts and take better pictures, and it’s increasingly being used to make even bigger decisions, such as who gets a new job and who goes to jail. That’s prompting researchers across Microsoft and throughout the machine learning community to ensure that the data used to develop AI systems reflect the real world, are safeguarded against unintended bias and are handled in ways that are transparent and respectful of privacy and security.

Data is the food that fuels machine learning. It’s the representation of the world that is used to train machine learning models, explained Hanna Wallach, a senior researcher in Microsoft’s New York research lab. Wallach is a program co-chair of the Annual Conference on Neural Information Processing Systems from Dec. 4 to Dec. 9 in Long Beach, California. The conference, better known as “NIPS,” is expected to draw thousands of computer scientists from industry and academia to discuss machine learning – the branch of AI that focuses on systems that learn from data.

“We often talk about datasets as if they are these well-defined things with clear boundaries, but the reality is that as machine learning becomes more prevalent in society, datasets are increasingly taken from real-world scenarios, such as social processes, that don’t have clear boundaries,” said Wallach, who together with the other program co-chairs introduced a new subject area at NIPS on fairness, accountability and transparency. “When you are constructing or choosing a dataset, you have to ask, ‘Is this dataset representative of the population that I am trying to model?’”
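
Wallach’s question can be made concrete with a quick sanity check that compares the make-up of the training data against the population a model is meant to serve. A minimal sketch, assuming a pandas DataFrame with a hypothetical "gender" column and externally sourced population proportions:

    import pandas as pd

    # Hypothetical training data and reference population proportions.
    train = pd.read_csv("training_data.csv")
    population = {"female": 0.51, "male": 0.49}

    # Share of each group actually present in the training set.
    sample_share = train["gender"].value_counts(normalize=True)

    # Flag groups that are badly under-represented relative to the population.
    for group, expected in population.items():
        observed = sample_share.get(group, 0.0)
        if observed < 0.8 * expected:
            print(f"{group}: {observed:.1%} of the data vs. {expected:.1%} of the population")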

Kate Crawford, a principal researcher at Microsoft’s New York research lab, calls it “the trouble with bias,” and it’s the central focus of an invited talk she will be giving at NIPS.

“The people who are collecting the datasets decide that, ‘Oh this represents what men and women do, or this represents all human actions or human faces.’ These are types of decisions that are made when we create what are called datasets,” she said. “What is interesting about training datasets is that they will always bear the marks of history, that history will be human, and it will always have the same kind of frailties and biases that humans have.”

Researchers are also looking at the separate but related issue of whether there is enough diversity among AI researchers. Research has shown that more diverse teams choose more diverse problems to work on and produce more innovative solutions. Two events co-located with NIPS will address this issue: the 12th Women in Machine Learning Workshop, where Wallach, who co-founded Women in Machine Learning, will give an invited talk on the merger of machine learning with the social sciences, and the Black in AI workshop, which was co-founded by Timnit Gebru, a post-doctoral researcher at Microsoft’s New York lab.

“In some types of scientific disciplines, it doesn’t matter who finds the truth, there is just a particular truth to be found. AI is not exactly like that,” said Gebru. “We define what kinds of problems we want to solve as researchers. If we don’t have diversity in our set of researchers, we are at risk of solving a narrow set of problems that a few homogeneous groups of people think are important, and we are at risk of not addressing the problems that are faced by many people in the world.”

Timnit Gebru is a post-doctoral researcher at Microsoft’s New York City research lab. Photo by Peter DaSilva.

Machine learning core

At its core, NIPS is an academic conference with hundreds of papers that describe the development of machine learning models and the data used to train them.

Microsoft researchers authored or co-authored 43 accepted conference papers. They describe everything from the latest advances in retrieving data stored in synthetic DNA to a method for repeatedly collecting telemetry data from user devices without compromising user privacy.

Nearly every paper presented at NIPS over the past three decades considers data in some way, noted Wallach. “The difference in recent years, though,” she added, “is that machine learning no longer exists in a purely academic context, where people use synthetic or standard datasets. Rather, it’s something that affects all kinds of aspects of our lives.”

The application of machine-learning models to real-world problems and challenges is, in turn, bringing into focus issues of fairness, accountability and transparency.

“People are becoming more aware of the influence that algorithms have on their lives, determining everything from what news they read to what products they buy to whether or not they get a loan. It’s natural that as people become more aware, they grow more concerned about what these algorithms are actually doing and where they get their data,” said Jenn Wortman Vaughan, a senior researcher at Microsoft’s New York lab.

The trouble with bias

Data is not something that exists in the world as an object that everyone can see and recognize, explained Crawford. Rather, data is made. When scientists first began to catalog the history of the natural world, they recognized types of information as data, she noted. Today, scientists also see data as a construct of human history.

Crawford’s invited talk at NIPS will highlight examples of machine learning bias such as news organization ProPublica’s investigation that exposed bias against African-Americans in an algorithm used by courts and law enforcement to predict the tendency of convicted criminals to reoffend, and then discuss how to address such bias.

“We can’t simply boost a signal or tweak a convolutional neural network to resolve this issue,” she said. “We need to have a deeper sense of what is the history of structural inequity and bias in these systems.”

One method to address bias, according to Crawford, is to take what she calls a social system analysis approach to the conception, design, deployment and regulation of AI systems to think through all the possible effects of AI systems. She recently described the approach in a commentary for the journal Nature.

Crawford noted that this isn’t a challenge that computer scientists will solve alone. She is also a co-founder of the AI Now Institute, a first-of-its-kind interdisciplinary research institute based at New York University that was launched in November to bring together social scientists, computer scientists, lawyers, economists and engineers to study the social implications of AI, machine learning and algorithmic decision making.

Jenn Wortman Vaughan is a senior researcher at Microsoft’s New York City research lab. Photo by John Brecher.

Interpretable machine learning

One way to address concerns about AI and machine learning is to prioritize transparency by making AI systems easier for humans to interpret. At NIPS, Vaughan, one of the New York lab’s researchers, will give a talk describing a large-scale experiment that she and colleagues are running to learn what factors make machine learning models interpretable and understandable for non-machine learning experts.

“The idea here is to add more transparency to algorithmic predictions so that decision makers understand why a particular prediction is made,” said Vaughan.

For example, does the number of features or inputs to a model impact a person’s ability to catch instances where the model makes a mistake? Do people trust a model more when they can see how a model makes its prediction as opposed to when the model is a black box?

The research, said Vaughan, is a first step toward the development of “tools aimed at helping decision makers understand the data used to train their models and the inherent uncertainty in their models’ predictions.”
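
As a simple illustration of the transparency Vaughan describes, an interpretable model such as a small linear regression exposes how much each input contributed to a prediction; the features and numbers below are invented for the example.

    from sklearn.linear_model import LinearRegression

    # Hypothetical loan applications: [income, debt, years_employed].
    X = [[55, 10, 4], [80, 35, 10], [42, 20, 1], [95, 5, 12]]
    y = [210, 260, 150, 340]   # approved credit line, in thousands

    model = LinearRegression().fit(X, y)

    # Each coefficient states how a one-unit change in that feature moves the
    # prediction, so a decision maker can see why a number was produced.
    for name, coef in zip(["income", "debt", "years_employed"], model.coef_):
        print(f"{name}: {coef:+.2f}")

    print("prediction for a new applicant:", model.predict([[60, 15, 3]])[0])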

Patrice Simard, a distinguished engineer at Microsoft’s Redmond, Washington, research lab who is co-organizing a NIPS symposium on the topic, said the field of interpretable machine learning should take a cue from computer programming, where practitioners have learned the art of decomposing problems into smaller problems with simple, understandable steps. “But in machine learning, we are completely behind. We don’t have the infrastructure,” he said.

To catch up, Simard advocates a shift to what he calls machine teaching – giving machines features to look for when solving a problem, rather than looking for patterns in mountains of data. Instead of training a machine learning model for car buying with millions of images of cars labeled as good or bad, teach a model about features such as fuel economy and crash-test safety, he explained.

The teaching strategy is deliberate, he added, and results in an interpretable hierarchy of concepts used to train machine learning models.
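
A minimal sketch of the contrast Simard draws, using a handful of made-up car listings: instead of millions of labeled images, the model is taught with a few teacher-chosen features such as fuel economy and crash-test rating, and the resulting tree reads as a small hierarchy of understandable concepts.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Teacher-chosen features: [fuel_economy_mpg, crash_test_rating].
    cars = [[34, 5], [18, 3], [29, 4], [15, 2], [40, 5], [22, 3]]
    labels = ["good", "bad", "good", "bad", "good", "bad"]   # hypothetical judgments

    model = DecisionTreeClassifier(max_depth=2).fit(cars, labels)

    # The learned rules can be read directly, feature by feature.
    print(export_text(model, feature_names=["fuel_economy_mpg", "crash_test_rating"]))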

Researcher diversity

One step to safeguard against unintended bias creeping into AI systems is to encourage diversity in the field, noted Gebru, the co-organizer of the Black in AI workshop co-located with NIPS. “You want to make sure that the knowledge that people have of AI training is distributed around the world and across genders and ethnicities,” she said.

The importance of researcher diversity struck Wallach, the NIPS program co-chair, at her fourth NIPS conference in 2005. For the first time, she was sharing a hotel room with three roommates, all of them women. One of them was Vaughan, and the two of them, along with one of their roommates, co-founded the Women in Machine Learning group, which is now in its 12th year and has held a workshop co-located with NIPS since 2008. This year, more than 650 women are expected to attend.

Wallach will give an invited talk at the Women in Machine Learning Workshop about how she applies machine learning in the context of social science to measure unobservable theoretical constructs such as community membership or topics of discussion.

“Whenever you are working with data that is situated within societal contexts,” she said, “necessarily it is important to think about questions of ethics, fairness, accountability, transparency and privacy.”


John Roach writes about Microsoft research and innovation. Follow him on Twitter.

Visual Studio Live Share aims to spur developer collaboration

NEW YORK — Developers at Microsoft’s event here last week got a sneak peek at a tool that aims to boost programmer productivity and improve application quality.

Microsoft’s Visual Studio Live Share, displayed at its Connect(); 2017 conference, lets developers work on the same code in real time. It also bolsters the company’s credibility with developers by delivering tools and services that make their jobs easier.

The software brings the Agile practice of pair programming to a broader set of programmers, except the programmers do not need to be physically together. Developers can remotely access and debug the same code in their respective editor or integrated development environment and share their full project context, rather than just their screens. Visual Studio Live Share works across multiple machines. Interested developers can sign up to join the Visual Studio Live Share preview, set for early 2018. It will be a limited, U.S.-only preview.

“It works not just between Visual Studio Code sessions between two Macs or between two Visual Studio sessions on Windows, but you can, in fact, have teams composed of multiple different parts of the Visual Studio family on multiple different operating systems all developing simultaneously,” said Scott Guthrie, executive vice president in Microsoft’s cloud and enterprise group.

The ability for developers to collaboratively debug and enhance the quality of applications in real time is extremely useful for developers looking for help with coding issues. While the capability has been around in various forms for 20 years, by integrating it into the Visual Studio tool set, Microsoft aims to standardize live sharing of code.

Scott Guthrie, Microsoft executive vice president of cloud and enterprise, presenting the keynote at Connect(); 2017.

“I will be happy to see full collaboration make it to a shipping product,” said Theresa Lanowitz, an analyst at Voke, a research firm in Minden, Nev. “I had that capability shipping in 1994 at Taligent.”

Thomas Murphy, an analyst at Gartner, said he likes what he has heard about Visual Studio Live Share thus far, but wants to see it firsthand and compare it with pair programming tools such as AtomPair.

“[Microsoft is] doing a great job of being open and participating in open software in a nice incremental fashion,” he said. “But does it bring them new developers? That is a harder question. I think there are still plenty of people that think of Microsoft as the old world, and they are now in the new world.”

General availability of Visual Studio App Center

There are still plenty of [developers] that think of Microsoft as the old world, and they are now in the new world.
Thomas Murphy, analyst, Gartner

Also this week, Microsoft made its Visual Studio App Center generally available. Formerly known as Visual Studio Mobile Center and based on Xamarin Test Cloud, Visual Studio App Center is essentially a mobile backend as a service that provides a DevOps environment to help developers manage the lifecycle of their mobile apps. Objective-C, Swift, Android Java, Xamarin and React Native developers can all use Visual Studio App Center, according to the company.

Once a developer connects a code repository to Visual Studio App Center, the tool automatically creates a release pipeline of automated builds, tests the app in the cloud, manages distribution of the app to beta testers and app stores, and monitors usage of the app with crash analytics data from HockeyApp, the analytics tool Microsoft acquired in 2014.

“HockeyApp is very useful for telemetry data; that was a good acquisition,” Lanowitz said. Xamarin’s mobile development tools, acquired by Microsoft in 2016, also are strong, she said.

Darryl K. Taft covers DevOps, software development tools and developer-related issues as news writer for TechTarget’s SearchSoftwareQuality, SearchCloudApplications, SearchMicroservices and TheServerSide. Contact him at dtaft@techtarget.com or @darrylktaft on Twitter.

Azure DevOps Projects helps ease release automation

NEW YORK — Microsoft’s new Azure DevOps Projects tool lets developers configure a DevOps pipeline and connect it to the cloud with no prior knowledge of how to do so.

Azure DevOps Projects, released to public preview at the Microsoft Connect(); conference here this week, is a scaffolding system for developers to configure a full DevOps pipeline and connect to Microsoft’s Azure cloud services in less than five minutes.

With digital transformation efforts in full swing across enterprises in nearly every industry, developers are driven harder than ever to speed up application releases. In the process, they also want to ensure quality and security and to manage these apps more efficiently. This is where DevOps becomes critical and where a simplified way to get started with DevOps could be useful.

Abel Wang, a senior cloud developer advocate for DevOps at Microsoft, demonstrated how, with a series of clicks to provide information about the type of application and programming language used, Azure DevOps Projects sets up a Git repository and wires up automated build and release pipelines. Everything is automatic, although developers can customize the configuration.

“We make it ridiculously easy to go to a full DevOps environment,” Wang said.

Azure DevOps Projects comes out of the intersection of Microsoft’s Visual Studio family of tools and services — particularly Visual Studio Team Services (VSTS) — and the Azure cloud platform.

Setting up a DevOps pipeline is hard because developers often must manually integrate many different tools, said Scott Guthrie, executive vice president of Microsoft’s cloud and enterprise group. VSTS, by contrast, is fully integrated with Azure.

“I think this shows that Microsoft now has a good set of tools and a strengthening set of stuff for ops,” said Thomas Murphy, an analyst with Gartner.

Microsoft has historically been weak in areas such as release automation, but a stronger Azure platform, including Azure DevOps, will help Microsoft better compete with Amazon Web Services (AWS) for enterprise customers. New independent software vendors still target AWS or Cloud Foundry, but in the corporate space — especially retail businesses that view AWS as a competitor — there is a growing business-driven push away from AWS and toward Azure, Murphy said.

Microsoft engineer Donovan Brown previews Azure DevOps Projects at Microsoft Connect(); 2017.

Microsoft continues to advance cloud for DevOps

Another VSTS capability, release management gates, enables developers to specify conditions necessary to begin or finish a deployment to an environment, automating a process that’s often manual. DevOps pros can configure an environment to deploy and wait a day to ensure there are no blocking work items or monitoring alerts before proceeding with deployment, said Brian Harry, Microsoft’s corporate vice president for Visual Studio Team Services and Team Foundation Server (TFS), in a blog post.

VSTS also has free, cloud-hosted continuous integration (CI) and continuous delivery (CD) on macOS, so teams can build and release Apple iOS, macOS, tvOS and watchOS applications without the need for Mac hardware. This also means the VSTS CI/CD system in the cloud covers the gamut of Linux, macOS and Windows in one offering.

Without security and performance, there are no apps of the future, and moving fast only gives you continuous bugs.
Theresa Lanowitz, CEO, Voke

Microsoft also made Team Foundation Server 2018 and the TFS Database Import Service generally available. Additionally, it released previews of its open source command-line tools for VSTS and of YAML support for VSTS build definitions, so developers can represent their build pipelines as code.

Finally, Microsoft launched a new partnership with GitHub to drive adoption of the Microsoft-developed Git Virtual File System (GVFS) as the industry standard to use Git at scale. Microsoft has become one of the most prolific contributors to open source largely through GitHub, and the two have worked together to bring GVFS to the code repository’s 25 million users on Windows, Mac and Linux clients, said Sam Lambert, senior director of infrastructure at GitHub.

Theresa Lanowitz, founder and CEO of Voke, a market research firm in Minden, Nev., praised Microsoft for its strong release management updates with software such as Azure DevOps Projects, but she expected to hear and see more about security and performance. “Without security and performance, there are no apps of the future, and moving fast only gives you continuous bugs,” she said.

“Our research shows low automation and low adoption of any commercial release management tool for full lifecycle traceability — of all assets,” she said in an interview from Microsoft Connect();, noting that enterprise integration issues may be partly to blame for low adoption of these tools.

ONUG conference: SD-WAN and automation continue to grow

NEW YORK — The technology showcase floor at this fall’s ONUG conference teemed with the typical vendors displaying various products and technologies. But a significant number of these suppliers could be lumped into a single, albeit continually shifting, category: software-defined WAN.

By now, SD-WAN’s benefits are well-understood. Among other advantages, the technology can simplify the WAN and streamline traffic management. It also gives enterprises the ability to transition away from more expensive connectivity links, like MPLS — or to add broadband internet in conjunction with existing MPLS services. As a result, SD-WAN deployments continue to blossom, with IDC predicting the market will eclipse $8 billion in sales by 2021.

Yet, as SD-WAN continues to mature, customers are looking for new ways to exploit the technology. One concept gaining traction is network infrastructure consolidation, which recasts SD-WAN from a standalone product into one function within a broader network platform. Versa Networks, for example, offers its FlexVNF platform, which combines multiple functions, such as SD-WAN, routing and security, into a single package. Earlier this month, FatPipe Networks likewise launched a virtual network functions platform that integrates SD-WAN functionality.

The ONUG conference also featured a session dedicated to its SD-WAN working group, the Open SD-WAN Exchange (OSE). Currently, the group is developing an interworking architecture framework that will contain reference points, or open APIs, to enable standardization across domain orchestrators. The group settled on RESTful architecture with JSON support, according to OSE group member Steve Wood, principal engineer for enterprise architecture and SD-WAN at Cisco.

Wood said the goal, for now, is not to get individual SD-WAN devices to talk with each other, but to automate and standardize the gateway between different domain orchestrators. The working group also hopes to standardize APIs that will enable application identification.

The OSE group has yet to release any APIs, however, and has only a handful of enterprises working with it to develop the standard interfaces.
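
Because no APIs have been published, any interface is purely illustrative, but the RESTful, JSON-based approach the group describes might look something like the sketch below, in which the endpoint, fields and orchestrator address are all invented for the example.

    import requests

    # Entirely hypothetical orchestrator endpoint; the OSE APIs are not yet published.
    orchestrator = "https://sdwan-orchestrator.example.com/api/v1"

    # Ask one domain orchestrator to set up a gateway toward a peer domain.
    payload = {
        "peer_domain": "branch-emea",
        "gateway": {"address": "203.0.113.10", "transport": "ipsec"},
        "applications": ["voip", "o365"],
    }

    response = requests.post(f"{orchestrator}/gateways", json=payload, timeout=10)
    response.raise_for_status()
    print(response.json())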

Move away from just networking

Earlier this year, ONUG announced its rebranding to move the group away from being solely focused on computer networking. This transition was evident from the range of sessions throughout the ONUG conference — from discussions dedicated to containers and container orchestration to hybrid cloud and machine learning.

One session focused on automated container orchestration, especially related to hybrid cloud environments. Panelists from GE, Bank of America, Intuit and Citigroup discussed their companies’ current use of containers — if any — and hesitations they had with the still-emerging technology.

Bruce Pinsky, a distinguished engineer at Intuit, based in Mountain View, Calif., agreed containers are the next level of virtualization, but said more progress is necessary. Harmen Van der Linde, global head of Citi Management Tools at New York-based Citigroup, had similar concerns, among which was the question of how vendors will deal with patching containers on a large scale.

Process trumps the product

The cloud isn’t always easy. When Maria Azúa, senior vice president of distributed hosting at Boston-based Fidelity, made that point during her ONUG presentation, a few audience members laughed in agreement.

“Everybody thinks the cloud is the best thing since sliced bread; everybody thinks it’s really easy. But it’s not that easy,” she said.

Automation is the only way to standardization.
Maria Azúa, senior vice president of distributed hosting at Fidelity

For Azúa, an important consideration when moving to the cloud is knowing where the data goes. That entails creating a digital signature for the data and ensuring that, if the signature is compromised, you hold the key and can kill the data.
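
One minimal way to realize that idea, assuming a symmetric key held only by the data owner, is a keyed signature such as an HMAC: if the key is compromised, rotating or destroying it effectively kills every copy of the signed data, because nothing will verify anymore.

    import hashlib
    import hmac
    import os

    # Key held only by the data owner (kept in memory here for the sketch;
    # in practice it would live in a key management service).
    key = os.urandom(32)

    def sign(data: bytes) -> str:
        """Attach a signature only the key holder can reproduce."""
        return hmac.new(key, data, hashlib.sha256).hexdigest()

    def verify(data: bytes, signature: str) -> bool:
        """Reject data whose signature no longer matches; destroying the key fails all copies."""
        return hmac.compare_digest(sign(data), signature)

    record = b'{"customer": 42, "balance": 1000}'
    sig = sign(record)
    print(verify(record, sig))           # True while the key is intact
    print(verify(record + b"x", sig))    # False: tampered data is rejected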

“Don’t patch anything in the cloud,” she said. “Everything is immutable.”

Despite data management and security challenges, Azúa said companies will continue to shift their workloads to the cloud. By 2020, she predicted 85% of workloads will be in hybrid cloud environments, with the top five cloud providers controlling about 75% of those services.

By 2025, she estimated cloud usage will be at 53%, surpassing enterprise reliance on legacy infrastructure.

Automation is also making inroads, Azúa said, especially as standardization becomes more important.

“Automation is the only way to standardization,” she said. “You can buy any tools you want, but you’ll never get there [standardization] if you don’t automate, because the human being is very bad at repeating tasks.”

Companies can potentially bypass these errors with automation. And to Azúa, this automation means more than getting a computer to automate a single task. She defined automation as a standardized declarative process for multiple workflows. These processes then become more important than the products themselves.

“More importantly, you need to understand that your processes trump any product that you have,” she said. “Process trumps product.”

To buy or build IT infrastructure focus of ONUG 2017 panel

NEW YORK — Among the most vexing questions enterprises face is whether it makes more sense to buy or build IT infrastructure. The not-quite-absolute answer, according to the Great Discussion panel at last week’s ONUG conference: It depends.

“It’s hard, because as engineers, we follow shiny objects,” said Tsvi Gal, CTO at New York-based Morgan Stanley, adding there are times when the financial services firm will build what it needs, rather than being lured by what vendors may be selling.

“If there are certain areas of the industry that have no good solution in the market, and we believe that building something will give us significant value or edge over the competition, then we will build,” he said.

This decision holds even if buying a product is cheaper than building it, he said, especially when the purchased products lack the features and functions Morgan Stanley needs.

“I don’t mind spending way more money on the development side, if the return for it will be significantly higher than buying would,” he said. “We’re geeks; we love to build. But at the end of the day, we do it only for the areas where we can make a difference.”

ONUG panelists discuss buy vs. build IT infrastructure at the Great Discussion during the ONUG 2017 fall conference.

A company’s decision to buy or build IT infrastructure heavily depends on its size, talent and culture.

For example, Suneet Nandwani, senior director of cloud infrastructure and platform services at eBay, based in San Jose, Calif., said eBay’s culture as a technology company creates a definite bias toward building and developing its own IT infrastructure. As with Morgan Stanley, however, Nandwani said eBay stays close to the areas it knows.

“We often stick within our core competencies, especially since eBay competes with companies like Facebook and Netflix,” he said.

On the other side of the coin, Swamy Kocherlakota, S&P Global’s head of global infrastructure and enterprise transformation, takes a mostly buy approach, especially for supporting functions. It’s a choice based on S&P Global’s position as a financial company, where technology development remains outside the scope of its main business.

This often means working with vendors after purchase.

“In the process, we’ve discovered not everything you buy works out of the box, even though we would like it to,” Kocherlakota said.

Although he said it’s tempting to let S&P Global engineers develop the desired features, the firm prefers to go back to the vendor to build the features. This choice, he said, traces back to the company’s culture.

“You have to be part of a company and culture that actually fosters that support and can maintain [the code] in the long term,” he said.  

The questions of talent, liability and supportability

Panelists agreed building the right team of engineers was an essential factor to succeed in building IT infrastructure.

“If your company doesn’t have enough development capacity to [build] it yourself, even when you can make a difference, then don’t,” Morgan Stanley’s Gal said. “It’s just realistic.”

But for companies with the capacity to build, putting together a capable team is necessary.

“As we build, we want to have the right talent and the right teams,” eBay’s Nandwani said. “That’s so key to having a successful strategy.”

To attract the needed engineering talent, he said companies should foster a culture of innovation, acknowledging that mistakes will happen.

For Gal, having the right team means managers should do more than just manage.

“Most of our managers are player-coach, not just coach,” Gal said. “They need to be technical; they need to understand what they’re doing, and not just [be] generic managers.”

But it’s not enough to just possess the talent to build IT infrastructure; companies must be able to maintain both the talent and the developed code.

“One of the mistakes people make when building software is they don’t staff or resource it adequately for operational support afterward,” Nandwani said. “You have to have operational process and the personnel who are responding when those things screw up.”

S&P Global’s Kocherlakota agreed, citing the fallout that can occur when an employee responsible for developing important code leaves the company. Without proper documentation, the required information to maintain the code would be difficult to follow.

This means having the right support from the beginning, with well-defined processes encompassing the software development lifecycle, quality assurance and control, and code reviews.

“I would just add that when you build, it doesn’t free you from the need to document what you’re doing,” Gal said.

CIOs should lean on AI ‘giants’ for machine learning strategy

NEW YORK — Machine learning and deep learning will be part of every data science organization, according to Edd Wilder-James, former vice president of technology strategy at Silicon Valley Data Science and now an open source strategist for TensorFlow at Google.

Wilder-James, who spoke at the Strata Data Conference, pointed to recent advancements in image and speech recognition algorithms as examples of why machine learning and deep learning are going mainstream. He believes image and speech recognition software has evolved to the point where it can see and understand some things as well as — and in some use cases better than — humans. That makes it ripe to become part of the internal workings of applications and the driver of new and better services to internal and external customers, he said.

But what investments in AI should CIOs make to provide these capabilities to their companies? When building a machine learning strategy, choice abounds, Wilder-James said.

Machine learning vs. deep learning

Deep learning is a subset of machine learning, but it’s different enough to be discussed separately, according to Wilder-James. Examples of machine learning models include optimization, fraud detection and preventive maintenance. “We use machine learning to identify patterns,” Wilder-James said. “Here’s a pattern. Now, what do we know? What can we do as a result of identifying this pattern? Can we take action?”
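
A small sketch of that pattern-then-action loop: an anomaly detector flags transactions that do not fit the learned pattern so a downstream process can act on them. The transaction values are made up for the example.

    from sklearn.ensemble import IsolationForest

    # Hypothetical transaction features: [amount, hour_of_day].
    transactions = [[25, 13], [40, 11], [32, 15], [28, 14], [5000, 3], [36, 12]]

    detector = IsolationForest(contamination=0.2, random_state=0).fit(transactions)

    # A label of -1 marks transactions that break the learned pattern; act on those.
    for tx, label in zip(transactions, detector.predict(transactions)):
        if label == -1:
            print("review transaction:", tx)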

Deep learning models perform tasks that more closely resemble human intelligence, such as image processing and recognition. “With a massive amount of compute power, we’re able to look at a massively large number of input signals,” Wilder-James said. “And, so what a computer is able to do starts to look like human cognitive abilities.”

Some of the terrain for machine learning will look familiar to CIOs. Statistical programming languages such as SAS, SPSS and Matlab are known territory for IT departments. Open source counterparts such as R, Python and Spark are also machine-learning ready. “Open source is probably a better guarantee of stability and a good choice to make in terms of avoiding lock-in and ensuring you have support,” Wilder-James said.

Unlike other tech rollouts

The rollout of machine learning and deep learning models, however, is a different process from most technology rollouts. After getting a handle on the problem, CIOs will need to investigate whether machine learning is even an appropriate solution.

“It may not be true that you can solve it with machine learning,” Wilder-James said. “This is one important difference from other technical rollouts. You don’t know if you’ll be successful or not. You have to enter into this on the pilot, proof-of-concept ladder.”

The most time-consuming step in deploying a machine learning model is feature engineering, or finding features in the data that will help the algorithms self-tune. Deep learning models skip the tedious feature engineering step and go right to the training step. Training a deep learning model correctly requires immense data sets, graphics processing units or tensor processing units, and time. Wilder-James said it could take weeks and even months to train a deep learning model.
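
The feature engineering Wilder-James calls most time-consuming is the hand-crafting of inputs before any training happens. A toy sketch with invented column names is below; a deep learning model would instead consume the raw records directly and learn its own representations over much longer training runs.

    import pandas as pd

    # Hypothetical raw sensor log for a preventive maintenance model.
    raw = pd.DataFrame({
        "machine_id": [1, 1, 2, 2],
        "timestamp": pd.to_datetime(["2018-01-01", "2018-01-08", "2018-01-02", "2018-01-20"]),
        "vibration": [0.2, 0.9, 0.3, 0.4],
    })

    # Hand-crafted features the learning algorithm can then tune against.
    features = raw.groupby("machine_id").agg(
        mean_vibration=("vibration", "mean"),
        max_vibration=("vibration", "max"),
        readings=("vibration", "count"),
    )
    print(features)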

One more thing to note: Building deep learning models is hard and won’t be a part of most companies’ machine learning strategy.

“You have to be aware that a lot of what’s coming out is the closest to research IT has ever been,” he said. “These things are being published in papers and deployed in production in very short cycles.”

CIOs whose companies are not inclined to invest heavily in AI research and development should instead rely on prebuilt, reusable machine and deep learning models rather than reinvent the wheel. Image recognition models, such as Inception, and natural language models, such as SyntaxNet and Parsey McParseface, are examples of models that are ready and available for use.

“You can stand on the shoulders of giants, I guess that’s what I’m trying to say,” Wilder-James said. “It doesn’t have to be from scratch.”
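
Reusing a prebuilt model can be as simple as loading published weights. A minimal sketch using the Inception model bundled with TensorFlow’s Keras API, assuming TensorFlow is installed, the ImageNet weights can be downloaded and a local photo exists under the made-up file name:

    import numpy as np
    from tensorflow.keras.applications.inception_v3 import (
        InceptionV3, decode_predictions, preprocess_input)
    from tensorflow.keras.preprocessing import image

    # Pretrained Inception model: no from-scratch training required.
    model = InceptionV3(weights="imagenet")

    # Classify a local photo (hypothetical file name).
    img = image.load_img("warehouse_part.jpg", target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

    for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
        print(label, round(float(score), 3))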

Machine learning tech

The good news for CIOs is that vendors have set the stage to start building a machine learning strategy now. TensorFlow, a machine learning software library, is one of the best known toolkits out there. “It’s got the buzz because it’s an open source project out of Google,” Wilder-James said. “It runs fast and is ubiquitous.”

While TensorFlow itself is not terribly developer-friendly, a simplified interface called Keras eases the burden and can handle the majority of use cases. And TensorFlow isn’t the only deep learning library or framework option, either. Others include MXNet, PyTorch, CNTK and Deeplearning4j.
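
A sketch of what that simplified Keras interface looks like: a few lines define, train and query a small classifier. The data here is random and purely illustrative.

    import numpy as np
    from tensorflow import keras

    # Purely illustrative random data: 100 samples, 20 features, binary label.
    X = np.random.rand(100, 20)
    y = np.random.randint(0, 2, size=100)

    # Keras hides most of the underlying TensorFlow plumbing.
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=16, verbose=0)

    print(model.predict(X[:3]))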

For CIOs who want AI to live on premises, technologies such as Nvidia’s DGX-1 box, which retails for $129,000, are available.

But CIOs can also use the cloud as a computing resource, which costs anywhere between $5 and $15 an hour, according to Wilder-James. At the top of that range, the DGX-1’s $129,000 price buys roughly 8,600 hours of cloud time, or close to a year of continuous use. “I worked it out, and the cloud cost is roughly the same as running the physical machine continuously for about a year,” he said.

Or they can choose to go the hosted platform route, where a service provider will run trained models for a company. Other options, such as domain-specific proprietary tools like the personalization platform from Nara Logics, can fill out the AI infrastructure.

“It’s the same kind of range we have with plenty of other services out there,” he said. “Do you rent an EC2 instance to run a database or do you subscribe to Amazon Redshift? You can pick the level of abstraction that you want for these services.”

Still, before investments in technology and talent are made, a machine learning strategy should start with the basics: “The single best thing you can do to prepare with AI in the future is to develop a competency with your own data, whether it’s getting access to data, integrating data out of silos, providing data results readily to employees,” Wilder-James said. “Understanding how to get at your data is going to be the thing to prepare you best.”