
Tintri acquisition proposal leaves customers in limbo

While DataDirect Networks awaits the outcome of its proposed acquisition of Tintri assets, Tintri customers wonder what it all means for them.

High-performance computing (HPC) storage specialist DataDirect Networks (DDN) believes the Tintri acquisition is a way to appeal to mainstream enterprise IT shops. DDN made a bid to acquire Tintri’s assets for an undisclosed sum after Tintri filed for bankruptcy in early July — barely a year after going public on the Nasdaq.

DDN sees a way to broaden its appeal with Tintri’s flash-based, analytics-driven storage arrays, but the acquisition isn’t yet certain. The two vendors have signed a letter of intent, but the proposed sale won’t be finalized until the completion of a court-ordered bidding process. That means bidders could emerge to challenge DDN.

Users react to planned Tintri acquisition

The uncertainty surrounding the Tintri acquisition affects Tintri customers, who remain unsure of what it means for their maintenance contracts. DDN declined to estimate how long the bidding process might take.

That leaves Tintri customers bracing for what comes next. The city of Lewiston, Idaho, used Tintri arrays to replace legacy storage. Systems administrator Danny Santiago said the hybrid storage was a “magic cure-all” for labor-intensive management. Lewiston’s storage includes a Tintri T620 and two T820 hybrid arrays for primary storage, backup and replication.

“I spent about half my day fighting LUN management. When we got the Tintri, I got that time back,” Santiago said. “I can go six months and never have to touch the Tintri storage. The interface is beautiful. It gives you metrics to let you know if a problem is [tied to] storage, the network or on the Windows OS side.”

Now, Santiago said he doesn’t know what to expect. His agency is in year one of an extended three-year Tintri support contract for the T620.

“Financially, we’re not in a position to change our storage,” he said. “We put a lot of money into these Tintri boxes, and we need to get the life expectancy out of them.”

The Fay Jones School of Architecture and Design at the University of Arkansas installed Tintri several years ago to replace aging EMC Clariion storage — the forerunner to the Dell EMC VNX SAN array. Scott Zemke, the Fayetteville, Ark., school’s director of technology, said Tintri competitors have already started knocking on his door.

“Quite honestly, we rarely have issues with the Tintri arrays. But, of course, we’re looking at contingency plans if we have to do a refresh. One vendor is offering ridiculously stupid deals to trade in our Tintri storage, so it will be an interesting next couple of months,” Zemke said.

“I know Tintri really wanted the business to work, but it seems like they have just had management problem after management problem. Hopefully, DDN will continue to support the stuff. We have DDN arrays in our HPC data center, and they’re a great company to work with, too,” Zemke said.

Is predictive analytics key to Tintri acquisition?

According to Tintri’s securities filings, DDN’s bid would encompass most of Tintri’s assets, including all-flash and hybrid virtualization arrays. But the predictive Tintri Analytics platform may have a greater impact on DDN’s business. The SaaS-based data lake provides real-time analytics and preventive maintenance. Customers can automate capacity and performance requirements for each virtual machine.

Predictive analytics is considered a valuable feature for modern storage arrays. Hewlett Packard Enterprise considered Nimble Storage’s InfoSight analytics a key driver of its $1.2 billion acquisition of Nimble in 2017, and HPE has since integrated InfoSight into its flagship 3PAR arrays. DDN could follow the same playbook by incorporating Tintri Analytics into its other products.


Tintri’s technology would help DDN serve mainstream enterprises seeking to implement AI and machine learning, said Kurt Kuckein, senior director of marketing at DDN, based in Chatsworth, Calif.

“We have plenty of organizations where we work with data scientists or the analytics team, but we really haven’t had a product for enterprise IT shops. Adding Tintri gives us a well-baked technology and a large installed base,” Kuckein said.

In the near term, DDN plans to maintain the Tintri brand as a separate engineering division. Real-time Tintri analytics eventually could wind up in branded AI ExaScaler turnkey appliances, he said.

Tintri Analytics is part of the Tintri Global Center management portal. The intelligence can predict hardware failures and automate support tickets. Tintri typically shipped replacement parts to customers by the next business day.

According to George Crump, president of IT analyst firm Storage Switzerland, Tintri’s analytics are “as good, if not better,” than Nimble’s InfoSight.

“DDN is probably the perfect acquirer for Tintri,” Crump said. “It’s profitable. It has a massive amount of storage experience. And there’s almost no overlap between the DDN and Tintri product. All the Tintri stuff would be net-new business.”

Will DDN breathe new life into Tintri storage?

The proposed Tintri acquisition follows a rocky period for the vendor. Tintri filed for Chapter 11 protection this month — just weeks after the one-year anniversary of its initial public offering (IPO). Some experts saw going public as a desperation move after Tintri failed to secure additional private investment. Tintri also went through two CEOs between April and June.

Tintri initially hoped for a share price in the range of $11.50 to raise about $109 million in June 2017, but its IPO opened at $7 and raised only $60 million. Shares rose no higher than $7.75, and Nasdaq eventually delisted Tintri after its shares dropped below $1 for 30 consecutive trading sessions.

Aside from investors’ lukewarm reception, several strategic missteps conspired to doom Tintri. Crump said the company undercut its key differentiators of analytics and quality of service (QoS) when it launched an all-flash array in 2015.

“Tintri’s marketing message should have been, ‘Don’t buy an all-flash array, and here’s why,'” Crump said. “DDN should get rid of the all-flash model and just focus on selling the hybrid arrays. When your system is faster than all of your workloads combined, then you don’t really need QoS. That would get people’s attention.”

Big Switch taps AWS VPCs for hybrid cloud networking

Big Switch Networks has introduced software that provides a consistent way to build and manage virtual networks in Amazon Web Services and the private data center.

The vendor, which provides a software-based switching fabric for open hardware, said this week it would release the hybrid cloud technology in stages. First up is a software release next month for the data center, followed by an application for AWS in the fourth quarter.

The AWS product, called Big Cloud Fabric — Public Cloud, provides the tools for creating and configuring a virtual network to deliver Layer 2, Layer 3 and security services to virtual machines or containers running on the IaaS provider. AWS also offers tools for building the virtual networks, which it calls Virtual Private Clouds (VPCs).

In general, customers use AWS VPCs to support a private cloud computing environment on the service provider’s platform. The benefit is getting more granular control over the virtual network that serves sensitive workloads.

Big Cloud Fabric — Public Cloud lets companies create AWS VPCs and assign security policies for applications running on the virtual networks. The product also provides analytics for troubleshooting problems. While initially available on AWS, Big Switch plans to eventually make Big Cloud Fabric — Public Cloud available on Google Cloud and Microsoft Azure.

Big Switch Networks' cloud-first portfolio

VPCs for the private data center

For the corporate data center, Big Switch plans to add tools to its software-based switching fabric — called Big Cloud Fabric — for creating and managing on-premises VPCs that operate the same way as AWS VPCs, said Prashant Gandhi, the chief product officer for Big Switch, based in Santa Clara, Calif.

Customers could use the on-premises VPCs, which Big Switch calls enterprise VPCs, as the virtual networks supporting computing environments that include Kubernetes and Docker containers, the VMware server virtualization vSphere suite, and the OpenStack cloud computing framework.

“With the set of tools they are announcing, [Big Switch] will be able to populate these VPCs and facilitate a consistent deployment and management of networks across cloud and on premises,” said Will Townsend, an analyst at Moor Insights & Strategy, based in Austin, Texas.

Big Switch already offers a version of its Big Monitoring Fabric (BMF) network packet broker for AWS. In the fourth quarter, Big Switch plans to release a single console, called Multi-Cloud Director, for accessing all BMF and Big Cloud Fabric controllers.

In general, Big Switch supplies software-based networking technology for white box switches. Big Cloud Fabric competes with products from Cisco, Midokura and Pluribus Networks, while BMF rivals include technology from Gigamon, Ixia and Apcon.

Big Switch customers are mostly large enterprises, including communication service providers, government agencies and 20 Fortune 100 companies, according to the vendor.

Ctera Networks adds Dell and HPE gateway appliance options

Ctera Networks is aiming to move its file storage services into the enterprise through new partnerships with Dell and Hewlett Packard Enterprise.

The partnerships, launched in June, allow Ctera to bundle its Enterprise File Services Platform on more-powerful servers with greater storage capacity. Ctera previously sold its branded cloud gateway appliances on generic white box hardware at a maximum raw capacity of 32 TB. The new Ctera HC Series Edge Filers include the HC1200 model offering as much as 96 TB, the HC400 with as much as 32 TB and the HC400E at 16 TB on Dell or HPE servers with 3.5-inch SATA HDDs.

The gateway appliances bundle Ctera’s file services, which provide users with access to files on premises and transfer colder data to cloud-based, scale-out object storage at the customer site or in public clouds.

The new models include the Petach Tikvah, Israel, company’s first all-flash appliances. The HC1200 is equipped with 3.84 TB SATA SSDs and offers a maximum raw capacity of 46.08 TB. The HC400 tops out at 15.36 TB. The all-flash models use HPE hardware with 2.5-inch read-intensive SSDs that carry a three-year warranty.

Ctera Networks doesn’t sell appliances with a mix of HDDs and SSDs. The HC400 and HC400E are single rack-unit systems with four drive bays, and the HC1200 is a 2U device with 12 drive bays.

“In the past, we had 32 TB of storage, and it would replace a certain size of NAS device. With this one, we can replace a large series of NAS devices with a single device,” Ctera Networks CEO Liran Eshel said.

The new Ctera HC Series Edge Filers include the Ctera Networks HC1200 (top) and HC400.

New Ctera Networks appliances enable multiple VMs

The new, more-powerful HC Series Edge Filers will enable customers to run multiple VMware virtual machines (VMs), applications and storage on the same device, Eshel said. The HC Series supports 10 Gigabit Ethernet networking with fiber and copper cabling options.

“Our earlier generation was just a cloud storage gateway. It didn’t do other things,” Eshel said. “With this version, we actually have convergence — multiple applications in the same appliance. Basically, we’re providing top-of-the-line servers with global support.”

The Dell and HPE partnerships will let Ctera Networks offer on-site support within four hours, as opposed to the next-business-day service it provided in the past. Ctera will take the first call, Eshel said, and be responsible for the customer ticket. If it’s a hardware issue, Ctera will dispatch partner-affiliated engineers to address the problem.

Using Dell and HPE servers enables worldwide logistics and support, which is especially helpful for users with global operations.

“It was challenging to do that with white box manufacturing,” Eshel said.

Software-defined storage vendors require these types of partnerships to sell into the enterprise, said Steven Hill, a senior analyst at 451 Research.

“In spite of the increasingly software-based storage model, we find that many customers still prefer to buy their storage as pre-integrated appliances, often based on hardware from their current vendor of choice,” Hill wrote in an e-mail. “This guarantees full hardware compatibility and provides a streamlined path for service and support, as well as compatibility with an existing infrastructure and management platform.”

Cloud object storage options

The Ctera product works with on-premises object storage from Caringo, Cloudian, DataDirect Networks, Dell EMC, Hitachi, HPE, IBM, NetApp, Scality and SwiftStack. It also supports Amazon Web Services, Google Cloud Platform, IBM Cloud, Microsoft Azure, Oracle and Verizon public clouds. Ctera has reseller agreements with HPE and IBM.

Eshel said one multinational customer, WPP, has already rolled out the new appliances in production for use with IBM Cloud.

The list price for the new Ctera HC Series starts at $10,000. Ctera also continues to sell its EC Series appliances on white box hardware. Customers have the option to buy the hardware pre-integrated with the Ctera software or purchase virtual gateway software that they can install on server hypervisors on premises or in Amazon or Azure public clouds.

Lustre-based DDN ExaScaler arrays receive NVMe flash

DataDirect Networks has refreshed its Storage Fusion Architecture-based ExaScaler arrays, adding two models designed with nonvolatile memory express flash and a hybrid system with disk and flash.

In a related move, the high-performance computing storage vendor acquired the code repository and support contracts of Intel’s open source Lustre parallel file system for an undisclosed sum. The Lustre file system is the foundation for DDN ExaScaler and GridScaler arrays.

The fourth version of DDN ExaScaler combines parallel file storage servers and Nvidia DGX-1 high-performance GPUs with Storage Fusion Architecture (SFA) OS software. SFA 200NV and SFA 400NV are 2U arrays, with slots for 24 dual-ported nonvolatile memory express (NVMe) SSDs. The difference between the two is in compute power: SFA 200NV has a single CPU per controller, while the SFA 400NV has two CPUs per controller.

The arrays embed a 192-lane PCIe Gen 3 fabric to maximize NVMe performance. DDN claims the dense ExaScaler flash ingests data at nearly 40 GBps.

DDN also introduced the SFA7990 hybrid system, which allows customers to fill 90 drive slots with enterprise-grade SSDs and HDDs.

AI and analytics performance driver

Adding NVMe is a natural fit for DDN, which provides scalable storage systems to hyperscale data centers that require lots of high-performance storage, said Tim Stammers, a storage analyst at 451 Research.

“NVMe is going to help drive performance on intensive applications, like AI and analytics. It makes storage faster, and in return, AI and analytics will drive the takeup of NVMe flash,” Stammers said.

Data centers have the option to buy DDN ExaScaler NVMe arrays as plug-and-play storage for AI projects. The DDN AI200 and AI400 provide as much as 360 TB of dual-ported NVMe storage in 2U. The 4U AI7990 configurations scale to 5.4 PB in 20U.

The AI turnkey appliances include performance-tested implementations of Caffe, CNTK, Horovod, PyTorch, TensorFlow and other established AI frameworks.

Customers can combine an SFA cluster with DDN’s NVMe-based storage. Lustre presents file storage as a mountable capacity pool of flash and disk sharing a single namespace.

The DDN ExaScaler upgrade provides dense storage in a compact form factor to keep acquisition costs within reach of most enterprises, said James Coomer, vice president for product management at DDN, based in Chatsworth, Calif.

“At this early stage, customers don’t necessarily know where they’re going with AI,” Coomer said. “They may need more flash for performance. For AI, they need an economical way to hold data that’s relatively cold. We give them a choice to expand either the hot flash area or augment it in the second stage with hard-drive tiers and anywhere in between.”

Recent AI enhancements to the SFA operating system include declustered RAID and NVMe tuning. Declustered RAID allows for faster drive rebuilds by sharing parity bits across pooled drives.

Inference and training investments planned

DDN’s Lustre acquisition includes the open source code repository, file-tracking system and existing support contracts from Intel. Coomer said DDN plans to make investments to enable Lustre to support inference and training of data for AI workloads. The open source code will remain available for contributions from the community.

DDN is a prominent contributor to Lustre code development, and it has shipped Lustre-based storage systems for nearly two decades.

“DDN says they’re going to make Lustre easier to use,” Stammers said. “What they’re banking on is that it will lead more enterprises to use Lustre for these emerging workloads.”

Juniper adds core campus switch to EX series

Juniper Networks has added to its EX series a core aggregation switch aimed at enterprises with campus networks that are too small for the company’s EX9000 line.

Like the EX9000 series, the EX4650 — a compact 25/100 GbE switch — uses network protocols typically found in the data center. As a result, the same engineering team can manage the data center and the campus.

“If an enterprise has a consistent architecture and common protocols across networks, it should be well-placed to achieve operational efficiencies across the board,” said Brad Casemore, an analyst at IDC.

The network protocols used in the EX4650 and EX9000 are the Ethernet VPN (EVPN) and the Virtual Extensible LAN (VXLAN). EVPN secures multi-tenancy environments in a data center. Engineers typically use it with the Border Gateway Protocol and the VXLAN encapsulation protocol. The latter creates an overlay network on an existing Layer 3 infrastructure.

Offering a common set of protocols lets Juniper target its campus switches at data center customers, Casemore said. “That’s a less resistant path than trying to displace other vendors in both the data center and the campus.”

Juniper released the EX4650 four months after releasing two multigigabit campus switches, the EX2300 and EX4300. Juniper also released in February a cloud-based dashboard, called Sky Enterprise, for provisioning and configuring Juniper’s campus switches and firewalls.

Juniper rivals Arista and Cisco are also focused on the campus market. In May, Arista extended its data center switching portfolio to the campus LAN with the introduction of the 7300X3 and 7050X3 spline switches. Cisco, on the other hand, has been building out a software-controlled infrastructure for the campus network, centered around a management console called the Digital Network Architecture (DNA) Center.

Juniper Networks’ EX4650 core aggregation switch for the campus

SD-WAN upgrade

Along with introducing the EX4650, Juniper unveiled this week improvements within its software-defined WAN for the campus. Companies can use Juniper’s Contrail Service Orchestration technology to prioritize specific application traffic traveling through the SD-WAN. The capability supports more than 3,700 applications, including Microsoft’s Outlook, SharePoint and Skype for Business, Juniper said.

Juniper runs its SD-WAN as a feature within the company’s NFX Network Services Platform, which also includes the Contrail orchestration software and Juniper’s SRX Series Services Gateways. The latter contains the vSRX virtual firewall, IP VPN, content filtering and threat management.

Juniper has added to the NFX platform support for active-active clustering, which is the ability to spread a workload across NFX hardware. NFX runs its software on a Linux server.

The clustering feature will improve the reliability of the LTE, broadband and MPLS connections typically attached to an SD-WAN, Juniper said.

Tempered Networks extends reach of NAC software

Tempered Networks, a maker of network access control for a wide variety of devices, has extended its technology to Microsoft Azure, Google Cloud, Linux servers and additional IoT endpoints.

Tempered, which introduced the latest enhancements this week, has developed NAC software based on the Host Identity Protocol (HIP), a technology developed by a working group within the Internet Engineering Task Force. A HIP network replaces all IP addresses with cryptographic host identifiers that are resistant to denial-of-service and man-in-the-middle attacks.

Tempered has created a HIP wrapper that lets customers manage large numbers of devices through a product the vendor calls a HIPswitch. The technology creates a private overlay network to control what specific endpoints can access. The product can protect corporate, industrial and IoT systems.

What’s new

The latest improvements to the Tempered product portfolio include a version of HIPswitch for Microsoft Azure and one for Google Cloud. The virtual appliance serves as an identity gateway for endpoints trying to access data, workloads and containers in the public clouds. The NAC software had previously been available only for AWS.

Also new is HIPserver for Linux. HIPserver, previously available only for Windows, acts as a server’s overlay network gateway. The software, combined with a firewall, can cloak workloads so they are not visible to hackers. The technology also ensures that network connections are authenticated before establishing a TCP session. HIPserver supports all major Linux distributions, whether they are running in a public cloud, on premises or at a remote site.

Another technology added to the Tempered portfolio is the HIPswitch 75 appliance, a palm-sized IoT edge gateway designed as “plug-and-play” hardware for medical devices, point-of-sale systems and building automation controls. HIPswitch ensures that access policies are enforced for the attached systems.

Finally, Tempered introduced a product called HIPclient, which runs on Windows, Mac and iOS devices. The NAC software ensures clients only access authorized network resources.

The complete Tempered platform includes central software the vendor calls the conductor, which is akin to a software-defined networking controller. Customers use the product’s user interface to whitelist everything attached to HIPswitches and to set access policies for each endpoint or groups of them. Policy routing across the identity network is handled through technology Tempered calls the HIPrelay.

Tempered sells its products via annual subscription, based on the number of products deployed. Fees start at $660 for HIPswitch for cloud, $1,180 for HIPserver for Linux and $300 for HIPclient.

New WPA3 security protocol simplifies logins, secures IoT

Securing Wi-Fi access has long been an Achilles’ heel for users of wireless networks — especially for users of public networks, as well as for internet of things devices — but help is on the way.

Wi-Fi Alliance, the nonprofit industry group that promotes use of Wi-Fi, has begun certifying products supporting the latest version of its security protocol, the Wi-Fi Protected Access (WPA) specification, WPA3.

The new WPA3 security protocol is intended to simplify wireless authentication, especially for IoT devices, while at the same time improving security through the inclusion of new features and removal of legacy cryptographic and security protocols.

The WPA3 security protocol, announced in January, gives enterprises and individuals a better option for securing access to Wi-Fi networks. Support for WPA2 continues to be mandatory for all products in the Wi-Fi Alliance’s “Wi-Fi Certified” program, but the new WPA3 security protocol adds new capabilities for improved security, including stronger encryption and a more secure handshake.

In its press release, Wi-Fi Alliance wrote that the WPA3 security protocol “adds new features to simplify Wi-Fi security, enable more robust authentication, and deliver increased cryptographic strength for highly sensitive data markets.”

The new specification defines both an enterprise option and an individual option. WPA3-Enterprise offers the “equivalent of 192-bit cryptographic strength” to protect networks that transmit sensitive data. WPA3-Personal offers password-based authentication that can be more resilient against attacks even when “users choose passwords that fall short of typical complexity recommendations,” using a secure key setup protocol, Simultaneous Authentication of Equals (SAE), to protect against malicious actors trying to guess passwords.

Wi-Fi Alliance also rolled out Wi-Fi Certified Easy Connect, an initiative for simplifying the secure initialization and configuration of wireless internet of things devices that have little or no display interfaces. The new program permits users to add devices to Wi-Fi networks with a different device — like a smartphone — that can scan a product quick response (QR) code.

Support for the new protocol will be made available as vendors begin incorporating it into their products. Wi-Fi Alliance members that plan to support the WPA3 security protocol include Cisco, Broadcom, Huawei Wireless, Intel and Qualcomm. A Wi-Fi Alliance spokesperson said by email that “Wi-Fi Alliance expects broad industry adoption of WPA3 by late 2019 in conjunction with the next generation of Wi-Fi based on 802.11ax standard.”

How to Architect and Implement Networks for a Hyper-V Cluster

We recently published a quick tip article recommending the number of networks you should use in a cluster of Hyper-V hosts. I want to expand on that content to make it clear why we’ve changed practice from pre-2012 versions and how we arrive at this guidance. Use the previous post for quick guidance; read this one to learn the supporting concepts. These ideas apply to all versions from 2012 onward.

Why Did We Abandon Practices from 2008 R2?

If you dig on TechNet a bit, you can find an article outlining how to architect networks for a 2008 R2 Hyper-V cluster. While it was perfect for its time, we have new technologies that make its advice obsolete. I have two reasons for bringing it up:

  • Some people still follow those guidelines on new builds — worse, they recommend them to others
  • Even though we no longer follow that implementation practice, we still need to solve the same fundamental problems

We changed practices because we gained new tools to address our cluster networking problems.

What Do Cluster Networks Need to Accomplish for Hyper-V?

Our root problem has never changed: we need to ensure that we always have enough available bandwidth to prevent choking out any of our services or inter-node traffic. In 2008 R2, we could only do that by using multiple physical network adapters and designating traffic types to individual pathways. Note: It was possible to use third-party teaming software to overcome some of that challenge, but that was never supported and introduced other problems.

Starting from our basic problem, we next need to determine how to delineate those various traffic types. That original article did some of that work. We can immediately identify what appears to be four types of traffic:

  • Management (communications with hosts outside the cluster, ex: inbound RDP connections)
  • Standard inter-node cluster communications (ex: heartbeat, cluster resource status updates)
  • Cluster Shared Volume traffic
  • Live Migration

However, it turns out that some clumsy wording caused confusion. Cluster communication traffic and Cluster Shared Volume traffic are exactly the same thing. That reduces our needs to three types of cluster traffic.

What About Virtual Machine Traffic?

You might have noticed that I didn’t say anything about virtual machine traffic above. The same would be true if you were working up a different kind of cluster, such as SQL. I certainly understand the importance of that traffic; in my mind, service traffic takes priority over all cluster traffic. Understand one thing: service traffic for external clients is not clustered. So, your cluster of Hyper-V nodes might provide high availability services for virtual machine vmabc, but all of vmabc‘s network traffic will only use its owning node’s physical network resources. So, you will not architect any cluster networks to process virtual machine traffic.

As for preventing cluster traffic from squelching virtual machine traffic, we’ll revisit that in an upcoming section.

Fundamental Terminology and Concepts

These discussions often go awry over a misunderstanding of basic concepts.

  • Cluster Name Object: A Microsoft Failover Cluster has its own identity separate from its member nodes known as a Cluster Name Object (CNO). The CNO uses a computer name, appears in Active Directory, has an IP, and registers in DNS. Some clusters, such as SQL, may use multiple CNOs. A CNO must have an IP address on a cluster network.
  • Cluster Network: A Microsoft Failover Cluster scans its nodes and automatically creates “cluster networks” based on the discovered physical and IP topology. Each cluster network constitutes a discrete communications pathway between cluster nodes.
  • Management network: A cluster network that allows inbound traffic meant for the member host nodes and typically used as their default outbound network to communicate with any system outside the cluster (e.g. RDP connections, backup, Windows Update). The management network hosts the cluster’s primary cluster name object. Typically, you would not expose any externally-accessible services via the management network.
  • Access Point (or Cluster Access Point): The IP address that belongs to a CNO.
  • Roles: The name used by Failover Cluster Management for the entities it protects (e.g. a virtual machine, a SQL instance). I generally refer to them as services.
  • Partitioned: A status that the cluster will give to any network on which one or more nodes does not have a presence or cannot be reached.
  • SMB: ALL communications native to failover clustering use Microsoft’s Server Message Block (SMB) protocol. With the introduction of version 3 in Windows Server 2012, that now includes innate multi-channel capabilities (and more!)

Are Microsoft Failover Clusters Active/Active or Active/Passive?

Microsoft Failover Clusters are active/passive. Every node can run services at the same time as the other nodes, but no single service can be hosted by multiple nodes. In this usage, “service” does not mean those items that you see in the Services Control Panel applet. It refers to what the cluster calls “roles” (see above). Only one node will ever host any given role or CNO at any given time.

How Does Microsoft Failover Clustering Identify a Network?

The cluster decides what constitutes a network; your build guides it, but you do not have any direct input. Any time the cluster’s network topology changes, the cluster service re-evaluates.

First, the cluster scans a node for logical network adapters that have IP addresses. That might be a physical network adapter, a team’s logical adapter, or a Hyper-V virtual network adapter assigned to the management operating system. It does not see any virtual NICs assigned to virtual machines.

For each discovered adapter and IP combination on that node, it builds a list of networks from the subnet masks. For instance, if it finds an adapter with an IP of 192.168.10.20 and a subnet mask of 255.255.255.0, then it creates a 192.168.10.0/24 network.

The cluster then continues through all of the other nodes, following the same process.

Be aware that every node does not need to have a presence in a given network in order for failover clustering to identify it; however, the cluster will mark such networks as partitioned.
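
If you want to see what that discovery process produced on an existing cluster, the FailoverClusters PowerShell module will show you. This is a minimal sketch, assuming you run it on a cluster node with the failover clustering management tools installed:

```powershell
# Requires the Failover Clustering management tools (FailoverClusters module)
Import-Module FailoverClusters

# The networks the cluster built from the discovered adapter/IP combinations,
# with their detected subnets and current state (Up, Partitioned, Down)
Get-ClusterNetwork |
    Select-Object Name, Address, AddressMask, State, Role |
    Format-Table -AutoSize

# Which adapter and IP represents each node in each cluster network
Get-ClusterNetworkInterface |
    Select-Object Node, Network, Adapter, Address |
    Sort-Object Network, Node |
    Format-Table -AutoSize
```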

What Happens if a Single Adapter has Multiple IPs?

If you assign multiple IPs to the same adapter, one of two things will happen. Which of the two depends on whether or not the secondary IP shares a subnet with the primary.

When an Adapter Hosts Multiple IPs in Different Networks

The cluster identifies networks by adapter first. Therefore, if an adapter has multiple IPs, the cluster will lump them all into the same network. If another adapter on a different host has an IP in one of the networks but not all of the networks, then the cluster will simply use whichever IPs can communicate.

As an example, see the following network:

The second node has two IPs on the same adapter and the cluster has added it to the existing network. You can use this to re-IP a network with minimal disruption.

A natural question: what happens if you spread IPs for the same subnet across different existing networks? I tested it a bit and the cluster allowed it and did not bring the networks down. However, it always had the functional IP pathway to use, so that doesn’t tell us much. Had I removed the functional pathways, then it would have collapsed the remaining IPs into an all-new network and it would have worked just fine. I recommend keeping an eye on your IP scheme and not allowing things like that in the first place.

When an Adapter Hosts Multiple IPs in the Same Network

The cluster will pick a single IP in the same subnet to represent the host in that network.

What if Different Adapters on the Same Host have an IP in the Same Subnet?

The same outcome occurs as if the IPs were on the same adapter: the cluster picks one to represent the node in that network and ignores the rest.

The Management Network

All clusters (Hyper-V, SQL, SOFS, etc.) require a network that we commonly dub Management. That network contains the CNO that represents the cluster as a singular system. The management network has little importance for Hyper-V, but external tools connect to the cluster using that network. By necessity, the cluster nodes use IPs on that network for their own communications.

The management network will also carry cluster-specific traffic. More on that later.

Note: Hyper-V Replica uses the management network.

Cluster Communications Networks (Including Cluster Shared Volume Traffic)

A cluster communications network will carry:

  • Cluster heartbeat information. Each node must hear from every other node within a specific amount of time (1 second by default; see the sketch after this list for how to view those settings). If a node does not hear from enough other nodes to maintain quorum, then it will begin failover procedures. Failover is more complicated than that, but beyond the scope of this article.
  • Cluster configuration changes. If any configuration item changes, whether to the cluster’s own configuration or the configuration or status of a protected service, the node that processes the change will immediately transmit to all of the other nodes so that they can update their own local information store.
  • Cluster Shared Volume traffic. When all is well, this network will only carry metadata information. Basically, when anything changes on a CSV that updates its volume information table, that update needs to be duplicated to all of the other nodes. If the change occurs on the owning node, less data needs to be transmitted, but it will never be perfectly quiet. So, this network can be quite chatty, but will typically use very little bandwidth. However, if one or more nodes lose direct connectivity to the storage that hosts a CSV, all of its I/O will route across a cluster network. Network saturation will then depend on the amount of I/O the disconnected node(s) need(s).
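
If you want to see (or cautiously adjust) the heartbeat timing mentioned in the first bullet, the cluster exposes it as common properties. This is only a small sketch, assuming you run it on a cluster node; default threshold values vary by Windows Server version.

```powershell
# Heartbeat interval (delay, in milliseconds) and the number of missed heartbeats
# tolerated (threshold), for nodes in the same subnet and across subnets
Get-Cluster | Format-List *SubnetDelay, *SubnetThreshold

# Example only: tolerate more missed same-subnet heartbeats. Raising this delays
# failure detection, so change it deliberately, not as a reflex.
(Get-Cluster).SameSubnetThreshold = 10
```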

Live Migration Networks

That heading is a bit of a misnomer. The cluster does not have its own concept of a Live Migration network per se. Instead, you let the cluster know which networks you will permit to carry Live Migration traffic. You can independently choose whether or not those networks can carry other traffic.

Other Identified Networks

The cluster may identify networks that we don’t want to participate in any kind of cluster communications at all. iSCSI serves as the most common example. We’ll learn how to deal with those.

Architectural Goals

Now we know our traffic types. Next, we need to architect our cluster networks to handle them appropriately. Let’s begin by understanding why you shouldn’t take the easy route of using a singular network. A minimally functional Hyper-V cluster only requires that “management” network. Stopping there leaves you vulnerable to three problems:

  • The cluster will be unable to select another IP network for different communication types. As an example, Live Migration could choke out the normal cluster heartbeat, causing nodes to consider themselves isolated and shut down
  • The cluster and its hosts will be unable to perform efficient traffic balancing, even when you utilize teams
  • IP-based problems in that network (even external to the cluster) could cause a complete cluster failure

Therefore, you want to create at least one other network. In the pre-2012 model we could designate specific adapters to carry specific traffic types. In the 2012 and later model, we simply create at least one additional network to allow cluster communications but not client access. Some benefits:

  • Clusters of version 2012 or newer will automatically employ SMB multichannel (see the sketch after this list). Inter-node traffic (including Cluster Shared Volume data) will balance itself without further configuration work.
  • The cluster can bypass trouble on one IP network by choosing another; you can help by disabling a network in Failover Cluster Manager
  • Better load balancing across alternative physical pathways
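
To confirm that SMB multichannel is actually doing that balancing on your build, you can inspect it from PowerShell. A quick, hedged sketch; it assumes the SmbShare cmdlets available on Windows Server 2012 and later:

```powershell
# Multichannel is on by default; these confirm nobody has turned it off
Get-SmbClientConfiguration | Select-Object EnableMultiChannel
Get-SmbServerConfiguration | Select-Object EnableMultiChannel

# While inter-node SMB traffic is flowing (for example, redirected CSV I/O or an
# SMB-based Live Migration), this lists the connections and the interfaces in use
Get-SmbMultichannelConnection
```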

The Second Supporting Network… and Beyond

Creating networks beyond the initial two can add further value:

  • If desired, you can specify networks for Live Migration traffic, and even exclude those from normal cluster communications. Note: For modern deployments, doing so typically yields little value
  • If you host your cluster networks on a team, matching the number of cluster networks to physical adapters allows the teaming and multichannel mechanisms the greatest opportunity to fully balance transmissions. Note: You cannot guarantee a perfectly smooth balance

Architecting Hyper-V Cluster Networks

Now we know what we need and have a nebulous idea of how that might be accomplished. Let’s get into some real implementation. Start off by reviewing your implementation choices. You have three options for hosting a cluster network:

  • One physical adapter or team of adapters per cluster network
  • Convergence of one or more cluster networks onto one or more physical teams or adapters
  • Convergence of one or more cluster networks onto one or more physical teams claimed by a Hyper-V virtual switch

A few pointers to help you decide:

  • For modern deployments, avoid using one adapter or team for a cluster network. It makes poor use of available network resources by forcing an unnecessary segregation of traffic.
  • I personally do not recommend bare teams for Hyper-V cluster communications. You would need to exclude such networks from participating in a Hyper-V switch, which would also force an unnecessary segregation of traffic.
  • The most even and simple distribution involves a singular team with a Hyper-V switch that hosts all cluster network adapters and virtual machine adapters. Start there and break away only as necessary.
  • A single 10 gigabit adapter swamps multiple gigabit adapters. If your hosts have both, don’t even bother with the gigabit.

To simplify your architecture, decide early:

  • How many networks you will use. They do not need to have different functions. For example, the old management/cluster/Live Migration/storage breakdown no longer makes sense. One management and three cluster networks for a four-member team does make sense.
  • The IP structure for each network. For networks that will only carry cluster (including intra-cluster Live Migration) communication, the chosen subnet(s) do not need to exist in your current infrastructure. As long as each adapter in a cluster network can reach all of the others at layer 2 (Ethernet), then you can invent any IP network that you want.

I recommend that you start off expecting to use a completely converged design that uses all physical network adapters in a single team. Create Hyper-V network adapters for each unique cluster network. Stop there, and make no changes unless you detect a problem.
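
As a starting point, a converged build of that kind might look like the following sketch. All adapter, team, switch and VLAN names here are placeholders, and it assumes the built-in LBFO teaming available from 2012 onward; adapt it to your own hardware and VLAN plan.

```powershell
# Placeholder names throughout: physical NICs pNIC1-pNIC4, team 'ConvergedTeam',
# switch 'vSwitch', one management adapter and three cluster-only adapters.

# One team from all physical adapters
# (Dynamic load balancing requires 2012 R2 or later; use HyperVPort on 2012)
New-NetLbfoTeam -Name 'ConvergedTeam' -TeamMembers 'pNIC1','pNIC2','pNIC3','pNIC4' `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

# A single Hyper-V switch on the team; virtual machines and the management OS share it
New-VMSwitch -Name 'vSwitch' -NetAdapterName 'ConvergedTeam' -AllowManagementOS $false

# One virtual adapter in the management OS per cluster network
'Management','Cluster1','Cluster2','Cluster3' | ForEach-Object {
    Add-VMNetworkAdapter -ManagementOS -Name $_ -SwitchName 'vSwitch'
}

# Optionally tag each virtual adapter with its VLAN (IDs are illustrative)
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName 'Management' -Access -VlanId 10
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName 'Cluster1' -Access -VlanId 11
```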

Comparing the Old Way to the New Way (Gigabit)

Let’s start with a build that would have been common in 2010 and walk through our options up to something more modern. I will only use gigabit designs in this section; skip ahead for 10 gigabit.

In the beginning, we couldn’t use teaming. So, we used a lot of gigabit adapters:

There would be some variations of this. For instance, I would have added another adapter so that I could use MPIO with two iSCSI networks. Some people used Fiber Channel and would not have iSCSI at all.

Important Note: The “VMs” that you see there means that I have a virtual switch on that adapter and the virtual machines use it. It does not mean that I have created a VM cluster network. There is no such thing as a VM cluster network. The virtual machines are unaware of the cluster and they will not talk to it (if they do, they’ll use the Management access point like every other non-cluster system).

Then, 2012 introduced teaming. We could then do all sorts of fun things with convergence. My very least favorite:

This build takes teams to an excess. Worse, the management, cluster, and Live Migration teams will be idle almost all the time, meaning that 60% of this host’s networking capacity will be generally unavailable.

Let’s look at something a bit more common. I don’t like this one, but I’m not revolted by it either:

A lot of people like that design because, so they say, it protects the management adapter from problems that affect the other roles. I cannot figure out how they perform that calculus. Teaming addresses any probable failure scenarios. For anything else, I would want the entire host to fail out of the cluster. In this build, a failure that brought the team down but not the management adapter would cause its hosted VMs to become inaccessible because the node would remain in the cluster. That’s because the management adapter would still carry cluster heartbeat information.

My preferred design follows:

Now we are architected against almost all types of failure. In a “real-world” build, I would still have at least two iSCSI NICs using MPIO.

What is the Optimal Gigabit Adapter Count?

Because we had one adapter per role in 2008 R2, we often continue using the same adapter count in our 2012+ builds. I don’t feel that’s necessary for most builds. I am inclined to use two or three adapters in data teams and two adapters for iSCSI. For anything past that, you’ll need to have collected some metrics to justify the additional bandwidth needs.

10 Gigabit Cluster Network Design

10 gigabit changes all of the equations. In reasonable load conditions, a single 10 gigabit adapter moves data more than 10 times faster than a single gigabit adapter. When using 10 GbE, you need to change your approaches accordingly. First, if you have both 10GbE and gigabit, just ignore the gigabit. It is not worth your time. If you really want to use it, then I would consider using it for iSCSI connections to non-SSD systems. Most installations relying on iSCSI-connected spinning disks cannot sustain even 2 Gbps, so gigabit adapters would suffice.

Logical Adapter Counts for Converged Cluster Networking

I didn’t include the Hyper-V virtual switch in any of the above diagrams, mostly because it would have made the diagrams more confusing. However, I would use a team hosting a Hyper-V virtual switch to carry all of the logical adapters necessary. For a non-Hyper-V cluster, I would create a logical team adapter for each role. Remember that on a logical team, you can only have a single logical adapter per VLAN. The Hyper-V virtual switch has no such restrictions. Also remember that you should not use multiple logical team adapters on any team that hosts a Hyper-V virtual switch. Some of the behavior is undefined and your build might not be supported.

I would always use these logical/virtual adapter counts:

  • One management adapter
  • A minimum of one cluster communications adapter up to n-1, where n is the number of physical adapters in the team. You can subtract one because the management adapter acts as a cluster adapter as well

In a gigabit environment, I would add at least one logical adapter for Live Migration. That’s optional because, by default, all cluster-enabled networks will also carry Live Migration traffic.

In a 10 GbE environment, I would not add designated Live Migration networks. It’s just logical overhead at that point.

In a 10 GbE environment, I would probably not set aside physical adapters for storage traffic. At those speeds, the differences in offloading technologies don’t mean that much.

Architecting IP Addresses

Congratulations! You’ve done the hard work! Now you just need to come up with an IP scheme. Remember that the cluster builds networks based on the IPs that it discovers.

Every network needs one IP address for each node. Any network that contains an access point will need an additional IP for the CNO. For Hyper-V clusters, you only need a management access point. The other networks don’t need a CNO.

Only one network really matters: management. Your physical nodes must use that to communicate with the “real” network beyond. Choose a set of IPs available on your “real” network.

For all the rest, the member IPs only need to be able to reach each other over layer 2 connections. If you have an environment with no VLANs, then just make sure that you pick IPs in networks that don’t otherwise exist. For instance, you could use 192.168.77.0/24 for something, as long as that’s not a “real” range on your network. Any cluster network without a CNO does not need to have a gateway address, so it doesn’t matter that those networks won’t be routable. It’s preferred, in fact.
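
Continuing the placeholder names from the earlier sketch, the addressing might look like the following. The IPs, prefix lengths, gateway and DNS server are examples only; note that only the management adapter gets a gateway and DNS registration.

```powershell
# Interface aliases follow Hyper-V's "vEthernet (<adapter name>)" convention.
# All addresses below are illustrative.

# Management: an address from the "real" routable network, with gateway and DNS
New-NetIPAddress -InterfaceAlias 'vEthernet (Management)' -IPAddress 192.168.10.21 `
    -PrefixLength 24 -DefaultGateway 192.168.10.1
Set-DnsClientServerAddress -InterfaceAlias 'vEthernet (Management)' -ServerAddresses 192.168.10.5

# A cluster-only network: any unused subnet, no gateway, no DNS registration
New-NetIPAddress -InterfaceAlias 'vEthernet (Cluster1)' -IPAddress 192.168.77.21 -PrefixLength 24
Set-DnsClient -InterfaceAlias 'vEthernet (Cluster1)' -RegisterThisConnectionsAddress $false
```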

Implementing Hyper-V Cluster Networks

Once you have your architecture in place, you only have a little work to do. Remember that the cluster will automatically build networks based on the subnets that it discovers. You only need to assign names and set them according to the type of traffic that you want them to carry. You can choose:

  • Allow cluster communication (intra-node heartbeat, configuration updates, and Cluster Shared Volume traffic)
  • Allow client connectivity to cluster resources as well as cluster communications (you cannot choose client connectivity without cluster communication)
  • Prevent participation in cluster communications (often used for iSCSI and sometimes connections to external SMB storage)

As much as I like PowerShell for most things, Failover Cluster Manager makes this all very easy. Access the Networks tree of your cluster:

I’ve already renamed mine in accordance with their intended roles. A new build will have “Cluster Network”, “Cluster Network 1”, etc. Double-click on one to see which IP range(s) it assigned to that network:

Work your way through each network, setting its name and what traffic type you will allow. Your choices:

  • Allow cluster network communication on this network AND Allow clients to connect through this network: use these two options together for the management network. If you’re building a non-Hyper-V cluster that needs access points on non-management networks, use these options for those as well. Important: The adapters in these networks SHOULD register in DNS.
  • Allow cluster network communication on this network ONLY (do not check Allow clients to connect through this network): use for any network that you wish to carry cluster communications (remember that includes CSV traffic). Optionally use for networks that will carry Live Migration traffic (I recommend that). Do not use for iSCSI networks. Important: The adapters in these networks SHOULD NOT register in DNS.
  • Do not allow cluster network communication on this network: Use for storage networks, especially iSCSI. I also use this setting for adapters that will use SMB to connect to a storage server running SMB version 3.02 in order to run my virtual machines. You might want to use it for Live Migration networks if you wish to segregate Live Migration from cluster traffic (I do not do or recommend that).
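
If you would rather script those settings than click through Failover Cluster Manager, the same choices map to the Role property of each cluster network. A brief sketch, using placeholder network names:

```powershell
# Role values: 3 = cluster and client, 1 = cluster communication only,
# 0 = no cluster communication (e.g. iSCSI). Names below are placeholders.

(Get-ClusterNetwork -Name 'Cluster Network 1').Name = 'Management'   # rename a discovered network
(Get-ClusterNetwork -Name 'Management').Role = 3
(Get-ClusterNetwork -Name 'Cluster1').Role = 1
(Get-ClusterNetwork -Name 'iSCSI').Role = 0
```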

Once done, you can configure Live Migration traffic. Right-click on the Networks node and click Live Migration Settings:

Check a network’s box to enable it to carry Live Migration traffic. Use the Up and Down buttons to prioritize.
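
If you prefer to script this step as well, the cluster stores the selection as parameters on its “Virtual Machine” resource type. A hedged sketch, assuming a cluster network named 'LiveMigration' and that your build exposes the MigrationExcludeNetworks parameter (check the output of Get-ClusterParameter first):

```powershell
# See which migration-related parameters your cluster exposes first
Get-ClusterResourceType -Name 'Virtual Machine' | Get-ClusterParameter

# Exclude every cluster network except the one named 'LiveMigration' (placeholder)
$exclude = (Get-ClusterNetwork | Where-Object { $_.Name -ne 'LiveMigration' }).Id -join ';'
Get-ClusterResourceType -Name 'Virtual Machine' |
    Set-ClusterParameter -Name MigrationExcludeNetworks -Value $exclude
```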

What About Traffic Prioritization?

In 2008 R2, we had some fairly arcane settings for cluster network metrics. You could use those to adjust which networks the cluster would choose as alternatives when a primary network was inaccessible. We don’t use those anymore because SMB multichannel just figures things out. However, be aware that the cluster will deliberately choose Cluster Only networks over Cluster and Client networks for inter-node communications.

What About Hyper-V QoS?

When 2012 first debuted, it brought Hyper-V networking QoS along with it. That was some really hot new tech, and lots of us dove right in and lost a lot of sleep over finding the “best” configuration. And then, most of us realized that our clusters were doing a fantastic job balancing things out all on their own. So, I would recommend that you avoid tinkering with Hyper-V QoS unless you have tried going without and had problems. Before you change QoS, determine what traffic needs to be tuned or boosted. Do not simply start flipping switches, because the rest of us already tried that and didn’t get results. If you need to change QoS, start with this TechNet article.

Your thoughts?

Does your preferred network management system differ from mine? Have you decided to give my arrangement a try? How did you get on? Let me know in the comments below; I really enjoy hearing from you guys!

Cato’s network security feature on the hunt for threats

Cato Networks last week upped its SD-WAN-as-a-service offering Cato Cloud with the Cato Threat Hunting System, a network security feature built into the software-defined WAN platform to detect threats and minimize the time it takes to remove them.

The Cato Threat Hunting System offers full visibility into network traffic, and can access and identify real-time traffic that any endpoint initiates, Cato said in a statement. This means Cato can see IP addresses, session and flow information and application types within the network.

Additionally, Cato said it uses machine learning algorithms to “mine the network” for suspicious activity. If the network security feature deems something is a risk, human analysts from Cato inspect and confirm the alerts and notify the customers of the threat. Customers can also use the security operations center to deploy policies to contain exposed endpoints. 

Cato said the Threat Hunting System differs in its approach by eliminating the need to install additional monitoring tools or sensors within the network. Instead, it is integrated into Cato’s SD-WAN platform. Cato already offers network security features, including next-generation firewalls, secure web gateways and advanced threat protection.

Masergy offers adjustable bandwidth for public connectivity

Masergy recently updated its managed SD-WAN offering to let customers adjust WAN bandwidth as needed. Masergy previously offered the ability to scale bandwidth in private networks, but the recent update targets networks using public connectivity, such as broadband internet.

Customers can control their global SD-WAN bandwidth consumption in real time through Masergy’s Intelligent Service Control portal, according to a company statement. Based on location, customers can adjust bandwidth allocations and prepare for spikes or drops in data consumption, Masergy said. The update also allows customers to schedule automatic bandwidth adjustments for upcoming projects.

The update supports uses that require atypical bandwidth usage, such as data backups, multisite video conferences and disaster recovery measures, Masergy said. Customers are billed incrementally for specific increases of bandwidth consumption. The feature is available now as a built-in option in Masergy’s Intelligent Service Control portal.

Verizon SDN deployment growth report results

Enterprises see the value of software-defined networking deployment to help scale network functionality, according to a recent report sponsored by Verizon.

In a survey of 165 senior IT leaders, 49% said they considered the imperative to scale — to increase network agility in order to deliver services more efficiently — a major trigger for SDN deployment. Following closely, at 47%, were the need to address network security issues and the desire to reduce costs by deploying SDN. Respondents also cited increased network security as a major SDN benefit, along with better application performance.

The top concerns about SDN deployment included the potential for disruption during implementation and the complexity of migrating existing services. A full 62% of respondents indicated they were concerned they might lack the right in-house IT skills to handle the migration.

In terms of actual deployment, 57% of respondents replied they expect to deploy SDN within the next two years; 15% reported they had already deployed it or were currently in the process of implementation.

London-based Longitude, a research firm acquired by the Financial Times, conducted the survey in the first quarter of 2018.