Tag Archives: monitoring

Netflix launches tool for monitoring AWS credentials

LAS VEGAS — A new open source tool looks to make monitoring AWS credentials easier and more effective for large organizations.

The tool, dubbed Trailblazer, was introduced during a session at Black Hat USA 2018 on Wednesday by William Bengtson, senior security engineer at Netflix, based in Los Gatos, Calif. During his session, Bengtson discussed how his security team took a different approach to reviewing AWS data in order to find signs of potentially compromised credentials.

Bengtson said Netflix’s methodology for monitoring AWS credentials was fairly simple and relied heavily on AWS’ own CloudTrail log monitoring tool. However, Netflix couldn’t rely solely on CloudTrail to effectively monitor credential activity; Bengtson said a different approach was required because of the sheer size of Netflix’s cloud environment, which is 100% AWS.

“At Netflix, we have hundreds of thousands of servers. They change constantly, and there are 4,000 or so deployments every day,” Bengtson told the audience. “I really wanted to know when a credential was being used outside of Netflix, not just AWS.”

That was crucial, Bengtson explained, because an unauthorized user could set up infrastructure within AWS, obtain a user’s AWS credentials and then log in using those credentials in order to “fly under the radar.”

However, monitoring credentials for usage outside of a specific corporate environment is difficult, he explained, because of the sheer volume of data regarding API calls. An organization with a cloud environment the size of Netflix’s could run into challenges with pagination for the data, as well as rate limiting for API calls — which AWS has put in place to prevent denial-of-service attacks.

“It can take up to an hour to describe a production environment due to our size,” he said.

To get around those obstacles, Bengtson and his team crafted a new methodology that didn’t require machine learning or any complex technology, but rather a “strong but reasonable assumption” about a crucial piece of data.

“The first call wins,” he explained, referring to when a temporary AWS credential makes an API call and grabs the first IP address that’s used. “As we see the first use of that temporary [session] credential, we’re going to grab that IP address and log it.”

The methodology, which is built into the Trailblazer tool, collects the first API call IP address and other related AWS data, such as the instance ID and assumed role records. The tool, which doesn’t require prior knowledge of an organization’s IP allocation in AWS, can quickly determine whether the calls for those AWS credentials are coming from outside the organization’s environment.
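As a rough illustration of the "first call wins" assumption described above, the sketch below locks each temporary credential to the first source IP address it is seen using and flags any later call from a different address. It is not Trailblazer's code or Netflix's implementation. The function name `find_suspect_calls` and the sample records are hypothetical, the records are pared down to two CloudTrail-style fields (`userIdentity.accessKeyId` and `sourceIPAddress`), and the events are assumed to arrive in chronological order.

```python
def find_suspect_calls(events):
    """Flag calls whose source IP differs from the first IP seen for that credential.

    Assumes `events` is an iterable of CloudTrail-style dicts in time order.
    """
    first_ip = {}   # access key ID -> first source IP observed ("first call wins")
    suspects = []
    for event in events:
        key = event["userIdentity"]["accessKeyId"]
        ip = event["sourceIPAddress"]
        # setdefault stores the IP on first use and returns the stored IP afterward
        locked_ip = first_ip.setdefault(key, ip)
        if ip != locked_ip:
            suspects.append(event)
    return suspects


# Hypothetical sample data: one session credential used from two addresses
events = [
    {"userIdentity": {"accessKeyId": "ASIAEXAMPLE1"}, "sourceIPAddress": "10.0.0.5"},
    {"userIdentity": {"accessKeyId": "ASIAEXAMPLE2"}, "sourceIPAddress": "10.0.0.7"},
    {"userIdentity": {"accessKeyId": "ASIAEXAMPLE1"}, "sourceIPAddress": "10.0.0.5"},
    {"userIdentity": {"accessKeyId": "ASIAEXAMPLE1"}, "sourceIPAddress": "203.0.113.9"},
]
suspects = find_suspect_calls(events)
```

The only state this approach needs is one IP address per credential, which is why it requires no prior knowledge of an organization's IP allocation.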

“[Trailblazer] will enumerate all of your API calls in your environment and associate that log with what is actually logged in CloudTrail,” Bengtson said. “Not only are you seeing that it’s logged, you’re seeing what it’s logged as.”

Bengtson said the only requirement for using Trailblazer is a high level of familiarity with AWS — specifically how AssumeRole calls are logged. The tool is currently available on GitHub.

LiveAction buys Savvius to combine packet monitoring and NPM

LiveAction has acquired Savvius and plans to combine its packet monitoring software with LiveAction’s technology for measuring network performance.

LiveAction announced the acquisition this week, but it did not release financial terms. The vendor said it would use Savvius’ products to broaden LiveAction’s offerings for enterprise networks.

“LiveAction and Savvius will deliver a powerful set of capabilities in a single platform that will simplify our customers’ ability to manage their networks, while preparing for the ever-greater demands of software-defined infrastructure,” Brooks Borcherding, CEO at LiveAction, based in Palo Alto, Calif., said in a statement.

Buying Savvius will make it possible for LiveAction to combine two types of products many network operators typically buy from separate vendors, said Shamus McGillicuddy, an analyst at Enterprise Management Associates, based in Boulder, Colo. Engineers often use a network performance monitor to spot problems and then switch to a packet monitoring product to perform in-depth analyses to pinpoint causes.

“A combined solution can deliver a lot of value,” McGillicuddy said. “Having flow and packet monitoring side by side in one console will be very valuable to LiveAction users.”

The majority of network managers use more than four separate tools to monitor and troubleshoot networks, McGillicuddy said. “This means that they spend a lot of time going from one tool to the next, trying to piece together answers.”

Flow monitoring, the core feature in LiveAction, refers to tools that tap into the NetFlow data collection component built into routers and switches from Cisco and other manufacturers. The software uses the data to determine packet loss, delay and round-trip time, while also showing network administrators how well the network is delivering applications and services.

LiveAction is known for premium pricing

LiveAction analyzes NetFlow records and also sells a module called LiveSensor, which provides packet analysis and performance metrics for network components without the data collection feature. In July, LiveAction launched machine-learning-driven analytics and expanded its device support to 107 networking vendors.

The company’s annual revenue from its network performance monitoring products is between $11 million and $25 million, Gartner reported in its February Magic Quadrant report on the NPM market. The report pointed out that LiveAction mostly focuses on Cisco infrastructure and “is frequently cited by end users as offering a premium-priced solution.”

Formerly called WildPackets, Savvius, based in Walnut Creek, Calif., had embarked on a channel turnaround after being primarily a direct seller for more than 25 years. The vendor had set a goal of generating up to 95% of its revenue from channel partners by the end of the third quarter of 2018. LiveAction sales are mostly through channel partners.

Companies use Savvius packet monitoring for more than break-fix scenarios, according to the company. Its technology is also used to bolster security investigations. As a result, the company has linked its products to security offerings from Cisco, Fortinet and Palo Alto Networks.

Polycom cloud service simplifies device management

Polycom has released a cloud service for provisioning, managing and monitoring its desk and conference room phones. The hardware vendor’s latest attempt to penetrate the cloud market comes a few months before its proposed acquisition by headset-maker Plantronics is set to close.

Polycom Device Management Services for Enterprises (PDMS-E) is a web-based application for controlling Polycom phones from a single user interface. It will let IT administrators manage the settings of individual phones — or every phone all at once. It also will provide analytics on call quality and connectivity issues. The product is now available in North America.

Next quarter, Polycom plans to expand the capabilities of PDMS-E to include Polycom video endpoints and, eventually, the video endpoints of Cisco, Avaya and Lifesize. The vendor will fold Polycom RealConnect — its platform for managing interoperability between its devices and Microsoft Skype for Business — into its new cloud offering.

Also in the third quarter, Polycom plans to release a version of PDMS for service providers, aiming to help those partners improve uptime and enhance their customer portals. The service provider offering will make use of technology and partnerships Polycom inherited from Obihai Technology, which it acquired in January.

“Polycom makes great phones,” said Ira Weinstein, managing partner of Recon Research Inc., based in Coral Springs, Fla. “But the important thing here is for Polycom to have greater value and a stronger footprint in the enterprise, they need to add more value.”

The Polycom cloud service will provide provisioning, management and analytics tools that many businesses aren’t getting from their service providers, Weinstein said. And Polycom can provide more insight than anyone into its own devices.

But Polycom will need to battle against its own public image. “I don’t think the typical person in our industry sees Polycom as a cloud service provider,” Weinstein said.

In announcing PDMS, company executives said they would not comment on the company’s impending acquisition by Plantronics — a $2 billion deal that is set to close in the third quarter of 2018. Polycom has continued to operate as an independent company as the acquisition closes, said Amy Barzdukas, the vendor’s chief marketing officer.

Polycom cloud service extends hardware-based strategy

Polycom decided years ago to make its phones and cameras compatible with the software of a wide range of service providers, rather than build its own calling or web conferencing service.

In a conference call with reporters and analysts this week, CEO Mary McDowell said the company’s longtime strategy had proven to be successful, saying revenue had grown last year for the first time in six years. The formerly public company struggled financially in the years preceding its 2016 acquisition by private equity firm Siris Capital Group LLC.

With the release of PDMS, Polycom is looking to gain a foothold in the cloud market without directly competing with the software vendors that power its hardware, such as Microsoft and Zoom, said Rob Arnold, analyst at Frost & Sullivan.

“It’s pretty much a follow-through on what they said they were going to do last year: focus on device and not infrastructure,” Arnold said. “This way, they are not competing with their partners, and they are staying focused on the hardware and the devices, as they had mentioned.”

As phones become more advanced, with built-in video conferencing capabilities and touchscreen apps, businesses need better monitoring and management tools for those endpoints, Arnold said.

Polycom plans to expand its cloud offerings to include meeting room features, such as automatic attendance rosters, facial recognition and natural language controls.

IT monitoring, org discipline polish Nasdaq DevOps incident response

Modern IT monitoring can bring together developers and IT ops pros for DevOps incident response, but tools can’t substitute for a disciplined team approach to problems.

Dev and ops teams at Nasdaq Corporate Solutions LLC adopted a common language for troubleshooting with AppDynamics’ App iQ platform. But effective DevOps incident response also demanded focus on the fundamentals of team building and a systematic process for following up on incidents to ensure they don’t recur.

“We had some notion of incident management, but there was no real disciplined way for following up,” said Heather Abbott, senior vice president of corporate solutions technology, who joined the New York-based subsidiary of Nasdaq Inc. in 2014. “AppDynamics has [affected] how teams work together to resolve incidents … but we’ve had other housekeeping to do.”

Shared IT monitoring tools renew focus on incident resolution

Nasdaq Corporate Solutions manages SaaS offerings for customers as they shift from private to public operations. Its products include public relations, investor relations, and board and leadership software managed with a combination of Amazon Web Services and on-premises data center infrastructure, though the on-premises infrastructure will soon be phased out.

In the past, Nasdaq’s dev and ops teams used separate IT monitoring tools, and teams dedicated to different parts of the infrastructure also had individualized dashboard views. The company’s shift to cross-functional teams, focused on products and user experience as part of a DevOps transformation, required a unified view into system performance. Now, all stakeholders share the AppDynamics App iQ interface when they respond to an incident.

With a single source of information about infrastructure performance, there’s less finger-pointing among team members during DevOps incident response, which speeds up problem resolution.

“You can’t argue with the data, and people have a better ongoing understanding of the system,” Abbott said. “So, you’re not going in and hunting and pecking every time there’s a complaint or we’re trying to improve something.”

DevOps incident response requires team vigilance

Since Abbott joined Nasdaq, incidents are down more than 35%. She cited the IT monitoring tool in part, but also pointed to changes the company made to the DevOps incident response process. The company moved from an ad hoc process of incident response divided among different departments to a companywide, systematic cycle of regular incident review meetings. Her team conducts weekly incident review meetings and tracks action items from previous incident reviews to prevent incidents from recurring. Higher levels of the organization have a monthly incident review call to review quality issues, and some of these incidents are further reviewed by Nasdaq’s board of directors.

And there’s still room to improve the DevOps incident response process, Abbott said.

“We always need to focus on blocking and tackling,” she said. “We don’t have the scale within my line of business of Amazon or Netflix, but as we move toward more complex microservices-based architectures, we’ll be building things into the platform like Chaos Monkey.”

Like many companies, Nasdaq plans to tie DevOps teams with business leaders, so the whole organization can work together to improve customer experiences. In the past, Nasdaq has generated application log reports with homegrown tools. But this year, it will roll out AppDynamics’ Business iQ software, first with its investor-relations SaaS products, to make that data more accessible to business leaders, Abbott said.

AppDynamics App iQ will also expand to monitor releases through test, development and production deployment phases. Abbott said Nasdaq has worked with AppDynamics to create intelligent release dashboards to provide better automation and performance trends. “That will make it easy to see how system performance is trending over time, as we introduce change,” she said.

While Nasdaq mainly uses AppDynamics App iQ, the exchange also uses Datadog, because it offers event correlation and automated root cause analysis. AppDynamics has previewed automated root cause analysis based on machine learning techniques. Abbott said she looks forward to the addition of that feature, perhaps this year.

Beth Pariseau is senior news writer for TechTarget’s Cloud and DevOps Media Group. Write to her at bpariseau@techtarget.com or follow @PariseauTT on Twitter.

ThousandEyes-Juniper pact focuses on hybrid WANs

ThousandEyes has deployed its network performance monitoring agents on routers and customer premises equipment, or CPE, made by Juniper Networks to improve visibility for hybrid WANs and other extended networks.

ThousandEyes software, running as virtual network functions on NFX250 branch routers, will support a wide range of capabilities, the companies said, including gauging network health and confirming traffic paths. Among other capabilities, the agents probe latency and bandwidth, monitor MPLS and automate outage detection. They also can report connection errors for FTP, HTTP, Session Initiation Protocol and Real-Time Transport Protocol-based applications, and carry out root-cause analysis for problems stemming from domain name system and Border Gateway Protocol routing.

The proliferation of hybrid WANs, SD-WAN and SaaS offerings, as well as ongoing consolidation of data centers, means enterprises face visibility challenges with their extended networks. The addition of ThousandEyes’ software is aimed at eliminating some of those challenges, said Mihir Maniar, vice president of product management for Juniper Networks. 

“As more and more of our customers move to cloud-centric networks to realize its cost and agility promises, the migration — often to a hybrid public-private environment — can also bring new network blind spots that, if left unchecked, can wreak havoc on service delivery, application development, SLAs [service-level agreements] and the overall end-user experience,” Maniar said in a statement.

Cloud revenues soar 25% in Q3: IDC

Sales of cloud infrastructure products, such as Ethernet switches and servers, surged in the third quarter of 2017, growing 25.5% year over year and reaching $11.3 billion, according to the most recent study by IDC. The firm’s Worldwide Quarterly Cloud IT Infrastructure Tracker found that public cloud investments fueled most of the sales increase, representing 68% of all cloud IT infrastructure sales during the quarter. Storage platforms generated the highest growth, with revenue up 45% over the same quarter in 2016.

IDC said all regions of the world, except for Latin America, experienced double-digit growth in cloud infrastructure spending, with the fastest growth in Asia-Pacific and in Central and Eastern Europe. Private cloud revenues reached $3.6 billion, an annual increase of 13.1%. Noncloud IT infrastructure sales, meantime, rose 8% to $14.2 billion.

“2017 has been a strong year for public cloud IT infrastructure growth, accelerating throughout the year,” said Kuba Stolarski, research director for computing platforms at IDC, in a statement.

“While hyperscalers such as Amazon and Google are driving the lion’s share of the growth, IDC is seeing strong growth in the lower tiers of public cloud and continued growth in private cloud on a worldwide scale,” he added.

New Intel and AMD platforms launched in 2017 will provide a further boost to the cloud segment, Stolarski said, as providers and enterprises take steps to upgrade their IT infrastructures.

Lambda MSA issues preliminary optical specification

The 100G Lambda Multi-Source Agreement, or MSA Group, released preliminary interoperability specifications based on 100 Gbps pulse amplitude modulation 4-based optical technology. The new optical interface specification is intended for next-generation networking equipment and is suitable for tasks requiring increased bandwidth and greater bandwidth density.

In addition to ensuring optical receivers from multiple vendors can work together, the new spec increases the distances supported by both 100 Gigabit Ethernet and 400 GbE systems from the 500 meters currently specified in the IEEE 802.3 Ethernet standard to up to 10 kilometers for 100 GbE and up to 2 kilometers over duplex single-mode fiber for 400 GbE.

The Lambda MSA group is composed of major networking vendors, such as Arista Networks, Broadcom, Cisco and Juniper Networks, as well as major enterprises, such as Alibaba and Nokia. Final specifications will be released later in 2018, the MSA Group said.

Time-series monitoring tools give high-resolution view of IT

DevOps shops use time-series monitoring tools to glean a nuanced, historical view of IT infrastructure that improves troubleshooting, autoscaling and capacity forecasting.

Time-series monitoring tools are based on time-series databases, which are optimized for time-stamped data collected continuously or at fine-grained intervals. Since they store fine-grained data for a longer term than many metrics-based traditional monitoring tools, they can be used to compare long-term trends in DevOps monitoring data and to bring together data from more diverse sources than the IT infrastructure alone to link developer and business activity with the behavior of the infrastructure.

Time-series monitoring tools include the open source project Prometheus, which is popular among Kubernetes shops, as well as commercial offerings from InfluxData and Wavefront, the latter of which VMware acquired last year.

DevOps monitoring with these tools gives enterprise IT shops such as Houghton Mifflin Harcourt, an educational book and software publisher based in Boston, a unified view of both business and IT infrastructure metrics. It does so over a longer period of time than the Datadog monitoring product the company used previously, which retains data for only up to 15 months in its Enterprise edition.

“Our business is very cyclical as an education company,” said Robert Allen, director of engineering at Houghton Mifflin Harcourt. “Right before the beginning of the school year, our usage goes way up, and we needed to be able to observe that [trend] year over year, going back several years.”

Allen’s engineering team got its first taste of InfluxData as a long-term storage back end for Prometheus, which at the time was limited in how much data could be held in its storage subsystem — Prometheus has since overhauled its storage system in version 2.0. Eventually, Allen and his team decided to work with InfluxData directly.

Houghton Mifflin Harcourt uses InfluxData to monitor traditional IT metrics, such as network performance, disk space, and CPU and memory utilization, in its Amazon Web Services (AWS) infrastructure, as well as developer activity in GitHub, such as pull requests and number of users. The company developed its own load-balancing system using Linkerd and Finagle. InfluxData also collects data on network latencies in that system and ties in with Zipkin’s tracing tool to troubleshoot network performance issues.

Multiple years of highly granular infrastructure data empower Allen’s team of just five people to support nearly 500 engineers who deliver applications to the company’s massive Apache Mesos data center infrastructure.

Time-series monitoring tools boost DevOps automation

Time-series data also allows DevOps teams to ask more nuanced questions about the infrastructure to inform troubleshooting decisions.

“It allows you to apply higher-level statistics to your data,” said Louis McCormack, lead DevOps engineer for Space Ape Games, a mobile video game developer based in London and an early adopter of Wavefront’s time-series monitoring tool. “Instead of something just being OK or not OK, you can ask, ‘How bad is it?’ Or, ‘Will it become very problematic before I need to wake up tomorrow morning?’”
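One way to picture the kind of question McCormack describes is a simple trend projection over stored samples. The sketch below fits a least-squares line to recent readings and estimates how long until the metric crosses a limit. It is only an illustration under stated assumptions, not Wavefront's query language or API. The function name `hours_until_threshold`, the disk-usage readings and the 90% limit are all hypothetical.

```python
def hours_until_threshold(samples, threshold):
    """Estimate hours from the last sample until a linear trend reaches `threshold`.

    `samples` is a list of (hour, value) pairs. Returns None when the trend is
    flat or falling, or when all samples share the same timestamp.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Ordinary least-squares slope and intercept
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing_hour = (threshold - intercept) / slope
    last_hour = samples[-1][0]
    # A result of 0.0 means the fitted line has already reached the limit
    return max(0.0, crossing_hour - last_hour)


# Hypothetical hourly disk-usage percentages
disk_usage = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
remaining = hours_until_threshold(disk_usage, threshold=90.0)
```

A straight-line fit is the simplest possible model. Data with strong seasonal cycles, such as the examples in this article, would likely need a method that accounts for those cycles.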

Space Ape has a smaller infrastructure to manage than Houghton Mifflin Harcourt, at about 600 AWS instances compared with about 64,000. But Space Ape also has highly seasonal business cycles, and time-series monitoring with Wavefront helps it not only to collect granular historical data, but also to scale the IT infrastructure in response to seasonal fluctuations in demand.

“A service in AWS consumes Wavefront data to make the decision about when to scale DynamoDB tables,” said Nic Walker, head of technical operations for Space Ape Games. “Auto scaling DynamoDB is something Amazon has only just released as a feature, and our version is still faster.”

The company’s apps use the Wavefront API to trigger the DynamoDB autoscaling, which makes the tool much more powerful, but also requires DevOps engineers to learn how to interact with the Wavefront query language, which isn’t always intuitive, Walker said. In Wavefront’s case, this learning curve is balanced by the software’s various prebuilt data visualization dashboards. This was the primary reason Walker’s team chose Wavefront over open source alternatives, such as Prometheus. Wavefront is also offered as a service, which takes the burden of data management out of Space Ape’s hands.

Houghton Mifflin Harcourt chose a different set of tradeoffs with InfluxData, which uses a SQL-like query language that was easy for developers to learn, but the DevOps team must work with outside consultants to build custom dashboards. Because that work isn’t finished, InfluxData has yet to completely replace Datadog at Houghton Mifflin Harcourt, though Allen said he hopes to make the switch this quarter.

Time-series monitoring tools scale up beyond the capacity of traditional metrics monitoring tools, but both companies said there’s room to improve performance when crunching large volumes of data in response to broad queries. Houghton Mifflin Harcourt, for example, queries millions of data points at the end of each month to calculate Amazon billing trends for each of its Elastic Compute Cloud instances.

“It still takes a little bit of a hit sometimes when you look at those tags, but [InfluxEnterprise version] 1.3 was a real improvement,” Allen said.

Allen added that he hopes to use InfluxData’s time-series monitoring tool to inform decisions about multi-cloud workload placement based on cost. Space Ape Games, meanwhile, will explore AI and machine learning capabilities available for Wavefront, though the jury’s still out for Walker and McCormack on whether AIOps will be worth the time it takes to implement. In particular, Walker said he’s concerned about false positives from AI analysis against time-series data.

Beth Pariseau is senior news writer for TechTarget’s Data Center and Virtualization Media Group. Write to her at bpariseau@techtarget.com or follow @PariseauTT on Twitter.

SolarWinds’ AppOptics melds network device monitoring, app behavior

SolarWinds has beefed up its cloud monitoring platform with tools that allow managers to track both application performance and infrastructure components in a single view.

The upgrades to SolarWinds’ Cloud software-as-a-service portfolio include a new application, as well as updates to two existing products.

The new network device monitoring application, AppOptics, uses a common dashboard to track application performance metrics and network component health, both within the enterprise network and throughout a public cloud provider’s network.

The software combines two existing SolarWinds cloud monitoring apps, Librato and TraceView, into a single network device monitoring product, said Christoph Pfister, executive vice president of products at the company, based in Austin, Texas. Initially, AppOptics will support both Amazon Web Services and Microsoft Azure; support for other providers could be added at a later date, Pfister said.

“Infrastructure and application monitoring are now in separate silos,” he said. “We are trying to integrate them. The digital experience has become very important. But behind the scenes, applications have become very complex, making monitoring and troubleshooting challenging.”

AppOptics uses downloaded agents to collect tracing, host and infrastructure monitoring metrics to feed a common dashboard, through which managers can keep tabs on network device monitoring and application behavior and take appropriate steps in the wake of performance degradation.

In addition to launching AppOptics, SolarWinds added a more powerful search engine and more robust analytics to Papertrail, its log management application. And it added capabilities to Pingdom, a digital experience measurement tool, to allow enterprises to react more quickly to issues that might affect user engagement with a website or service.

Both AppOptics and Papertrail are available Nov. 20; SolarWinds will release Pingdom Nov. 27. All are available as downloads from SolarWinds. The cloud monitoring platform is priced at $7.50 per host, per month.

Ruckus launches high-speed WLAN switches

Ruckus Wireless Inc. introduced a new group of wireless LAN switches engineered to support network edge and aggregation functions.

The new switches, the ICX 7650 series, come in three models, including a multi-gigabit access switch that supports both 2.5 Gbps and 5 Gbps throughput; a core switch with Layer 3 features and up to 24 10 Gbps and 24 1 Gbps fiber ports of capacity; and a high-performance gigabit switch that can be deployed as a stack of up to 12 switches.

“As more wireless users access cloud and data-intensive applications on their devices, the demand for high-speed, resilient edge networks continues to increase,” said Siva Valliappan, vice president of campus product management at Ruckus, based in Sunnyvale, Calif., in a statement. “The ICX 7650 switch family captures all these requirements, enabling users to scale and future-proof their network infrastructure to meet the increasing demand of wired and wireless network requirements for seven to 10 years,” he added.

The switches, available early next year, are priced starting at $11,900, Ruckus said.

DDoS attacks on rise, thanks to IoT

Distributed denial-of-service, or DDoS, attacks have risen sharply in the past year, according to a new security report from Corero Network Security.

The firm, based in Marlborough, Mass., said Corero enterprise customers experienced an average of 237 DDoS attempts each day during the third quarter of 2017, a 35% increase from the year-earlier period and almost double what they experienced in the first quarter of 2017.

The company attributed the growth in attacks to DDoS for-hire services and the proliferation of unsecured internet of things (IoT) devices. One piece of malware, dubbed the Reaper, has already infected thousands of IoT gadgets, Corero said.

In addition, Corero’s study found that hackers are using multiple ways to penetrate an organization’s security perimeter. Some 20% of attacks recorded in the second quarter of 2017 used multiple attack vectors, the company said.

Finally, Corero said ransom-oriented DDoS attacks also rose in the third quarter, attributing many of them to one group, the Phantom Squad, which targeted companies across the United States, Europe and Asia.

AIOps tools portend automated infrastructure management

Automated infrastructure management took a step forward with the emergence of AIOps monitoring tools that use machine learning to proactively identify infrastructure problems.

IT monitoring tools released in the last two months by New Relic, BMC and Splunk incorporate AI features, mainly machine learning algorithms, to correlate events in the IT infrastructure with problems in application and business environments. Enterprise IT ops pros have begun to use these tools to address problems before they arise.

New Relic’s machine learning features, codenamed Seymour at its beta launch in 2016, helped the mobile application development team at Scripps Networks Interactive in Knoxville, Tenn., identify misconfigured Ruby application dependencies and head off potential app performance issues.

“Just doing those simple updates allowed them to fix some errors they hadn’t realized were there,” said Mark Kelly, director of cloud and infrastructure architecture at Scripps, which owns web and TV channels, such as Food Network and HGTV, that are viewed by an average of 50 million to 70 million people per day.

Seymour is now generally available in New Relic’s Radar and Error Profiles features, which add a layer of analytics over the performance data collected by New Relic’s application performance management tools and help users hone their reactions. Radar uses algorithms similar to e-commerce product recommendation engines to tailor dashboards to individual users’ automated infrastructure management needs. The Error Profiles feature narrows down the possible causes of IT infrastructure errors. An engineer can then scan a prioritized list of the most unusual behaviors to identify a problem’s root cause.
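To make the idea of a prioritized list of unusual behaviors concrete, the sketch below ranks the values of one transaction attribute by how much more often each appears among failed transactions than among successful ones. This is a generic illustration and should not be read as New Relic's algorithm. The function name `rank_unusual`, the `host` attribute and the sample transactions are hypothetical, and both input lists are assumed to be non-empty.

```python
from collections import Counter


def rank_unusual(errors, successes, attribute):
    """Rank attribute values by (share among errors) minus (share among successes).

    Both `errors` and `successes` are non-empty lists of dicts that contain
    `attribute`. The most over-represented values in errors come first.
    """
    err_counts = Counter(txn[attribute] for txn in errors)
    ok_counts = Counter(txn[attribute] for txn in successes)
    ranked = []
    for value in set(err_counts) | set(ok_counts):
        err_share = err_counts[value] / len(errors)
        ok_share = ok_counts[value] / len(successes)
        ranked.append((value, err_share - ok_share))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical transactions: most failures come from one host
errors = [{"host": "web-3"}, {"host": "web-3"}, {"host": "web-3"}, {"host": "web-1"}]
successes = [
    {"host": "web-1"}, {"host": "web-1"}, {"host": "web-1"},
    {"host": "web-2"}, {"host": "web-2"}, {"host": "web-3"},
]
ranking = rank_unusual(errors, successes, "host")
```

In this sample, `web-3` lands at the top of the ranking because it accounts for most of the errors but few of the successes, which is the kind of hint an engineer would investigate first.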

“Before Radar, [troubleshooting] required some manual digging — now it’s automatically identifying problem areas we might want to look for,” Kelly said. “It takes some of that searching for the needle in the haystack out of the equation for us.”

A screenshot of New Relic’s Error Profiles feature shows the troubleshooting hints it delivers to IT pros.

Data correlation stems IT ops ticket tsunami

IT monitoring tools from BMC and Splunk also expanded their AIOps features this month. BMC’s TrueSight 11 IT monitoring and management platform will use new algorithms within the TrueSight Intelligence SaaS product to categorize service tickets so IT ops pros can more quickly resolve incidents, as well as assess the financial impact of bugs in application code. Event stream analytics in TrueSight Intelligence can predict IT service deterioration, and a separately licensed TrueSight Cloud Cost Control product forecasts infrastructure costs to optimize workload placement in hybrid clouds.

Park Place Technologies, an after-warranty server management company in Cleveland, Ohio, and a BMC partner, plans to fold TrueSight Intelligence analytics into a product that forewarns customers of equipment outages.

“We have ways to filter email alerts sent by equipment based on subject lines, but TrueSight does it faster, and can pull out strings of data from the emails as well,” said Chris Adams, president and COO of Park Place. “We want to be able to call the customer and say, ‘Three disk drives are going to fail, and here’s why.’”

Version 3.0 of Splunk’s IT Service Intelligence (ITSI) tool also correlates event data to pass along critical alerts to IT admins so they can more easily process Splunk log and monitoring data. ITSI 3.0 root cause analysis features predict the outcome of infrastructure changes, more quickly identify problem servers, and integrate with ITSM tools such as ServiceNow and PagerDuty — which offer their own machine learning features to further prune the flow of IT alerts.

Linda Tong, left, VP of AppDynamics, speaks during The Convergence of IT and Business presentation at AppDynamics Summit on Thursday, October 19, 2017, in New York City.

Automated infrastructure management takes shape with AIOps

Eventually, IT pros hope that AIOps monitoring tools will go beyond dashboards and take automated infrastructure management action, both through proactive fixes for infrastructure problems and through application pull requests that address code problems through the DevOps pipeline.

“The Radar platform has that potential, especially if it can start integrating into our pipeline and help change events before they happen,” Kelly said. “I want it to help me do some of those automated tasks, detect my stacks going bad in advance, and give me some of that proactive feedback before I have a problem.”

Such products are already on the way. Cisco previewed a feature at its AppDynamics Summit recently that displays a forecast of events along a timeline, and highlights transactions that will be critically impacted by problems as they develop. The still-unnamed tool presents theories about the causes of future problems along with recommended actions for remediation. In the product demo, the user interface presented an “execute” button for recommended remediation, along with a button to choose “other actions.”

Cisco plans to eventually integrate technology from recently acquired Perspica into AppDynamics to perform machine learning analysis on streams of infrastructure data at wire speed.

For now, AppDynamics customers said they’re interested in ways such AIOps features can improve business outcomes. But the tools must still prove themselves valuable beyond what humans can forecast based on their own experience.

“It’s not going to replace a good analyst at this point — that’s what the analyst does for us, says how a change is going to affect the business,” said Krishna Dammavalam, SRE for Idexx Labs, a veterinary testing and lab equipment company in Westbrook, Maine. “If machine learning’s predictions are better than the analyst’s, that’s where the product will have value, but if the analyst is still better, there will still be room to grow.”

Beth Pariseau is senior news writer for TechTarget’s Data Center and Virtualization Media Group. Write to her at bpariseau@techtarget.com or follow @PariseauTT on Twitter.

New Akamai products aim to court software developers

Just six months after acquiring application performance monitoring company SOASTA, Akamai has released a number of new offerings, including a new version of SOASTA’s mPulse, marking perhaps the first time the content delivery network provider has directly marketed itself to developers.

And the process of getting to the new Akamai products has been an eye-opener. “We’ve been looking at ourselves in a mirror and thought we had our eyes open, but we’ve been kidding ourselves,” admitted Ari Weil, strategic product and marketing leader of Akamai. “We’ve heard ‘we don’t care about you guys, you’re not in our consciousness,’” he said.

Standing out in a crowd

In the DevOps world, Akamai is a well-known name on the operations side of things. But although DevOps brings the two sides closer together, developers, under pressure to create better applications more quickly, are understandably focused on the tools that can help them get work done faster. The existing Akamai products weren’t really “developer-friendly,” and even before being acquired, SOASTA found itself having to occasionally sell the value proposition of its mPulse APM tool.

“They (Akamai) still have some work to do when it comes to building developer awareness,” said Jeffrey Hammond, vice president and principal analyst serving application development and delivery professionals at Forrester Research. “Most developers tend to know who Akamai is conceptually but not necessarily why they need to care, and what Akamai’s technology can do to improve their development efforts.”

But Weil is determined to change all of that. The latest Akamai products are very DevOps-oriented and are designed to make it simple for developers to work with Akamai offerings directly. Now developers can tie in to a variety of public clouds, work with a selection of public APIs and fine-tune performance monitoring to get just the information they need. Akamai has also opened its tools up to working with Varnish, a popular open source HTTP cache system.

Rethinking the business

But to get to this point with the new Akamai products, Weil and his team had to have lots of conversations with developers to try to understand their perspective. “Developers don’t want to learn how to work with you. They want you to learn how to work with them in the lowest friction way possible,” he explained. That took some intense rethinking of the business, perhaps nowhere more so than with public APIs. “We thought we really knew what we were doing, but we realized we weren’t thinking about what the developers really needed when it came to APIs. This was a material shift for us and how we do business.” The company was also thinking about simplifying the developer’s life when it came to deploying Akamai as code in the cloud, he said. “If you put your app on Akamai, all the risk management issues just go away,” Weil said.

When it came to updating Akamai’s mPulse product, Weil said the company tried to keep in mind how developers would use the tool. There are now hooks built into open source test communities, and developers have more control and insight into code analysis than before. And with those tweaks came the feedback Weil wanted to hear. “Developers are telling us this (version of mPulse) helps explain to business why they’re doing what they’re doing and now business can understand the build.”