Sigma analytics platform’s interface simplifies queries

In desperate need of data dexterity, Volta Charging turned to the Sigma analytics platform to improve its business intelligence capabilities and ultimately help fuel its growth.

Volta, based in San Francisco and founded in 2010, is a provider of electric vehicle charging stations, and three years ago, when Mia Oppelstrup started at Volta, the company faced a significant problem.

Because there aren’t dedicated charging stations the same way there are dedicated gas stations, Volta has to negotiate with organizations — mostly retail businesses — for parking spots where Volta can place its charging stations.

Naturally, Volta wants its charging stations placed in the parking spots with the best locations near the business they serve. But before an organization gives Volta those spots, Volta has to show that it makes economic sense, that by putting electric car charging stations closest to the door it will help boost customer traffic through the door.

That takes data. It takes proof.

Volta, however, was struggling with its data. It had the necessary information, but finding the data and then putting it in a digestible form was painstakingly slow. Queries had to be submitted to engineers, and those engineers then had to write code to transform the data before delivering a report.

Any slight change required an entirely new query, which involved more coding, time and labor for the engineers.

But then the Sigma analytics platform transformed Volta’s BI capabilities, Volta executives said.

“If I had to ask an engineer every time I had a question, I couldn’t justify all the time it would take unless I knew I’d be getting an available answer,” said Oppelstrup, who began in marketing at Volta and now is the company’s business intelligence manager. “Curiosity isn’t enough to justify engineering time, but curiosity is a way to get new insights. By working with Sigma and doing queries on my own I’m able to find new metrics.”

Metrics, Oppelstrup added, that she’d never have been able to find otherwise.

“It’s huge for someone like me who never wrote code,” Oppelstrup said. “It would otherwise be like searching a warehouse with a forklift while blindfolded. You get stuck when you have to wait for an engineer.”

Volta looked at other BI platforms — Tableau and Microsoft’s Power BI, in particular — but just under two years ago chose Sigma and has forged ahead with the platform from the 2014 startup.

The product

Sigma Computing was founded by the trio of Jason Frantz, Mike Speiser and Rob Woollen.

Based in San Francisco, the vendor has gone through three rounds of financing and to date raised $58 million, most recently attracting $30 million in November 2019.

When Sigma was founded, and ideas for the Sigma analytics platform first developed, it was in response to what the founders viewed as a lack of access to data.

“Gartner reported that 60 to 73 percent of data is going unused and that only 30 percent of employees use BI tools,” Woollen, Sigma’s CEO, said. “I came back to that — BI was stuck with a small number of users and data was just sitting there, so my mission was to solve that problem and correct all this.”

Woollen, who previously worked at Salesforce and Sutter Hill Ventures — a main investor in Sigma — and his co-founders set out to make data more accessible. They aimed to design a BI platform that could be used by ordinary business users — citizen data scientists — without relying so heavily on engineers, and one that responds quickly no matter what queries users ask of it.

Sigma launched the Sigma analytics platform in November 2018.

Like other BI platforms, Sigma — entirely based in the cloud — connects to a user’s cloud data warehouse in order to access the user’s data. Unlike most BI platforms, however, the Sigma analytics platform is a low-code BI tool that doesn’t require engineering expertise to sift through the data, pull the data relevant to a given query and present it in a digestible form.

A key element of that is the Sigma analytics platform’s user interface, which resembles a spreadsheet.

With Sigma automatically writing the necessary SQL in the background, users can simply make entries and notations in the spreadsheet and the platform will run the query.

“The focus is always on expanding the audience, and 30 percent employee usage is the one that frustrates me,” Woollen said. “We’re focused on solving that problem and making BI more accessible to more people.”

The interface is key to that end.

“Products in the past focused on a simple interface,” Woollen said. “Our philosophy is that just because a businessperson isn’t technical that shouldn’t mean they can’t ask complicated questions.”

With the Sigma analytics platform’s spreadsheet interface, users can query their data to examine, for example, sales performance in a certain location, at a certain time or during a certain week. They can then tweak the query to look at a different time or week, view the results on a monthly basis, compare them year over year, and add or subtract fields and columns at will.

And rather than file a ticket to the IT department for each separate query, they can run the query themselves.

“The spreadsheet interface combines the power to ask any question of the data without having to write SQL or ask a programmer to do it,” Woollen said.
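
Sigma’s actual query engine isn’t documented in this article, but the general pattern Woollen describes, a spreadsheet-style specification compiled into warehouse SQL behind the scenes, can be sketched in a few lines of Python. The table, column and filter names below are hypothetical.

```python
# Minimal sketch (not Sigma's actual engine): compile a spreadsheet-style
# specification -- a source table, grouping columns and an aggregate --
# into the SQL a cloud data warehouse would run.

def compile_query(table, group_by, metric, agg="SUM", filters=None):
    """Build a SQL string from a spreadsheet-like query spec."""
    where = ""
    if filters:
        clauses = [f"{col} = '{val}'" for col, val in filters.items()]
        where = " WHERE " + " AND ".join(clauses)
    cols = ", ".join(group_by)
    return (f"SELECT {cols}, {agg}({metric}) AS {metric}_{agg.lower()} "
            f"FROM {table}{where} GROUP BY {cols}")

# A user "tweaks" the sheet -- a different week, a different grouping -- and a
# new query is generated without anyone writing SQL by hand.
print(compile_query("sales", ["store_location"], "revenue",
                    filters={"week": "2020-W03"}))
print(compile_query("sales", ["store_location", "month"], "revenue"))
```

Each tweak of the sheet simply regenerates and reruns the SQL, which is why a change that once meant a new engineering ticket becomes a self-service edit.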

Giving end users power to explore data

Volta knew it had a data dexterity problem — an inability to truly explore its data given its reliance on engineers to run time- and labor-consuming queries — even before Oppelstrup arrived. The company was looking at different BI platforms to attempt to help, but most of the platforms Volta tried out still demanded engineering expertise, Oppelstrup said.

The outlier was the Sigma analytics platform.

“Within a day I was able to set up my own complex joins and answer questions by myself in a visual way,” Oppelstrup said. “I always felt intimidated by data, but Sigma felt like using a spreadsheet and Google Drive.”

One of the significant issues Volta faced before it adopted the Sigma analytics platform was the inability of its salespeople to show data when meeting with retail outlets and attempting to secure prime parking spaces for Volta’s charging stations.

Because of the difficulty accessing data, the salespeople didn’t have the numbers to prove that placing charging stations near the door would increase customer traffic.

With the platform’s querying capability, however, Oppelstrup and her team were able to make the discoveries that armed Volta’s salespeople with hard data rather than simply anecdotes.

They could now show a bank a surge in the use of charging stations near banks between 9 a.m. and 4 p.m., movie theaters a similar surge in the use just before the matinee and again before the evening feature, and grocery stores a surge near stores at lunchtime and after work.

They could also show that the charging stations were being used by actual customers, and not by random people charging up their vehicles and then leaving without also going into the bank, the movie theater or the grocery store.

“It’s changed how our sales team approaches its job — it used to just be about relationships, but now there’s data at every step,” Oppelstrup said.

Sigma enables Oppelstrup to give certain teams access to certain data, everyone access to other data, and importantly, easily redact data fields within a set that might otherwise prevent her from sharing information entirely, she said.

And that gets to the heart of Woollen’s intent when he helped start Sigma — enabling business users to work with more data and giving more people that ability to use BI tools.

“Access leads to collaboration,” he said.

AWS leak exposes passwords, private keys on GitHub

An Amazon Web Services engineer uploaded sensitive data to a public GitHub repository that included customer credentials and private encryption keys.

Cybersecurity vendor UpGuard earlier this month found the exposed GitHub repository within 30 minutes of its creation. UpGuard analysts discovered the AWS leak, which was slightly less than 1 GB and contained log files and resource templates that included hostnames for “likely” AWS customers.

“Of greater concern, however, were the many credentials found in the repository,” UpGuard said in its report Thursday. “Several documents contained access keys for various cloud services. There were multiple AWS key pairs, including one named ‘rootkey.csv,’ suggesting it provided root access to the user’s AWS account.”

The AWS leak also contained a file for an unnamed insurance company that included keys for email and messaging providers, as well as other files containing authentication tokens and API keys for third-party providers. UpGuard’s report did not specify how many AWS customers were affected by the leak.

UpGuard said GitHub’s token scanning feature, which is opt-in, could have detected and automatically revoked some of the exposed credentials in the repository, but it’s unclear how quickly detection would have occurred. The vendor also said the token scanning tool would not have been able to revoke exposed passwords or private keys.
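
UpGuard’s report doesn’t describe how such scanners work internally, but the core of credential detection is pattern matching. The sketch below is a rough approximation, not GitHub’s token scanning service: it flags a few common secret formats. The AWS access key ID prefix is publicly documented; the other patterns are simplified heuristics.

```python
import re

# Simplified patterns for a few common secret formats. AWS access key IDs
# follow a documented "AKIA..." shape; the others are rough heuristics.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_password":  re.compile(r"(?i)password\s*[:=]\s*\S+"),
}

def scan_text(text):
    """Return a list of (kind, match) findings in a blob of text."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            findings.append((kind, match))
    return findings

sample = "rootkey.csv\nAWSAccessKeyId=AKIAABCDEFGHIJKLMNOP\npassword: hunter2"
for kind, match in scan_text(sample):
    print(kind, "->", match)
```

A scanner like this can only flag what matches a known shape, which is consistent with UpGuard’s point that exposed passwords and private keys would still have slipped through automatic revocation.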

The documents in the AWS leak also bore the hallmarks of an AWS engineer, and some of the documents included the owner’s name. UpGuard said it found a LinkedIn profile for an AWS engineer that matched the owner’s exact full name, and the role matched the types of data found in the repository; as a result, the vendor said it was confident the owner was an AWS engineer.

While it’s unclear why the engineer uploaded such sensitive material to a public GitHub repository, UpGuard said there was “no evidence that the user acted maliciously or that any personal data for end users was affected, in part because it was detected by UpGuard and remediated by AWS so quickly.”

UpGuard said at approximately 11 a.m. on Jan. 13, its data leaks detection engine identified potentially sensitive information had been uploaded to the GitHub repository half an hour earlier. UpGuard analysts reviewed the documents and determined the sensitive nature of the data as well as the identity of the likely owner. An analyst contacted AWS’ security team at 1:18 p.m. about the leak, and by 4 p.m. public access to the repository had been removed. SearchSecurity contacted AWS for comment, but at press time the company had not responded.

Investments in data storage vendors topped $2B in 2019

Data storage vendors received $2.1 billion in private funding in 2019, according to a SearchStorage.com analysis of data from websites that track venture funding. Not surprisingly, startups in cloud backup, data management and ultrafast scale-out flash continue to attract the greatest interest from private investors.

Six private data storage vendors closed funding rounds of more than $100 million in 2019, all in the backup/cloud sector. It’s a stretch to call most of these companies startups — all but one have been selling products for years.

A few vendors with disruptive storage hardware also got decent chunks of money to build out arrays and storage systems, although these rounds were much smaller than the data protection vendors received.

According to a recent report by PwC/CB Insights MoneyTree, 213 U.S.-based companies closed funding rounds of at least $100 million last year. The report pegged overall funding for U.S. companies at nearly $108 billion, down 9% year over year but well above the $79 billion total from 2017.

Despite talk of a slowing global economy, data growth is expected to accelerate for years to come. And as companies mine new intelligence from older data, data centers need more storage and better management than ever. The funding is flowing more to vendors that manage that data than to systems that store it.

“Investors don’t lead innovation; they follow innovation. They see a hot area that looks like it’s taking off, and that’s when they pour money into it,” said Marc Staimer, president of Dragon Slayer Consulting in Beaverton, Ore.

Here is a glance at the largest funding rounds by storage companies in 2019, starting with software vendors:

Kaseya Limited, $500 million: Investment firm TPG will help Kaseya further diversify the IT services it can offer to managed service providers. Kaseya has expanded into backup in recent years, adding web-monitoring software ID Agent last year. That deal followed earlier pickups of Spanning Cloud Apps and Unitrends.

Veeam Software, $500 million: Veeam pioneered backup of virtual machines and serves many Fortune 500 companies. Insight Partners invested half of a billion dollars in Veeam in January 2019, and followed up by buying Veeam outright in January 2020 for a $5 billion valuation. That may lead to an IPO. Veeam headquarters are shifting to the U.S. from Switzerland, and Insight plans to focus on landing more U.S. customers.

Rubrik, $261 million: The converged storage vendor has amassed $553 million since launching in 2014. The latest round of Bain Capital investment reportedly pushed Rubrik’s valuation north of $3 billion. Flush with investment, Rubrik said it’s not for sale — but is shopping to acquire hot technologies, including AI, data analytics and machine learning.

Clumio, $175 million: Sutter Hill Ventures provided $40 million in April, on top of an $11 million 2017 round. It then came back for another $135 million bite in November, joined by Altimeter Capital. Clumio is using the money to add cybersecurity to its backup as a service in Amazon Web Services.

Acronis, $147 million: Acronis was founded in 2003, so it’s halfway into its second decade. But the veteran data storage vendor has a new focus of backup blended with cybersecurity and privacy, similar to Clumio. The Goldman Sachs-led funding helped Acronis acquire 5nine to manage data across hybrid Microsoft clouds.

Druva, $130 million: Viking Global Investors led a six-participant round that brought Druva money to expand its AWS-native backup and disaster recovery beyond North America to international markets. Druva since has added low-cost tiering to Amazon Glacier, and CEO Jaspreet Singh has hinted Druva may pursue an IPO.

Notable 2019 storage funding rounds

Data storage startups in hardware

Innovations in storage hardware underscore the ascendance of flash in enterprise data centers. Although fewer in number, the following storage startups are advancing fabrics-connected devices for high-performance workloads.

Over time, these data storage startups may mature to be able to deliver hardware that blends low latency, high IOPS and manageable cost, emerging as competitors to leading array vendors. For now, these products will have a limited market, mainly companies that need petabytes (PB) or more of storage, but the technologies bear watching due to their speed, density and performance potential.

Lightbits Labs, $50 million: The Israel-based startup created the SuperSSD array for NVMe flash. The Lightbits software stack converts generic in-the-box TCP/IP into a switched Ethernet fabric, presenting all storage as a single giant SSD. SuperSSD starts at 64 PB before data reduction. Dell EMC led Lightbits’ funding, with contributions from Cisco and Micron Technology.

Vast Data, $40 million: Vast’s Universal Storage platform is not for everyone. Minimum configuration starts at 1 PB. Storage class memory and low-cost NAND are combined for unified block, file and object storage. Norwest Venture Partners led the round, with participation from Dell Technologies Capital and Goldman Sachs.

Honorable mentions in hardware include Pavilion Data Systems and Liqid. Pavilion is one of the last remaining NVMe all-flash startups, picking up $25 million in a round led by Taiwania Capital and RPS Ventures to flesh out its Hyperparallel Flash Array.

Liqid is trying to break into composable infrastructure, a term coined by Hewlett Packard Enterprise for data center hardware whose compute, storage and networking resources can be pooled and allocated to workloads on demand. Panorama Point Partners provided $28 million to help the startup build out its Liqid CI software platform.

Microsoft misconfiguration exposed 250M customer service records

Microsoft became the latest organization to accidentally expose private data on the web.

The software giant Wednesday admitted it had exposed 250 million customer support records on five Elasticsearch servers, which were inadvertently made publicly accessible on the web for nearly a month.

According to Comparitech, which discovered the exposure, most personally identifiable information (PII) such as payment information was redacted. However, exposed information included customer email addresses, IP addresses, locations, descriptions of customer service and support claims and cases, Microsoft support agent emails, case numbers, resolutions, remarks, and internal notes marked as confidential.

“I was immediately stunned by the size and by the structure of data there, and even when I saw that most of the data there was automatically redacted, still there were some records with personal data in plain text,” Bob Diachenko, leader of Comparitech’s security research team, told SearchSecurity.
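
Microsoft hasn’t published how its redaction worked, but automated scrubbing of support records typically follows the pattern sketched below: drop or mask known sensitive fields before a record reaches an index. The field names and regular expressions here are illustrative, not Microsoft’s.

```python
import re

# Hypothetical example, not Microsoft's pipeline: scrub common PII from a
# support record before it is written to an analytics index.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
IPV4  = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def redact_record(record, drop_fields=("payment_info",)):
    """Return a copy of a support-case dict with PII masked or removed."""
    clean = {}
    for key, value in record.items():
        if key in drop_fields:
            continue                      # drop the field outright
        if isinstance(value, str):
            value = EMAIL.sub("[redacted-email]", value)
            value = IPV4.sub("[redacted-ip]", value)
        clean[key] = value
    return clean

case = {"case_number": "12345",
        "notes": "Customer jane@example.com called from 203.0.113.7",
        "payment_info": "4111-1111-1111-1111"}
print(redact_record(case))
```

As Diachenko’s finding suggests, rule-based scrubbing of this kind is only as good as its patterns, which is how some personal data can remain in plain text.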

Microsoft, which corrected the misconfiguration last month, issued a statement that said it found no malicious use of the exposed data. The company said its investigation found that misconfigured Azure security rules were applied to the databases in early December.

On Dec. 28, according to Comparitech, the databases were first indexed by BinaryEdge, a search engine. One day later, Diachenko discovered the exposed databases and immediately contacted Microsoft. Within two days, the servers and data were secured.

“They acted really quickly and professionally,” Diachenko said. “In general, Microsoft’s response was exemplary. I wish every company would have such a brilliant incident response protocol in place.”

“We have solutions to help prevent this kind of mistake, but unfortunately, they were not enabled for this database,” Microsoft said in its statement. It is unknown what these solutions are and why they weren’t in place; when SearchSecurity contacted Microsoft, the company declined to comment beyond the public statement.

New Confluent Platform release boosts event streaming quality

Event streaming is a critical component of modern data management and analysis, bringing real-time data to organizations. One of the most popular tools for event streaming is the open source Apache Kafka technology that is at the foundation of the commercial Confluent platform.

The vendor, based in Mountain View, Calif., has enhanced the platform with capabilities that make event streaming more secure and resilient.

The Confluent Platform 5.4 event streaming update became generally available Wednesday and benefits from improvements that first landed in the Apache Kafka 2.4 update that was released on Dec. 18. Beyond what’s available in the Kafka update, Confluent’s new release adds role-based access control (RBAC) security, improved disaster recovery and enhanced schema validation for data quality.

Confluent is on a path to improve the usability and manageability of Kafka, said Maureen Fleming, an IDC analyst.

“The introduction of Confluent Schema Registry simplifies and improves control over schema validation and enforcement,” Fleming said. “This aligns well with efforts enterprises are going through to ensure their data is trustworthy.”

The Confluent Platform update also introduces support for CloudEvents, an open specification for describing event data. Fleming noted that improvements in audit logging and support for the CloudEvents specification provide mechanisms for more sophisticated monitoring and use of security-related anomaly detection algorithms.

Confluent Platform schema validation process

“The improved logging also supports regulatory compliance requirements of Confluent’s customers,” she said.

Securing event data

RBAC is a critical security mechanism that can be used to ensure that only authorized users get access to a given service. Confluent Platform has integrated in the past with directory-based security policy systems, including Microsoft Active Directory, noted Addison Huddy, group product manager at Confluent. He said the new RBAC system provides more control than what Confluent previously delivered.

“What role-based access control does is it allows you to take the groups that you have defined already inside of something like Active Directory, and you tie those to roles that we defined in the system,” Huddy said.

Confluent Platform 5.4 has a component that enables administrators to define roles and then have policies on those roles enforced across the platform, he added.
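
Confluent hasn’t detailed the internals of that component here, but the model Huddy describes, directory groups tied to platform roles whose permissions are checked on every access, can be sketched generically. All group, role and resource names below are hypothetical.

```python
# Minimal sketch of the general RBAC idea described above (not Confluent's
# implementation): directory groups map to roles, roles map to permissions,
# and every access is checked against that chain.

GROUP_TO_ROLES = {                 # e.g. groups pulled from Active Directory
    "eng-data-platform": {"topic-admin"},
    "analytics-team":    {"topic-reader"},
}
ROLE_TO_PERMISSIONS = {
    "topic-admin":  {("orders", "read"), ("orders", "write")},
    "topic-reader": {("orders", "read")},
}

def is_allowed(user_groups, resource, action):
    """True if any of the user's groups grants a role permitting the action."""
    for group in user_groups:
        for role in GROUP_TO_ROLES.get(group, set()):
            if (resource, action) in ROLE_TO_PERMISSIONS.get(role, set()):
                return True
    return False

print(is_allowed({"analytics-team"}, "orders", "read"))   # True
print(is_allowed({"analytics-team"}, "orders", "write"))  # False
```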

Schema Registry improves data quality

The Confluent Schema Registry is a centralized location where teams can upload their data schemas. Kafka as a platform generally has more users who read and consume data from an event stream than users who write data, Huddy noted.

“So now if I’m writing an application that’s a consumer of data, I don’t have to coordinate with [the producer] directly to say, ‘Hey, what serialization format did you use?’” Huddy said. “I can go out and reach the schema registry to grab that data.”

Going a step further, the schema registry can also be used to help enforce data quality, by only accepting data that adheres to a given schema model that is defined in the registry.
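
As a rough illustration of that idea, and not the Confluent Schema Registry API itself, the sketch below keeps schemas in a registry keyed by topic and rejects records that don’t conform before they are produced. The topic and field names are hypothetical.

```python
# Generic sketch of registry-backed validation (not the Confluent Schema
# Registry API): a producer checks each record against the schema registered
# for its topic before writing it to the stream.

REGISTRY = {
    "orders-value": {"order_id": int, "amount": float, "currency": str},
}

def validate(topic, record):
    """Raise ValueError if the record does not match the registered schema."""
    schema = REGISTRY[f"{topic}-value"]
    missing = set(schema) - set(record)
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for field, expected in schema.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} should be {expected.__name__}")
    return True

validate("orders", {"order_id": 7, "amount": 19.99, "currency": "EUR"})  # ok
# validate("orders", {"order_id": "7"})  # would raise: wrong type, missing fields
```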

KsqlDB now in preview

The new Confluent Platform release also includes a technical preview of the ksqlDB event streaming database technology that first became generally available as an open source project on Nov. 20.

One of Confluent customers’ main goals is to build enterprise event streaming applications, said Praveen Rangnath, senior director of product marketing at Confluent. Without ksqlDB, building enterprise event streaming applications would be a more complicated process involving putting multiple distributed systems together, he said.

“What we’re trying to do with ksqlDB is essentially integrate those systems into a single solution to just make it super easy for developers to build event streaming applications,” Rangnath said.
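
The statements below give a flavor of the streaming SQL ksqlDB supports: one defines a stream over a Kafka topic, the other a continuously maintained aggregate over it. The stream, topic and column names are hypothetical, and in practice statements like these would be submitted through the ksqlDB CLI or REST API; they are shown here as plain strings for illustration.

```python
# Illustrative only: ksqlDB-style statements for an event streaming app.
# Stream, topic and column names are hypothetical.

CREATE_STREAM = """
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
"""

# A continuously updated materialized view over that stream.
CREATE_TABLE = """
CREATE TABLE views_per_user AS
  SELECT user_id, COUNT(*) AS views
  FROM pageviews
  GROUP BY user_id
  EMIT CHANGES;
"""

for statement in (CREATE_STREAM, CREATE_TABLE):
    print(statement.strip())
```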

Storytelling using data makes information easy to digest

Storytelling using data is helping make analytics digestible across entire organizations.

While the amount of available data has exploded in recent years, the ability to understand the meaning of the data hasn’t kept pace. There aren’t enough trained data scientists to meet demand, which often leaves data interpretation in the hands of line-of-business employees and high-level executives who are mostly guessing at the underlying meaning behind data points.

Storytelling using data, however, changes that.

A group of business intelligence software vendors are now specializing in data storytelling, producing platforms that go one step further than traditional BI platforms and attempt to give the data context by putting it in the form of a narrative.

One such vendor is Narrative Science, based in Chicago and founded in 2010. On Jan. 6, Narrative Science released a book entitled Let Your People Be People that delves into the importance of storytelling for businesses, with a particular focus on storytelling using data.

Recently, authors Nate Nichols, vice president of product architecture at Narrative Science, and Anna Schena Walsh, director of growth marketing, answered a series of questions about storytelling using data.

Here in Part II of a two-part Q&A they talk about why storytelling using data is a more effective way to interpret data than traditional BI, and how data storytelling can change the culture of an organization. In Part I, they discussed what data storytelling is and how data can be turned into a narrative that has meaning for an organization.

What does an emphasis on storytelling in the workplace look like, beyond a means of explaining the reasoning behind data points?

Nate Nichols: As an example of that, I’ve been more intentional since the New Year about applying storytelling to meetings I’ve led, and it’s been really helpful. It’s not like people are gathering around my knee as I launch into a 30-minute story, but just remembering to kick off a meeting with a 3-minute recap of why we’re here, where we’re coming from, what we worked on last week and what the things are that we need going forward. It’s really just putting more time into reminding people of why, the cause and effect, just helping people settle into the right mindset. Storytelling is an empirically effective way of doing it.

We didn’t start this company to be storytellers — we really wanted everyone to understand and be able to act on data. It turned out that the best way to do that was through storytelling. The world is waking up to this. It’s something we used to do — our ancestors sat around the campfire swapping stories about the hunt, or where the best potatoes are to forage for. That’s a thing we used to do, it’s a thing that kids do all the time — they’re bringing other kids into their world — and what’s happening is that a lot of that has been beaten out of us as adults. Because of the way the workforce is going, the way automation is going, we’re heading back to the importance of those soft skills, those storytelling skills.

How is storytelling using data more effective at presenting data than typical dashboards and reports?

Anna Schena Walsh: The brain is hard-wired for stories. It’s hard-wired to take in information in that storytelling arc, which is what is [attracting our attention] — what is something we thought we knew, what is something new that surprised us, and what can we do about it? If you can put that in a way that is interesting to people in a way they can understand, that is a way people will remember. That is what really motivates people, and that’s what actually causes people to take action. I think visuals are important parts of some stories, whether it be a chart or a picture, it can help drive stories home, but no matter what you’re doing to give people information, the end is usually the story. It’s verbal, it’s literate, it’s explaining something in some way. In reality, we do this a lot, but we need to be a lot more systematic about focusing on the story part.

What happens when you present an explanation with data?

Nichols: If someone sends you a bar chart and asks you to use it to make decisions and there’s no story with it at all, what your brain does is it makes up a story around it. Historically, what we’ve said is that computers are good at doing charts — we never did charts and graphs and spreadsheets because we thought they were helpful for people, we did them because that was what computers could do. We’ve forgotten that. So when we do these charts, people look at them and make up their own stories, and they may be more or less accurate depending on their intuition about the business. What we’re doing now is we want everyone to be really on the same story, hearing the same story, so by not having a hundred different people come up with a hundred different internal stories in their head, what we’re doing at Narrative Science is to try and make the story external so everyone is telling the same story.

So is it accurate to say that accuracy is a part of storytelling using data?

Schena Walsh: When I think of charts and graphs, interpreting those is a skill — it is a learned skill that comes to some people more naturally than others. In the past few decades there’s been this idea that everybody needs to be able interpret [data]. With storytelling, specifically data storytelling, it takes away the pressure of people interpreting the data for themselves. This allows people, where their skills may not be in that area … they don’t have to sit down and interpret dashboards. That’s not the best use of their talent, and data storytelling brings that information to them so they’re able to concentrate on what makes them great.

What’s the potential end result for organizations that employ data storytelling — what does it enable them to do that other organizations can’t?

Schena Walsh: With data storytelling there is a massive opportunity to have everybody in your company understand what’s happening and be able to make informed decisions much, much faster. It’s not that information isn’t available — it certainly is — but it takes a certain set of skills to be able to find the meaning. So we look at it as empowering everybody because you’re giving them the information they need very quickly, and also giving them the ability to lean into what makes them great. The way we think about it is that if you can choose to have someone give a two-minute explanation of what’s going on in the business to everyone in the company everyday as they go into work, would you do it? And the answer is yes, and with data storytelling that’s what you can do.

I think as companies keep trying to move toward everyone needing to interpret data, there’s a lot of potential for burnout in people who aren’t naturally inclined to do it. I also think there’s a speed element — it’s not as fast to have everybody learn this skill and do it every day themselves as it is to have the information served to them in a way they can understand.

Editor’s note: This interview has been edited for clarity and conciseness.

Data-driven storytelling makes data accessible

As organizations wrestle with an abundance of data and a dearth of experts in data interpretation, data-driven storytelling helps those organizations make sense of their information and drive their business forward.

Most business intelligence platforms help organizations transform their information into digestible data visualizations. Many, however, don’t give the data context — they don’t attempt to explain why sales dropped in a given month, or rose in another, for example.

Some BI vendors — Tableau and Yellowfin, for example — have added data-driven storytelling capabilities.

Narrative Science, a vendor based in Chicago and founded in 2010, meanwhile, is among a group of vendors whose sole focus is data storytelling, offering a suite of tools that give information context. Narrative Science recently introduced Lexio, a tool that turns data into digestible stories and is particularly suited for mobile devices.

 Nate Nichols, vice president of product architecture at Narrative Science, and Anna Schena Walsh, director of growth marketing at the company, co-authored a book on storytelling entitled Let Your People Be People. In the book, published Jan. 6, the authors look at 36 ways storytelling — with a particular emphasis on data-driven storytelling — can help change organizations, improving operations as well as helping employees not trained in data science use data to their advantage.

Nichols and Schena Walsh recently answered questions about data-driven storytelling. Here, in Part I of a two-part Q&A, they discuss the importance of data-driven storytelling. In Part II, they delve into how data-driven storytelling can improve an organization’s operations.

In a business sense, what is storytelling?

Anna Schena Walsh: When we think of storytelling at Narrative Science, we spend a lot of time here thinking about what makes a good story because our software is data storytelling, so we’re going from to data to story. What we realized is that the arc of a good story, when it comes to software, also applies to people. No matter where you sit in a business you are a storyteller, whether you are a salesperson, a marketer, an engineer, and at some point in your career you need to be able to tell a good story. Whether it’s to advocate for yourself, to sell a product or other various different ways, that’s an essential skill for everyone to do precisely and to do it well.

Honing in more narrowly on business intelligence and analytics, how do you define data-driven storytelling?

Nate Nichols: It’s what Anna said about storytelling and applying it to data. The real shift here is that there’s been this idea that getting to the right number, or doing some analysis and looking at a number or a chart, was sufficient, and that was where the process stopped. You got a number and then it’s an executive’s job to figure out what to go do with that, or someone else’s job to figure out what to go and do with those numbers. I think what our customers are looking at and what the world is waking up to is that the right answer is just the beginning of the problem. You have the right answer, but then the real work is the communication, and that’s the piece — the storytelling part — that can actually change the world and bring people along with you.

No movement was ever led by just stating an answer and then everyone realizing that was right and joining up of their own accord. It’s really telling the story, giving it the cause and effect, the context, the why. With all of the data and analysis that’s out there, you need to still actually do the work of mobilizing it.

How does data-driven storytelling manifest itself — how do you take the information and turn it into a story?

Nichols: One of the key components is using language, so when our system is writing stories it starts with a question from the user. They want to know how their sales pipeline is, how [operations] were last quarter. There are a lot of systems that can answer that — our system can answer that and tell you how many deals were made, but then it goes into a storytelling mode where it gives a reader the context, why this is happening or what else is happening around this — that context becomes really important. It’s cause and effect, and knowing why things are happening becomes super important.

It starts with an answer, and then brings in all those storytelling elements to express things in a way that makes sense to a person. A computer is good at saying, ‘Sales increased 22 percent week over week,’ but a human would say, ‘Sales are doing great,’ or, ‘Sales jumped a lot, sales shot up.’ It may be less numerically precise, but it’s a lot more intuitive and works with our brains better. Our system is adding on that layer, bringing in the context, bringing in the characters, and then doing a lot of work to put that in a single story that someone can sit down and read and has a beginning and an end.
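
Narrative Science’s generation engine is far more sophisticated than this, but the mapping Nichols describes, from a numeric change to the kind of phrase a person would use, can be illustrated with a toy function. The thresholds and wording below are arbitrary choices for the example.

```python
# Toy sketch (nothing like Narrative Science's actual system): map a numeric
# week-over-week change onto the kind of phrasing a person would use.

def describe_change(metric, previous, current):
    pct = (current - previous) / previous * 100
    if pct >= 20:
        phrase = "shot up"
    elif pct >= 5:
        phrase = "are up"
    elif pct > -5:
        phrase = "held steady"
    else:
        phrase = "dropped"
    return f"{metric.capitalize()} {phrase} ({pct:+.0f}% week over week)."

print(describe_change("sales", previous=100_000, current=122_000))
# -> "Sales shot up (+22% week over week)."
```

The output trades numeric precision for intuition, which is exactly the trade-off Nichols describes; a production system would also attach the cause-and-effect context around the number.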

Your book looks at different ways storytelling can be used by businesses — what are some of them?

Schena Walsh: The book looks at 36 different ways you can use storytelling. One is how to tell a better story, and then how to create a storytelling environment, and then at the end how to use data storytelling to enable you to realize your talent. Here at Narrative Science we have software that surfaces the story and brings the data we need to us, which allows the employees here not to be spending time looking through analytics, and then also gives them the data points they need to tell their own stories as well. So we actually spend a lot of time here training our people to tell their stories. Nate actually leads a storytelling workshop here at Narrative Science, and a lot of the elements of what we teach our employees and our clients is … in the book.

Why do businesses need to improve their data-driven storytelling abilities — what does it enable them to do that they might not otherwise be able to?

Schena Walsh: One big trend I’ve seen is companies leaning into what was previously referred to as soft skills. As you see a lot more automation of tasks happening, these skills have become more and more important. For us, we truly believe that storytelling unlocks incredible potential for companies. We know we need to spend a lot of time with data, we need that information, and data storytelling allows us to be able to tell stories about ourselves, about our companies, about our jobs really well and really precisely.

Editor’s note: This interview has been edited for clarity and conciseness.

AtScale’s Adaptive Analytics 2020.1 a big step for vendor

With data virtualization for analytics at scale a central tenet, AtScale’s Adaptive Analytics 2020.1 platform was unveiled on Wednesday.

The release marks a significant step for AtScale, which specializes in data engineering by serving as a conduit between BI tools and stored data. Not only is it a rebranding of the vendor’s platform — its most recent update was called AtScale 2019.2 and was rolled out in July 2019 — but it also marks a leap in its capabilities.

Previously, as AtScale — based in San Mateo, Calif., and founded in 2013 — built up its capabilities, its focus was on how to get big data to work for analytics, said co-founder and vice president of technology David Mariani. And while AtScale did that, it left the data where it was stored and queried one source at a time.

With AtScale’s Adaptive Analytics 2020.1 — available in general release immediately — users can query multiple sources simultaneously and get their response almost instantaneously due to augmented intelligence and machine learning capabilities. In addition, based on their query, their data will be autonomously engineered.

“This is not just an everyday release for us,” Mariani said. “This one is different. With our arrival in the data virtualization space we’re going to disrupt and show its true potential.”

Dave Menninger, analyst at Ventana Research, said that Adaptive Analytics 2020.1 indeed marks a significant step for AtScale.

“This is a major upgrade to the AtScale architecture which introduces the autonomous data engineering capabilities,” he said. “[CTO] Matt Baird and team have completely re-engineered the product to incorporate data virtualization and machine learning to make it easier and faster to combine and analyze data at scale. In some ways you could say they’ve lived up to their name now.”

AtScale has also completely re-engineered its platform, abandoning its roots in Hadoop, to serve both customers who store their data in the cloud and those who keep their data on premises.

“It’s not really about where the AtScale technology runs,” Menninger said. “Rather, they make it easy to work with cloud-based data sources as well as on premises data sources. This is a big change from their Hadoop-based, on-premises roots.”

AtScale’s Adaptive Analytics 2020.1 includes three main features: Multi-Source Intelligent Data Model, Self-Optimizing Query Acceleration Structures and Virtual Cube Catalog.

Multi-Source Intelligent Data Model is a tool that enables users to create logical data models through an intuitive process. It simplifies data modeling by rapidly assembling the data needed for queries, and then maintains its acceleration structures even as workloads increase.

Self-Optimizing Query Acceleration Structures, meanwhile, allow users to add information to their queries without having to re-aggregate the data over and over.

A sample AtScale dashboard shows an organization’s internet sales data.

And Virtual Cube Catalog is a means of speeding up discoverability with lineage and metadata search capabilities that integrate natively into existing data catalogs. This enables business users and data scientists to locate needed information for whatever their needs may be, according to AtScale.
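
AtScale hasn’t detailed how its acceleration structures are chosen, but the underlying idea of aggregate-aware query routing can be sketched simply: answer a query from a pre-built rollup when one covers its dimensions, and fall back to the raw table otherwise. The rollup and dimension names below are hypothetical.

```python
# Minimal sketch of the idea behind acceleration structures (not AtScale's
# implementation): route a query to a pre-built rollup when one covers it,
# and fall back to the raw fact table otherwise.

ROLLUPS = {
    # rollup name -> dimensions it was pre-aggregated by
    "sales_by_region_month": {"region", "month"},
    "sales_by_product_day":  {"product", "day"},
}

def choose_source(query_dims):
    """Pick the cheapest source that can answer a query over these dimensions."""
    candidates = [name for name, dims in ROLLUPS.items()
                  if set(query_dims) <= dims]
    # Prefer the narrowest covering rollup; otherwise scan the raw table.
    return min(candidates, key=lambda n: len(ROLLUPS[n])) if candidates else "sales_raw"

print(choose_source({"region"}))            # sales_by_region_month
print(choose_source({"region", "month"}))   # sales_by_region_month
print(choose_source({"customer"}))          # sales_raw
```

The "self-optimizing" part, in this framing, is deciding which rollups to build and maintain based on the queries users actually run.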

“The self-optimizing query acceleration provides a key part of the autonomous capabilities,” Menninger said. “Performance tuning big data queries can be difficult and time-consuming. However, it’s the combination of the three capabilities which really makes AtScale stand out.”

Other vendors are attempting to offer similar capabilities, but AtScale’s Adaptive Analytics 2020.1 packages them in a unique way, he added.

“There are competitors offering data virtualization and competitors offering cube-based data models, but AtScale is unique in the way they combine these capabilities with the automated query acceleration,” Menninger said.

Beyond enabling data virtualization at scale, the update also emphasizes speed and efficiency, Mariani said. “Data virtualization can now be used to improve complexity and cost,” he said.

SAP Data Hub opens predictive possibilities at Paul Hartmann

Organizations have access to more data than they’ve ever had, and the number of data sources and volume of data just keeps growing.

But how do companies deal with all the data and can they derive real business use from it? Paul Hartmann AG, a medical supply company, is trying to answer those questions by using SAP Data Hub to integrate data from different sources and use the data to improve supply chain operations. The technology is part of the company’s push toward a data-based digital transformation, where some existing processes are digitized and new analytics-based models are being developed.

The early results have been promising, said Sinanudin Omerhodzic, Paul Hartmann’s CIO and chief data officer.

Paul Hartmann is a 200-year-old firm in Heidenheim, Germany, that supplies medical and personal hygiene products to customers such as hospitals, nursing homes, pharmacies and retail outlets. The main product groups include wound management, incontinence management and infection management.

Paul Hartmann is active in 35 countries and turns over around $2.2 billion in sales a year. Omerhodzic described the company as a pioneer in digitizing its supply chain operations, running SAP ERP systems for 40 years. However, changes in the healthcare industry have led to questions about how to use technology to address new challenges.

For example, an aging population increases demand for certain medical products and services, as people live longer and consume more products than before.

One prime area for digitization was in Paul Hartmann’s supply chain, as hospitals demand lower costs to order and receive medical products. Around 60% of Paul Hartmann’s orders are still handled by email, phone calls or fax, which means that per-order costs are high, so the company wanted to begin to automate these processes to reduce costs, Omerhodzic said.

One method was to install boxes stocked with products and equipped with sensors in hospital warehouses that automatically re-order products when stock reaches certain levels. This process reduced costs by not requiring any human intervention on the customer side. Paul Hartmann installed 9,000 replenishment boxes in about 100 hospitals in Spain, which proved adept at replacing stock when needed. The company then began to consider the next step: how to predict with greater accuracy what products will be needed when and where, to further reduce the wait time on restocking supplies.
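
The article doesn’t describe the boxes’ logic beyond sensors and stock thresholds, but a minimal version of sensor-driven replenishment looks something like the sketch below; the product names, thresholds and quantities are made up for the example.

```python
# Toy sketch of the sensor-driven replenishment logic described above
# (the details of Paul Hartmann's boxes are not public here): when a measured
# stock level falls to its reorder point, emit an order automatically.

REORDER_POINTS = {"wound_dressings": 40, "exam_gloves": 200}
ORDER_QUANTITIES = {"wound_dressings": 100, "exam_gloves": 500}

def check_box(box_id, readings):
    """Return the orders a box should place given its current sensor readings."""
    orders = []
    for product, level in readings.items():
        if level <= REORDER_POINTS.get(product, 0):
            orders.append({"box": box_id, "product": product,
                           "quantity": ORDER_QUANTITIES[product]})
    return orders

print(check_box("hospital-17-box-3",
                {"wound_dressings": 35, "exam_gloves": 410}))
# -> one reorder for wound_dressings, none for exam_gloves
```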

Getting predictive needs new data sources

This new level of supply chain predictive analytics requires accessing and analyzing vast amounts of data from a variety of new sources, Omerhodzic said. For example, weather data could show that a storm may hit a particular area, which could result in more accidents, leading hospitals to stock more bandages in preparation. Data from social media sources that refer to health events such as flu epidemics could lead to calculations on the number of people who could get sick in particular regions and the number of products needed to fight the infections.

“All those external data sources — the population data, weather data, the epidemic data — combined with our sales history data, allow us to predict and forecast for the future how many products will be required in the hospitals and for all our customers,” Omerhodzic said.

Paul Hartmann worked with SAP to implement a predictive system based on SAP Data Hub, a software service that enables organizations to orchestrate data from different sources without having to extract the data from the source. AI and machine learning are used to analyze the data, including the entire history of the company’s sales data, and after just a few months the pilot project was making better predictions than the sales staff, Omerhodzic said.
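
The SAP Data Hub pipeline itself isn’t shown here; as a minimal, hypothetical illustration of the approach, the sketch below blends a baseline from recent sales history with external signals such as a storm warning or a flu index. The adjustment factors are invented for the example.

```python
# Minimal illustration (not the SAP Data Hub pipeline): blend a baseline from
# recent sales history with external signals -- here a hypothetical storm
# warning and flu index -- to adjust the forecast for one product at one site.

def forecast_demand(weekly_sales, storm_expected=False, flu_index=0.0):
    """Naive next-week forecast: recent average, scaled by external signals."""
    baseline = sum(weekly_sales[-4:]) / min(len(weekly_sales), 4)
    multiplier = 1.0
    if storm_expected:       # more accidents -> more wound-care products
        multiplier += 0.15
    multiplier += 0.05 * flu_index   # flu chatter scales infection-management demand
    return round(baseline * multiplier)

history = [120, 130, 125, 140, 135]
print(forecast_demand(history))                         # calm week
print(forecast_demand(history, storm_expected=True))    # storm in the forecast
```

In production such signals would feed a trained model over the full sales history rather than fixed multipliers, but the structure, history plus external context, is the same.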

“We have 200 years selling our product, so the sales force has a huge wealth of information and experience, but the new system could predict even better than they could,” he said. “This was a huge wake up for us and we said we need to learn more about our data, we need to pull more data inside and see how that could improve or maybe create new business models. So we are now in the process of implementing that.”

Innovation on the edge less disruptive

The use of SAP Data Hub as an innovation center is one example of how SAP can foster digital transformation without directly changing core ERP systems, said Joshua Greenbaum, principal analyst at Enterprise Applications Consulting. This can result in new processes that aren’t as costly or disruptive as a major ERP upgrade.

“Eventually this touches your ERP because you’re going to be making and distributing more bandages, but you can build the innovation layer without it being directly inside the ERP system,” Greenbaum said. “When I discuss digital transformation with companies, the easy wins don’t start with the statement, ‘Let’s replace our ERP system.’ That’s the road to complexity and high costs — although, ultimately, that may have to happen.”

For most organizations, Greenbaum said, change management — not technology — is still the biggest challenge of any digital transformation effort.

Change management challenges

At Paul Hartmann, change management has been a pain point. The company is addressing the technical issues of the SAP Data Hub initiative through education and training programs that enhance IT skills, Omerhodzic said, but getting the company to work with data is another matter.

“The biggest change in our organization is to think more from the data perspective side and the projects that we have today,” he said. “To have this mindset and understanding of what can be done with the data requires a completely different approach and different skills in the business and IT. We are still in the process of learning and establishing the appropriate organization.”

Although the sales organization at Paul Hartmann may feel threatened by the predictive abilities of the new system, change is inevitable and affects the entire organization, and the change must be managed from the top, according to Omerhodzic.

“Whenever you have a change there’s always fear from all people that are affected by it,” he said. “We will still need our sales force in the future — but maybe to sell customer solutions, not the products. You have to explain it to people and you have to explain to them where their future could be.”

Cloudian CEO: AI, IoT drive demand for edge storage

AI and IoT are driving demand for edge storage, as data is being created faster than it can reasonably be moved across clouds, object storage vendor Cloudian’s CEO said.

Cloudian CEO Michael Tso said “Cloud 2.0” is giving rise to the growing importance of edge storage among other storage trends. He said customers are getting smarter about how they use the cloud, and that’s leading to growing demand for products that can support private and hybrid clouds. He also detects an increased demand for resiliency against ransomware attacks.

We spoke with Tso about these trends, including the Edgematrix subsidiary Cloudian launched in September 2019 that focuses on AI use cases at the edge. Tso said we can expect more demand for edge storage and spoke about an upcoming Cloudian product related to this. He also talked about how AI relates to object storage, and if Cloudian is preparing other Edgematrix-like spinoffs.

What do you think storage customers are most concerned with now?

Michael Tso: I think there is a lot, but I’ll just concentrate on two things here. One is that they continue to just need lower-cost, easier to manage and highly scalable solutions. That’s why people are shifting to cloud and looking at either public or hybrid/private.

Related to that point is I think we’re seeing a Cloud 2.0, where a lot of companies now realize the public cloud is not the be-all, end-all and it’s not going to solve all their problems. They look at a combination of cloud-native technologies and use the different tools available wisely.

I think there’s the broad brush of people needing scalable solutions and lower costs — and that will probably always be there — but the undertone is people getting smarter about private and hybrid.

Point number two is around data protection. We’re now seeing more and more customers worried about ransomware. They’re keeping backups for longer and longer and there is a strong need for write-once compliant storage. They want to be assured that any ransomware that is attacking the system cannot go back in time and mess up the data that was stored from before.

Cloudian actually invested very heavily in building write-once compliant technologies, primarily for financial and the military market because that was where we were seeing it first. Now it’s become a feature that almost everyone we talked to that is doing data protection is asking for.

People are getting smarter about hybrid and multi-cloud, but what’s the next big hurdle to implementing it?

Tso: I think as people are now thinking about a post-cloud world, one of the problems that large enterprises are coming up with is data migration. It’s not easy to add another cloud when you’re fully in one. I think if there’s any kind of innovation in being able to off-load a lot of data between clouds, that will really free up that marketplace and allow it to be more efficient and fluid.

Right now, cloud is a bunch of silos. Whatever data people have stored in cloud one is kind of what they’re stuck with, because it will take them a lot of money to move data out to cloud two, and it’s going to take them years. So, they’re kind of building strategies around that as opposed to really, truly being flexible in terms of where they keep data.

What are you seeing on the edge?

Tso: We’re continuing to see more and more data being created at the edge, and more and more use cases of the data needing to be stored close to the edge because it’s just too big to move. One classic use case is IoT. Sensors, cameras — that sort of stuff. We already have a number of large customers in the area and we’re continuing to grow in that area.

The edge can mean a lot of different things. Unfortunately, a lot of people are starting to hijack that word and make it mean whatever they want it to mean. But what we see is just more and more data popping up in all kinds of locations, with the need of having low-cost, scalable and hybrid-capable storage.

We’re working on getting a ruggedized, easy-to-deploy cloud storage solution. What we learned from Edgematrix was that there’s a lot of value to having a ruggedized edge AI device. But the unit we’re working on is going to be more like a shipping container or a truck as opposed to a little box like with Edgematrix.

What customers would need a mobile cloud storage device like you just described?

Tso: There are two distinct use cases here. One is that you want a cloud on the go, meaning it is self-contained. It means if the rest of the infrastructure around you has been destroyed, or your internet connectivity has been destroyed, you are still able to do everything you could do with the cloud. The intention is a completely isolatable cloud.

In the military application, it’s very straightforward. You always want to make sure that if the enemy is attacking your communication lines and shooting down satellites, wherever you are in the field, you need to have the same capability that you have during peak time.

But the civilian market, especially in global disaster, is another area where we are seeing demand. It’s state and local governments asking for it. In the event of a major disaster, oftentimes for a period, they don’t have any access to the internet. So the idea is to run a cloud in a ruggedized unit that is completely stand-alone until connectivity is restored.

AI-focused Edgematrix started as a Cloudian idea. What does AI have to do with object storage?

Tso: AI is an infinite data consumer. Improvements in AI accuracy are on a log scale — it’s an exponential scale in terms of the amount of data that you need for the additional improvements in accuracy. So, a lot of the reasons why people are accumulating all this data is to run their AI tools and run AI analysis. It’s part of the reason why people are keeping all their data.

Being S3 object store compatible is a really big deal because that allows us to plug into all of the modern AI workloads. They’re all built on top of cloud-native infrastructure, and what Cloudian provides is the ability to run those workloads wherever the data happens to be stored, and not have to move the data to another location.
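
That S3 compatibility is what lets standard tooling run against data kept at the edge. As a sketch, the snippet below points the widely used boto3 S3 client at a hypothetical on-premises endpoint; the endpoint URL, credentials and bucket name are placeholders, not real Cloudian values.

```python
import boto3

# Sketch: point a standard S3 client at an S3-compatible endpoint (here a
# hypothetical on-premises object store address) so the same tooling and AI
# data pipelines work without moving the data to AWS first.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.internal",  # hypothetical endpoint
    aws_access_key_id="LOCAL_ACCESS_KEY",                  # placeholder credentials
    aws_secret_access_key="LOCAL_SECRET_KEY",
)

# List training data stored at the edge, exactly as if it lived in AWS S3.
response = s3.list_objects_v2(Bucket="training-data", Prefix="camera-feeds/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```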

Are you planning other Edgematrix-like spinoffs?

Tso: Not in the immediate future. We’re extremely pleased with the way Edgematrix worked out, and we certainly are open to doing more of this kind of spinoff.

We’re not a small company anymore, and one of the hardest things for startups in our growth stage is balancing creativity and innovation with growing the core business. We seem to have found a good sort of balance, but it’s not something that we want to do in volume because it’s a lot of work.
