
Databricks bolsters security for data analytics tool

One of the biggest challenges with data management and analytics efforts is security.

Databricks, based in San Francisco, is well aware of the data security challenge, and recently updated its Unified Analytics Platform with enhanced security controls to help organizations minimize their data analytics attack surface and reduce risks. Alongside the security enhancements, new administration and automation capabilities make the platform easier to deploy and use, according to the company.

Organizations are embracing cloud-based analytics for the promise of elastic scalability, supporting more end users, and improving data availability, said Mike Leone, a senior analyst at Enterprise Strategy Group. That said, greater scale, more end users and different cloud environments create myriad challenges, with security being one of them, Leone said.

“Our research shows that security is the top disadvantage or drawback to cloud-based analytics today. This is cited by 40% of organizations,” Leone said. “It’s not only smart of Databricks to focus on security, but it’s warranted.”

He added that Databricks is extending foundational security in each environment, with consistency across environments, and is making it easy for organizations to proactively simplify administration.

As organizations turn to the cloud to enable more end users to access more data, they’re finding that security is fundamentally different across cloud providers.
Mike Leone, senior analyst, Enterprise Strategy Group

“As organizations turn to the cloud to enable more end users to access more data, they’re finding that security is fundamentally different across cloud providers,” Leone said. “That means it’s more important than ever to ensure security consistency, maintain compliance and provide transparency and control across environments.”

Additionally, Leone said that with its new update, Databricks provides intelligent automation to enable faster ramp-up times and improve productivity across the machine learning lifecycle for all involved personas, including IT, developers, data engineers and data scientists.

Gartner said in its February 2020 Magic Quadrant for Data Science and Machine Learning Platforms that Databricks Unified Analytics Platform has had a relatively low barrier to entry for users with coding backgrounds, but cautioned that “adoption is harder for business analysts and emerging citizen data scientists.”

Bringing Active Directory policies to cloud data management

Data access security is handled differently on-premises compared with how it needs to be handled at scale in the cloud, according to David Meyer, senior vice president of product management at Databricks.

Meyer said the new updates to Databricks enable organizations to more efficiently use their on-premises access control systems, like Microsoft Active Directory, with Databricks in the cloud. A member of an Active Directory group becomes a member of the corresponding policy group in the Databricks platform, and Databricks then maps the right policies into the cloud provider as a native cloud identity.
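The pattern Meyer describes amounts to a lookup from directory groups to workspace policy groups and cloud-native roles. The sketch below is a purely hypothetical illustration of that idea in Python; the group names, role ARNs and mapping logic are invented and do not reflect Databricks' actual APIs or configuration.

```python
# Conceptual sketch only: illustrates the kind of group-to-policy mapping described
# above, not Databricks' actual APIs or configuration format.

# Hypothetical mapping of Active Directory groups to workspace policy groups
# and the cloud-native roles they should resolve to.
AD_GROUP_POLICIES = {
    "corp-data-engineers": {
        "workspace_group": "data-engineers",
        "cloud_role": "arn:aws:iam::123456789012:role/data-engineer-readwrite",
    },
    "corp-analysts": {
        "workspace_group": "analysts",
        "cloud_role": "arn:aws:iam::123456789012:role/analyst-readonly",
    },
}

def resolve_cloud_identity(user_ad_groups: list[str]) -> list[str]:
    """Return the cloud roles a user should assume based on AD membership."""
    roles = []
    for group in user_ad_groups:
        policy = AD_GROUP_POLICIES.get(group)
        if policy:
            roles.append(policy["cloud_role"])
    return roles

# Example: a user in the analysts AD group maps to the read-only cloud role.
print(resolve_cloud_identity(["corp-analysts"]))
```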

Databricks uses the open source Apache Spark project as a foundational component and provides more capabilities, said Vinay Wagh, director of product at Databricks.

“The idea is, you, as the user, get into our platform, we know who you are, what you can do and what data you’re allowed to touch,” Wagh said. “Then we combine that with our orchestration around how Spark should scale, based on the code you’ve written, and put that into a simple construct.”

Protecting personally identifiable information

Beyond just securing access to data, there is also a need for many organizations to comply with privacy and regulatory compliance policies to protect personally identifiable information (PII).

“In a lot of cases, what we see is customers ingesting terabytes and petabytes of data into the data lake,” Wagh said. “As part of that ingestion, they remove all of the PII data that they can, which is not necessary for analyzing, by either anonymizing or tokenizing data before it lands in the data lake.”

In some cases, though, there is still PII that can get into a data lake. For those cases, Databricks enables administrators to perform queries to selectively identify potential PII data records.
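As a rough illustration of the anonymize-or-tokenize step Wagh describes, the following sketch replaces PII fields with keyed hashes before a record is written to the data lake. The field names and approach are assumptions for the example, not Databricks functionality.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-regularly"  # hypothetical tokenization key

def tokenize(value: str) -> str:
    """Replace a PII value with a stable, non-reversible token (keyed hash)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def scrub_record(record: dict) -> dict:
    """Tokenize known PII fields before the record lands in the data lake."""
    pii_fields = {"email", "phone", "ssn"}  # hypothetical column names
    return {
        key: tokenize(val) if key in pii_fields and val is not None else val
        for key, val in record.items()
    }

# Example: the analytical fields survive; the identifying ones become tokens.
print(scrub_record({"claim_id": 42, "email": "jane@example.com", "amount": 1250.0}))
```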

Improving automation and data management at scale

Another key set of enhancements in the Databricks platform update involves automation and data management.

Meyer explained that historically, each of Databricks’ customers had basically one workspace in which they put all their users. That model doesn’t really let organizations isolate different users or maintain different settings and environments for various groups, however.

To that end, Databricks now enables customers to have multiple workspaces to better manage and provide capabilities to different groups within the same organization. Going a step further, Databricks now also provides automation for the configuration and management of workspaces.

Delta Lake momentum grows

Looking forward, the most active area of development at Databricks is the company’s Delta Lake and data lake efforts.

Delta Lake is an open source project started by Databricks and now hosted at the Linux Foundation. The core goal of the project is to enable an open standard around data lake connectivity.

“Almost every big data platform now has a connector to Delta Lake, and just like Spark is a standard, we’re seeing Delta Lake become a standard and we’re putting a lot of energy into making that happen,” Meyer said.

Other data analytics platforms ranked similarly by Gartner include Alteryx, SAS, Tibco Software, Dataiku and IBM. Databricks’ security features appear to be a differentiator.


New AI tools in the works for ThoughtSpot analytics platform

The ThoughtSpot analytics platform has been available for only six years, but since its 2014 release the vendor has quickly gained a reputation as an innovator in the field of business intelligence software.

ThoughtSpot, founded in 2012 and based in Sunnyvale, Calif., was an early adopter of augmented intelligence and machine learning capabilities, and even as other BI vendors have begun to infuse their products with AI and machine learning, the ThoughtSpot analytics platform has continued to push the pace of innovation.

With its rapid rise, ThoughtSpot attracted plenty of funding, and an initial public offering seemed like the next logical step.

Now, however, ThoughtSpot is facing the same uncertainty as most enterprises as COVID-19 threatens not only people’s health around the world, but also organizations’ ability to effectively go about their business.

In a recent interview, ThoughtSpot CEO Sudheesh Nair discussed all things ThoughtSpot, from the way the coronavirus is affecting the company to the status of an IPO.

In part one of a two-part Q&A, Nair talked about how COVID-19 has changed the firm’s corporate culture in a short time. Here in part two, he discusses upcoming plans for the ThoughtSpot analytics platform and when the vendor might be ready to go public.

One of the main reasons the ThoughtSpot analytics platform has been able to garner respect in a short time is its innovation, particularly with respect to augmented intelligence and machine learning. Along those lines, what is a recent feature ThoughtSpot developed that stands out to you?

ThoughtSpot CEO Sudheesh Nair

Sudheesh Nair: One of the main changes that is happening in the world of data right now is that the source of data is moving to the cloud. To deliver the AI-based, high-speed innovation on data, ThoughtSpot was really counting on running the data in a high-speed memory database, which is why ThoughtSpot was mostly focused on on-premises customers. One of the major changes that happened in the last year is that we delivered what we call Embrace. With Embrace we are able to move to the cloud and leave the data in place. This is critical because as data is moving, the cost of running computations will get higher because computing is very expensive in the cloud.

With ThoughtSpot, what we have done is we are able to deliver this on platforms like Snowflake, Amazon Redshift, Google BigQuery and Microsoft Synapse. So now with all four major cloud vendors fully supported, we have the capability to serve all of our customers and leave all of their data in place. This reduces the cost to operate ThoughtSpot — the value we deliver — and the return on investment will be higher. That’s one major change.

Looking ahead, what are some additions to the ThoughtSpot analytics platform customers can expect?

Nair: If you ask people who know ThoughtSpot — and I know there are a lot of people who don’t know ThoughtSpot, and that’s OK — … if you ask them what we do they will say, ‘search and AI.’ It’s important that we continue to augment on that; however, one thing that we’ve found is that in the modern world we don’t want search to be the first thing that you do. What if search became the second thing you do, and the first thing is that what you’ve been looking for comes to you even before you ask?

What if search became the second thing you do, and the first thing is that what you’ve been looking for comes to you even before you ask?
Sudheesh Nair, CEO, ThoughtSpot

Let’s say you’re responsible for sales in Boston, and you told the system you’re interested in figuring out sales in Boston — that’s all you did. Now the system understands what it means to you, and then runs multiple models and comes back to you with questions you’ll be interested in, and most importantly with insights it thinks you need to know — it doesn’t send a bunch of notifications that you never read. We want to make sure that the insights we’re sending to you are so relevant and so appropriate that every single one adds value. If one of them doesn’t add value, we want to know so the system can understand what it was that was not valuable and then adjust its algorithms internally. We believe that the right action and insight should be in front of you, and then search can be the second thing you do prompted by the insight we sent to you.

What tools will be part of the ThoughtSpot analytics platform to deliver these kinds of insights?

Nair: There are two features we are delivering around it. One is called Feed, which is inspired by social media, curating insights, conversations and opinions around facts. Right now social media is all opinion, but imagine a fact-driven social media experience where someone says they had a bad quarter and someone else says it was great and then data shows up so it doesn’t become an opinion based on another opinion. It’s important that it should be tethered to facts. The second one is Monitor, which is the primary feature where the thing you were looking for shows up even before you ask in the format that you like — could be mobile, could be notifications, could be an image.

Those two features are critical innovations for our growth, and we are very focused on delivering them this year.

The last time we spoke we talked about the possibility of ThoughtSpot going public, and you were pretty open in saying that’s something you foresee. About seven months later, where do plans for going public currently stand?

Nair: If you had asked me before COVID-19 I would have had a bit of a different answer, but the big picture hasn’t changed. I still firmly believe that a company like ThoughtSpot will tremendously benefit from going public because our customers are massive customers, and those customers like to spend more with a public company and the trust that comes with it.

Having said that, I talked last time about building a team and predictability, and I feel seven months later that we have built the executive team that can be the best in class when it comes to public companies. But going public also requires being predictable, and we’re getting in that right spot. I think that the next two quarters will be somewhat fluid, which will maybe set us back when it comes to building a plan to take the company public. But that is basically it. I think taken one by one, we have a good product market, we have good business momentum, we have a good team, and we just need to put together the history that is necessary so that the business is predictable and an investor can appreciate it. That’s what we’re focused on. There might be a short-term setback because of what the coronavirus might throw at us, but it’s going to definitely be a couple of more quarters of work.

Does the decline in the stock market related to COVID-19 play into your plans at all?

Nair: It’s absolutely an important event that’s going on and no one knows how it will play out, but when I think about a company’s future I never think about an IPO as a few quarters event. It’s something we want to do, and a couple of quarters here or there is not going to make a major difference. Over the last couple of weeks, we haven’t seen any softness in the demand for ThoughtSpot, but we know that a lot of our customers’ pipelines are in danger from supply impacts from China, so we will wait and see. We need to be very close to our customers right now, helping them through the process, and in that process we will learn and make the necessary course corrections.

Editor’s note: This interview has been edited for clarity and conciseness.


Natural language query tools offer answers within limits

Natural language query has the potential to put analytics in the hands of ordinary business users with no training in data science, but the technology still has a ways to go before it develops into a truly transformational tool.

Natural language query (NLQ) is the capacity to query data by simply asking a question in ordinary language rather than code, either spoken or typed. Ideally, natural language query will empower a business user to do deep analytics without having to code.

That ideal, however, doesn’t exist.

In its current form, natural language query allows someone working at a Ford dealership to ask, “How many blue Mustangs were sold in 2019?” and follow up with, “How many red Mustangs were sold in 2019?” to compare the two.

It allows someone in a clothing store to ask, “What’s the November forecast for sales of winter coats?”

It is not, however, advanced enough to pull together unstructured data sitting in a warehouse, and it’s not advanced enough to do complicated queries and analysis.

Natural language query enables business users to explore data without having to know code.

“We’ve had voice search, albeit in a limited capacity, for years now,” said Mike Leone, senior analyst at Enterprise Strategy Group (ESG), an IT analysis and research firm in Milford, Mass. “We’re just hitting the point where natural language processing can be effectively used to query data, but we’re not even close to utilizing natural language query for complex querying that traditionally would require extensive back-end work and data science team involvement.”

Similarly, Tony Baer, founder and CEO of the database and analytics advisory firm DBInsight, said that natural language query is not at the point where it allows for deep analysis without the involvement of data scientists.

“You can’t go into a given tool or database and ask any random question,” he said. “It still has to be linked to some structure. We’re not at the point where it’s like talking to a human and the brain can process it. Where we are is that, given guardrails, given some structure to the data and syntax, it’s an alternative to structure a query in a specific way.”

NLQ benefits

At its most basic level, business intelligence improves the decision-making process. And the more people within an organization able to do data-driven analysis, the more informed the decision-making process, not merely at the top of an organization but throughout its workforce.

We’re just hitting the point where natural language processing can be effectively used to query data, but we’re not even close to utilizing natural language query for complex querying.
Mike Leone, senior analyst, Enterprise Strategy Group

Meanwhile, natural language query doesn’t require significant expertise. It doesn’t force a user to write copious amounts of code to come up with an answer to what might be a relatively simple analytical question. It frees business users from having to request the help of data science teams — at least for basic queries. It opens analytics to more users within an organization.

“Good NLQ will help BI power users and untrained business users alike get to insights more quickly, but it’s the business users who need the most help and have the most to gain,” said Doug Henschen, principal analyst at Constellation Research. “These users don’t know how to code SQL and many aren’t even familiar with query constructs such as ‘show me X’ and ‘by Y time period’ and when to ask for pie charts versus bar charts versus line charts.”

“Think of all the people who want to run a report but aren’t able to do so,” echoed Jen Underwood, founder and principal consultant at Impact Analytix, an IT consulting firm in Tampa, Fla. “There’s some true beauty to the search. How many more people would be able to use it because they couldn’t do SQL? It’s simple, and it opens up the ability to do more things.”

In essence, natural language query and other low-code/no-code tools help improve data literacy, and increasing data literacy is a significant push for many organizations.

That said, in its current form it has limits.

“Extending that type of functionality to the business will enable a new demographic of folks to interact with data in a way that is comfortable to them,” Leone said. “But don’t expect a data revolution just because someone can use Alexa to see how many people bought socks on a Tuesday.”

The limitations

Perhaps the biggest hindrance to full-fledged natural language query is the nature of language itself.

Without even delving into the fact that there are more than 5,000 languages worldwide and an estimated 200 to 400 alphabets, individual languages are complicated. There are words that are spelled the same but have different meanings, others that are spelled differently but sound the same, and words that bear no visual or auditory relation to each other but are synonyms.

And within the business world, there are often terms that might mean one thing to one organization and be used differently by another.

Natural language query tools don’t actually understand the spoken or written word. They understand specific code and are programmed to translate a spoken or written query to SQL, and then translate the response from SQL back into the spoken or written word.
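A toy example of that translation step, reusing the earlier Mustang question: the tool matches recognized entities (color, model, year) against a known schema and emits SQL. This is a simplified, hypothetical sketch of the idea, not any vendor's implementation.

```python
import re

# Hypothetical schema the NLQ layer has been configured against.
TABLE = "sales"
COLUMNS = {"color": "color", "model": "model", "year": "sale_year"}

def to_sql(question: str) -> str:
    """Translate a narrowly phrased question like
    'How many blue Mustangs were sold in 2019?' into SQL."""
    color = re.search(r"\b(blue|red|black|white)\b", question, re.I)
    model = re.search(r"\b(Mustang|F-150|Explorer)\b", question, re.I)
    year = re.search(r"\b(19|20)\d{2}\b", question)

    clauses = []
    if color:
        clauses.append(f"{COLUMNS['color']} = '{color.group(0).lower()}'")
    if model:
        clauses.append(f"{COLUMNS['model']} = '{model.group(0)}'")
    if year:
        clauses.append(f"{COLUMNS['year']} = {year.group(0)}")
    where = " AND ".join(clauses) or "1 = 1"
    return f"SELECT COUNT(*) FROM {TABLE} WHERE {where};"

print(to_sql("How many blue Mustangs were sold in 2019?"))
# SELECT COUNT(*) FROM sales WHERE color = 'blue' AND model = 'Mustang' AND sale_year = 2019;
```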

“Natural language query has trouble with things like synonyms, and domain-specific terminology — the context is missing,” Underwood said. “You still need humans for synonyms and the terminology a company might have because different companies have different meanings for different words.”

When natural language queries are spoken, accents can cause problems. And whether spoken or written, the slightest misinterpretation by the tool can result in either a useless response or, much worse, something incorrect.

“Accuracy is king when it comes to querying,” ESG’s Leone said. “All it takes is a minor misinterpretation of a voice request to yield an incorrect result.”

Over the next few years, he said, people will come to rely on natural language query to quickly ask a basic question on their devices, but not much more.

“Don’t expect NLQ to replace data science teams,” Leone said. “If anything, NLQ will serve as a way to quickly return a result that could then be used as a launching pad for more complex queries and expert analysis.”

While natural language query is held back now by the limitations of language, that won’t always be the case. The tools will get more sophisticated and, aided by machine learning, will come to understand a user’s patterns to better comprehend just what they’re asking.

“Most of what’s standing in the way is a lack of experience,” DBInsight’s Baer said. “It’s still early on. Natural language query today is far advanced from where it was two years ago, but there’s still a lot of improvement to be made. I think that improvement will be incremental; machine learning will help.”

Top NLQ tools

Though limited in capability, natural language query tools do save business users significant time when asking basic questions of structured data. And some vendors’ natural language query tools are better than others.

Though one of the top BI vendors following its acquisition of Hyperion in 2007, Oracle lost momentum when data visualizations changed the consumption of analytics. Now that augmented intelligence and machine learning are central tenets of BI, however, Oracle is again pushing the technological capabilities of BI platforms. Oracle Analytics Cloud and Day by Day support voice-based queries and its natural language query works in 28 languages, which Henschen said is the broadest language support available.

“Oracle raised the bar on natural language query a couple of years ago when it released its Day By Day app, which used device-native voice-to-text and introduced explicit thumbs-up/thumbs-down training,” Henschen said.

Another vendor Henschen noted is Qlik, which advanced the natural language capabilities of its platform through its January 2019 acquisition of Crunch Data.

“A key asset was the CrunchBot, since rebranded as the Qlik Insight Bot,” Henschen said.

He added that Qlik Insight Bot is a bot-building feature that works with existing Qlik applications, and the bots can subsequently be embedded in third-party applications, including Salesforce, Slack, Skype and Microsoft Teams.

“It brings NLQ outside of the confines of Qlik Sense and interaction with a BI system,” Henschen said.

Tableau is yet another vendor attempting to ease the analytics process with a natural language processing tool. It introduced Ask Data in February 2019, and Tableau’s September 2019 update included the capability to embed Ask Data in other applications.

“When I think about designing a system and taking it the next step forward, Tableau is doing something. [It remembers if someone ran a similar query] and it gives guidance,” Underwood said. “It has the information and knows what people are asking, and it can surface recommendations.”

Baer similarly mentioned Tableau’s Ask Data, while Leone said that the eventual prevalence of natural language query will ultimately be driven by Amazon Web Services, Google and Microsoft.


Splice Machine 3.0 integrates machine learning capabilities, database

Databases have long been used for transactional and analytics use cases, but they also have practical utility to help enable machine learning capabilities. After all, machine learning is all about deriving insights from data, which is often stored inside a database.

San Francisco-based database vendor Splice Machine is taking an integrated approach to enabling machine learning with its eponymous database. Splice Machine is a distributed SQL relational database management system that includes machine learning capabilities as part of the overall platform.

Splice Machine 3.0 became generally available on March 3, bringing with it updated machine learning capabilities. It also has a new cloud-native, Kubernetes-based model for cloud deployment and enhanced replication features.

In this Q&A, Monte Zweben, co-founder and CEO of Splice Machine, discusses the intersection of machine learning and databases and provides insight into the big changes that have occurred in the data landscape in recent years.

How do you integrate machine learning capabilities with a database?

Monte Zweben

Monte Zweben: The data platform itself has tables, rows and schema. The machine learning manager that we have native to the database has notebooks for developing models, Python for manipulating the data, algorithms that allow you to model and model workflow management that allows you to track the metadata on models as they go through their experimentation process. And finally we have in-database deployment.

So as an example, imagine a data scientist working in Splice Machine working in the insurance industry. They have an application for claims processing and they are building out models inside Splice Machine to predict claims fraud. There’s a function in Splice Machine called deploy, and what it will do is take a table and a model to generate database code. The deploy function builds a trigger on the database table that tells the table to call a stored procedure that has the model in it for every new record that comes in the table.

So what does this mean in plain English? Let’s say in the claims table, every time new claims would come in, the system would automatically trigger, grab those claims, run the model that predicts claim cause and outputs those predictions in another table. And now all of a sudden, you have real-time, in-the-moment machine learning that is detecting claim fraud on first notice of loss.
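The trigger-plus-stored-procedure pattern Zweben describes can be sketched in generic SQL, shown here inside a small Python helper. The table, trigger and procedure names are hypothetical, and this is not the code Splice Machine's deploy function actually generates.

```python
# Generic illustration of the "score new rows via a trigger" pattern described above.
# Table, trigger and procedure names are hypothetical; this is not the DDL that
# Splice Machine's deploy function emits.

CREATE_TRIGGER = """
CREATE TRIGGER score_new_claim
AFTER INSERT ON claims
REFERENCING NEW AS new_row
FOR EACH ROW
    CALL predict_claim_fraud(new_row.claim_id);
"""

# The stored procedure would load the deployed model, score the claim and
# write the prediction to a results table, e.g.:
#   INSERT INTO claim_fraud_predictions (claim_id, fraud_score, scored_at)
#   VALUES (?, ?, CURRENT_TIMESTAMP)

def install_trigger(connection) -> None:
    """Install the scoring trigger using any DB-API 2.0 connection."""
    cur = connection.cursor()
    cur.execute(CREATE_TRIGGER)
    connection.commit()
```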

What does distributed SQL mean to you?

Zweben: So at its heart, it’s about sharing data across multiple nodes. That provides you the ability to parallelize computation and gain elastic scalability. That is the most important distributed attribute of Splice Machine.

In our new 3.0 release, we just added distributed replication. It’s another element of distribution where you have secondary Splice Machine instances in geo-replicated areas, to handle failover for disaster recovery.

What’s new in Splice Machine 3.0?

Zweben: We moved our cloud stack for Splice Machine from an old Mesos architecture to Kubernetes. Now our container-based architecture is all Kubernetes, and that has given us the opportunity to enable the separation of storage and compute. You literally can pause Splice Machine clusters and turn them back on. This is a great utility for consumption-based usage of databases.

Along with our upgrade to Kubernetes, we also upgraded our machine learning manager from an older notebook technology called Zeppelin to a newer notebook technology that has really gained momentum in the marketplace, as much as Kubernetes has in the DevOps world. Jupyter notebooks have taken off in the data science space.

We’ve also enhanced our workflow management tool called MLflow, which is an open source tool that originated with Databricks, and we’re part of that community. MLflow allows data scientists to track their experiments and has that record of metadata available for governance.

What’s your view on open source and the risk of a big cloud vendor cannibalizing open source database technology?

Zweben: We do compose many different open source projects into a seamless and highly performant integration. Our secret sauce is how we put these things together at a very low level, with transactional integrity, to enable a single integrated system. This composition that we put together is open source, so that all of the pieces of our data platform are available in our open source repository, and people can see the source code right now.

I’m intensely worried about cloud cannibalization. I switched to an AGPL license specifically to protect against cannibalization by cloud vendors.

On the other hand, we believe we’re moving up the stack. If you look at our machine learning package, and how it’s so inextricably linked with the database, and the reference applications that we have in different segments, we’re going to be delivering more and more higher-level application functionality.

What are some of the biggest changes you’ve seen in the data landscape over the seven years you’ve been running Splice Machine?

Zweben: With the first generation of big data, it was all about data lakes, and let’s just get all the data the company has into one repository. Unfortunately, that has proven time and time again, at company after company, to just be data swamps.

Data repositories work, they’re scalable, but they don’t have anyone using the data, and this was a mistake for several reasons.

Instead of thinking about storing the data, companies should think about how to use the data.
Monte Zweben, co-founder and CEO, Splice Machine

Instead of thinking about storing the data, companies should think about how to use the data. Start with the application and how you are going to make the application leverage new data sources.

The second reason why this was a mistake was organizationally, because the data scientists who know AI were all centralized in one data science group, away from the application. They are not the subject matter experts for the application.

When you focus on the application and retrofit the application to make it smart and inject AI, you can get a multidisciplinary team. You have app developers, architects, subject-matter experts, data engineers and data scientists, all working together on one purpose. That is a radically more effective and productive organizational structure for modernizing applications with AI.


SMBs struggle with data utilization, analytics

While analytics have become a staple of large enterprises, many small and medium-sized businesses struggle to utilize data for growth.

Large corporations can afford to hire teams of data scientists and provide business intelligence software to employees throughout their organizations. While many SMBs collect data that could lead to better decision-making and growth, data utilization is a challenge when there isn’t enough cash in the IT budget to invest in the right people and tools.

Sensing that SMBs struggle to use data, Onepath, an IT services vendor based in Kennesaw, Ga., conducted a survey of more than 100 businesses with 100 to 500 employees to gauge their analytics capabilities for the “Onepath 2020 Trends in SMB Data Analytics Report.”

Among the most glaring discoveries, the survey revealed that 86% of the companies surveyed that had invested in personnel and analytics felt they weren’t able to fully exploit their data.

Phil Moore, Onepath’s director of applications management services, recently discussed both the findings of the survey and the challenges SMBs face when trying to incorporate analytics into their decision-making process.

In Part II of this Q&A, he talks about what failure to utilize data could ultimately mean for SMBs.

What was Onepath’s motivation for conducting the survey about SMBs and their data utilization efforts?

Phil Moore

Phil Moore: For me, the key finding was that we had a premise, a hypothesis, and this survey helped us validate our thesis. Our thesis is that analytics has always been a deep pockets game — people want it, but it’s out of reach financially. That’s talking about the proverbial $50,000 to $200,000 analytics project… Our goal and our mission is to bring that analytics down to the SMB market. We just had to prove our thesis, and this survey proves that thesis.

It tells us that clients want it — they know about analytics and they want it.

What were some of the key findings of the survey?

Moore: Fifty-nine percent said that if they don’t have analytics, it’s going to take them longer to go to market. Fifty-six percent said it will take them longer to service their clients without analytics capabilities. Fifty-four percent, a little over half, said if they didn’t have analytics, or when they don’t have analytics, they run the risk of making a harmful business decision.

We have people trying analytics — 67% are spending $10,000 a year or more, and 75% spent at least 132 hours of labor maintaining their systems — but they’re not getting what they need.
Phil Moore, director of applications management services, Onepath

That tells us people want it… We have people trying analytics — 67% are spending $10,000 a year or more, and 75% spent at least 132 hours of labor maintaining their systems — but they’re not getting what they need. A full 86% said they’re underachieving when they’re taking a swing with their analytics solution.

What are the key resources these businesses lack in order to fully utilize data? Is it strictly financial or are there other things as well?

Moore: We weren’t surprised, but what we hadn’t thought about is that the SMB market just doesn’t have the in-house skills. One in five said they just don’t have the people in the company to create the systems.

Might new technologies help SMBs eventually exploit data to its full extent?

Moore: The technologies have emerged and have matured, and one of the biggest things in the technology arena that helps bring the price down, or make it more available, is simply moving to the cloud. An on-premises analytics solution requires hardware, and it’s just an expensive footprint to get off the ground. But with Microsoft and their Azure Cloud and their Office 365, or their Azure Synapse Analytics offering, people can actually get to the technology at a far cheaper price point.

That one technology right there makes it far more affordable for the SMB market.

What about things like low-code/no-code platforms, natural language query, embedded analytics — will those play a role in helping SMBs improve data utilization for growth?

Moore: In the SMB market, they’re aware of things like machine learning, but they’re closer to the core blocking and tackling of looking at [key performance indicators], looking at cash dashboards so they know how much cash they have in the bank, looking at their service dashboard and finding the clients they’re ignoring.

The first and easiest one that’s going to apply to SMBs is low-code/no-code, particularly in grabbing their source data, transforming it and making it available for analytics. Prior to low-code/no-code, it’s really a high-code alternative, and that’s where it takes an army of programmers and all they’re doing is moving data — the data pipeline.

But there will be a set of the SMB market that goes after some of the other technologies like machine learning — we’ve seen some people be really excited about it. One example was looking at [IT help] tickets that are being worked in the service industry and comparing it with customer satisfaction. What they were measuring was ticket staleness, how many tickets their service team were ignoring, and as they were getting stale, their clients would be getting angry for lack of service. With machine learning, they were able to find that if they ignored a printer ticket for two weeks, that is far different than ignoring an email problem for two weeks. Ignoring an email problem for two days leads to a horrible customer satisfaction score. Machine learning goes in and relates that stuff, and that’s very powerful. The small and medium-sized business market will get there, but they’re starting at earlier and more basic steps.
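The ticket-staleness example Moore gives boils down to a small supervised-learning problem: ticket category and days idle go in, a satisfaction score comes out. Below is a hedged sketch with made-up data using scikit-learn, which is an assumption; the article does not say which tools were used.

```python
from sklearn.tree import DecisionTreeRegressor

# Made-up training data: (ticket_category, days_ignored) -> satisfaction score (1-10).
# Categories: 0 = printer issue, 1 = email outage.
X = [[0, 1], [0, 7], [0, 14], [1, 0], [1, 1], [1, 2], [1, 5]]
y = [9, 8, 7, 9, 6, 3, 1]

model = DecisionTreeRegressor(max_depth=3).fit(X, y)

# The fitted model reflects the pattern described above: two stale weeks on a
# printer ticket stays tolerable, while two idle days on an email problem
# already drags satisfaction down sharply.
print(model.predict([[0, 14], [1, 2]]))
```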

Editor’s note: This Q&A has been edited for brevity and clarity.


Biometrics firm fights monitoring overload with log analytics

Log analytics tools with machine learning capabilities have helped one biometrics startup keep pace with increasingly complex application monitoring as it embraces continuous deployment and microservices.

BioCatch sought a new log analytics tool in late 2017. At the time, the Tel Aviv, Israel, firm employed a handful of workers and had just refactored a monolithic Windows application into microservices written in Python. The refactored app, which captures biometric data on how end users interact with web and mobile interfaces for fraud detection, required careful monitoring to ensure it still worked properly. Almost immediately after it completed the refactoring, BioCatch found the process had tripled the number of logs it shipped to a self-managed Elasticsearch repository.

“In the beginning, we had almost nothing,” said Tamir Amram, operations group lead for BioCatch, of the company’s early logging habits. “And, then, we started [having to ship] everything.”

The team found it could no longer manage its own Elasticsearch back end as that log data grew. Its IT infrastructure also mushroomed into 10 Kubernetes clusters distributed globally on Microsoft Azure. Each cluster hosts multiple sets of 20 microservices that provide multi-tenant security for each of its customers.

At that point, BioCatch had a bigger problem. It had to not only collect, but also analyze all its log data to determine the root cause of application issues. This became too complex to do manually. BioCatch turned to log analytics vendor Coralogix as a potential answer to the problem.

Log analytics tools flourish under microservices

Coralogix, founded in 2015, initially built its log management system on top of a hosted Elasticsearch service but couldn’t generate enough interest from customers.

“It did not go well,” Coralogix CEO Ariel Assaraf recalled of those early years for the business. “It was early in log analytics’ and log management’s appeal to the mainstream, and customers already had ‘good enough’ solutions.”

While the company still hosts Elasticsearch for its customers, based on the Amazon Open Distro for Elasticsearch, it refocused on log analytics, developed machine learning algorithms and monitoring dashboards, and relaunched in 2017.

That year coincided with the emergence of containers and microservices in enterprise IT shops as they sought to refactor monolithic applications with new design patterns. The timing proved fortuitous; since Coralogix’s relaunch in 2017, it has gained more than 1,200 paying customers, according to Assaraf, at an average deal size of $50,000 a year.

Coralogix isn’t alone among DevOps monitoring vendors reaping the spoils of demand for microservices monitoring tools — not just in log analytics, but AI- and machine learning-driven infrastructure management, or AIOps, as well. These include application performance management (APM) vendors, such as New Relic, Datadog, AppDynamics and Dynatrace, along with Coralogix log analytics competitors Elastic Inc. and Splunk.

We were able to delegate log management to the support team, so the DevOps team wasn’t the only one owning and using logs.
Tamir Amram, operations group lead, BioCatch

In fact, analyst firm 451 Research predicted that the market for Kubernetes monitoring tools will dwarf the market for Kubernetes management products by 2022 as IT pros move from the initial phases of deploying microservices into “day two” management problems. Even more recently, log analytics tools have begun to play an increasing role in IT security operations and DevSecOps.

The newly relaunched Coralogix caught the eye of BioCatch in part because of its partnership with the firm’s preferred cloud vendor, Microsoft Azure. It was also easy to set up and redirect logs from the firm’s existing Elasticsearch instance, and the Coralogix-managed Elasticsearch service eliminated log management overhead for the BioCatch team.

“We were able to delegate log management to the support team, so the DevOps team wasn’t the only one owning and using logs,” Amram said. “Now, more than half of the company works with Coralogix, and more than 80% of those who work with it use it on a daily basis.”

Log analytics correlate app changes to errors

The BioCatch DevOps team adds tags to each application update that direct log data into Coralogix. Then, the software monitors application releases as they’re rolled out in a canary model for multiple tiers of customers. BioCatch rolls out its first application updates to what it calls “ring zero,” a group of early adopters; next, to “ring one;” and so on, according to each customer group’s appetite for risk. All those changes to multiple tiers and groups of microservices result in an average of 1.5 TB of logs shipped per day.

The version tags fed through the CI/CD pipeline to Coralogix enable the tool to identify issues and correlate them with application changes made by BioCatch developers. It also identifies anomalous patterns in infrastructure behavior post-release, which can catch problems that don’t appear immediately.

Coralogix log analytics uses version tags to correlate application issues with specific developer changes.
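Mechanically, this comes down to structured logging with a release tag attached to every line so the analytics back end can group errors by version and canary ring. The following is a minimal, vendor-neutral sketch; the field names are assumptions, not Coralogix's schema.

```python
import json
import logging
import os

# Release metadata injected by the CI/CD pipeline, e.g. as environment variables.
APP_VERSION = os.getenv("APP_VERSION", "unknown")
RELEASE_RING = os.getenv("RELEASE_RING", "ring-zero")

class VersionTagFormatter(logging.Formatter):
    """Emit JSON log lines carrying the version and canary ring of this deployment."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "message": record.getMessage(),
            "level": record.levelname,
            "service": record.name,
            "app_version": APP_VERSION,
            "release_ring": RELEASE_RING,
        })

handler = logging.StreamHandler()
handler.setFormatter(VersionTagFormatter())
log = logging.getLogger("claims-service")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("queue depth rising faster than usual")
```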

“Every so often, an issue will appear a day later because we usually release at off-peak times,” BioCatch’s Amram said. “For example, it can say, ‘sending items to this queue is 20 times slower than usual,’ which shows the developer why the queue is filling up too quickly and saturating the system.”

BioCatch uses Coralogix alongside APM tools from Datadog that analyze application telemetry and metrics. Often, alerts in Datadog prompt BioCatch IT ops pros to consult Coralogix log analytics dashboards. Datadog also began offering log analytics in 2018 but didn’t include this feature when BioCatch first began talks with Coralogix.

Coralogix also maintains its place at BioCatch because its interfaces are easy to work with for all members of the IT team, Amram said. This has grown to include not only developers and IT ops, but solutions engineers who use the tool to demonstrate to prospective customers how the firm does troubleshooting to maintain its service-level agreements.

“We don’t have to search in Kibana [Elasticsearch’s visualization layer] and say, ‘give me all the errors,'” Amram said. “Coralogix recognizes patterns, and if the pattern breaks, we get an alert and can immediately react.”


Sigma analytics platform’s interface simplifies queries

In desperate need of data dexterity, Volta Charging turned to the Sigma analytics platform to improve its business intelligence capabilities and ultimately help fuel its growth.

Volta, based in San Francisco and founded in 2010, is a provider of electric vehicle charging stations, and three years ago, when Mia Oppelstrup started at Volta, the company faced a significant problem.

Because there aren’t dedicated charging stations the same way there are dedicated gas stations, Volta has to negotiate with organizations — mostly retail businesses — for parking spots where Volta can place its charging stations.

Naturally, Volta wants its charging stations placed in the parking spots with the best locations near the businesses they serve. But before an organization gives Volta those spots, Volta has to show that it makes economic sense: that putting electric car charging stations closest to the door will help boost customer traffic through that door.

That takes data. It takes proof.

Volta, however, was struggling with its data. It had the necessary information, but finding the data and then putting it in a digestible form was painstakingly slow. Queries had to be submitted to engineers, and those engineers then had to write code to transform the data before delivering a report.

Any slight change required an entirely new query, which involved more coding, time and labor for the engineers.

But then the Sigma analytics platform transformed Volta’s BI capabilities, Volta executives said.

Curiosity isn’t enough to justify engineering time, but curiosity is a way to get new insights. By working with Sigma and doing queries on my own I’m able to find new metrics.
Mia Oppelstrup, business intelligence manager, Volta Charging

“If I had to ask an engineer every time I had a question, I couldn’t justify all the time it would take unless I knew I’d be getting an available answer,” said Oppelstrup, who began in marketing at Volta and now is the company’s business intelligence manager. “Curiosity isn’t enough to justify engineering time, but curiosity is a way to get new insights. By working with Sigma and doing queries on my own I’m able to find new metrics.”

Metrics, Oppelstrup added, that she’d never be able to find on her own.

“It’s huge for someone like me who never wrote code,” Oppelstrup said. “It would otherwise be like searching a warehouse with a forklift while blindfolded. You get stuck when you have to wait for an engineer.”

Volta looked at other BI platforms — Tableau and Microsoft’s Power BI, in particular — but just under two years ago chose Sigma and has forged ahead with the platform from the 2014 startup.

The product

Sigma Computing was founded by the trio of Jason Frantz, Mike Speiser and Rob Woollen.

Based in San Francisco, the vendor has gone through three rounds of financing and to date raised $58 million, most recently attracting $30 million in November 2019.

When Sigma was founded, and ideas for the Sigma analytics platform first developed, it was in response to what the founders viewed as a lack of access to data.

“Gartner reported that 60 to 73 percent of data is going unused and that only 30 percent of employees use BI tools,” Woollen, Sigma’s CEO, said. “I came back to that — BI was stuck with a small number of users and data was just sitting there, so my mission was to solve that problem and correct all this.”

Woollen, who previously worked at Salesforce and Sutter Hill Ventures — a main investor in Sigma — and his co-founders set out to make data more accessible. They aimed to design a BI platform that could be used by ordinary business users — citizen data scientists — without having to rely so much on engineers, and one that responds quickly no matter what queries users ask of it.

Sigma launched the Sigma analytics platform in November 2018.

Like other BI platforms, Sigma — entirely based in the cloud — connects to a user’s cloud data warehouse in order to access the user’s data. Unlike most BI platforms, however, the Sigma analytics platform is a low-code BI tool that doesn’t require engineering expertise to sift through the data, pull the data relevant to a given query and present it in a digestible form.

A key element of that is the Sigma analytics platform’s user interface, which resembles a spreadsheet.

With SQL running in the background to automatically write the necessary code, users can simply make entries and notations in the spreadsheet and Sigma will run the query.

“The focus is always on expanding the audience, and 30 percent employee usage is the one that frustrates me,” Woollen said. “We’re focused on solving that problem and making BI more accessible to more people.”

The interface is key to that end.

“Products in the past focused on a simple interface,” Woollen said. “Our philosophy is that just because a businessperson isn’t technical that shouldn’t mean they can’t ask complicated questions.”

With the Sigma analytics platform’s spreadsheet interface, users can query their data, for example, to examine sales performance in a certain location, time or week. They can then tweak it to look at a different time, or a different week. They can then look at it on a monthly basis, compare it year over year, add and subtract fields and columns at will.

And rather than file a ticket to the IT department for each separate query, they can run the query themselves.

“The spreadsheet interface combines the power to ask any question of the data without having to write SQL or ask a programmer to do it,” Woollen said.
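Conceptually, each spreadsheet-style action maps to a SQL fragment that the platform writes on the user's behalf. The sketch below is a hypothetical illustration of that mapping, not Sigma's actual query generation.

```python
# Hypothetical illustration: a few "spreadsheet" actions and the SQL they imply.
actions = {
    "filter": {"column": "region", "value": "Boston"},
    "group_by": "sale_month",
    "aggregate": ("SUM", "revenue"),
}

def actions_to_sql(table: str, a: dict) -> str:
    """Turn the declarative spreadsheet actions into a single SQL statement."""
    func, measure = a["aggregate"]
    return (
        f"SELECT {a['group_by']}, {func}({measure}) AS {measure}_total\n"
        f"FROM {table}\n"
        f"WHERE {a['filter']['column']} = '{a['filter']['value']}'\n"
        f"GROUP BY {a['group_by']}\n"
        f"ORDER BY {a['group_by']};"
    )

print(actions_to_sql("sales", actions))
```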

Giving end users power to explore data

Volta knew it had a data dexterity problem — an inability to truly explore its data given its reliance on engineers to run time- and labor-consuming queries — even before Oppelstrup arrived. The company was looking at different BI platforms to attempt to help, but most of the platforms Volta tried out still demanded engineering expertise, Oppelstrup said.

The outlier was the Sigma analytics platform.

“Within a day I was able to set up my own complex joins and answer questions by myself in a visual way,” Oppelstrup said. “I always felt intimidated by data, but Sigma felt like using a spreadsheet and Google Drive.”

One of the significant issues Volta faced before it adopted the Sigma analytics platform was the inability of its salespeople to show data when meeting with retail outlets and attempting to secure prime parking spaces for Volta’s charging stations.

Because of the difficulty accessing data, the salespeople didn’t have the numbers to prove that placing charging stations near the door would increase customer traffic.

With the platform’s querying capability, however, Oppelstrup and her team were able to make the discoveries that armed Volta’s salespeople with hard data rather than simply anecdotes.

They could now show a bank a surge in the use of charging stations near banks between 9 a.m. and 4 p.m., movie theaters a similar surge in use just before the matinee and again before the evening feature, and grocery stores a surge near stores at lunchtime and after work.

They could also show that the charging stations were being used by actual customers, and not by random people charging up their vehicles and then leaving without also going into the bank, the movie theater or the grocery store.
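The evidence described here is essentially a time-of-day aggregation over charging sessions. A minimal sketch with hypothetical column names and made-up data:

```python
import pandas as pd

# Hypothetical charging-session data: one row per session at a bank-adjacent station.
sessions = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-11-04 09:15", "2019-11-04 10:40", "2019-11-04 13:05",
        "2019-11-04 15:30", "2019-11-04 19:45",
    ]),
    "driver_entered_store": [True, True, True, False, False],
})

sessions["hour"] = sessions["start_time"].dt.hour
by_hour = sessions.groupby("hour").agg(
    sessions=("start_time", "count"),
    conversion_rate=("driver_entered_store", "mean"),
)
# Sessions cluster during banking hours, and most of those drivers go inside.
print(by_hour)
```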

“It’s changed how our sales team approaches its job — it used to just be about relationships, but now there’s data at every step,” Oppelstrup said.

Sigma enables Oppelstrup to give certain teams access to certain data, everyone access to other data, and importantly, easily redact data fields within a set that might otherwise prevent her from sharing information entirely, she said.

And that gets to the heart of Woollen’s intent when he helped start Sigma — enabling business users to work with more data and giving more people that ability to use BI tools.

“Access leads to collaboration,” he said.


Storytelling using data makes information easy to digest

Storytelling using data is helping make analytics digestible across entire organizations.

While the amount of available data has exploded in recent years, the ability to understand the meaning of the data hasn’t kept pace. There aren’t enough trained data scientists to meet demand, often leaving data interpretation in the hands of both line-of-business employees and high-level executives mostly guessing at the underlying meaning behind data points.

Storytelling using data, however, changes that.

A group of business intelligence software vendors are now specializing in data storytelling, producing platforms that go one step further than traditional BI platforms and attempt to give the data context by putting it in the form of a narrative.

One such vendor is Narrative Science, based in Chicago and founded in 2010. On Jan. 6, Narrative Science released a book entitled Let Your People Be People that delves into the importance of storytelling for businesses, with a particular focus on storytelling using data.

Recently, authors Nate Nichols, vice president of product architecture at Narrative Science, and Anna Schena Walsh, director of growth marketing, answered a series of questions about storytelling using data.

Here in Part II of a two-part Q&A they talk about why storytelling using data is a more effective way to interpret data than traditional BI, and how data storytelling can change the culture of an organization. In Part I, they discussed what data storytelling is and how data can be turned into a narrative that has meaning for an organization.

What does an emphasis on storytelling in the workplace look like, beyond a means of explaining the reasoning behind data points?

Nate Nichols

Nate Nichols: As an example of that, I’ve been more intentional since the New Year about applying storytelling to meetings I’ve led, and it’s been really helpful. It’s not like people are gathering around my knee as I launch into a 30-minute story, but just remembering to kick off a meeting with a 3-minute recap of why we’re here, where we’re coming from, what we worked on last week and what the things are that we need going forward. It’s really just putting more time into reminding people of why, the cause and effect, just helping people settle into the right mindset. Storytelling is an empirically effective way of doing it.

We didn’t start this company to be storytellers — we really wanted everyone to understand and be able to act on data. It turned out that the best way to do that was through storytelling. The world is waking up to this. It’s something we used to do — our ancestors sat around the campfire swapping stories about the hunt, or where the best potatoes are to forage for. That’s a thing we used to do, it’s a thing that kids do all the time — they’re bringing other kids into their world — and what’s happening is that a lot of that has been beaten out of us as adults. Because of the way the workforce is going, the way automation is going, we’re heading back to the importance of those soft skills, those storytelling skills.

How is storytelling using data more effective at presenting data than typical dashboards and reports?

Anna Schena Walsh

Anna Schena Walsh: The brain is hard-wired for stories. It’s hard-wired to take in information in that storytelling arc, which is what is [attracting our attention] — what is something we thought we knew, what is something new that surprised us, and what can we do about it? If you can put that in a way that is interesting to people in a way they can understand, that is a way people will remember. That is what really motivates people, and that’s what actually causes people to take action. I think visuals are important parts of some stories, whether it be a chart or a picture, it can help drive stories home, but no matter what you’re doing to give people information, the end is usually the story. It’s verbal, it’s literate, it’s explaining something in some way. In reality, we do this a lot, but we need to be a lot more systematic about focusing on the story part.

What happens when you present an explanation with data?

Nichols: If someone sends you a bar chart and asks you to use it to make decisions and there’s no story with it at all, what your brain does is it makes up a story around it. Historically, what we’ve said is that computers are good at doing charts — we never did charts and graphs and spreadsheets because we thought they were helpful for people, we did them because that was what computers could do. We’ve forgotten that. So when we do these charts, people look at them and make up their own stories, and they may be more or less accurate depending on their intuition about the business. What we’re doing now is we want everyone to be really on the same story, hearing the same story, so by not having a hundred different people come up with a hundred different internal stories in their head, what we’re doing at Narrative Science is to try and make the story external so everyone is telling the same story.

So is it accurate to say that accuracy is a part of storytelling using data?

Schena Walsh: When I think of charts and graphs, interpreting those is a skill — it is a learned skill that comes to some people more naturally than others. In the past few decades there’s been this idea that everybody needs to be able interpret [data]. With storytelling, specifically data storytelling, it takes away the pressure of people interpreting the data for themselves. This allows people, where their skills may not be in that area … they don’t have to sit down and interpret dashboards. That’s not the best use of their talent, and data storytelling brings that information to them so they’re able to concentrate on what makes them great.

What’s the potential end result for organizations that employ data storytelling — what does it enable them to do that other organizations can’t?

With data storytelling there is a massive opportunity to have everybody in your company understand what’s happening and be able to make informed decisions much, much faster.
Anna Schena Walsh, director of growth marketing, Narrative Science

Schena Walsh: With data storytelling there is a massive opportunity to have everybody in your company understand what’s happening and be able to make informed decisions much, much faster. It’s not that information isn’t available — it certainly is — but it takes a certain set of skills to be able to find the meaning. So we look at it as empowering everybody because you’re giving them the information they need very quickly, and also giving them the ability to lean into what makes them great. The way we think about it is that if you can choose to have someone give a two-minute explanation of what’s going on in the business to everyone in the company everyday as they go into work, would you do it? And the answer is yes, and with data storytelling that’s what you can do.

I think what we’ll see as companies keep trying to move toward everyone needing to interpret data, I actually think there’s a lot of potential for burnout there in people who aren’t naturally inclined to do it. I also think there’s a speed element — it’s not as fast to have everybody learn this skill and have to do it every day themselves than to have the information serviced to them in a way they can understand.

Editor’s note: This interview has been edited for clarity and conciseness.

Citrix’s performance analytics service gets granular

Citrix introduced an analytics service to help IT professionals better identify the cause of slow application performance within its Virtual Apps and Desktops platform.

The company announced the general availability of the service, called Citrix Analytics for Performance, at its Citrix Summit, an event for the company’s business partners, in Orlando on Monday. The service carries an additional cost.

Steve Wilson, the company’s vice president of product for workspace ecosystem and analytics, said many IT admins must deal with performance problems as part of the nature of distributed applications. When they receive a call from workers complaining about performance, he said, it’s hard to determine the root cause — be it a capacity issue, a network problem or an issue with the employee’s device.

Performance, he said, is a frequent pain point for employees, especially remote and international workers.

“There are huge challenges that, from a performance perspective, are really hard to understand,” he said, adding that the tools available to IT professionals have not been ideal in identifying issues. “It’s all been very technical, very down in the weeds … it’s been hard to understand what [users] are seeing and how to make that actionable.”

Part of the problem, according to Wilson, is that traditional performance-measuring tools focus on server infrastructure. Keeping track of such metrics is important, he said, but they do not tell the whole story.

“Often, what [IT professionals] got was the aggregate view; it wasn’t personalized,” he said.

When the aggregate performance of the IT infrastructure is “good,” Wilson said, that could mean that half an organization’s users are seeing good performance, a quarter are seeing great performance, but a quarter are experiencing poor performance.

With its performance analytics service, Citrix is offering a more granular picture of performance by providing metrics on individual employees, beyond those of the company as a whole. That measurement, which Citrix calls a user experience or UX score, evaluates such factors as an employee’s machine performance, user logon time, network latency and network stability.
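
To make the idea concrete, the sketch below shows one way such a composite might be computed. It is an illustration only: the factor names, normalization thresholds and weights are assumptions made for the example, not Citrix's published scoring formula.

def normalize(value, good, bad):
    # Map a raw measurement onto a 0-100 scale: `good` or better scores 100, `bad` or worse scores 0.
    score = (bad - value) / (bad - good) * 100.0
    return max(0.0, min(100.0, score))

def ux_score(logon_seconds, latency_ms, jitter_ms, cpu_wait_pct):
    # Hypothetical factors and weights; each factor is normalized, then blended into one number.
    factors = [
        (normalize(logon_seconds, good=20, bad=120), 0.30),  # user logon time
        (normalize(latency_ms, good=50, bad=400), 0.30),     # network latency
        (normalize(jitter_ms, good=5, bad=60), 0.20),        # network stability
        (normalize(cpu_wait_pct, good=5, bad=50), 0.20),     # machine performance
    ]
    return sum(score * weight for score, weight in factors)

# Example: a user with a 35-second logon, 90 ms latency, 8 ms jitter and 12% CPU wait.
print(round(ux_score(logon_seconds=35, latency_ms=90, jitter_ms=8, cpu_wait_pct=12), 1))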

“With this tool, as a system administrator, you can come in and see the entire population,” Wilson said. “It starts with the top-level experience score, but you can very quickly break that down [to personal performance].”

Wilson said IT admins who tested the product reported that this information helped them address performance issues more quickly.

“The feedback we’ve gotten is that they’ve been able to very quickly get to root causes,” he said. “They’ve been able to drill down in a way that’s easy to understand.”

A proactive approach

Eric Klein, analyst at VDC Research Group Inc., said the service represents a more proactive approach to performance problems, as opposed to identifying issues through remote access of an employee’s computer.

“If something starts to degrade from a performance perspective — like an app not behaving or slowing down — you can identify problems before users become frustrated,” he said.

Klein said IT admins would likely welcome any tool that, like this one, could “give time back” to them.

“IT is always being asked to do more with less, though budgets have slowly been growing over the past few years,” he said. “[Administrators] are always looking for tools that will not only automate processes but save time.”

Enterprise Strategy Group senior analyst Mark Bowker said in a press release from Citrix announcing the news that companies must examine user experience to ensure they provide employees with secure and consistent access to needed applications.

IT is always being asked to do more with less.
Eric KleinAnalyst, VDC Research Group

“Key to providing this seamless experience is having continuous visibility into network systems and applications to quickly spot and mitigate issues before they affect productivity,” he said in the release.

Wilson said the performance analytics service was the product of Citrix’s push to the cloud during the past few years. One of the early benefits of that process, he said, has been in the analytics field; the company has been able to apply machine learning to the data it has garnered and derive insights from it.

“We do see a broad opportunity around analytics,” he said. “That’s something you’ll see more and more of from us.”

AtScale’s Adaptive Analytics 2020.1 a big step for vendor

With data virtualization for analytics at scale as a central tenet, AtScale unveiled its Adaptive Analytics 2020.1 platform on Wednesday.

The release marks a significant step for AtScale, which specializes in data engineering by serving as a conduit between BI tools and stored data. Not only is it a rebranding of the vendor’s platform — its most recent update was called AtScale 2019.2 and was rolled out in July 2019 — but it also marks a leap in its capabilities.

Previously, as AtScale — based in San Mateo, Calif., and founded in 2013 — built up its capabilities, its focus was on how to get big data to work for analytics, said co-founder and vice president of technology David Mariani. And while AtScale did that, it left the data where it was stored and queried one source at a time.

With AtScale’s Adaptive Analytics 2020.1 — available in general release immediately — users can query multiple sources simultaneously and get responses almost instantaneously, thanks to augmented intelligence and machine learning capabilities. In addition, the platform autonomously engineers the data based on the query.

“This is not just an everyday release for us,” Mariani said. “This one is different. With our arrival in the data virtualization space we’re going to disrupt and show its true potential.”

Dave Menninger, analyst at Ventana Research, said that Adaptive Analytics 2020.1 indeed marks a significant step for AtScale.

“This is a major upgrade to the AtScale architecture which introduces the autonomous data engineering capabilities,” he said. “[CTO] Matt Baird and team have completely re-engineered the product to incorporate data virtualization and machine learning to make it easier and faster to combine and analyze data at scale. In some ways you could say they’ve lived up to their name now.”

This is not just an everyday release for us. This one is different. With our arrival in the data virtualization space we’re going to disrupt and show its true potential.
David MarianiCo-founder and vice president of technology, AtScale

AtScale has also completely re-engineered its platform, abandoning its roots in Hadoop, to serve both customers who store their data in the cloud and those who keep their data on premises.

“It’s not really about where the AtScale technology runs,” Menninger said. “Rather, they make it easy to work with cloud-based data sources as well as on premises data sources. This is a big change from their Hadoop-based, on-premises roots.”

AtScale’s Adaptive Analytics 2020.1 includes three main features: Multi-Source Intelligent Data Model, Self-Optimizing Query Acceleration Structures and Virtual Cube Catalog.

Multi-Source Intelligent Data Model is a tool that enables users to create logical data models through an intuitive process. It simplifies data modeling by rapidly assembling the data needed for queries, and then maintains its acceleration structures even as workloads increase.
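
As a rough illustration of what a logical model over multiple physical sources looks like, the Python sketch below joins a SQL table and a flat file behind a single view. The table names and data are hypothetical, and this is a generic data-virtualization pattern, not AtScale's API.

# Illustrative sketch only: a toy "logical model" spanning two physical sources.
import io
import sqlite3
import pandas as pd

# Source 1: sales facts living in a SQL database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region_id INTEGER, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [(1, 120.0), (1, 80.0), (2, 200.0)])
sales = pd.read_sql_query("SELECT region_id, amount FROM sales", con)

# Source 2: region reference data arriving as a flat file.
regions = pd.read_csv(io.StringIO("region_id,region_name\n1,East\n2,West\n"))

# The "logical model": consumers query one joined view and never see
# where each column physically lives.
logical_view = sales.merge(regions, on="region_id")
print(logical_view.groupby("region_name")["amount"].sum())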

Self-Optimizing Query Acceleration Structures, meanwhile, allow users to add information to their queries without having to re-aggregate the data over and over.
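
The general pattern behind an acceleration structure can be sketched as follows: aggregate the raw rows once at a fine grain, then answer later, coarser questions from that small summary. The column names and data below are hypothetical, and the sketch illustrates the pre-aggregation idea rather than AtScale's implementation.

# Illustrative sketch of pre-aggregation: build a summary once, reuse it for rollups.
import pandas as pd

raw = pd.DataFrame({
    "day":     ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "product": ["bike", "helmet", "bike", "helmet"],
    "amount":  [500.0, 40.0, 650.0, 35.0],
})

# Build the acceleration structure once, at day x product grain.
accel = raw.groupby(["day", "product"], as_index=False)["amount"].sum()

# Later queries at coarser grains reuse the summary -- no re-aggregation of the raw rows.
by_day = accel.groupby("day")["amount"].sum()
by_product = accel.groupby("product")["amount"].sum()
print(by_day)
print(by_product)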

A sample AtScale dashboard shows an organization’s internet sales data.

And Virtual Cube Catalog is a means of speeding up discoverability with lineage and metadata search capabilities that integrate natively into existing data catalogs. This enables business users and data scientists to locate whatever information they need, according to AtScale.

“The self-optimizing query acceleration provides a key part of the autonomous capabilities,” Menninger said. “Performance tuning big data queries can be difficult and time-consuming. However, it’s the combination of the three capabilities which really makes AtScale stand out.”

Other vendors are attempting to offer similar capabilities, but AtScale’s Adaptive Analytics 2020.1 packages them in a unique way, he added.

“There are competitors offering data virtualization and competitors offering cube-based data models, but AtScale is unique in the way they combine these capabilities with the automated query acceleration,” Menninger said.

Beyond enabling data virtualization at scale, the update also emphasizes speed and efficiency, Mariani said. “Data virtualization can now be used to improve complexity and cost,” he said.
