The NoSQL engine sometimes takes a back seat to Spark and Hadoop analytics in discussions of big data. But independent Hadoop distribution provider MapR Technologies is placing its MapR-DB NoSQL database at the center of updates to its multimodel Converged Data Platform, as disclosed this week at the Strata Data Conference in New York.
New in MapR-DB are solid-state-drive-optimized secondary indexes that natively handle several data types. Also, indexing is enhanced for in-place integrations with the Drill interactive SQL query engine that MapR fields for those asking questions of Hadoop and NoSQL stores.
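To make the idea of a secondary index concrete, here is a minimal in-memory sketch. It is not MapR-DB's implementation or API; the `DocumentStore` class and its methods are hypothetical names invented for illustration. It shows only the general technique: keeping a second lookup structure, keyed on a non-primary field, so that queries on that field avoid a full scan.

```python
from collections import defaultdict


class DocumentStore:
    """Toy document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}      # primary key -> document (a dict)
        self._indexes = {}   # field name -> {field value -> set of primary keys}

    def create_index(self, field):
        """Build a secondary index on `field` over the documents already stored."""
        index = defaultdict(set)
        for key, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(key)
        self._indexes[field] = index

    def put(self, key, doc):
        """Insert or overwrite a document and keep every index in sync."""
        old = self._docs.get(key)
        for field, index in self._indexes.items():
            if old is not None and field in old:
                index[old[field]].discard(key)   # drop the stale entry
            if field in doc:
                index[doc[field]].add(key)
        self._docs[key] = doc

    def find(self, field, value):
        """Return matching documents, using the index when one exists."""
        if field in self._indexes:
            keys = self._indexes[field].get(value, set())
            return [self._docs[k] for k in sorted(keys)]
        # No index on this field: fall back to scanning every document.
        return [doc for _, doc in sorted(self._docs.items())
                if doc.get(field) == value]


store = DocumentStore()
store.put("rep-1", {"name": "Ana", "region": "west"})
store.put("rep-2", {"name": "Ben", "region": "east"})
store.create_index("region")
west_reps = store.find("region", "west")  # answered from the index, not a scan
```

The trade-off is the usual one: each write does extra work to keep the index current, in exchange for faster reads on the indexed field.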
These and other new traits — an updated API supporting JSON grammar and NoSQL integration with a cross-data-center change data capture scheme — are part of MapR’s flagship platform update. They are intended to meet needs of users scaling out varied stores to deal with more and more data. They are also meant to speed processing and to move analytics into an organization’s real-time workflow.
Xactly speeding querying
For Xactly, a sales performance management company in San Jose, Calif., with software running on the cloud and in data centers, the enhancements to MapR-DB help to move analytics closer to real time and deeper into operational workflow.
At Xactly, the MapR-DB engine processes JSON data that, together with Spark and other Hadoop-oriented components, creates views for analytics of salespeople’s activity, according to Ron Rasmussen, CTO, product officer and senior vice president of engineering.
“As fast as MapR-DB is, we’d like to see it go faster,” he said, adding that his early look at the secondary indexing now part of MapR-DB shows a significant increase in query speed. That is important to developers building interactive querying support into Xactly’s services.
A NoSQL engine like MapR-DB is oriented toward handling streaming data and processing in parallel, so it is useful within an enterprise, according to Mike Matchett, an analyst and consultant at Taneja Group in Hopkinton, Mass.
“The point is to get the streaming data in the hands of people that can use it,” he said. He also marked features like MapR-DB’s new secondary indexing as key in many applications. “Indexing, and speed of indexing, is very important,” he said.
Matchett pegged Apache Cassandra and Couchbase among competitors to MapR-DB.
Big data maturation
“What is good about the MapR platform is its scalability and reliability,” Rasmussen said. He noted Xactly presently uses MapR-DB as a distributed data store in a 15-node cluster configuration.
MapR’s Jack Norris, senior vice president for data and applications, said the use of NoSQL databases in big data workflows is part of the general speed-up of business activity in the age of the web, and artificial intelligence techniques such as machine learning for prediction are part of this, too.
“People are interested in the ability to do in-place machine learning, handling real-time data ingestions and outflows, and getting answers in seconds,” he said.
For Xactly’s Rasmussen, NoSQL improvements and other steps represent greater maturation of big data technologies, which continue to add features across a spectrum of data types and data-processing approaches. The maturation is on both the vendor and user sides, he indicated. “Three years ago, we had challenges. It all was new,” he said. “Now, we have three-plus years’ experience as a team. Our people are trained.”
Cisco has added to Spark the content protection, compliance and security features the collaboration service needs to attract highly regulated organizations, such as healthcare providers, government agencies and financial institutions.
The improvements, introduced this week, include content protection in the Cisco Spark app for mobile devices, legal team access to all documents and messages, and the option of on-premises deployment of the Spark key server, which handles decryption and encryption of all data flowing in the service.
Cisco is not the only collaboration vendor to add features attractive to organizations that need the highest levels of security and content control. Slack moved in that direction this year by providing support for third-party mobility management and data loss prevention products. Symphony Communication Services has always focused on regulated industries with a secure messaging app used by 80% of global investment banks.
The latest Spark enhancements correct weaknesses that hampered adoption by organizations watched closely by regulators.
“Not having these kinds of controls has slowed implementation of Spark, especially in larger and regulated organizations,” said Irwin Lazar, an analyst at Nemertes Research, based in Mokena, Ill. “In these kinds of companies, we’ve found a reluctance to embrace a cloud-based messaging solution that doesn’t provide end-to-end encryption, enterprise mobility management integration, and the ability for an organization to control its own keys.”
Spark’s key server ensures all content is encrypted and cannot be read by Cisco or anyone else unless authorized by the organization using the service. To satisfy the most security-conscious organizations, Cisco introduced the option of letting them hold the key server on premises rather than in the vendor’s cloud.
Cisco Spark app Control Hub enhancements
Along with in-house key management, Cisco added control features to the Spark management console, called the Control Hub. Through the platform, administrators can identify individuals who get access to all documents and messages in Spark. That level of access is necessary for lawyers and compliance officers.
Besides the new Control Hub features, Cisco opened up the console to third-party security systems through the release of what it calls the Pro Pack. The software, which costs extra, lets organizations integrate third-party compliance and archiving, data loss prevention and identity management systems.
Cisco also beefed up security in the Cisco Spark app that runs on smartphones and tablets. Features include automatically logging off users when they leave the corporate network and adding a method called certificate pinning that prevents man-in-the-middle attacks. Also, managers can set the app to deny access to mobile users who fail to set their devices’ PIN lock after three warnings.
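As a rough illustration of what certificate pinning involves, the sketch below checks the SHA-256 fingerprint of a presented certificate against a set of fingerprints shipped with the client. It is not Cisco's implementation; the function names are hypothetical, and real apps often pin the public key rather than the whole certificate, which this simplified version does not do.

```python
import hashlib
import hmac
from typing import Iterable


def fingerprint(cert_der: bytes) -> str:
    """Return the hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(cert_der).hexdigest()


def is_pinned(cert_der: bytes, pinned_fingerprints: Iterable[str]) -> bool:
    """Accept the certificate only if its fingerprint matches a pinned value.

    A certificate that chains to a trusted CA but is not in the pinned set is
    still rejected, which is how pinning blocks an interception proxy.
    """
    presented = fingerprint(cert_der)
    return any(hmac.compare_digest(presented, pin) for pin in pinned_fingerprints)
```

In a Python client, the DER bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)` after the TLS handshake. The connection would be dropped if `is_pinned` returns `False`.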
Finally, Cisco made improvements to the Spark analytics engine. Users can more easily manipulate data to determine, for example, whether any users are experiencing poor call quality and whether the problem is affecting others. The better analytics are also available in WebEx, the company’s video conferencing and file-sharing software.
Following just over six months of public preview and a continuous flow of user-driven enhancements from our rapidly growing community, I’m excited to announce the new Power BI will be generally available on July 24. But the news doesn’t stop there – we’re making four big announcements today:
- The Power BI business analytics service (“Power BI 2.0”) will exit preview status, becoming a generally available service on July 24. This supersedes the Power BI for Office 365 service (“Power BI 1.0”), which we will continue to operate during a transition period as users migrate to the updated service.
- Power BI Desktop (known before today as Power BI Designer) will also exit preview and become generally available on July 24. We’ve renamed the product to better reflect the power and intent of this world-class business analytics tool.
- Today we contributed the Power BI visualization framework and its complete library of visuals to the open source community under an MIT license – source code and more is available today on GitHub.
- T. K. “Ranga” Rengarajan announced a new data service today, Apache Spark for HDInsight, our Hadoop service on Azure. Spark leverages main memory in contrast to the two-stage, disk-based MapReduce paradigm often used with data stored in Hadoop. Query performance over a Hadoop dataset can be 100 times faster with Spark. Last night we updated the Power BI service adding direct support for this new service enabling users to monitor, explore and visualize their HDInsight data in Power BI.
Throughout the Power BI preview period we’ve sustained a blistering pace of innovation. We update the Power BI service every week, adding new features, capabilities and at least one new third-party content pack – plus Power BI Desktop is updated each month.
The extensive list of new features and capabilities delivered over the six-month preview period provides some indication of the scale of our investment in Power BI. It is evidence of our commitment to relentless execution and to delivering value to our users with unmatched speed and agility.
Though our innovation pace has been high, it is more than sustainable and it is reasonable to expect acceleration. A meaningful percentage of our energy over the last year has been directed toward laying a solid foundation for a global-scale service that can support over a billion users – our aspiration. Laying that foundation was an exit criterion for declaring Power BI generally available. With the foundation in place, more energy will now flow to user-visible product and service enhancements.
Power BI and the disruptive nature of SaaS
We hung a banner in our building when we began work on the new Power BI service that says “5 seconds to sign up. 5 minutes to wow!” As a cloud-hosted, business intelligence and analytics service (“SaaS”), Power BI permits business users to directly connect with and gain insight from their business data.
This is game changing.
In the past, business intelligence solutions started with the installation and maintenance of servers and software. Then, BI professionals would build corporate BI solutions and business analysts would employ “self-service” BI tools to explore and analyze data. These would be published as interactive and operational reports, spreadsheets, dashboards and other analytical artifacts. They would (finally!) hold the potential to deliver value to the ultimate consumers – the business end users who make better business choices based on insights gleaned. That is not five minutes to wow.
This complex flow was mostly inevitable. Business data was locked up in applications and database systems controlled by IT, so any solution had to start with them. This is changing. Increasingly, and I would argue inevitably, business data is contained in SaaS services – Microsoft Dynamics, Salesforce, Workday, Marketo, MailChimp, Google Analytics, Zendesk and countless others. With their login credentials, business users have direct access to this data. The growth of SaaS has itself made it possible to offer business intelligence and analytics software, as a service.
We believe Power BI is, by a very wide margin, the most powerful business analytics SaaS service. And yet even the most non-technical of business users can sign up in five seconds, and gain insights from their business data in less than five minutes with no assistance, from anyone.
While even some legacy business intelligence solutions provided “connectors” for SaaS services, Power BI takes it a few steps further by providing complete, “out of the box” content packs for these services. When a Power BI user connects to their Google Analytics data, for example, they get a curated collection of dashboards and reports that continuously update with the latest data from the user’s Google Analytics account. We are effectively acting as a team of IT, BI and business analysis professionals on behalf of the user.
The result is that more people can connect with and gain insight from their data, faster and more simply than ever before. What used to take days, weeks, months or even years of complex coordinated work now happens quickly.
Dozens of services have committed to delivering content packs for Power BI. Sixteen are available in Power BI today, and we are adding at least one new one every single week, steadily increasing the likelihood that any new user will find services they depend on from which they can begin gaining business insights.
Sounds good, but the proof is in the pudding – are users actually showing up, signing up and engaging with the service? In that regard, Power BI has exceeded even our most starry-eyed predictions. Over half a million unique users, across more than 45,000 companies spanning 185 countries, have signed up for Power BI in the six months it’s been in preview. That would be a good decade’s worth of growth for a successful, last-generation business analytics software company. That’s the power of SaaS.
And we’re just getting started.
Power BI sets the standard for modern business intelligence
Beyond the inherent benefits of a SaaS-based approach, Power BI sets a new standard for “modern” business intelligence with a collection of capabilities completely unique in the industry:
- Real-time dashboards and support for streaming data. Data is increasingly streaming from everything. With Power BI you can monitor streaming data sources via live, continuously updating visuals. Maintain an up-to-the-moment pulse on your business. Legacy business intelligence solutions are largely retrospective – and of course Power BI also provides rich and deep analysis of historical data like other BI solutions. But you wouldn’t want to drive your car looking only at your rearview mirror. You shouldn’t drive your business that way either.
- Keep your data where it is. Though business data is increasingly contained in SaaS business applications, there is and probably always will be important data contained in applications and databases “on premises.” Power BI doesn’t require an organization to move or copy data to the cloud to benefit from it. It’s a have-your-cake-and-eat-it-too kind of capability. Whether your data is on-premises or in the cloud, stored in a relational database or Hadoop, or resident in a business application that’s custom, packaged or SaaS, Power BI can send queries to the data rather than pulling the data to itself. You can pull the data in too – Power BI provides world-class extract, transform and load (ETL) capabilities – and in some cases this may be preferred (as when the source database query performance is too slow for interactive data analysis). The point is, with Power BI you have a choice.
- Cross-platform, native mobile apps. In addition to its rich Web client, there are native Windows, iPhone, iPad and Android apps that keep mobile users connected with their data, wherever they may be. As hard as it is to believe, other mainstream BI vendors still lack native mobile apps.
- Natural language interface. Benefiting from years of groundbreaking natural language interaction work at Microsoft Research, Power BI permits users to simply ask questions of their data. As you type your question, visualizations appear and are successively refined as you continue to type. This capability, perhaps more than any other, has empowered end users to self-serve their business analytics needs. No need to learn complex query languages or master some gesture-based exploration tool. Just ask questions.
- Completely open, but an integrated part of Microsoft’s comprehensive data platform. Across the Microsoft data platform there is bi-directional awareness of and integration with Power BI. For example, when a user creates a new Azure Data Warehouse, or a new Azure Stream Analytics solution, there is a one-click option to make that data available in Power BI. Conversely, if you have an existing HDInsight dataset fronted by Spark, or a tabular data model in SQL Server Analysis Services, there is a one-click option in Power BI to connect back to those sources of data. This makes it easy to compose very powerful data solutions and provides “one throat to choke” accountability – if the solution doesn’t work, you don’t have many vendors pointing fingers at each other. We don’t limit our support only to Microsoft solutions, but we make it easy to go that route if you desire.
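The choice described under “Keep your data where it is” can be illustrated with a small sketch: send the query to where the data lives, or extract the rows and compute locally. It uses Python’s built-in `sqlite3` module as a stand-in for a source database. The `sales` table and the function names are hypothetical, and this is not how Power BI itself is implemented.

```python
import sqlite3


def make_source():
    """Create a stand-in source database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("west", 100.0), ("east", 250.0), ("west", 50.0)],
    )
    return conn


def totals_by_pushdown(conn):
    """Send the aggregation to the source; only the summary rows come back."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows)


def totals_by_extract(conn):
    """Pull every row out of the source, then aggregate on the client side."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


source = make_source()
pushed = totals_by_pushdown(source)    # work happens in the source database
extracted = totals_by_extract(source)  # work happens after copying the rows
```

Both paths return the same totals. The difference is where the work happens and how much data moves, which is why extraction can be preferable when the source is too slow for interactive queries.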
Power BI Desktop
The Power BI service allows business users to sign up for a business analytics service, and to get value without requiring support from a BI professional or business analyst.
But as users have signed up for Power BI they’ve asked “how can I get my own data into Power BI?” and “how can I combine my Salesforce, Marketo and Google Analytics data into a set of reports and dashboards in Power BI?”
This is where Power BI Desktop comes in. Power BI Desktop (known until today as Power BI Designer) is a powerful analytical tool for use by business analysts – with it analysts can connect to a wide range of data sources; cleanse, transform and model interrelationships between the data; explore the resulting mashed-up datasets and create beautiful, interactive reports. With a single click, these reports can be published to the Power BI service for use by end users.
On July 24th Power BI Desktop will exit preview and become generally available. Power BI Desktop is a free tool, downloadable from powerbi.com. While other vendors charge thousands of dollars per seat for this class of software, we prefer to make it widely and freely available ultimately resulting in far more content in Power BI, which leads to a larger and more engaged end user community.
Open sourcing the Power BI visualization stack and visuals
Today we contributed the source code of our visualization framework and the complete collection of native visuals supported by Power BI and Power BI Desktop to the open source community under the permissive MIT license. This is not a point in time snapshot, or some set of samples. This is the actual code base we are enhancing and using inside Power BI and Power BI Desktop – the project is available today on GitHub.
Want to build and have us add a new base visualization type to Power BI? Do it and send us a pull request. Want to enhance the behavior of the underlying visualization system? Ditto.
We made this move for many reasons, but perhaps the most important is that it paves the way for completely custom visualizations in Power BI and Power BI Desktop. We’ll soon add the ability to upload completely custom visuals to your Power BI tenant, or to attach them to a report. We hope to see a growing collection of visuals that can be used with Power BI.
Visualize big data with Power BI and Spark for Azure HDInsight
We are also announcing our support of Apache Spark by making it a fully managed service in Azure HDInsight, available in public preview today.
Spark leverages main memory in contrast to the two-stage, disk-based MapReduce paradigm often used with data stored in Hadoop. Query performance over a Hadoop dataset can be 100 times faster with Spark.
Since both services are powered by the cloud, you can deploy a Spark cluster in Azure HDInsight and visualize the backing Hadoop data in Power BI within minutes without investing in hardware or complex integration.
The complete Microsoft data platform
Microsoft continues to make it easier for customers to maximize their data dividends with our data platform and services. It’s never been easier to capture, transform, mash-up, analyze and visualize any data, of any size, at any scale, in its native format using familiar tools, languages and frameworks in a trusted environment on-premises and in the cloud.
If you haven’t joined the hundreds of thousands of Power BI users, you can sign up at PowerBI.com with any business email account to try it for free today. To try the public preview of Spark for Azure HDInsight, go to the Spark page and sign up.