
MariaDB X4 brings smart transactions to open source database

MariaDB has come a long way from its MySQL database roots. The open source database vendor released its new MariaDB X4 platform, providing users with “smart transactions” technology to support both analytical and transactional workloads in a single database.

MariaDB, based in Redwood City, Calif., was founded in 2009 by the original creator of MySQL, Monty Widenius, as a drop-in replacement for MySQL, after Widenius grew disillusioned with the direction that Oracle was taking the open source database.

Sun Microsystems acquired MySQL in 2008, and Oracle took ownership of the database when its acquisition of Sun closed in 2010. Now, in 2020, MariaDB still uses the core MySQL database protocol, but the MariaDB database has diverged significantly in other ways that are manifest in the X4 platform update.

The MariaDB X4 release, unveiled Jan. 14, puts the technology squarely in the cloud-native discussion, notably because MariaDB is allowing for specific workloads to be paired with specific storage types at the cloud level, said James Curtis, senior analyst of data, AI and analytics at 451 Research.

“There are a lot of changes that they implemented, including new and improved storage engines, but the thing that stands out are the architectural adjustments made that blend row and columnar storage at a much deeper level — a change likely to appeal to many customers,” Curtis said.

MariaDB X4 smart transactions converges database functions

MariaDB’s divergence from MySQL has ramped up over the past three years, said Shane Johnson, senior director of product marketing at MariaDB. In recent releases MariaDB has added Oracle database compatibility, which MySQL does not include, he noted.

In addition, MariaDB’s flagship platform provides a database firewall and dynamic data masking, both features designed to improve security and data privacy. The biggest difference today, though, between MariaDB and MySQL is how MariaDB supports pluggable storage engines, which gain new functionality in the X4 update.


Previously when using the pluggable storage engine, users would deploy an instance of MariaDB for transactional use cases with the InnoDB storage engine and another instance with the ColumnStore columnar storage engine for analytics, Johnson explained.

In earlier releases, a Change Data Capture process synchronized those two databases. In the MariaDB X4 update, transactional and analytical features have been converged in an approach that MariaDB calls smart transactions.

“So, when you install MariaDB, you get all the existing storage engines, as well as ColumnStore, allowing you to mix and match to use row and columnar data to do transactions and analytics, very simply, and very easily,” Johnson said.
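As a rough illustration of why the two storage engines complement each other, the toy Python sketch below (not MariaDB internals, and with hypothetical data) stores the same records both row-wise and column-wise: the row layout favors transactional point lookups, while the columnar layout favors analytical scans over a single column.

```python
# Toy illustration of row vs. columnar storage, not MariaDB internals.
rows = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 200},
]

# Row store (InnoDB-style access pattern): fetch a whole record by key.
row_store = {r["id"]: r for r in rows}

# Column store (ColumnStore-style access pattern): each column is a
# contiguous array, so an aggregate touches only the data it needs.
column_store = {col: [r[col] for r in rows] for col in ("id", "region", "amount")}

def lookup(record_id):
    """Transactional access: one complete row in a single step."""
    return row_store[record_id]

def total_amount():
    """Analytical access: scan a single column."""
    return sum(column_store["amount"])

print(lookup(2))        # {'id': 2, 'region': 'west', 'amount': 80}
print(total_amount())   # 400
```

Smart transactions amount to keeping both layouts behind one SQL interface, so an application can mix the two access patterns without a separate analytics database.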

MariaDB X4 aligns cloud storage

Another new capability in MariaDB X4 is the ability to more efficiently use cloud storage back ends.

“Each of the storage mediums is optimized for a different workload,” Johnson said.

For example, Johnson noted that Amazon Web Services’ S3 is a good fit for analytics because of its high availability and capacity. He added that for transactional applications with row-based storage, Amazon Elastic Block Store (EBS) is a better fit. The ability to mix and match EBS and S3 in the MariaDB X4 platform makes it easier for users to consolidate analytical and transactional workloads in the database.

“The update for X4 is not so much that you can run MariaDB in the cloud, because you’ve always been able to do that, but rather that you can run it with smart transactions and have it optimized for cloud storage services,” Johnson said.

MariaDB database as a service (DBaaS) is coming

MariaDB said it plans to expand its portfolio further this year.

The core MariaDB open source community project is currently at version 10.4, with plans for version 10.5, which will include the smart transactions capabilities, to debut sometime in the coming weeks, according to MariaDB.

The new smart transaction capabilities have already landed in the MariaDB Enterprise 10.4 update. The MariaDB Enterprise Server has more configuration settings and hardening for enterprise use cases.

The full MariaDB X4 platform goes a step further with the MariaDB MaxScale database proxy, which provides automatic failover, transaction replay and a database firewall, as well as utilities that developers need to build database applications.

Johnson noted that new features traditionally tend to land in the community version first, but as it happened, during this cycle MariaDB developers were able to get the features into the enterprise release more quickly.

MariaDB has plans to launch a new DBaaS product this year. Users can already deploy MariaDB to a cloud of choice on their own. MariaDB also has a managed service that provides full management for a MariaDB environment.

“With the managed service, we take care of everything for our customers, where we deploy MariaDB on their cloud of choice and we will manage it, administer it, operate and upgrade it,” Johnson said. “We will have our own database as a service rolling out this year, which will provide an even better option.”

Go to Original Article

ArangoDB 3.6 accelerates performance of multi-model database

By definition, a multi-model database provides multiple database models for different use cases and user needs. Among the popular options is ArangoDB, from the open source database vendor of the same name.

ArangoDB 3.6, released into general availability Jan. 8, brings a series of updates to the multi-model database platform, among them improved performance for queries and overall database operations. Also new from the San Mateo, Calif.-based vendor is the OneShard feature, which uses synchronous replication to give organizations more robust data resilience.

For Kaseware, based in Denver, ArangoDB has been a core element since the company was founded in 2016, enabling the law enforcement software vendor’s case management system.

“I specifically sought out a multi-model database because for me, that simplified things,” said Scott Baugher, the co-founder, president and CTO of Kaseware, and a former FBI special agent. “I had fewer technologies in my stack, which meant fewer things to keep updated and patched.”

Kaseware uses ArangoDB as a document, key/value, and graph database. Baugher noted that the one other database the company uses is ElasticSearch, for its full-text search capabilities. Kaseware uses ElasticSearch because until fairly recently, ArangoDB did not offer full-text search capabilities, he said.

“If I were starting Kaseware over again now, I’d take a very hard look at eliminating ElasticSearch from our stack as well,” Baugher said. “I say that not because ElasticSearch isn’t a great product, but it would allow me to even further simplify my deployment stack.” 

Adding OneShard to ArangoDB 3.6

With OneShard, users gain a new option for database distribution. OneShard is a feature for users whose data is small enough to fit on a single node, but who still need the database to replicate data across multiple nodes for fault tolerance, said Joerg Schad, head of engineering and machine learning at ArangoDB.


“ArangoDB will basically colocate all data on a single node and hence offer local performance and transactions as queries can be evaluated on a single node,” Schad said. “It will still replicate the data synchronously to achieve fault tolerance.”

Baugher said he’ll be taking a close look at OneShard.

He noted that Kaseware now uses ArangoDB’s “resilient single” database setup, which in his view is similar, but less robust. 

“One main benefit of OneShard seems to be the synchronous replication of the data to the backup or failover databases versus the asynchronous replication used by the active failover configuration,” Baugher said.

Baugher added that OneShard also allows database reads to happen from any database node. This contrasts with active failover, in that reads are limited to the currently active node only. 

“So for read-heavy applications like ours, OneShard should not only offer performance benefits, but also let us make better use of our standby nodes by having them respond to read traffic,” he said.
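The behavior Baugher describes can be sketched in a few lines of Python. This is a hypothetical toy model, not ArangoDB's implementation: every write is applied to all replicas before it is acknowledged (synchronous replication), so any node holds current data and can serve reads.

```python
import itertools

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class OneShardCluster:
    """Toy sketch of OneShard-style behavior, not ArangoDB internals."""
    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]
        self._rr = itertools.cycle(self.nodes)  # round-robin reads

    def write(self, key, value):
        # Synchronous replication: the write is acknowledged only
        # after every replica has applied it.
        for node in self.nodes:
            node.data[key] = value

    def read(self, key):
        # Reads may hit any node, spreading load across all replicas.
        node = next(self._rr)
        return node.name, node.data[key]

cluster = OneShardCluster(["node-a", "node-b", "node-c"])
cluster.write("case:42", {"status": "open"})
print([cluster.read("case:42")[0] for _ in range(3)])  # ['node-a', 'node-b', 'node-c']
```

In an active-failover setup, by contrast, only the single active node would answer reads, leaving the standbys idle.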

More performance gains in ArangoDB 3.6

The ArangoDB 3.6 multi-model database also provides users with faster query execution thanks to a new feature for subquery optimization. Schad explained that when writing queries, it is a typical pattern to build a complex query out of multiple simpler queries.

“With the improved subquery optimization, ArangoDB optimizes and processes such queries more efficiently by merging them into one, which especially improves performance for larger data sizes, by a factor of up to 28,” he said.

The new database release also enables parallel execution of queries to further improve performance. Schad said that if a query requires data from multiple nodes, with ArangoDB 3.6 those operations can be parallelized and performed concurrently. The end result, according to Schad, is an improvement of 30% to 40% for queries involving data across multiple nodes.
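A scatter-gather pattern of this kind can be sketched with Python's standard thread pool. The shard layout and per-node function below are hypothetical; the point is only that per-node work is fanned out concurrently and the partial results are then combined, rather than visiting nodes one at a time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each node holds part of the data set.
shards = {
    "node-1": [3, 9, 4],
    "node-2": [7, 1],
    "node-3": [5, 8, 2],
}

def scan_shard(values):
    # Stand-in for the per-node portion of a query, e.g. a local SUM.
    return sum(values)

def parallel_sum():
    # Fan the per-node work out concurrently, then merge partials.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(scan_shard, shards.values())
    return sum(partials)

print(parallel_sum())  # 39
```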

Looking forward to the next release of ArangoDB, scalability improvements will be at the top of the agenda, he said.

“For the upcoming 3.7 release, we are already working on improving the scalability even further for larger data sizes and larger clusters,” Schad said.


Oracle looks to grow multi-model database features

Perhaps no single vendor or database platform over the past three decades has been as pervasive as the Oracle database.

Much as the broader IT market has evolved, so too has Oracle’s database. Oracle has added new capabilities to meet changing needs and competitive challenges. With a move toward the cloud, new multi-model database options and increasing automation, the modern Oracle database continues to move forward. Among the executives who have been at Oracle the longest is Juan Loaiza, executive vice president of mission critical database technologies, who has watched the database market evolve, first-hand, since 1988.

In this Q&A, Loaiza discusses the evolution of the database market and how Oracle’s namesake database is positioned for the future.

Why have you stayed at Oracle for more than three decades and what has been the biggest change you’ve seen over that time?

Juan Loaiza

Juan Loaiza: A lot of it has to do with the fact that Oracle has done well. I always say Oracle’s managed to stay competitive and market-leading with good technology.

Oracle also pivots very quickly when needed. How do you survive for 40 years? Well, you have to react and lead when technology changes.

Decade after decade, Oracle continues to be relevant in the database market as it pivots to include an expanding list of capabilities to serve users.

The big change that happened a little over a year ago is that Thomas Kurian [former president of product development] left Oracle. He was head of all development, and when he left, some of the teams, like database and apps, ended up rolling up to [Oracle founder and CTO] Larry Ellison. Larry is now directly managing some of the big technology teams. For example, I work directly with Larry.

What is your view on the multi-model database approach?

Loaiza: This is something we’re starting to talk more about. So the term that people use is multi-model but we’re using a different term, we’re using a term called converged database and the reason for that is because multi-model is kind of one component of it.

Multi-model really talks about different data models that you can model inside the database, but we’re also doing much more than that. Blockchain is an example of converging a technology into the database that is not normally even thought of as database technology. So we’re going well beyond the conventional kind of multi-model of, ‘Hey, I can do this data format and that data format.’

Initially, the relational database was the mainstream database people used for both OLTP [online transaction processing] and analytics. What has happened in the last 10 to 15 years is that there have been a lot of new database technologies to come around, things like NoSQL, JSON, document databases, databases for geospatial data and graph databases too. So there’s a lot of specialty databases that have come around. What’s happening is, people are having to cobble together a complex kind of web of databases to solve one problem and that creates an enormous amount of complexity.

With the idea of a converged database, we’re taking all the good ideas, whether it’s NoSQL, blockchain or graph, and we’re building it into the Oracle database. So you can basically use one data store and write your application to that.

The analogy that we use is that of a smartphone. We used to have a music device and a phone device and a calendar device and a GPS device and all these things and what’s happened is they’ve all been converged into a smartphone.

Are companies actually shifting their on-premises production database deployments to the cloud?

Loaiza: There’s definitely a switch to the cloud. There are two models to cloud; one is kind of the grassroots. So we’re seeing some of that, for example, with our autonomous database that people are using now. So they’re like, ‘Hey, I’m in the finance department, and I need a reporting database,’ or, ‘hey, I’m in the marketing department, and I need some database to run some campaign with.’ So that’s kind of a grassroots and those guys are building a new thing and they want to just go to cloud. It’s much easier and much quicker to set up a database and much more agile to go to the cloud.

The second model is where somebody up in the hierarchy says, ‘Hey, we have a strategy to move to cloud.’ Some companies want to move quickly and some companies say, ‘Hey, you know, I’m going to take my time,’ and there’s everything in the middle.

Will autonomous database technology mean enterprises will need fewer database professionals?

Loaiza: The autonomous database addresses the mundane aspects of running a database. Things like tuning the database, installing it, configuring it, setting up HA [high availability], among other tasks. That doesn’t mean that there’s nothing for database professionals to do.

Like every other field where there is automation, what you do is you move upstream, you say, ‘Hey, I’m going to work on machine learning or analytics or blockchain or security.’ There’s a lot of different aspects of data management that require a lot of labor.

One of the nice things that we have in this industry is there is no real unemployment crisis in IT. There’s a lot of unfilled jobs.

So it’s pretty straightforward for someone who has good skills in data management to just move upstream and do something that’s going to add more specific value than just configuring and setting up databases, which is really more of a mechanical process.

This interview has been edited for clarity and conciseness.


Redis Labs eases database management with RedisInsight

The robust market of tools to help users of the Redis database manage their systems just got a new entrant.

Redis Labs disclosed the availability of its RedisInsight tool, a graphical user interface (GUI) for database management and operations.

Redis is a popular open source NoSQL database that is also increasingly being used in cloud-native Kubernetes deployments as users move workloads to the cloud. Open source database use is growing quickly, according to recent reports, as the need for flexible, open systems has become a common requirement.

Among the challenges often associated with databases of any type is ease of management, which Redis is trying to address with RedisInsight.

“Database management will never go out of fashion,” said James Governor, analyst and co-founder at RedMonk. “Anyone running a Redis cluster is going to appreciate better memory and cluster management tools.”

Governor noted that Redis is following a tested approach, by building out more tools for users that improve management. Enterprises are willing to pay for better manageability, Governor noted, and RedisInsight aims to do that.

RedisInsight based on RDBTools

The RedisInsight tool, introduced Nov. 12, is based on the RDBTools technology that Redis Labs acquired in April 2019. RDBTools is an open source GUI for users to interact with and explore data stores in a Redis database.


Over the last seven months, Redis added more capabilities to the RDBTools GUI, expanding the product’s coverage for different applications, said Alvin Richards, chief product officer at Redis.

One of the core pieces of extensibility in Redis is the ability to introduce modules that contain new data structures or processing frameworks. So for example, a module could include time series, or graph data structures, Richards explained.

“What we have added to RedisInsight is the ability to visualize the data for those different data structures from the different modules,” he said. “So if you want to visualize the connections in your graph data for example, you can see that directly within the tool.”

RedisInsight overview dashboard

RDBTools is just one of many different third-party tools that exist for providing some form of management and data insight for Redis. There are some 30 other third-party GUI tools in the Redis ecosystem, though lack of maturity is a challenge.

“They tend to sort of come up quickly and get developed once and then are never maintained,” Richards said. “So, the key thing we wanted to do is ensure that not only is it current with the latest features, but we have the apparatus behind it to carry on maintaining it.”

How RedisInsight works

For users, getting started with the new tool is relatively straightforward. RedisInsight is a piece of software that needs to be downloaded and then connected to an existing Redis database. The tool ingests all the appropriate metadata and delivers the visual interface to users.

RedisInsight is available for Windows, macOS and Linux, and is also available as a Docker container. Redis doesn’t yet offer RedisInsight as a service.

“We have considered having RedisInsight as a service and it’s something we’re still working on in the background, as we do see demand from our customers,” Richards said. “The challenge is always going to be making sure we have the ability to ensure that there is the right segmentation, security and authorization in place to put guarantees around the usage of data.”


Cloud database services multiply to ease admin work by users

NEW YORK — Managed cloud database services are mushrooming, as more database and data warehouse vendors launch hosted versions of their software that offer elastic scalability and free users from the need to deploy, configure and administer systems.

MemSQL, TigerGraph and Yellowbrick Data all introduced cloud database services at the 2019 Strata Data Conference here. In addition, vendors such as Actian, DataStax and Hazelcast said they soon plan to roll out expanded versions of managed services they announced earlier this year.

Technologies like the Amazon Redshift and Snowflake cloud data warehouses have shown that there’s a viable market for scalable database services, said David Menninger, an analyst at Ventana Research. “These types of systems are complex to install and configure — there are many moving parts,” he said at the conference. With a managed service in the cloud, “you simply turn the service on.”

Menninger sees cloud database services — also known as database as a service (DBaaS) — as a natural progression from database appliances, an earlier effort to make databases easier to use. Like appliances, the cloud services give users a preinstalled and preconfigured set of data management features, he said. On top of that, the database vendors run the systems for users and handle performance tuning, patching and other administrative tasks.

Overall, the growing pool of DBaaS technologies provides good options “for data-driven companies needing high performance and a scalable, fully managed analytical database in the cloud at a reasonable cost,” said William McKnight, president of McKnight Consulting Group.

Database competition calls for cloud services

For database vendors, cloud database services are becoming a must-have offering to keep up with rivals and avoid being swept aside by cloud platform market leaders AWS, Microsoft and Google, according to Menninger. “If you don’t have a cloud offering, your competitors are likely to eat your lunch,” he said.

Strata Data Conference
The Strata Data Conference was held from Sept. 23 to 26 in New York City.

Todd Blaschka, TigerGraph’s chief operating officer, also pointed to the user adoption of the Atlas cloud service that NoSQL database vendor MongoDB launched in 2016 as a motivating factor for other vendors, including his company. “You can see how big of a revenue generator that has been,” Blaschka said. Services like Atlas “allow more people to get access [to databases] more quickly,” he noted.

Blaschka said more than 50% of TigerGraph’s customers already run its namesake graph database in the cloud, using a conventional version that they have to deploy and manage themselves. But with the company’s new TigerGraph Cloud service, users “don’t have to worry about knowing what a graph is or downloading it,” he said. “They can just build a prototype database and get started.”

TigerGraph Cloud is initially available in the AWS cloud; support will also be added for Microsoft Azure and then Google Cloud Platform (GCP) in the future, Blaschka said.

Yellowbrick Data made its Yellowbrick Cloud Data Warehouse service generally available on all three of the cloud platforms, giving users a DBaaS alternative to the on-premises data warehouse appliance it released in 2017. Later this year, Yellowbrick also plans to offer a companion disaster recovery service that provides cloud-based replicas of on-premises or cloud data warehouses.

More cloud database services on the way

MemSQL, one of the vendors in the NewSQL database category, detailed plans for a managed cloud service called Helios, which is currently available in a private preview release on AWS and GCP. Azure support will be added next year, said Peter Guagenti, MemSQL’s chief marketing officer.

About 60% of MemSQL’s customers run its database in the cloud on their own now, Guagenti said. But he added that the company, which primarily focuses on operational data, was waiting for the Kubernetes StatefulSets API object for managing stateful applications in containers to become available in a mature implementation before launching the Helios service.

Actian, which introduced a cloud service version of its data warehouse platform on AWS last March, said it will make the Avalanche service available on Azure this fall and on GCP at a later date.


DataStax, which offers a commercial version of the Cassandra open source NoSQL database, said it’s looking to make a cloud-native platform called Constellation and a managed version of Cassandra that runs on top of it generally available in November. The new technologies, which DataStax announced in May, will initially run on GCP, with support to follow on AWS and Azure.

Also, in-memory data grid vendor Hazelcast plans in December to launch a version of its Hazelcast Cloud service for production applications. The Hazelcast Cloud Dedicated edition will be deployed in a customer’s virtual private cloud instance, but Hazelcast will configure and maintain systems for users. The company released free and paid versions of the cloud service for test and development uses in March on AWS, and it also plans to add support for Azure and GCP in the future.

Managing managed database services vendors

Bayer AG’s Bayer Crop Science division, which includes the operations of Monsanto following Bayer’s 2018 acquisition of the agricultural company, uses managed database services on Teradata data warehouses and Oracle’s Exadata appliance. Naghman Waheed, data platforms lead at Bayer Crop Science, said the biggest benefit of both on-premises and cloud database services is offloading routine administrative tasks to a vendor.

“You don’t have to do work that has very little value,” Waheed said after speaking about a metadata management initiative at Bayer in a Strata session. “Why would you want to have high-value [employees] doing that work? I’d rather focus on having them solve creative problems.”

But he said there were some startup issues with the managed services, such as standard operating procedures not being followed properly. His team had to work with Teradata and Oracle to address those issues, and one of his employees continues to keep an eye on the vendors to make sure they live up to their contracts.

“We ultimately are the caretaker of the system,” Waheed said. “We do provide guidance — that’s still kind of our job. We may not do the actual work, but we guide them on it.”


Amazon Quantum Ledger Database brings immutable transactions

The Amazon Web Services Quantum Ledger Database is now generally available.

The database provides a cryptographically secured ledger as a managed service. It can be used to store both structured and unstructured data, providing what Amazon refers to as an immutable transaction log.

The new database service was released on Sept. 10, 10 months after AWS introduced it as a preview technology.

The ability to provide a cryptographically and independently verifiable audit trail of immutable data has multiple benefits and use cases, said Gartner vice president and distinguished analyst Avivah Litan.

“This is useful for establishing a system of record and for satisfying various types of compliance requirements, such as regulatory compliance,” Litan said. “Gartner estimates that QLDB and other competitive offerings that will eventually emerge will gain at least 20% of permissioned blockchain market share over the next three years.”

A permissioned blockchain has a central authority in the system to help provide overall governance and control. Litan sees the Quantum Ledger Database as satisfying several key requirements in multi-company projects; such ledgers are typically complementary to existing database systems.

Among the requirements is that once data is written to the ledger, the data is immutable and cannot be deleted or updated. Another key requirement that QLDB satisfies is that it provides a cryptographically and independently verifiable audit trail.

“These features are not readily available using traditional legacy technologies and are core components to user interest in adopting blockchain and distributed ledger technology,” Litan said. “In sum, QLDB is optimal for use cases when there is a trusted authority recognized by all participants and centralization is not an issue.”

Diagram of how AWS Quantum Ledger Database works

Centralized ledger vs. decentralized blockchain

The basic promise of many blockchain-based systems is that they are decentralized, and each party stores a copy of the ledger. For a transaction to get stored in a decentralized and distributed ledger, multiple parties have to come to a consensus. In this way, blockchains achieve trust in a distributed and decentralized way.

“Customers who need a decentralized application can use Amazon Managed Blockchain today,” said Rahul Pathak, general manager of databases, analytics and blockchain at AWS. “However, there are customers who primarily need the immutable and verifiable components of a blockchain to ensure the integrity of their data is maintained.”


For customers who want to maintain control and act as the central trusted entity, just like any database application works today, a decentralized system with multiple entities is not the right fit for their needs, Pathak said.

“Amazon [Quantum Ledger Database] combines the data integrity capabilities of blockchain with the ease and simplicity of a centrally owned datastore, allowing a single entity to act as the central trusted authority,” Pathak said.

While QLDB includes the term “quantum” in its name, it’s not a reference to quantum computing.

“By quantum, we imply indivisible, discrete changes,” Pathak said. “In QLDB, all the transactions are recorded in blocks to a transparent journal where each block represents a discrete state change.”

How the Amazon Quantum Ledger Database works

The immutable nature of QLDB is a core element of the database’s design. Pathak explained that QLDB uses a cryptographic hash function to generate a secure output file of the data’s change history, known as a digest. The digest acts as a proof of the data’s change history, enabling customers to look back and validate the integrity of their data changes.
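The digest mechanism Pathak describes can be illustrated with a minimal hash chain in Python. This is a conceptual sketch, not QLDB's actual journal format: each block's hash covers the previous block's hash, so tampering with any historical change breaks verification of every block after it.

```python
import hashlib
import json

def block_hash(prev_hash, change):
    """Hash a change together with the previous block's hash."""
    payload = prev_hash + json.dumps(change, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest().encode()

def build_journal(changes):
    """Append each change to a chain of hash-linked blocks."""
    prev, journal = b"genesis", []
    for change in changes:
        prev = block_hash(prev, change)
        journal.append((change, prev))
    return journal

def verify(journal):
    """Recompute the chain; any altered entry breaks every later digest."""
    prev = b"genesis"
    for change, digest in journal:
        prev = block_hash(prev, change)
        if prev != digest:
            return False
    return True

journal = build_journal([{"op": "insert", "id": 1}, {"op": "update", "id": 1}])
print(verify(journal))   # True
journal[0] = ({"op": "insert", "id": 999}, journal[0][1])  # tamper with history
print(verify(journal))   # False
```

The final digest over such a chain acts as the proof of the data's change history: anyone holding it can re-derive the chain and confirm nothing was silently rewritten.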

From a usage perspective, QLDB supports PartiQL, an open standard query language that provides SQL-compatible access to data. Pathak said that customers can build applications with the Amazon QLDB Driver for Java to write code that accesses and manipulates the ledger database.

“This is a Java driver that allows you to create sessions, execute PartiQL commands within the scope of a transaction, and retrieve results,” he said. 

Developed internally at AWS

The Quantum Ledger Database is based on technology that AWS has been using for years, according to Pathak. AWS has been using an internal version of Amazon QLDB to store configuration data for some of its most critical systems, and has benefitted from being able to view an immutable history of changes, he said.

“Over time, our customers have asked us for the same ledger capability, and a way to verify that the integrity of their data is intact,” he said. “So, we built Amazon QLDB to be immutable and cryptographically verifiable.”


Startup Dgraph Labs growing graph database technology

Dgraph Labs Inc. is set to grow its graph database technology with the help of a cash infusion of venture financing.

The company was founded in 2015 as an effort to advance the state of graph database technology. Dgraph Labs’ founder and CEO Manish Jain previously worked at Google, where he led a team that was building out graph database systems. Jain decided there was a need for a high-performance graph database technology that could address different enterprise use cases.

Dgraph said July 31 it had completed an $11.5 million Series A funding round.

The Dgraph technology is used by a number of different organizations and projects. Among them is Intuit, which uses Dgraph as the back-end graph database for its open source project K-Atlas.

“We were looking for a graph database with high performance in querying large-scale data sets, fully distributed, highly available and as cloud-native as possible,” said Dawei Ding, engineering manager at Intuit.

Ding added that Dgraph’s graph database technology stood out from both an architectural design and a performance benchmarking perspective. Moreover, he noted that being fully open source made Dgraph an even more attractive choice for Intuit’s open source software project.

The graph database landscape

Multiple technologies compete in the graph database landscape, including Neo4j, Amazon Neptune and DataStax Enterprise Graph, among others. In Jain’s view, many graph database technologies are actually graph layers, rather than full graph databases.

“By graph layer, what I mean is that they don’t control storage; they just overlay a graph layer on top of some other database,” Jain said.

So, for example, he said a common database used by graph layer-type technologies is Apache Cassandra or, in Amazon’s case, Amazon Aurora.

Screenshot of graph database from Dgraph Labs showing information about all movies directed by Steven Spielberg
Graph database of all the movies directed by Steven Spielberg, their country of filming, genres, actors in those movies and the characters played by those actors.

“The problem with that approach is that to do the graph traversal or to do a graph join, you need to first bring the data to the layer before you can interact with it and do interesting things on it,” Jain commented. “So, there’s multiple back and forth steps and, therefore, the performance likely will decrease.”

In contrast, Dgraph was founded on the principle that a graph database could scale horizontally while also improving performance, because the database itself controls how data is stored on disk.
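A traversal like the one in the Spielberg screenshot above can be phrased as a single query in Dgraph’s GraphQL-like query language. The predicates here (`director.film`, `genre`, `starring` and so on) loosely follow Dgraph’s public film-dataset tutorial and are assumptions for illustration, not details taken from the article:

```
{
  spielberg(func: eq(name, "Steven Spielberg")) {
    director.film {
      name
      genre { name }
      starring {
        performance.actor { name }
        performance.character { name }
      }
    }
  }
}
```

Because storage and traversal live in the same engine, the entire join runs inside the database rather than shuttling intermediate results up to a separate graph layer.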

Open source and the enterprise

Dgraph is an open source project and hit its 1.0 milestone in December 2017. The project has garnered more than 10,000 stars on GitHub, which Jain pointed to as a measure of its popularity.

Going a step further is the company’s Dgraph Enterprise platform, which provides more capabilities that an organization might need for support, access control and management. Jain said Dgraph Labs is using the open core model, in which the open source application is free to use, but then if an organization wants certain features, it must pay for them.

Jain stressed that the core open source project is functional on its own — so much so that an organization could choose to run a 20-node Dgraph cluster with replication and consistency for free.

Why graph databases matter

We were looking for a graph database with high performance in querying large-scale data sets, fully distributed, highly available and as cloud-native as possible.
Dawei Ding, engineering manager, Intuit

A problem with typical relational databases is that with every data model comes a new table, or new schema, and, over time, that can become a scaling challenge, Jain said. He added that in the relational database approach, large data sets tend to become siloed over time as well. With a graph database, it is possible to unify disparate data sources.

As an example of how a graph database technology approach can help to eliminate isolated data sources, Jain said one of Dgraph’s largest enterprise customers took 60 different vertical data silos stored in traditional databases and put all of them into a Dgraph database.

“Now, they’re able to run queries across all of these data sets, to be able to power not only their apps, but also power real-time analytics,” Jain said.

What’s next for Dgraph Labs

With the new funding, Jain said the plan is to open new offices for the company as well as expand the graph database technology.

One key area of future expansion is building a Dgraph cloud managed service. Another area that will be worked on is third-party integration, with different technologies such as Apache Spark for data analysis.

“Now that we have a bunch of big companies using Dgraph, they need some additional features, like, for example, encryption, and so we are putting a good chunk of our time into building out capabilities,” he said.

Third-party Kubernetes tools hone performance for database containerization

Enterprise IT shops that want to modernize legacy applications or embark on a database containerization project are the target users for Kubernetes tools released this month.

Robin Systems, which previously offered its own container orchestration utility, embraced Kubernetes with hyper-converged infrastructure software it claimed can optimize quality of service for database containerization, AI and machine learning applications, as well as big data applications, such as Spark and Hadoop. Turbonomic also furthered its Kubernetes optimization features with support for multi-cloud container management. Turbonomic’s Self-Managing Kubernetes tool could also help with database containerization, because it takes performance and cost optimization into account.

The products join other third-party Kubernetes management tools that must add value over features offered natively in pure upstream Kubernetes implementations — a difficult task as the container orchestration engine matures. However, the focus on database containerization and its performance challenges aligns with the enterprise market’s momentum, analysts said.

“For many vendors, performance is an afterthought, and the monitoring and management side is an afterthought,” said Milind Govekar, analyst with Gartner. “History keeps repeating itself along those lines, but now we can make mistakes faster because of automation and worse mistakes with containers, because they’re easier to spin up.”

While early adopters such as T-Mobile already use DC/OS for database containerization, most enterprises aren’t yet ready for stateful applications in containers.

“Stateless apps are still the low-hanging fruit,” said Jay Lyman, analyst with 451 Research. “It will be a slow transition for organizations pushing [containerization] into data-rich applications.”

Robin Systems claims superior database containerization approach

Now, we can make mistakes faster because of automation and worse mistakes with containers, because they’re easier to spin up.
Milind Govekar, analyst, Gartner

Robin Systems faces more of an uphill battle against both pure Kubernetes and established third-party tools with its focus on big data apps and database containerization. Mesosphere has already targeted this niche for years with DC/OS. And enterprises can also look to Red Hat OpenShift for database containerization, given the platform’s maturity and users’ familiarity with Red Hat’s products.

Robin Systems’ founders claimed better quality-of-service guarantees for individual containers and workloads than OpenShift and DC/OS, because the company designed and controls all levels of its software-defined infrastructure package, which includes network and storage management, in addition to container orchestration. It guarantees minimum and maximum application performance throughout the infrastructure — including CPU, memory, and network and storage IOPS allocations — within one policy, whereas competitors integrate with tools such as the open source Container Network Interface plug-in, OpenShift Container Storage and Portworx persistent storage.

Control over the design of the storage layer enables Robin’s platform to take cluster-wide snapshots of Kubernetes deployments and their associated applications, which isn’t possible natively on OpenShift or DC/OS yet.

Plenty of vendors claim a superior approach with their Kubernetes tools, and many major enterprise IT shops have already chosen a strategic Kubernetes vendor for production application development and deployment.

However, companies such as John Hancock also must modernize a massive portfolio of legacy applications, including IBM DB2 and Microsoft SQL Server databases in versions so old they’re no longer supported by the original manufacturers.

John Hancock, a Boston-based insurer and a division of financial services group Manulife Financial Corp., is conducting a proof of concept with Robin Systems as it mulls database containerization for IBM DB2. The company wants to move the mainframe-based system into the Microsoft Azure cloud for development and testing, an approach it sees as simpler and more affordable than its current setup, in which a separate department manages internally developed production apps with Pivotal’s PaaS offering.

“It’s not going to fly if [a database containerization platform] will take four people eight months to get working,” said Kurt Straube, systems director for the insurance company. Robin’s hyper-converged infrastructure approach, which bundles networking and storage with container and compute management, might be a shortcut to database containerization for legacy apps where ease of use and low cost are paramount.

Turbonomic's Kubernetes tool
Turbonomic’s Kubernetes tool manages MongoDB performance.

Turbonomic targets container placement, not workloads

While Robin Systems’ platform approach puts it squarely into competition with PaaS platforms such as Red Hat OpenShift, Pivotal Container Service (PKS) and Mesosphere’s DC/OS, Turbonomic’s product spans Kubernetes services such as Amazon Elastic Container Service for Kubernetes, Azure Kubernetes Service, Google Kubernetes Engine and PKS.

Turbonomic’s Kubernetes tool optimizes container placement across different services, but doesn’t manage individual container workloads. It fills a potential need in the market, and it keeps Turbonomic out of direct competition with established Kubernetes tools.

“There are many PaaS vendors that can manage Kubernetes clusters, but what they can’t do is tell a user how to optimize the number of containers on a cluster so that the right resources are available to each container,” Gartner’s Govekar said.

A number of tools manage VM placement between multiple cloud infrastructure services, such as the Open Service Broker API. However, “many of these tools don’t do a great job from a performance optimization standpoint specifically,” Govekar said.

Oracle Autonomous Database Cloud gets transaction processing

Oracle is now offering transaction processing capabilities as part of its Autonomous Database Cloud software platform, which is designed to automate database administration tasks for Oracle users in the cloud.

The vendor launched a new Oracle Autonomous Transaction Processing (ATP) cloud service, expanding on the data warehouse service that debuted in March as the first Autonomous Database Cloud offering. The addition of Oracle ATP enables the automated system to handle both transaction and analytical processing workloads, Oracle executive chairman and CTO Larry Ellison said during a launch event that was streamed live.

Ellison reiterated Autonomous Database Cloud’s primary selling point: that automated administration functions driven partly by machine learning algorithms eliminate the need for hands-on configuration, tuning and patching work by database administrators (DBAs).

That frees up DBAs to focus on more productive data management tasks and could lead to lower labor costs for customers, he claimed.

“There’s nothing to learn, and there’s nothing to do,” said Ellison, who also repeated previous jabs at cloud platforms market leader Amazon Web Services (AWS) and previewed the upcoming 19c release of the flagship Oracle Database software that underlies Autonomous Database Cloud.

Cloud success still a test for Oracle

However, while Ellison taunted Amazon for its longtime reliance on Oracle databases and expressed skepticism about his competitor’s ability to execute a reported plan to completely move off of them by 2020, Oracle lags behind not only AWS but also Microsoft and Google in the ranks of cloud platform vendors.

Make no mistake, Oracle still has to prove themselves in the cloud.
Adam Ronthal, analyst, Gartner

“Make no mistake, Oracle still has to prove themselves in the cloud,” Gartner database analyst Adam Ronthal said in an email after the announcement.

And Oracle isn’t starting from a position of strength. Overall, the technology lineup that Oracle currently offers on its namesake cloud doesn’t match the breadth of what users can get on AWS, Microsoft Azure and the Google Cloud Platform, Ronthal said.

But Oracle ATP “helps close that gap, at least in the data management space,” he said.

Together, ATP and the Autonomous Data Warehouse (ADW) service that preceded it “are Oracle coming out to the world with products that are built and architected for cloud,” with promises of scalability, elasticity and a low operational footprint for users, Ronthal said.

Oracle's Larry Ellison speaking at the launch of Oracle Autonomous Transaction Processing
Larry Ellison, Oracle’s executive chairman and CTO, introduces the Autonomous Transaction Processing cloud database service.

The Autonomous Database Cloud services are only available on the Oracle Cloud, and Oracle also limits other key data management technologies to its own cloud platform; for example, it doesn’t offer technical support for its Oracle Real Application Clusters software on other clouds.

In addition, Ronthal noted that it’s typically more expensive to run regular Oracle databases on AWS and Azure than on Oracle’s cloud because of software licensing changes Oracle made last year.

“Oracle is doing everything it can to make its cloud the most attractive place to run Oracle databases,” Ronthal said.

But now the company needs to build some momentum by convincing customers to adopt Oracle ATP and ADW, he added — even if that’s likely to primarily involve existing Oracle users migrating to the cloud services, as opposed to new customers.

Oracle’s autonomous services get a look

Clothing retailer Gap Inc. is a case in point, although the San Francisco company’s use of Oracle databases could grow as part of a plan to move more of its data processing operations to the Oracle Cloud.

For example, Gap is working with Oracle on a proof-of-concept project to convert an on-premises Teradata data warehouse to Oracle ADW, said F.S. Nooruddin, the retailer’s chief IT architect.

That’s a first step in the potential consolidation of various data warehouses into the ADW service, he said. Gap also plans to look closely at Oracle ATP for possible transaction processing uses, according to Nooruddin, who took part in a customer panel discussion during the ATP launch event.

Gap already runs Oracle’s retail applications and Hyperion enterprise performance management software in the cloud.

As the retailer’s use of the cloud expands, the Autonomous Database Cloud technologies could help ensure that all of its Oracle database instances, from test and development environments to production systems, are properly patched and secured, Nooruddin said.

Ellison said Oracle ATP also automatically scales the transaction processing infrastructure allotted to users up and down as workloads fluctuate, so they can meet spikes in demand without paying for compute, network and storage resources they don’t need.

That capability appeals to Gap, too, said Connie Santilli, the company’s vice president of enterprise systems and strategy. Gap’s transaction processing and downstream reporting workloads increase sharply during the holiday shopping season — a common occurrence in the retail industry. But Santilli said Gap had to build its on-premises IT architecture to handle the peak performance level, with less flexibility for downsizing systems when the full processing resources aren’t required.

Cloud costs and considerations for Oracle users

In taking aim at AWS, Ellison again said Oracle would guarantee a 50% reduction in infrastructure costs to Amazon users that migrate to Autonomous Database Cloud — a vow he first made at the Oracle OpenWorld 2017 conference.

Meanwhile, Ellison said Oracle customers can use existing on-premises database licenses to make the switch to Oracle ATP and ADW, avoiding the need to pay for the software again. In such cases, users would continue to pay their current annual support fees plus the cost of their cloud infrastructure usage.

The ATP and ADW services layer the automation capabilities Oracle developed on top of Oracle Database 18c, which Oracle released in February as part of a new plan to update the database software annually. During the ATP launch, Ellison disclosed some details about the planned 19c release and the capabilities it will add to Autonomous Database Cloud.

When databases are upgraded to the 19c-based cloud services, the software will automatically check built-in query execution plans and retain the existing ones if they’ll run faster than new ones, Ellison said. That eliminates the need for DBAs to do regression testing on the plans themselves, he added.
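As a rough sketch of that retention logic (illustrative only, not Oracle’s implementation), the decision reduces to comparing estimated plan costs and keeping the incumbent plan unless the new one is strictly faster:

```python
# Illustrative sketch only (not Oracle code). After an upgrade, keep a
# query's existing execution plan unless the newly generated plan is
# estimated to run faster, sparing DBAs from regression-testing plans.
def choose_plan(existing_cost: float, new_cost: float) -> str:
    """Return which plan to keep; lower cost means faster."""
    return "new" if new_cost < existing_cost else "existing"

assert choose_plan(existing_cost=1.0, new_cost=3.5) == "existing"  # regression avoided
assert choose_plan(existing_cost=2.0, new_cost=0.8) == "new"       # improvement adopted
```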

Other new features coming with Oracle Database 19c include the ability to configure Oracle ATP and ADW on dedicated Exadata systems in the Oracle Cloud instead of sharing a multitenant pool of the machines, and to deploy the cloud services in on-premises data centers through Oracle’s Cloud at Customer program.

Oracle’s official roadmap shows 19c becoming available in January 2019, but Ellison claimed that was “worst case” and said the new release may be out before the end of this year.

MongoDB 4.0, Stitch aim to broaden use of NoSQL database

MongoDB Inc. is releasing several technologies designed to make its namesake NoSQL database a viable option for more enterprise applications, led by a MongoDB 4.0 update with expanded support for the ACID transactions that are a hallmark of mainstream relational databases.

Beyond MongoDB 4.0, the company, at its MongoDB World user conference in New York, also launched a serverless platform called Stitch that’s meant to streamline application development, initially for use with the MongoDB Atlas hosted database service in the cloud.

In addition, MongoDB made a mobile version of the database available for beta testing and enabled Atlas users to distribute data to different geographic areas globally for faster performance and regulatory compliance.

While MongoDB is one of the most widely used NoSQL technologies, the open source document database still has a tiny presence compared to relational behemoths like Oracle Database and Microsoft SQL Server. MongoDB, which went public in October 2017, reported total revenue of just $154.5 million for its fiscal year that ended in January — amounting to a small piece of the overall database market.

But MongoDB 4.0’s support for ACID transactions across multiple JSON documents could make it a stronger alternative to relational databases, according to Stephen O’Grady, an analyst at technology research and consulting firm RedMonk in Portland, Maine.

The ACID properties — atomicity, consistency, isolation and durability — ensure that database transactions are processed accurately and reliably. Previously, MongoDB only offered a form of such guarantees at the individual document level. MongoDB 4.0, which has been in beta testing since February, supports multi-document ACID transactions — a must-have requirement for many enterprise users with transactional workloads to run, O’Grady said.
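To make the distinction concrete, here is a plain-Python illustration of what a multi-document atomicity guarantee means. This sketch is not MongoDB’s API; it only models the all-or-nothing semantics that, before 4.0, MongoDB offered solely within a single document:

```python
import copy

# Conceptual illustration (NOT MongoDB code): a transaction touching two
# "documents" either commits both changes or neither.
def transfer(store: dict, debit_id: str, credit_id: str, amount: int) -> bool:
    """Move `amount` between two documents, all-or-nothing."""
    snapshot = copy.deepcopy(store)  # stand-in for the rollback state a database keeps
    try:
        store[debit_id]["balance"] -= amount
        if store[debit_id]["balance"] < 0:
            raise ValueError("insufficient funds")
        store[credit_id]["balance"] += amount
        return True   # commit: both documents updated
    except (KeyError, ValueError):
        store.clear()
        store.update(snapshot)  # roll back: neither document changed
        return False

accounts = {"a": {"balance": 100}, "b": {"balance": 0}}
assert transfer(accounts, "a", "b", 40) is True    # both documents change
assert transfer(accounts, "a", "b", 500) is False  # rolled back, no partial write
assert accounts == {"a": {"balance": 60}, "b": {"balance": 40}}
```

Without the multi-document guarantee, a failure between the two writes could leave one document updated and the other not, which is exactly the inconsistency transactional applications cannot tolerate.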

“Particularly in financial shops, if you can’t give me an ACID guarantee, that’s just a non-starter,” he said.

O’Grady said he doesn’t expect companies to replace the back-end relational databases that run their ERP systems with MongoDB, but he added that the document database is now a more feasible option for users who are looking to take advantage of the increased data flexibility and lower costs offered by NoSQL software in other types of transactional applications.

New technologies from MongoDB
MongoDB’s new product offerings, at a glance.

Moving from Oracle to MongoDB

That’s the case at Acxiom Corp., which collects and analyzes customer data to help companies target their online marketing efforts to web users.

Acxiom already converted two Oracle-based systems to MongoDB: a metadata repository three years ago, and a real-time operational data store (ODS) that was switched over in January. And the Conway, Ark., company wants to move more data processing work to MongoDB in the future, said Chris Lanaux, vice president of its product and engineering group.

Oracle and other relational databases are much more expensive to run and aren’t as cloud-friendly as MongoDB is, Lanaux said.

When you’re moving 90 miles per hour, it’s helpful to have guaranteed consistency. Now we don’t have to worry about that anymore.
John Riewerts, senior director of engineering, Acxiom Corp.

John Riewerts, senior director of engineering on Lanaux’s team, added that Amazon Web Services and other cloud platform providers each offer their own flavors of relational databases. With MongoDB, “it’s just a flip of a switch for us to decide which cloud platform to put it on,” he said.

The ACID transactions support in MongoDB 4.0 is a big step forward for the NoSQL database, Riewerts said. Acxiom writes transactions to multiple documents in both the metadata system and the ODS; currently, it does workarounds to make sure that all of the data gets updated properly, but that isn’t optimal, according to Riewerts.

“When you’re moving 90 miles per hour, it’s helpful to have guaranteed consistency,” he said. “Now we don’t have to worry about that anymore.”

Acxiom also was an early user of the MongoDB Stitch backend-as-a-service platform, which was released for beta testing a year ago. Stitch gives developers an API that connects to MongoDB at the back end, plus built-in capabilities for creating JavaScript functions, integrating with other cloud services and setting triggers to automatically invoke real-time actions when data is updated.

Scott Jones, a principal architect at Acxiom, said the serverless technology enabled two developers in the product and engineering group to deploy the ODS on the MongoDB Atlas cloud service without having to wait for the company’s IT department to set up the system.

“We’re not dealing with anything really but the business logic of what we’re trying to build,” he noted.

More still needed from MongoDB

Lanaux said MongoDB still has to deliver some additional functionality before Acxiom can move other applications to the NoSQL database. For example, improvements to a connector that links MongoDB to SQL-based BI and analytics tools could pave the way for some data analytics jobs to be shifted.

“But we’re betting on [MongoDB],” he said. “Thus far, they’ve checked every box that they’ve promised us.”

Ovum analyst Tony Baer said MongoDB also needs to stay focused on competing against its primary document database rivals, including DataStax Enterprise and Amazon DynamoDB, as well as Microsoft’s Azure Cosmos DB multimodel database.

Particularly in the cloud, DynamoDB and Azure Cosmos DB “are going to challenge them,” Baer said, noting that Amazon and Microsoft can bill their products as the default NoSQL offerings for their cloud platforms. Stitch may help counter that, though, by keeping MongoDB “true to its roots as a developer-friendly database,” he added.

MongoDB 4.0 lists for $14,990 per server. MongoDB Stitch users will be charged 50 cents for each GB of data transferred between Stitch and their front-end applications, as well as back-end services other than Atlas. They’ll also pay for using compute resources at a rate of $0.000025 per GB-second, which is calculated by multiplying the execution time of each processing request by the amount of memory that’s consumed.
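That pricing is easy to sanity-check. A minimal sketch, assuming only the rates quoted above ($0.50 per GB transferred and $0.000025 per GB-second, i.e., execution time multiplied by memory consumed); the workload numbers in the example are hypothetical:

```python
def stitch_cost(gb_transferred: float, exec_seconds: float, memory_gb: float) -> float:
    """Estimate a Stitch bill from the rates quoted above (illustrative only)."""
    transfer_cost = 0.50 * gb_transferred               # $0.50 per GB transferred
    compute_cost = 0.000025 * exec_seconds * memory_gb  # GB-seconds = time x memory
    return transfer_cost + compute_cost

# Hypothetical month: 10 GB transferred, plus 1 million requests
# averaging 0.1 s of execution at 0.25 GB of memory each.
total = stitch_cost(10, 1_000_000 * 0.1, 0.25)
```

Under those assumptions the compute charge ($0.625) is small next to the transfer charge ($5.00), which suggests data movement, not execution time, would dominate a typical Stitch bill.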