Google has released a beta version of BigQuery ML, new software that lets users build some machine learning models inside the Google BigQuery cloud data warehouse with standard SQL commands.
BigQuery ML eliminates the need to move data sets from Google BigQuery to a separate tool to develop and train analytical models. Its SQL support also opens the machine learning process to SQL-savvy data analysts who might not be versed in more-advanced languages like R, Python and Scala that data scientists typically use to build machine learning models.
However, the new technology is limited in what it can do. Google said BigQuery ML initially supports only two types of models: linear regression models that predict numerical values, such as sales forecasts, and binary logistic regression models that can be used to do two-group customer segmentation, identify email as spam and do other relatively simple classifications in data sets.
In addition, BigQuery ML is based on the standard batch variant of gradient descent, the optimization method used to train many machine learning models, instead of the so-called stochastic version.
The stochastic approach “is far more common in today’s large-scale machine learning systems,” Google acknowledged in a blog post about BigQuery ML. The company added, though, that the batch variant “has numerous practical advantages” for the performance, stability and tuning of machine learning models.
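To make the distinction concrete, the following is a minimal Python sketch of the two variants applied to a one-variable linear regression. It is an illustration only, not Google's implementation: the function names, learning rate, epoch count and toy data are all assumptions chosen for the example. The batch version computes the gradient over the entire data set before each update, while the stochastic version updates the model after every individual example.

```python
import random


def batch_gradient_descent(xs, ys, lr=0.02, epochs=2000):
    """Fit y = w*x + b. Each update uses the gradient averaged over ALL rows."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Mean squared error gradient computed over the full data set.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def stochastic_gradient_descent(xs, ys, lr=0.02, epochs=2000, seed=0):
    """Fit y = w*x + b. Each update uses the gradient of ONE row at a time."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)  # visit the rows in a new random order each epoch
        for x, y in data:
            err = w * x + b - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
    return w, b


# Hypothetical, noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

batch_w, batch_b = batch_gradient_descent(xs, ys)
sgd_w, sgd_b = stochastic_gradient_descent(xs, ys)
print("batch:     ", batch_w, batch_b)
print("stochastic:", sgd_w, sgd_b)
```

On this tiny, noise-free data set both variants should recover a slope near 2 and an intercept near 1. The practical trade-offs Google refers to only show up at much larger scale and with noisy data, which this sketch does not attempt to reproduce.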
Broadening the machine learning user base
BigQuery ML likely won’t convince many data scientists who analyze data stored in BigQuery to change how they build models, said Daniel Mintz, chief data evangelist at software vendor Looker Data Sciences Inc., which has teamed up with Google to enable its data modeling and analytics platform to function as a front-end tool for BigQuery ML users.
“Professional data scientists, the people who do this all the time, are going to continue to use the tools they’re most comfortable with,” Mintz said.
But, Mintz added, BigQuery ML makes it feasible for the hordes of data analysts “who know SQL but haven’t done much with machine learning yet” to start developing models without having to learn new languages or deploy additional analytics tools.
And, in some cases, busy data scientists may be able to use BigQuery ML to speed up the model-building process and better support business needs for information.
For example, film studio 20th Century Fox is an early user of the technology. In a keynote session at the Google Cloud Next ’18 conference in San Francisco that was streamed online, Miguel Angel Campo-Rembado, the studio’s senior vice president of data science and analytics, said its marketing team needs analytics input on a continual basis to assess advertising and promotional campaigns for movies.
“But we have a lean team of data scientists, and it can get a bit challenging to support all of the campaigns in all possible cases,” Campo-Rembado said.
Less of a machine learning maze to run
With BigQuery ML, Campo-Rembado added, his team was able to build a linear regression model in just 30 seconds. The model analyzed movie trailers to help pinpoint the audiences that should be targeted in promoting the latest Maze Runner movie, which was released in January. All it took was adding a CREATE MODEL statement in BigQuery ML to an existing SQL query for audience analysis, he said.
That, and the ability to keep the entire process inside Google BigQuery, enabled the analytics team to quickly run the model and deliver the results to the Los Angeles-based studio’s marketers “within minutes,” according to Campo-Rembado.
At its core, BigQuery ML is a set of SQL extensions designed to support machine learning and predictive analytics. Google, which announced the technology at Google Cloud Next, said users can build machine learning models in BigQuery ML with simple SQL statements like this:
CREATE MODEL dataset.model_name
AS SELECT * FROM input_table;
More work to do on BigQuery ML
Google didn’t say how long the beta-testing cycle will last or when it expects to make BigQuery ML generally available.
In its blog post, the company said that it plans to do more to boost the technology’s performance and that it will explore adding support for other types of machine learning algorithms to broaden BigQuery ML’s potential uses.
Looker, based in Santa Cruz, Calif., said its integration with BigQuery ML lets analytics teams use its namesake platform to prepare data for analysis, build and run their analytical models in a Google BigQuery data warehouse, and then disseminate the resulting information to business executives and workers.
“From a user’s perspective, it’s all a Looker front end,” Mintz said. “BigQuery ML is running under the hood, but it looks like one tool to users.” He added that BigQuery ML is the first tool Looker has seen that directly integrates machine learning capabilities into a data warehouse’s SQL interface.