Predibase Takes Declarative Approach to AutoML The New Stack – thenewstack.io

It's no secret that creating and deploying machine learning models takes too long. In Algorithmia's "2021 Enterprise Trends in Machine Learning" report, 25% of respondents said creating a model took one week to one month, while 24% put that time at one month to one quarter. And 37% said it took one quarter to one year to deploy a model.

At Uber, an intent classification system that involved 1,500 lines of TensorFlow took five months to create and seven months to deploy.

A second machine learning project, fraud detection, with 900 lines of PyTorch, took five months to create and four months to deploy.

A product recommendation tool with 1,200 lines of PyTorch took six months to create and seven months to deploy.

San Francisco-based startup Predibase is out to change that by providing a low-code declarative ML platform that both data scientists and non-experts can use, easing the pressure on organizations to hire more scarce and expensive data scientists. Users state what they want to do, starting with as few as six lines of Python code, and let the system figure out how to do it and what infrastructure is required.

It's built atop two machine learning technologies created by the Predibase founders at Uber: Ludwig and Horovod. Ludwig is an open source, declarative machine learning framework that provides the simplicity of an AutoML solution with the flexibility of writing your own PyTorch code. Horovod, an open source component of Uber's Michelangelo deep learning toolkit, makes it easier to start and speed up distributed deep learning projects with TensorFlow.

"The experience is that data science organizations have to basically reinvent the wheel and create a bespoke solution for every single one of these products, and there's not much in common among them. Because of that, the whole organization becomes a bottleneck for machine learning adoption," said Predibase CEO Piero Molino. "The result is that it just takes too long for machine learning models to bring value to an organization."

In contrast, he compares a declarative configuration system to what Kubernetes has done for infrastructure.

"Our vision is to make machine learning as easy as writing a SQL query," Molino said.

The basic idea is to let users specify entire model pipelines as configurations: they spell out the parts they care about, and the system automates the rest.
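To make the idea concrete, here is a minimal Python sketch of a declarative configuration, loosely following the input_features/output_features layout that Ludwig configurations use. The column names, the DEFAULTS table and the resolve_config helper are hypothetical and invented for illustration; this is not Predibase's or Ludwig's actual code.

```python
import copy

# Hypothetical defaults a platform might fill in when the user stays silent.
DEFAULTS = {
    "trainer": {"epochs": 10, "learning_rate": 0.001},
    "preprocessing": {"split": [0.7, 0.1, 0.2]},
}


def resolve_config(user_config, defaults=DEFAULTS):
    """Return a full config: user values win, defaults fill the gaps."""
    resolved = copy.deepcopy(defaults)
    for key, value in user_config.items():
        if isinstance(value, dict) and isinstance(resolved.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            resolved[key] = resolve_config(value, resolved[key])
        else:
            resolved[key] = copy.deepcopy(value)
    return resolved


# The user declares only what they care about: inputs, outputs, one override.
user_config = {
    "input_features": [{"name": "utterance", "type": "text"}],
    "output_features": [{"name": "intent", "type": "category"}],
    "trainer": {"learning_rate": 0.0005},
}

full_config = resolve_config(user_config)
print(full_config["trainer"])
```

In this sketch the user never mentions epochs or data splits, yet the resolved configuration contains them, which is the "automate the rest" half of the idea.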

"Traditional machine learning projects involve a complicated ML life cycle that spans feature and data engineering; model development and training; and model production and governance. Cross-functional data science teams struggle to manage these phases in a coherent and sustainable way," said Kevin Petrie, vice president of research at Eckerson Group.

"Predibase represents a level of innovation to simplify the ML life cycle. Predibase proposes to let data science teams specify the desired inputs and outputs for their ML model. That is, they create configuration files that Predibase then figures out how to implement. Data science teams still can customize as many parameters, etc. as they like by making modular changes to meet new or changing customer requirements."

"In short, Predibase proposes to minimize the complexity of the ML life cycle, which is the biggest barrier to success with data science projects."

It's easy to get started. That Uber intent classification system could be created, for example, with six lines of code. "You get something that is readable and reproducible and shareable," he said.

"But one of the advantages is that you retain all the flexibility and control that an expert needs. So you can specify through the configuration all the details about the models: choosing among different model architectures, training parameters, the preprocessing of the data. It's all accessible through a parameter in the configuration, which makes it easy to iterate and improve models. Make changes with just a new configuration."

"It's also extensible. So if you're an expert developer, you can add your own keys to the configuration. You can extend this by adding your own piece of PyTorch, for instance, and then it can be referenced from the configuration."
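One common way to let a configuration reference user-written code is a name registry. The sketch below is an assumption about how such a mechanism could look, not a description of Predibase's or Ludwig's implementation: ENCODER_REGISTRY, register_encoder, build_encoder and BagOfWordsEncoder are all hypothetical names, and a plain Python class stands in for what would be a PyTorch module in practice.

```python
# Hypothetical registry mapping configuration keys to user-supplied components.
ENCODER_REGISTRY = {}


def register_encoder(name):
    """Decorator that makes a class addressable by name from a config."""
    def decorator(cls):
        ENCODER_REGISTRY[name] = cls
        return cls
    return decorator


@register_encoder("bag_of_words")
class BagOfWordsEncoder:
    """Stand-in for a custom component; a real one might be a PyTorch module."""

    def __init__(self, lowercase=True):
        self.lowercase = lowercase

    def encode(self, text):
        # Count whitespace-separated tokens, optionally lowercased first.
        tokens = text.lower().split() if self.lowercase else text.split()
        counts = {}
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
        return counts


def build_encoder(feature_config):
    """Look up the encoder named in the config and pass it the remaining keys."""
    params = dict(feature_config["encoder"])
    encoder_cls = ENCODER_REGISTRY[params.pop("type")]
    return encoder_cls(**params)


# The configuration refers to the custom code only by its registered name.
feature = {
    "name": "utterance",
    "type": "text",
    "encoder": {"type": "bag_of_words", "lowercase": True},
}
encoder = build_encoder(feature)
print(encoder.encode("Book a ride book a car"))
```

The design choice here is that the configuration stays pure data: swapping in a different custom component means changing one string, not the pipeline code.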

The company has deep expertise in machine learning.

Molino, the creator of Ludwig, previously was staff research scientist at Stanford University and co-founder and senior research scientist at Uber AI.

Fellow Predibase co-founders are:

Predibase enables users to easily connect to structured and unstructured data stored anywhere on the cloud data stack; write model pipeline configurations and run on a scalable distributed infrastructure to train models as easily as on a single machine; deploy model pipelines with the click of a button and query them immediately.

"Predibase is building the first declarative ML platform that enables enterprises to develop and operationalize models, from data to deployment, without having to choose between simplicity and the power of fine-grained controls. The rapid success of both the open source foundations and the beta of its commercial platform in the Fortune 500 has been incredibly exciting," Greylock Partner Saam Motamedi said at the recent announcement of a $16.25 million Series A round.

Still in private beta with Fortune 500 customers, Predibase is looking toward a general release in the second half of this year.

Customers have been using datasets of about 1 billion to 2 billion rows, about 100 to 200 columns, and several hundred gigabytes. Internal benchmarking has run up to 2 terabytes. Ludwig and Horovod, however, have been tested on even larger datasets than that, according to Rishi.

The company maintains it takes a different approach than other automated machine learning products.

"Thinking of something like DataRobot or Google Cloud AutoML, for example, [they] provide these interfaces where you kind of bring in data, click a button and you get models out," explained Molino. "We found that that's actually pretty unsatisfying for a lot of users and customers because they tend to be black boxes that don't have any configurability or control. So the minute that the platform doesn't give you a good out-of-the-box model, you're kind of stuck, and you end up graduating out."

Users can access the capabilities in Predibase purely through Python, through the UI or through PQL (Predictive Query Language), an extension of SQL.

The PQL extension includes predicates that allow you to bring machine learning and data together, Rishi explained. Its flexibility puts machine learning in the language of data users, so they can use filter, group by, aggregate, join or any other commands that they're familiar with in SQL. It's extensible: simply add new features as an additional predicate. Predibase makes it just as easy to use text, image and other types of fields as standard tabular fields.

"This is really simple. It brings machine learning into the hands of a broader set of users that are familiar with SQL, but at the same time, behind the scenes, the power and flexibility of the Ludwig configuration system provide state-of-the-art performance on both structured and unstructured data, and the combination of the two," Molino said.

"And finally, we also abstract away the infrastructure. Based on Horovod, they can train and deploy models at scale. And it's basically a big-tech-level infrastructure without the need to have a big-tech-level engineering team to build it, right? It's already built for you."

Models can be queried as REST APIs, through the Python SDK and through the PQL language. Though the entire process is encapsulated in the platform, the models also can be exported, should the user need to run them elsewhere.

A model repositories page summarizes each model as its configuration, which makes comparing model versions easy.
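When every model version is captured as a configuration, comparing two versions reduces to comparing two nested dictionaries. The following Python sketch illustrates that under stated assumptions: the flatten and diff_configs helpers and the example settings are hypothetical and are not taken from the Predibase product.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into {"section.key": value} pairs."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(old, new):
    """Return {path: (old_value, new_value)} for every setting that differs.

    A setting missing from one side is reported as None on that side.
    """
    old_flat, new_flat = flatten(old), flatten(new)
    changed = {}
    for path in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(path), new_flat.get(path)
        if before != after:
            changed[path] = (before, after)
    return changed


# Two hypothetical model versions that differ in two settings.
v1 = {"trainer": {"epochs": 10, "learning_rate": 0.001},
      "encoder": {"type": "rnn"}}
v2 = {"trainer": {"epochs": 10, "learning_rate": 0.0005},
      "encoder": {"type": "transformer"}}

changes = diff_configs(v1, v2)
for path, (before, after) in changes.items():
    print(f"{path}: {before} -> {after}")
```

Unchanged settings such as the epoch count drop out of the result, so the difference between versions stays short even when the configurations are large.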

The company is spending the first half of this year making that product enterprise-ready with robustness, enterprise-grade security and enabling multicloud deployments, Molino said. After a GA launch, it wants to pursue integrations with the wider ML ecosystem, with tools like dbt, for instance, and eventually make Predibase self-service.

