Comet partners with Snowflake to enhance the reproducibility of machine learning datasets – VentureBeat

Join top executives in San Francisco on July 11-12, to hear how leaders are integrating and optimizing AI investments for success. Learn More

MLOps platformComettoday announced a strategic partnership withSnowflake aimed at empowering data scientists to build superior machine learning (ML) models at an accelerated pace.

Comet said that the collaboration will enable integration of Comets solutions into Snowflakes unified platform, enabling developers to track and version their Snowflake queries and datasets within their Snowflake environment.

Comet says that this integration will enable the tracing of a models lineage and performance, offering more visibility and comprehension than with traditional development processes. It will also have an impact on model performance in response to changes in data.

Overall, the company believes, using Snowflake data in the Comet platform will result in a streamlined and more transparent model development process.

Transform 2023

Join us in San Francisco on July 11-12, where top executives will share how they have integrated and optimized AI investments for success and avoided common pitfalls.

Snowflakes Data Cloud and Comets ML platform combined will allow customers to build, train, deploy and monitor models significantly faster, according to the companies.

In addition, this partnership fosters a feedback loop between model development in Comet and data management in Snowflake, Comet CEO Gideon Mendels told VentureBeat.

>>Dont miss our special issue: Building the foundation for customer data quality.<<

Mendels said that integrating such a loop can continuously improve models and bridge the gap between experimenting with models and deploying them, fulfilling the key promise of ML the ability to learn and adapt over time. He said that the clear versioning between datasets and models will enable organizations to better address data changes and their impact on models in production.

Comets new offering follows its recent release of asuite of toolsand integrations designed to accelerate workflows for data scientists working with large language models (LLMs).

When data scientists or developers execute queries to extract datasets from Snowflake for their ML models, Comet will be able to log, version and directly link these queries to the resulting models.

Mendels said this approach offers several advantages, including increased reproducibility, collaboration, auditability and iterative improvement.

The integration between Comet and Snowflake aims to provide a more robust, transparent and efficient framework for ML development by enabling the tracking and versioning of Snowflake queries and datasets within Snowflake itself, he explained. By versioning the SQL queries and datasets, data scientists can always trace back to the exact version of the data that was used to train a specific model version. This is crucial for model reproducibility.

In ML, training data is just as important as the model itself. Alterations in the data, such as introducing new features, addressing missing values, or modifying data distributions, can profoundly affect a models performance.

Comet says that by tracing a models lineage, it becomes possible to establish a connection between changes in model performance and specific alterations in the data. This not only aids in debugging and comprehending performance, it guides data quality and feature engineering.

Mendels said that tracking queries and data over time can create a feedback loop that drives continuous improvements in both the data management and the model development stages.

Model lineage can facilitate collaboration among a team of data scientists, as it allows anyone to understand a models history and how it was developed without the need for extensive documentation, said Mendels. This is particularly useful when team members leave or when new members join the team, allowing for seamless knowledge transfer.

The company claims that customers currently using Comet such as Uber, Etsy and Shopify typically report a 70% to 80% improvement in their ML velocity.

This is due to faster research cycles, the ability to understand model performance and detect issues faster, better collaboration and more, said Mendels. With the joint solution, this should increase even more as today there are still challenges in bridging the two systems. Customers save on ingress and consumption costs by keeping the data within Snowflake instead of transferring it over the wire and saving it in other locations.

Mendels said that Comet aims to establish itself as the de facto AI development platform.

Our view is that businesses will only see real value from AI after they deploy these models based on their own data, he said. Whether they are training from scratch, fine-tuning an OSS model or using context injection to ChatGPT, Comets mandate is to make this process seamless and bridge the gap between research and production.

VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.

Excerpt from:
Comet partners with Snowflake to enhance the reproducibility of machine learning datasets - VentureBeat

Related Posts

Comments are closed.