Practicing Data-Centric AI with Cleanlab

The reliability of ML/AI models critically depends on the quality of the underlying data. To produce the famed AI models we all admire, companies like OpenAI, Google, and Tesla have spent vast sums of money curating their datasets to an extremely high standard. Your team may not be able to afford such an investment, however, so you’ll need to rely on automated data curation.

Cleanlab is a leading provider of automated data curation software, based on technology pioneered at MIT, that is now used at over 10% of the Fortune 500. The cleanlab open-source Python library is the most popular library for Data-Centric AI, and is used by thousands of data scientists across all industries/applications. This library helps you automatically find and fix a variety of dataset issues relevant to ML modeling (label errors, outliers, near duplicates, data drift, and others).
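To give a feel for what automatically finding label errors can look like, here is a minimal pure-Python sketch. It is not the cleanlab API or its implementation; it is a simplified illustration that assumes you already have predicted class probabilities for each example, ideally out-of-sample ones from cross-validation. The function name `find_likely_label_issues` and the toy data are hypothetical. An example is flagged when the model is unconfident in its given label but confident in some other class, where "confident" means at or above that class's average predicted probability among the examples labeled with it.

```python
from collections import defaultdict


def find_likely_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels:     list of integer class labels, one per example
    pred_probs: list of per-class probability lists, one per example
    Indices are ordered from most to least suspicious.
    """
    # Per-class threshold: average predicted probability of class k
    # over the examples that are labeled k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, probs in zip(labels, pred_probs):
        sums[y] += probs[y]
        counts[y] += 1
    thresholds = {k: sums[k] / counts[k] for k in counts}

    flagged = []
    for i, (y, probs) in enumerate(zip(labels, pred_probs)):
        # Classes other than the given label that the model is confident about.
        confident_other = [
            k for k in thresholds if k != y and probs[k] >= thresholds[k]
        ]
        # Flag when the given label is unconvincing and another class is convincing.
        if confident_other and probs[y] < thresholds[y]:
            flagged.append((probs[y], i))

    # Lowest probability for the given label first (most suspicious first).
    flagged.sort()
    return [i for _, i in flagged]


# Hypothetical toy data: examples 2 and 5 carry labels the model disagrees with.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.20, 0.80],
    [0.10, 0.90],
    [0.85, 0.15],
]
issues = find_likely_label_issues(labels, pred_probs)
print(issues)
```

In practice, flagged examples would be reviewed, relabeled, or removed before retraining. The cleanlab library covers this and the other issue types listed above with considerably more sophisticated methods.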

In this conversation with Jonas Mueller, Cleanlab Co-Founder and Chief Scientist, we delve into the meaning of the term ‘data-centric AI.’ While it is often perceived as a rebranding of ‘machine learning,’ the Cleanlab team views data-centric AI as distinct from and complementary to machine learning: in machine learning, you optimize your model over a static dataset, but in data-centric AI you also iterate on the dataset to ensure your model is of the highest possible quality. Often, AI tools are used to execute data-centric AI, meaning that mastering all three of these interrelated concepts is vital as datasets grow in size and diversity, and AI grows in power.

Jonas discusses how to integrate all three concepts in a virtuous cycle that can be applied across all data modalities and ML tasks, and he presents practical techniques/tools you can use to operationalize data-centric AI, many of which are based on cutting-edge research by Cleanlab scientists.

Jonas @ LinkedIn: https://www.linkedin.com/in/jonasmueller/
Jonas @ Twitter/X: https://twitter.com/jomulr

Cleanlab: https://cleanlab.ai
Cleanlab @ LinkedIn: https://www.linkedin.com/company/cleanlab/
Cleanlab @ Twitter/X: https://twitter.com/CleanlabAI

cleanlab @ GitHub: https://github.com/cleanlab/cleanlab

February 7, 2024