Organizing a Machine Learning Monorepo with Pants | by Michał Oleszak | Aug, 2023

MLOps

Streamline your ML workflow management

Michał Oleszak
Towards Data Science

Have you ever copy-pasted chunks of utility code between projects, ending up with multiple versions of the same code living in different repositories? Or perhaps you had to make pull requests to tens of projects after the name of the GCP bucket in which you store your data was updated?

The situations described above arise far too often in ML teams, and their consequences range from a single developer’s annoyance to the team’s inability to ship code as needed. Luckily, there’s a remedy.

Let’s dive into the world of monorepos, an architecture widely adopted by major tech companies such as Google, and see how they can enhance your ML workflows. A monorepo offers a plethora of advantages which, despite some drawbacks, make it a compelling choice for managing complex machine learning ecosystems.

We will briefly discuss the merits and drawbacks of monorepos, examine why they are an excellent architecture choice for machine learning teams, and peek into how Big Tech uses them. Finally, we’ll see how to harness the power of the Pants build system to organize your machine learning monorepo and back it with a robust CI/CD build setup.

Strap in as we embark on this journey to streamline your ML workflows.

This article was first published on the neptune.ai blog.

Machine Learning Monorepo. Image by the author, via neptune.ai.

A monorepo (short for monolithic repository) is a software development strategy in which the code for many projects is stored in the same repository. The idea can be as broad as all of a company’s code, written in a variety of languages, stored together (did somebody say Google?), or as narrow as a couple of Python projects developed by a small team and kept in a single repository.

In this blog post, we focus on repositories storing machine learning code.
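To make the idea concrete, here is a minimal illustrative sketch of the bucket-name problem from the introduction, played out inside a monorepo. It assumes a hypothetical repository with one shared package and two ML projects; every name in it (`shared/settings.py`, `DATA_BUCKET`, `churn_model`, `fraud_model`, and the `gs://example-ml-data` bucket) is made up for the example, and the repository is simulated in a temporary directory so the script runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical monorepo layout: one shared package plus two ML projects.
# All names and the bucket URI are illustrative assumptions.
FILES = {
    "shared/__init__.py": "",
    "shared/settings.py": 'DATA_BUCKET = "gs://example-ml-data"\n',
    "churn_model/__init__.py": "",
    "churn_model/train.py": (
        "from shared.settings import DATA_BUCKET\n"
        "def data_uri():\n"
        "    return f'{DATA_BUCKET}/churn/train.parquet'\n"
    ),
    "fraud_model/__init__.py": "",
    "fraud_model/train.py": (
        "from shared.settings import DATA_BUCKET\n"
        "def data_uri():\n"
        "    return f'{DATA_BUCKET}/fraud/train.parquet'\n"
    ),
}


def build_repo(root: Path) -> None:
    """Write every file of the hypothetical monorepo under `root`."""
    for rel_path, content in FILES.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)


def project_uris(root: Path) -> list[str]:
    """Import both projects from the same checkout and collect their data URIs."""
    sys.path.insert(0, str(root))
    try:
        return [
            importlib.import_module(f"{name}.train").data_uri()
            for name in ("churn_model", "fraud_model")
        ]
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    build_repo(repo)
    for uri in project_uris(repo):
        print(uri)
```

Both projects read the bucket name from the same module in the same checkout, so renaming the bucket means editing `shared/settings.py` once rather than opening a pull request in every project's repository.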
