Questions tagged [mlops]

23 questions
9
votes
2 answers

MLOps for beginner

I have been working in ML for one year and, until now, have used Jupyter Notebook to build static models, do some analysis, and present my results to the bosses, as it was all a POC. Now we would like to scale the solution so that it runs automatically and is able to…
8
votes
1 answer

MLflow real world experience

Can someone provide a summary of real-world deployment experience with MLflow? We have a few ML models (e.g., LightGBM, TensorFlow v2, etc.) and want to avoid frameworks like SageMaker (due to a customer requirement). So we are looking into various…
David293836
8
votes
1 answer

How to Combat Data Drift

I have customer demographic data that include columns like: age, the first half of the postcode, occupation (there is a defined list of possible occupations), and more. Each month I get a new batch of 1000 rows of this type of data (which is not…
4
votes
2 answers

Meaningfully compare target vs observed TPR & FPR

Suppose I have a binary classifier $f$ which acts on an input $x$. Given a threshold $t$, the predicted binary output is defined as: $$ \widehat{y} = \begin{cases} 1, & f(x) \geq t \\ 0, & f(x) < t \end{cases} $$ I then compute the $TPR$…
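For reference, a minimal sketch of how the thresholded prediction $\widehat{y}$ defined above turns into TPR and FPR, assuming labels in {0, 1} and raw scores $f(x)$ given as a list; the function names and sample data are hypothetical, not taken from the question.

```python
from typing import Sequence, Tuple


def predict(scores: Sequence[float], t: float) -> list:
    """Apply the threshold rule: y_hat = 1 if f(x) >= t, else 0."""
    return [1 if s >= t else 0 for s in scores]


def tpr_fpr(y_true: Sequence[int], scores: Sequence[float], t: float) -> Tuple[float, float]:
    """Return (TPR, FPR) at threshold t; NaN when a class is absent."""
    y_hat = predict(scores, t)
    pairs = list(zip(y_true, y_hat))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else float("nan")  # TP / (TP + FN)
    fpr = fp / (fp + tn) if fp + tn else float("nan")  # FP / (FP + TN)
    return tpr, fpr


# Hypothetical example: 3 positives, 3 negatives, threshold t = 0.5
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(tpr_fpr(y_true, scores, 0.5))
```

With these made-up values, two of the three positives and one of the three negatives score at or above 0.5, so TPR = 2/3 and FPR = 1/3.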
2
votes
1 answer

Data preprocessing framework/library alternatives

I am currently working on some Python machine learning projects that will soon be deployed to production. As such, our team is interested in doing this in the most "correct" way, following MLOps principles. Specifically, I am currently…
1
vote
1 answer

Training a CNN in production on new data

How should I approach training a convolutional neural network in production on new data when I detect model performance degradation due to data or concept drift? Resources like this one and this one lead me to conclude that I need to fine-tune the…
1
vote
1 answer

What is the difference between Covariate Shift, Label Shift, Concept Shift, Concept Drift, and Prior Probability Shift?

As a beginner in MLOps, I was overwhelmed by some confusing definitions. As far as I understand, when we have a classifier or regressor with a function y = f(X): Covariate Shift is a change in the distribution of the independent variables (X), Label Shift…
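To make the covariate-shift definition above concrete, here is a small simulation sketch, assuming a hypothetical fixed labelling rule `f` and Gaussian inputs whose mean moves from 0 to 2 between a "training" and a "production" period. Only P(X) changes; the mapping from X to y is identical in both periods.

```python
import random
from statistics import mean


def f(x: float) -> int:
    """Hypothetical fixed labelling rule y = f(X); identical in both periods."""
    return 1 if x > 1.0 else 0


def sample(n: int, mu: float, seed: int) -> list:
    """Draw X ~ N(mu, 1) and label each point with the same rule f."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    return [(x, f(x)) for x in xs]


train = sample(10_000, mu=0.0, seed=0)  # training-time inputs
prod = sample(10_000, mu=2.0, seed=1)   # production inputs: P(X) has moved

# The input distribution shifts...
print(mean(x for x, _ in train), mean(x for x, _ in prod))
# ...so P(y) changes as a side effect, but P(y | X) is still given by f.
print(mean(y for _, y in train), mean(y for _, y in prod))
```

The positive rate moves from roughly 0.16 to roughly 0.84 here, yet a model that had learned `f` exactly would still be correct everywhere; what characterises covariate shift is that only the input distribution changed.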
1
vote
1 answer

Sustain learning separately - continuous learning

This question seeks suggestions on how to architect a continuous learning approach in a distributed manner. Let me explain the situation: in my classification problem, the number of classes can grow large over a period of time, as…
Sandeep Bhutani
1
vote
0 answers

How is model scheduling set up in practice?

I have worked on various machine learning models so far, but never on the deployment phase of an ML project. I have used Apache Airflow a little and I'm aware that it is a tool for scheduling DAGs, but I have never set up such scheduling on…
lazarea
1
vote
2 answers

Automate Clustering predictions and RFM metrics

We did a POC for customer segmentation and followed the approach below: a) extract data from the source system (SAP BusinessObjects); b) use a Python Jupyter notebook to manipulate, merge, and group the data (multiple CSV files); c) cluster based on some…
The Great
0
votes
0 answers

How do I verify and test a machine learning model against reality over time?

As software engineers, we are familiar with the concept of testing (unit, integration, e2e). Tests give us a level of confidence about the code and the changes we make to it. It looks like, for ML, the "code" is the data that was used to train the model. And…
0
votes
0 answers

Is my idea of a Feature Store wrong?

Cross-posted on Reddit ML. Should a Feature Store be part of an enterprise data catalog? To me, a feature store seems to be a highly niche data catalog but missing a lot of the benefits of having an enterprise data catalog / data discovery tool. My…
0
votes
1 answer

What features used by a CNN model should a feature store actually store?

According to MLOps principles, it is recommended to have a feature store. The question is in the context of doing image classification using deep learning models like convolutional neural networks, which do automatic feature engineering (using…
0
votes
0 answers

What are best practices for MLOps?

What are the best practices or design patterns for structuring data science projects and MLOps architecture in small teams? 1. Context and Background: I work in a small data science team (<5). We exclusively develop predictive analytics solutions.…
bayes2021
0
votes
0 answers

Azure ML or Azure Data Factory for sampling/data asset pipeline

We currently store images and text in AWS S3, and a small percentage of the data comes with annotations. Every week we should remove most of the annotated data and keep only the data that is relevant for further training the models which we are…
carak