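MLOps course: Introduction to Machine Learning in Production, Week 1

The end-to-end workflow of a machine learning project

A common misconception is that deploying a machine learning project is purely a software engineering problem. In practice, deployment brings additional challenges beyond software engineering.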

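The overall MLOps process

  1. Designing the machine learning solution
  2. Machine learning experimentation and development
  3. Machine learning operations

Key point: the world changes, and your model should change with it.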

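Deployment example: industrial defect detection

Challenges you may encounter

  1. Data drift: the data seen at prediction time is not as clear as the training data and deviates from the training dataset.

Going from a trained model to a deployed system that delivers value may take another 6 months.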

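2. Machine learning systems in production

Model code accounts for only 5%-10% of an ML system.

Paper: the components of a machine learning system

The machine learning project lifecycle

Example: speech recognition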

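  1. Scoping: define the scope of the speech recognition project. Decide on the key metrics, including accuracy, latency, and throughput (QPS), and estimate the time, compute resources, and project schedule required.
  2. Data: define the data, including how to keep the labels consistent and how to normalize the data.
  3. Modeling: algorithm (code), hyperparameters, and data. To obtain a high-performing model, hold the code fixed and iterate on the hyperparameters and the data. ML system = code + data, so the main lever is improving the data.
  4. Deployment: the application on the edge device, the prediction server, and monitoring.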
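The latency and throughput metrics named in the scoping step can be measured with a small harness. The following is a minimal sketch, not part of the course material: `predict` is a hypothetical stand-in for a speech model, and `measure` and its result keys are illustrative names.

```python
import math
import statistics
import time


def predict(audio):
    """Hypothetical stand-in for a speech recognition model."""
    return "transcript of %d samples" % len(audio)


def measure(predict_fn, requests):
    """Call predict_fn once per request; report latency and throughput."""
    if not requests:
        raise ValueError("need at least one request")
    latencies = []
    start = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        predict_fn(request)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    # Nearest-rank 95th percentile: the smallest value with at least
    # 95% of the measurements at or below it.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "count": len(latencies),
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": latencies[p95_index],
        "qps": len(latencies) / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    fake_clips = [[0.0] * 16000 for _ in range(50)]  # made-up audio clips
    print(measure(predict, fake_clips))
```

This sketch sends requests one at a time, so the QPS it reports is a single-client figure; a real prediction server would normally be measured under concurrent load.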

Concept drift and data drift

The data may change gradually, or it may change suddenly.
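One simple way to notice data drift is to compare a summary statistic of the live inputs against the same statistic on the training set. The sketch below is an assumption-laden illustration, not a method from the course: `mean_shift_score`, `has_drifted`, the brightness values, and the threshold of 2.0 are all made up. It only looks at the inputs, so it can flag data drift but not concept drift, which requires labels.

```python
import statistics


def mean_shift_score(train_values, live_values):
    """Distance between live and training means, in training standard deviations."""
    train_mean = statistics.mean(train_values)
    train_std = statistics.stdev(train_values)
    live_mean = statistics.mean(live_values)
    if train_std == 0:
        # Constant training feature: any change at all counts as a shift.
        return 0.0 if live_mean == train_mean else float("inf")
    return abs(live_mean - train_mean) / train_std


def has_drifted(train_values, live_values, threshold=2.0):
    """Flag drift when the live mean moved more than `threshold` deviations."""
    return mean_shift_score(train_values, live_values) > threshold


if __name__ == "__main__":
    # Made-up average image brightness per picture (0 = black, 1 = white).
    train_brightness = [0.50, 0.52, 0.48, 0.51, 0.49]
    live_brightness = [0.30, 0.28, 0.33]  # e.g. the factory lighting got darker
    print(has_drifted(train_brightness, live_brightness))
```

A sliding window over recent live values would catch gradual change as well as sudden change, at the cost of reacting a little later.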

Software engineering issues

Deployment patterns for machine learning systems

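Shadow mode deployment: the model runs in parallel with the existing process, and its output is not used to make decisions.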
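A shadow deployment can be sketched as a wrapper that always returns the existing decision and only logs what the model would have said. This is a minimal illustration under stated assumptions: `serve_in_shadow_mode`, `agreement_rate`, and the log format are hypothetical names, not an API from the course.

```python
def serve_in_shadow_mode(request, current_decision_fn, shadow_model_fn, shadow_log):
    """Return the current system's decision; only record the shadow prediction."""
    decision = current_decision_fn(request)
    try:
        shadow_prediction = shadow_model_fn(request)
    except Exception as exc:
        # A failure in the shadow model must never affect the live decision.
        shadow_prediction = "error: %s" % exc
    shadow_log.append(
        {"request": request, "decision": decision, "shadow": shadow_prediction}
    )
    return decision


def agreement_rate(shadow_log):
    """Fraction of logged requests where the shadow model matched the decision."""
    if not shadow_log:
        return None
    matches = sum(1 for row in shadow_log if row["decision"] == row["shadow"])
    return matches / len(shadow_log)


if __name__ == "__main__":
    log = []
    inspector = lambda part: "defect" if part["scratch_mm"] > 1.0 else "ok"
    model = lambda part: "defect" if part["scratch_mm"] > 0.5 else "ok"
    for scratch in (0.2, 0.8, 1.5):
        serve_in_shadow_mode({"scratch_mm": scratch}, inspector, model, log)
    print(agreement_rate(log))
```

The logged agreement rate is what lets you judge whether the model is ready to start making real decisions.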

Canary deployment: roll the new model out to a small fraction of traffic first, then ramp up gradually.
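The traffic split behind a canary deployment can be sketched as a deterministic router. This is one possible approach, assuming each request carries an id; the `route` function and the hashing scheme are illustrative choices, not something prescribed by the course.

```python
import hashlib


def route(request_id, canary_fraction):
    """Return 'canary' for roughly canary_fraction of request ids, else 'stable'."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    digest = hashlib.sha256(str(request_id).encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "canary" if bucket < canary_fraction else "stable"


if __name__ == "__main__":
    for fraction in (0.05, 0.25, 1.0):  # ramp-up schedule, values made up
        hits = sum(route(i, fraction) == "canary" for i in range(10000))
        print(fraction, hits / 10000)
```

Hashing the id keeps the same request id on the same side of the split for a given fraction, and raising the fraction only moves ids from the stable model to the canary, never back.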

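Blue-green deployment: the old (blue) and new (green) versions run side by side, and traffic is switched from blue to green, which makes rollback easy.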
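The switch at the heart of a blue-green deployment can be sketched as a router holding both versions and a pointer to the active one. The `BlueGreenRouter` class and its method names are hypothetical, chosen only for this illustration.

```python
class BlueGreenRouter:
    """Keeps two model versions loaded and sends all traffic to the active one."""

    def __init__(self, blue_fn, green_fn):
        self._versions = {"blue": blue_fn, "green": green_fn}
        self.active = "blue"  # the old version serves traffic at first

    def predict(self, request):
        return self._versions[self.active](request)

    def switch_to(self, name):
        """Point all traffic at the named version."""
        if name not in self._versions:
            raise ValueError("unknown version: %r" % (name,))
        self.active = name


if __name__ == "__main__":
    router = BlueGreenRouter(lambda r: "old:" + r, lambda r: "new:" + r)
    print(router.predict("x"))   # served by blue
    router.switch_to("green")
    print(router.predict("x"))   # served by green
    router.switch_to("blue")     # rollback is the same operation in reverse
    print(router.predict("x"))
```

Because the old version stays loaded, rolling back is a single pointer change. Sending only part of the traffic to green would turn this into a gradual rollout.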

Degrees of automation

Monitoring machine learning systems
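Monitoring often comes down to tracking a few metrics over a recent window and raising an alarm when one crosses a threshold. The sketch below assumes that framing; the `MetricMonitor` class, the metric name, the window size, and the threshold are all made up for illustration.

```python
from collections import deque


class MetricMonitor:
    """Tracks the most recent values of one metric and checks a threshold."""

    def __init__(self, name, threshold, window=100):
        self.name = name
        self.threshold = threshold
        self.values = deque(maxlen=window)  # old values fall off automatically

    def record(self, value):
        self.values.append(value)

    def average(self):
        return sum(self.values) / len(self.values) if self.values else 0.0

    def alarm(self):
        """True when the windowed average exceeds the threshold."""
        return bool(self.values) and self.average() > self.threshold


if __name__ == "__main__":
    latency = MetricMonitor("latency_s", threshold=0.5, window=3)
    for value in (0.1, 0.2, 0.3, 0.9, 0.9):
        latency.record(value)
        print(value, round(latency.average(), 3), latency.alarm())
```

The same class could track software metrics (latency, server load), input metrics (for example average image brightness), or output metrics (for example the fraction of empty predictions), each with its own threshold.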

Monitoring machine learning pipelines

Week 1 assignment

Week 1: Overview of the ML Lifecycle and Deployment

If you wish to dive more deeply into the topics covered this week, feel free to check out these optional references. You won’t have to read these to complete this week’s practice quizzes.

Concept and Data Drift

Monitoring ML Models

A Chat with Andrew on MLOps: From Model-centric to Data-centric

Papers

Katsiapis, K., Karmarkar, A., Altay, A., Zaks, A., Polyzotis, N., … Li, Z. (2020). Towards ML Engineering: A brief history of TensorFlow Extended (TFX). http://arxiv.org/abs/2010.02013

Paleyes, A., Urma, R.-G., & Lawrence, N. D. (2020). Challenges in deploying machine learning: A survey of case studies. http://arxiv.org/abs/2011.09926

Sculley, D., Holt, G., Golovin, D., Davydov, E., & Phillips, T. (n.d.). Hidden technical debt in machine learning systems. Retrieved April 28, 2021, from https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

Assignment

https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/tree/main/course1/week1-ungraded-lab

MLOps Specialization course notes

Table of contents

  1. Introduction to Machine Learning in Production
  2. Machine Learning Data Lifecycle in Production
  3. Machine Learning Modeling Pipelines in Production
  4. Deploying Machine Learning Models in Production

1. Introduction to Machine Learning in Production

  • Week 1: Overview of the ML Lifecycle and Deployment
  • Week 2: Select and Train a Model
  • Week 3: Data Definition and Baseline

2. Machine Learning Data Lifecycle in Production

  • Week 1: Collecting, Labeling and Validating Data
  • Week 2: Feature Engineering, Transformation and Selection
  • Week 3: Data Journey and Data Storage
  • Week 4 (Optional): Advanced Labeling, Augmentation and Data Preprocessing

3. Machine Learning Modeling Pipelines in Production

  • Week 1: Neural Architecture Search
  • Week 2: Model Resource Management Techniques
  • Week 3: High-Performance Modeling
  • Week 4: Model Analysis

4. Deploying Machine Learning Models in Production

  • Week 1: Model Serving: Introduction
  • Week 2: Model Serving: Patterns and Infrastructure
  • Week 3: Model Management and Delivery
  • Week 4: Model Monitoring and Logging