Kubeflow Fundamentals: An Introduction
source link: https://dzone.com/articles/kubeflow-fundamentals-an-introduction
Welcome to the first in a series of blog posts where we’ll walk you through a detailed introduction to Kubeflow. In this series, we’ll explore what Kubeflow is, how it works, and how to make it work for you. In this first blog, we’ll tackle the fundamentals, and use them as a foundation to introduce more advanced topics. Ok, let’s dive right in!
What is Kubeflow?
Kubeflow as a project got its start over at Google. The idea was to create a simpler way to run TensorFlow jobs on Kubernetes. So, Kubeflow was created as a way to run TensorFlow, based on a pipeline called TensorFlow Extended and then ultimately extended to support multiple architectures and multiple clouds so that it could be used as a framework to run entire machine learning pipelines.
The Kubeflow open source project was formally announced by David Aronchick and Jeremy Lewi at the end of 2017 in the Kubernetes blog post, “Introducing Kubeflow – A Composable, Portable, Scalable ML Stack Built for Kubernetes.”
In a nutshell, Kubeflow is the machine learning toolkit for Kubernetes.
Why Kubeflow?
At the time of Kubeflow’s announcement, there were two big IT trends that were beginning to pick up steam – the mainstreaming of cloud-native architectures, plus the widespread investment in data science and machine learning.
As a result, Kubeflow was perfectly positioned at the convergence of these two trends: it was cloud-native by design and built specifically for machine learning use cases. Since 2017, it has become readily apparent, even to the most casual observer of IT trends, that Kubernetes and machine learning have only grown in popularity and have proven to be a natural technological pairing.
What Challenges Does Kubeflow Aim to Solve?
The charter of the Kubeflow project continues to be, “To make deployments of machine learning workflows on Kubernetes simple, portable and scalable, by providing a straightforward way to deploy best-of-breed open-source systems for machine learning to diverse infrastructures.” With the added benefit that wherever you can run Kubernetes, you can run Kubeflow!
Every organization that is actively deploying machine learning workloads (or attempting to!), knows that there are a lot of problems that need to be solved along the way. Kubeflow aims to be the technology that can solve these problems for both data scientists and operations teams. Challenges like:
- Data loading
- Verification
- Splitting
- Processing
- Feature engineering
- Model training
- Model verification
- Hyperparameter tuning
- Model serving
- Security and compliance
- Data management
- Reproducibility
- Observation and monitoring
You can learn more about why these challenges can be difficult for some organizations to overcome by reading this great blog post: Why 90% of machine learning models never hit the market. Spoiler alert, it isn’t always the software’s fault!
Getting Familiar With Kubeflow Components
There are seven core components that make up Kubeflow. Let’s do a quick overview of each one and the role it plays. (Don’t worry, in upcoming posts we’ll dive into each one of these components!)
Central Dashboard
The central user interface (UI) in Kubeflow. Within the dashboard, you can access a variety of components, including Pipelines, Notebooks, Katib, and the Artifact Store, as well as manage contributors.
Notebook Servers
Jupyter notebooks work well in Kubeflow because they integrate easily with the authentication and access-control mechanisms typically found in an enterprise. With security sorted out, users can confidently create notebook pods/servers directly in the Kubeflow cluster using images provided by the admins, and easily submit single-node or distributed training jobs, rather than having to configure everything on their laptops.
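Under the hood, each notebook server is itself a Kubernetes custom resource that the notebook controller reconciles into a pod. As a minimal sketch (the namespace and image name here are illustrative, not from the article), a Notebook manifest looks roughly like this:

```yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: my-notebook
  namespace: alice            # the user's own namespace
spec:
  template:
    spec:
      containers:
      - name: my-notebook
        # an admin-provided Jupyter image (illustrative name)
        image: my-registry/jupyter-scipy:latest
        resources:
          requests:
            cpu: "0.5"
            memory: 1Gi
```

Applying a manifest like this (or clicking through the dashboard, which generates one for you) is what spins up a per-user notebook pod in the cluster.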
Kubeflow Pipelines
Kubeflow Pipelines is used for building and deploying portable, scalable machine learning workflows based on Docker containers. It consists of a UI for managing training experiments, jobs, and runs, plus an engine for scheduling multi-step ML workflows. There are also two SDKs, one that allows you to define and manipulate pipelines, while the other offers an alternative way for Notebooks to interact with the system.
KFServing
KFServing provides a Kubernetes Custom Resource Definition for serving machine learning models on a variety of frameworks including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. Aside from providing a CRD, it also helps encapsulate many of the complex challenges that come with autoscaling, networking, health checking, and server configuration.
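To make the CRD concrete, here is a minimal sketch of an InferenceService manifest in the beta API (the model name and storage URI follow the KFServing samples and should be treated as illustrative):

```yaml
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    sklearn:
      # Points at a trained model artifact; KFServing handles the
      # server container, networking, and autoscaling around it.
      storageUri: gs://kfserving-samples/models/sklearn/iris
```

Declaring just the framework and the model's storage location is usually enough; the controller fills in the serving runtime, routing, and scale-to-zero behavior.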
We should note that the KFServing component is currently in Beta.
Katib
Katib (which means “secretary” in Arabic) provides automated machine learning (AutoML) in Kubeflow. Like KFServing, Katib is agnostic to machine learning frameworks. It can perform hyperparameter tuning, early stopping, and neural architecture search for training code written in a variety of languages.
Also like KFServing, Katib is currently in Beta.
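A Katib tuning job is expressed as an Experiment custom resource. The trimmed sketch below (experiment name, metric, and parameter ranges are illustrative) shows the key pieces: an objective, a search algorithm, and the hyperparameters to explore:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: random-search-example
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: random     # random search over the feasible space
  maxTrialCount: 12
  parallelTrialCount: 3
  parameters:
  - name: lr
    parameterType: double
    feasibleSpace:
      min: "0.01"
      max: "0.1"
  trialTemplate:
    # omitted for brevity: a Job/TFJob template that runs one training
    # trial with the sampled hyperparameter values substituted in
```

Katib launches trials from the template, collects the objective metric from each, and applies the chosen algorithm to decide which parameter values to try next.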
Training Operators
In Kubeflow you train machine learning models with operators. There are currently five supported operators. They include:
- TensorFlow training via tf-operator
- PyTorch training via pytorch-operator
- MPI training via mpi-operator
- MXNet training via mxnet-operator
- XGBoost training via xgboost-operator
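Each operator defines its own custom resource for describing a training job. As a minimal sketch of the TensorFlow case (the job name, image, and script path are illustrative, not from the article), a two-worker distributed job looks roughly like this:

```yaml
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-distributed
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2              # two worker pods for distributed training
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: tensorflow   # tf-operator expects this container name
            image: my-registry/mnist-train:latest   # hypothetical training image
            command: ["python", "/opt/train.py"]
```

The operator creates the worker pods, wires up the `TF_CONFIG` cluster topology for them, and tracks the job's success or failure as a single Kubernetes object.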
Multi-Tenancy
In a typical machine learning production environment, the same pool of (expensive) resources will need to be shared across different teams and individual users. As such, administrators will need a mechanism for isolating users and their resources so they don’t view or change the resource allocations of others. Fortunately, with the latest Kubeflow v1.3 release there is now support for multi-user isolation so users “only see what they should see” and cannot modify the resources of other users.
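Isolation in Kubeflow is managed through Profile resources, each of which owns a namespace and its resources. A minimal sketch (the profile name and owner email are illustrative) looks like this:

```yaml
apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: team-data-science     # becomes the user's namespace
spec:
  owner:
    kind: User
    name: alice@example.com   # identity from the cluster's auth provider
```

The profile controller creates the namespace and the access-control bindings, so the named owner sees and manages only their own notebooks, pipelines, and training jobs.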
Kubeflow Interfaces
In Kubeflow there are a variety of interfaces that you can interact with. The first is the UI (which we already covered); the rest are an assortment of APIs and SDKs that you interact with programmatically. They include:
- Kubeflow Metadata API and SDK
- PyTorchJob Custom Resource Definition
- TFJob Custom Resource Definition
- Kubeflow Pipelines API and SDK
- A Kubeflow Pipelines domain-specific language (DSL)
- Kubeflow Fairing SDK
Kubeflow as a Machine Learning Workflow
Stay tuned for the next blog in this series where we’ll explore what a typical machine learning workflow looks like and how specific Kubeflow components fit into the workflow.
We’ll also cover what choices are available in regards to distributions and installation options.