Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes, introduced by Google. Each stage of a typical machine learning lifecycle is handled by a separate Kubeflow component: model development (''Kubeflow Notebooks''), model training (''Kubeflow Pipelines'', ''Kubeflow Training Operator''), model serving (''KServe''), and automated machine learning (''Katib'').
Each Kubeflow component can be deployed independently, so a deployment does not need to include every component.
History
The Kubeflow project was first announced at ''KubeCon + CloudNativeCon North America 2017'' by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan to address a perceived lack of flexible options for building production-ready machine learning systems. The project has also stated that it began as a way for Google to open-source how it ran TensorFlow internally.
The first release of Kubeflow (Kubeflow 0.1) was announced at ''KubeCon + CloudNativeCon Europe 2018'', with the claim that the project had already become one of the top 2% of GitHub projects ever. Kubeflow 1.0 was released in March 2020; the accompanying public blog post announced that many Kubeflow components were graduating to "stable status", indicating that they were ready for production use.
Components
''Kubeflow Notebooks'' for model development
Machine learning models are developed in ''Kubeflow Notebooks'', a component that runs web-based development environments inside a Kubernetes cluster, with native support for Jupyter Notebook, Visual Studio Code, and RStudio.
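A notebook server in Kubeflow is itself declared as a Kubernetes custom resource. The sketch below builds a minimal `Notebook` manifest as a Python dictionary and prints it as JSON, a format `kubectl apply -f` also accepts. It assumes the `kubeflow.org/v1` API version and that the resource wraps an ordinary Kubernetes pod template under `spec.template`; the function name, namespace, and container image are hypothetical placeholders.

```python
import json


def notebook_manifest(name: str, namespace: str, image: str) -> dict:
    """Return a minimal Notebook custom resource as a plain dict (sketch).

    Assumption: the Notebook CRD is served as kubeflow.org/v1 and embeds a
    standard Kubernetes pod template under spec.template.
    """
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Notebook",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "template": {
                "spec": {
                    # One container running the web IDE (Jupyter, VS Code, RStudio, ...).
                    "containers": [{"name": name, "image": image}]
                }
            }
        },
    }


# Hypothetical namespace and image; substitute real values before applying.
manifest = notebook_manifest(
    "my-notebook", "kubeflow-user", "registry.example.com/jupyter:latest"
)
print(json.dumps(manifest, indent=2))
```

Building the manifest in code rather than writing YAML by hand makes it easy to stamp out one notebook per user or project with different images.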
''Kubeflow Pipelines'' for model training
Once developed, models are trained with ''Kubeflow Pipelines'', a platform for building and deploying portable, scalable machine learning workflows based on Docker containers. Google Cloud Platform has adopted the ''Kubeflow Pipelines DSL'' within its ''Vertex AI Pipelines'' product.
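A workflow of this kind is a directed acyclic graph of steps, each of which Kubeflow Pipelines runs in its own container once its upstream steps have finished. The library-free sketch below illustrates only that dependency-ordered execution. It does not use the Kubeflow Pipelines SDK (`kfp`), in which steps are declared with decorators such as `@dsl.component` and `@dsl.pipeline`; the `run_pipeline` helper and the step names are hypothetical, and plain Python functions stand in for containers.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# A step receives the outputs of the steps that have already run.
Step = Callable[[Dict[str, object]], object]


def run_pipeline(steps: Dict[str, Step], deps: Dict[str, List[str]]) -> Dict[str, object]:
    """Run every step after its upstream dependencies (hypothetical helper).

    `deps` maps a step name to the names of the steps it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    graph = {name: deps.get(name, []) for name in steps}
    outputs: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        outputs[name] = steps[name](outputs)
    return outputs


# Hypothetical three-step workflow: load data, normalize it, "train" a trivial model.
steps: Dict[str, Step] = {
    "load": lambda out: [1.0, 2.0, 3.0, 4.0],
    "normalize": lambda out: [x / max(out["load"]) for x in out["load"]],
    "train": lambda out: sum(out["normalize"]) / len(out["normalize"]),
}
deps = {"normalize": ["load"], "train": ["normalize"]}

result = run_pipeline(steps, deps)
print(result["train"])  # 0.625
```

`graphlib` requires Python 3.9 or later. In a real pipeline the steps exchange artifacts through storage rather than an in-memory dictionary, which is what lets each step run in a separate container.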
''Kubeflow Training Operator'' for model training
The ''Kubeflow Training Operator'' provides Kubernetes custom resources for training models with specific machine learning frameworks. It runs distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI training jobs on Kubernetes.
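Each supported framework has its own custom resource kind. The sketch below builds a minimal `PyTorchJob` as a Python dictionary: with `workers=0` it describes a non-distributed job with a single master, and with more workers a distributed one. It assumes the `kubeflow.org/v1` API version, the `pytorchReplicaSpecs` field with `Master` and `Worker` replica types, and a container named `pytorch`, which the operator expects for this kind; the job name and image are hypothetical.

```python
import json


def pytorchjob_manifest(name: str, image: str, workers: int) -> dict:
    """Return a minimal PyTorchJob custom resource (sketch, not a full spec).

    Assumptions: API version kubeflow.org/v1, one Master replica plus
    `workers` Worker replicas, and a container named "pytorch".
    """
    if workers < 0:
        raise ValueError("workers must be non-negative")

    def replica_spec(count: int) -> dict:
        # Each replica type wraps an ordinary Kubernetes pod template.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {"containers": [{"name": "pytorch", "image": image}]}
            },
        }

    specs = {"Master": replica_spec(1)}
    if workers:  # omit the Worker replica type for a non-distributed job
        specs["Worker"] = replica_spec(workers)

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {"pytorchReplicaSpecs": specs},
    }


# Hypothetical job: one master and two workers running a placeholder image.
job = pytorchjob_manifest("mnist-ddp", "registry.example.com/mnist-train:latest", workers=2)
print(json.dumps(job, indent=2))
```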
''KServe'' for model serving
The ''KServe'' component (previously named KFServing) provides Kubernetes custom resources for serving machine learning models built with arbitrary frameworks, including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. KServe was developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon.
Publicly disclosed adopters of KServe include Bloomberg, Gojek, and others.
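A served model is declared as an `InferenceService` resource that names the model format and the location of the trained model; KServe then creates the serving deployment. The sketch below builds such a manifest, plus the URL path and JSON body of a prediction request in KServe's V1 inference protocol. It assumes the `serving.kserve.io/v1beta1` API version and the `spec.predictor.model` layout used in recent KServe releases; the service name, storage URI, and feature values are hypothetical.

```python
import json
from typing import List, Tuple


def inference_service(name: str, model_format: str, storage_uri: str) -> dict:
    """Return a minimal KServe InferenceService manifest as a dict (sketch).

    Assumption: serving.kserve.io/v1beta1 with the predictor.model layout.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": model_format},  # e.g. "sklearn", "xgboost"
                    "storageUri": storage_uri,  # where the trained model is stored
                }
            }
        },
    }


def v1_predict_request(model_name: str, instances: List[list]) -> Tuple[str, str]:
    """Return the URL path and JSON body of a V1-protocol prediction request."""
    path = f"/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return path, body


# Hypothetical scikit-learn model stored in a placeholder bucket.
isvc = inference_service("iris-classifier", "sklearn", "gs://example-bucket/models/iris")
path, body = v1_predict_request("iris-classifier", [[6.8, 2.8, 4.8, 1.4]])
print(json.dumps(isvc, indent=2))
print("POST", path, body)
```

Because the framework is a single `modelFormat` field, switching from scikit-learn to, say, XGBoost changes one value rather than the shape of the resource.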
''Katib'' for automated machine learning
Finally, Kubeflow includes ''Katib'', a component for automating the training and development of machine learning models. It is described as a Kubernetes-native project and provides hyperparameter tuning, early stopping, and neural architecture search.
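A Katib tuning run is declared as an `Experiment` resource stating the metric to optimize, the search algorithm, the hyperparameter search space, and optionally an early-stopping rule. The sketch below builds the core of such a manifest as a Python dictionary. It assumes the `kubeflow.org/v1beta1` API version and the algorithm names `random` (random search) and `medianstop` (median stopping); the experiment name, metric name, and learning-rate range are hypothetical. A complete `Experiment` also needs a `trialTemplate` describing the training job each trial runs, which is omitted here.

```python
import json


def katib_experiment(name: str, metric: str, lr_min: float, lr_max: float,
                     max_trials: int = 12, parallel: int = 3) -> dict:
    """Return the core of a Katib Experiment manifest (trialTemplate omitted).

    Assumptions: API version kubeflow.org/v1beta1, random search, and the
    median-stopping early-stopping rule.
    """
    if not 0 < lr_min < lr_max:
        raise ValueError("need 0 < lr_min < lr_max")
    return {
        "apiVersion": "kubeflow.org/v1beta1",
        "kind": "Experiment",
        "metadata": {"name": name},
        "spec": {
            "objective": {"type": "maximize", "objectiveMetricName": metric},
            "algorithm": {"algorithmName": "random"},  # hyperparameter search
            "earlyStopping": {"algorithmName": "medianstop"},  # stop weak trials early
            "maxTrialCount": max_trials,
            "parallelTrialCount": parallel,
            "parameters": [
                {
                    "name": "lr",
                    "parameterType": "double",
                    # Published Katib examples give the search-space bounds as strings.
                    "feasibleSpace": {"min": str(lr_min), "max": str(lr_max)},
                }
            ],
        },
    }


experiment = katib_experiment("lr-search", "accuracy", 0.001, 0.1)
print(json.dumps(experiment, indent=2))
```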