Mixture of experts (MoE) refers to a machine learning technique in which multiple expert networks (learners) are used to divide a problem space into homogeneous regions. It differs from ensemble techniques in that typically only one, or a few, expert models are run for a given input, rather than combining results from all models.
An example from computer vision is combining one neural network model for human detection with another for pose estimation.
Hierarchical mixture
If the output is conditioned on multiple levels of (probabilistic) gating functions, the mixture is called a hierarchical mixture of experts.
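In a common two-level formulation (the notation here is illustrative), the top-level gate weights the outputs of second-level gated mixtures:

p(y | x) = Σ_i g_i(x) Σ_j g_(j|i)(x) p_ij(y | x)

where g_i(x) is the top-level gating probability for branch i, g_(j|i)(x) is the gating probability for expert j within branch i, and p_ij is that expert's predictive distribution. The gating probabilities at each level sum to one.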
A gating network decides which expert to use for each input region. Learning thus consists of learning the parameters of:
* the individual learners, and
* the gating network.
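As a minimal sketch of this routing, the following assumes linear experts, a softmax gate, and top-1 selection; these choices are illustrative, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration.
d_in, d_out, n_experts = 4, 2, 3

# Each expert is a simple linear model; the gating network is a
# linear layer whose outputs are turned into probabilities by a softmax.
experts = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_in, n_experts))

def moe_forward(x):
    """Route the input to the single highest-scoring expert (top-1)."""
    logits = x @ gate_w
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    k = int(np.argmax(probs))  # only this one expert is evaluated
    return probs[k] * (x @ experts[k]), k

x = rng.normal(size=d_in)
y, chosen = moe_forward(x)
```

Because only the selected expert is evaluated, the cost per input stays roughly constant as more experts are added, which is the key difference from an ensemble that runs every model.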
Applications
Meta uses MoE in its NLLB-200 system. It uses multiple MoE models that share capacity for use by low-resource language models with relatively little data.
Extra reading
* Masoudnia, Saeed; Ebrahimpour, Reza (12 May 2012). "Mixture of experts: a literature survey". Artificial Intelligence Review. 42 (2): 275–293. doi:10.1007/s10462-012-9338-y. S2CID 3185688.