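reproducing property together show that applying $f$ to any training point $x_j$ produces
: $f(x_j) = \left\langle \sum_{i=1}^n \alpha_i \varphi(x_i) + v, \varphi(x_j) \right\rangle = \sum_{i=1}^n \alpha_i \langle \varphi(x_i), \varphi(x_j) \rangle,$
which we observe is independent of $v$. Consequently, the value of the error function $E$ in (*) is likewise independent of $v$. For the second term (the regularization term), since $v$ is orthogonal to $\sum_{i=1}^n \alpha_i \varphi(x_i)$ and $g$ is strictly monotonic, we have
: $g(\lVert f \rVert) = g\left( \left\lVert \sum_{i=1}^n \alpha_i \varphi(x_i) + v \right\rVert \right) = g\left( \sqrt{ \left\lVert \sum_{i=1}^n \alpha_i \varphi(x_i) \right\rVert^2 + \lVert v \rVert^2 } \right) \ge g\left( \left\lVert \sum_{i=1}^n \alpha_i \varphi(x_i) \right\rVert \right).$
Therefore setting $v = 0$ does not affect the first term of (*), while it strictly decreases the second term whenever $v \neq 0$. Consequently, any minimizer $f^*$ in (*) must have $v = 0$, i.e., it must be of the form
: $f^*(\cdot) = \sum_{i=1}^n \alpha_i \varphi(x_i) = \sum_{i=1}^n \alpha_i k(\cdot, x_i),$
which is the desired result.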
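The projection argument can be checked numerically in a toy setting. The sketch below assumes (this is not part of the text) that the kernel comes from an explicit finite-dimensional feature map, here a hypothetical polynomial map `features`, so that each function in the space is identified with a weight vector `w`, evaluation is `w @ phi(x)`, and the Hilbert norm is the Euclidean norm of `w`.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, degree=5):
    """Hypothetical explicit feature map phi(x) = (1, x, ..., x^degree)."""
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)

x_train = np.array([-1.0, 0.0, 2.0])   # n = 3 training points
Phi = features(x_train)                # row i is phi(x_i); shape (3, 6)

# An arbitrary function f, identified with its weight vector w.
w = rng.normal(size=Phi.shape[1])

# Orthogonal projection of w onto span{phi(x_1), ..., phi(x_n)}.
alpha, *_ = np.linalg.lstsq(Phi.T, w, rcond=None)
w_span = Phi.T @ alpha                 # sum_i alpha_i phi(x_i)
v = w - w_span                         # orthogonal to every phi(x_i)

# Values at the training points do not depend on v ...
f_at_train = Phi @ w
f_span_at_train = Phi @ w_span

# ... while dropping v can only shrink the norm.
norm_full = np.linalg.norm(w)
norm_span = np.linalg.norm(w_span)
```

Here `f_at_train` and `f_span_at_train` agree up to rounding, and `norm_span` is no larger than `norm_full`, which is exactly why a strictly increasing regularizer favors $v = 0$.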
Generalizations
The Theorem stated above is a particular example of a family of results that are collectively referred to as "representer theorems"; here we describe several such.
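The first statement of a representer theorem was due to Kimeldorf and Wahba for the special case in which
: $E\left( (x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n)) \right) = \frac{1}{n} \sum_{i=1}^n (f(x_i) - y_i)^2, \qquad g(\lVert f \rVert) = \lambda \lVert f \rVert^2$
for $\lambda > 0$. Schölkopf, Herbrich, and Smola generalized this result by relaxing the assumption of the squared-loss cost and allowing the regularizer to be any strictly monotonically increasing function $g(\cdot)$ of the Hilbert space norm.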
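For the squared-loss special case with regularizer $\lambda \lVert f \rVert^2$, substituting $f = \sum_i \alpha_i k(\cdot, x_i)$ turns the problem into minimizing $\frac{1}{n} \lVert K\alpha - y \rVert^2 + \lambda \alpha^\top K \alpha$ over $\alpha \in \mathbb{R}^n$, where $K_{ij} = k(x_i, x_j)$; one minimizer solves $(K + n\lambda I)\alpha = y$. The following is a minimal sketch of that computation. The Gaussian kernel, the bandwidth, the toy data, and all function names are illustrative assumptions, not taken from the text.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def fit_kernel_ridge(x, y, lam, bandwidth=1.0):
    """Coefficients alpha of f = sum_i alpha_i k(., x_i) for
    (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||f||^2."""
    n = len(x)
    K = gaussian_kernel(x, x, bandwidth)
    # Stationarity of the objective in alpha reduces to (K + n*lam*I) alpha = y.
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def predict(x_new, x, alpha, bandwidth=1.0):
    """Evaluate f(x_new) = sum_i alpha_i k(x_new, x_i)."""
    return gaussian_kernel(x_new, x, bandwidth) @ alpha

# Toy data (hypothetical).
x = np.linspace(0.0, 3.0, 8)
y = np.sin(x)
lam = 1e-2
alpha = fit_kernel_ridge(x, y, lam)
fitted = predict(x, x, alpha)
```

The search over an infinite-dimensional function space has been replaced by a single $n \times n$ linear solve, with one coefficient per training point.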
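It is possible to generalize further by augmenting the regularized empirical risk functional through the addition of unpenalized offset terms. For example, Schölkopf, Herbrich, and Smola also consider the minimization
: $\tilde{f}^* = \operatorname{argmin} \left\{ E\left( (x_1, y_1, \tilde{f}(x_1)), \ldots, (x_n, y_n, \tilde{f}(x_n)) \right) + g(\lVert f \rVert) \;\middle|\; \tilde{f} = f + h \in H_k \oplus \operatorname{span}\{ \psi_p \mid 1 \le p \le M \} \right\}, \quad (\dagger)$
i.e., we consider functions of the form $\tilde{f} = f + h$, where $f \in H_k$ and $h$ is an unpenalized function lying in the span of a finite set of real-valued functions $\{ \psi_p \colon \mathcal{X} \to \mathbb{R} \mid 1 \le p \le M \}$. Under the assumption that the $n \times M$ matrix $\left( \psi_p(x_i) \right)_{ip}$ has rank $M$, they show that the minimizer $\tilde{f}^*$ in $(\dagger)$ admits a representation of the form
: $\tilde{f}^*(\cdot) = \sum_{i=1}^n \alpha_i k(\cdot, x_i) + \sum_{p=1}^M \beta_p \psi_p(\cdot),$
where $\alpha_i, \beta_p \in \mathbb{R}$ and the $\beta_p$ are all uniquely determined.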
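As an illustration of the offset case, the sketch below specializes to squared loss with regularizer $\lambda \lVert f \rVert^2$ (an assumption made here for concreteness; the result quoted above covers more general costs). Writing $\Psi$ for the $n \times M$ matrix of offset functions evaluated at the training points, the stationarity conditions are satisfied by the solution of the block system $(K + n\lambda I)\alpha + \Psi\beta = y$, $\Psi^\top \alpha = 0$. The Gaussian kernel, the choice of a constant and a linear offset function, and all names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def fit_with_offset(x, y, lam, psi_funcs, bandwidth=1.0):
    """Return (alpha, beta) for f~ = sum_i alpha_i k(., x_i) + sum_p beta_p psi_p,
    where only the kernel part is penalized."""
    n, M = len(x), len(psi_funcs)
    K = gaussian_kernel(x, x, bandwidth)
    Psi = np.column_stack([psi(x) for psi in psi_funcs])   # n x M
    if np.linalg.matrix_rank(Psi) < M:
        raise ValueError("the n x M offset matrix must have rank M")
    # Block system: [[K + n*lam*I, Psi], [Psi^T, 0]] [alpha; beta] = [y; 0].
    top = np.hstack([K + n * lam * np.eye(n), Psi])
    bottom = np.hstack([Psi.T, np.zeros((M, M))])
    rhs = np.concatenate([y, np.zeros(M)])
    sol = np.linalg.solve(np.vstack([top, bottom]), rhs)
    return sol[:n], sol[n:]

# Unpenalized offset functions: a constant and a linear term (illustrative choice).
psi_funcs = [lambda t: np.ones_like(t), lambda t: t]

x = np.linspace(0.0, 3.0, 8)
y = 2.0 + 0.5 * x + np.sin(3.0 * x)
alpha, beta = fit_with_offset(x, y, lam=1e-2, psi_funcs=psi_funcs)
```

The rank check mirrors the rank-$M$ assumption: together with a positive-definite top-left block it makes the block matrix invertible, so `beta` is uniquely determined. If `y` lies exactly in the span of the offset functions, the solve returns `alpha` equal to zero and the offset absorbs the whole fit.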
The conditions under which a representer theorem exists were investigated by Argyriou, Micchelli, and Pontil, who proved the following:
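Theorem: Let $\mathcal{X}$ be a nonempty set, $k$ a positive-definite real-valued kernel on $\mathcal{X} \times \mathcal{X}$ with corresponding reproducing kernel Hilbert space $H_k$, and let $R \colon H_k \to \mathbb{R}$ be a differentiable regularization function. Then given a training sample $(x_1, y_1), \ldots, (x_n, y_n) \in \mathcal{X} \times \mathbb{R}$ and an arbitrary error function $E \colon (\mathcal{X} \times \mathbb{R}^2)^n \to \mathbb{R} \cup \{\infty\}$, a minimizer
: $f^* = \underset{f \in H_k}{\operatorname{argmin}} \left\{ E\left( (x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n)) \right) + R(f) \right\} \quad (\ddagger)$
of the regularized empirical risk admits a representation of the form
: $f^*(\cdot) = \sum_{i=1}^n \alpha_i k(\cdot, x_i),$
where $\alpha_i \in \mathbb{R}$ for all $1 \le i \le n$, if and only if there exists a nondecreasing function $h \colon [0, \infty) \to \mathbb{R}$ for which
: $R(f) = h(\lVert f \rVert).$
Effectively, this result provides a necessary and sufficient condition on a differentiable regularizer $R(\cdot)$ under which the corresponding regularized empirical risk minimization $(\ddagger)$ will have a representer theorem. In particular, this shows that a broad class of regularized risk minimizations (much broader than those originally considered by Kimeldorf and Wahba) have representer theorems.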
Applications
Representer theorems are useful from a practical standpoint because they dramatically simplify the regularized empirical risk minimization problem