Semivariance
In spatial statistics, the theoretical variogram 2\gamma(\mathbf{s}_1,\mathbf{s}_2) is a function describing the degree of spatial dependence of a spatial random field or stochastic process Z(\mathbf{s}). The semivariogram \gamma(\mathbf{s}_1,\mathbf{s}_2) is half the variogram. As a concrete example from gold mining, a variogram gives a measure of how much two samples taken from the mining area will vary in gold percentage depending on the distance between those samples. Samples taken far apart will generally vary more than samples taken close to each other.


Definition

The semivariogram \gamma(h) was first defined by Matheron (1963) as half the average squared difference between the values at points separated by a distance h. Formally,
:\gamma(h)=\frac{1}{2V}\iiint_V \left[f(M+h) - f(M)\right]^2 dV,
where M is a point in the geometric field V, and f(M) is the value at that point. The triple integral is over 3 dimensions, and h is the separation distance of interest (e.g., in meters or km). For example, f(M) could represent the iron content in soil at some location M (with geographic coordinates of latitude, longitude, and elevation) over some region V with volume element dV. To obtain the semivariogram for a given h, all pairs of points at that exact distance would have to be sampled. In practice it is impossible to sample everywhere, so the empirical variogram is used instead.
The variogram is twice the semivariogram and can be defined, equivalently, as the variance of the difference between field values at two locations (\mathbf{s}_1 and \mathbf{s}_2; note the change of notation from M to \mathbf{s} and from f to Z) across realizations of the field (Cressie 1993):
:2\gamma(\mathbf{s}_1,\mathbf{s}_2)=\text{var}\left(Z(\mathbf{s}_1) - Z(\mathbf{s}_2)\right) = E\left[\left((Z(\mathbf{s}_1)-\mu(\mathbf{s}_1))-(Z(\mathbf{s}_2) - \mu(\mathbf{s}_2))\right)^2\right].
If the spatial random field has constant mean \mu, this is equivalent to the expectation of the squared increment of the values between locations \mathbf{s}_1 and \mathbf{s}_2 (Wackernagel 2003), where \mathbf{s}_1 and \mathbf{s}_2 are points in space and possibly time:
:2\gamma(\mathbf{s}_1,\mathbf{s}_2)=E\left[\left(Z(\mathbf{s}_1)-Z(\mathbf{s}_2)\right)^2\right].
In the case of a stationary process, the variogram and semivariogram can be represented as a function \gamma_s(h)=\gamma(0,0+h) of the difference h=\mathbf{s}_2-\mathbf{s}_1 between locations only, by the following relation (Cressie 1993):
:\gamma(\mathbf{s}_1,\mathbf{s}_2)=\gamma_s(\mathbf{s}_2-\mathbf{s}_1).
If the process is furthermore isotropic, then the variogram and semivariogram can be represented by a function \gamma_i(h):=\gamma_s(h e_1) of the distance h=\|\mathbf{s}_2-\mathbf{s}_1\| only (Cressie 1993):
:\gamma(\mathbf{s}_1,\mathbf{s}_2)=\gamma_i(h).
The indexes i and s are typically not written, and the same terms are used for all three forms of the function. Moreover, the term "variogram" is sometimes used to denote the semivariogram, and the symbol \gamma is sometimes used for the variogram, which causes some confusion.


Properties

According to Cressie (1993), Chiles and Delfiner (1999) and Wackernagel (2003), the theoretical variogram has the following properties:
* The semivariogram is nonnegative, \gamma(\mathbf{s}_1,\mathbf{s}_2)\geq 0, since it is the expectation of a square.
* The semivariogram at distance 0 is always 0, \gamma(\mathbf{s}_1,\mathbf{s}_1)=\gamma_i(0)=\frac{1}{2}E\left[(Z(\mathbf{s}_1)-Z(\mathbf{s}_1))^2\right]=0, since Z(\mathbf{s}_1)-Z(\mathbf{s}_1)=0.
* A function is a semivariogram if and only if it is a conditionally negative definite function, i.e. for all weights w_1,\ldots,w_N subject to \sum_{i=1}^N w_i=0 and locations \mathbf{s}_1,\ldots,\mathbf{s}_N it holds that
::\sum_{i=1}^N\sum_{j=1}^N w_i\gamma(\mathbf{s}_i,\mathbf{s}_j)w_j \leq 0,
: which corresponds to the fact that the variance var(X) of X=\sum_{i=1}^N w_i Z(\mathbf{s}_i) is given by the negative of this double sum and must be nonnegative.
* If the covariance function C of a stationary process exists, it is related to the variogram by
::2\gamma(\mathbf{s}_1,\mathbf{s}_2)=C(\mathbf{s}_1,\mathbf{s}_1)+C(\mathbf{s}_2,\mathbf{s}_2)-2C(\mathbf{s}_1,\mathbf{s}_2).
* If a stationary random field has no spatial dependence (i.e. C(h)=0 if h\not= 0), the semivariogram is the constant var(Z(\mathbf{s})) everywhere except at the origin, where it is zero.
* The semivariogram is a symmetric function, \gamma(\mathbf{s}_1,\mathbf{s}_2)=\frac{1}{2}E\left[|Z(\mathbf{s}_1)-Z(\mathbf{s}_2)|^2\right]=\gamma(\mathbf{s}_2,\mathbf{s}_1).
* Consequently, \gamma_s(h)=\gamma_s(-h) is an even function.
* If the random field is stationary and ergodic, the limit \lim_{h\to \infty} \gamma_s(h) = var(Z(\mathbf{s})) corresponds to the variance of the field. The limit of the semivariogram is also called its ''sill''.
* As a consequence, the semivariogram can be discontinuous only at the origin. The height of the jump at the origin is sometimes referred to as the ''nugget'' or nugget effect.
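The conditional negative definiteness property can be checked numerically. The following is a minimal sketch, not a proof: it assumes an exponential semivariogram without nugget (a model known to be valid), random 2D locations, and random weights shifted to sum to zero. The names `exp_semivariogram` and `quadratic_form` are illustrative only.

```python
import math
import random


def exp_semivariogram(h, sill=1.0, scale=1.0):
    """Exponential semivariogram without nugget: sill * (1 - exp(-h / scale))."""
    return sill * (1.0 - math.exp(-h / scale))


def quadratic_form(points, weights, gamma):
    """Return sum_i sum_j w_i * gamma(|s_i - s_j|) * w_j."""
    total = 0.0
    for si, wi in zip(points, weights):
        for sj, wj in zip(points, weights):
            total += wi * gamma(math.dist(si, sj)) * wj
    return total


rng = random.Random(0)  # fixed seed so the run is repeatable
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(20)]

# Draw arbitrary weights, then subtract their mean so that they sum to zero.
raw = [rng.gauss(0, 1) for _ in points]
mean = sum(raw) / len(raw)
weights = [w - mean for w in raw]

q = quadratic_form(points, weights, exp_semivariogram)
# For a valid semivariogram the double sum is never positive; -q is var(X).
print(q <= 1e-9)
```

The small tolerance only absorbs floating-point rounding in the weight sum. A function that produced a clearly positive double sum for some zero-sum weights could not be a semivariogram.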


Parameters

In summary, the following parameters are often used to describe variograms:
* ''nugget'' n: the height of the jump of the semivariogram at the discontinuity at the origin.
* ''sill'' s: the limit of the variogram as the lag distance tends to infinity.
* ''range'' r: the distance at which the difference of the variogram from the sill becomes negligible. In models with a fixed sill, it is the distance at which the sill is first reached; for models with an asymptotic sill, it is conventionally taken to be the distance at which the semivariance first reaches 95% of the sill.


Empirical variogram

Generally, an empirical variogram is needed for measured data, because sample information Z is not available for every location. The sample information could be, for example, the concentration of iron in soil samples, or pixel intensity on a camera. Each piece of sample information has coordinates \mathbf{s}=(x,y) for a 2D sample space, where x and y are geographical coordinates. In the case of the iron in soil, the sample space could be 3-dimensional. If there is temporal variability as well (e.g., phosphorus content in a lake), then \mathbf{s} could be a 4-dimensional vector (x,y,z,t). Where dimensions have different units (e.g., distance and time), a scaling factor B can be applied to each to obtain a modified Euclidean distance.
Sample observations are denoted Z(\mathbf{s}_i)=z_i. Samples may be taken at k different locations, which provides a set of samples z_1,\ldots,z_k at locations \mathbf{s}_1,\ldots,\mathbf{s}_k. Generally, plots show the semivariogram values as a function of sample point separation h. For the empirical semivariogram, separation distance bins h \pm \delta are used rather than exact distances, and isotropic conditions are usually assumed (i.e., that \gamma is only a function of h and does not depend on other variables such as center position). The empirical semivariogram \hat{\gamma}(h \pm \delta) can then be calculated for each bin:
:\hat{\gamma}(h \pm \delta):=\frac{1}{2|N(h \pm \delta)|}\sum_{(i,j)\in N(h \pm \delta)} |z_i-z_j|^2
In other words, all pairs of points separated by h (plus or minus some bin-width tolerance \delta) are found. These form the set of pairs
:N(h \pm \delta) \equiv \{(i,j) : h-\delta \leq \|\mathbf{s}_i-\mathbf{s}_j\| \leq h+\delta;\ i,j=1,\ldots,k\}.
The number of pairs in this bin is |N(h \pm \delta)|. For each pair of points i,j, the square of the difference in the observation (e.g., soil sample content or pixel intensity) is found, |z_i-z_j|^2. These squared differences are added together and normalized by the natural number |N(h \pm \delta)|. By definition, the result is divided by 2 to give the semivariogram at this separation.
For computational speed, only the unique pairs of points are needed. For example, for two observation pairs (z_a,z_b),(z_c,z_d) taken from locations with separation h \pm \delta, only (z_a,z_b),(z_c,z_d) need to be considered, as the reversed pairs (z_b,z_a),(z_d,z_c) do not provide any additional information.
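The binning procedure described above can be sketched in a few lines of Python. This is an illustrative implementation under simple assumptions (isotropy, Euclidean distance, bins given by their centres and a common half-width); the function name `empirical_semivariogram` and its arguments are hypothetical and do not come from any particular library.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, lags, delta):
    """Estimate the semivariogram in the bins [h - delta, h + delta], h in lags.

    Returns a list of (h, gamma_hat, n_pairs); gamma_hat is None for empty bins.
    """
    sums = [0.0] * len(lags)
    counts = [0] * len(lags)
    # combinations() yields each unordered pair once, so (i, j) and (j, i)
    # are never both visited.
    for (si, zi), (sj, zj) in combinations(zip(coords, values), 2):
        d = math.dist(si, sj)
        for b, h in enumerate(lags):
            # If bins overlap (delta larger than half the lag spacing),
            # a pair contributes to every bin it falls into.
            if abs(d - h) <= delta:
                sums[b] += (zi - zj) ** 2
                counts[b] += 1
    return [
        (h, sums[b] / (2 * counts[b]) if counts[b] else None, counts[b])
        for b, h in enumerate(lags)
    ]


# Four samples on a line, one unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
for h, gamma_hat, n_pairs in empirical_semivariogram(coords, values, [1.0, 2.0], 0.5):
    print(h, gamma_hat, n_pairs)
```

The double loop over pairs costs on the order of k^2 distance evaluations, which is fine for small data sets; larger ones usually call for a spatial index or vectorised distance computation.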


Variogram models

The empirical variogram cannot be computed at every lag distance h, and due to variation in the estimation it is not guaranteed to be a valid variogram as defined above. However, some geostatistical methods such as kriging need valid semivariograms. In applied geostatistics the empirical variograms are thus often approximated by a model function ensuring validity (Chiles & Delfiner 1999). Some important models are (Chiles & Delfiner 1999, Cressie 1993):
* The exponential variogram model
*: \gamma(h)=(s-n)(1-\exp(-h/(ra)))+n 1_{(0,\infty)}(h).
* The spherical variogram model
*: \gamma(h)=(s-n)\left(\left(\frac{3h}{2r}-\frac{h^3}{2r^3}\right)1_{(0,r)}(h)+1_{[r,\infty)}(h)\right)+n1_{(0,\infty)}(h).
* The Gaussian variogram model
*: \gamma(h)=(s-n)\left(1-\exp\left(-\frac{h^2}{r^2a}\right)\right) + n1_{(0,\infty)}(h).
The parameter a has different values in different references, due to the ambiguity in the definition of the range. For example, a=1/3 is the value used in Chiles & Delfiner (1999). The indicator function 1_A(h) is 1 if h\in A and 0 otherwise.
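The three models can be written directly as functions of the lag h. The sketch below assumes the parameterisation given above with nugget n, sill s and range r, and uses a=1/3 as the default; the function names are illustrative. The indicator terms are handled with explicit branches for h = 0 and, in the spherical case, h >= r.

```python
import math


def exponential_model(h, nugget, sill, r, a=1.0 / 3.0):
    """(s - n) * (1 - exp(-h / (r * a))) + n for h > 0, and 0 at h = 0."""
    if h <= 0:
        return 0.0
    return (sill - nugget) * (1.0 - math.exp(-h / (r * a))) + nugget


def spherical_model(h, nugget, sill, r):
    """Rises as 3h/(2r) - h^3/(2r^3) up to h = r, then stays at the sill."""
    if h <= 0:
        return 0.0
    if h >= r:
        return sill
    return (sill - nugget) * (1.5 * h / r - 0.5 * (h / r) ** 3) + nugget


def gaussian_model(h, nugget, sill, r, a=1.0 / 3.0):
    """(s - n) * (1 - exp(-h^2 / (r^2 * a))) + n for h > 0, and 0 at h = 0."""
    if h <= 0:
        return 0.0
    return (sill - nugget) * (1.0 - math.exp(-(h * h) / (r * r * a))) + nugget


# Evaluate each model at the range r = 5 with nugget 0.1 and sill 1.0.
for model in (exponential_model, spherical_model, gaussian_model):
    print(model.__name__, model(5.0, 0.1, 1.0, 5.0))
```

With a=1/3 the exponential and Gaussian models cover a fraction 1 - exp(-3), about 95%, of the distance from nugget to sill at h = r, in line with the 95% convention for asymptotic sills; with a zero nugget this is 95% of the sill itself. The spherical model reaches the sill exactly at h = r.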


Discussion

Three functions are used in geostatistics for describing the spatial or the temporal correlation of observations: the correlogram, the covariance and the semivariogram. The last is also more simply called the variogram. The variogram is the key function in geostatistics, as it is used to fit a model of the temporal or spatial correlation of the observed phenomenon. A distinction is thus made between the ''experimental variogram'', which is a visualisation of a possible spatial/temporal correlation, and the ''variogram model'', which is further used to define the weights of the kriging function. Note that the experimental variogram is an empirical estimate of the covariance of a Gaussian process. As such, it may not be positive definite and hence not directly usable in kriging without constraints or further processing. This explains why only a limited number of variogram models are used: most commonly, the linear, the spherical, the Gaussian and the exponential models.


Applications

The empirical variogram is used in geostatistics as a first estimate of the variogram model needed for spatial interpolation by kriging.
* Empirical variograms for the spatiotemporal variability of column-averaged carbon dioxide were used to determine coincidence criteria for satellite and ground-based measurements.
* Empirical variograms were calculated for the density of a heterogeneous material (Gilsocarbon).
* Empirical variograms are calculated from observations of strong ground motion from earthquakes. These models are used for seismic risk and loss assessments of spatially distributed infrastructure.


Related concepts

The squared term in the variogram, for instance (Z(\mathbf{s}_1) - Z(\mathbf{s}_2))^2, can be replaced with different powers. A ''madogram'' is defined with the absolute difference, |Z(\mathbf{s}_1) - Z(\mathbf{s}_2)|, and a ''rodogram'' is defined with the square root of the absolute difference, |Z(\mathbf{s}_1) - Z(\mathbf{s}_2)|^{0.5}. Estimators based on these lower powers are said to be more resistant to outliers. They can be generalized as a "variogram of order ''α''",
:2\gamma(\mathbf{s}_1,\mathbf{s}_2)=E\left[\left|Z(\mathbf{s}_1)-Z(\mathbf{s}_2)\right|^\alpha\right],
in which a variogram is of order 2, a madogram is a variogram of order 1, and a rodogram is a variogram of order 0.5.

When a variogram is used to describe the correlation of different variables it is called a ''cross-variogram''. Cross-variograms are used in co-kriging. If the variable is binary or represents classes of values, one speaks of ''indicator variograms'', which are used in indicator kriging.
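As a rough illustration of the order-''α'' idea, the Python sketch below pools all unique pairs of a small sample and averages |z_i - z_j|^α, then halves the result. It deliberately ignores lag binning, which a practical estimator would use. The function name `order_alpha_semivariance` and the sample values are assumptions made for this example only.

```python
import numpy as np

def order_alpha_semivariance(z, alpha=2.0):
    """Half the mean of |z_i - z_j|**alpha over all unique pairs (no lag binning)."""
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)  # index pairs with i < j
    return 0.5 * np.mean(np.abs(z[i] - z[j]) ** alpha)

# Hypothetical samples: the second one contains a single outlier.
clean = [1.0, 2.0, 3.0, 4.0]
contaminated = [1.0, 2.0, 3.0, 40.0]

# How strongly the outlier inflates each estimator (order 2, 1 and 0.5).
inflation = {
    alpha: order_alpha_semivariance(contaminated, alpha)
    / order_alpha_semivariance(clean, alpha)
    for alpha in (2.0, 1.0, 0.5)
}
print(inflation)
```

For these particular numbers the inflation factor shrinks as the order decreases, which is the sense in which the madogram and rodogram are described as more resistant to outliers.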


References


Further reading

* Cressie, N., 1993, ''Statistics for spatial data'', Wiley Interscience.
* Chiles, J. P., P. Delfiner, 1999, ''Geostatistics, Modelling Spatial Uncertainty'', Wiley-Interscience.
* Wackernagel, H., 2003, ''Multivariate Geostatistics'', Springer.
* Burrough, P. A. and McDonnell, R. A., 1998, ''Principles of Geographical Information Systems''.
* Clark, I., 1979, ''Practical Geostatistics'', Applied Science Publishers.
* David, M., 1978, ''Geostatistical Ore Reserve Estimation'', Elsevier Publishing.
* Hald, A., 1952, ''Statistical Theory with Engineering Applications'', John Wiley & Sons, New York.
* Journel, A. G. and Huijbregts, Ch. J., 1978, ''Mining Geostatistics'', Academic Press.
* Glass, H.J., 2003, Method for assessing quality of the variogram, ''The Journal of The South African Institute of Mining and Metallurgy''.


External links


AI-GEOSTATS: an educational resource about geostatistics and spatial statistics


* The semivariogram is a symmetric function, \gamma(\mathbf{s}_1,\mathbf{s}_2)=\gamma(\mathbf{s}_2,\mathbf{s}_1), since |Z(\mathbf{s}_1)-Z(\mathbf{s}_2)|^2 is unchanged when the two locations are swapped.
* Consequently, \gamma_s(h)=\gamma_s(-h) is an even function.
* If the random field is stationary and ergodic, the limit \lim_{h\to\infty} \gamma_s(h) = \operatorname{var}(Z(\mathbf{s})) corresponds to the variance of the field. The limit of the semivariogram is also called its ''sill''.
* The semivariogram can be discontinuous only at the origin. The height of the jump at the origin is sometimes referred to as the ''nugget'' or nugget effect.


Parameters

In summary, the following parameters are often used to describe variograms:
* ''nugget'' n: The height of the jump of the semivariogram at the discontinuity at the origin.
* ''sill'' s: The limit of the variogram as the lag distance tends to infinity.
* ''range'' r: The distance at which the difference of the variogram from the sill becomes negligible. In models with a fixed sill, it is the distance at which the sill is first reached; for models with an asymptotic sill, it is conventionally taken to be the distance at which the semivariance first reaches 95% of the sill.


Empirical variogram

Generally, an empirical variogram is needed for measured data, because sample information Z is not available for every location. The sample information could be, for example, the concentration of iron in soil samples or pixel intensity on a camera. Each piece of sample information has coordinates \mathbf{s}=(x,y) for a 2D sample space, where x and y are geographical coordinates. In the case of the iron in soil, the sample space could be 3-dimensional. If there is temporal variability as well (e.g., phosphorus content in a lake) then \mathbf{s} could be a 4-dimensional vector (x,y,z,t). Where dimensions have different units (e.g., distance and time), a scaling factor B can be applied to each to obtain a modified Euclidean distance.

Sample observations are denoted Z(\mathbf{s}_i)=z_i. Samples may be taken at k different locations in total. This provides a set of samples z_1,\ldots,z_k at locations \mathbf{s}_1,\ldots,\mathbf{s}_k. Generally, plots show the semivariogram values as a function of sample point separation h. In the case of the empirical semivariogram, separation distance bins h \pm \delta are used rather than exact distances, and usually isotropic conditions are assumed (i.e., that \gamma is only a function of h and does not depend on other variables such as center position). Then, the empirical semivariogram \hat{\gamma}(h \pm \delta) can be calculated for each bin:
:\hat{\gamma}(h \pm \delta):=\frac{1}{2|N(h \pm \delta)|}\sum_{(i,j)\in N(h \pm \delta)} |z_i-z_j|^2

In other words, every pair of points separated by h (plus or minus some bin width tolerance \delta) is found. These pairs form the set N(h \pm \delta) \equiv \{(i,j) : h-\delta \le |\mathbf{s}_i-\mathbf{s}_j| \le h+\delta\}. The number of pairs in this bin is |N(h \pm \delta)|. Then for each pair i,j, the square of the difference in the observation (e.g., soil sample content or pixel intensity) is found (|z_i-z_j|^2). These squared differences are added together and normalized by the natural number |N(h \pm \delta)|. By definition the result is divided by 2 for the semivariogram at this separation.

For computational speed, only the unique pairs of points are needed. For example, for 2 observation pairs (z_a,z_b),(z_c,z_d) taken from locations with separation h \pm \delta, only (z_a,z_b),(z_c,z_d) need to be considered, as the pairs (z_b,z_a),(z_d,z_c) do not provide any additional information.
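The binning procedure described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `empirical_semivariogram`, the argument names, and the choice to return NaN for empty bins are all assumptions of this example. Isotropy is assumed, so only the Euclidean distance between locations is used.

```python
import numpy as np

def empirical_semivariogram(coords, values, lags, delta):
    """Empirical semivariogram for bins h +/- delta, using unique pairs only.

    coords : array of shape (k, d), sample locations
    values : array of shape (k,), observations z_i
    lags   : bin centres h
    delta  : half-width of each bin
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)       # unique pairs, i < j
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2

    gamma = np.full(len(lags), np.nan)             # NaN marks an empty bin
    counts = np.zeros(len(lags), dtype=int)
    for b, h in enumerate(lags):
        in_bin = (dist >= h - delta) & (dist <= h + delta)
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            # sum of squared differences, divided by 2 |N(h +/- delta)|
            gamma[b] = sqdiff[in_bin].sum() / (2 * counts[b])
    return gamma, counts

# Hypothetical data: three samples along a line.
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
values = [1.0, 3.0, 6.0]
gamma, counts = empirical_semivariogram(coords, values, lags=[1.0, 2.0], delta=0.1)
print(gamma, counts)
```

Counting each unordered pair once gives the same value as counting both orderings, because the sum and the pair count both double.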


Variogram models

The empirical variogram cannot be computed at every lag distance h and, due to variation in the estimation, it is not ensured that it is a valid variogram, as defined above. However, some geostatistical methods such as kriging need valid semivariograms. In applied geostatistics the empirical variograms are thus often approximated by model functions ensuring validity (Chiles&Delfiner 1999). Some important models are (Chiles&Delfiner 1999, Cressie 1993):
* The exponential variogram model
*: \gamma(h)=(s-n)(1-\exp(-h/(ra)))+n 1_{(0,\infty)}(h).
* The spherical variogram model
*: \gamma(h)=(s-n)\left(\left(\frac{3h}{2r}-\frac{h^3}{2r^3}\right)1_{(0,r)}(h)+1_{[r,\infty)}(h)\right)+n1_{(0,\infty)}(h).
* The Gaussian variogram model
*: \gamma(h)=(s-n)\left(1-\exp\left(-\frac{h^2}{r^2a}\right)\right) + n1_{(0,\infty)}(h).
The parameter a has different values in different references, due to the ambiguity in the definition of the range. E.g. a=1/3 is the value used in (Chiles&Delfiner 1999). The 1_A(h) function is 1 if h\in A and 0 otherwise.
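To make the role of the nugget n, sill s, range r and the constant a concrete, here is a small Python sketch of the three model functions as they are commonly written. The function names and the default a=1/3 are choices made for this illustration; as noted above, other references use other values of a.

```python
import numpy as np

def exponential_model(h, n, s, r, a=1/3):
    """Exponential model: approaches the sill s asymptotically."""
    h = np.asarray(h, dtype=float)
    return (s - n) * (1 - np.exp(-h / (r * a))) + n * (h > 0)

def spherical_model(h, n, s, r):
    """Spherical model: reaches the sill s exactly at h = r."""
    h = np.asarray(h, dtype=float)
    rising = (1.5 * h / r - 0.5 * (h / r) ** 3) * ((h > 0) & (h < r))
    return (s - n) * (rising + (h >= r)) + n * (h > 0)

def gaussian_model(h, n, s, r, a=1/3):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    h = np.asarray(h, dtype=float)
    return (s - n) * (1 - np.exp(-h**2 / (r**2 * a))) + n * (h > 0)

# Hypothetical parameters: no nugget, unit sill, range 10.
h = np.array([0.0, 5.0, 10.0, 20.0])
for model in (exponential_model, spherical_model, gaussian_model):
    print(model.__name__, model(h, n=0.0, s=1.0, r=10.0))
```

With a=1/3 the exponential and Gaussian models evaluate to 1-\exp(-3), roughly 0.95 of the sill, at h=r, which matches the 95% convention for the range of models with an asymptotic sill. The indicator term n * (h > 0) makes each model equal to 0 at h=0 and jump by the nugget immediately after.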


Discussion

Three functions are used in geostatistics for describing the spatial or the temporal correlation of observations: these are the correlogram, the covariance and the semivariogram. The last is also more simply called the variogram. The variogram is the key function in geostatistics as it will be used to fit a model of the temporal/spatial correlation of the observed phenomenon. One thus distinguishes between the ''experimental variogram'', which is a visualisation of a possible spatial/temporal correlation, and the ''variogram model'', which is further used to define the weights of the kriging function. Note that the experimental variogram is an empirical estimate of the covariance of a Gaussian process. As such, it may not be positive definite and hence not directly usable in kriging without constraints or further processing. This explains why only a limited number of variogram models are used: most commonly, the linear, the spherical, the Gaussian and the exponential models.

