Sobol sequences (also called LP
τ sequences or (''t'', ''s'') sequences in base 2) are an example of quasi-random
low-discrepancy sequence In mathematics, a low-discrepancy sequence is a sequence with the property that for all values of ''N'', its subsequence ''x''1, ..., ''x'N'' has a low discrepancy.
Roughly speaking, the discrepancy of a sequence is low if the proportion of poi ...
s. They were first introduced by the Russian mathematician
Ilya M. Sobol
Ilya Meyerovich Sobol’ (russian: Илья Меерович Соболь; born 15 August 1926) is a Russian mathematician, known for his work on Monte Carlo methods. His research spans several applications, from nuclear studies to astrophysics, ...
(Илья Меерович Соболь) in 1967.
[Sobol,I.M.
(1967), "Distribution of points in a cube and approximate evaluation of integrals". ''Zh. Vych. Mat. Mat. Fiz.'' 7: 784–802 (in Russian); ''U.S.S.R Comput. Maths. Math. Phys.'' 7: 86–112 (in English).]
These sequences use a base of two to form successively finer uniform partitions of the unit interval and then reorder the coordinates in each dimension.
Good distributions in the ''s''-dimensional unit hypercube
Let ''I
s'' =
,1sup>''s'' be the ''s''-dimensional unit hypercube, and ''f'' a real integrable function over ''I
s''. The original motivation of Sobol was to construct a sequence ''x
n'' in ''I
s'' so that
:
and the convergence be as fast as possible.
It is more or less clear that for the sum to converge towards the integral, the points ''x
n'' should fill ''I
s'' minimizing the holes. Another good property would be that the projections of ''x
n'' on a lower-dimensional face of ''I
s'' leave very few holes as well. Hence the homogeneous filling of ''I
s'' does not qualify because in lower dimensions many points will be at the same place, therefore useless for the integral estimation.
These good distributions are called (''t'',''m'',''s'')-nets and (''t'',''s'')-sequences in base ''b''. To introduce them, define first an elementary ''s''-interval in base ''b'' a subset of ''I
s'' of the form
:
where ''a
j'' and ''d
j'' are non-negative integers, and
for all ''j'' in .
Given 2 integers
, a (''t'',''m'',''s'')-net in base ''b'' is a sequence ''x
n'' of ''b
m'' points of ''I
s'' such that
for all elementary interval ''P'' in base ''b'' of hypervolume ''λ''(''P'') = ''b
t−m''.
Given a non-negative integer ''t'', a (''t'',''s'')-sequence in base ''b'' is an infinite sequence of points ''x
n'' such that for all integers
, the sequence
is a (''t'',''m'',''s'')-net in base ''b''.
In his article, Sobol described ''Π
τ-meshes'' and ''LP
τ sequences'', which are (''t'',''m'',''s'')-nets and (''t'',''s'')-sequences in base 2 respectively. The terms (''t'',''m'',''s'')-nets and (''t'',''s'')-sequences in base ''b'' (also called Niederreiter sequences) were coined in 1988 by
Harald Niederreiter
Harald G. Niederreiter (born June 7, 1944) is an Austrian mathematician known for his work in discrepancy theory, algebraic geometry, quasi-Monte Carlo methods, and cryptography.
Education and career
Niederreiter was born on June 7, 1944, in Vi ...
.
[ Niederreiter, H. (1988). "Low-Discrepancy and Low-Dispersion Sequences", ''Journal of Number Theory'' 30: 51–70.] The term ''Sobol sequences'' was introduced in late English-speaking papers in comparison with
Halton, Faure and other low-discrepancy sequences.
A fast algorithm
A more efficient
Gray code
The reflected binary code (RBC), also known as reflected binary (RB) or Gray code after Frank Gray, is an ordering of the binary numeral system such that two successive values differ in only one bit (binary digit).
For example, the representati ...
implementation was proposed by Antonov and Saleev.
[Antonov, I.A. and Saleev, V.M. (1979) "An economic method of computing LPτ-sequences". ''Zh. Vych. Mat. Mat. Fiz.'' 19: 243–245 (in Russian); ''U.S.S.R. Comput. Maths. Math. Phys.'' 19: 252–256 (in English).]
As for the generation of Sobol numbers, they are clearly aided by the use of Gray code
instead of ''n'' for constructing the ''n''-th point draw.
Suppose we have already generated all the Sobol sequence draws up to ''n'' − 1 and kept in memory the values ''x''
''n''−1,''j'' for all the required dimensions. Since the Gray code ''G''(''n'') differs from that of the preceding one ''G''(''n'' − 1) by just a single, say the ''k''-th, bit (which is a rightmost zero bit of ''n'' − 1), all that needs to be done is a single
XOR
Exclusive or or exclusive disjunction is a logical operation that is true if and only if its arguments differ (one is true, the other is false).
It is symbolized by the prefix operator J and by the infix operators XOR ( or ), EOR, EXOR, , ...
operation for each dimension in order to propagate all of the ''x''
''n''−1 to ''x''
''n'', i.e.
:
Additional uniformity properties
Sobol introduced additional uniformity conditions known as property A and A’.
[Sobol, I. M. (1976) "Uniformly distributed sequences with an additional uniform property". ''Zh. Vych. Mat. Mat. Fiz.'' 16: 1332–1337 (in Russian); ''U.S.S.R. Comput. Maths. Math. Phys.'' 16: 236–242 (in English).]
;Definition: A low-discrepancy sequence is said to satisfy Property A if for any binary segment (not an arbitrary subset) of the ''d''-dimensional sequence of length 2
''d'' there is exactly one draw in each 2
''d'' hypercubes that result from subdividing the unit hypercube along each of its length extensions into half.
;Definition: A low-discrepancy sequence is said to satisfy Property A’ if for any binary segment (not an arbitrary subset) of the ''d''-dimensional sequence of length 4
''d'' there is exactly one draw in each 4
''d'' hypercubes that result from subdividing the unit hypercube along each of its length extensions into four equal parts.
There are mathematical conditions that guarantee properties A and A'.
;Theorem: The ''d''-dimensional Sobol sequence possesses Property A iff
::
:where V
''d'' is the ''d'' × ''d'' binary matrix defined by
::
:with ''v''
''k'',''j'',''m'' denoting the ''m''-th digit after the binary point of the direction number ''v''
''k'',''j'' = (0.''v''
''k'',''j'',1''v''
''k'',''j'',2...)
2.
;Theorem: The ''d''-dimensional Sobol sequence possesses Property A' iff
::
:where U
''d'' is the 2''d'' × 2''d'' binary matrix defined by
::
:with ''v''
''k'',''j'',''m'' denoting the ''m''-th digit after the binary point of the direction number ''v''
''k'',''j'' = (0.''v''
''k'',''j'',1''v''
''k'',''j'',2...)
2.
Tests for properties A and A’ are independent. Thus it is possible to construct the Sobol sequence that satisfies both properties A and A’ or only one of them.
The initialisation of Sobol numbers
To construct a Sobol sequence, a set of direction numbers ''v''
''i'',''j'' needs to be selected. There is some freedom in the selection of initial direction numbers.
[These numbers are usually called ''initialisation numbers''.] Therefore, it is possible to receive different realisations of the Sobol sequence for selected dimensions. A bad selection of initial numbers can considerably reduce the efficiency of Sobol sequences when used for computation.
Arguably the easiest choice for the initialisation numbers is just to have the ''l''-th leftmost bit set, and all other bits to be zero, i.e. ''m''
''k'',''j'' = 1 for all ''k'' and ''j''. This initialisation is usually called ''unit initialisation''. However, such a sequence fails the test for Property A and A’ even for low dimensions and hence this initialisation is bad.
Implementation and availability
Good initialisation numbers for different numbers of dimensions are provided by several authors. For example, Sobol provides initialisation numbers for dimensions up to 51.
[Sobol, I.M. and Levitan, Y.L. (1976). "The production of points uniformly distributed in a multidimensional cube" ''Tech. Rep. 40, Institute of Applied Mathematics, USSR Academy of Sciences'' (in Russian).] The same set of initialisation numbers is used by Bratley and Fox.
[Bratley, P. and Fox, B. L. (1988), "Algorithm 659: Implementing Sobol’s quasirandom sequence generator". ''ACM Trans. Math. Software'' 14: 88–100.]
Initialisation numbers for high dimensions are available on Joe and Kuo.
Peter Jäckel provides initialisation numbers up to dimension 32 in his book "
Monte Carlo methods in finance Monte Carlo methods are used in corporate finance and mathematical finance to value and analyze (complex) instruments, portfolios and investments by simulating the various sources of uncertainty affecting their value, and then determining the dist ...
".
[Jäckel, P. (2002) "Monte Carlo methods in finance". New York: ]John Wiley and Sons
John Wiley & Sons, Inc., commonly known as Wiley (), is an American multinational publishing company founded in 1807 that focuses on academic publishing and instructional materials. The company produces books, journals, and encyclopedias, in p ...
. (.)
Other implementations are available as C, Fortran 77, or Fortran 90 routines in the
Numerical Recipes
''Numerical Recipes'' is the generic title of a series of books on algorithms and numerical analysis by William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery. In various editions, the books have been in print since 1 ...
collection of software.
[Press, W.H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992) "Numerical Recipes in Fortran 77: The Art of Scientific Computing, 2nd ed." ''Cambridge University Press, Cambridge, U.K.''] A
free/open-source implementation in up to 1111 dimensions, based on the Joe and Kuo initialisation numbers, is available in C, and up to 21201 dimensions in Python and
Julia
Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
. A different free/open-source implementation in up to 1111 dimensions is available for
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
,
Fortran 90,
Matlab
MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...
, and
Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
.
Finally, commercial Sobol sequence generators are available within, for example, the
NAG Library.
A version is available from the British-Russian Offshore Development Agency (BRODA).
MATLAB also contains an implementation
sobolset reference page
Retrieved 2017-07-24. as part of its Statistics Toolbox.
See also
*Low-discrepancy sequence In mathematics, a low-discrepancy sequence is a sequence with the property that for all values of ''N'', its subsequence ''x''1, ..., ''x'N'' has a low discrepancy.
Roughly speaking, the discrepancy of a sequence is low if the proportion of poi ...
s
*Quasi-Monte Carlo method
In numerical analysis, the quasi-Monte Carlo method is a method for numerical integration and solving some other problems using low-discrepancy sequences (also called quasi-random sequences or sub-random sequences). This is in contrast to the regu ...
Notes
References
External links
Collected Algorithms of the ACM
''(See algorithms 647, 659, and 738.)''
Collection of Sobol sequences generator programming codes
Freeware C++ generator of Sobol sequence
{{DEFAULTSORT:Sobol Sequence
Low-discrepancy sequences
Sequences and series