HOME

TheInfoList



OR:

Time Warp Edit Distance (TWED) is a measure of similarity (or dissimilarity) for discrete
time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. E ...
matching with
time Time is the continued sequence of existence and events that occurs in an apparently irreversible succession from the past, through the present, into the future. It is a component quantity of various measurements used to sequence events, t ...
' elasticity'. In comparison to other distance measures, (e.g. DTW ( dynamic time warping) or LCS (
longest common subsequence problem The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring problem: unlike substrings, su ...
)), TWED is a metric. Its computational time complexity is O(n^2), but can be drastically reduced in some specific situations by using a corridor to reduce the search space. Its
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...
space complexity The space complexity of an algorithm or a computer program is the amount of memory space required to solve an instance of the computational problem as a function of characteristics of the input. It is the memory required by an algorithm until it ex ...
can be reduced to O(n). It was first proposed in 2009 by P.-F. Marteau.


Definition

\delta_(A^p_1,B^q_1) = Min \begin \delta_(A^_1,B^q_1) + \Gamma(a^_p \to \Lambda) & \rm \\ \delta_(A^_1,B^_1) + \Gamma(a^_p \to b^_q) & \rm\\ \delta_(A^_1,B^_1) + \Gamma(\Lambda \to b^_q) & \rm \end
whereas \Gamma(\alpha^_p \to \Lambda) = d_(a^_, a^_) + \nu \cdot (t_ - t_) + \lambda
\Gamma(\alpha^_p \to b^_q) = d_(a^_p, b^_q) + d_(a^_, b^_) + \nu \cdot (, t_ - t_, + , t_ - t_, )
\Gamma(\Lambda \to b^_q) = d_(b^_, b^_) + \nu \cdot (t_ - t_) + \lambda Whereas the
recursion Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematic ...
\delta_ is initialized as:
\delta_(A^0_1,B^0_1) = 0,
\delta_(A^0_1,B^j_1) = \infty\ \rmj \ge 1
\delta_(A^i_1,B^0_1) = \infty\ \rmi \ge 1
with a'_0 = b'_0 = 0


Implementations

An implementation of the TWED
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
in C with a Python wrapper is available at TWED is also implemented into the Time Series Subsequence Search Python package (TSSEARCH for short) available a

An R (programming language), R implementation of TWED has been integrated into the TraMineR, a R package for
mining Mining is the extraction of valuable minerals or other geological materials from the Earth, usually from an ore body, lode, vein, seam, reef, or placer deposit. The exploitation of these deposits for raw material is based on the economic ...
, describing and visualizing sequences of states or events, and more generally discrete sequence
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpret ...
. Additionally
cuTWED
is a
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ...
- accelerated implementation of TWED which uses an improved algorithm due to G. Wright (2020). This method is linear in memory and massively parallelized. cuTWED is written in CUDA C/ C++, comes with Python bindings, and also includes Python bindings for Marteau's reference C implementation.


Python

import numpy as np def dlp(A, B, p=2): cost = np.sum(np.power(np.abs(A - B), p)) return np.power(cost, 1 / p) def twed(A, timeSA, B, timeSB, nu, _lambda): # istance, DP= TWED( A, timeSA, B, timeSB, lambda, nu ) # Compute Time Warp Edit Distance (TWED) for given time series A and B # # A := Time series A (e.g. 10 2 30 4 # timeSA := Time stamp of time series A (e.g. 1:4) # B := Time series B # timeSB := Time stamp of time series B # lambda := Penalty for deletion operation # nu := Elasticity parameter - nu >=0 needed for distance measure # Reference : # Marteau, P.; F. (2009). "Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching". # IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (2): 306–318. arXiv:cs/0703033 # http://people.irisa.fr/Pierre-Francois.Marteau/ # Check if input arguments if len(A) != len(timeSA): print("The length of A is not equal length of timeSA") return None, None if len(B) != len(timeSB): print("The length of B is not equal length of timeSB") return None, None if nu < 0: print("nu is negative") return None, None # Add padding A = np.array( + list(A)) timeSA = np.array( + list(timeSA)) B = np.array( + list(B)) timeSB = np.array( + list(timeSB)) n = len(A) m = len(B) # Dynamical programming DP = np.zeros((n, m)) # Initialize DP Matrix and set first row and column to infinity DP , := np.inf DP
, 0 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
= np.inf DP
, 0 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
= 0 # Compute minimal cost for i in range(1, n): for j in range(1, m): # Calculate and save cost of various operations C = np.ones((3, 1)) * np.inf # Deletion in A C = ( DP - 1, j + dlp(A - 1 A + nu * (timeSA - timeSA - 1 + _lambda ) # Deletion in B C = ( DP
, j - 1 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
+ dlp(B - 1 B + nu * (timeSB - timeSB - 1 + _lambda ) # Keep data points in both time series C = ( DP - 1, j - 1 + dlp(A B + dlp(A - 1 B - 1 + nu * (abs(timeSA - timeSB + abs(timeSA - 1- timeSB - 1) ) # Choose the operation with the minimal cost and update DP Matrix DP
, j The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
= np.min(C) distance = DP - 1, m - 1 return distance, DP
Backtracking, to find the most
cost-efficient Cost efficiency (or cost optimality), in the context of parallel computer algorithms, refers to a measure of how effectively parallel computing can be used to solve a particular problem. A parallel algorithm is considered cost efficient if its asy ...
path: def backtracking(DP): # best_path = BACKTRACKING ( DP ) # Compute the most cost-efficient path # DP := DP matrix of the TWED function x = np.shape(DP) i = x - 1 j = x - 1 # The indices of the paths are save in opposite direction # path = np.ones((i + j, 2 )) * np.inf; best_path = [] steps = 0 while i != 0 or j != 0: best_path.append((i - 1, j - 1)) C = np.ones((3, 1)) * np.inf # Keep data points in both time series C = DP - 1, j - 1 # Deletion in A C = DP - 1, j # Deletion in B C = DP
, j - 1 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
# Find the index for the lowest cost idx = np.argmin(C) if idx

0: # Keep data points in both time series i = i - 1 j = j - 1 elif idx

1: # Deletion in A i = i - 1 j = j else: # Deletion in B i = i j = j - 1 steps = steps + 1 best_path.append((i - 1, j - 1)) best_path.reverse() return best_path :


MATLAB

function istance, DP= twed(A, timeSA, B, timeSB, lambda, nu) % istance, DP= TWED( A, timeSA, B, timeSB, lambda, nu ) % Compute Time Warp Edit Distance (TWED) for given time series A and B % % A := Time series A (e.g. 10 2 30 4 % timeSA := Time stamp of time series A (e.g. 1:4) % B := Time series B % timeSB := Time stamp of time series B % lambda := Penalty for deletion operation % nu := Elasticity parameter - nu >=0 needed for distance measure % % Code by: P.-F. Marteau - http://people.irisa.fr/Pierre-Francois.Marteau/ % Check if input arguments if length(A) ~= length(timeSA) warning('The length of A is not equal length of timeSA') return end if length(B) ~= length(timeSB) warning('The length of B is not equal length of timeSB') return end if nu < 0 warning('nu is negative') return end % Add padding A = A timeSA = timeSA B = B timeSB = timeSB % Dynamical programming DP = zeros(length(A), length(B)); % Initialize DP Matrix and set first row and column to infinity DP(1, :) = inf; DP(:, 1) = inf; DP(1, 1) = 0; n = length(timeSA); m = length(timeSB); % Compute minimal cost for i = 2:n for j = 2:m cost = Dlp(A(i), B(j)); % Calculate and save cost of various operations C = ones(3, 1) * inf; % Deletion in A C(1) = DP(i - 1, j) + Dlp(A(i - 1), A(i)) + nu * (timeSA(i) - timeSA(i - 1)) + lambda; % Deletion in B C(2) = DP(i, j - 1) + Dlp(B(j - 1), B(j)) + nu * (timeSB(j) - timeSB(j - 1)) + lambda; % Keep data points in both time series C(3) = DP(i - 1, j - 1) + Dlp(A(i), B(j)) + Dlp(A(i - 1), B(j - 1)) + ... nu * (abs(timeSA(i) - timeSB(j)) + abs(timeSA(i - 1) - timeSB(j - 1))); % Choose the operation with the minimal cost and update DP Matrix DP(i, j) = min(C); end end distance = DP(n, m); % Function to calculate euclidean distance function ost= Dlp(A, B) cost = sqrt(sum((A - B) .^ 2, 2)); end end Backtracking, to find the most cost-efficient path: function ath= backtracking(DP) %
path A path is a route for physical travel – see Trail. Path or PATH may also refer to: Physical paths of different types * Bicycle path * Bridle path, used by people on horseback * Course (navigation), the intended path of a vehicle * Desire ...
= BACKTRACKING ( DP ) % Compute the most cost-efficient path % DP := DP matrix of the TWED function x = size(DP); i = x(1); j = x(2); % The indices of the paths are save in opposite direction path = ones(i + j, 2) * Inf; steps = 1; while (i ~= 1 , , j ~= 1) path(steps, :) = ; j C = ones(3, 1) * inf; % Keep data points in both time series C(1) = DP(i - 1, j - 1); % Deletion in A C(2) = DP(i - 1, j); % Deletion in B C(3) = DP(i, j - 1); % Find the index for the lowest cost , idx= min(C); switch idx case 1 % Keep data points in both time series i = i - 1; j = j - 1; case 2 % Deletion in A i = i - 1; j = j; case 3 % Deletion in B i = i; j = j - 1; end steps = steps + 1; end path(steps, :) = j % Path was calculated in reversed direction. path = path(1:steps, :); path = path(end: - 1:1, :); end


References

*{{cite journal , last1=Marteau , first1=P. , last2=F. , year=2009 , title=Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching , journal=IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=31 , issue=2 , pages=306–318 , doi=10.1109/TPAMI.2008.76 , pmid=19110495 , arxiv=cs/0703033, s2cid=10049446 Time series Algorithms