Time Warp Edit Distance (TWED) is a
measure of similarity (or dissimilarity) for discrete
time series
In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. E ...
matching with
time
Time is the continued sequence of existence and events that occurs in an apparently irreversible succession from the past, through the present, into the future. It is a component quantity of various measurements used to sequence events, t ...
'
elasticity'. In comparison to other distance measures, (e.g. DTW (
dynamic time warping) or LCS (
longest common subsequence problem
The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring problem: unlike substrings, su ...
)), TWED is a
metric. Its
computational time complexity is
, but can be drastically reduced in some specific situations by using a corridor to reduce the search space. Its
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered ...
space complexity
The space complexity of an algorithm or a computer program is the amount of memory space required to solve an instance of the computational problem as a function of characteristics of the input. It is the memory required by an algorithm until it ex ...
can be reduced to
. It was first proposed in 2009 by P.-F. Marteau.
Definition
whereas
Whereas the
recursion
Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematic ...
is initialized as:
with
Implementations
An implementation of the TWED
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
in
C with a
Python wrapper is available at
TWED is also implemented into the Time Series Subsequence Search Python package (TSSEARCH for short) available a
An
R (programming language), R implementation of TWED has been integrated into the TraMineR, a
R package for
mining
Mining is the extraction of valuable minerals or other geological materials from the Earth, usually from an ore body, lode, vein, seam, reef, or placer deposit. The exploitation of these deposits for raw material is based on the economic ...
, describing and visualizing sequences of states or events, and more generally discrete sequence
data
In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpret ...
.
Additionally
cuTWEDis a
CUDA
CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ...
- accelerated implementation of TWED which uses an improved algorithm due to G. Wright (2020). This method is linear in memory and massively parallelized. cuTWED is written in CUDA C/
C++, comes with Python bindings, and also includes Python bindings for Marteau's reference C implementation.
Python
import numpy as np
def dlp(A, B, p=2):
cost = np.sum(np.power(np.abs(A - B), p))
return np.power(cost, 1 / p)
def twed(A, timeSA, B, timeSB, nu, _lambda):
# istance, DP= TWED( A, timeSA, B, timeSB, lambda, nu )
# Compute Time Warp Edit Distance (TWED) for given time series A and B
#
# A := Time series A (e.g. 10 2 30 4
# timeSA := Time stamp of time series A (e.g. 1:4)
# B := Time series B
# timeSB := Time stamp of time series B
# lambda := Penalty for deletion operation
# nu := Elasticity parameter - nu >=0 needed for distance measure
# Reference :
# Marteau, P.; F. (2009). "Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching".
# IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (2): 306–318. arXiv:cs/0703033
# http://people.irisa.fr/Pierre-Francois.Marteau/
# Check if input arguments
if len(A) != len(timeSA):
print("The length of A is not equal length of timeSA")
return None, None
if len(B) != len(timeSB):
print("The length of B is not equal length of timeSB")
return None, None
if nu < 0:
print("nu is negative")
return None, None
# Add padding
A = np.array( + list(A))
timeSA = np.array( + list(timeSA))
B = np.array( + list(B))
timeSB = np.array( + list(timeSB))
n = len(A)
m = len(B)
# Dynamical programming
DP = np.zeros((n, m))
# Initialize DP Matrix and set first row and column to infinity
DP , := np.inf
DP, 0
The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
= np.inf
DP, 0
The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
= 0
# Compute minimal cost
for i in range(1, n):
for j in range(1, m):
# Calculate and save cost of various operations
C = np.ones((3, 1)) * np.inf
# Deletion in A
C = (
DP - 1, j + dlp(A - 1 A
+ nu * (timeSA - timeSA - 1
+ _lambda
)
# Deletion in B
C = (
DP, j - 1
The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
+ dlp(B - 1 B
+ nu * (timeSB - timeSB - 1
+ _lambda
)
# Keep data points in both time series
C = (
DP - 1, j - 1 + dlp(A B
+ dlp(A - 1 B - 1
+ nu * (abs(timeSA - timeSB + abs(timeSA - 1- timeSB - 1)
)
# Choose the operation with the minimal cost and update DP Matrix
DP, j
The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
= np.min(C)
distance = DP - 1, m - 1 return distance, DP
Backtracking, to find the most
cost-efficient
Cost efficiency (or cost optimality), in the context of parallel computer algorithms, refers to a measure of how effectively parallel computing can be used to solve a particular problem. A parallel algorithm is considered cost efficient if its asy ...
path:
def backtracking(DP):
# best_path = BACKTRACKING ( DP )
# Compute the most cost-efficient path
# DP := DP matrix of the TWED function
x = np.shape(DP)
i = x - 1
j = x - 1
# The indices of the paths are save in opposite direction
# path = np.ones((i + j, 2 )) * np.inf;
best_path = []
steps = 0
while i != 0 or j != 0:
best_path.append((i - 1, j - 1))
C = np.ones((3, 1)) * np.inf
# Keep data points in both time series
C = DP - 1, j - 1 # Deletion in A
C = DP - 1, j # Deletion in B
C = DP, j - 1
The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline o ...
# Find the index for the lowest cost
idx = np.argmin(C)
if idx 0:
# Keep data points in both time series
i = i - 1
j = j - 1
elif idx 1:
# Deletion in A
i = i - 1
j = j
else:
# Deletion in B
i = i
j = j - 1
steps = steps + 1
best_path.append((i - 1, j - 1))
best_path.reverse()
return best_path :
MATLAB
function istance, DP= twed(A, timeSA, B, timeSB, lambda, nu)
% istance, DP= TWED( A, timeSA, B, timeSB, lambda, nu )
% Compute Time Warp Edit Distance (TWED) for given time series A and B
%
% A := Time series A (e.g. 10 2 30 4
% timeSA := Time stamp of time series A (e.g. 1:4)
% B := Time series B
% timeSB := Time stamp of time series B
% lambda := Penalty for deletion operation
% nu := Elasticity parameter - nu >=0 needed for distance measure
%
% Code by: P.-F. Marteau - http://people.irisa.fr/Pierre-Francois.Marteau/
% Check if input arguments
if length(A) ~= length(timeSA)
warning('The length of A is not equal length of timeSA')
return
end
if length(B) ~= length(timeSB)
warning('The length of B is not equal length of timeSB')
return
end
if nu < 0
warning('nu is negative')
return
end
% Add padding
A = A
timeSA = timeSA
B = B
timeSB = timeSB
% Dynamical programming
DP = zeros(length(A), length(B));
% Initialize DP Matrix and set first row and column to infinity
DP(1, :) = inf;
DP(:, 1) = inf;
DP(1, 1) = 0;
n = length(timeSA);
m = length(timeSB);
% Compute minimal cost
for i = 2:n
for j = 2:m
cost = Dlp(A(i), B(j));
% Calculate and save cost of various operations
C = ones(3, 1) * inf;
% Deletion in A
C(1) = DP(i - 1, j) + Dlp(A(i - 1), A(i)) + nu * (timeSA(i) - timeSA(i - 1)) + lambda;
% Deletion in B
C(2) = DP(i, j - 1) + Dlp(B(j - 1), B(j)) + nu * (timeSB(j) - timeSB(j - 1)) + lambda;
% Keep data points in both time series
C(3) = DP(i - 1, j - 1) + Dlp(A(i), B(j)) + Dlp(A(i - 1), B(j - 1)) + ...
nu * (abs(timeSA(i) - timeSB(j)) + abs(timeSA(i - 1) - timeSB(j - 1)));
% Choose the operation with the minimal cost and update DP Matrix
DP(i, j) = min(C);
end
end
distance = DP(n, m);
% Function to calculate euclidean distance
function ost= Dlp(A, B)
cost = sqrt(sum((A - B) .^ 2, 2));
end
end
Backtracking, to find the most cost-efficient path:
function ath= backtracking(DP)
% path
A path is a route for physical travel – see Trail.
Path or PATH may also refer to:
Physical paths of different types
* Bicycle path
* Bridle path, used by people on horseback
* Course (navigation), the intended path of a vehicle
* Desire ...
= BACKTRACKING ( DP )
% Compute the most cost-efficient path
% DP := DP matrix of the TWED function
x = size(DP);
i = x(1);
j = x(2);
% The indices of the paths are save in opposite direction
path = ones(i + j, 2) * Inf;
steps = 1;
while (i ~= 1 , , j ~= 1)
path(steps, :) = ; j
C = ones(3, 1) * inf;
% Keep data points in both time series
C(1) = DP(i - 1, j - 1);
% Deletion in A
C(2) = DP(i - 1, j);
% Deletion in B
C(3) = DP(i, j - 1);
% Find the index for the lowest cost
, idx= min(C);
switch idx
case 1
% Keep data points in both time series
i = i - 1;
j = j - 1;
case 2
% Deletion in A
i = i - 1;
j = j;
case 3
% Deletion in B
i = i;
j = j - 1;
end
steps = steps + 1;
end
path(steps, :) = j
% Path was calculated in reversed direction.
path = path(1:steps, :);
path = path(end: - 1:1, :);
end
References
*{{cite journal , last1=Marteau , first1=P. , last2=F. , year=2009 , title=Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching , journal=IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=31 , issue=2 , pages=306–318 , doi=10.1109/TPAMI.2008.76 , pmid=19110495 , arxiv=cs/0703033, s2cid=10049446
Time series
Algorithms