High frequency data refers to time-series data collected at an extremely fine scale. Thanks to advances in computational power in recent decades, high frequency data can be collected accurately and efficiently for analysis.
[Ruey S. Tsay (2000) Editor's Introduction to Panel Discussion on Analysis of High-Frequency Data, ''Journal of Business & Economic Statistics'', 18:2, 139-139.] Largely used in financial analysis and in high frequency trading, high frequency data provides intraday observations that can be used to understand market behaviors, dynamics, and micro-structures.
[Andersen, T. G. (2000). Some reflections on analysis of high-frequency data. ''Journal of Business & Economic Statistics'', 18(2), 146-153. ]
High frequency data collections were originally built by amassing tick-by-tick market data, in which each single 'event' (transaction, quote, price movement, etc.) is characterized by a 'tick', or one logical unit of information. Because of the large number of ticks in a single day, high frequency data collections generally contain a large amount of data, allowing high statistical precision.
[Dacorogna, M. M. (2001). An introduction to high-frequency finance. San Diego: Academic Press.] High frequency observations across one day of a liquid market can equal the amount of daily data collected in 30 years.
Use
Due to the introduction of electronic forms of trading and Internet-based data providers, high frequency data has become much more accessible and can allow one to follow price formation in real-time. This has resulted in a large new area of research in the high frequency data field, where academics and researchers use the characteristics of high frequency data to develop adequate models for predicting future market movements and risks.
Model predictions cover a wide range of market behaviors including volume, volatility, price movement, and placement optimization.
[Hautsch, N. (2012). Econometrics of financial high-frequency data. Heidelberg; Berlin: Springer.]
There is ongoing interest among both regulatory agencies and academia in transaction data and limit order book data, from which broader implications of trade and market behaviors, as well as market outcomes and dynamics, can be assessed using high frequency data models. Regulatory agencies take a large interest in these models because liquidity and price risks are not fully understood for newer forms of automated trading applications.
High frequency data studies are valuable for their ability to trace irregular market activities over a period of time. This information allows a better understanding of price and trading activity and behavior. Due to the importance of timing in market events, high frequency data requires analysis using point processes, which depend on observations and history to characterize random occurrences of events.
This understanding was first developed by 2003 Nobel Prize in Economics winner Robert Fry Engle III, who specializes in developing financial econometric analysis methods using financial data and point processes.
High frequency data forms
High frequency data are primarily used in financial research and stock market analysis. Whenever a trade, quote, or electronic order is processed, the related data are collected and entered in a time-series format. As such, high frequency data are often referred to as transaction data.
There are five broad levels of high frequency data that are obtained and used in market research and analysis:
Trade data
Individual trade data collected at a certain interval within a time series.
There are two main variables to describe a single point of trade data: the time of the transaction, and a vector known as a 'mark', which characterizes the details of the transaction event.
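The time-plus-mark description above can be sketched as a small data structure. This is only an illustration: the field names `price` and `volume` and the sample values are assumptions chosen for the example, not fields prescribed by any particular data provider.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Mark:
    """Details of one transaction (hypothetical fields, for illustration)."""
    price: float
    volume: int


@dataclass(frozen=True)
class Trade:
    """One point of trade data: the transaction time and its mark."""
    time: datetime
    mark: Mark


# Made-up sample values; real marks may carry many more attributes.
trades = [
    Trade(datetime(2024, 1, 2, 9, 30, 0, 120000), Mark(price=100.01, volume=200)),
    Trade(datetime(2024, 1, 2, 9, 30, 0, 450000), Mark(price=100.02, volume=100)),
]
```

Keeping the time separate from the mark mirrors the point-process view, in which the event times are modelled first and the marks are attached to them.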
Trade and quote data
The data collected details both trades and quotes, including price changes and direction, time stamps, and volume. Such information can be found in the TAQ (Trade and Quote) database operated by the NYSE.
Where trade data details the exchange of a transaction itself, quote data details the optimal trading conditions for a given exchange. This information can indicate halts in exchanges and both opening and closing quotes.
[Brownlees, C. T., & Gallo, G. M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics and Data Analysis, 51(4), 2232-2245. ]
Fixed level order book data
Using systems that have been completely computerized, the depth of the market can be assessed using limit order activities that occur in the background of a given market.
Messages on all limit order activities
This data level displays the full information surrounding limit order activities, and can create a reproduction of the trade flow at any given time using information on time stamps, cancellations, and buyer/seller identification.
Data on order book snapshots
Snapshots of the order book activities can be recorded on equidistant grids to limit the need to reproduce the order book. This, however, limits the ability to analyze trades, and is therefore more useful for understanding dynamics than book and trading interaction.
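A minimal sketch of snapshot sampling follows, assuming the grid is equidistant in time and that each snapshot takes the most recent book state at or before the grid point. The function name `snapshot_on_grid`, the (best bid, best ask) state, and the sample values are hypothetical.

```python
from bisect import bisect_right


def snapshot_on_grid(update_times, states, start, stop, step):
    """Return (grid_time, state) pairs using the last state at or before each grid time.

    update_times must be sorted ascending; grid points that fall before
    the first update get None.
    """
    snapshots = []
    t = start
    while t <= stop:
        i = bisect_right(update_times, t)  # number of updates at or before t
        snapshots.append((t, states[i - 1] if i > 0 else None))
        t += step
    return snapshots


# Made-up (best bid, best ask) updates; times are seconds since the open.
update_times = [0.4, 1.7, 1.9, 4.2]
states = [(99.9, 100.1), (100.0, 100.1), (100.0, 100.2), (99.8, 100.0)]
grid = snapshot_on_grid(update_times, states, start=0, stop=5, step=1)
```

In this sample the update at 1.7 seconds is overwritten before the next grid point and never appears in a snapshot, which illustrates why snapshot data cannot fully recover the interaction between the book and individual trades.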
Properties in financial analysis
In financial analysis, high frequency data can be organized in differing time scales from minutes to years.
As high frequency data comes in a largely disaggregated form compared to lower frequency methods of data collection, it has various unique characteristics that alter the way the data are understood and analyzed.
Robert Fry Engle III categorizes these distinct characteristics as irregular temporal spacing, discreteness, diurnal patterns, and temporal dependence.
[Russell, J. R., & Engle, R. F. (2010). Analysis of High-Frequency Data. Handbook of Financial Econometrics, Vol. 1, 383-426. 10.1016/B978-0-444-50897-3.50010-9.]

Irregular temporal spacing
High frequency data involves collecting a large number of observations over time, and individual observations tend to be spaced irregularly. This is especially clear in financial market analysis, where transactions may occur in rapid succession or after a prolonged period of inactivity.
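Irregular spacing is usually examined through the durations (waiting times) between consecutive events. The sketch below computes them for a made-up list of trade times; the helper name `durations` is an assumption for illustration.

```python
def durations(timestamps):
    """Waiting times between consecutive events (timestamps sorted ascending)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# Made-up trade times in seconds after the open: bursts separated by quiet spells.
trade_times = [0, 1, 2, 30, 31, 95]
gaps = durations(trade_times)  # [1, 1, 28, 1, 64]
```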
Discreteness
High frequency data largely consists of prices and transactions, which institutional rules prevent from rising or falling drastically within a short period of time. As a result, changes in the data are measured in units of one tick.
This lessened ability to fluctuate makes the data more discrete, as in stock markets, where popular stocks tend to stay within 5 ticks of movement. Due to this level of discreteness, high frequency data sets tend to show a high level of kurtosis.
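As a rough illustration of how kurtosis can be measured on tick-sized price changes, the sketch below computes the population excess kurtosis of a made-up sample in which most changes are zero and a few reach 5 ticks. The sample is an assumption chosen for the example and is not taken from any market.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance**2 - 3.

    The values must not all be equal, otherwise the variance is zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


# Made-up price changes measured in ticks: mostly zero, rarely as large as 5.
tick_changes = [0] * 90 + [1] * 4 + [-1] * 4 + [5, -5]
heavy_tailed = excess_kurtosis(tick_changes) > 0
```

A positive excess kurtosis means heavier tails than a normal distribution, for which the excess kurtosis is zero.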
Diurnal patterns
Analysis first made by Engle and Russell in 1998 notes that high frequency data follows a diurnal pattern, with the duration between trades being smallest at the open and the close of the market. Some foreign markets, which operate 24 hours a day, still display a diurnal pattern based on the time of the day.
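One simple way to look for such a pattern is to average the durations within time-of-day bins. The sketch below is an illustration only: the function name, the 60-second bin width, and the trade times are assumptions, and a real study would more likely use hourly or half-hourly bins across many trading days.

```python
from collections import defaultdict


def mean_duration_by_bin(trade_seconds, bin_seconds):
    """Average gap to the next trade, grouped by the bin in which the gap starts.

    trade_seconds: sorted integer trade times in seconds since the open.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for earlier, later in zip(trade_seconds, trade_seconds[1:]):
        bin_index = earlier // bin_seconds
        sums[bin_index] += later - earlier
        counts[bin_index] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Made-up session: busy first and last minute, quiet middle minute.
trade_seconds = (
    list(range(0, 60, 5)) + list(range(60, 120, 20)) + list(range(120, 181, 5))
)
profile = mean_duration_by_bin(trade_seconds, bin_seconds=60)
```

Here `profile` maps each bin to its mean duration, with short durations in the first and last bins and longer ones in the middle, the shape described above for the open and close of a market.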
Temporal dependence
Due largely to discreteness in prices, high frequency data are temporally dependent. The spread forced by small tick differences in buying and selling prices creates a trend that pushes the price in a particular direction. Similarly, the duration and transaction rates between trades tend to cluster, denoting dependence on the temporal changes of price.
Ultra-High frequency data
Robert Fry Engle III observed that the growing availability of higher-frequency data over time prompted a movement from yearly, to monthly, to intraday collections of financial data. This movement toward higher frequencies is not unbounded, however: it reaches a limit once every transaction is recorded.
[Engle, R. F. (2000). The econometrics of ultra-high-frequency data. ''Econometrica'', 68(1), 1-22.] Engle termed this limiting frequency level ''ultra-high frequency data''. A distinguishing feature of this maximum frequency is that the data are extremely irregularly spaced, owing to the wide spread of time intervals that a disaggregated collection involves.
Rather than breaking the sequence of ultra-high frequency data into time intervals, which would essentially cause a loss of data and turn the set into a lower frequency one, methods and models such as the autoregressive conditional duration model can be used to account for varying waiting times between observations.
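A minimal sketch of the idea, assuming the common ACD(1,1) form in which the expected duration ψ_i follows ψ_i = ω + α·x_{i-1} + β·ψ_{i-1}, with x_{i-1} the previous observed duration. The parameter values below are made up for illustration and are not estimates; starting the recursion at the unconditional mean ω / (1 − α − β) is also an assumption of this sketch.

```python
def acd_expected_durations(durations, omega, alpha, beta):
    """Conditional expected durations under an ACD(1,1) recursion.

    psi[i] is the expected value of durations[i] given the earlier durations.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    psi = [omega / (1.0 - alpha - beta)]  # start at the unconditional mean
    for x_prev in durations[:-1]:
        psi.append(omega + alpha * x_prev + beta * psi[-1])
    return psi


# Made-up durations in seconds and illustrative (not estimated) parameters.
observed = [1.0, 1.0, 10.0, 1.0]
psi = acd_expected_durations(observed, omega=0.1, alpha=0.1, beta=0.8)
```

After the long 10-second wait, the expected duration for the next trade rises, which is how the recursion captures clustering of durations without resampling the data onto fixed intervals.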
Effective handling of ultra-high frequency data can be used to increase accuracy of econometric analyses. This can be accomplished with two processes: data cleaning and data management.
Data cleaning
''Data cleaning'', or data cleansing, is the process of using algorithmic functions to remove unnecessary, irrelevant, and incorrect data from high frequency data sets.
Ultra-high frequency data analysis requires a clean sample of records to be useful for study. As velocities in ultra-high frequency collection increase, more errors and irrelevant data are likely to be identified in the collection.
Errors that occur can be attributed to human error, both intentional (e.g. 'dummy' quotes) and unintentional (e.g. typing mistakes), or to computer error, which occurs with technical failures.
[Verousis, T., & Ap Gwilym, O. (2010). An improved algorithm for cleaning ultra high-frequency data. ''Journal of Derivatives & Hedge Funds'', 15(4), 323-340. ]
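As a rough illustration of algorithmic cleaning, the sketch below drops non-positive prices and prices that deviate too far from the median of nearby valid prices. It is a generic median filter, not the algorithm of any of the cited papers; the window size, the 5% threshold, and the sample prices are all assumptions.

```python
from statistics import median


def clean_prices(prices, window=5, max_rel_dev=0.05):
    """Drop non-positive prices and prices far from the median of nearby prices."""
    half = window // 2
    kept = []
    for i, price in enumerate(prices):
        if price <= 0:
            continue  # e.g. a zero price left by a recording failure
        # Valid prices in a centred window; always contains `price` itself.
        neighbours = [q for q in prices[max(0, i - half): i + half + 1] if q > 0]
        centre = median(neighbours)
        if abs(price - centre) / centre <= max_rel_dev:
            kept.append(price)
    return kept


# Made-up tick prices with a zero and a misplaced decimal point (1002.0).
raw = [100.0, 100.1, 0.0, 100.2, 1002.0, 100.1, 100.3]
cleaned = clean_prices(raw)
```

In practice the threshold would need tuning per instrument, since a filter that is too tight removes genuine price jumps along with errors.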
Data management
''Data Management'' refers to the process of selecting a specific time-series of interest within a set of ultra-high frequency data to be pulled and organized for the purpose of an analysis. Various transactions may be reported at the same time and at different price levels, and econometric models generally require one observation at each time stamp, necessitating some form of data aggregation for proper analysis.
Data management efforts can help address ultra-high frequency data characteristics including irregular spacing, bid-ask bounce, and market opening and closing.
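One possible form of the aggregation described above is to collapse trades that share a time stamp into a single observation with a volume-weighted average price. This is a sketch under that assumption; other choices, such as keeping the last or the median price, are equally plausible, and the tuple layout and sample values are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_by_timestamp(trades):
    """Collapse trades sharing a timestamp into one (time, vwap, total_volume) row.

    trades: (time, price, volume) tuples sorted by time, with positive volumes.
    """
    rows = []
    for stamp, group in groupby(trades, key=itemgetter(0)):
        group = list(group)
        total_volume = sum(volume for _, _, volume in group)
        vwap = sum(price * volume for _, price, volume in group) / total_volume
        rows.append((stamp, vwap, total_volume))
    return rows


# Made-up trades: two reported at time 1 at different price levels.
raw_trades = [(1, 100.0, 100), (1, 101.0, 300), (2, 100.5, 50)]
one_per_stamp = aggregate_by_timestamp(raw_trades)
```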
Alternate uses outside of financial trading
A study published in the ''Freshwater Biology'' journal focusing on episodic weather effects on lakes highlights the use of high frequency data to further understand meteorological drivers and the consequences of "events", or sudden changes to physical, chemical, and biological parameters of a lake.
[Jennings, E., Jones, S., Arvola, L., Staehr, P. A., Gaiser, E., Jones, I. D., et al. (2012). Effects of weather‐related episodic events in lakes: An analysis based on high‐frequency data. ''Freshwater Biology'', 57(3), 589-601.] Due to advances in data collection technology and human networks, coupled with the placement of high frequency monitoring stations at a variety of lake types, these events can be more effectively explored. The use of high frequency data in these studies is noted to be an important factor in allowing analyses of rapidly occurring weather changes at lakes, such as wind speed and rainfall, improving understanding of lakes' capacity to handle events in the wake of increasing storm severity and climate change.
High frequency data has been found to be useful in the forecasting of inflation. A study by Michele Modugno in the ''International Journal of Forecasting'' indicates that the use of daily and monthly data at a high frequency generally improved the forecast accuracy of total CPI inflation in the United States.
[Modugno, M. (2013). Now-casting inflation using high frequency data. ''International Journal of Forecasting'', 29(4), 664-675. ] The study utilized a comparison of lower frequency models with one that considered all variables at a high frequency. It was ultimately found that the increased accuracy of both highly volatile transport and energy components of prices in the high frequency inflation model led to greater performance and more accurate results.
The use of half-life estimation to evaluate speeds of mean reversion in economic and financial variables has faced issues with regard to sampling, as a half-life of about 13.53 years would require 147 years of annual data according to early AR process models.
[Huang, M., Liao, S., & Lin, K. (2015). Augmented Half‐Life estimation based on High‐Frequency data. ''Journal of Forecasting'', 34(7), 523-532.] As a result, some scholars have used high frequency data to estimate the half-life for annual data. While the use of high frequency data can face some limitations in discovering the true half-life, mainly through the bias of an estimator, a high frequency ARMA model has been found to consistently and effectively estimate half-life with long annual data.
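For orientation, the half-life of a shock in a simple AR(1) process y_t = ρ·y_{t-1} + e_t with 0 < ρ < 1 is ln(0.5) / ln(ρ), measured in sampling periods. The sketch below implements only this textbook AR(1) formula; it is not the augmented ARMA-based estimator of the cited study.

```python
import math


def half_life(rho):
    """Periods needed for an AR(1) shock to decay to half its size (0 < rho < 1)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(rho)


# A coefficient close to 1 implies slow mean reversion and a long half-life.
slow = half_life(0.95)
fast = half_life(0.5)
```

Because the half-life grows quickly as ρ approaches 1, small estimation errors in ρ translate into large errors in the half-life, which is one reason long samples or higher-frequency observations are sought.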
See also
* High-frequency trading
* Algorithmic trading
* Market analysis
* Financial econometrics
* Robert F. Engle