GDELT Project

The GDELT Project, or Global Database of Events, Language, and Tone, created by Kalev Leetaru of Yahoo! and Georgetown University, along with Philip Schrodt and others, describes itself as "an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what's happening around the world, what its context is and who's involved, and how the world is feeling about it, every single day." Early explorations leading up to the creation of GDELT were described by co-creator Philip Schrodt in a conference paper in January 2011. The dataset is available on Google Cloud Platform.


Data

GDELT includes data from 1979 to the present. The data is available as zip files in tab-separated value format, using a CSV extension for easy import into Microsoft Excel or similar spreadsheet software. Data from 1979 to 2005 is available as one zip file per year, with the file size gradually increasing from 14.3 MB in 1979 to 125.9 MB in 2005, reflecting the increase in the number of news media and in the frequency and comprehensiveness of event recording. Data files from January 2006 to March 2013 are available at monthly granularity, with the zipped file size rising from 11 MB in January 2006 to 103.2 MB in March 2013. Data files from April 1, 2013 onward are available at daily granularity. The data file for each date is made available by 6 AM Eastern Standard Time the next day. As of June 2014, the size of the daily zipped file is about 5-12 MB. The data files use Conflict and Mediation Event Observations (CAMEO) coding for recording events.

In a blog post for Foreign Policy, co-creator Kalev Leetaru attempted to use GDELT data to answer the question of whether the Arab Spring sparked protests worldwide. He used the quotient of the number of protest-related events to the total number of events recorded as a measure of "protest intensity" and studied its time trend. Political scientist and data science/forecasting expert Jay Ulfelder critiqued the post on his personal blog, saying that Leetaru's normalization method may not have adequately accounted for the change in the nature and composition of media coverage.

The dataset is also available on Google Cloud Platform and can be accessed using Google BigQuery.
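
The archive granularity and the protest-intensity quotient described in this section can be illustrated with a small Python sketch. It is not GDELT's own tooling, and it does not reproduce Leetaru's exact method. The function names are hypothetical, the input is assumed to be a plain list of CAMEO root codes already extracted from the event records, and protest events are assumed to be those with CAMEO root code 14.

```python
from datetime import date

# Granularity boundaries as stated in the text above.
FIRST_YEAR = 1979
MONTHLY_START = date(2006, 1, 1)
DAILY_START = date(2013, 4, 1)

# Assumption: CAMEO root code "14" marks protest events.
PROTEST_ROOT_CODE = "14"


def archive_granularity(day: date) -> str:
    """Return the granularity of the archive file covering `day`."""
    if day.year < FIRST_YEAR:
        raise ValueError("GDELT coverage starts in 1979")
    if day < MONTHLY_START:
        return "yearly"
    if day < DAILY_START:
        return "monthly"
    return "daily"


def protest_intensity(root_codes) -> float:
    """Share of protest events among all recorded events.

    `root_codes` is a hypothetical iterable of CAMEO root codes (strings),
    one per event in the period being measured.
    """
    codes = list(root_codes)
    if not codes:
        return 0.0  # no events recorded, so no measurable intensity
    protests = sum(1 for code in codes if code == PROTEST_ROOT_CODE)
    return protests / len(codes)
```

For example, `archive_granularity(date(2013, 3, 31))` returns `"monthly"`, and `protest_intensity(["14", "01", "14", "19"])` returns `0.5`. Dividing by the total event count is the normalization that Ulfelder questioned: if the composition of media coverage changes over time, the denominator changes for reasons unrelated to protest activity.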


Reception


Academic reception

GDELT has been cited and used in a number of academic studies, such as a study of visual and predictive analytics of Singapore news (along with Wikipedia and the Straits Times Index) and a study of political conflict. The challenge problem at the 2014 International Social Computing, Behavioral Modeling and Prediction Conference (SBP) asked participants to explore GDELT and apply it to the analysis of social networks, behavior, and prediction.


Reception in blogs and media

GDELT has been covered on the website of the Center for Data Innovation as well as the GIS Lounge. It has also been discussed and critiqued on blogs about political violence and crisis prediction. The dataset has been cited and critiqued repeatedly in Foreign Policy, including in discussions of political events in Syria, the Arab Spring, and Nigeria. It has also been cited in New Scientist, on the FiveThirtyEight website, and on Andrew Sullivan's blog. The Predictive Heuristics blog and other blogs have compared GDELT with the Integrated Crisis Early Warning System (ICEWS). Alex Hanna blogged about her experiment assessing GDELT against hand-coded data by comparing it with the Dynamics of Collective Action dataset. In May 2014, the Google Cloud Platform blog announced that the entire GDELT dataset would be available as a public dataset in Google BigQuery.


See also

* United Nations Global Pulse
* Integrated Crisis Early Warning System

