Social media mining is the process of obtaining big data from user-generated content on social media sites and mobile apps in order to extract actionable patterns, form conclusions about users, and act upon the information, often for the purpose of advertising to users or conducting research. The term is an analogy to the resource extraction process of mining for rare minerals. Resource extraction mining requires mining companies to sift through vast quantities of raw ore to find the precious minerals; likewise, social media mining requires human data analysts and automated software programs to sift through massive amounts of raw social media data in order to discern patterns and trends relating to social media usage, online behaviours, sharing of content, connections between individuals, online buying behaviour, and more. These patterns and trends are of interest to companies, governments and not-for-profit organizations, which can use them to design their strategies or introduce new programs, products, processes or services.

Social media mining uses a range of basic concepts from computer science, data mining, machine learning and statistics. Social media miners develop algorithms suitable for investigating massive files of social media data. Social media mining is based on theories and methodologies from social network analysis, network science, sociology, ethnography, optimization and mathematics. It encompasses the tools to formally represent, measure and model meaningful patterns from large-scale social media data. In the 2010s, major corporations, governments and not-for-profit organizations engaged in social media mining to obtain data about customers, clients and citizens.


Background

As defined by Kaplan and Haenlein, social media is the "group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content." There are many categories of social media including, but not limited to, social networking (Facebook or LinkedIn), microblogging (Twitter), photo sharing (Flickr, Instagram, Photobucket, or Picasa), news aggregation (Google Reader, StumbleUpon, or Feedburner), video sharing (YouTube, MetaCafe), livecasting (Ustream or Twitch), virtual worlds (Kaneva), social gaming (World of Warcraft), social search (Google, Bing, or Ask.com), and instant messaging (Google Talk, Skype, or Yahoo! Messenger).

The first social media website was introduced by GeoCities in 1994. It enabled users to create their own homepages without having a sophisticated knowledge of HTML coding. The first social networking site, SixDegrees.com, was introduced in 1997. Since then, many other social media sites have been introduced, each providing service to millions of people. These individuals form a virtual world in which individuals (social atoms), entities (content, sites, etc.) and interactions (between individuals, between entities, and between individuals and entities) coexist. Social norms and human behavior govern this virtual world. By understanding these social norms and models of human behavior and combining them with the observations and measurements of this virtual world, one can systematically analyze and mine social media.

Social media mining is the process of representing, analyzing, and extracting meaningful patterns from data in social media, resulting from social interactions. It is an interdisciplinary field encompassing techniques from computer science, data mining, machine learning, social network analysis, network science, sociology, ethnography, statistics, optimization, and mathematics. Social media mining faces grand challenges such as the big data paradox, obtaining sufficient samples, the noise removal fallacy, and the evaluation dilemma. Social media mining represents the virtual world of social media in a computable way, measures it, and designs models that can help us understand its interactions. In addition, social media mining provides the necessary tools to mine this world for interesting patterns, analyze information diffusion, study influence and homophily, provide effective recommendations, and analyze novel social behavior in social media.


Uses

Social media mining is used across several industries including business development, social science research, health services, and educational purposes (Zafarani, R., Abbasi, M. A., Liu, H. (2014). Social Media Mining. Cambridge University Press. http://dmml.asu.edu/smm). Once the collected data has gone through social media analytics, it can be applied to these various fields. Often, companies use the patterns of connectivity that pervade social networks, such as assortativity, the social similarity between users that is induced by influence, homophily, and reciprocity and transitivity (Tang, J., Chang, Y., Aggarwal, C., Liu, H. (2016). "A Survey of Signed Network Mining in Social Media". ''ACM Computing Surveys'', 49: 3). These forces are then measured via statistical analysis of the nodes and the connections between them. Social analytics also uses sentiment analysis, because social media users often relay positive or negative sentiment in their posts (Adedoyin-Olowe, M., Gaber, M., & Stahl, F. (2013). "A Survey of Data Mining Techniques for Social Media Analysis"). This provides important social information about users' emotions on specific topics.

These patterns have several uses beyond pure analysis. For example, influence can be used to determine the most influential user in a particular network. Companies would be interested in this information in order to decide whom they may hire for influencer marketing. These influencers are determined by recognition, activity generation, and novelty, three requirements that can be measured through the data mined from these sites. Analysts also value measures of homophily: the tendency of two similar individuals to become friends. Users have begun to rely on information about other users' opinions in order to understand diverse subject matter. These analyses can also help create tailored recommendations for individuals. By measuring influence and homophily, online and offline companies are able to suggest specific products to individual consumers and to groups of consumers. Social media networks can use this information themselves to suggest to their users possible friends to add, pages to follow, and accounts to interact with.
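
As a rough illustration of how influence and homophily might be quantified, the following Python sketch computes in-degree centrality (a simple proxy for influence) and the share of connections that join users with the same attribute (a crude indicator of homophily). The user names, the "follows" edges, and the interest labels are hypothetical toy data, and both measures are simplified stand-ins rather than the methods of any particular platform or study.

```python
from collections import Counter

# Hypothetical toy data: directed "follows" edges (follower, followed)
# and one interest label per user. None of this comes from a real platform.
follows = [
    ("ana", "ben"), ("cat", "ben"), ("dan", "ben"),
    ("ben", "ana"), ("dan", "cat"), ("eve", "dan"),
]
interest = {
    "ana": "music", "ben": "music", "cat": "sports",
    "dan": "sports", "eve": "music",
}


def in_degree_centrality(edges):
    """Share of the other users who follow each user (a simple influence proxy)."""
    users = {user for edge in edges for user in edge}
    followers = Counter(followed for _, followed in edges)
    return {user: followers[user] / (len(users) - 1) for user in users}


def same_attribute_share(edges, attribute):
    """Fraction of edges that join two users with the same attribute value."""
    same = sum(1 for a, b in edges if attribute[a] == attribute[b])
    return same / len(edges)


centrality = in_degree_centrality(follows)
most_influential = max(centrality, key=centrality.get)
homophily = same_attribute_share(follows, interest)
```

On this toy network the most followed user is "ben", and half of the edges connect users with the same interest. A same-attribute share only suggests homophily when it is compared with the share expected if connections were formed at random, so a fuller analysis would add such a baseline.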


Perception

Modern social media mining is a controversial practice that has led to exponential gains in user growth for tech giants such as Facebook, Inc., Twitter, and Google. Companies such as these, considered "Big Tech", build algorithms that take advantage of user input to understand users' preferences and keep them on the platform as much as possible. These inputs, which can be as simple as time spent on a given screen, provide the data being mined, and companies profit heavily from using that data to capitalize on extremely accurate predictions about user behavior. The growth of platforms accelerated rapidly once these strategies were put in place; as of 2021, most of the largest platforms averaged over 1 billion active users per month. Critics of algorithm-driven platforms, such as Tristan Harris and Chamath Palihapitiya, have claimed that certain companies (specifically Facebook) valued growth above all else and ignored potential negative impacts from these growth engineering tactics.

At the same time, users have created their own data arbitrages with the help of their own data, through content monetization and becoming influencers. Users typically have access to a varied set of analytics specific to the people who interact with them on social media, and can use these as building blocks for their own targeting and growth strategies through ads and posts that cater to their audiences. Influencers also commonly promote products and services for established brands, creating one of the largest digital industries: influencer marketing. Instagram, Facebook, Twitter, YouTube, Google, and others have long given access to platform analytics, and have allowed third parties to access that information as well, at times without the knowledge of the user whose data is being viewed or bought.


Research


Research areas

* Social media event detection – Social networks enable users to freely communicate with each other and share their recent news, ongoing activities or views about different topics. As a result, they can be seen as a potentially viable source of information for understanding currently emerging topics and events.
* Public health monitoring and surveillance – Using large-scale analysis of social media to study large cohorts of patients and the general public, e.g. to obtain early warning signals of drug-drug interactions and adverse drug reactions, or to understand human reproduction and sexual interest.
* Community structure (community detection/evolution/evaluation) – Identifying communities on social networks, studying how they evolve, and evaluating identified communities, often without ground truth.
* Network measures – Measuring centrality, transitivity, reciprocity, balance, status, and similarity in social media.
* Network models – Simulating networks with specific characteristics. Examples include random graphs (E-R models), preferential attachment models, and small-world models.
* Information cascade – Analyzing how information propagates in social media sites. Examples include herd behavior, information cascades, diffusion of innovations, and epidemic models.
* Influence and homophily – Measuring network assortativity and measuring and modeling influence and homophily.
* Recommendation in social media – Recommending friends or items on social media sites.
* Social search – Searching for information on the social web.
* Sentiment analysis in social media – Identifying collectively subjective information, e.g. positive and negative, from social media data.
* Social spammer detection – Detecting social spammers who send unwanted spam content to targeted users on social networks and on any website with user-generated content, often collaborating to boost their social influence, legitimacy and credibility.
* Feature selection with social media data – Transforming feature selection to harness the power of social media.
* Trust in social media – Studying and understanding trust in social media.
* Distrust and negative links – Exploring negative links in social media.
* Role of social media in crises – Social media, particularly Twitter, continues to play an important role during crises. Studies show that it is possible to detect earthquakes and rumors using tweets published during a crisis. Developing tools that help first responders analyze tweets for better crisis response, and techniques that give them faster access to relevant tweets, is an active area of research.
* Location-based social network mining – Mining human mobility for personalized point-of-interest (POI) recommendation on location-based social networks.
* Provenance of information in social media – Provenance informs a user about the sources of a given piece of information. Social media can help in identifying the provenance of information due to its unique features: user-generated content, user profiles, user interactions, and spatial or temporal information.
* Vulnerability management – A user's vulnerability on a social networking site can be managed in three sequential steps: (1) identifying new ways in which a user can be vulnerable, (2) quantifying or measuring a user's vulnerability, and (3) reducing or mitigating it.
* Opinion mining on candidates/parties – Social media is a popular medium for candidates and parties to campaign and for gauging the public reaction to the campaigns. Social media can also be used as an indicator of voters' opinion. Some research studies have shown that predictions made using social media posts can match (or even improve on) traditional opinion polls.
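
Two of the network measures listed above, reciprocity and transitivity, can be illustrated with a minimal Python sketch. It assumes the common textbook definitions (reciprocity as the fraction of directed edges whose reverse edge also exists, and transitivity as the fraction of connected triples that close into triangles); the function names and the four-node example graphs are hypothetical, and self-loops are not handled.

```python
from itertools import combinations


def reciprocity(directed_edges):
    """Fraction of directed edges whose reverse edge is also present."""
    edge_set = set(directed_edges)
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)


def transitivity(undirected_edges):
    """Global clustering coefficient: closed two-paths divided by all two-paths."""
    neighbors = {}
    for a, b in undirected_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    two_paths = 0
    closed = 0
    for adjacent in neighbors.values():
        # Every pair of neighbors of a node forms a two-path through that node.
        for u, v in combinations(sorted(adjacent), 2):
            two_paths += 1
            if v in neighbors[u]:
                closed += 1
    return closed / two_paths if two_paths else 0.0


# Hypothetical toy graphs.
directed = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
undirected = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

r = reciprocity(directed)      # 2 of the 4 edges are reciprocated
t = transitivity(undirected)   # 3 of the 5 two-paths are closed
```

Here the reciprocity is 0.5 and the transitivity is 0.6. For real datasets, an established graph library would normally be preferable to hand-written loops like these.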


Publication venues

Social media mining research articles are published in computer science, social science, and data mining conferences and journals:


Conferences

Conference papers can be found in the proceedings of Knowledge Discovery and Data Mining (KDD), World Wide Web (WWW), Association for Computational Linguistics (ACL), Conference on Information and Knowledge Management (CIKM), International Conference on Data Mining (ICDM), and Internet Measurement Conference (IMC).
* KDD Conference – ACM SIGKDD Conference on Knowledge Discovery and Data Mining
* WWW Conference – International World Wide Web Conference
* WSDM Conference – ACM Conference on Web Search and Data Mining
* CIKM Conference – ACM Conference on Information and Knowledge Management
* ICDM Conference – IEEE International Conference on Data Mining
* Association for Computational Linguistics (ACL)
* ASONAM Conference – IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
* Internet Measurement Conference (IMC)
* International Conference on Web and Social Media (ICWSM)
* International Conference on Social Media & Society
* International Conference on Web Engineering (ICWE)
* The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD)
* International Joint Conferences on Artificial Intelligence (IJCAI)
* Association for the Advancement of Artificial Intelligence (AAAI)
* Recommender Systems (RecSys)
* Computer-Human Interaction (CHI)
* Social Computing Behavioral-Cultural Modeling and Prediction (SBP)
* HT Conference – ACM Conference on Hypertext
* SDM Conference – SIAM International Conference on Data Mining (SIAM)
* PAKDD Conference – The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining


Journals

* DMKD Conference – Research Issues on Data Mining and Knowledge Discovery
* ECML-PKDD Conference – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
* IEEE Transactions on Knowledge and Data Engineering (TKDE)
* ACM Transactions on Knowledge Discovery from Data (TKDD)
* ACM Transactions on Intelligent Systems and Technology (TIST)
* Social Network Analysis and Mining (SNAM)
* Knowledge and Information Systems (KAIS)
* ACM Transactions on the Web (TWEB)
* World Wide Web Journal
* Social Networks
* Internet Mathematics
* IEEE Intelligent Systems
* SIGKDD Explorations

Social media mining is also present at many data management/database conferences such as the ICDE Conference, the SIGMOD Conference and the International Conference on Very Large Data Bases.


See also

; Methods
* Social media measurement
* Text mining
; Application domains
* Web mining
* Twitter mining
; Companies
* NUVI
; Related topics
* Social media
* Profiling (information science)
* Web scraping
* GDPR


External links

* Zafarani, Reza; Abbasi, Mohammad Ali; and Liu, Huan (2014). Social Media Mining: An Introduction. Cambridge University Press.