List Of Large Language Models

	List Of Large Language Models A large language model (LLM) is a type of machine learning Model#Conceptual model, model designed for natural language processing tasks such as language Generative artificial intelligence, generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text. This page lists notable large language models. List For the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP. Also, only the largest model's cost is written. See also * List of chatbots * List of language model benchmarks Notes References {{Authority control Software comparisons Large language models ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Large Language Model A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pretrained transformers (GPTs), which are largely used in generative chatbots such as ChatGPT or Gemini. LLMs can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained in. History Before the emergence of transformer-based models in 2017, some language models were considered large relative to the computational and data constraints of their time. In the early 1990s, IBM's statistical models pioneered word alignment techniques for machine translation, laying the groundwork for corpus-based language modeling. A sm ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	GPT-2 Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of Generative pre-trained transformer, GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by full release of the 1.5-billion-parameter model on November 5, 2019. GPT-2 was created as a "direct scale-up" of GPT-1 with a ten-fold increase in both its parameter count and the size of its training dataset. It is a general-purpose learner and its ability to perform the various tasks was a consequence of its general ability to accurately predict the next item in a sequence, which enabled it to machine translation, translate texts, question answering, answer questions about a topic from a text, automatic summarization, summarize passages from a larger text, and natural language generation, generate text output on a level sometimes Turing test, indistinguishable from that of humans; however, it ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	DeepMind DeepMind Technologies Limited, trading as Google DeepMind or simply DeepMind, is a British–American artificial intelligence research laboratory which serves as a subsidiary of Alphabet Inc. Founded in the UK in 2010, it was acquired by Google in 2014 and merged with Google AI's Google Brain division to become Google DeepMind in April 2023. The company is headquartered in London, with research centres in the United States, Canada, France, Germany, and Switzerland. DeepMind introduced neural Turing machines (neural networks that can access external memory like a conventional Turing machine), resulting in a computer that loosely resembles short-term memory in the human brain. DeepMind has created neural network models to play video games and board games. It made headlines in 2016 after its AlphaGo program beat a human professional Go player Lee Sedol, a world champion, in a five-game match, which was the subject of a documentary film. A more general program, AlphaZer ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Mixture Of Experts Mixture of experts (MoE) is a machine learning technique where multiple expert networks (learners) are used to divide a problem space into homogeneous regions. MoE represents a form of ensemble learning. They were also called committee machines. Basic theory MoE always has the following components, but they are implemented and combined differently according to the problem being solved: * Experts f_1, ..., f_n, each taking the same input x, and producing outputs f_1(x), ..., f_n(x). * A weighting function (also known as a gating function) w, which takes input x and produces a vector of outputs (w(x)_1, ..., w(x)_n). This may or may not be a probability distribution, but in both cases, its entries are non-negative. * \theta = (\theta_0, \theta_1, ..., \theta_n) is the set of parameters. The parameter \theta_0 is for the weighting function. The parameters \theta_1, \dots, \theta_n are for the experts. * Given an input x, the mixture of experts produces a single output by combinin ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Anthropic Anthropic PBC is an American artificial intelligence (AI) startup company founded in 2021. Anthropic has developed a family of large language models (LLMs) named Claude as a competitor to OpenAI's ChatGPT and Google's Gemini. According to the company, it researches and develops AI to "study their safety properties at the technological frontier" and use this research to deploy safe models for the public. Anthropic was founded by former members of OpenAI, including siblings Daniela Amodei and Dario Amodei. In September 2023, Amazon announced an investment of up to $4 billion, followed by a $2 billion commitment from Google in the following month. History Founding and early development (2021–2022) Anthropic was founded in 2021 by seven former employees of OpenAI, including siblings Daniela Amodei and Dario Amodei, the latter of whom served as OpenAI's Vice President of Research. In April 2022, Anthropic announced it had received $580 million in funding, including a $500 ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Claude (language Model) Claude is a family of large language models developed by Anthropic. The first model was released in March 2023. The Claude 3 family, released in March 2024, consists of three models: Haiku, optimized for speed; Sonnet, which balances capability and performance; and Opus, designed for complex reasoning tasks. These models can process both text and images, with Claude 3 Opus demonstrating enhanced capabilities in areas like mathematics, programming, and logical reasoning Logical reasoning is a mind, mental Action (philosophy), activity that aims to arrive at a Logical consequence, conclusion in a Rigour, rigorous way. It happens in the form of inferences or arguments by starting from a set of premises and reason ... compared to previous versions. Claude 4, which includes Opus and Sonnet, was released in May 2025. Training Claude models are generative pre-trained transformers. They have been pre-trained to predict the next word in large amounts of text. Then, they have ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Ernie Bot Ernie Bot ( zh, c=文心一言, Pinyin: ), full name Enhanced Representation through Knowledge Integration, is an AI chatbot service product of Baidu, released in 2023. It is built on a large language model called ERNIE, which has been in development since 2019. Version, ERNIE 4.0, was announced on October 17, 2023. Version 4.5 and reasoning model ERNIE X1 were released free of charge on March 16, 2025. History Ernie Bot was initially released for invited testing on March 16, 2023, based on "Ernie 3.0", a large language model that had been in development since 2019. Ernie's so-called live release demo was reported to have been prerecorded, which caused Baidu's stock to drop 10 percent the same day. The company's stock gained 14 percent on the next day after analysts from Citigroup and Bank of America tested Ernie Bot and gave it a preliminary thumbs-up. Ernie Bot was released to the public after receiving the green light from Chinese regulatory authorities on August 31, 2023. ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Baidu Baidu, Inc. ( ; ) is a Chinese multinational technology company specializing in Internet services and artificial intelligence. It holds a dominant position in China's search engine market (via Baidu Search), and provides a wide variety of other internet services such as Baidu App (Baidu's flagship app for search and newsfeed), Baidu Baike (an online user created Wikipedia-like encyclopedia), iQIYI (a video streaming service), and Baidu Tieba (a keyword-based discussion forum similar to Reddit). Besides its core internet search business, Baidu has diversified into several high-growth areas. The company is a leading player in autonomous driving (Baidu Apollo), and smart consumer electronics (Xiaodu). With over a decade of investment in artificial intelligence, Baidu is one of the few tech companies globally to offer a full-service AI stack, including software, chips, cloud infrastructure, foundation models, and applications. The holding company of the group is incorpo ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Selene (supercomputer) Selene is a supercomputer developed by Nvidia, capable of achieving 63.460 petaflops, ranking as the fifth fastest supercomputer in the world, when it entered the list. Selene is based on the Nvidia DGX system consisting of AMD CPUs, Nvidia A100 GPUs, and Mellanox HDDR networking. Selene is based on the Nvidia DGX Superpod, which is a high performance turnkey supercomputer solution provided by Nvidia using DGX hardware. DGX Superpod is a tightly integrated system that combines high performance DGX compute nodes with fast storage and high bandwidth networking. It aims to provide a turnkey solution to high-demand machine learning workloads. Selene was built in three months and is the fastest industrial system in the US while being the second-most energy-efficient supercomputing system ever. Selene utilizing 1080 AMD Epyc CPUs and 4320 A100 GPUs is used to train BERT, the natural language processor, in less than 16 seconds, which usually takes most smaller systems about 20 minutes ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Nvidia Nvidia Corporation ( ) is an American multinational corporation and technology company headquartered in Santa Clara, California, and incorporated in Delaware. Founded in 1993 by Jensen Huang (president and CEO), Chris Malachowsky, and Curtis Priem, it designs and supplies graphics processing units (GPUs), application programming interfaces (APIs) for data science and high-performance computing, and system on a chip units (SoCs) for mobile computing and the automotive market. Nvidia is also a leading supplier of artificial intelligence (AI) hardware and software. Nvidia outsources the manufacturing of the hardware it designs. Nvidia's professional line of GPUs are used for edge-to-cloud computing and in supercomputers and workstations for applications in fields such as architecture, engineering and construction, media and entertainment, automotive, scientific research, and manufacturing design. Its GeForce line of GPUs are aimed at the consumer market and are used in ap ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The early 1980s and home computers, rise of personal computers through software like Windows, and the company has since expanded to Internet services, cloud computing, video gaming and other fields. Microsoft is the List of the largest software companies, largest software maker, one of the Trillion-dollar company, most valuable public U.S. companies, and one of the List of most valuable brands, most valuable brands globally. Microsoft was founded by Bill Gates and Paul Allen to develop and sell BASIC interpreters for the Altair 8800. It rose to dominate the personal computer operating system market with MS-DOS in the mid-1980s, followed by Windows. During the 41 years from 1980 to 2021 Microsoft released 9 versions of MS-DOS with a median frequen ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	GPT-J GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters. The model is available on GitHub, but the web interface no longer communicates with the model. Development stopped in 2021. Architecture GPT-J is a GPT-3-like model with 6 billion parameters. Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue. Its architecture differs from GPT-3 in three main ways. * The attention and feedforward neural network were computed in parallel during training, allowing for greater efficiency. * The GPT-J model uses rotary position embeddings, which has been found to be a superior method of injecting positional informat ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]