HOME

TheInfoList



OR:

The SAS language is a fourth-generation computer
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
used for statistical analysis, created by Anthony James Barr at
North Carolina State University North Carolina State University (NC State, North Carolina State, NC State University, or NCSU) is a public university, public Land-grant university, land-grant research university in Raleigh, North Carolina, United States. Founded in 1887 and p ...
. Its primary applications include
data mining Data mining is the process of extracting and finding patterns in massive data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and ...
and
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
. The SAS language runs under
compilers In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
such as the
SAS System SAS (previously "Statistical Analysis System") is a statistical software suite developed by SAS Institute for data management, advanced analytics, multivariate analysis, business intelligence, and predictive analytics. SAS was developed at No ...
that can be used on
Microsoft Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
,
Linux Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
,
UNIX Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
and
mainframe computers A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications like bulk data processing for tasks such as censuses, industry and consumer statistics, enterpris ...
.


History

SAS was developed in the 1960s by Anthony James Barr, who built its fundamental structure, and SAS Institute CEO James Goodnight, who developed a number of features including analysis procedures. The language is currently developed and sponsored by the SAS Institute, of which Goodnight is founder and CEO.


Language

Base SAS is a fourth-generation
procedural programming Procedural programming is a programming paradigm, classified as imperative programming, that involves implementing the behavior of a computer program as Function (computer programming), procedures (a.k.a. functions, subroutines) that call each o ...
language designed for the statistical analysis of data. It is
Turing-complete In computability theory, a system of data-manipulation rules (such as a model of computation, a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be ...
and domain specific, with many of the attributes of a
command language A command language is a language for job control in computing. It is a domain-specific and interpreted language; common examples of a command language are shell or batch programming languages. These languages can be used directly at the ...
. As an
interpreted language In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An inter ...
, it is generally parsed, compiled, and executed step by step. The SAS system was originally a
single instruction, single data In computing, single instruction stream, single data stream (SISD) is a computer architecture in which a single uni-core processor executes a single instruction stream, to operate on data stored in a single memory. This corresponds to the von ...
(SISD) engine, but
single instruction, multiple data Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
(SIMD) and multiple instruction, multiple data (MIMD) functionality was later added. Most base SAS code can be ported between versions, but some are functions and parameters are specific to certain operating systems and interfaces. All SAS programs are written within the SAS language, although some packages use menu-driven
graphical user interfaces A graphical user interface, or GUI, is a form of user interface that allows user (computing), users to human–computer interaction, interact with electronic devices through Graphics, graphical icon (computing), icons and visual indicators such ...
on the front-end. Various SAS editors use color coding to identify components like step boundaries, keywords and constants. It can read in data from common spreadsheets and databases and output the results of statistical analyses in tables, graphs, and as RTF,
HTML Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets ( ...
and
PDF Portable document format (PDF), standardized as ISO 32000, is a file format developed by Adobe Inc., Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, computer hardware, ...
documents.


Syntax

The language consists of two main types of blocks: DATA blocks and PROC blocks. DATA blocks can be used to read and manipulate input data, and create data sets. PROC blocks are used to perform analyses and operations on these data sets, sort data, and output results in the form of
descriptive statistics A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features from a collection of information, while descriptive statistics (in the mass noun sense) is the process of using and an ...
, tables, results, charts and plots. PROC SQL can be used to work with
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
syntax within SAS. Users can input both numeric and character data into base SAS. SAS statements must begin with a reserved keyword and end with but the language is otherwise flexible in terms of formatting and most statements are case insensitive. SAS statements can continue across multiple lines and do not require indenting, although indents can improve readability. Comments are delimited by and . A standard SAS program typically entails the definition of data, the creation of a
data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more table (database), database tables, where every column (database), column of a table represents a particular Variable (computer sci ...
, and the performance of procedures such as analysis on that data set. SAS scripts have the .sas extension. A simple example of SAS code is the following * COMMENT; Data TEMP; input X Y Z; datalines; 1 2 3 5 6 7 ; run; PROC PRINT DATA = TEMP; RUN;


SAS macro language

The SAS macro language is made available within base SAS software to reduce the amount of code, and create code generators for building more versatile and flexible programs. The macro language can be used for functionalities as simple as symbolic substitution and as complex as dynamic programming. SAS macro is considered to be a rich language, although its overall syntax is very similar to that of base SAS. The names of macro variables in SAS are usually preceded by , while macro program statements are usually preceded by .


Software

SAS Institute develops a number of tools and software suites, also called SAS, which are used for creating programs in the language. These suites include JMP, SAS Viya, SAS Enterprise Guide and SAS Enterprise Miner. In 2002, World Programming also developed software that allows the execution of most SAS scripts.


Uses

The SAS language is used as a standard in many industries, and was ranked #22 on the TIOBE index in February 2024. It is especially widely used for
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
,
data mining Data mining is the process of extracting and finding patterns in massive data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and ...
, and
data warehousing In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for Business intelligence, reporting and data analysis and is a core component of business intelligence. Data warehouses are central Re ...
in the finance, insurance, manufacturing, health care and pharmaceutical industries. It has a high level of documentation and community support, which has contributed to its uptake.


Machine learning

SAS is used for preparing input data, and building and optimizing
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
algorithms. Various models, such as
artificial neural networks In machine learning, a neural network (also artificial neural network or neural net, abbreviated ANN or NN) is a computational model inspired by the structure and functions of biological neural networks. A neural network consists of connected ...
(ANN),
convolutional neural networks A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization. This type of deep learning network has been applied to process and make predictions from many different type ...
and
deep learning Deep learning is a subset of machine learning that focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience a ...
models, are developed and trained in SAS. These are applied to areas such as
computer vision Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...
and fraud detection. SAS has also been noted for its applications in the area of decision intelligence.


Data mining and warehousing

While SAS was originally developed for
data analysis Data analysis is the process of inspecting, Data cleansing, cleansing, Data transformation, transforming, and Data modeling, modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Da ...
, it became an important language for
data storage Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are con ...
. SAS is one of the primary languages used for data mining in business intelligence and statistics. According to
Gartner Gartner, Inc. is an American research and advisory firm focusing on business and technology topics. Gartner provides its products and services through research reports, conferences, and consulting. Its clients include large corporations, gover ...
's Magic Quadrant and
Forrester Research Forrester Research, Inc. is a research and advisory firm. Forrester serves clients in North America, Europe, and Asia Pacific. The firm is headquartered in Cambridge, Massachusetts, Cambridge, MA with global offices in Amsterdam, London, New D ...
, the SAS Institute is one of the largest vendors of data mining software.


See also

*
List of statistical packages The following is a list of statistical software. Open-source * ADaMSoft – a generalized statistical software with data mining algorithms and methods for data management * ADMB – a software suite for non-linear statistical modeling based on C+ ...
* Comparison of statistical packages * '' SAS Institute Inc v World Programming Ltd''


Notes


References

* *


External links


Learn SAS Programming

comp.soft-sys.sas
at
Google Google LLC (, ) is an American multinational corporation and technology company focusing on online advertising, search engine technology, cloud computing, computer software, quantum computing, e-commerce, consumer electronics, and artificial ...
Groups.
UK High Court Judgement on SAS Language

Sasopedia / SAS Language elements

SAS whitepaper search
{{Authority control Statistical programming languages SAS Institute