The SAS language is a
fourth-generation computer
programming language
A programming language is a system of notation for writing computer programs.
Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
used for statistical analysis, created by
Anthony James Barr at
North Carolina State University
North Carolina State University (NC State, North Carolina State, NC State University, or NCSU) is a public university, public Land-grant university, land-grant research university in Raleigh, North Carolina, United States. Founded in 1887 and p ...
. Its primary applications include
data mining
Data mining is the process of extracting and finding patterns in massive data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and ...
and
machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
. The SAS language runs under
compilers
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
such as the
SAS System
SAS (previously "Statistical Analysis System") is a statistical software suite developed by SAS Institute for data management, advanced analytics, multivariate analysis, business intelligence, and predictive analytics.
SAS was developed at No ...
that can be used on
Microsoft Windows
Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
,
Linux
Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
,
UNIX
Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
and
mainframe computers
A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications like bulk data processing for tasks such as censuses, industry and consumer statistics, enterpris ...
.
History
SAS was developed in the 1960s by
Anthony James Barr, who built its fundamental structure, and
SAS Institute CEO
James Goodnight, who developed a number of features including analysis procedures.
The language is currently developed and sponsored by the
SAS Institute, of which Goodnight is founder and CEO.
Language
Base SAS is a
fourth-generation procedural programming
Procedural programming is a programming paradigm, classified as imperative programming, that involves implementing the behavior of a computer program as Function (computer programming), procedures (a.k.a. functions, subroutines) that call each o ...
language designed for the statistical analysis of data. It is
Turing-complete
In computability theory, a system of data-manipulation rules (such as a model of computation, a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be ...
and domain specific, with many of the attributes of a
command language
A command language is a language for job control in computing. It is a domain-specific and interpreted language; common examples of a command language are shell or batch programming languages.
These languages can be used directly at the ...
. As an
interpreted language
In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An inter ...
, it is generally parsed, compiled, and executed step by step.
The SAS system was originally a
single instruction, single data
In computing, single instruction stream, single data stream (SISD) is a computer architecture in which a single uni-core processor executes a single instruction stream, to operate on data stored in a single memory. This corresponds to the von ...
(SISD) engine, but
single instruction, multiple data
Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
(SIMD) and
multiple instruction, multiple data (MIMD) functionality was later added.
Most base SAS code can be ported between versions, but some are functions and parameters are specific to certain operating systems and interfaces.
All SAS programs are written within the SAS language, although some packages use menu-driven
graphical user interfaces
A graphical user interface, or GUI, is a form of user interface that allows user (computing), users to human–computer interaction, interact with electronic devices through Graphics, graphical icon (computing), icons and visual indicators such ...
on the
front-end.
Various SAS editors use color coding to identify components like step boundaries, keywords and constants. It can read in data from common spreadsheets and databases and output the results of statistical analyses in tables, graphs, and as
RTF,
HTML
Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets ( ...
and
PDF
Portable document format (PDF), standardized as ISO 32000, is a file format developed by Adobe Inc., Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, computer hardware, ...
documents.
Syntax
The language consists of two main types of blocks: DATA blocks and PROC blocks.
DATA blocks can be used to read and manipulate input data, and create data sets. PROC blocks are used to perform analyses and operations on these data sets, sort data, and output results in the form of
descriptive statistics
A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features from a collection of information, while descriptive statistics (in the mass noun sense) is the process of using and an ...
, tables, results, charts and plots.
PROC SQL can be used to work with
SQL
Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel")
is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
syntax within SAS.
Users can input both numeric and character data into base SAS. SAS statements must begin with a reserved keyword and end with
but the language is otherwise flexible in terms of formatting and most statements are
case insensitive.
SAS statements can continue across multiple lines and do not require indenting, although indents can improve readability.
Comments are delimited by and .
A standard SAS program typically entails the definition of data, the creation of a
data set
A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more table (database), database tables, where every column (database), column of a table represents a particular Variable (computer sci ...
, and the performance of procedures such as analysis on that data set.
SAS scripts have the .sas extension.
A simple example of SAS code is the following
* COMMENT;
Data TEMP;
input X Y Z;
datalines;
1 2 3
5 6 7
;
run;
PROC PRINT DATA = TEMP;
RUN;
SAS macro language
The SAS
macro language is made available within base SAS software to reduce the amount of code, and create
code generators for building more versatile and flexible programs. The macro language can be used for functionalities as simple as symbolic substitution and as complex as
dynamic programming.
SAS macro is considered to be a rich language, although its overall syntax is very similar to that of base SAS. The names of macro variables in SAS are usually preceded by , while macro program statements are usually preceded by .
Software
SAS Institute develops a number of tools and software suites, also called SAS, which are used for creating programs in the language. These suites include
JMP,
SAS Viya, SAS Enterprise Guide and SAS Enterprise Miner.
In 2002,
World Programming also developed software that allows the execution of most SAS scripts.
Uses
The SAS language is used as a standard in many industries,
and was ranked #22 on the
TIOBE index in February 2024. It is especially widely used for
machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
,
data mining
Data mining is the process of extracting and finding patterns in massive data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and ...
, and
data warehousing
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for Business intelligence, reporting and data analysis and is a core component of business intelligence. Data warehouses are central Re ...
in the finance, insurance, manufacturing, health care and pharmaceutical industries.
It has a high level of documentation and community support,
which has contributed to its uptake.
Machine learning
SAS is used for preparing input data, and building and optimizing
machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
algorithms. Various models, such as
artificial neural networks
In machine learning, a neural network (also artificial neural network or neural net, abbreviated ANN or NN) is a computational model inspired by the structure and functions of biological neural networks.
A neural network consists of connected ...
(ANN),
convolutional neural networks
A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization. This type of deep learning network has been applied to process and make predictions from many different type ...
and
deep learning
Deep learning is a subset of machine learning that focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience a ...
models, are developed and trained in SAS. These are applied to areas such as
computer vision
Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...
and
fraud detection. SAS has also been noted for its applications in the area of
decision intelligence.
Data mining and warehousing
While SAS was originally developed for
data analysis
Data analysis is the process of inspecting, Data cleansing, cleansing, Data transformation, transforming, and Data modeling, modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Da ...
, it became an important language for
data storage
Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are con ...
.
SAS is one of the primary languages used for data mining in business intelligence and statistics.
According to
Gartner
Gartner, Inc. is an American research and advisory firm focusing on business and technology topics. Gartner provides its products and services through research reports, conferences, and consulting. Its clients include large corporations, gover ...
's
Magic Quadrant and
Forrester Research
Forrester Research, Inc. is a research and advisory firm. Forrester serves clients in North America, Europe, and Asia Pacific. The firm is headquartered in Cambridge, Massachusetts, Cambridge, MA with global offices in Amsterdam, London, New D ...
, the SAS Institute is one of the largest vendors of data mining software.
See also
*
List of statistical packages
The following is a list of statistical software.
Open-source
* ADaMSoft – a generalized statistical software with data mining algorithms and methods for data management
* ADMB – a software suite for non-linear statistical modeling based on C+ ...
*
Comparison of statistical packages
* ''
SAS Institute Inc v World Programming Ltd''
Notes
References
*
*
External links
Learn SAS Programmingcomp.soft-sys.sasat
Google
Google LLC (, ) is an American multinational corporation and technology company focusing on online advertising, search engine technology, cloud computing, computer software, quantum computing, e-commerce, consumer electronics, and artificial ...
Groups.
UK High Court Judgement on SAS LanguageSasopedia / SAS Language elementsSAS whitepaper search
{{Authority control
Statistical programming languages
SAS Institute