SAS (software)
   HOME

TheInfoList



OR:

SAS (previously "Statistical Analysis System") is a
statistical software Statistical software are specialized computer programs for analysis in statistics and econometrics. Open-source * ADaMSoft – a generalized statistical software with data mining algorithms and methods for data management * ADMB – a software ...
suite developed by
SAS Institute SAS Institute (or SAS, pronounced "sass") is an American multinational developer of analytics software based in Cary, North Carolina. SAS develops and markets a suite of analytics software ( also called SAS), which helps access, manage, ana ...
for
data management Data management comprises all disciplines related to handling data as a valuable resource. Concept The concept of data management arose in the 1980s as technology moved from sequential processing (first punched cards, then magnetic tape) to r ...
, advanced analytics,
multivariate analysis Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable. Multivariate statistics concerns understanding the different aims and background of each of the dif ...
,
business intelligence Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical pr ...
,
criminal investigation Criminal investigation is an applied science that involves the study of facts that are then used to inform criminal trials. A complete criminal investigation can include searching, interviews, interrogations, evidence collection and preservatio ...
, and
predictive analytics Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modeling, and machine learning that analyze current and historical facts to make predictions about future or otherwise unknown events. In business ...
. SAS was developed at
North Carolina State University North Carolina State University (NC State) is a public land-grant research university in Raleigh, North Carolina. Founded in 1887 and part of the University of North Carolina system, it is the largest university in the Carolinas. The universit ...
from 1966 until 1976, when SAS Institute was incorporated. SAS was further developed in the 1980s and 1990s with the addition of new statistical procedures, additional components and the introduction of JMP. A point-and-click interface was added in version 9 in 2004. A
social media analytics Social media analytics is the process of gathering and analyzing data from social networks such as Facebook, Instagram, LinkedIn, or Twitter. A part of social media analytics is called social media monitoring or social listening. It is commonly ...
product was added in 2010.


Technical overview and terminology

SAS is a software suite that can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on it. SAS provides a graphical point-and-click user interface for non-technical users and more through the
SAS language The SAS language is a computer programming language used for statistical analysis, created by Anthony James Barr at North Carolina State University.Barr & Goodnight, et al. 1976:"The SAS Staff". Attribution of contributions to SAS 72 and SAS 76. It ...
. SAS programs have DATA steps, which retrieve and manipulate data, and PROC steps, which analyze the data. Each step consists of a series of statements. The DATA step has executable statements that result in the software taking an action, and declarative statements that provide instructions to read a data set or alter the data's appearance. The DATA step has two phases: compilation and execution. In the compilation phase, declarative statements are processed and syntax errors are identified. Afterwards, the execution phase processes each executable statement sequentially. Data sets are organized into tables with rows called "observations" and columns called "variables". Additionally, each piece of data has a descriptor and a value. The PROC step consists of PROC statements that call upon named procedures. Procedures perform analysis and reporting on data sets to produce statistics, analyses, and graphics. There are more than 300 named procedures and each one contains a substantial body of programming and statistical work. PROC statements can also display results, sort data or perform other operations. SAS macros are pieces of code or variables that are coded once and referenced to perform repetitive tasks. SAS data can be published in HTML, PDF, Excel, RTF and other formats using the Output Delivery System, which was first introduced in 2007. The SAS Enterprise Guide is SAS's point-and-click interface. It generates code to manipulate data or perform analysis automatically and does not require SAS programming experience to use. The SAS software suite has more than 200 components Some of the SAS components include:


History


Origins

The development of SAS began in 1966 after
North Carolina State University North Carolina State University (NC State) is a public land-grant research university in Raleigh, North Carolina. Founded in 1887 and part of the University of North Carolina system, it is the largest university in the Carolinas. The universit ...
re-hired Anthony Barr to program his analysis of variance and regression software so that it would run on
IBM System/360 The IBM System/360 (S/360) is a family of mainframe computer systems that was announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover both commercial and scientific applica ...
computers. The project was funded by the
National Institutes of Health The National Institutes of Health, commonly referred to as NIH (with each letter pronounced individually), is the primary agency of the United States government responsible for biomedical and public health research. It was founded in the late ...
. and was originally intended to analyze agricultural data to improve crop yields. Barr was joined by student
James Goodnight James Howard Goodnight (born January 6, 1943) is an American billionaire businessman and software developer. He has been the CEO of SAS Institute since 1976, which he co-founded that year with other faculty members of North Carolina State Univer ...
, who developed the software's statistical routines, and the two became project leaders. In 1968, Barr and Goodnight integrated new
multiple regression In statistical modeling, regression analysis is a set of statistical processes for Estimation theory, estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning ...
and
analysis of variance Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among means. ANOVA was developed by the statisticia ...
routines. In 1972, after issuing the first release of SAS, the project lost its funding. According to Goodnight, this was because NIH only wanted to fund projects with medical applications. Goodnight continued teaching at the university for a salary of $1 and access to mainframe computers for use with the project, until it was funded by the
University Statisticians of the Southern Experiment Stations The University Statisticians of the Southern Experiment Stations (USSES) was a coalition of southern Universities formed in the mid-1960s for the purpose of coordinating efforts in the development of statistical software. This coalition was large ...
the following year.
John Sall John P. Sall (born 1948) is an American billionaire businessman and computer software developer, who co-founded SAS Institute and created the JMP statistical software. Sall grew up in Rockford, Illinois and earned degrees in history, economics ...
joined the project in 1973 and contributed to the software's econometrics, time series, and matrix algebra. Another early participant, Caroll G. Perkins, contributed to SAS' early programming. Jolayne W. Service and Jane T. Helwig created SAS' first documentation. The first versions of SAS were named after the year in which they were released. In 1971, SAS 71 was published as a limited release. It was used only on IBM mainframes and had the main elements of SAS programming, such as the DATA step and the most common procedures in the PROC step. The following year a full version was released as SAS 72, which introduced the MERGE statement and added features for handling missing data or combining data sets. In 1976, Barr, Goodnight, Sall, and Helwig removed the project from North Carolina State and incorporated it into SAS Institute, Inc.


Development

SAS was redesigned in SAS 76 with an
open architecture Open architecture is a type of computer architecture or software architecture intended to make adding, upgrading, and swapping components with other computers easy. For example, the IBM PC, Amiga 500 and Apple IIe have an open architecture support ...
that allowed for
compilers In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
and procedures. The INPUT and INFILE statements were improved so they could read most data formats used by IBM mainframes. Generating reports was also added through the PUT and FILE statements. The ability to analyze
general linear model The general linear model or general multivariate regression model is a compact way of simultaneously writing several multiple linear regression models. In that sense it is not a separate statistical linear model. The various multiple linear regre ...
s was also added as was the FORMAT procedure, which allowed developers to customize the appearance of data. In 1979, SAS 79 added support for the CMS operating system and introduced the DATASETS procedure. Three years later, SAS 82 introduced an early macro language and the APPEND procedure. SAS version 4 had limited features, but made SAS more accessible. Version 5 introduced a complete macro language, array subscripts, and a full-screen interactive user interface called Display Manager. In 1985, SAS was rewritten in the
C programming language ''The C Programming Language'' (sometimes termed ''K&R'', after its authors' initials) is a computer programming book written by Brian Kernighan and Dennis Ritchie, the latter of whom originally designed and implemented the language, as well as ...
. This allowed for the SAS' Multivendor Architecture that allows the software to run on
UNIX Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...
,
MS-DOS MS-DOS ( ; acronym for Microsoft Disk Operating System, also known as Microsoft DOS) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and a few ope ...
, and
Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
. It was previously written in
PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. I ...
, Fortran, and
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
. In the 1980s and 1990s, SAS released a number of components to complement Base SAS. SAS/GRAPH, which produces graphics, was released in 1980, as well as the SAS/ETS component, which supports econometric and time series analysis. A component intended for pharmaceutical users, SAS/PH-Clinical, was released in the 1990s. The
Food and Drug Administration The United States Food and Drug Administration (FDA or US FDA) is a List of United States federal agencies, federal agency of the United States Department of Health and Human Services, Department of Health and Human Services. The FDA is respon ...
standardized on SAS/PH-Clinical for new drug applications in 2002. Vertical products like SAS Financial Management and SAS Human Capital Management (then called CFO Vision and HR Vision respectively) were also introduced. JMP was developed by SAS co-founder
John Sall John P. Sall (born 1948) is an American billionaire businessman and computer software developer, who co-founded SAS Institute and created the JMP statistical software. Sall grew up in Rockford, Illinois and earned degrees in history, economics ...
and a team of developers to take advantage of the graphical user interface introduced in the 1984
Apple Macintosh The Mac (known as Macintosh until 1999) is a family of personal computers designed and marketed by Apple Inc. Macs are known for their ease of use and minimalist designs, and are popular among students, creative professionals, and software en ...
and shipped for the first time in 1989. Updated versions of JMP were released continuously after 2002 with the most recent release being from 2016. SAS version 6 was used throughout the 1990s and was available on a wider range of operating systems, including
Macintosh The Mac (known as Macintosh until 1999) is a family of personal computers designed and marketed by Apple Inc., Apple Inc. Macs are known for their ease of use and minimalist designs, and are popular among students, creative professionals, and ...
,
OS/2 OS/2 (Operating System/2) is a series of computer operating systems, initially created by Microsoft and IBM under the leadership of IBM software designer Ed Iacobucci. As a result of a feud between the two companies over how to position OS/2 ...
,
Silicon Graphics Silicon Graphics, Inc. (stylized as SiliconGraphics before 1999, later rebranded SGI, historically known as Silicon Graphics Computer Systems or SGCS) was an American high-performance computing manufacturer, producing computer hardware and soft ...
, and
PRIMOS PRIMOS is a discontinued operating system developed during the 1970s by Prime Computer for its minicomputer systems. It rapidly gained popularity and by the mid-1980s was a serious contender as a mainline minicomputer operating system. With ...
. SAS introduced new features through dot-releases. From 6.06 to 6.09, a user interface based on the windows paradigm was introduced and support for SQL was added. Version 7 introduced the Output Delivery System (ODS) and an improved text editor. ODS was improved upon in successive releases. For example, more output options were added in version 8. The number of operating systems that were supported was reduced to
UNIX Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...
,
Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
and
z/OS z/OS is a 64-bit operating system for IBM z/Architecture mainframes, introduced by IBM in October 2000. It derives from and is the successor to OS/390, which in turn was preceded by a string of MVS versions.Starting with the earliest: * O ...
, and
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
was added. SAS version 8 and SAS Enterprise Miner were released in 1999.


Recent history

In 2002, the Text Miner software was introduced. Text Miner analyzes text data like emails for patterns in
business intelligence Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical pr ...
applications. In 2004, SAS Version 9.0 was released, which was dubbed "Project Mercury" and was designed to make SAS accessible to a broader range of business users. Version 9.0 added custom user interfaces based on the user's role and established the point-and-click user interface of SAS Enterprise Guide as the software's primary
graphical user interface The GUI ( "UI" by itself is still usually pronounced . or ), graphical user interface, is a form of user interface that allows users to interact with electronic devices through graphical icons and audio indicator such as primary notation, inste ...
(GUI). The
Customer Relationship Management Customer relationship management (CRM) is a process in which a business or other organization administers its interactions with customers, typically using data analysis to study large amounts of information. CRM systems compile data from a ra ...
(CRM) features were improved in 2004 with SAS Interaction Management. In 2008 SAS announced Project Unity, designed to integrate data quality, data integration and master data management.
SAS Institute Inc v World Programming Ltd The SAS Institute, creators of the SAS System filed a lawsuit against World Programming Limited, creators of World Programming System (WPS) in November 2009. The dispute was whether World Programming had infringed copyrights on SAS Institute Pr ...
was a lawsuit with developers of a competing implementation,
World Programming System The World Programming System, also known as WPS Analytics or WPS, is a software product developed by a company called World Programming (acquired by Altair Engineering). WPS Analytics supports users of mixed ability to access and process data a ...
, alleging that they had infringed SAS's copyright in part by implementing the same functionality. This case was referred from the United Kingdom's
High Court of Justice The High Court of Justice in London, known properly as His Majesty's High Court of Justice in England, together with the Court of Appeal of England and Wales, Court of Appeal and the Crown Court, are the Courts of England and Wales, Senior Cou ...
to the
European Court of Justice The European Court of Justice (ECJ, french: Cour de Justice européenne), formally just the Court of Justice, is the supreme court of the European Union in matters of European Union law. As a part of the Court of Justice of the European Un ...
on 11 August 2010. In May 2012, the
European Court of Justice The European Court of Justice (ECJ, french: Cour de Justice européenne), formally just the Court of Justice, is the supreme court of the European Union in matters of European Union law. As a part of the Court of Justice of the European Un ...
ruled in favor of World Programming, finding that "the functionality of a computer program and the programming language cannot be protected by copyright." A free version was introduced for students in 2010. SAS Social Media Analytics, a tool for social media monitoring, engagement and
sentiment analysis Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjec ...
, was also released that year. SAS Rapid Predictive Modeler (RPM), which creates basic analytical models using
Microsoft Excel Microsoft Excel is a spreadsheet developed by Microsoft for Microsoft Windows, Windows, macOS, Android (operating system), Android and iOS. It features calculation or computation capabilities, graphing tools, pivot tables, and a macro (comp ...
, was introduced that same year. JMP 9 in 2010 added a new interface for using the
R programming language R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners, bioinform ...
from JMP and an add-in for Excel. The following year, a
High Performance Computing High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into a multid ...
appliance was made available in a partnership with
Teradata Teradata Corporation is an American software company that provides cloud database and analytics-related software, products, and services. The company was formed in 1979 in Brentwood, California, as a collaboration between researchers at Caltech ...
and
EMC Greenplum Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology. The technology was created by a company of the same name headquartered in San Mateo, California around 2005. Greenplum was acquired ...
. In 2011, the company released Enterprise Miner 7.1. The company introduced 27 data management products from October 2013 to October 2014 and updates to 160 others. At the 2015 SAS Global Forum, it announced several new products that were specialized for different industries, as well as new training software.


Releases date

SAS had many releases since 1972. Since release 9.3, SAS/STAT has its own release numbering.


Software products

As of 2011 SAS's largest set of products is its line for
customer intelligence Customer intelligence (CI) as part of business intelligence is the process of gathering and analyzing information regarding customers, and their details and activities, to build deeper and more effective customer relationships and improve decisi ...
. Numerous SAS modules for web, social media and marketing analytics may be used to profile customers and prospects, predict their behaviors and manage and optimize communications. SAS also provides the SAS Fraud Framework. The framework's primary functionality is to monitor transactions across different applications, networks and partners and use analytics to identify anomalies that are indicative of fraud. SAS Enterprise GRC (Governance, Risk and Compliance) provides risk modeling, scenario analysis and other functions in order to manage and visualize risk, compliance and corporate policies. There is also a SAS Enterprise Risk Management product-set designed primarily for banks and financial services organizations. SAS' products for monitoring and managing the operations of IT systems are collectively referred to as SAS IT Management Solutions. SAS collects data from various IT assets on performance and utilization, then creates reports and analyses. SAS' Performance Management products consolidate and provide graphical displays for
key performance indicators A performance indicator or key performance indicator (KPI) is a type of performance measurement. KPIs evaluate the success of an organization or of a particular activity (such as projects, programs, products and other initiatives) in which it en ...
(KPIs) at the employee, department and organizational level. The SAS Supply Chain Intelligence product suite is offered for supply chain needs, such as forecasting product demand, managing distribution and inventory and optimizing pricing. There is also a "SAS for Sustainability Management" set of software to forecast environmental, social and economic effects and identify causal relationships between operations and an impact on the environment or ecosystem. SAS has product sets for specific industries, such as government, retail, telecommunications and aerospace and for marketing optimization or
high-performance computing High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into a mult ...
.


Free University Edition

SAS previously offered a Free University Edition which could be downloaded by anyone for non-commercial use. The first announcement regarding this Free University Edition seems to have appeared in newspapers on 28 May 2014. However, in 2022, SAS Free University Edition was replaced by two entirely web-based versions: SAS OnDemand for Academics and SAS Viya for Learners.


Comparison to other products

In a 2005 article for the ''
Journal of Marriage and Family The ''Journal of Marriage and Family'' is a peer-reviewed academic journal published by Wiley-Blackwell on behalf of the National Council on Family Relations. It was established in 1939 as ''Living'', renamed to ''Marriage and Family Living'' in ...
'' comparing statistical packages from SAS and its competitors Stata and
SPSS SPSS Statistics is a statistical software suite developed by IBM for data management, advanced analytics, multivariate analysis, business intelligence, and criminal investigation. Long produced by SPSS Inc., it was acquired by IBM in 2009. C ...
, Alan C. Acock wrote that SAS programs provide "extraordinary range of data analysis and data management tasks," but were difficult to use and learn. SPSS and Stata, meanwhile, were both easier to learn (with better documentation) but had less capable analytic abilities, though these could be expanded with paid (in SPSS) or free (in Stata) add-ons. Acock concluded that SAS was best for
power user A power user is a user of computers, software and other electronic devices, who uses advanced features of computer hardware, operating systems, programs, or websites which are not used by the average user. A power user might not have extensive tec ...
s, while occasional users would benefit most from SPSS and Stata. A 2014 comparison by the
University of California, Los Angeles The University of California, Los Angeles (UCLA) is a public land-grant research university in Los Angeles, California. UCLA's academic roots were established in 1881 as a teachers college then known as the southern branch of the California St ...
, gave similar results. Competitors such as
Revolution Analytics Revolution Analytics (formerly REvolution Computing) is a statistical software company focused on developing open source and "open-core" versions of the free and open source software R for enterprise, academic and analytics customers. Revolution ...
and Alpine Data Labs advertise their products as considerably cheaper than SAS'. In a 2011 comparison, Doug Henschen of ''
InformationWeek ''InformationWeek'' is a digital magazine which conducts corresponding face-to-face events, virtual events, and research. It is headquartered in San Francisco, California and was first published in 1985 by CMP Media, later called Informa. The ...
'' found that start-up fees for the three are similar, though he admitted that the starting fees were not necessarily the best basis for comparison. SAS' business model is not weighted as heavily on initial fees for its programs, instead focusing on revenue from annual subscription fees.


Adoption

According to IDC, SAS is the largest market-share holder in "advanced analytics" with 35.4 percent of the market as of 2013. It is the fifth largest market-share holder for business intelligence (BI) software with a 6.9% share and the largest independent vendor. It competes in the BI market against conglomerates, such as SAP BusinessObjects, IBM Cognos,
SPSS Modeler IBM SPSS Modeler is a data mining and text analytics software application from IBM. It is used to build predictive models and conduct other analytic tasks. It has a visual interface which allows users to leverage statistical and data mining a ...
,
Oracle Hyperion Hyperion Solutions Corporation was a software company located in Santa Clara, California, which was acquired by Oracle Corporation in 2007. Many of its products were targeted at the business intelligence (BI) and business performance management ...
, and
Microsoft Power BI Power BI is an interactive data visualization software product developed by Microsoft with a primary focus on business intelligence. It is part of the Microsoft Power Platform. Power BI is a collection of software services, apps, and connectors t ...
. SAS has been named in the Gartner Leader's Quadrant for Data Integration Tool and for Business Intelligence and Analytical Platforms. A study published in 2011 in ''
BMC Health Services Research ''BMC Health Services Research'' is an open access healthcare journal, which covers research on the subject of health services. It was established in 2001 and is published by BioMed Central. Abstracting and indexing ''BMC Health Services Rese ...
'' found that SAS was used in 42.6 percent of data analyses in health service research, based on a sample of 1,139 articles drawn from three journals.


See also

*
Comparison of numerical-analysis software The following tables provide a comparison of numerical-analysis software. Applications General Operating system support The operating systems the software can run on natively (without emulation). Language features Colors indicate ...
* Comparison of OLAP servers *
JMP (statistical software) JMP (pronounced "jump") is a suite of computer programs for statistical analysis developed by JMP, a subsidiary of SAS Institute. It was launched in 1989 to take advantage of the graphical user interface introduced by the Macintosh operating s ...
, also from SAS Institute Inc. *
SAS language The SAS language is a computer programming language used for statistical analysis, created by Anthony James Barr at North Carolina State University.Barr & Goodnight, et al. 1976:"The SAS Staff". Attribution of contributions to SAS 72 and SAS 76. It ...
*
R (programming language) R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners, bioinform ...


References

SAS' revenue up 12% in 2011


Further reading

* * Wikiversity:Data Analysis using the SAS Language


External links

*
SAS OnDemand for Academics
No-cost access for learners (free SAS Profile required)


SAS for Developers

SAS community forums
{{Good article Fourth-generation programming languages Articles with example code Business intelligence C (programming language) software Criminal investigation Data mining and machine learning software Data warehousing Extract, transform, load tools Mathematical optimization software Numerical software Proprietary commercial software for Linux Proprietary cross-platform software Science software for Linux