GOLD (parser)
   HOME

TheInfoList



OR:

GOLD is a
free Free may refer to: Concept * Freedom, having the ability to do something, without having to obey anyone/anything * Freethought, a position that beliefs should be formed only on the basis of logic, reason, and empiricism * Emancipate, to procur ...
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
system that is designed to support multiple programming languages.


Design

The system uses a DFA for lexical analysis and the
LALR In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
algorithm for parsing. Both of these algorithms are state machines that use tables to determine actions. GOLD is designed around the principle of logically separating the process of generating the
LALR In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
and DFA parse tables from the actual implementation of the parsing algorithms themselves. This allows parsers to be implemented in different programming languages while maintaining the same grammars and development process. The GOLD system consists of three logical components, the "Builder", the "Engine", and a "Compiled Grammar Table" file definition which functions as an intermediary between the Builder and the Engine.


Builder

The Builder is the primary component and main application of the system. The Builder is used to analyze the syntax of a language (specified as a grammar) and construct
LALR In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
and DFA tables. During this process, any ambiguities in the grammar will be reported. This is essentially the same task that is performed by compiler-compilers such as
YACC Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a com ...
and
ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), firs ...
. Once the
LALR In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
and DFA parse tables are successfully constructed, the Builder can save this data into a Compiled Grammar Table file. This allows the information to be reopened later by the Builder or used in one of the Engines. Currently, the Builder component is only available for
Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
32-bit operating systems. Some of the features of the Builder are: *
Freeware Freeware is software, most often proprietary, that is distributed at no monetary cost to the end user. There is no agreed-upon set of rights, license, or EULA that defines ''freeware'' unambiguously; every publisher defines its own rules for the f ...
license * State browsing * Integrated testing * Test multiple files wizard * Generate webpages (including hyperlinked syntax charts) * Generate skeleton programs using templates * Export grammars to
YACC Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a com ...
* Export tables to XML or formatted text


Compiled Grammar Table file

The Compiled Grammar Table file is used to store table information generated by the Builder.


Engines

Unlike the Builder, which only runs on a single platform, the Engine component is written for a specific programming language and/or development platform. The Engine implements the
LALR In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
and DFA algorithms. Since different programming languages use different approaches to designing programs, each implementation of the Engine will vary. As a result, an implementation of the Engine written for
Visual Basic Visual Basic is a name for a family of programming languages from Microsoft. It may refer to: * Visual Basic .NET (now simply referred to as "Visual Basic"), the current version of Visual Basic launched in 2002 which runs on .NET * Visual Basic (cl ...
6 will differ greatly from one written for
ANSI C ANSI C, ISO C, and Standard C are successive standards for the C programming language published by the American National Standards Institute (ANSI) and ISO/IEC JTC 1/SC 22/WG 14 of the International Organization for Standardization (ISO) and the ...
. Currently, Engines for GOLD have been implemented for the following programming languages / platforms. New Engines can be implemented using the source code for the existing Engines as the starting point. *
Assembly Assembly may refer to: Organisations and meetings * Deliberative assembly, a gathering of members who use parliamentary procedure for making decisions * General assembly, an official meeting of the members of an organization or of their representa ...
- Intel x86 *
ANSI C ANSI C, ISO C, and Standard C are successive standards for the C programming language published by the American National Standards Institute (ANSI) and ISO/IEC JTC 1/SC 22/WG 14 of the International Organization for Standardization (ISO) and the ...
* C# * D *
Delphi Delphi (; ), in legend previously called Pytho (Πυθώ), in ancient times was a sacred precinct that served as the seat of Pythia, the major oracle who was consulted about important decisions throughout the ancient classical world. The oracle ...
*
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
*
Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Fren ...
*
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
*
Visual Basic Visual Basic is a name for a family of programming languages from Microsoft. It may refer to: * Visual Basic .NET (now simply referred to as "Visual Basic"), the current version of Visual Basic launched in 2002 which runs on .NET * Visual Basic (cl ...
*
Visual Basic .NET Visual Basic, originally called Visual Basic .NET (VB.NET), is a multi-paradigm, object-oriented programming language, implemented on .NET, Mono, and the .NET Framework. Microsoft launched VB.NET in 2002 as the successor to its original Visua ...
*
Visual C++ Microsoft Visual C++ (MSVC) is a compiler for the C, C++ and C++/CX programming languages by Microsoft. MSVC is proprietary software; it was originally a standalone product but later became a part of Visual Studio and made available in both tria ...


Grammars

GOLD grammars are based directly on
Backus–Naur form In computer science, Backus–Naur form () or Backus normal form (BNF) is a metasyntax notation for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats ...
,
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
s, and set notation. The following grammar defines the syntax for a minimal general-purpose programming language called "Simple".
"Name"    = 'Simple'
"Author"  = 'Devin Cook'
"Version" = '2.1' 
"About"   = 'This is a very simple grammar designed for use in examples'

"Case Sensitive" = False 
"Start Symbol"   = 

 =  -  ' =  -  
Identifier    = *    

! String allows either single or double quotes

StringLiteral = ''  * ''
              ,  '"' * '"'

NumberLiteral = +('.'+)?

Comment Start = '/*'
Comment End   = '*/'
Comment Line  = '//' 
::= , ::= display , display read ID , assign ID '=' , while do end , if then end , if then else end ::= '>' , '<' , '<=' , '>=' , '

' , '<>' , ::= '+' , '-' , '&' , ::= '*' , '/' , ::= '-' , ::= Identifier , StringLiteral , NumberLiteral , '(' ')'


Development overview

The first step consists of writing and testing a grammar for the language being parsed. The grammar can be written using any text editor - such as Notepad or the editor that is built into the Builder. At this stage, no coding is required. Once the grammar is complete, it is analyzed by the Builder, the
LALR In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-right, ...
and DFA parse tables are constructed, and any ambiguities or problems with the grammar are reported. Afterwards, the tables are saved to a Compiled Grammar Table file to be used later by a parsing engine. At this point, the GOLD Parser Builder is no longer needed. In the final stage, the tables are read by an Engine. At this point, the development process is dependent on the selected implementation language.


References


External links

* * {{Web archive, title=GOLD Yahoo Group , url=https://web.archive.org/web/20201214134247/https://groups.yahoo.com/neo/groups/GOLDParser/info, date=December 14,2020 Parser generators Programming language implementation