Perl language structure

The structure of the Perl programming language encompasses both the syntactical rules of the language and the general ways in which programs are organized. Perl's design philosophy is expressed in the commonly cited motto "there's more than one way to do it". As a multi-paradigm, dynamically typed language, Perl allows a great degree of flexibility in program design. Perl also encourages modularization; this has been attributed to the component-based design structure of its Unix roots, and is responsible for the size of the CPAN archive, a community-maintained repository of more than 100,000 modules.


Basic syntax

In Perl, the minimal Hello World program may be written as follows:

    print "Hello, World!\n"

This prints the string Hello, World! and a newline, symbolically expressed by an n character whose interpretation is altered by the preceding escape character (a backslash).

An entire Perl program may also be specified as a command-line parameter to Perl, so the same program can also be executed from the command line (example shown for Unix):

    $ perl -e 'print "Hello, World!\n"'

The canonical form of the program is slightly more verbose:

    #!/usr/bin/perl
    print "Hello, World!\n";

The hash mark character introduces a comment in Perl, which runs up to the end of the line of code and is ignored by the compiler. The comment used here is of a special kind: it is called the shebang line. This tells Unix-like operating systems to find the Perl interpreter, making it possible to invoke the program without explicitly mentioning perl. On Microsoft Windows systems, Perl programs are typically invoked by associating the .pl extension with the Perl interpreter, so the operating system does not use the shebang line; perl itself still detects the line and parses it for switches.

The second line in the canonical form includes a semicolon, which is used to separate statements in Perl. With only a single statement in a block or file, a separator is unnecessary, so it can be omitted from the minimal form of the program, or more generally from the final statement in any block or file. The canonical form includes it because it is common to terminate every statement even when it is unnecessary to do so, as this makes editing easier: code can be added to, or moved away from, the end of a block or file without having to adjust semicolons.

Version 5.10 of Perl introduces a say function that implicitly appends a newline character to its output, making the minimal "Hello World" program even shorter:

    use 5.010; # must be present to import the new 5.10 functions; notice that it is 5.010, not 5.10
    say 'Hello, World!'


Data types

Perl has a number of fundamental data types. The most commonly used and discussed are scalars, arrays, hashes, filehandles, and subroutines.


Scalar values

String values (literals) must be enclosed by quotes. Enclosing a string in double quotes allows the values of variables whose names appear in the string to automatically replace the variable name (or be interpolated) in the string. Enclosing a string in single quotes prevents variable interpolation. For example, if $name is "Jim":

* then print("My name is $name") will print "My name is Jim" (interpolation within double quotes),
* but print('My name is $name') will print "My name is $name" (no interpolation within single quotes).

To include a double quotation mark in a string, precede it with a backslash or enclose the string in single quotes. To include a single quotation mark, precede it with a backslash or enclose the string in double quotes.

Strings can also be quoted with the q and qq quote-like operators:

* 'this' and q(this) are identical,
* "$this" and qq($this) are identical.

Finally, multiline strings can be defined using here documents:

    $multilined_string = <<EOF;
    This is my multilined string
    note that I am terminating it with the word "EOF".
    EOF

Numbers (numeric constants) do not require quotation. Perl will convert numbers into strings and vice versa depending on the context in which they are used. When strings are converted into numbers, trailing non-numeric parts of the strings are discarded. If no leading part of a string is numeric, the string will be converted to the number 0. In the following example, the strings $n and $m are treated as numbers. This code prints the number '5'. The values of the variables remain the same. Note that in Perl, + is always the numeric addition operator. The string concatenation operator is the period.

    $n = '3 apples';
    $m = '2 oranges';
    print $n + $m;

Functions are provided for the rounding of fractional values to integer values: int chops off the fractional part, rounding towards zero; POSIX::ceil and POSIX::floor round always up and always down, respectively. The number-to-string conversions performed by printf "%f" and sprintf "%f" round to the nearest value, with ties going to the even digit (bankers' rounding).

Perl also has a boolean context that it uses in evaluating conditional statements. The following values all evaluate as false in Perl:

    $false = 0;      # the number zero
    $false = 0.0;    # the number zero as a float
    $false = 0b0;    # the number zero in binary
    $false = 0x0;    # the number zero in hexadecimal
    $false = '0';    # the string zero
    $false = "";     # the empty string
    $false = ();     # the empty list
    $false = undef;  # the return value from undef
    $false = 2-3+1;  # computes to 0 that is converted to "0" so it is false

All other values evaluate to true. This includes the odd self-describing literal string "0 but true", which in fact is 0 as a number, but true when used as a boolean. All non-empty strings that do not begin with a number also have this property, but this particular string is converted by Perl without a numeric warning. A less explicit but more conceptually portable version of this string is '0E0' or '0e0', which does not rely on characters being evaluated as 0, because '0E0' is literally zero times ten to the power zero. The empty hash reference {} is also true; in this context {} is not an empty block, because perl -e 'print ref {}' returns HASH.

Evaluated boolean expressions are also scalar values. The documentation does not promise which particular value of true or false is returned. Many boolean operators return 1 for true and the empty string for false. The defined() function determines whether a variable has any value set. In the above examples, defined($false) is true for every value except undef and the empty list, both of which leave $false undefined. If either 1 or 0 is specifically needed, an explicit conversion can be done using the conditional operator:

    my $real_result = $boolean_result ? 1 : 0;
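The conversion and truth rules above can be checked with a short script. This is a minimal sketch; the variable names and the list of candidate strings are chosen here for illustration and are not part of the original text.

```perl
use strict;
use warnings;

# Numeric context keeps only the leading numeric part of each string.
my ($n, $m) = ('3 apples', '2 oranges');
my $sum;
{
    no warnings 'numeric';    # these strings would otherwise trigger a warning
    $sum = $n + $m;
}

# The period concatenates; the operands are treated as strings.
my $joined = '3' . '2';

# Keep only the candidates that are true in boolean context.
my @candidates = (0, '0', '', '0.0', '00', '0E0', '0 but true', ' ');
my @true_ones  = grep { $_ } @candidates;

print "sum=$sum joined=$joined\n";
print "true: ", join(', ', map { "'$_'" } @true_ones), "\n";
```

Only the number zero, the string '0', and the empty string are filtered out; strings such as '0.0' and '00' are numerically zero but still true.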


Array values

An array value (or list) is specified by listing its elements, separated by commas, enclosed by parentheses (at least where required by operator precedence).

    @scores = (32, 45, 16, 5);

The qw() quote-like operator allows the definition of a list of strings without typing of quotes and commas. Almost any delimiter can be used instead of parentheses. The following lines are equivalent:

    @names = ('Billy', 'Joe', 'Jim-Bob');
    @names = qw(Billy Joe Jim-Bob);

The split function returns a list of strings, which are split from a string expression using a delimiter string or regular expression.

    @scores = split(',', '32,45,16,5');

Individual elements of a list are accessed by providing a numerical index in square brackets. The scalar sigil must be used. Sublists (array slices) can also be specified, using a range or list of numeric indices in brackets. The array sigil is used in this case. For example, if @month holds the month names in calendar order, $month[3] is "April" (the first element in an array has an index value of 0), and @month[4..6] is ("May", "June", "July").
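The element and slice syntax can be tried directly. The sketch below assumes a @month array filled with the first seven month names; the negative index is an extra illustration not mentioned in the text above.

```perl
use strict;
use warnings;

my @month = qw(January February March April May June July);

my $fourth = $month[3];        # scalar sigil: a single element
my @summer = @month[4 .. 6];   # array sigil: a slice
my $last   = $month[-1];       # negative indices count from the end

my @scores = split /,/, '32,45,16,5';

print "$fourth | @summer | $last | @scores\n";
```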


Hash values

Perl programmers may initialize a hash (or associative array) from a list of key/value pairs. If the keys are separated from the values with the => operator (sometimes called a fat comma), rather than a comma, they may be unquoted (barewords). The following lines are equivalent:

    %favorite = ('joe', "red", 'sam', "blue");
    %favorite = (joe => 'red', sam => 'blue');

Individual values in a hash are accessed by providing the corresponding key, in curly braces. The $ sigil identifies the accessed element as a scalar. For example, $favorite{joe} equals 'red'. A hash can also be initialized by setting its values individually:

    $favorite{joe}   = 'red';
    $favorite{sam}   = 'blue';
    $favorite{oscar} = 'green';

Multiple elements may be accessed using the @ sigil instead (identifying the result as a list). For example, @favorite{'joe', 'sam'} equals ('red', 'blue').
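A short sketch of the same operations under strict mode follows. The key oscar and the exists and keys checks are illustrative additions, not taken from the text above.

```perl
use strict;
use warnings;

my %favorite = (joe => 'red', sam => 'blue');
$favorite{oscar} = 'green';                    # add one pair (hypothetical key)

my $one   = $favorite{joe};                    # $ sigil: a single value
my @two   = @favorite{'joe', 'sam'};           # @ sigil: a hash slice
my $known = exists $favorite{oscar} ? 1 : 0;   # test for a key without reading its value
my $count = keys %favorite;                    # keys in scalar context gives the number of pairs

print "$one | @two | known=$known | count=$count\n";
```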


Filehandles

Filehandles provide read and write access to resources. These are most often files on disk, but can also be a device, a pipe, or even a scalar value. Originally, filehandles could only be created with package variables, using the ALL_CAPS convention to distinguish them from other variables. Perl 5.6 and newer also accept a scalar variable, which will be set (autovivified) to a reference to an anonymous filehandle, in place of a named filehandle.
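As an illustration of both points, the sketch below opens lexical (scalar-variable) filehandles on an in-memory scalar rather than on a disk file. It assumes a perl built with PerlIO, which is the default in modern releases; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Write to an in-memory "file": the target is a reference to a scalar.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory handle: $!";
print {$out} "first line\n";
print {$out} "second line\n";
close($out) or die "Cannot close handle: $!";

# Read the same data back through a second lexical filehandle.
open(my $in, '<', \$buffer) or die "Cannot reopen buffer: $!";
my @lines = <$in>;
close($in);

chomp @lines;
print scalar(@lines), " lines read: @lines\n";
```

Because $out and $in are ordinary lexical variables, each handle is closed automatically when it goes out of scope, although closing explicitly allows write errors to be checked.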


Typeglob values

A typeglob value is a symbol table entry. The main use of typeglobs is creating symbol table aliases. For example:

    *PI = \3.141592653; # creating constant scalar $PI
    *this = *that;      # creating aliases for all data types 'this' to all data types 'that'
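The effect of both assignments can be observed in a small script. This is a sketch under strict mode, so the package variables are declared with our; the array contents are made up for the demonstration.

```perl
use strict;
use warnings;

our ($PI, @that, @this);     # package variables; typeglobs do not apply to lexicals

*PI = \3.141592653;          # $PI now aliases a read-only constant

@that = (1, 2, 3);
*this = *that;               # every slot of 'this' now aliases 'that'
push @this, 4;               # so this also changes @that

# Assigning to the constant fails at run time.
my $ok    = eval { $PI = 3; 1 };
my $error = $ok ? '' : $@;

print "PI=$PI that=@that\n";
print "assignment failed: $error" unless $ok;
```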


Array functions

The number of elements in an array can be determined either by evaluating the array in scalar context or with the help of the $# sigil. The latter gives the index of the last element in the array, not the number of elements. The expressions scalar(@array) and ($#array + 1) are equivalent.
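A minimal sketch of the two ways of counting, with an empty array included as an edge case (the array contents are arbitrary):

```perl
use strict;
use warnings;

my @array = qw(a b c d);

my $count      = scalar(@array);   # explicit scalar context
my $last_index = $#array;          # index of the final element
my $implicit   = @array;           # assigning to a scalar also imposes scalar context

my @empty;
my $empty_last = $#empty;          # -1 for an empty array

print "count=$count last=$last_index implicit=$implicit empty=$empty_last\n";
```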


Hash functions

There are a few functions that operate on entire hashes. The keys function takes a hash and returns the list of its keys. Similarly, the values function returns a hash's values. Note that the keys and values are returned in a consistent but arbitrary order.

    # Every call to each returns the next key/value pair.
    # All values will be eventually returned, but their order
    # cannot be predicted.
    while (($name, $address) = each %addressbook) {
        print "$name lives at $address\n";
    }

    # Similar to the above, but sorted alphabetically
    foreach my $next_name (sort keys %addressbook) {
        print "$next_name lives at $addressbook{$next_name}\n";
    }
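The phrase "consistent but arbitrary" can be made concrete: keys and values agree with each other as long as the hash is not modified between the two calls, and sorting the keys gives a reproducible order. The address book entries below are invented for the example.

```perl
use strict;
use warnings;

my %addressbook = (
    carol => '3 Elm Road',
    alice => '1 Oak Street',
    bob   => '2 Pine Avenue',
);

# keys and values use the same order when the hash is unchanged in between.
my @k = keys %addressbook;
my @v = values %addressbook;
my $paired = grep { $addressbook{ $k[$_] } eq $v[$_] } 0 .. $#k;

# Sorting the keys gives a reproducible order.
my @report = map { "$_ lives at $addressbook{$_}" } sort keys %addressbook;
print "$_\n" for @report;
```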


Control structures

Perl has several kinds of control structures. It has block-oriented control structures, similar to those in the C, JavaScript, and Java programming languages. Conditions are surrounded by parentheses, and controlled blocks are surrounded by braces:

    label while ( cond ) { ... }
    label while ( cond ) { ... } continue { ... }
    label for ( init-expr ; cond-expr ; incr-expr ) { ... }
    label foreach var ( list ) { ... }
    label foreach var ( list ) { ... } continue { ... }
    if ( cond ) { ... }
    if ( cond ) { ... } else { ... }
    if ( cond ) { ... } elsif ( cond ) { ... } else { ... }

Where only a single statement is being controlled, statement modifiers provide a more concise syntax:

    statement if cond ;
    statement unless cond ;
    statement while cond ;
    statement until cond ;
    statement foreach list ;

Short-circuit logical operators are commonly used to affect control flow at the expression level:

    expr and expr
    expr && expr
    expr or expr
    expr || expr

(The "and" and "or" operators are similar to && and || but have lower precedence, which makes it easier to use them to control entire statements.)

The flow control keywords next (corresponding to C's continue), last (corresponding to C's break), return, and redo are expressions, so they can be used with short-circuit operators.

Perl also has two implicit looping constructs, each of which has two forms:

    results = grep { ... } list
    results = grep expr, list
    results = map { ... } list
    results = map expr, list

grep returns all elements of list for which the controlled block or expression evaluates to true. map evaluates the controlled block or expression for each element of list and returns a list of the resulting values. These constructs enable a simple functional programming style.

Up until the 5.10.0 release, there was no switch statement in Perl 5. From 5.10.0 onward, a multi-way branch statement called given/when is available, which takes the following form:

    use v5.10; # must be present to import the new 5.10 functions
    given ( expr ) {
        when ( cond ) { ... }
        default { ... }
    }

Syntactically, this structure behaves similarly to switch statements found in other languages, but with a few important differences. The largest is that unlike switch/case structures, given/when statements break execution after the first successful branch, rather than waiting for explicitly defined break commands. Conversely, explicit continues are necessary to emulate switch behavior. The feature was later marked experimental and was deprecated in Perl 5.38.

For those not using Perl 5.10, the Perl documentation describes a half-dozen ways to achieve the same effect by using other control structures. There is also a Switch module, which provides functionality modeled on that of sister language Raku. It is implemented using a source filter, so its use is unofficially discouraged.

Perl includes a goto label statement, but it is rarely used. Situations where a goto is called for in other languages don't occur as often in Perl, because of its breadth of flow control options.

There is also a goto &sub statement that performs a tail call. It terminates the current subroutine and immediately calls the specified sub. This is used in situations where a caller can perform more efficient stack management than Perl itself (typically because no change to the current stack is required). In deep recursion, tail calling can have a substantial positive impact on performance, because it avoids the overhead of scope/stack management on return.
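Several of these constructs are easiest to compare side by side. The following sketch combines statement modifiers, next and last, both forms of grep, map, and the low-precedence or; the data and the %config hash are hypothetical.

```perl
use strict;
use warnings;

my @numbers = (3, 8, -1, 12, 7, 0, 5);

# Block loop with next/last driven by statement modifiers.
my @seen;
for my $n (@numbers) {
    next if $n < 0;       # skip negatives (like C's continue)
    last if $n == 0;      # stop at the first zero (like C's break)
    push @seen, $n;
}

# grep filters, map transforms; grep is shown in block and expression form.
my @even    = grep { $_ % 2 == 0 } @seen;
my @doubled = map  { $_ * 2 } @seen;
my @big     = grep $_ > 5, @seen;

# Low-precedence 'or' applies to the whole assignment statement.
my %config = (mode => 'fast');
my $mode = $config{mode} or die "no mode configured";

print "seen=@seen even=@even doubled=@doubled big=@big mode=$mode\n";
```

Writing the last statement with || instead of or would bind the die to $config{mode} alone, which happens to behave the same here but differs when the left-hand side is a list or a function call without parentheses.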


Subroutines

Subroutines are defined with the sub keyword and are invoked simply by naming them. If the subroutine in question has not yet been declared, invocation requires either parentheses after the function name or an ampersand (&) before it. But using & without parentheses will also implicitly pass the arguments of the current subroutine to the one called, and using & with parentheses will bypass prototypes.

    # Calling a subroutine

    # Parentheses are required here if the subroutine is defined later in the code
    foo();
    &foo; # (this also works, but has other consequences regarding arguments passed to the subroutine)

    # Defining a subroutine
    sub foo { ... }

    foo; # Here parentheses are not required

A list of arguments may be provided after the subroutine name. Arguments may be scalars, lists, or hashes.

    foo $x, @y, %z;

The parameters to a subroutine do not need to be declared as to either number or type; in fact, they may vary from call to call. Any validation of parameters must be performed explicitly inside the subroutine.

Arrays are expanded to their elements; hashes are expanded to a list of key/value pairs; and the whole lot is passed into the subroutine as one flat list of scalars.

Whatever arguments are passed are available to the subroutine in the special array @_. The elements of @_ are aliases to the actual arguments; changing an element of @_ changes the corresponding argument.

Elements of @_ may be accessed by subscripting it in the usual way.

    $_[0], $_[1]

However, the resulting code can be difficult to read, and the parameters have pass-by-reference semantics, which may be undesirable.

One common idiom is to assign @_ to a list of named variables.

    my ($x, $y, $z) = @_;

This provides mnemonic parameter names and implements pass-by-value semantics. The my keyword indicates that the following variables are lexically scoped to the containing block.

Another idiom is to shift parameters off of @_. This is especially common when the subroutine takes only one argument or for handling the $self argument in object-oriented modules.

    my $x = shift;

Subroutines may assign @_ to a hash to simulate named arguments; this is recommended in Perl Best Practices for subroutines that are likely to ever have more than three parameters.

    sub function1 {
        my %args = @_;
        print "'x' argument was '$args{x}'\n";
    }
    function1( x => 23 );

Subroutines may return values.

    return 42, $x, @y, %z;

If the subroutine does not exit via a return statement, it returns the last expression evaluated within the subroutine body. Arrays and hashes in the return value are expanded to lists of scalars, just as they are for arguments.

The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.

    sub list  { (4, 5, 6) }
    sub array { my @a = (4, 5, 6); @a }

    $x = list;  # returns 6 - last element of list
    $x = array; # returns 3 - number of elements in list
    @x = list;  # returns (4, 5, 6)
    @x = array; # returns (4, 5, 6)

A subroutine can discover its calling context with the wantarray function.

    sub either {
        return wantarray ? (1, 2) : 'Oranges';
    }

    $x = either; # returns "Oranges"
    @x = either; # returns (1, 2)
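The difference between modifying @_ directly and copying it first, together with named arguments and context detection, can be sketched as follows. All subroutine and variable names here are hypothetical.

```perl
use strict;
use warnings;

# Elements of @_ alias the caller's variables, so this changes them in place.
sub double_in_place {
    $_ *= 2 for @_;
    return;
}

# Copying @_ into lexicals gives pass-by-value behaviour.
sub doubled_copy {
    my ($value) = @_;
    $value *= 2;
    return $value;
}

# Named arguments: defaults come first, so caller-supplied pairs override them.
sub describe {
    my %args = (colour => 'red', size => 1, @_);
    return "$args{colour}/$args{size}";
}

# wantarray is true in list context, false but defined in scalar context,
# and undefined in void context.
sub context {
    return wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
}

my ($a1, $b1) = (1, 2);
double_in_place($a1, $b1);       # both caller variables are modified
my $copy = doubled_copy($a1);    # $a1 itself is left alone
my $text = describe(size => 3);
my ($in_list) = context();
my $in_scalar = context();

print "$a1 $b1 $copy $text $in_list $in_scalar\n";
```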


Regular expressions

The Perl language includes a specialized syntax for writing regular expressions (RE, or regexes), and the interpreter contains an engine for matching strings to regular expressions. The regular-expression engine uses a backtracking algorithm, extending its capabilities from simple pattern matching to string capture and substitution. The regular-expression engine is derived from regex written by Henry Spencer.

The Perl regular-expression syntax was originally taken from Unix Version 8 regular expressions. However, it diverged before the first release of Perl and has since grown to include far more features. Many other languages and applications, such as PHP, Ruby, Java, Microsoft's .NET Framework, and the Apache HTTP server, are now adopting Perl Compatible Regular Expressions over POSIX regular expressions.

Regular-expression syntax is extremely compact, owing to history. The first regular-expression dialects were only slightly more expressive than globs, and the syntax was designed so that an expression would resemble the text that it matches. This meant using no more than a single punctuation character or a pair of delimiting characters to express the few supported assertions. Over time, the expressiveness of regular expressions grew tremendously, but the syntax design was never revised and continues to rely on punctuation. As a result, regular expressions can be cryptic and extremely dense.


Uses

The m// (match) operator introduces a regular-expression match. (If it is delimited by slashes, as in all of the examples here, the leading m may be omitted for brevity. If the m is present, other delimiters can be used in place of slashes.) In the simplest case, an expression such as

    $x =~ /abc/;

evaluates to true if and only if the string $x matches the regular expression abc.

The s/// (substitute) operator, on the other hand, specifies a search-and-replace operation:

    $x =~ s/abc/aBc/; # upcase the b

Another use of regular expressions is to specify delimiters for the split function:

    @words = split /,/, $line;

The split function creates a list of the parts of the string that are separated by what matches the regular expression. In this example, a line is divided into a list of its own comma-separated parts, and this list is then assigned to the @words array.
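The three uses can be exercised on one sample string. This is a minimal sketch; the sample line and variable names are made up, and the copy-then-substitute idiom is used so that the original string stays intact.

```perl
use strict;
use warnings;

my $line = 'abc,def,abcabc';

my $has_abc = $line =~ /abc/  ? 1 : 0;   # slashes: the leading m is optional
my $has_xyz = $line =~ m{xyz} ? 1 : 0;   # with m, other delimiters are allowed

(my $changed = $line) =~ s/abc/aBc/;     # copy, then replace the first match only

my @words = split /,/, $line;

print "$has_abc $has_xyz $changed @words\n";
```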


Syntax


Modifiers

Perl regular expressions can take modifiers. These are single-letter suffixes that modify the meaning of the expression:

    $x =~ /abc/i;      # case-insensitive pattern match
    $x =~ s/abc/aBc/g; # global search and replace

Because the compact syntax of regular expressions can make them dense and cryptic, the /x modifier was added in Perl to help programmers write more legible regular expressions. It allows programmers to place whitespace and comments inside regular expressions:

    $x =~ /
        a   # match 'a'
        .   # followed by any character
        c   # then followed by the 'c' character
        /x;
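A short sketch of the three modifiers in combination follows. The sample text is invented, and the `() =` assignment is a common counting idiom rather than something described above.

```perl
use strict;
use warnings;

my $text = 'Abc abc ABC';

# Assigning the match to an empty list forces list context; the scalar
# on the left then receives the number of matches.
my $any_case = () = $text =~ /abc/gi;

(my $global = $text) =~ s/abc/x/gi;      # replace every match, ignoring case

# /x lets the pattern carry its own layout and comments.
my $loose = 'a-c' =~ /
    a   # a literal 'a'
    .   # any single character
    c   # a literal 'c'
/x ? 1 : 0;

print "$any_case | $global | $loose\n";
```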


Capturing

Portions of a regular expression may be enclosed in parentheses; corresponding portions of a matching string are captured. Captured strings are assigned to the sequential built-in variables $1, $2, $3, …, and in list context a list of captured strings is returned as the value of the match.

    $x =~ /a(.)c/; # capture the character between 'a' and 'c'

Captured strings $1, $2, $3, … can be used later in the code.

Perl regular expressions also allow built-in or user-defined functions to apply to the captured match, by using the /e modifier:

    $x = "Oranges";
    $x =~ s/(ge)/uc($1)/e; # OranGEs
    $x .= $1;              # append $x with the contents of the match in the previous statement: OranGEsge
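Both ways of reading captures, and the /e modifier, can be sketched with a date string. The input format and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $stamp = '2024-03-15';

# In list context a successful match returns its captures.
my ($year, $month, $day) = $stamp =~ /^(\d{4})-(\d{2})-(\d{2})$/;

# The same captures are available as $1, $2, ... after a successful match.
my $via_vars = '';
if ($stamp =~ /^(\d{4})-(\d{2})/) {
    $via_vars = "$2/$1";
}

# /e evaluates the replacement as Perl code.
(my $shout = 'Oranges') =~ s/(ge)/uc($1)/e;

print "$year $month $day | $via_vars | $shout\n";
```

Testing the match before reading $1 and $2 matters because a failed match leaves the capture variables holding values from an earlier successful match.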


Objects

There are many ways to write object-oriented code in Perl. The most basic is using "blessed" references. This works by identifying a reference of any type as belonging to a given package, and the package provides the methods for the blessed reference. For example, a two-dimensional point could be defined this way:

    sub Point::new {
        # Here, Point->new(4, 5) will result in $class being 'Point'.
        # It's a variable to support subclassing.
        my ($class, $x, $y) = @_;
        bless [$x, $y], $class; # implicit return
    }

    sub Point::distance {
        my ($self, $from) = @_;
        my ($dx, $dy) = ($$self[0] - $$from[0], $$self[1] - $$from[1]);
        sqrt($dx * $dx + $dy * $dy);
    }

This class can be used by invoking new() to construct instances, and invoking distance on those instances.

    my $p1 = Point->new(3, 4);
    my $p2 = Point->new(0, 0);
    print $p1->distance($p2); # Prints 5

Many modern Perl applications use the Moose object system. Moose is built on top of Class::MOP, a meta-object protocol, providing complete introspection for all Moose-using classes. Thus you can ask classes about their attributes, parents, children, methods, etc. using a simple API.

Moose classes:

* A class has zero or more attributes.
* A class has zero or more methods.
* A class has zero or more superclasses (aka parent classes). A class inherits from its superclass(es).
* A class does zero or more roles, which add the ability to add pre-defined functionality to classes without subclassing.
* A class has a constructor and a destructor.
* A class has a metaclass.
* A class has zero or more method modifiers. These modifiers can apply to its own methods, methods that are inherited from its ancestors, or methods that are provided by roles.

Moose roles:

* A role is something that a class does, somewhat like mixins or interfaces in other object-oriented programming languages. Unlike mixins and interfaces, roles can be applied to individual object instances.
* A role has zero or more attributes.
* A role has zero or more methods.
* A role has zero or more method modifiers.
* A role has zero or more required methods.
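For comparison with Moose, the blessed-reference approach described at the start of this section also supports inheritance using only core Perl. The sketch below is an illustration under stated assumptions: the hash-based layout, the describe method, and the Point3D subclass are hypothetical and are not the classes from the examples in this article.

```perl
use strict;
use warnings;

package Point;

sub new {
    my ($class, %args) = @_;
    my $self = { x => $args{x} // 0, y => $args{y} // 0 };
    return bless $self, $class;      # tie the hash reference to the class
}

sub describe {
    my ($self) = @_;
    return ref($self) . "($self->{x}, $self->{y})";
}

package Point3D;
our @ISA = ('Point');                # inheritance is a package-level list of parents

sub new {
    my ($class, %args) = @_;
    my $self = Point::new($class, %args);   # reuse the parent constructor
    $self->{z} = $args{z} // 0;
    return $self;
}

sub describe {
    my ($self) = @_;
    return $self->SUPER::describe() . " z=$self->{z}";
}

package main;

my $p = Point3D->new(x => 1, y => 2, z => 3);
print $p->describe, "\n";
```

Everything Moose automates (attribute declaration, constructors, method modifiers) is written by hand here, which is the main reason object systems such as Moose are popular.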


Examples

An example of a class written using the MooseX::Declare extension to Moose:

    use MooseX::Declare;

    class Point3D extends Point {
        has 'z' => (isa => 'Num', is => 'rw');

        after clear {
            $self->z(0);
        }
        method set_to (Num $x, Num $y, Num $z) {
            $self->x($x);
            $self->y($y);
            $self->z($z);
        }
    }

This is a class named Point3D that extends another class named Point, explained in the Moose examples. It adds to its base class a new attribute z, redefines the method set_to and extends the method clear.


External links


Perl tutorials



PerlMonks
A community committed to sharing Perl knowledge and coding tips.