Chrisantha Fernando

Perl Programming Notes

Introduction

I use BBEdit to write Perl programs. It can be downloaded here, along with other Perl IDEs (Integrated Development Environments). It really is a very nice editor. So far these notes are drawn from "Learning Perl" (O'Reilly).

Strings

Single-quoted v. double-quoted. In single quotes the backslash \ has no special meaning as a control character (except before another backslash or a single quote), so '\n' is two characters rather than a newline. Double-quoted strings also have variables interpolated.

Concatenation of strings: uses the . operator. To repeat a string use "fred" x 3, where "fred" is the string to be repeated. Numbers and strings are converted automatically depending on context.
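A minimal sketch of the string rules above. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $name = "fred";

my $single = 'Hello $name\n';   # no interpolation; \n stays as two characters
my $double = "Hello $name\n";   # $name is interpolated; \n becomes a newline

my $joined   = "fred" . " " . "flintstone";   # concatenation with .
my $repeated = "fred" x 3;                    # repetition with x
my $sum      = "12" + 3;                      # string used as a number
my $text     = 12 . "3";                      # number used as a string

print "$single\n";
print $double;
print "$joined\n$repeated\n$sum\n$text\n";
```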

Variables

Scalar variables start with $, e.g. $numAgents.

Print

print $numAgents; prints the value. Double-quoted strings get contained variables interpolated, e.g. print "There are $numAgents agents\n"; If you need a letter right after numAgents, write ${numAgents}bla.

printf can be used as well like in C++.
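A short sketch of interpolation and formatted output. The format strings are only examples; sprintf builds the same string that printf would print.

```perl
use strict;
use warnings;

my $numAgents = 5;

my $plain  = "There are $numAgents agents\n";
my $braced = "${numAgents}bla\n";   # braces mark where the variable name ends
my $padded = sprintf("%03d agents\n", $numAgents);   # same formats as printf

print $plain, $braced, $padded;
printf("%d agents, average load %.2f\n", $numAgents, 2.5);
```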

Comparison Operators

Numeric comparisons use ==, !=, <, >, <= and >=. String comparisons use eq, ne, lt, gt, le and ge.
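Numeric and string comparisons use different operators, and the same two values can compare differently under each. A small sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my $numeric_less = (10 < 9)        ? "true" : "false";   # as numbers, 10 is not less than 9
my $string_less  = ("10" lt "9")   ? "true" : "false";   # as strings, "1" sorts before "9"
my $same_number  = ("1.0" == 1)    ? "true" : "false";   # numerically equal
my $same_string  = ("1.0" eq "1")  ? "true" : "false";   # but different strings

print "10 < 9 is $numeric_less\n";
print "'10' lt '9' is $string_less\n";
print "'1.0' == 1 is $same_number\n";
print "'1.0' eq '1' is $same_string\n";
```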

If control structure

Same as in C: if (condition) { stuff } else { stuff }. if ($fred) { bla } evaluates true if $fred is not undef, not zero (0 or the string '0') and not the empty string ''. All else is true.
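A quick way to check the truth rules is a small helper. is_true is a hypothetical name used only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if Perl treats the value as true, else 0.
sub is_true {
    my ($value) = @_;
    return $value ? 1 : 0;
}

# The four false values: undef, 0, '0' and the empty string.
print is_true(undef), is_true(0), is_true('0'), is_true(''), "\n";

# Everything else is true, including some strings that look false.
print is_true('0.0'), is_true('00'), is_true(' '), is_true('fred'), "\n";
```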

Input

Use $line = <STDIN>; Perl waits for the user to enter a line and press enter. The $line variable stores the string AND the newline character \n that the user typed in. To get rid of this use chomp, e.g. chomp($text); removes a newline character at the end of $text, and chomp($text = <STDIN>); reads the text without the newline character. Parentheses are optional, so chomp $text; also works. Note that chomp returns the number of characters removed, not the chomped string, so $x = chomp $text sets $x to 1 if a newline was removed.

In list context, i.e. @lines = <STDIN>, STDIN returns all of the remaining lines up to the end of file, each line being a separate element in the list. chomp(@lines) then removes the newlines at the end of each line, or just do chomp(@lines = <STDIN>). TRY NOT TO DO THIS FOR LARGE FILES.
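A sketch of the scalar and list forms of line input. So that it runs without typing anything, it reads from an in-memory filehandle instead of STDIN; that substitution, and the names $input and $fh, are assumptions for the example only.

```perl
use strict;
use warnings;

my $input = "first line\nsecond line\nthird line\n";
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";

chomp(my $first = <$fh>);   # scalar context: reads one line, newline removed
chomp(my @rest  = <$fh>);   # list context: reads all remaining lines

close($fh);

my $removed = chomp(my $copy = "text\n");   # chomp returns the count removed

print "First: $first\n";
print "Rest: @rest\n";
print "chomp removed $removed character(s), leaving '$copy'\n";
```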

Undef

Before being defined, variables have the undef value, which acts as zero or the empty string depending on how the variable gets used. To check for undef use defined($text), which returns false if $text is undef. This is useful with <STDIN>, which returns undef when end-of-file is reached.
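A sketch of undef in use. As an assumption for the example, the input comes from an in-memory filehandle rather than STDIN; the loop stops when the read returns undef at end-of-file.

```perl
use strict;
use warnings;

my $total;          # undef
$total += 10;       # undef acts as 0 here
my $label;          # undef
$label .= "abc";    # undef acts as the empty string here

my $input = "alpha\nbeta";   # the last line has no trailing newline
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";

my @seen;
while (defined(my $line = <$fh>)) {   # undef signals end-of-file
    chomp $line;
    push @seen, $line;
}
close($fh);

print "$total $label @seen\n";
```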

Lists and Arrays

An array is a variable that contains a list, which is an ordered collection of scalars. The scalars can be of different types! Access is like in C, i.e. $fred[0] = "dog"; If accessed with a float, the fractional part is dropped, i.e. $fred[2.712] is $fred[2]. If the index is beyond the end of the array, you get undef returned.

If you store into an element beyond the end of the array, the array gets extended with undefs in between the current max and the new element that you stored. This is truly weird.

The last element index is $#arrayName, to which you need to add one to get the number of elements. You can change the SIZE of the array by assigning to this value!! e.g. $#array = 2 makes the array only 3 in length, i.e. the last index becomes 2. Indexing with $array[-1] is the same as $array[$#array], since negative indexes count back from the end of the array. Another weirdness of Perl.
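The array behaviour above, as a small sketch with made-up data:

```perl
use strict;
use warnings;

my @fred = ("dog", "cat", "bird");

my $last_index = $#fred;           # index of the last element
my $count      = $#fred + 1;       # number of elements
my $last       = $fred[-1];        # negative index counts from the end
my $truncated  = $fred[1.9];       # fractional index is truncated to 1

$fred[5] = "fish";                 # storing past the end extends the array
my $extended_count = scalar @fred;
my $gap_is_defined = defined($fred[4]) ? 1 : 0;   # the gap holds undef

$#fred = 1;                        # shrink the array to two elements
my $shrunk_count = scalar @fred;

print "$last_index $count $last $truncated\n";
print "$extended_count $gap_is_defined $shrunk_count\n";
```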

 
List Literal

(1,2,3,) is a list, last comma ignored. () is the empty list. (1..100) uses the range operator to create a list from 1 to 100 in steps of one. The range operator truncates floats, only counts up, and can take integer variables or a $#arrayName value, e.g. (0..$#arrayName), which is a useful way to go through the indices of an array. The elements of a list literal can be expressions, which are evaluated each time the literal is used, e.g. ($b + $c, $d + $e). The list can hold any scalar values, like an array.

To make a list of simple words... qw/ word1 word2 word3 / is the same as ("word1", "word2", "word3"). qw means quoted words. Perl treats it like a single quoted string. Although forward slashes were used, any punctuation character can be used as a delimiter.
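A sketch of list literals, ranges and qw, using illustrative variable names:

```perl
use strict;
use warnings;

my @range      = (1..5);          # 1, 2, 3, 4, 5
my @from_float = (1.7..4.2);      # floats are truncated: 1, 2, 3, 4
my @backwards  = (5..1);          # the range only counts up: empty list

my @words   = qw/ word1 word2 word3 /;
my @bracket = qw( $not_a_variable two );   # no interpolation, like single quotes

my @indices = (0..$#words);       # every valid index of @words

print "@range | @from_float | ", scalar(@backwards), "\n";
print "@words | @bracket | @indices\n";
```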

List Assignment

Variables in lists can be assigned to variables in other lists, e.g.

($fred, $barney, $dino) = ("flintstone", "rubble", undef); or

($fred, $barney) = qw< flintstone rubble slate granite >; # the two extra items are ignored

An array can be assigned to like this...

@rocks = qw/ word1 word2 word3 /;

Array copies are effectively list copies. To clear an array assign an empty list, i.e. @array = ();
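List assignment in practice, as a hedged sketch. The swap idiom is not mentioned in the notes above but follows directly from list assignment.

```perl
use strict;
use warnings;

# Too few values on the right: the leftover variable gets undef.
my ($fred, $barney, $dino) = ("flintstone", "rubble");
my $dino_defined = defined($dino) ? 1 : 0;

# Swap two variables without a temporary.
($fred, $barney) = ($barney, $fred);

# Copying an array copies the list, so the copy is independent.
my @rocks = qw/ slate granite lava /;
my @copy  = @rocks;
$copy[0]  = "basalt";

my $original_first = $rocks[0];   # the original array is unchanged
@copy = ();                       # assigning the empty list clears the array

print "$fred $barney $dino_defined $original_first ", scalar(@copy), "\n";
```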

Pop and Push

pop takes the last element off the array and returns it, e.g. $fred = pop(@array). push adds an element to the end of the array, e.g. push(@array, 8) adds 8 to the end of the array. Both can be used without brackets, or with a list, like this: push @array, 1..10. shift and unshift do the same at the start of the array.
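The four operations side by side, with the array contents after each step shown in the comments:

```perl
use strict;
use warnings;

my @array = (5..7);              # (5, 6, 7)

my $popped = pop @array;         # returns 7, leaving (5, 6)
push @array, 8, 9;               # (5, 6, 8, 9)

my $shifted = shift @array;      # returns 5, leaving (6, 8, 9)
unshift @array, 1, 2;            # (1, 2, 6, 8, 9)

print "$popped $shifted | @array\n";
```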

Interpolating Arrays into Strings (and printing arrays).

Just like with scalars, e.g. print "some text @array more text\n"; the array elements get separated by spaces. A single element of an array, e.g. $array[0], is replaced by its value. If you write print @array; you get no spaces, whereas if you write print "@array"; the output is separated by spaces. If the array contains newlines it's best to use the former.
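A sketch of the difference between the two forms. join with an empty separator is used here only to capture what print @array would output.

```perl
use strict;
use warnings;

my @array = qw/ a b c /;

my $spaced   = "Items: @array";       # elements separated by spaces
my $element  = "First: $array[0]";    # a single element interpolates its value
my $unspaced = join("", @array);      # same text that print @array produces

print "$spaced\n$element\n$unspaced\n";

# With lines that already end in newlines, the unquoted form avoids
# the extra leading space that "@lines" would put on the second line.
my @lines = ("one\n", "two\n");
print @lines;
```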

foreach

Using foreach one can iterate through an array or list like this,

foreach $arrayWord (qw/ word1 word2 word3 /) {

print "Bla Bla is $arrayWord. \n";

}

$arrayWord is set to each element in turn. After the loop, $arrayWord is restored to the value it had before the loop (undef if it had none).

If the variable is omitted then Perl uses the default variable $_
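A sketch of the three foreach forms, with made-up data. The last loop shows that an existing variable gets its old value back once the loop ends.

```perl
use strict;
use warnings;

my @rocks = qw/ bedrock slate lava /;

# Named loop variable, private to the loop.
my @lines;
foreach my $rock (@rocks) {
    push @lines, "One rock is $rock.";
}

# No loop variable: each element is available as $_.
my @shouted;
foreach (@rocks) {
    push @shouted, uc $_;
}

# Reusing an existing variable: it is restored after the loop.
my $rock = "granite";
foreach $rock (@rocks) {
    # $rock is each element of @rocks in turn here
}

print "$lines[0]\n@shouted\n$rock\n";
```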

reverse

Reverses the list of values (that may come from an array). Note that it returns the reversed list and does not alter the original. In scalar context it reverses a string.

sort

Sorts in ASCII (string) order by default, so numbers are compared as strings. @sorted = sort(@rocks);
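A sketch of reverse and sort. The numeric sort block with <=> is not covered in the notes above; it is included to show how the default string order differs from numeric order.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @as_strings = sort @numbers;                 # string order: 1, 10, 100, 9
my @numeric    = sort { $a <=> $b } @numbers;   # numeric order: 1, 9, 10, 100
my @descending = reverse @numeric;              # @numeric itself is unchanged

my $backwards = reverse "hello";                # scalar context reverses the string

print "@as_strings\n@numeric\n@descending\n$backwards\n";
```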

Scalar and List Context

$number = 5 + @people; # Gives 5 plus the number of elements in @people. (Scalar context)

@sorted = sort @people; # Sorts people. (List context)

Force scalar context using scalar @array.
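The same array in different contexts, as a small sketch with illustrative names:

```perl
use strict;
use warnings;

my @people = qw/ fred barney betty /;

my $number  = 5 + @people;     # scalar context: 5 + 3
my $count   = @people;         # scalar assignment: the number of elements
my ($first) = @people;         # list assignment: the first element
my @sorted  = sort @people;    # list context: the elements themselves

my $message = "There are " . scalar(@people) . " people";   # forced scalar context

print "$number $count $first | @sorted | $message\n";
```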

Array Examples

1. Reading data from a file into a two-dimensional array.

2. Reading data from a file into a two-dimensional array, and shuffling this data to produce various NULL models for statistical analysis.


#!/usr/bin/perl


use strict;

open(DATA, "<", "data.txt") or die "Cannot open data.txt: $!";

my(@data) = <DATA>;

splice(@data, 0, 5);   # remove the first five lines

my @final_data;

foreach $_ (@data){
    chomp $_;
    my @cols = split /\t/, $_;
    push @final_data, [ @cols ]; 
    #print $_; 

}

print "#Original Data\n";
for $a (1..$#final_data) {
    for $b (0..2){
         print $final_data[$a]->[$b] . " "  ;
    }
    print "\n"; 
}

#Print out to a file the original data table. 

#Create a hash table of key value pairs from 0 to MAX_GENE number.
my %hashMix;
foreach $_ (1..423) { 
    $hashMix{$_} = $_;
    #print "$hashMix{$_} "; 
    }

#Shuffle the values in the hash table, by pairwise shuffling, i.e. choose two keys randomly
#and swap their values. 
        
        my $temp_value; 
        my $temp_value2; 
        my $randomNum;
        my $randomNum2;

        my($key); 
        my($value); 

        my $a; 
        my $b; 

        my $shuffle; 
        
for $shuffle (1..10){ 
        print "#Random Data $shuffle\n"; 
        #How can we know how good a randomization this is? 
        foreach (1..10000){ 
            $randomNum = int(rand(423)) + 1;
            $temp_value = $hashMix{$randomNum}; 
            $randomNum2 = int(rand(423)) + 1;
            $temp_value2 = $hashMix{$randomNum2}; 
            
            $hashMix{$randomNum} = $temp_value2; 
            $hashMix{$randomNum2} = $temp_value; 
            
        } 
         
         
        #while ( ($key, $value) = each %hashMix) { 
        #   print "$key => $value\n"; 
        #}
            
        #Now replace the numbers in the @final_data array with the hash values. 
        
        for $a (1..$#final_data) {
            for $b (0..2){
                 #print $final_data[$a]->[$b] ;
                 #print " ";
                 $final_data[$a]->[$b] = $hashMix{$final_data[$a]->[$b]}; 
                 print $final_data[$a]->[$b] . " "  ;
                 
            }
            print "\n"; 
        }

}
 
#Convert numbers to GO categories.  

The above software is used in my Gene Ontology/Bacterial Motif research, see here.
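The script above randomises the mapping with 10000 pairwise swaps and asks how good that randomisation is. One possible alternative, shown only as a sketch and not as the method used in the research, is to build the permutation in one step with shuffle from the core List::Util module. random_relabelling is a hypothetical helper name, and 423 is the ID count taken from the script.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: build a random one-to-one relabelling of the IDs 1..$max.
sub random_relabelling {
    my ($max) = @_;
    my @ids = (1..$max);
    my %map;
    @map{@ids} = shuffle(@ids);   # hash slice: pair each ID with a shuffled ID
    return %map;
}

my %hashMix = random_relabelling(423);
print "ID 1 is relabelled as $hashMix{1}\n";
```

Every ID still appears exactly once among the values, so the result can be used in place of the swapped %hashMix.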

 

Subroutines

#!/usr/bin/perl
sub marine {
$n += 1;
print "Hello, sailor number $n!\n";
}

&marine; # Calls the subroutine.
&marine;
&marine;

For example, the above runs marine 3 times. & precedes the subroutine name. The return value is the last expression evaluated.

Arguments

Arguments are passed to subroutines like this: $n = &max(10,15); The arguments are stored in the array @_, i.e. in $_[0] and $_[1]. How to use these? Create private variables, called lexical variables, using the my operator.

#!/usr/bin/perl
sub max {
my($a, $b);
($a, $b) = @_;
if($a > $b) { $a} else {$b}
}

print &max(10,15);

Above, the variables $a and $b are scoped to the subroutine. Alternatively write my($a, $b) = @_; in one line. To be on the safe side it is best to check that a subroutine is called with the correct number of arguments, e.g.

if(@_ != 2) {

print "WARNING"

}

But it's better to make the subroutine adapt to the parameters. A better max routine is shown below, capable of dealing with any number of arguments.

$maximum = &max(3, 5, 10, 4, 6);

sub max {
my($max_so_far) = shift @_; # the first one is the largest yet seen
foreach (@_) { # look at the remaining arguments
if ($_ > $max_so_far) { # could this one be bigger yet?
$max_so_far = $_;
}
}
$max_so_far;
}

This is called a "high-water mark" algorithm.

use strict should be used to enforce good programming rules.

return immediately returns a value from a subroutine, without having to execute the rest. As a default you might return undef.
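A sketch of an early return. which_element_is is an illustrative name; it returns the index of the first match as soon as it is found, and undef (via a bare return) if nothing matches.

```perl
use strict;
use warnings;

# Returns the index of $what in @list, or undef if it is not there.
sub which_element_is {
    my ($what, @list) = @_;
    foreach my $i (0..$#list) {
        return $i if $what eq $list[$i];   # stop at the first match
    }
    return;   # default: undef in scalar context
}

my @names = qw/ fred barney betty wilma dino /;
my $found   = which_element_is("dino", @names);
my $missing = which_element_is("pebbles", @names);

print "dino is at index $found\n";
print "pebbles was not found\n" unless defined $missing;
```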

Hashes

e.g. for species and how many times that species appears. It's a very simple database. To access a value type

$hash{$some_key}

Use curly braces, unlike with arrays; the key is a string. To assign data type,

$Surname{"Chrisantha"} = "Fernando";

To refer to the entire hash use %hashName. A hash can be converted into a list and back, e.g. by assigning a list to a hash, the list being made of key-value pairs,

%hash = ("key1", $value1, "key2", $value2, "key3", $value3); and the other way around, @an_array = %some_hash; returns the list of key-value pairs. The order may be jumbled. Hash assignment can also use a big arrow =>, e.g. "fred" => "flintstone", instead of commas in the above list.

keys returns a list of keys in the hash, i.e. my @k = keys %hash; and values does the same thing for values.

each returns a key value pair, e.g.

while ( ($key, $value) = each %hash){

print "$key => $value\n"

}
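A sketch of the species-count idea mentioned at the start of this section, with made-up sightings. Keys are sorted before printing because hash order is not predictable.

```perl
use strict;
use warnings;

my @sightings = qw/ wolf deer wolf fox deer wolf /;

# Count how many times each species appears.
my %count;
$count{$_}++ foreach @sightings;

foreach my $species (sort keys %count) {
    print "$species => $count{$species}\n";
}

# Building a hash with the big arrow, then flattening it back to a list.
my %surname = ("fred" => "flintstone", "barney" => "rubble");
my @flat    = %surname;          # key-value pairs, in no particular order
my @keys    = sort keys %surname;
my @values  = sort values %surname;

print scalar(@flat), " items: @keys / @values\n";
```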

Diamond Operator

Use this to allow command line arguments containing filenames to replace STDIN, e.g.


#!/usr/bin/perl


while (defined($line = <>)){ 
    chomp($line); 
    print "It was $line that I saw\n"; 
}

On the command line you would call this program with program.pl file1 file2 etc... and it would read through each line in those files. This allows a Perl program to be used like a Unix command. If you include @ARGV = qw# file1 file2 file3 #; before the while, you can force these files to be read.

Regular Expressions

This is very important. We will be using regular expressions to implement genetic operators on Kappa and BioNetGen process algebra rules.

 

©2005 Chrisantha Fernando