List:       perl-beginners
Subject:    Re: Extracting Columns from tab delimited files
From:       Tiago Hori <tiago.hori () gmail ! com>
Date:       2013-02-11 15:56:10
Message-ID: DAEB3FE5-73AD-4A6D-8416-EBE8248D7A2A () gmail ! com

Thanks, Jim.

This is awesome!

T.

Sent from my iPhone

On 2013-02-11, at 11:49 AM, Jim Gibson <jimsgibson@gmail.com> wrote:

> 
> On Feb 10, 2013, at 5:57 PM, Tiago Hori wrote:
> 
> > Hi All,
> > 
> > I am trying to force myself not to use one of Perl's modules to parse tab
> > delimited files (like Text::CSV), so please be patient and don't tell me
> > just to go and use them. I am trying to reinvent the wheel, so to speak,
> > because, as we do in science, we repeat experiments to learn about the
> > process even though we know the outcome.
> 
> Coding your own solutions rather than using a module already built for the same
> purpose is perfectly all right, especially if you are learning Perl. If you are
> confident your data has a simple format and will not change, then you can parse it
> yourself. Keep in mind, however, that the Text::CSV module can handle more
> complicated cases. For example, what if your data fields can contain the separator
> character? In that case, your data fields may be enclosed in quotes or the embedded
> separator characters will have to be escaped (e.g., preceded by a '\' character or
> some other means). The Text::CSV module can handle these cases, plus it can read
> from a file or a scalar and deal with broken lines and other complexities. There is
> also the Text::CSV_XS module, which includes C code for speed.
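
To illustrate the embedded-separator problem described above, here is a minimal sketch. The sample record and the helper name naive_fields() are made up for illustration; the point is only that a plain split on tab cannot tell a separator from a tab inside a quoted field.

```perl
use strict;
use warnings;

# naive_fields() is a hypothetical helper: it splits a record on every tab,
# with no knowledge of quoting or escaping.
sub naive_fields {
    my ($record) = @_;
    return split /\t/, $record;
}

# Made-up record with three logical fields; the second one is quoted
# and contains a tab character.
my $record = qq{gene1\t"kinase,\tputative"\t42};

my @fields = naive_fields($record);
print scalar(@fields), " fields\n";    # prints "4 fields", not 3
```

If the data can never contain a tab inside a field, this limitation does not matter and the hand-written split is enough.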
> > 
> > So I started by reading in the files and going one line at a time,
> > putting each line in an array and matching a specific line of interest. With
> > join I could then turn the array of interest into a scalar and print that
> > out. That is almost what I wanted (see code below):
> > 
> > #! /usr/bin/perl
> > use strict;
> > use warnings;
> > 
> > my $filename_data = $ARGV[0];
> > my $filename_target = $ARGV[1];
> > my $line_number = 1;
> > my @targets;
> > 
> > open FILE, "<", $filename_data or die $!;
> > open TARGET, "<", $filename_target or die $!;
> 
> Lexical file handles are generally better, and it helps to include the file name in
> the error message:
> 
> open(my $file, '<', $filename_data) or
>     die( "Can't open $filename_data for reading: $!");
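
As a small sketch of why the file name in the message helps, the following wraps the open in a helper and triggers the error on purpose. The name open_for_reading() and the file name are assumptions for illustration, not something from the thread.

```perl
use strict;
use warnings;

# open_for_reading() is a hypothetical helper name.
sub open_for_reading {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or die "Can't open $path for reading: $!";
    return $fh;    # a lexical handle is closed when it goes out of scope
}

# eval traps the die so the message can be inspected instead of
# ending the program. The file is assumed not to exist.
my $handle = eval { open_for_reading('no_such_file.tsv') };
print "Error: $@" unless $handle;
```

The message now names the file that failed, which is useful when a script opens more than one file, as the one quoted here does.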
> 
> > 
> > while (<TARGET>){
> > push (@targets, $_);
> > }
> 
> You can replace the above with:
> 
> my @targets = <TARGET>;
> 
> You can also do this to remove the line ending characters from @targets:
> 
> chomp(@targets);
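
Put together, the two statements above might look like this. It is only a sketch: read_targets() is a hypothetical helper, and an in-memory file stands in for the real targets file so the example runs on its own.

```perl
use strict;
use warnings;

# read_targets() is a hypothetical helper; it takes an already-open handle.
sub read_targets {
    my ($fh) = @_;
    my @targets = <$fh>;    # list context reads every remaining line
    chomp(@targets);        # removes the line ending from each element
    return @targets;
}

# Made-up target names, read from a string instead of a file on disk.
my $text = "geneA\ngeneB\ngeneC\n";
open(my $target_fh, '<', \$text) or die "Can't open in-memory file: $!";
my @targets = read_targets($target_fh);
close($target_fh);

print join(',', @targets), "\n";    # geneA,geneB,geneC
```

Chomping once here means the later comparison loop does not need its own chomp.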
> 
> > 
> > close (TARGET);
> > 
> > while (<FILE>){
> > chomp;
> > my $line = $_;
> 
> You can read directly into a scalar, so no need for the $_ variable here:
> 
> while( my $line = <FILE> ) {
> chomp($line);
> 
> > my @elements = split ("\t", $line);
> > my $row_name = $elements[0];
> > if ($line_number == 1){
> > my $header = join("\t", @elements);
> 
> You are splitting $line, then joining it back up in $header. Why not just
> $header = $line;
> 
> > print $header, "\n";
> > $line_number = 2;}
> > elsif($line_number = 2){
> 
> That should be 
> 
> elsif( $line_number == 2 ) {
> 
> > foreach (@targets){
> > chomp;
> > my $target = $_;
> > if ($row_name eq $target){
> > my $data = join("\t", @elements);
> > print $data,"\n";
> 
> Once again, just use $line.
> 
> > }
> > }
> > }
> > }
> > 
> > close (FILE);
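
For reference, here is one way the loop could look with the suggestions above applied. This is a sketch, not the original poster's final script: select_rows() is a hypothetical name, the sample data is made up, and the data is read from an in-memory file so the example is self-contained.

```perl
use strict;
use warnings;

# select_rows() is a hypothetical helper: it returns the header line plus
# every data line whose first column equals one of the targets.
sub select_rows {
    my ($fh, @targets) = @_;
    my @out;
    my $line_number = 1;
    while ( my $line = <$fh> ) {
        chomp($line);
        if ( $line_number == 1 ) {
            push @out, $line;             # header: keep the line as read
            $line_number = 2;
        }
        elsif ( $line_number == 2 ) {     # comparison (==), not assignment (=)
            my ($row_name) = split /\t/, $line;
            next unless defined $row_name;    # skip empty lines
            foreach my $target (@targets) {
                push @out, $line if $row_name eq $target;
            }
        }
    }
    return @out;
}

# Made-up sample data with a header row and three data rows.
my $data = "id\tx\ty\ngeneA\t1\t2\ngeneB\t3\t4\ngeneC\t5\t6\n";
open(my $data_fh, '<', \$data) or die "Can't open in-memory file: $!";
print "$_\n" for select_rows($data_fh, 'geneA', 'geneC');
close($data_fh);
```

The row name is the only field that needs to be split out here, because the matching line is printed unchanged.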
> > 
> > Realistically, I don't want the whole row. So I started thinking about how to
> > get specific columns. I started reading on the internet, and the idea seems
> > to be to place the arrays containing the lines in a hash indexed by the row
> > names. So I did this:
> 
> There are several ways to extract individual columns from a CSV line.
> 
> 1. You can split the line into an array and make copies of specific elements:
> 
> my @fields = split("\t",$line);
> my $name = $fields[0];
> my $address = $fields[3];
> my $zip = $fields[7];
> 
> 2. You can use an array slice on the array:
> 
> my( $name, $address, $zip ) = @fields[0,3,7];
> 
> 3. You can use an array slice on the return list from split:
> 
> my( $name, $address, $zip ) = (split("\t",$line))[0,3,7];
> 
> 4. You can split the line into individual variables:
> 
> my( $name, $position, $salary, $address, $street, $city, $country, $zip ) =
>     split("\t",$line);
> 
> 5. You can use undefs to ignore columns you don't want:
> 
> my( $name, undef, undef, $address, undef, undef, undef, $zip ) = split("\t",$line);
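
The list-slice approach (option 3 above) can be wrapped in a small helper so the column positions become arguments. This is a sketch under assumptions: pick_columns() is a hypothetical name and the sample record is invented. It also passes a limit of -1 to split, because by default split drops trailing empty fields, which would shift nothing but could make a requested last column come back undefined.

```perl
use strict;
use warnings;

# pick_columns() is a hypothetical helper: it returns the fields found at
# the given zero-based positions of one tab-delimited line.
sub pick_columns {
    my ($line, @indices) = @_;
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    my @picked = ( split /\t/, $line, -1 )[@indices];
    return @picked;
}

# Invented record with eight columns, matching the positions used above.
my $line = join "\t", qw(Ann clerk 30000 Elm 12 Halifax Canada B3H);

my ($name, $address, $zip) = pick_columns($line, 0, 3, 7);
print "$name lives on $address, postal code $zip\n";
```

A position beyond the end of the line yields undef, so a caller working with ragged files may want to check for that.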
> 
> 
> 
> -- 
> To unsubscribe, e-mail: beginners-unsubscribe@perl.org
> For additional commands, e-mail: beginners-help@perl.org
> http://learn.perl.org/
> 
> 
