List: perl-beginners
Subject: Re: Extracting Columns from tab delimited files
From: Tiago Hori <tiago.hori () gmail ! com>
Date: 2013-02-11 15:56:10
Message-ID: DAEB3FE5-73AD-4A6D-8416-EBE8248D7A2A () gmail ! com
Thanks, Jim.
This is awesome!
T.
Sent from my iPhone
On 2013-02-11, at 11:49 AM, Jim Gibson <jimsgibson@gmail.com> wrote:
>
> On Feb 10, 2013, at 5:57 PM, Tiago Hori wrote:
>
> > Hi All,
> >
> > I am trying to force myself not to use one of Perl's modules to parse tab
> > delimited files (like Text::CSV), so please be patient and don't tell me
> > just to go and use them. I am trying to reinvent the wheel, so to speak,
> > because, as we do in science, we repeat experiments to learn about the
> > process even though we know the outcome.
>
> Coding your own solutions rather than using a module already built for the
> same purpose is perfectly all right, especially if you are learning Perl. If
> you are confident your data has a simple format and will not change, then you
> can parse it yourself. Keep in mind, however, that the Text::CSV module can
> handle more complicated cases. For example, what if your data fields can
> contain the separator character? In that case, your data fields may be
> enclosed in quotes or the embedded separator characters will have to be
> escaped (e.g., preceded by a '\' character or by some other means). The
> Text::CSV module can handle these cases, plus it can read from a file or a
> scalar and deal with broken lines and other complexities. There is also the
> Text::CSV_XS module, which includes C code for speed.
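To see the embedded-separator problem concretely, here is a minimal sketch. The record is made up for illustration: its second field is quoted and contains a tab, so a plain split on tabs cuts it in two.

```perl
use strict;
use warnings;

# Hypothetical record with three logical fields; the quoted middle field
# contains a tab character.
my $line = qq{gene1\t"kinase,\tputative"\t42};

# A plain split knows nothing about quoting, so the embedded tab is
# treated as just another separator.
my @fields = split /\t/, $line;

printf "%d fields\n", scalar @fields;
print "[$_]\n" for @fields;
```

The split yields four pieces instead of three, with the quoted field broken across two of them. If the data can never contain quoted or escaped tabs, the hand-rolled split is fine.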
> >
> > So I started by reading in the files and going through them one line at a
> > time, putting each line into an array and matching on a specific line of
> > interest. With join I could then turn the array of interest into a scalar
> > and print that out. That is almost what I wanted (see code below):
> >
> > #! /usr/bin/perl
> > use strict;
> > use warnings;
> >
> > my $filename_data = $ARGV[0];
> > my $filename_target = $ARGV[1];
> > my $line_number = 1;
> > my @targets;
> >
> > open FILE, "<", $filename_data or die $!;
> > open TARGET, "<", $filename_target or die $!;
>
> Lexical file handles are generally better, and it helps to include the file
> name in the error message:
>
> open(my $file, '<', $filename_data) or
>     die( "Can't open $filename_data for reading: $!");
>
> >
> > while (<TARGET>){
> > push (@targets, $_);
> > }
>
> You can replace the above with:
>
> my @targets = <TARGET>;
>
> You can also do this to remove the line ending characters from @targets:
>
> chomp(@targets);
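A small sketch of the lexical-handle, slurp, and chomp suggestions together. To keep it self-contained it reads from an in-memory string rather than a real targets file; the variable names and sample contents are assumptions for illustration only.

```perl
use strict;
use warnings;

# Stand-in for the contents of the targets file (made-up data).
my $target_text = "geneA\ngeneC\n";

# Opening a reference to a scalar gives an in-memory file handle.
open(my $target_fh, '<', \$target_text)
    or die "Can't open in-memory targets for reading: $!";

my @targets = <$target_fh>;   # list context reads all remaining lines
chomp(@targets);              # removes the line ending from every element
close($target_fh);

print join(',', @targets), "\n";
```

With a real file, the only change would be passing the file name to open instead of the scalar reference.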
>
> >
> > close (TARGET);
> >
> > while (<FILE>){
> > chomp;
> > my $line = $_;
>
> You can read directly into a scalar, so no need for the $_ variable here:
>
> while( my $line = <FILE> ) {
> chomp($line);
>
> > my @elements = split ("\t", $line);
> > my $row_name = $elements[0];
> > if ($line_number == 1){
> > my $header = join("\t", @elements);
>
> You are splitting $line, then joining it back up in $header. Why not just use
>
> my $header = $line;
>
> > print $header, "\n";
> > $line_number = 2;}
> > elsif($line_number = 2){
>
> That should be
>
> elsif( $line_number == 2 ) {
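To illustrate why the single = matters: assignment stores the value and then yields it, so the condition is true whenever the assigned value is true, regardless of what the variable held before. A minimal sketch follows. The value 2 is kept in a variable ($expected, a name made up for this example) so the sketch compiles quietly; with a literal constant on the right-hand side, perl under "use warnings" typically reports "Found = in conditional, should be ==".

```perl
use strict;
use warnings;

my $expected    = 2;
my $line_number = 7;

# Assignment: $line_number becomes 2, and the condition sees 2 (true).
my $via_assignment = ($line_number = $expected) ? 1 : 0;
my $after_assign   = $line_number;

# Comparison: $line_number is left alone, and 7 == 2 is false.
$line_number = 7;
my $via_comparison = ($line_number == $expected) ? 1 : 0;

print "assignment: $via_assignment (line_number is now $after_assign)\n";
print "comparison: $via_comparison (line_number is still $line_number)\n";
```

In the original loop this means the elsif branch would run for every line after the header no matter what $line_number contained.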
>
> > foreach (@targets){
> > chomp;
> > my $target = $_;
> > if ($row_name eq $target){
> > my $data = join("\t", @elements);
> > print $data,"\n";
>
> Once again, just use $line.
>
> > }
> > }
> > }
> > }
> >
> > close (FILE);
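Pulling the suggestions above together, one possible shape for the whole script is sketched below. It is not the code from either message: filter_rows is a hypothetical helper, and it swaps the inner foreach loop for a hash lookup on the target names. One behavioural difference to be aware of: a target listed twice prints its row once here, whereas the nested loop would print it twice. In-memory handles with made-up data stand in for the two files.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the header line plus every row whose first
# column appears in the targets list.
sub filter_rows {
    my ($data_fh, $target_fh) = @_;

    my @targets = <$target_fh>;
    chomp(@targets);
    my %wanted = map { $_ => 1 } @targets;   # name => 1 for quick lookup

    my $header = <$data_fh>;
    return () unless defined $header;        # empty data file
    chomp($header);
    my @out = ($header);

    while ( my $line = <$data_fh> ) {
        chomp($line);
        my ($row_name) = split /\t/, $line;  # first column only
        next unless defined $row_name;       # skip blank lines
        push @out, $line if $wanted{$row_name};
    }
    return @out;
}

# Made-up sample data standing in for the two input files.
my $data    = "id\tx\ty\ngeneA\t1\t2\ngeneB\t3\t4\ngeneC\t5\t6\n";
my $targets = "geneC\ngeneA\n";

open(my $data_fh, '<', \$data)
    or die "Can't open in-memory data for reading: $!";
open(my $target_fh, '<', \$targets)
    or die "Can't open in-memory targets for reading: $!";

my @rows = filter_rows($data_fh, $target_fh);
print "$_\n" for @rows;
```

Rows come out in the order they appear in the data file, not the order of the targets list.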
> >
> > Realistically, I don't want the whole row. So I started thinking about how
> > to get specific columns. I started reading on the internet and the idea
> > seems to be to place the arrays containing the lines in a hash indexed by
> > the row names. So I did this:
>
> There are several ways to extract individual columns from a CSV line.
>
> 1. You can split the line into an array and make copies of specific elements:
>
> my @fields = split("\t",$line);
> my $name = $fields[0];
> my $address = $fields[3];
> my $zip = $fields[7];
>
> 2. You can use an array slice on the array:
>
> my( $name, $address, $zip ) = @fields[0,3,7];
>
> 3. You can use an array slice on the return list from split:
>
> my( $name, $address, $zip ) = (split("\t",$line))[0,3,7];
>
> 4. You can split the line into individual variables:
>
> my( $name, $position, $salary, $address, $street, $city, $country, $zip ) =
>     split("\t",$line);
>
> 5. You can use undefs to ignore columns you don't want:
>
> my( $name, undef, undef, $address, undef, undef, undef, $zip ) = split("\t",$line);
>
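A quick sketch showing that the slice and undef forms in the list above pick out the same columns. The sample record is made up to match the eight column names used in option 4. The last part covers a caveat the list does not mention: split discards trailing empty fields unless it is given a negative limit, which matters when the final columns of a row can be blank.

```perl
use strict;
use warnings;

# Made-up record: name, position, salary, address, street, city, country, zip
my $line = join "\t", qw(Ann clerk 30000 12 Main Springfield US 99999);

# Option 2: slice of an array
my @fields = split /\t/, $line;
my ( $n1, $a1, $z1 ) = @fields[0, 3, 7];

# Option 3: slice of the list returned by split
my ( $n2, $a2, $z2 ) = ( split /\t/, $line )[0, 3, 7];

# Option 5: undef placeholders for unwanted columns
my ( $n3, undef, undef, $a3, undef, undef, undef, $z3 ) = split /\t/, $line;

print "$n1 $a1 $z1\n";
print "$n2 $a2 $z2\n";
print "$n3 $a3 $z3\n";

# Caveat: trailing empty fields are dropped unless the limit is negative.
my $short   = "Ann\t\t\t";
my @dropped = split /\t/, $short;        # trailing empties removed
my @kept    = split /\t/, $short, -1;    # all four fields kept
printf "%d vs %d fields\n", scalar @dropped, scalar @kept;
```

All three forms give the same name, address, and zip values, so the choice is mostly about readability and whether the full array is needed later.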
>
>
> --
> To unsubscribe, e-mail: beginners-unsubscribe@perl.org
> For additional commands, e-mail: beginners-help@perl.org
> http://learn.perl.org/
>
>