'[jira] [Updated] (CSV-226) Add CSVParser test case for standard charsets'


List:       jakarta-commons-dev
Subject:    [jira] [Updated] (CSV-226) Add CSVParser test case for standard charsets
From:       "Anson Schwabecher (JIRA)" <jira () apache ! org>
Date:       2018-05-31 1:28:00
Message-ID: JIRA.13162194.1527296750000.70050.1527730080041 () Atlassian ! JIRA


     [ https://issues.apache.org/jira/browse/CSV-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anson Schwabecher updated CSV-226:
----------------------------------
    Description: 
Hello, I'd like to contribute a CSVParser test suite for the standard charsets defined
in java.nio.charset.StandardCharsets, plus UTF-32.

This is a standalone test, but it also supports a fix for CSV-107. It also refactors
and unifies the existing tests around your established workaround of inserting a
BOMInputStream ahead of the CSVParser.
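Conceptually, the BOMInputStream workaround consumes a leading byte order mark before the CSVParser reads any characters. The sketch below shows that idea using only the JDK, as a stand-in for Commons IO's BOMInputStream; `BomSniffer` and `bomAwareReader` are hypothetical names, not part of Commons CSV or Commons IO, and the real tests would use BOMInputStream itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only stand-in for putting a BOMInputStream ahead of the CSVParser. */
public final class BomSniffer {

    private BomSniffer() {}

    /**
     * Peeks at up to four leading bytes, consumes a recognized byte order mark and returns a
     * Reader that decodes the rest of the stream in the matching charset. Without a BOM, all
     * peeked bytes are pushed back and decoded with the caller-supplied fallback charset.
     */
    public static Reader bomAwareReader(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 4);
        byte[] head = new byte[4];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop until 4 bytes or EOF.
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        Charset detected = null;
        int bomLength = 0;
        // UTF-32LE (FF FE 00 00) must be checked before UTF-16LE (FF FE), which is its prefix.
        if (startsWith(head, n, 0x00, 0x00, 0xFE, 0xFF)) {
            detected = Charset.forName("UTF-32BE");
            bomLength = 4;
        } else if (startsWith(head, n, 0xFF, 0xFE, 0x00, 0x00)) {
            detected = Charset.forName("UTF-32LE");
            bomLength = 4;
        } else if (startsWith(head, n, 0xEF, 0xBB, 0xBF)) {
            detected = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (startsWith(head, n, 0xFE, 0xFF)) {
            detected = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (startsWith(head, n, 0xFF, 0xFE)) {
            detected = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }
        // Push back every peeked byte that is not part of the BOM.
        if (n > bomLength) {
            pin.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pin, detected != null ? detected : fallback);
    }

    private static boolean startsWith(byte[] head, int n, int... bom) {
        if (n < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if ((head[i] & 0xFF) != bom[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf16le = "\uFEFFid,name\n1,Zo\u00eb\n".getBytes(StandardCharsets.UTF_16LE);
        try (Reader reader = bomAwareReader(new ByteArrayInputStream(utf16le), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = reader.read()) != -1; ) {
                sb.append((char) c);
            }
            System.out.print(sb); // header and record, with the BOM already consumed
        }
    }
}
```

In the real tests the returned Reader would be handed to the parser, for example via `CSVFormat.DEFAULT.parse(reader)`. Note that Commons IO's BOMInputStream detects only the UTF-8 BOM by default; covering the UTF-16 and UTF-32 variants means passing the corresponding ByteOrderMark constants to its constructor.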

The test takes a single UTF-8-encoded base file (cstest.csv) and transcodes it into
multiple output files (in the target directory) using different character sets,
similar to the iconv tool. Each file is then fed into the parser to test all of the
Unicode variants, with and without a BOM. I think a file-based approach is still
preferable to encoding a character stream inline as a string: if issues develop, the
data is easy to inspect.
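The transcoding step described above could look roughly like the following JDK-only sketch. The class and method names, the `<charset>-BOM` / `<charset>-NOBOM` file naming and the inline sample data are assumptions for illustration, not taken from the actual patch. It covers only the Unicode charsets that have a byte order mark: US-ASCII and ISO-8859-1 cannot encode U+FEFF, and Java's `UTF-16` encoder always writes its own big-endian BOM, so those would need separate handling.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: fan one UTF-8 base CSV out into BOM/no-BOM files per Unicode charset. */
public final class CharsetFanOut {

    private CharsetFanOut() {}

    /** Unicode charsets with an explicit byte order, so a BOM can be added or left out. */
    static final List<Charset> UNICODE_CHARSETS = Arrays.asList(
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE,
            Charset.forName("UTF-32BE"),
            Charset.forName("UTF-32LE"));

    /**
     * Decodes the UTF-8 base file and re-encodes it once per charset, with and without a BOM,
     * much like running iconv over it. The output directory is chosen by the caller
     * (for example somewhere under target/). Returns the paths that were written.
     */
    public static List<Path> fanOut(Path baseUtf8File, Path outDir) throws IOException {
        String text = new String(Files.readAllBytes(baseUtf8File), StandardCharsets.UTF_8);
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        for (Charset cs : UNICODE_CHARSETS) {
            for (boolean withBom : new boolean[] {false, true}) {
                // Encoding U+FEFF in the target charset yields exactly that charset's BOM bytes.
                String content = withBom ? "\uFEFF" + text : text;
                Path out = outDir.resolve(cs.name() + (withBom ? "-BOM" : "-NOBOM") + ".csv");
                Files.write(out, content.getBytes(cs));
                written.add(out);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the build's target directory here.
        Path dir = Files.createTempDirectory("cstest");
        Path base = dir.resolve("cstest.csv");
        Files.write(base, "id,name\n1,Zo\u00eb\n".getBytes(StandardCharsets.UTF_8));
        for (Path p : fanOut(base, dir.resolve("out"))) {
            System.out.println(p.getFileName() + " " + Files.size(p) + " bytes");
        }
    }
}
```

Writing the BOM by encoding U+FEFF in the target charset keeps the loop charset-agnostic, and leaving the outputs on disk preserves the inspectability argued for above. Each file would then be read back through BOMInputStream and CSVParser, and its records compared with those parsed from the base file.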

I noticed that the project's pom.xml (Apache RAT configuration) excludes individual
test resource files by name rather than using a single wildcard expression to exclude
every file in the directory. Is there a reason for this? It would be much better if
developers did not have to maintain this configuration.


{code:language=xml|title=i.e., switch to a single exclude expression}
<exclude>src/test/resources/**/*</exclude>
{code}

  was:
Hello, I'd like to contribute a CSVParser test suite for the standard charsets defined
in java.nio.charset.StandardCharsets, plus UTF-32.

This is a standalone test, but it also supports a fix for CSV-107. It also refactors
and unifies the existing tests around your established workaround of inserting a
BOMInputStream ahead of the CSVParser.

The test takes a single UTF-8-encoded base file (cstest.csv) and transcodes it into
multiple output files (in the target directory) using different character sets,
similar to the iconv tool. Each file is then fed into the parser to test all of the
Unicode variants, with and without a BOM. I think a file-based approach is still
preferable to encoding a character stream inline as a string: if issues develop, the
data is easy to inspect.

I noticed that the project's pom.xml (Apache RAT configuration) excludes individual
test resource files by name rather than using a single wildcard expression to exclude
every file in the directory. Is there a reason for this? It would be much better if
developers did not have to maintain this configuration.

i.e., switch to a single exclude expression:

{{<exclude>src/test/resources/**/*</exclude>}}


> Add CSVParser test case for standard charsets
> ---------------------------------------------
> 
> Key: CSV-226
> URL: https://issues.apache.org/jira/browse/CSV-226
> Project: Commons CSV
> Issue Type: Test
> Components: Parser
> Affects Versions: 1.5
> Reporter: Anson Schwabecher
> Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

