
List:       python-list
Subject:    Re: Behaviour of str.split
From:       David Fraser <davidf () sjsoft ! com>
Date:       2005-04-20 13:03:29
Message-ID: d45jv0$g67$1 () ctb-nnrp2 ! saix ! net

Bengt Richter wrote:
> On Wed, 20 Apr 2005 10:55:18 +0200, David Fraser <davidf@sjsoft.com> wrote:
> 
> 
>>Greg Ewing wrote:
>>
>>>Will McGugan wrote:
>>>
>>>
>>>>Hi,
>>>>
>>>>I'm curious about the behaviour of the str.split() when applied to 
>>>>empty strings.
>>>>
>>>>"".split() returns an empty list, however...
>>>>
>>>>"".split("*") returns a list containing one empty string.
>>>
>>>
>>>Both of these make sense as limiting cases.
>>>
>>>Consider
>>>
>>> >>> "a b c".split()
>>>['a', 'b', 'c']
>>> >>> "a b".split()
>>>['a', 'b']
>>> >>> "a".split()
>>>['a']
>>> >>> "".split()
>>>[]
>>>
>>>and
>>>
>>> >>> "**".split("*")
>>>['', '', '']
>>> >>> "*".split("*")
>>>['', '']
>>> >>> "".split("*")
>>>['']
>>>
>>>The split() method is really doing two somewhat different things
>>>depending on whether it is given an argument, and the end-cases
>>>come out differently.
>>>
>>
>>You don't really explain *why* they make sense as limiting cases, as 
>>your examples are quite different.
>>
>>Consider
>>
>> >>> "a*b*c".split("*")
>>['a', 'b', 'c']
>> >>> "a*b".split("*")
>>['a', 'b']
>> >>> "a".split("*")
>>['a']
>> >>> "".split("*")
>>['']
>>
>>Now how is this logical when compared with split() above?
> 
> 
> The trouble is that s.split(arg) and s.split() are two different functions.
> 
> The first is 1:1 and reversible, i.e. arg.join(s.split(arg)) == s
> The second is not 1:1 nor reversible: '<<various whitespace>>'.join(s.split()) == s ?? Not usually.
> 
> I think you can do it with the equivalent whitespace regex, preserving the split-out whitespace
> substrings and ''.joining those back with the others, but not with split(). I.e.,
> 
>  >>> def splitjoin(s, splitter=None):
>  ...     return (splitter is None and '<<whitespace>>' or splitter).join(s.split(splitter))
>  ...
>  >>> splitjoin('a*b*c', '*')
>  'a*b*c'
>  >>> splitjoin('a*b', '*')
>  'a*b'
>  >>> splitjoin('a', '*')
>  'a'
>  >>> splitjoin('', '*')
>  ''
>  >>> splitjoin('a b    c')
>  'a<<whitespace>>b<<whitespace>>c'
>  >>> splitjoin('a b    ')
>  'a<<whitespace>>b'
>  >>> splitjoin('  b    ')
>  'b'
>  >>> splitjoin('')
>  ''
> 
>  >>> splitjoin('*****','*')
>  '*****'
> Note why that works:
> 
>  >>> '*****'.split('*')
>  ['', '', '', '', '', '']
>  >>> '*a'.split('*')
>  ['', 'a']
>  >>> 'a*'.split('*')
>  ['a', '']
> 
>  >>> splitjoin('*a','*')
>  '*a'
>  >>> splitjoin('a*','*')
>  'a*'
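
Both points above can be checked with a minimal sketch in modern Python 3 (which postdates this 2005 thread), assuming only the standard `re` module; the helper names `split_keep_whitespace` and `words_only` are hypothetical. With an explicit separator, `s.split(sep)` always yields `s.count(sep) + 1` pieces, which is why `"".split("*")` gives `['']`: zero separators, one piece. For whitespace, a regex with a capturing group keeps the whitespace runs in the output, so `''.join` restores the original string, which `s.split()` alone cannot do.

```python
import re

# One run of whitespace. The capturing group makes re.split keep each
# run in its output instead of discarding it.
WHITESPACE_RUN = re.compile(r"(\s+)")


def split_keep_whitespace(s):
    """Split s at runs of whitespace, keeping the runs themselves.

    The result alternates word, whitespace, word, ... and starts or ends
    with '' when s starts or ends with whitespace, so it is reversible.
    """
    return WHITESPACE_RUN.split(s)


def words_only(parts):
    """Keep only the non-empty words (even positions), as s.split() does."""
    return [p for p in parts[::2] if p]


if __name__ == "__main__":
    # Explicit separator: always s.count(sep) + 1 pieces, and reversible.
    for s in ["a*b*c", "a*b", "a", "", "*****", "*a", "a*"]:
        pieces = s.split("*")
        assert len(pieces) == s.count("*") + 1
        assert "*".join(pieces) == s

    # Whitespace splitting: reversible only if the runs are kept.
    for s in ["a b    c", "a b    ", "  b    ", ""]:
        parts = split_keep_whitespace(s)
        assert "".join(parts) == s
        assert words_only(parts) == s.split()
        print(repr(s), "->", parts)
```

For example, `split_keep_whitespace("  b    ")` returns `['', '  ', 'b', '    ', '']`; dropping the whitespace runs and empty strings gives `['b']`, the same as `"  b    ".split()`.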

Thanks, this makes sense.
So ideally, if we weren't dealing with backward compatibility, these
functions might have different names: "split" (with an argument) and
"spacesplit" (without one).
In fact, it would be nice to allow an argument to "spacesplit" specifying
the characters regarded as 'space'.
But none of that is worth breaking current code :-)
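
The "spacesplit" idea can be sketched as a plain function. This is a hypothetical helper, not part of Python's str API: the name comes from the message above, and the `chars` parameter is an assumption modeled on the `chars` argument that `str.strip()` already accepts. As with `s.split()`, a run of separator characters counts as one break, and leading or trailing runs produce no empty strings.

```python
def spacesplit(s, chars=None):
    """Hypothetical 'spacesplit': split s on runs of any character in chars.

    With chars=None this defers to str.split(), i.e. whitespace splitting.
    """
    if chars is None:
        return s.split()
    fields = []
    current = []
    for ch in s:
        if ch in chars:
            # A separator ends the current field, if one has started;
            # consecutive separators therefore never yield '' fields.
            if current:
                fields.append("".join(current))
                current = []
        else:
            current.append(ch)
    # Flush the last field unless s ended with a separator (or was empty).
    if current:
        fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(spacesplit("  a b\t c "))
    print(spacesplit("*a*", "*"))
    print("*a*".split("*"))
```

So `spacesplit("*a*", "*")` gives `['a']`, whereas `"*a*".split("*")` gives `['', 'a', '']`: the first behaves like the argument-less split, the second like the reversible one.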

David
-- 
http://mail.python.org/mailman/listinfo/python-list