List:       python-patches
Subject:    [Patches] [ python-Patches-1057588 ] chr, ord,
From:       noreply () sourceforge ! net (SourceForge ! net)
Date:       2004-10-31 19:18:38
Message-ID: E1COKHi-0006NA-44 () sc8-sf-web4 ! sourceforge ! net

Patches item #1057588, was opened at 2004-10-31 00:25
Message generated for change (Settings changed) made by mike_j_brown
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1057588&group_id=5470

Category: Documentation
Group: Python 2.4
Status: Open
Resolution: None
> Priority: 1
Submitted By: Mike Brown (mike_j_brown)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: chr, ord, unichr documentation updates

Initial Comment:
The attached diff may be applied against v1.175 of
libfuncs.tex --
http://cvs.sourceforge.net/viewcvs.py/*checkout*/python/python/dist/src/Doc/lib/libfuncs.tex?content-type=text%2Fplain&rev=1.175



chr(): A str is not in any particular encoding, so
don't talk about ASCII, which does not apply to
arguments > 127 anyway. Also make reference to unichr().

ord(): A str is not in any particular encoding, so
don't talk about ASCII. Describe what the return value
represents for each type of string (str, unicode), and
mention the TypeError that is raised on narrow unicode
builds of Python when the argument is a character above
U+FFFF, which such builds store as a two-unit surrogate
pair.

unichr(): Mention the restrictions on the argument
depending on whether Python was built with wide or
narrow unicode.

The precedent in unicode() is to refer to str objects
as "8-bit strings", so the wording of the above changes
was chosen accordingly.
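
As a rough illustration of the distinctions summarized above,
here is a minimal sketch written for a current Python 3
interpreter rather than 2.4. It assumes that bytes can stand
in for the 8-bit string type and that Python 3's chr() and
ord() on str behave like unichr() and ord() on a wide build.
The helper name byte_value and the sample values 200 and
U+10000 are made up for illustration and are not part of the
patch.

```python
# Sketch for Python 3 (assumption: bytes ~ 2.x 8-bit str, str ~ 2.x unicode).

def byte_value(one_byte):
    """Hypothetical helper: integer value (0-255) of a length-1 bytes object."""
    if len(one_byte) != 1:
        raise TypeError("expected a bytes object of length 1")
    return one_byte[0]

# An 8-bit value above 127 has no ASCII meaning; which character it
# represents depends entirely on the encoding used to decode it.
high = bytes([200])
latin1_code_point = ord(high.decode("latin-1"))
cp1251_code_point = ord(high.decode("cp1251"))

# A code point above U+FFFF needs two 16-bit code units (a surrogate pair),
# which is why a narrow build cannot treat it as a single character.
utf16 = chr(0x10000).encode("utf-16-be")
code_units = [int.from_bytes(utf16[i:i + 2], "big")
              for i in range(0, len(utf16), 2)]

print(byte_value(high))                                 # 200
print(hex(latin1_code_point), hex(cp1251_code_point))   # 0xc8 0x418
print([hex(unit) for unit in code_units])               # ['0xd800', '0xdc00']
```

The same 8-bit value maps to two different code points, which
is the sense in which a str "is not in any particular
encoding".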


----------------------------------------------------------------------

Comment By: Mike Brown (mike_j_brown)
Date: 2004-10-31 11:17

Message:
Logged In: YES 
user_id=371366

Oops, didn't mean to remove the assignment to fdrake when
adding previous comment.

----------------------------------------------------------------------

Comment By: Mike Brown (mike_j_brown)
Date: 2004-10-31 01:23

Message:
Logged In: YES 
user_id=371366

Also note that I did not suggest removing the example with
the letter "a". I just suggested removing the reference to
"ASCII" in particular.

Ideally, IMHO, the documentation for sequence types is where
one should mention the strong association between strings
and ASCII. It currently doesn't even really describe what a
string or Unicode string is.

It should state that non-Unicode strings are an abstraction
in which each member of the sequence is a "character" that
is actually an 8-bit value, as in Standard C, intended to
represent a character in an arbitrary encoding. It should
also state that there is an _informal_ convention, in
documentation, of referring to these values as ASCII values.
That convention exists in part because of the notational
conventions of string literals, such as using "\t", "\n",
and "\r" to represent decimal values 9, 10, and 13,
respectively (associations that only make sense in ASCII or
ASCII-based encodings), and in part because it is easier to
talk about the lower 128 values in terms of their ASCII
equivalents (e.g. "chr(97) produces the string 'a'").

Likewise, the unicode type could be described as an
abstraction of 16-bit ("narrow") or 32-bit ("wide") code
units, depending on how Python was built, and so on.

I would see making such unambiguous statements as a
reasonable alternative to just deleting mentions of ASCII
from the library docs, although I think making all of the
changes would be best. People already have preconceived
notions of what a 'string' is, and I know from experience
that they tend not to worry about straightening out their
understanding of such nuances until they get burned by
assumptions built around statements like "ord() gives you
the ASCII value".
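
The point about string-literal notation and the lower 128
values can be checked directly. The following is a small
sketch for Python 3, assuming bytes as a stand-in for the
8-bit string type. The choice of cp037 (an EBCDIC code page)
as the non-ASCII-based encoding is an assumption made here
for illustration only.

```python
# Sketch for Python 3 (assumption: bytes stands in for the 2.x 8-bit str).

# The escapes in string literals are just notation for fixed 8-bit values.
escape_values = {
    name: literal[0]
    for name, literal in (("\\t", b"\t"), ("\\n", b"\n"), ("\\r", b"\r"))
}
print(escape_values)

# Value 97 reads as 'a' only under ASCII or an ASCII-based encoding.
value = bytes([97])
as_ascii = value.decode("ascii")
as_ebcdic = value.decode("cp037")   # cp037 picked only as an example

print(as_ascii == "a")     # True
print(as_ebcdic == "a")    # False
```

The values 9, 10, 13, and 97 are the same in every case; only
the decoding step attaches a character to them.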


----------------------------------------------------------------------

Comment By: Mike Brown (mike_j_brown)
Date: 2004-10-31 00:51

Message:
Logged In: YES 
user_id=371366

That kind of resistance to using accurate, strict
terminology just perpetuates common misunderstandings about
the relationship between characters and encodings.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-10-31 00:38

Message:
Logged In: YES 
user_id=80475

The attachment didn't make it.  Try again.

And, FWIW, I think the documentation is perfectly clear as
is.  Though the ASCII reference is not strict, I think
taking it out would be a mistake.  Though many encodings are
possible, there is a strong relationship between the number
97 and the letter 'a'.  Mentioning ASCII makes that
relationship clear.

IOW, I'm -1 on changing it until a new bytes type is introduced.


----------------------------------------------------------------------
