
List:       linux-man
Subject:    Re: [PATCH] New Page: byte.7: Document what "byte" means, theory and practice
From:       Michael Witten <mfwitten () gmail ! com>
Date:       2018-06-24 4:49:12
Message-ID: 44ac6074c1c94296b0a7316d22be8b83-mfwitten () gmail ! com

On Fri, 22 Jun 2018 05:54:57 -0000, Michael Witten wrote:
> On Thu, 21 Jun 2018 21:46:03 +0200, Michael Kerrisk wrote:
>> On 06/20/2018 06:25 PM, Michael Witten wrote:
>>> This man page  defines what "byte" means in the  context of Linux
>>> programming; it draws on various authoritative references, namely
>>> Linus  Torvalds's master's  thesis,  POSIX, and  the C11  [draft]
>>> standard. Each of these references is properly cited.
>>>
>>> The content  has been  laid out  to render well  in a  pager that
>>> provides at least 80 columns  of monospace characters; it is best
>>> viewed by  `man' with at least one  of the  following environment
>>> variable definitions:
>>>
>>>   COLUMNS=80
>>>   MANWIDTH=80
>>
>> Thanks for sending this, but  what's missing in this cover message
>> is some explanation of why the page is needed. It's not clear to me.
>> Nor is the rationale clear from reading the start of the page. So,
>> why is the page needed?
>
> A programmer needs  to hook into  various interfaces to  make things
> work.  Linux provides an interface, POSIX provides an interface, and
> the C standard provides an interface; and, of course, there are many
> other interfaces, some of which haven't even been built yet, but for
> which a programmer might want to be  fully prepared, and which might
> themselves target one of those Big interfaces while neglecting another.
>
> Though these Big Three interfaces  are related, they're not actually
> coupled  all that  strongly  together; there's  plenty  of room  for
> disagreement both  now and in  the future,  which is one  reason why
> Linus Torvalds  writes in his  master's thesis  about  the size of a
> byte and the  nature of data  (the quote in the new man page is from
> the  section "Unresolved  Issues", where  he details  concerns about
> portability).
>
> [...]
>
> There's nothing stopping a determined  soul from porting Linux to an
> unusual architecture that does not have an 8-bit primitive;  for the
> sake of  compatibility, that  port  would undoubtedly require  a few
> hacks to emulate an 8-bit interface, but that's just the kernel! The
> user space is an entirely different domain, which might eschew POSIX
> compliance (targeting instead  just the looser constraints  of the C
> standard),  and  thereby  place  on the  programmer  the  burden  of
> structuring data properly.
>
> Even if there were the  *strictest* compliance to POSIX, guess what?
> An `unsigned short' under POSIX  ain't necessarily 16 bits; like the
> C standard,  POSIX requires only that an `unsigned short' be capable
> of representing *at least* 16 bits:
>
>   http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/limits.h.html#tag_13_23_03_06
>   {USHRT_MAX}
>       Maximum value for an object of type unsigned short.
>       Minimum Acceptable Value: 65 535
>
> All of your code that uses an uninitialized `unsigned short' to read
> in a single 2-byte datum is wrongheaded,  even under POSIX;  it just
> "happens to work",  at least  for now.  You've  got  to clear  those
> "extra" higher-order bits if you  don't want them used inadvertently
> in your calculation.  That is, you've got to write a program that is
> aware of the sizes of even basic data types.
>
> As described by Linus's master's thesis, that's why the Linux kernel
> targets a header-based "virtual machine" that provides architecture-
> specific implementations of integer types with precise widths (e.g.,
> `u8', `u16', or `u32');  similarly, that's why this man page mentions
> C99's fixed-width integer types (e.g., `uint8_t', `uint16_t', etc.).
>
> [...]
>
> The new  man page  explicitly discusses issues  like this,  but con-
> centrates more on the narrow topic of  what a "byte" or a "char" is.
> Perhaps the purpose of this man  page would be more obvious if other
> data  types  (like `short')  were also listed  in the SYNOPSIS,  and
> further  discussed  in the DESCRIPTION.  Perhaps the man page should
> also delve into Linux's integer types.
>
> What do you think?

I've expanded the SYNOPSIS to include information that is perhaps more
widely useful, though I haven't yet expanded the DESCRIPTION to follow
suit; here is its ASCII rendering (without bolding or italicizing, and
without a couple of fancy Unicode characters for the arrow symbols):

BYTE(7)                    Linux Programmer's Manual                   BYTE(7)



NAME
       byte - exactly 8 bits; the smallest addressable unit in the kernel
       char - at least 8 bits; the smallest addressable unit in C

SYNOPSIS

   Linux and POSIX (and modern computing)

       byte       <->  exactly 8 bits   This is the definition of "byte"
                                        used throughout this manual.

   POSIX

       char       <->  exactly 1 byte
       short      <-> at least 2 bytes
       int        <-> at least 4 bytes
       long       <-> at least 4 bytes
       long long  <-> at least 8 bytes

   Standard C

       char       <-> at least 1 byte    <- Beware!
       short      <-> at least 2 bytes
       int        <-> at least 2 bytes   <- Beware!
       long       <-> at least 4 bytes
       long long  <-> at least 8 bytes

   IP16L32 (16-bit x86; "near" pointers)

       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 2 bytes
       long       <->  exactly 4 bytes
       a pointer  <->  exactly 2 bytes

   I16LP32 (16-bit x86; "far" pointers)

       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 2 bytes
       long       <->  exactly 4 bytes
       a pointer  <->  exactly 4 bytes

   ILP32 (32-bit x86)

       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 4 bytes
       long       <->  exactly 4 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 4 bytes

   IL32P64 or LLP64 (x86-64 Windows)

       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 4 bytes
       long       <->  exactly 4 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 8 bytes

   I32LP64 or LP64 (x86-64 Linux and other Unix-like systems)

       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 4 bytes
       long       <->  exactly 8 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 8 bytes

   ILP64 (HAL's Solaris port to SPARC64)

       char       <->  exactly 1 byte
       short      <->  exactly 2 bytes
       int        <->  exactly 8 bytes
       long       <->  exactly 8 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 8 bytes

   SILP64 (classic Cray UNICOS)

       char       <->  exactly 1 byte
       short      <->  exactly 8 bytes
       int        <->  exactly 8 bytes
       long       <->  exactly 8 bytes
       long long  <->  exactly 8 bytes
       a pointer  <->  exactly 8 bytes

    ...

DESCRIPTION
[...]

Sincerely,
Michael Witten