List: linux-man
Subject: Re: [PATCH] New Page: byte.7: Document what "byte" means, theory and practice
From: Michael Witten <mfwitten () gmail ! com>
Date: 2018-06-24 4:49:12
Message-ID: 44ac6074c1c94296b0a7316d22be8b83-mfwitten () gmail ! com
On Fri, 22 Jun 2018 05:54:57 -0000, Michael Witten wrote:
> On Thu, 21 Jun 2018 21:46:03 +0200, Michael Kerrisk wrote:
>> On 06/20/2018 06:25 PM, Michael Witten wrote:
>>> This man page defines what "byte" means in the context of Linux
>>> programming; it draws on various authoritative references, namely
>>> Linus Torvalds's master's thesis, POSIX, and the C11 [draft]
>>> standard. Each of these references is properly cited.
>>>
>>> The content has been laid out to render well in a pager that
>>> provides at least 80 columns of monospace characters; it is best
>>> viewed by `man' with at least one of the following environment
>>> variable definitions:
>>>
>>> COLUMNS=80
>>> MANWIDTH=80
>>
>> Thanks for sending this, but what's missing in this cover message
>> is some explanation of why the page is needed. It's not clear to me.
>> Nor is the rationale clear from reading the start of the page. So,
>> why is the page needed?
>
> A programmer needs to hook into various interfaces to make things
> work. Linux provides an interface, POSIX provides an interface, and
> the C standard provides an interface; and, of course, there are many
> other interfaces, some of which haven't even been built yet, but for
> which a programmer might want to be fully prepared, and each of which
> might itself target one of those Big interfaces while neglecting another.
>
> Though these Big Three interfaces are related, they're not actually
> coupled all that strongly together; there's plenty of room for
> disagreement both now and in the future, which is one reason why
> Linus Torvalds writes in his master's thesis about the size of a
> byte and the nature of data (the quote in the new man page is from
> the section "Unresolved Issues", where he details concerns about
> portability).
>
> [...]
>
> There's nothing stopping a determined soul from porting Linux to an
> unusual architecture that does not have an 8-bit primitive; for the
> sake of compatibility, that port would undoubtedly require a few
> hacks to emulate an 8-bit interface, but that's just the kernel! The
> user space is an entirely different domain, which might eschew POSIX
> compliance (targeting instead just the looser constraints of the C
> standard), and thereby place on the programmer the burden of
> structuring data properly.
>
> Even under the *strictest* compliance with POSIX, guess what?
> An `unsigned short' under POSIX ain't necessarily 16 bits; like the
> C standard, POSIX requires only that an `unsigned short' be capable
> of representing *at least* 16 bits:
>
> http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/limits.h.html#tag_13_23_03_06
> {USHRT_MAX}
> Maximum value for an object of type unsigned short.
> Minimum Acceptable Value: 65 535
>
> All of your code that uses an uninitialized `unsigned short' to read
> in a single 2-byte datum is wrongheaded, even under POSIX; it just
> "happens to work", at least for now. You've got to clear those
> "extra" higher-order bits if you don't want them used inadvertently
> in your calculation. That is, you've got to write a program that is
> aware of the sizes of even basic data types.
>
> As described in Linus's master's thesis, that's why the Linux kernel
> targets a header-based "virtual machine" that provides architecture-
> specific implementations of integer types with precise widths (e.g.,
> `u8', `u16', or `u32'); similarly, that's why this man page mentions
> C99's fixed-width integer types (e.g., `uint8_t', `uint16_t', etc.).
>
> [...]
>
> The new man page explicitly discusses issues like this, but
> concentrates more on the narrow topic of what a "byte" or a "char" is.
> Perhaps the purpose of this man page would be more obvious if other
> data types (like `short') were also listed in the SYNOPSIS, and
> further discussed in the DESCRIPTION. Perhaps the man page should
> also delve into Linux's integer types.
>
> What do you think?

I've expanded the SYNOPSIS to include information that is perhaps more
widely useful, though I haven't yet expanded the DESCRIPTION to follow
suit; here is its ASCII rendering (without bolding or italicizing, and
without a couple of fancy Unicode characters for the arrow symbols):

BYTE(7)                    Linux Programmer's Manual                     BYTE(7)

NAME
       byte - exactly 8 bits; the smallest addressable unit in the kernel
       char - at least 8 bits; the smallest addressable unit in C

SYNOPSIS
       Linux and POSIX (and modern computing)
           byte      <-> exactly 8 bits     This is the definition of "byte"
                                            used throughout this manual.

       POSIX
           char      <-> exactly 1 byte
           short     <-> at least 2 bytes
           int       <-> at least 4 bytes
           long      <-> at least 4 bytes
           long long <-> at least 8 bytes

       Standard C
           char      <-> at least 1 byte    <- Beware!
           short     <-> at least 2 bytes
           int       <-> at least 2 bytes   <- Beware!
           long      <-> at least 4 bytes
           long long <-> at least 8 bytes

       IP16L32 (16-bit x86; "near" pointers)
           char      <-> exactly 1 byte
           short     <-> exactly 2 bytes
           int       <-> exactly 2 bytes
           long      <-> exactly 4 bytes
           a pointer <-> exactly 2 bytes

       I16LP32 (16-bit x86; "far" pointers)
           char      <-> exactly 1 byte
           short     <-> exactly 2 bytes
           int       <-> exactly 2 bytes
           long      <-> exactly 4 bytes
           a pointer <-> exactly 4 bytes

       ILP32 (32-bit x86)
           char      <-> exactly 1 byte
           short     <-> exactly 2 bytes
           int       <-> exactly 4 bytes
           long      <-> exactly 4 bytes
           long long <-> exactly 8 bytes
           a pointer <-> exactly 4 bytes

       IL32P64 or LLP64 (x86-64 Windows)
           char      <-> exactly 1 byte
           short     <-> exactly 2 bytes
           int       <-> exactly 4 bytes
           long      <-> exactly 4 bytes
           long long <-> exactly 8 bytes
           a pointer <-> exactly 8 bytes

       I32LP64 or LP64 (x86-64 Linux and other Unix-like systems)
           char      <-> exactly 1 byte
           short     <-> exactly 2 bytes
           int       <-> exactly 4 bytes
           long      <-> exactly 8 bytes
           long long <-> exactly 8 bytes
           a pointer <-> exactly 8 bytes

       ILP64 (SPARC64)
           char      <-> exactly 1 byte
           short     <-> exactly 2 bytes
           int       <-> exactly 8 bytes
           long      <-> exactly 8 bytes
           long long <-> exactly 8 bytes
           a pointer <-> exactly 8 bytes

       SILP64 (Cray)
           char      <-> exactly 1 byte
           short     <-> exactly 8 bytes
           int       <-> exactly 8 bytes
           long      <-> exactly 8 bytes
           long long <-> exactly 8 bytes
           a pointer <-> exactly 8 bytes

       ...

DESCRIPTION
       [...]

Sincerely,
Michael Witten