
List:       python-distutils-sig
Subject:    Re: [Distutils] Update to my skeletal PEP for a new build system interface
From:       Paul Moore <p.f.moore () gmail ! com>
Date:       2015-11-09 18:42:34
Message-ID: CACac1F-yTG_8YNDPc3Qi+gxj4JDdbHEa1CfOv0SewccZG=uqFQ () mail ! gmail ! com

On 9 November 2015 at 17:21, Nathaniel Smith <njs@pobox.com> wrote:
> On Mon, Nov 9, 2015 at 7:34 AM, Paul Moore <p.f.moore@gmail.com> wrote:
>> On 9 November 2015 at 05:20, Nathaniel Smith <njs@pobox.com> wrote:
>>> A *source tree* is something like a VCS checkout. We need a standard
>>> interface for installing from this format, to support usages like
>>> ``pip install some-directory/``.
>>
>> I still find these two definitions unhelpful, sorry.
>>
>> We don't *need* an interface to install from a source tree. It's
>> entirely feasible to have a standard interface to build a sdist from a
>> source tree and go source tree -> sdist -> wheel -> install. That
>> doesn't cater for editable installs, nor does it cater for reusing
>> things like object files from previous builds, so there may be
>> *benefits* to having a richer interface than this, but it's wrong to
>> say it's needed.
>
> I am confuse. All that sentence is saying is that (a) it is useful to
> have the phrase "source tree" as distinct from "sdist" so we can talk
> about them (which I assume you agree about because you use that phrase
> in your response :-)),

Agreed.

> and (b) there must be *some* interface that
> allows people to type "pip install some-directory/" and have it work
> because that's a feature we have to support (which I assume you agree
> about because you immediately propose an interface for supporting that
> feature).

Are we talking at cross purposes here? The end user interface "pip
install directory" is OK. What I think this PEP is saying is that we
need a way for pip to *implement* that functionality in terms of
primitive operations that the "source tree" must support. That, again,
I'm fine with. But you're then saying (I think) that the primitive
operation a source tree must provide is an "install" operation - and
that's what I fundamentally disagree with. The source tree should
provide a "build" primitive. If we agree on that (which I think we do,
but I don't think the PEP says so), then there's still a further
point, on which I think we do disagree, and that's over sdists.

I think that there are *two* steps within the build process, and these
need to be separated out:

1. Make a structured archive of the project's sources. This includes
creation of all generated source files that can be created in a
target-independent way. This would include (static) metadata,
generated source files such as cython output, etc. The point about
this archive is that it is fully target-independent, and building from
it requires no tools other than fundamentally target-dependent ones
(such as a compiler). This is what I consider to be the "sdist". There
should only ever need to be one sdist for a given name/version of a
project, precisely because it's totally portable, by design.

2. Create target-dependent installable wheels. This is the "build"
step, in the sense that it's when you run a compiler to create
platform-specific binaries.

With this model, the install process is specifically:

source tree ---> sdist ---> wheel ---> installed package

It is possible that tools could merge some of these steps, but a
generic tool like pip that manages the running of the steps in an
appropriate order needs to work in terms of the fundamental building
blocks. So I am strongly opposed to proposals that treat source tree
---> wheel as a primitive operation, because they hamper pip's ability
to manage things at the level of the fundamental steps.
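
As a rough illustration of the model above, here is a minimal sketch
of the three fundamental steps as separate functions that a managing
tool composes. Every name in it (SourceTree, Sdist, Wheel, make_sdist,
build_wheel, install_wheel, install_from_tree) is hypothetical and
made up for this example; none of them is pip's or any PEP's API, and
nothing is actually built or installed.

```python
from dataclasses import dataclass


# Hypothetical types for illustration only; not part of pip or any PEP.
@dataclass(frozen=True)
class SourceTree:
    name: str
    version: str


@dataclass(frozen=True)
class Sdist:
    # Target-independent: one sdist per name/version.
    name: str
    version: str


@dataclass(frozen=True)
class Wheel:
    # Target-dependent: the platform is part of a wheel's identity.
    name: str
    version: str
    platform: str


def make_sdist(tree: SourceTree) -> Sdist:
    """Step 1: source tree -> sdist (no target information needed)."""
    return Sdist(tree.name, tree.version)


def build_wheel(sdist: Sdist, platform: str) -> Wheel:
    """Step 2: sdist -> wheel (this is where a compiler would run)."""
    return Wheel(sdist.name, sdist.version, platform)


def install_wheel(wheel: Wheel, installed: dict) -> None:
    """Step 3: wheel -> installed package.

    The dict is a stand-in for a real installation location.
    """
    installed[wheel.name] = wheel.version


def install_from_tree(tree: SourceTree, platform: str, installed: dict) -> Wheel:
    """What a manager of steps does: run the atomic steps in order."""
    sdist = make_sdist(tree)
    wheel = build_wheel(sdist, platform)
    install_wheel(wheel, installed)
    return wheel
```

The point of the sketch is only that install_from_tree is written in
terms of the three smaller steps, so the managing tool can stop, cache
or substitute at any of the intermediate artefacts.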

One of the worst aspects of distutils, and one that pip is still far
from free of, is the fact that distutils provides merged steps like
source tree ---> installed package, and we (mistakenly, in hindsight)
used them to "optimise" the way pip works. It did optimise things in
some ways, I guess, but it makes it really hard to disentangle things
when we want to modularise processing.

The above is of course idealised. Editable installs are one example of
something that simply doesn't follow this pattern, and as far as I can
see they make no sense *except* as a source tree --> editable install
one-step operation. Also, modularising the steps to this extent does
have downsides - separating source tree --> sdist and sdist --> wheel
makes it harder to do "in place rebuild" optimisations. We can agree
or disagree on the trade-offs, or we can work on trying to get the
best of both worlds, but I still think we should be starting
(certainly when working at the spec/PEP level) from a clean conceptual
model.

> It sounds like we do disagree about the details of what this interface
> should look like and thus how "pip install some-directory/" should
> work internally, but that's not a problem with the definition (or
> indeed something that this PEP's text currently takes any stance on at
> all :-)).

As I say, I think we're talking at cross purposes. I read the PEP as
trying to specify (the wrong) primitives for pip to use. I'm not sure
what you intend the PEP to say - maybe that "pip install <directory>"
is the canonical install command? I don't think that needs a PEP, it's
just how pip works (and other tools may choose to expose things in a
different manner).

>> I suspect you're reluctant to require a "source tree -> sdist"
>> interface, because the author of flit isn't comfortable with having
>> such a thing. That's OK - if you want to note that a benefit of going
>> direct to install (or wheel) is that tools that don't allow you to
>> create a sdist are supported, then let's make that explicit. Expect
>> plenty of pushback on the idea of tools that don't supply sdists
>> though...
>
> I actually haven't talked to Thomas about this particular point at
> all, and actually part of what started all this was my looking at flit
> and going "this is cool, but c'mon, you can't just throw away sdists"
> :-).
>
> The reason I'm reluctant to require a "source tree -> sdist" interface
> is described here:
>     https://mail.python.org/pipermail/distutils-sig/2015-November/027636.html
>
> and also at the very top of this long email (which for some reason I
> can't seem to find in the mail.python.org archives?):
>     https://www.mail-archive.com/distutils-sig@python.org/msg23144.html
>
> The TL;DR is: obviously we need source tree -> sdist operations
> somewhere, and obviously we need mechanisms to increase the
> reliability of builds -- we all agree that there's some irreducible
> complexity there, those issues need to be addressed, the question is
> just where to put that complexity. I think putting it into the PEP for
> the build frontend <-> build backend interface is the wrong place,
> because it increases spec complexity (the worst kind of complexity)
> and it rules out the useful feature of incremental rebuilds. (And by
> "useful feature" there I mean "if we regress from distutils by failing
> to support this, then there's a good chance downstream devs will
> simply refuse to use our new design".)

But here I think we have a new term that's adding confusion. Pip isn't
a "build frontend". In 99% of cases pip does no building at all.

Basically, pip is a manager of build and install steps, and to manage
those steps successfully, it needs clear definitions of the steps
involved. In the extreme case, if there's a step "take a source tree
and install it" you've left nothing for pip to manage, and you may as
well go back to setup.py install.

I think that extracting and formalising the fundamental ("atomic" if
you like) steps that constitute going from a source tree to an
installed package, is precisely the sort of simplification a spec/PEP
*must* do. In doing so, there are engineering trade-offs such as how
we reintroduce incremental rebuilds without compromising the model.
Such trade-offs may imply a need to add complexity to the spec (maybe
in terms of optional "combined" steps such as source tree --> wheel),
but it should be clear that these are (a) optional (as in, the process
works fine with just the atomic steps) and (b) optimisations (as in,
they can't alter the ultimate behaviour as defined in terms of atomic
steps).
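
To make points (a) and (b) concrete, here is a small sketch of how a
managing tool might treat a combined source tree --> wheel step as an
optional optimisation. The class and method names (AtomicBackend,
IncrementalBackend, build_wheel_from_tree, tree_to_wheel) are
assumptions invented for this example, not an existing or proposed
interface, and the "artefacts" are just tuples.

```python
class AtomicBackend:
    """Hypothetical backend offering only the atomic steps."""

    def make_sdist(self, tree):
        return ("sdist", tree)

    def build_wheel(self, sdist):
        return ("wheel", sdist[1])


class IncrementalBackend(AtomicBackend):
    """Hypothetical backend that also offers a combined step."""

    def build_wheel_from_tree(self, tree):
        # Optimisation only: the result must match what
        # build_wheel(make_sdist(tree)) would have produced.
        return ("wheel", tree)


def tree_to_wheel(backend, tree):
    """Use the combined step if present, else compose the atomic steps."""
    combined = getattr(backend, "build_wheel_from_tree", None)
    if combined is not None:
        return combined(tree)
    return backend.build_wheel(backend.make_sdist(tree))
```

With this shape, a backend that never implements the combined step
still works, and one that does implement it is only allowed to be
faster, not to produce a different result.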

>> Certainly your definition of a sdist is general enough that it doesn't
>> preclude such things. But on the other hand, it doesn't offer any
>> suggestion that this is an important feature of a sdist (and it is - I
>> say that as someone who has needed to build wheels from a sdist and
>> doesn't have Cython installed). From your definition, people will
>> infer that zipping up a development directory makes a sdist, and so
>> that's what they'll do. Because after all, making Cython a build
>> requirement and generating the C at build time is *also* an option,
>> it's just not as friendly to the average user.
>
> Hmm, I certainly agree that it doesn't preclude such things, because I
> am very aware of this use case (I maintain projects that handle Cython
> in exactly the way you describe), and it never occurred to me that
> this could *not* be supported :-). I'm not sure what you're worried
> about exactly? Right now, zipping up a development directory actually
> is a valid way of making an sdist, and nonetheless projects actually
> do go to elaborate lengths to trick distutils into including generated
> .c files. So I don't think it's likely they'll stop because of some
> PEP that neglected to explicitly point out that this was possible :-).
> But if you think the wording could be improved I'm certainly open to
> that.

I think that we currently have so much confusion over "what a sdist
is" that a new over-general definition isn't going to help. What we
need to do is to *pin down* the definition of a sdist, not allow the
term to continue to mean too much (and hence, ultimately, very
little).

Does my definition of a sdist above in terms of being
target-independent but containing all files that can be generated in a
target-independent way clarify what I'm intending? I'd be happy if
there was wording that left it as optional how much a project needed
to eliminate build dependencies by including the output of those
dependencies in the sdist, but I'd much prefer it if there was a
strong implication that if files could be generated without reference
to the target architecture, and doing so eliminated a build
dependency, then they should be included. (To give a specific example,
I'd prefer it if it was clear that sdists should always include C
sources generated by cython - even though that requirement isn't
enforceable in any practical sense.)
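
As an illustration of the cython example, a tool could at least report
when an sdist ships .pyx files without the corresponding generated C
sources. This is only a sketch: the function name missing_generated_c
is hypothetical, and it assumes the generated file sits next to the
.pyx file with a .c suffix, which will not hold for every project (for
instance, ones generating C++).

```python
from pathlib import PurePosixPath


def missing_generated_c(sdist_files):
    """Return the .pyx files in an sdist listing that lack a sibling .c file.

    `sdist_files` is an iterable of archive member names using "/" as
    the separator. Assumes foo.pyx is translated to foo.c alongside it.
    """
    names = set(sdist_files)
    missing = []
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.suffix == ".pyx" and str(path.with_suffix(".c")) not in names:
            missing.append(name)
    return missing
```

A check like this can only warn; as noted above, the requirement
itself isn't enforceable in any practical sense.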

> (I guess I do have some generic preference that we not insist on PEPs
> serving as end-user documentation -- the intended audience here is
> experts, the definitions are written to mean exactly what they say,
> etc., and there are real trade-offs between being precise and being
> easily comprehensible by non-experts. But I also would like you to be
> happy :-).)

Agreed we don't intend these things to be for end users. But I think
it's important that the experts have something detailed and precise,
as ultimately they'll have to implement code based on the PEP. And
worse still, anyone wanting to implement an alternative to pip has a
right to expect that everything they need is in a PEP, not in
"people's understanding".

I don't know if it's clear (I hope it is but it's hard to be sure :-))
but my comments are from the perspective of someone who knows the
internals of pip, but would like to be able to (re-) write it without
ever having to refer to pip's code in order to do so. I think that's a
reasonable goal to aim for, as not being able to do that is precisely
what got us into the mess where we daren't touch distutils because we
don't know what it's supposed to do other than "what it does"...

Thanks for considering my happiness :-) It's not too easy to make me
miserable, so don't worry - the big issue is that I enjoy long complex
detail-oriented debates, so you're better off not trying *too* hard to
increase my happiness in that direction!!! :-)

Paul
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig