'Re: [llvm-dev] [cfe-dev] RFC: Add an llvm::vfs::OutputManager to allow Clang to virtualize compiler '

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       llvm-dev
Subject:    Re: [llvm-dev] [cfe-dev] RFC: Add an llvm::vfs::OutputManager to allow Clang to virtualize compiler 
From:       "Duncan P. N. Exon Smith via llvm-dev" <llvm-dev () lists ! llvm ! org>
Date:       2021-04-28 22:01:50
Message-ID: C10079A1-B296-4C04-A355-DEADD03A8ABD () apple ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]

Hi Sam:

> On 2021 Mar  15, at 14:18, Duncan P. N. Exon Smith via cfe-dev \
> <cfe-dev@lists.llvm.org> wrote: 
> Thanks for the replies Sam! I've just back after being out for a bit; I'll try to \
> give a substantive reply later this week. 

(Took longer than I'd intended / expected for me to circle back, but here I am.)

> > I may have missed it (apologies if so), but is there somewhere a breakdown of \
> > which performance-critical configurations this is aimed at? (e.g. many small \
> > outputs that don't benefit from streaming, mirrored to two output streams, but \
> > one is null) Performance is certainly going to cut against simplicity sometimes, \
> > but it's hard to evaluate without knowing what we're optimizing for.

The main use case for this optimization — and the trigger for most of the \
                complexity in this proposal — is to support the module cache, both \
                by
- using output backend as a new implementation strategy for InMemoryModuleCache/etc. \
                and
- (optionally?) capturing just-written implicit modules in a client-defined output \
backend. The specific optimization I wanted to enable was for the "normal" case where \
the client is "normal" compilation from `/usr/bin/clang`, and we're writing to disk. \
                In that case, it seems optimal to:
- use something akin to llvm::FileOutputBuffer to efficiently write to disk (using \
                mmap) and
- save that just-written file-backed memory buffer to FileManager / \
InMemoryModuleCache / vfs::InMemoryFileSystem / somewhere.

Having had some more time to reflect (and getting some experience building on top of \
this patch for some tangentially-related experiments), I'm leaning strongly toward \
*not* trying to solve the in-memory module cache as an inherent part of \
OutputBackend, and virtualizing the module cache separately from (or on top of) this \
abstraction.

At a high-level, here's what I'm planning to do:

1. Minimal OutputBackend with a slimmed down OutputFile class, and two \
                implementations: OnDiskOutputBackend and NullOutputBackend.
- We can use this to discuss tradeoffs around OutputConfig (enum+bitset wrapper for \
                named-based parameters vs. struct for full flexibility).
- Doesn't really need output directories: clients for now can be expected to provide \
absolute paths, or call `sys::fs::change_directory`.

2. OutputDirectory / CWD / etc.

3. InMemoryOutputBackend, which saves files to a vfs::InMemoryFileSystem
- Maybe can't quite land before OutputDirectory is worked out, but most of the review \
can happen in parallel.

4. Temp files and temp directories
- Adding shared infrastructure that InMemoryOutputBackend and other virtual backends \
                can use.
- Enables threading more outputs through the outputbackend

5. Look more concretely at how best to virtualize the module cache using this.
- Or maybe this thinking will / should be front-loaded...

WDYT?

Thanks,
Duncan

> > > Only other thing is the scope of the patch - it's useful that as an RFC it's \
> > > comprehensive and shows things fitting together.
> 
> Absolutely; agreed it needs to be split up for review.
> 
> > On 2021 Feb  22, at 06:11, Sam McCall <sammccall@google.com \
> > <mailto:sammccall@google.com>> wrote: 
> > Sorry about the long delay!
> > 
> > This looks much better to me, thank you for the simplifications!
> > I've made a bunch of notes on the patch itself which I'll send now too. \
> > High-level things inline: TL;DR: stream/buffer confusion in outputfile, scope. 
> > On Wed, Feb 10, 2021 at 3:12 AM Duncan P. N. Exon Smith <dexonsmith@apple.com \
> > <mailto:dexonsmith@apple.com>> wrote: Sam, any thoughts here?
> > 
> > > On 2021 Feb  2, at 19:36, Duncan P. N. Exon Smith via llvm-dev \
> > > <llvm-dev@lists.llvm.org <mailto:llvm-dev@lists.llvm.org>> wrote: 
> > > Update: I've incorporated much of Sam's feedback into the main patch \
> > > (https://reviews.llvm.org/D95501 <https://reviews.llvm.org/D95501>). 
> > > - Simplify OutputConfig, restricting it to semantic information about a \
> > > specific output. Sink all backend configuration to flags on the backends \
> > > themselves.
> > 
> > The remaining options look great, and agree that we can't expect everything to be \
> > configured globally.
> > > - Remove OutputManager, instead exposing OutputBackend directly.
> > 
> > Hooray! 
> > > - Merge Output and OutputDestination into a single class called OutputFile, and \
> > >                 rename the API for creating them to \
> > >                 OutputBackend::createFile().
> > > - Add support for working directories via OutputDirectory, and add \
> > >                 OutputBackend::getDirectory() and \
> > >                 OutputBackend::createDirectory().
> > > - Add support for createUniqueFile() and createUniqueDirectory(), both heavily \
> > > used in clang. Backends without read access can use StableUniqueEntityAdaptor \
> > > to implement these.
> > 
> > This is definitely needed, but there are a few design options and I can't say I \
> > precisely understand either the requirements or the solution in the patch. I \
> > think it would be useful to review this separately. 
> > > 
> > > The main thing not settled is the threading guarantees. Restating the straw man \
> > > I proposed: 
> > > - All OutputBackend APIs are safe to call concurrently. Since OutputDirectory \
> > >                 *is* an OutputBackend, it can be used concurrently as well.
> > > - An OutputFile cannot be used concurrently, but two files from the same \
> > > backend can be.
> > 
> > LGTM 
> > > 
> > > Interested in everyone's thoughts on that; if that sounds reasonable I can \
> > > update the patch to make it so. 
> > > Two other points:
> > > 
> > > - Sam proposed dropping OutputConfig. I don't think we can, as the API client \
> > > needs to communicate semantic information about specific outputs, not about the \
> > >                 backends generally.
> > > - Sam suggested OutputDestination (now OutputFile) seemed bloated. I talked \
> > > through some of the details in my previous response; most of the complexity is \
> > > there to make MirroringOutputBackend work efficiently and avoid duplication in \
> > > subclasses. As a counterpoint, the only API a concrete subclass needs to \
> > > override is storeContentBuffer().
> > 
> > Hmm, OutputFile still feels caught between two concepts: a buffer-based one and a \
> > stream based one. I understand the goal to make this easy to implement and easy \
> > to use, but having a lot of conceptual distance between the two, and optional \
> > methods/implementation strategies is easy to misunderstand. 
> > I may have missed it (apologies if so), but is there somewhere a breakdown of \
> > which performance-critical configurations this is aimed at? (e.g. many small \
> > outputs that don't benefit from streaming, mirrored to two output streams, but \
> > one is null) Performance is certainly going to cut against simplicity sometimes, \
> > but it's hard to evaluate without knowing what we're optimizing for. 
> > > MirroringOutputBackend work efficiently
> > One thing I don't understand about MirroringOutputBackend - why does it buffer \
> > the content? It seems possible to create a raw_pwrite_stream that broadcasts to \
> > raw_pwrite_streams of the underlying outputs.
> > > 
> > > Sam, let please take another look and let me know if you have more high-level \
> > > comments.
> > 
> > Only other thing is the scope of the patch - it's useful that as an RFC it's \
> > comprehensive and shows things fitting together. But the amount of detail is a \
> > bit overwhelming :-) I think the parts regarding directories, unique files, \
> > mirroring/filtering backends, many of the output options would be easier to \
> > review in detail in isolated patches. 
> > > Others, if you have feedback, please let me know!
> > > 
> > > Thanks!
> > > Duncan
> > > 
> > > > On 2021 Jan  28, at 11:24, Duncan P. N. Exon Smith via llvm-dev \
> > > > <llvm-dev@lists.llvm.org <mailto:llvm-dev@lists.llvm.org>> wrote: 
> > > > Thanks for the detailed response Sam!
> > > > 
> > > > > This roughly corresponds to OutputBackend + OutputDestination in the patch, \
> > > > >                 except:
> > > > > - the OutputConfig seems like it belongs to particular backends, not the \
> > > > > overall backend abstraction
> > > > 
> > > > Ideally the OnDiskOutputConfig would almost entirely be settings on \
> > > > OnDiskOutputBackend since no one else will care. It isn't that way in the \
> > > > initial patch because Clang decides most of this stuff on a per-output basis. \
> > > > Maybe there's a refactoring that could fix it. 
> > > > Here are the problems I hit that led to this design:
> > > > 
> > > > 1. Some properties need to be associated with specific outputs, not backends, \
> > > >                 because they relate to properties of the outputs themselves. \
> > > >                 Here are two:
> > > > - ClientIntentOutputConfig::NeedsSeeking
> > > > - OnDiskOutputConfig::OpenFlagText
> > > > 
> > > > Maybe also something like:
> > > > - OnDiskOutputConfig::CreateDirectories
> > > > (and maybe others)
> > > > 
> > > > I couldn't think of a good way to solve this besides passing in configuration \
> > > > at output creation time. 
> > > > 2. Some "configurers" may be handed a pre-constructed opaque OutputManager / \
> > > > OutputBackend and need to configure internal OnDiskOutputBackend. In \
> > > >                 particular, I found that some tooling wants to turn off:
> > > > - OnDiskOutputConfig::RemoveFileOnSignal
> > > > (others flags might benefit)
> > > > 
> > > > This is "documented" by the call chains of \
> > > > clang::CompilerInstance::createOutputFile. 
> > > > I reused the solution from #1 since it needed (?) to exist anyway. Another \
> > > > option would be to add an OutputBackend visitor, to find and reconfigure any \
> > > > "contained" OnDiskOutputBackend. This seems pretty heavyweight though. 
> > > > > - OutputDestination has a lot of stuff in it, I'll need to dig further into \
> > > > > the patch to try to understand why
> > > > 
> > > > Firstly, `Output` has the user-facing abstraction. `OutputDestination` has \
> > > > low-level bits. Another reasonable name might be `OutputImpl` but \
> > > > `OutputDescription` seemed more descriptive. 
> > > > Secondly, most of the low-level bits avoid unnecessary copies of content \
> > > > buffers. Maybe there are simpler ideas for how to do this, but here are the \
> > > >                 design goals I was working around:
> > > > - Avoid buffering content unless necessary.
> > > > - Avoid duplicating content buffers unless necessary.
> > > > - Support multiple destinations (for mirrored backends) without breaking the \
> > > > above rules. The obvious "interesting" case is in-memory + on-disk (in either \
> > > >                 order).
> > > > - Make sub-classes of `OutputDestination` as small as possible given the \
> > > > above. 
> > > > Thirdly, there's some boilerplate related to lifetimes of multiple \
> > > > destinations. Probably having an explicit `MirroringOutputDestination` would \
> > > > be better. 
> > > > > As for OutputManager itself, I think this belongs in clang, if it's needed. \
> > > > > Its main job seems to be to store a set of default options and manage the \
> > > > > lifecycle of backends, and it's not obvious that those sorts of concerns \
> > > > > will generalize across tools or that there's much to be gained from sharing \
> > > > > code here.
> > > > 
> > > > In the end, its main job is to wrap an OutputDestination (low-level \
> > > > abstraction) in an Output (high-level abstraction). Output does a bunch of \
> > > > work for OutputDestination, such as manage the intermediate content buffer. 
> > > > Probably it's better to have the OutputBackend return an Output directly (and \
> > > > get rid of the OutputManager). 
> > > > (At one point OutputManager was managing multiple backends directly and was \
> > > > involved in the store operation(s); but since I factored that logic out to \
> > > > MirroringOutputBackend and OutputDestination it probably doesn't need to \
> > > > exist.) 
> > > > > - Any other major concerns / suggestions?
> > > > > Thread-safety of the core plug-in interface is something that would be nice \
> > > > > to explicitly address, as this has been a pain-point with vfs::FileSystem. \
> > > > > It's tempting to say "not threadsafe, you should lock", but this results in \
> > > > > throwing an unnecessary global lock around all FS accesses in multithreaded \
> > > > > programs, in the common case that the real FS is being used.
> > > > 
> > > > Right, I hit multi-threading limitations myself when prototyping a follow-up \
> > > > patch (didn't get around to posting it until this AM): \
> > > > https://reviews.llvm.org/D95628 <https://reviews.llvm.org/D95628> 
> > > > My intuition is:
> > > > 
> > > > Thread-safe:
> > > > - OutputBackend::createDestination
> > > > - Concurrently touching more than one Output/OutputDestination
> > > > 
> > > > Thread-unsafe:
> > > > - Using a single Output/OutputDestination concurrently
> > > > 
> > > > This all seems cheap and easy to maintain because of the limited interface. \
> > > > The problem I hit in the above patch is that for the InMemoryOutputBackend \
> > > > you also need any readers of the InMemoryFileSystem to be thread-safe. 
> > > > Relatedly, https://reviews.llvm.org/D95583 <https://reviews.llvm.org/D95583> \
> > > > (a prep patch for the above) allows the llvm::LockManager to be skipped. This \
> > > > is not really about outputs — it's inter-process coordination to avoid \
> > > > doing redundant work in competing Clang command-line invocations (at one \
> > > > point it was needed for correctness, but the main benefit now is to avoid \
> > > > taxing the on-disk filesystem) — but it does touch on the idea of exclusive \
> > > > write access. 
> > > > > Relatedly, working-directory/relative-path handling should be considered.
> > > > 
> > > > Yeah, you're probably right. Any specific thoughts on what to do here? It \
> > > > seems like llvm::vfs::FileSystem gets them pretty wrong for Windows; see \
> > > > (e.g.) the discussion at the end of https://reviews.llvm.org/D70701 \
> > > > <https://reviews.llvm.org/D70701>. 
> > > > Even for POSIX, working directories could complicate concurrency guarantees. \
> > > > A simple solution is to not have an API for changing working directories, and \
> > > > instead model that by creating a proxy backend (ChangeDirectoryOutputBackend) \
> > > > that prepends the (new) working directory to new outputs; since each backend \
> > > > has an immutable view of the CWD concurrent access should be fine. 
> > > > Two other thoughts related to paths:
> > > > 
> > > > 1. I wonder if this abstraction treats the "path" as too central a property \
> > > > of the output. Can this evolve to allow a compiler to build a directory \
> > > > structure bottom-up without having to know the destination a priori? (E.g., \
> > > > an inner layer makes a blob, a middle layer makes a tree out of a few of \
> > > > those, and an outer layer decides where to put the tree.) 
> > > > I think it can. One approach is to use proxy backends:
> > > > - Inner layer writes to '-' / stdout. (Why not pass a pre-constructed \
> > > >                 Output/raw_pwrite_stream? See note below.)
> > > > - Middle layer passes the instances of the inner layer a proxy backend that \
> > > >                 maps stdout to the various blob names. (E.g., `/name1` and \
> > > >                 `/name2`.)
> > > > - Outer layer passes the middle layer a proxy backend that "does the right \
> > > > thing" with the tree. (E.g., writes to `/Users/dexonsmith/name{1,2}`.) 
> > > > => If writing on-disk, "the right thing" is to prepend the correct output \
> > > > directory for the tree and pass each file to a regular OnDiskOutputBackend. \
> > > > => If writing to (e.g.) Git's CAS, "the right thing" is to call \
> > > > git-hash-object on Output::close and track the name-to-hash mapping as \
> > > > outputs come in, and then call "git-mktree" when the middle layer is "done" \
> > > > (maybe a callback in the backend-passed-to-the-middle-layer's destructor). 
> > > > (IOW, a refactoring where instead of passing absolute paths / directories / \
> > > > filenames down through all the layers, proxy output backends build up the \
> > > > path/destination piece-by-piece.) 
> > > > I think it's doable with the abstraction as-proposed. But let me know if \
> > > >                 anyone has concerns. For example:
> > > > - Is `Output::getPath()` an abstraction leak?
> > > > - Should we have a `createOutput` that doesn't take a path?
> > > > - ...
> > > > 
> > > > Why not pass a pre-constructed Output/raw_pwrite_stream to the inner layer?
> > > > - The inner layer needs an output backend if it (sometimes) dumps "side" \
> > > > files (such as AST record layouts into ".ast-dump" or textual IR into ".ll"). \
> > > > This avoids needing to know the on-disk file path ("/path/to/output" => \
> > > > "/path/to/output.ll"), or to even know whether there's a disk. 
> > > > 2. How should we virtualize stdout / stderr?
> > > > - "'-' means stdout" is probably good enough since LLVM makes that assumption \
> > > >                 all over. Unless someone disagrees?
> > > > - I'm not sure what to do with stderr. No one ever "closes" this stream.
> > > > - Are there other outputs that don't have path names?
> > > > 
> > > > 3. Do we need to virtualize llvm::sys::fs::create_directories?
> > > > - If so, why?
> > > > 
> > > > > (And a question/concern about the relationship between input and output \
> > > > > virtualization, elaborated at the bottom)
> > > > 
> > > > > Why doesn't this inherit from llvm::vfs::FileSystem?
> > > > > Separating the input and output abstractions seems a bit cleaner. It's easy \
> > > > > enough to join them, when useful: e.g., write to an `InMemoryFileSystem` \
> > > > > (via an `InMemoryOutputBackend`) and install this same FS in the \
> > > > > `FileManager` (maybe indirectly via an `OverlayFileSystem`). I agree with \
> > > > > separating the interfaces. In hosted environments your input VFS is often \
> > > > > read-only and outputs go somewhere else.  
> > > > > But I wonder, is there an implicit assumption that data written to \
> > > > > OutputManager is readable via the (purportedly independent) \
> > > > > vfs::FileSystem? This seems like a helpful assumption for module caching, \
> > > > > but is extremely constraining and eliminates many of the simplest and most \
> > > > > interesting possibilities. 
> > > > > If you're going to require the FileSystem and OutputBackend to be linked, \
> > > > > then I think they *should* be the same object.
> > > > 
> > > > No, I don't think that should be a requirement / expectation. It's a specific \
> > > > requirement for Clang's implicitly built modules, and I think Clang should be \
> > > > responsible for hooking them together when necessary. 
> > > > > But if it's mostly module caching that requires that, then it seems simpler \
> > > > > and less invasive to virtualize module caching directly (put(module, key) + \
> > > > > get(key)) rather than file access.
> > > > 
> > > > Agreed, explicitly virtualizing module caching might be a good thing to do. \
> > > > Either way this is Clang's job to coordinate; I just think the output manager \
> > > > should efficiently support mirroring outputs to an additional/custom backend \
> > > > that Clang installs. 
> > > > Note: implicit modules doesn't currently rely on reading the modules it has \
> > > > just built from disk. It uses InMemoryModuleCache to avoid having to read \
> > > > anything it has written to disk and to ensure consistency between \
> > > > CompilerInstances across an implicit build. It's pretty awkward though. 
> > > > > On 2021 Jan  28, at 03:19, Sam McCall <sammccall@google.com \
> > > > > <mailto:sammccall@google.com>> wrote: 
> > > > > Really glad to see this work, virtualizing module cache is something we've \
> > > > > wanted to experiment with for tooling, but never got to. I want to get into \
> > > > > the patches in more detail, but some high-level thoughts... 
> > > > > On Wed, Jan 27, 2021 at 6:23 AM Duncan P. N. Exon Smith via cfe-dev \
> > > > > <cfe-dev@lists.llvm.org <mailto:cfe-dev@lists.llvm.org>> wrote: TL;DR: \
> > > > > Let's virtualize compiler outputs in Clang. These patches would get us \
> > > > >                 started:
> > > > > - https://reviews.llvm.org/D95501 <https://reviews.llvm.org/D95501> Add \
> > > > >                 llvm::vfs::OutputManager
> > > > > - https://reviews.llvm.org/D95502 <https://reviews.llvm.org/D95502> Initial \
> > > > > adoption of llvm::vfs::OutputManager in Clang. 
> > > > > Questions for the reader
> > > > > - Should we virtualize compiler outputs in Clang? (Hint: yes.)
> > > > > Definitely agree.
> > > > > 
> > > > > 	- Does this support belong in LLVM? (I think it does, so that non-Clang \
> > > > > tools can easily reuse it.) Ideally, the core abstraction (path -> \
> > > > > pwrite_stream) certainly belongs in LLVM, as well as the most common \
> > > > > implementations of it. Based on experience with VirtualFileSystem, I'd like \
> > > > > this interface to be as narrow as possible to make it feasible to \
> > > > > implement/wrap correctly, and to reason about how the wider system \
> > > > > interacts with it. 
> > > > > This roughly corresponds to OutputBackend + OutputDestination in the patch, \
> > > > >                 except:
> > > > > - the OutputConfig seems like it belongs to particular backends, not the \
> > > > >                 overall backend abstraction
> > > > > - OutputDestination has a lot of stuff in it, I'll need to dig further into \
> > > > > the patch to try to understand why 
> > > > > As for OutputManager itself, I think this belongs in clang, if it's needed. \
> > > > > Its main job seems to be to store a set of default options and manage the \
> > > > > lifecycle of backends, and it's not obvious that those sorts of concerns \
> > > > > will generalize across tools or that there's much to be gained from sharing \
> > > > > code here. 
> > > > > 	- Is `llvm::vfs::` a reasonable namespace? (If not, suggestions? I think \
> > > > > `llvm::` itself is too broad.) llvm::vfs:: is definitely the right \
> > > > > namespace for the core writing stuff IMO. If more ancillary bits (parts \
> > > > > some but not all tools might use) need to go in llvm, llvm:: seems to be \
> > > > > the best namespace we have (like e.g. SourceMgr) but maybe we should add a \
> > > > > new one. But as mentioned, I'd prefer those to live in clang:: at least for \
> > > > > now. 
> > > > > - Do you have a use case that this won't address well?
> > > > > 	- Should that be fixed in the initial patch, or could this be evolved \
> > > > >                 in-tree to address that?
> > > > > - Any other major concerns / suggestions?
> > > > > Thread-safety of the core plug-in interface is something that would be nice \
> > > > > to explicitly address, as this has been a pain-point with vfs::FileSystem. \
> > > > > It's tempting to say "not threadsafe, you should lock", but this results in \
> > > > > throwing an unnecessary global lock around all FS accesses in multithreaded \
> > > > > programs, in the common case that the real FS is being used. 
> > > > > Relatedly, working-directory/relative-path handling should be considered.
> > > > > 
> > > > > (And a question/concern about the relationship between input and output \
> > > > > virtualization, elaborated at the bottom) 
> > > > > - If you think the above patches should be split up for initial review / \
> > > > > commit, how? Obviously my favorite would be to see a minimal core writable \
> > > > > VFS interface extracted and land that first. What's built on top of it is \
> > > > > less critical, and I'm not concerned about it landing in larger chunks. 
> > > > > 
> > > > > (Other feedback welcome too!)
> > > > > 
> > > > > Longer version
> > > > > There are a number of use cases for capturing compiler outputs, which I'm \
> > > > > hoping this proposal is a step toward addressing. 
> > > > > - Tooling wants to capture outputs directly, without going through the \
> > > > >                 filesystem.
> > > > > 	- Sometimes, tertiary outputs can be ignored, or still need to be written \
> > > > >                 to disk.
> > > > > - clang-scan-deps is using a form of stripped down "implicit" modules to \
> > > > > determine which modules need to be built explicitly. It doesn't really want \
> > > > >                 to be using the on-disk module cache—in-memory would be \
> > > > >                 better.
> > > > > - clang's ModuleManager manually maintains an in-memory modules cache for \
> > > > > implicit modules. This involves copying the PCM outputs into memory. It'd \
> > > > > be better for these modules to be file-backed, instead of copies of the \
> > > > > stream. 
> > > > > The patch has a bunch of details written / tested \
> > > > > (https://reviews.llvm.org/D95501 <https://reviews.llvm.org/D95501>). Here \
> > > > > are the high-level structures in the design: 
> > > > > - OutputManager—a shared manager for creating outputs without knowing \
> > > > >                 about the storage.
> > > > > - OutputConfig—configuration set on the OutputManager that can be \
> > > > >                 (partially) overridden for specific outputs.
> > > > > - Output—opaque object with a raw_pwrite_stream, output path, and \
> > > > > `erase`/`close` functionality. Internally, it has a linked list of output \
> > > > >                 destinations.
> > > > > - OutputBackend—abstract interface installed in an OutputManager to \
> > > > > create the "guts" of an output. While an OutputManager only has one \
> > > > >                 installed, these can be layered / forked / mirrored.
> > > > > - OutputDestination—abstract interface paired with an OutputBackend, \
> > > > >                 whose lifetime is managed by an Output.
> > > > > - ContentBuffer—actual content to allow efficient use of data by multiple \
> > > > > backends. For example, the installed backend is a mirror between an on-disk \
> > > > > and in-memory backend, the in-memory backend will either get the content \
> > > > > moved directly into an llvm::MemoryBuffer, or a just-written mmap'ed file. 
> > > > > The patch includes a few backends:
> > > > > 
> > > > > - NullOutputBackend, for ignoring all backends.
> > > > > - OnDiskOutputBackend, for writing to disk (the default), initially based \
> > > > >                 on the logic in `clang::CompilerInstance`.
> > > > > - InMemoryOutputBackend, for writing to an `InMemoryFileSystem`.
> > > > > - MirroringOutputBackend, for writing to multiple backends. \
> > > > >                 OutputDestination's API is designed around supporting this.
> > > > > - FilteringOutputBackend, for filtering which outputs get written to the \
> > > > > underlying backend. 
> > > > > Why doesn't this inherit from llvm::vfs::FileSystem?
> > > > > Separating the input and output abstractions seems a bit cleaner. It's easy \
> > > > > enough to join them, when useful: e.g., write to an `InMemoryFileSystem` \
> > > > > (via an `InMemoryOutputBackend`) and install this same FS in the \
> > > > > `FileManager` (maybe indirectly via an `OverlayFileSystem`). I agree with \
> > > > > separating the interfaces. In hosted environments your input VFS is often \
> > > > > read-only and outputs go somewhere else.  
> > > > > But I wonder, is there an implicit assumption that data written to \
> > > > > OutputManager is readable via the (purportedly independent) \
> > > > > vfs::FileSystem? This seems like a helpful assumption for module caching, \
> > > > > but is extremely constraining and eliminates many of the simplest and most \
> > > > > interesting possibilities. 
> > > > > If you're going to require the FileSystem and OutputBackend to be linked, \
> > > > > then I think they *should* be the same object. But if it's mostly module \
> > > > > caching that requires that, then it seems simpler and less invasive to \
> > > > > virtualize module caching directly (put(module, key) + get(key)) rather \
> > > > > than file access. 
> > > > > Other work in the area
> > > > > See also: https://reviews.llvm.org/D78058 <https://reviews.llvm.org/D78058> \
> > > > > (thanks to Marc Rasi for posting that patch, and to Sam McCall for some \
> > > > > feedback on an earlier version of this proposal). 
> > > > > Thanks for reading!
> > > > > Duncan
> > > > > _______________________________________________
> > > > > cfe-dev mailing list
> > > > > cfe-dev@lists.llvm.org <mailto:cfe-dev@lists.llvm.org>
> > > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev \
> > > > > <https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > llvm-dev@lists.llvm.org <mailto:llvm-dev@lists.llvm.org>
> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev \
> > > > <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> > > 
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev@lists.llvm.org <mailto:llvm-dev@lists.llvm.org>
> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev \
> > > <https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> _______________________________________________
> cfe-dev mailing list
> cfe-dev@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

[Attachment #5 (unknown)]

<html><head><meta http-equiv="Content-Type" content="text/html; \
charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; \
line-break: after-white-space;" class="">Hi Sam:<br class=""><div><br \
class=""><blockquote type="cite" class=""><div class="">On 2021 Mar &nbsp;15, at \
14:18, Duncan P. N. Exon Smith via cfe-dev &lt;<a \
href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a>&gt; \
wrote:</div><br class="Apple-interchange-newline"><div class=""><meta \
http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div \
style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: \
after-white-space;" class="">Thanks for the replies Sam! I've just back after being \
out for a bit; I'll try to give a substantive reply later this week.<br class=""><div \
class=""><br class=""></div></div></div></blockquote><div><br \
class=""></div><div>(Took longer than I'd intended / expected for me to circle back, \
but here I am.)</div><div><br class=""></div><div><blockquote type="cite" \
class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div \
class="gmail_quote"><div class="">I may have missed it (apologies if so), but is \
there somewhere a breakdown of which performance-critical configurations this is \
aimed at?</div><div class="">(e.g. many small outputs that don't benefit from \
streaming, mirrored to two output streams, but one is null)</div><div \
class="">Performance is certainly going to cut against simplicity sometimes, but it's \
hard to evaluate without knowing what we're optimizing \
for.</div></div></div></blockquote></blockquote><div class=""><br class=""></div><div \
class="">The main use case for this optimization — and the trigger for most of the \
complexity in this proposal — is to support the module cache, both by</div><div \
class="">- using output backend as a new implementation strategy for \
InMemoryModuleCache/etc. and</div><div class="">- (optionally?) capturing \
just-written implicit modules in a client-defined output backend.</div><div \
class="">The specific optimization I wanted to enable was for the "normal" case where \
the client is "normal" compilation from `/usr/bin/clang`, and we're writing to disk. \
In that case, it seems optimal to:</div><div class="">- use something akin to \
llvm::FileOutputBuffer to efficiently write to disk (using mmap) and</div><div \
class="">- save that just-written file-backed memory buffer to FileManager / \
InMemoryModuleCache / vfs::InMemoryFileSystem / somewhere.</div><div class=""><br \
class=""></div><div class="">Having had some more time to reflect (and getting some \
experience building on top of this patch for some tangentially-related experiments), \
I'm leaning strongly toward *not* trying to solve the in-memory module cache as an \
inherent part of OutputBackend, and virtualizing the module cache separately from (or \
on top of) this abstraction.</div><div class=""><br class=""></div><div class="">At a \
high-level, here's what I'm planning to do:</div><div class=""><br \
class=""></div><div class="">1. Minimal OutputBackend with a slimmed down OutputFile \
class, and two implementations: OnDiskOutputBackend and NullOutputBackend.</div><div \
class="">- We can use this to discuss tradeoffs around OutputConfig (enum+bitset \
wrapper for named-based parameters vs. struct for full flexibility).</div><div \
class="">- Doesn't really need output directories: clients for now can be expected to \
provide absolute paths, or call `sys::fs::change_directory`.</div><div class=""><br \
class=""></div><div class="">2. OutputDirectory / CWD / etc.</div><div class=""><br \
class=""></div><div class="">3. InMemoryOutputBackend, which saves files to a \
vfs::InMemoryFileSystem</div><div class="">- Maybe can't quite land before \
OutputDirectory is worked out, but most of the review can happen in \
parallel.</div><div class=""><br class=""></div><div class="">4. Temp files and temp \
directories</div><div class="">- Adding shared infrastructure that \
InMemoryOutputBackend and other virtual backends can use.</div><div class="">- \
Enables threading more outputs through the outputbackend</div></div><div><br \
class=""></div><div>5. Look more concretely at how best to virtualize the module \
cache using this.</div><div>- Or maybe this thinking will / should be \
front-loaded...</div><div><br class=""></div><div>WDYT?</div><div><br \
class=""></div><div>Thanks,</div><div>Duncan</div><br class=""><blockquote \
type="cite" class=""><div class=""><div style="word-wrap: break-word; \
-webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div \
class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><div \
dir="ltr" class=""><div class="gmail_quote"><div class="">Only other thing is the \
scope of the patch - it's useful that as an RFC it's comprehensive and shows things \
fitting together.</div></div></div></blockquote></blockquote><div class=""><br \
class=""></div>Absolutely; agreed it needs to be split up for \
review.</div></div></div></blockquote><blockquote type="cite" class=""><br \
class=""></blockquote><blockquote type="cite" class=""><div class=""><div \
style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: \
after-white-space;" class=""><div class=""><blockquote type="cite" class=""><div \
class="">On 2021 Feb &nbsp;22, at 06:11, Sam McCall &lt;<a \
href="mailto:sammccall@google.com" class="">sammccall@google.com</a>&gt; \
wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" \
class=""><div dir="ltr" class="">Sorry about the long delay!</div><div dir="ltr" \
class=""><div class=""><br class=""></div><div class="">This looks <b \
class="">much</b>&nbsp;better to me, thank you for the simplifications!</div><div \
class="">I've made a bunch of notes on the patch itself which I'll send now too. \
High-level things inline: TL;DR: stream/buffer confusion in outputfile, \
scope.</div></div><br class=""><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Wed, Feb 10, 2021 at 3:12 AM Duncan P. N. Exon Smith &lt;<a \
href="mailto:dexonsmith@apple.com" class="">dexonsmith@apple.com</a>&gt; wrote:<br \
class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div \
style="overflow-wrap: break-word;" class="">Sam, any thoughts here?<br class=""><div \
class=""><br class=""><blockquote type="cite" class=""><div class="">On 2021 Feb \
&nbsp;2, at 19:36, Duncan P. N. Exon Smith via llvm-dev &lt;<a \
href="mailto:llvm-dev@lists.llvm.org" target="_blank" \
class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br class=""><div class=""><span \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transf \
orm:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" \
class="">Update: I've incorporated much of Sam's feedback into the main patch \
(</span><a href="https://reviews.llvm.org/D95501" \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" \
target="_blank" class="">https://reviews.llvm.org/D95501</a><span \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transf \
orm:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" \
class="">).</span><div \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" \
class=""><br class=""><div class=""><div class="">- Simplify&nbsp;OutputConfig, \
restricting it to semantic information about a specific output. Sink all backend \
configuration to flags on the backends themselves.<br \
class=""></div></div></div></div></blockquote></div></div></blockquote><div \
class="">The remaining options look great, and agree that we can't expect everything \
to be configured globally.</div><blockquote class="gmail_quote" style="margin:0px 0px \
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div \
style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" \
class=""><div class=""><div \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" \
class=""><div class=""><div class="">- Remove&nbsp;OutputManager, instead \
exposing&nbsp;OutputBackend&nbsp;directly.<br \
class=""></div></div></div></div></blockquote></div></div></blockquote><div \
class="">Hooray!&nbsp;</div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div \
style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" \
class=""><div class=""><div \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" \
class=""><div class=""><div class=""></div><div class="">- \
Merge&nbsp;Output&nbsp;and&nbsp;OutputDestination&nbsp;into a single class \
called&nbsp;OutputFile, and rename the API for creating them \
to&nbsp;OutputBackend::createFile().<br class=""></div><div class="">- Add support \
for working directories via&nbsp;OutputDirectory, and \
add&nbsp;OutputBackend::getDirectory()&nbsp;and&nbsp;OutputBackend::createDirectory().<br \
class=""></div><div class="">- Add support \
for&nbsp;createUniqueFile()&nbsp;and&nbsp;createUniqueDirectory(), both heavily used \
in clang. Backends without read access can use&nbsp;StableUniqueEntityAdaptor&nbsp;to \
implement these.<br class=""></div></div></div></div></blockquote></div></div></blockquote><div \
class="">This is definitely needed, but there are a few design options and I can't \
say I precisely understand either the requirements or the solution in the \
patch.</div><div class="">I think it would be useful to review this \
separately.&nbsp;</div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div \
style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" \
class=""><div class=""><div \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" \
class=""><div class=""><div class=""></div><div class=""><br class=""></div><div \
class="">The main thing not settled is the threading guarantees. Restating the straw \
man I proposed:</div><div class=""><br class=""></div><div class="">- \
All&nbsp;OutputBackend&nbsp;APIs are safe to call concurrently. Since OutputDirectory \
*is* an OutputBackend, it can be used concurrently as well.</div><div class="">- \
An&nbsp;OutputFile&nbsp;cannot be used concurrently, but two files from the same \
backend can be.</div></div></div></div></blockquote></div></div></blockquote><div \
class="">LGTM&nbsp;</div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div \
style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" \
class=""><div class=""><div \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" \
class=""><div class=""><div class=""><br class=""></div><div class="">Interested in \
everyone's thoughts on that; if that sounds reasonable I can update the patch to make \
it so.</div><div class=""><br class=""></div><div class=""><div style="overflow-wrap: \
break-word;" class=""><div class=""><div class=""><div class="">Two other \
points:</div><div class=""><br class=""></div><div class="">- Sam proposed dropping \
OutputConfig. I don't think we can, as the API client needs to communicate semantic \
information about specific outputs, not about the backends generally.</div><div \
class="">- Sam suggested OutputDestination (now OutputFile) seemed bloated. I talked \
through some of the details in my previous response; most of the complexity is there \
to make MirroringOutputBackend work efficiently and avoid duplication in subclasses. \
As a counterpoint, the only API a concrete subclass needs to override is \
storeContentBuffer().</div></div></div></div></div></div></div></div></blockquote></div></div></blockquote><div \
class="">Hmm, OutputFile still feels caught between two concepts: a buffer-based one \
and a stream based one.</div><div class="">I understand the goal to make this easy to \
implement and easy to use, but having a lot of conceptual distance between the two, \
and optional methods/implementation strategies is easy to misunderstand.</div><div \
class=""><br class=""></div><div class="">I may have missed it (apologies if so), but \
is there somewhere a breakdown of which performance-critical configurations this is \
aimed at?</div><div class="">(e.g. many small outputs that don't benefit from \
streaming, mirrored to two output streams, but one is null)</div><div \
class="">Performance is certainly going to cut against simplicity sometimes, but it's \
hard to evaluate without knowing what we're optimizing for.</div><div class=""><br \
class=""></div><div class="">&nbsp;&gt;&nbsp;MirroringOutputBackend work \
efficiently<br class=""></div><div class="">One thing I don't understand about \
MirroringOutputBackend - why does it buffer the content? It seems possible to create \
a raw_pwrite_stream that broadcasts to raw_pwrite_streams of the underlying \
outputs.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div \
style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" \
class=""><div class=""><div \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" \
class=""><div class=""><div class=""><br class=""></div></div><div class="">Sam, let \
please take another look and let me know if you have more high-level \
comments.</div></div></div></blockquote></div></div></blockquote><div class="">Only \
other thing is the scope of the patch - it's useful that as an RFC it's comprehensive \
and shows things fitting together.</div><div class="">But the amount of detail is a \
bit overwhelming :-)</div><div class="">I think the parts regarding directories, \
unique files, mirroring/filtering backends, many of the output options would be \
easier to review in detail in isolated patches.</div><div \
class="">&nbsp;</div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div \
style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" \
class=""><div class=""><div \
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal \
;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" \
class=""><div class="">Others, if you have feedback, please let me know!</div><div \
class=""><br class=""></div><div class="">Thanks!</div><div class="">Duncan</div><div \
class=""><br class=""></div><div class=""><div class=""><div class=""><blockquote \
type="cite" class=""><div class="">On 2021 Jan &nbsp;28, at 11:24, Duncan P. N. Exon \
</div></blockquote></div><br \
class=""></div>_______________________________________________<br class="">cfe-dev \
mailing list<br class=""><a href="mailto:cfe-dev@lists.llvm.org" \
class="">cfe-dev@lists.llvm.org</a><br \
class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev<br \
class=""></div></blockquote></div><br class=""></body></html>

[Attachment #6 (text/plain)]

_______________________________________________
LLVM Developers mailing list
llvm-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

[prev in list] [next in list] [prev in thread] [next in thread]