
List:       perl5-porters
Subject:    Re: PerlIO generally (was Re: [perl #XXXXX] segfault in PerlIOBuf_fill when filehandle is closed in
From:       Nicholas Clark <nick () ccl4 ! org>
Date:       2010-11-30 15:24:43
Message-ID: 20101130152443.GL24189 () plum ! flirble ! org

On Mon, Nov 29, 2010 at 10:59:07PM -0600, Craig A. Berry wrote:
> On Mon, Nov 29, 2010 at 8:45 AM, Nicholas Clark <nick@ccl4.org> wrote:
> > On Fri, Nov 26, 2010 at 04:11:18PM +0000, Dave Mitchell wrote:
> 
> >>     Not much has been done to give sensible behaviour on re-entrancy; for
> >>     example, a buffer that has already been written once might get written
> >>     again. Fixing this sort of thing would require a large-scale audit of
> >>     perlio.c.
> >
> >
> > Sadly I think that we are really going to need to do that.
> >
> > Whilst PerlIO does the "job", I'm not that comfortable that it's great,
> > because:
> >
> > a: it was never my favourite thing because Nick I-S wrote it, and then thought
> >   about threadsafety.
> 
> I assume you mean mostly the combination of threads and signals (or
> threads and failure conditions).  I don't have any reason to disagree
> with your assessment of where we are now, but to be fair, was PerlIO,
> when it was primarily developed 8-10 years ago, unusual as core
> components go in not having thread safety baked into the design?  As
> you well know, both thread handling and signal handling in core have
> undergone sea changes in the last 8 years.  At least it occurs to me
> things might have been worse if PerlIO had a lot of dependencies on a
> discarded threading model.

It was written between the releases of 5.6.0 and 5.8.0.
At the time it was already clear that 5005threads had failed, and ithreads
was the future. And if anything, it wasn't clear then that ithreads wasn't
actually the future we thought it was.

To me it seemed really obvious at the time that

a: like it or not, threads were becoming important
b: Perl 5 had really suffered because threads were bolted in as an afterthought
   rather than part of the initial design
c: Building a new system, that had to cope with threads, without thinking about
   threads from the start, was repeating an existing mistake.

> > b: on output, it conflates "flush" to mean either of
> >   i)  please perform a routine empty of this buffer
> >   ii) make damn sure this data is on the disk platter.
> 
> That one never bothered me; the two operations are really the same
> thing to their respective layers.   (I did have trouble with
> PerlIO_exportFILE and PerlIO_importFILE not being opposite operations
> -- *that* confused me.)

I forgot that one. Yes, that one confuses me.

As for the two kinds of flush being the same thing: not really.

"make damn sure this data is on the disk platter" implies that you should
propagate down to the lowest level and perform an fsync().
A regular buffer flush doesn't mean that.

If you have a buffer layer (eg perlio) atop an OS layer (eg unix), the two
have different meanings, yet there is only one "flush" method on your file
handle. (And the way PerlIO is written, the buffered layers call their own
"flush" method to empty the buffer when it gets full)
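
To make the distinction concrete, here is a minimal sketch using stdio and
POSIX rather than PerlIO itself. The names buffer_drain() and buffer_sync()
are hypothetical and do not exist in perlio.c; they only illustrate how the
two meanings could be separate operations instead of a single "flush".

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical names for illustration; not part of the PerlIO API. */

/* Meaning (i): a routine empty of the user-space buffer.
 * The data reaches the OS, but may still sit in kernel caches. */
static int buffer_drain(FILE *fp)
{
    return fflush(fp) == 0 ? 0 : -1;
}

/* Meaning (ii): empty the buffer, then ask the OS to commit the
 * data to stable storage with fsync(). */
static int buffer_sync(FILE *fp)
{
    if (fflush(fp) != 0)
        return -1;
    return fsync(fileno(fp)) == 0 ? 0 : -1;
}
```

A buffer layer that fills up only ever needs buffer_drain(); only an explicit
request from the caller should pay for buffer_sync().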

> > c: Having just looked at the route down PerlIO_read(), it always copies.
> >   A "sane" (to my mind) system would be able to make the optimisation of
> >   "a read bigger than my buffer size can go direct to the user's buffer"
> 
> Can one make different layers share buffer space and still maintain
> layer separation?  Wouldn't every layer have to know how to share
> buffer space with every conceivable other layer that could be under it
> or over it, and wouldn't that make thread safety even more difficult?

No, I was more thinking that if you're doing fread() you

a: supply as much as you can by emptying your current buffer
b: if remaining size / buffer size is non-zero (at least one whole buffer's
   worth remains), you call the layer below you to fill the user's pointer
   direct
c: you call the layer below you to refill your buffer, and then use that to
   fulfill the last part of the request

whereas what it currently does is pump everything through its buffer, meaning
lots of copying that could be avoided.
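
A rough sketch of steps a to c, taking step b to mean "whole multiples of the
buffer size go straight to the caller". Everything here is hypothetical (the
struct buf_layer, buf_read(), the in-memory mem_src standing in for the layer
below, and the tiny BUF_SIZE); it is not code from perlio.c, and it ignores
error reporting beyond treating a zero-length read as end of data.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8  /* deliberately tiny, for illustration only */

/* The layer below: returns bytes read, 0 at end of data. */
typedef size_t (*lower_read_fn)(void *ctx, void *dst, size_t len);

struct buf_layer {
    lower_read_fn lower;
    void *ctx;
    unsigned char buf[BUF_SIZE];
    size_t pos;  /* next unread byte in buf */
    size_t end;  /* one past the last valid byte in buf */
};

static size_t buf_read(struct buf_layer *b, void *dst, size_t len)
{
    unsigned char *out = dst;
    size_t done, n, whole;

    /* a: supply as much as possible from the current buffer */
    n = b->end - b->pos;
    if (n > len)
        n = len;
    memcpy(out, b->buf + b->pos, n);
    b->pos += n;
    done = n;

    /* b: whole buffer-sized chunks bypass our buffer entirely */
    whole = ((len - done) / BUF_SIZE) * BUF_SIZE;
    while (whole > 0) {
        size_t got = b->lower(b->ctx, out + done, whole);
        if (got == 0)
            return done;  /* end of data: short read */
        done += got;
        whole -= got;
    }

    /* c: refill our buffer and use it for the tail of the request */
    if (done < len) {
        b->end = b->lower(b->ctx, b->buf, BUF_SIZE);
        n = len - done;
        if (n > b->end)
            n = b->end;
        memcpy(out + done, b->buf, n);
        b->pos = n;
        done += n;
    }
    return done;
}

/* A toy "layer below" that reads from a block of memory. */
struct mem_src {
    const unsigned char *data;
    size_t len;
    size_t off;
};

static size_t mem_read(void *ctx, void *dst, size_t len)
{
    struct mem_src *s = ctx;
    size_t n = s->len - s->off;
    if (n > len)
        n = len;
    memcpy(dst, s->data + s->off, n);
    s->off += n;
    return n;
}
```

With this shape, only the head and tail of a large read are copied twice; the
middle is copied once, by the layer below, into the caller's memory.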

> > d: It does nothing special about EAGAIN, but probably could have been designed
> >   to act sanely, which would have been a win

e: It's coded in terms of perl data structures such as SVs and AVs, which feels
   like it ought to allow a reduction in code. However
   i)  SVs and AVs have a lot of overhead that PerlIO (and other parts of the
       internals such as Pads) don't need - you never bless an internal buffer,
       shift things off the front of arrays (requiring an extra pointer for
       the efficiency hack for that), or need to store anything other than
       byte strings in your "scalars"
   ii) During interpreter cloning, your IO system is not up and running.
       Result - warning diagnostics (hacked in, or enabled via -D) can crash
       the system.

A long term plan I had was to look at whether it can actually be *simplified*
by replacing its internal use of SV and probably AV. (Whilst keeping the
same public interfaces)

Nicholas Clark
