
List:       linux1394-devel
Subject:    Re: [PATCH v2] firewire: Enable physical DMA above 4GB
From:       Peter Hurley <peter@hurleysoftware.com>
Date:       2013-04-30 10:50:44
Message-ID: 1367319044.3494.62.camel@thor.lan

On Mon, 2013-04-29 at 22:17 +0200, Stefan Richter wrote:
> On Apr 29 Peter Hurley wrote:
> > On Sun, 2013-04-28 at 19:43 +0200, Stefan Richter wrote:
> > > On Mar 27 Peter Hurley wrote:
> > > > Quadlet reads to memory above 4GB are painfully slow when serviced
> > > > by the AR DMA context. In addition, the CPU(s) may be locked up,
> > > > preventing any transfer at all.
> [...]
> > > Question to the 1st paragraph of your changelog:  Aren't both reasons that
> > > you give purely theoretical at the moment?
> > 
> > Definitely not theoretical. I have a user-space tool that makes a core
> > image over FireWire; it's orders of magnitude slower without this patch.
> [...]
> > > Is there any other actual effect of this patch?
> > 
> > I don't understand this question. The patch does exactly what it
> > purports to do; namely, enable physical DMA above 4GB.
> 
> We have two applications of physical DMA on Linux at the moment:
> 
>   - The SBP-2 initiator.  This one does not benefit from a higher Physical
>     Upper Bound, because all the buffers (SCSI I/O buffers, ORB buffers,
>     s/g tables) are mapped into a 4 GB range anyway.
>     Correct me if I'm mistaken.
> 
>   - CONFIG_PROVIDE_OHCI1394_DMA_INIT and CONFIG_FIREWIRE_OHCI_REMOTE_DMA.
>     The former of these could be made to benefit from raised Physical
>     Upper Bound like the latter, although I am not sure whether there is
>     interesting stuff at high addresses during boot.  The latter benefits
>     from your patch, but _not_ as a matter of better throughput, but rather
>     as a matter of doesn't work -> works.
> 
> Are you possibly alluding to a third application which is similar to
> CONFIG_FIREWIRE_OHCI_REMOTE_DMA but is able to fall back to AR-req +
> AT-resp DMA when getting requests above Physical Upper Bound?  If such an
> application exists, then let's say so in the changelog because this would
> be news not only to me, and it would clarify why you are talking about
> performance rather than enablement.

Ah, I see now why you're asking.

Yes, I had a simple but fragile patchset that handled AR read requests
to phys memory above 4GB. I doubt I'll ever submit it because it's slow,
delicate and of limited utility.

I'll remove the reference to that in the changelog for the v3 patch.

> > > Questions to the 2nd paragraph:  dma_get_required_mask(dev)+1 is not
> > > exactly end-of-memory; it is the smallest power-of-two which is greater or
> > > equal to end-of-memory.  For example, on a PC equipped with 2 GB RAM and
> > > running a x86-32 kernel I get 0x8000'0000 (2 G), but on a PC with 16 GB
> > > RAM and x86-64 kernel I get 0x8'0000'0000 (32 G).
> > 
> > Yes, that's true. In fact, on some arches, dma_get_required_mask()
> > simply returns a mask of all addressable virtual memory.
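For the archives, the rounding behavior described above can be modeled like
this. This is a userspace sketch, not the kernel implementation (which
derives the top address from max_pfn and differs per arch):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the rounding dma_get_required_mask() performs:
 * mask + 1 is the smallest power of two greater than or equal to the
 * highest physical address in use (not the amount of installed RAM). */
static uint64_t required_mask(uint64_t end_of_memory)
{
	uint64_t mask = 0;

	if (end_of_memory <= 1)
		return 0;
	/* Widen the mask until it covers the last byte of memory. */
	while (mask < end_of_memory - 1)
		mask = (mask << 1) | 1;
	return mask;
}
```

So a flat 2 GB layout yields mask + 1 = 2 G exactly, while a 16 GB x86-64
box whose top RAM address sits above 16 GB (because of the PCI hole remap)
rounds up to 32 G, matching the two examples above.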
> > 
> > >   So, shouldn't the
> > > changelog say perhaps "Write the PhyUpperBound register to cover all
> > > available memory (up to 128 TB of it)."?
> > 
> > Ok.
> > 
> > > But why are you reading
> > > dma_get_required_mask in the first place rather than just setting it to
> > > 128 TB straight away?
> > 
> > If you feel it's superfluous now, that's fine. In the future, I plan to
> > make a pitch/patches for the dma engine to expose the actual end of
> > physical memory.
> > 
> > If you'd prefer, I can re-submit the equivalent change in that patchset
> > instead. However, it's easier to convince arch maintainers of necessary
> > changes if drivers already exhibit the required use case.
> 
> At least as far as the OHCI-1394 spec says, physUpperBoundOffset (if
> implemented) can be set higher than we would ever want.  So the only
> motivation for downwards optimizing the value which we write there is to
> reduce possible conflict with AR based applications, isn't it?  If so, and
> given firewire-core's current address handler design, then this downwards
> optimization needs to happen in fw_core_init actually.
> 
> Until these things are worked out, I do prefer that you omit the call to
> dma_get_required_mask in firewire-ohci.

Ok.
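For what it's worth, the register encoding itself is trivial either way.
A hypothetical userspace model (register offset and bit layout as I read
the OHCI 1.1 spec — PhysicalUpperBound at offset 0x120, holding bits
[47:16] of the bound — not code from the actual patch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the OHCI-1394 PhysicalUpperBound encoding:
 * the 32-bit register holds bits [47:16] of the first bus address
 * NOT handled by the physical response unit, giving the bound a
 * 64 KB granularity within the 48-bit node offset space. */
static uint32_t phys_upper_bound(uint64_t bound)
{
	assert((bound & 0xffff) == 0);	/* must be 64 KB aligned */
	assert(bound < (1ULL << 48));	/* 48-bit node offset space */
	return (uint32_t)(bound >> 16);
}
```

Note the register is optional per the spec, so the driver has to probe
whether the controller implements it before relying on any value written
there.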

Regards,
Peter Hurley



_______________________________________________
mailing list linux1394-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux1394-devel
