List:       xmlrpc-user
Subject:    Re: Introducing a new feature to enable MTOM attachment streaming
From:       Andreas Veithen <andreas.veithen () gmail ! com>
Date:       2011-07-28 10:36:10
Message-ID: CADx4_uWMzpooXw0X9O8Vd-tL7tCRhz0gf9WcA_Jod6MAH-Fwsg () mail ! gmail ! com

On Thu, Jul 28, 2011 at 09:31, Sadeep Jayasumana <gayansadeep@gmail.com> wrote:
> Hi,
>
> On Wed, Jul 27, 2011 at 5:39 PM, Andreas Veithen <andreas.veithen@gmail.com>
> wrote:
>>
>> On Wed, Jul 27, 2011 at 13:22, Sadeep Jayasumana <gayansadeep@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > On Wed, Jul 27, 2011 at 4:12 PM, Andreas Veithen
>> > <andreas.veithen@gmail.com>
>> > wrote:
>> >>
>> >> On Wed, Jul 27, 2011 at 04:45, Sadeep Jayasumana
>> >> <gayansadeep@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >> Whether streaming is possible or not depends on how the
>> >> >> MIME parts are accessed, but you always need to support buffering if
>> >> >> necessary.
>> >> > Yes, existing functionality will not be broken. In fact, building the
>> >> > Part in memory will be the default behavior. Streaming will kick in
>> >> > only if it is explicitly enabled.
>> >> >> The Attachments object is one of the first things that will be
>> >> >> created when a new message is received. Therefore nobody will be
>> >> >> able to set a property in the message context, i.e. the setting
>> >> >> would necessarily be a global property in axis2.xml. This however
>> >> >> is a problem if you have services with diverging requirements. The
>> >> >> decision whether or not to buffer the content of the MIME part
>> >> >> can't be taken at this stage in the processing. It can only be
>> >> >> taken when the MIME part is actually accessed (which includes
>> >> >> serializing the message to forward it).
>> >> > I agree with your comment. This functionality could be provided by
>> >> > introducing a new method, setAttachmentStreaming(), to the
>> >> > Attachments class or the MessageContext class. However, I'm
>> >> > wondering whether it is an elegant way of doing this. Other ways of
>> >> > doing the same involve significant modifications to the Axiom API.
>> >>
>> >> I don't think so. The desired way of doing this would be to have an
>> >> Axiom specific subclass of DataHandler (or a DataHandler backed by an
>> >> Axiom specific implementation of DataSource?) with a method such as:
>> >>
>> >> InputStream getInputStream(boolean preserve)
>> >
>> > Coming back to our original problem, we need SOAPMessageFormatter to
>> > stream attachments when attachment streaming is explicitly enabled
>> > (via a configuration parameter or a message context property). If we
>> > introduce such a new API method, calling it from the
>> > SOAPMessageFormatter would involve adding a number of new methods to
>> > different classes.
>> > Thanks,
>> > Sadeep
>>
>> Nope. The code in SOAPMessageFormatter that is responsible for the
>> MTOM case is as follows:
>>
>>                OMElement element = msgCtxt.getEnvelope();
>>                if (preserve) {
>>                    element.serialize(out, format);
>>                } else {
>>                    element.serializeAndConsume(out, format);
>>                }
>>
>> One would simply extend the concept of "consume" to the MIME parts and
>> the magic would happen behind the scenes inside Axiom. The SwA case is
>> a bit different because Axis2 will manipulate the attachments
>> directly. However, I don't think that covering that case requires
>> introducing new methods in lots of different places.
>>
>
> The serializeAndConsume() method ultimately retrieves data handlers for parts
> of the message. This results in an Attachments#getDataHandler() method call.
> At this point Axiom has three options: building the part in memory (existing
> PartOnMemory implementation), building the part in a File (existing
> PartOnFile implementation) or streaming the attachment directly (introducing
> StreamingPart implementation). This decision is based on the state of the
> Attachments object that is set elsewhere. This is the existing behavior of
> the code. I introduced StreamingPart implementation without changing this
> behavior such that there is a minimal impact on the existing behavior and
> the API.
> However, as you suggest, if we are to organize things such that
> serializeAndConsume() always means attachment streaming, we will have to
> refactor the entire MIME message handling logic of Axiom while being careful
> not to break existing stuff or other components using Axiom. This is a
> significant effort, as coming up with a perfect attachment streaming
> implementation that works in all scenarios is not trivial, as you
> initially mentioned in this mail thread.
> But my point is, until the complete refactoring of the MIME handling logic
> is done, this proposed solution adds a new feature to the existing code and
> enables Axis2 to work efficiently in a common use case: calling a web
> service with a large attachment. And this feature will come into play only
> if it is explicitly enabled and hence will not break the existing stuff.

Currently, we do an Axiom release roughly every 4 months and the last
release happened this month. This gives us ample time to come up with
a complete solution for this requirement. If you go through the list
of open issues, you can see that there are several which are related
to MIME processing and some are not easily fixable with the current
code. Therefore, a refactoring of the MIME code is planned anyway. I
did some major refactorings in the past (such as the code responsible
for working around issues in particular StAX parsers, as well as the
XOP/MTOM processing code) and they didn't cause major issues in
downstream projects, so reworking the MIME parsing code doesn't scare
me.
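
As an illustration of the idea discussed earlier in this thread (an
accessor along the lines of getInputStream(boolean preserve) that decides
just in time whether a MIME part must be buffered), a minimal sketch might
look like the following. ConsumablePart is a hypothetical name and not an
existing Axiom class; the in-memory buffer merely stands in for whatever
buffering strategy (memory or file) a real implementation would choose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration only; this is not part of the Axiom API. */
public class ConsumablePart {
    private InputStream source; // raw part content; null once it has been read
    private byte[] buffer;      // filled only when a caller asked to preserve

    public ConsumablePart(InputStream source) {
        this.source = source;
    }

    /**
     * Returns the part content. With preserve == false the underlying stream
     * is handed out directly and the part can no longer be read afterwards.
     * With preserve == true the content is buffered first, so later calls
     * still succeed.
     */
    public synchronized InputStream getInputStream(boolean preserve) throws IOException {
        if (buffer != null) {
            // Already buffered by an earlier preserving call: replay the buffer.
            return new ByteArrayInputStream(buffer);
        }
        if (source == null) {
            throw new IOException("Part content has already been consumed");
        }
        InputStream in = source;
        source = null;
        if (!preserve) {
            return in; // streaming case: nothing is kept in memory
        }
        // Buffering case: copy the whole stream before handing it out.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        buffer = out.toByteArray();
        return new ByteArrayInputStream(buffer);
    }
}
```

In this sketch the decision to stream or buffer is taken per call, at the
moment the content is accessed, rather than when the part is created.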

> Thanks,
> Sadeep
>>
>> >>
>> >> Probably we would have to introduce a couple of new APIs to make this
>> >> work behind the scenes, but it would not imply modification of existing
>> >> APIs. Existing application code would continue to work as usual
>> >> (except maybe in the case of a missing MIME part) and the decision to
>> >> stream or not is taken just in time.
>> >>
>> >> > Thanks,
>> >> > Sadeep
>> >> > On Tue, Jul 26, 2011 at 12:03 AM, Andreas Veithen
>> >> > <andreas.veithen@gmail.com> wrote:
>> >> >>
>> >> >> Some time ago I was thinking about this issue. It is highly
>> >> >> non-trivial to solve. Consider the following variant of your example:
>> >> >>
>> >> >>      public void useBinaryData(DataHandler dh1, DataHandler dh2) {
>> >> >>          try {
>> >> >>              OutputStream out = new FileOutputStream(new File("output1.zip"));
>> >> >>              dh1.writeTo(out);
>> >> >>              out = new FileOutputStream(new File("output2.zip"));
>> >> >>              dh2.writeTo(out);
>> >> >>          } catch (IOException e) {
>> >> >>              throw new RuntimeException(e);
>> >> >>          }
>> >> >>      }
>> >> >>
>> >> >> Taking into account that the client is not required to send the MIME
>> >> >> parts in any particular order and that the service is not required
>> >> >> to consume them in any particular order, you will always end up with
>> >> >> scenarios where you have no other choice than to buffer some of the
>> >> >> MIME parts. Whether streaming is possible or not depends on how the
>> >> >> MIME parts are accessed, but you always need to support buffering if
>> >> >> necessary.
>> >> >>
>> >> >> What this example also shows is that you need to defer the calls to
>> >> >> Attachments#getDataHandler(String) until the very last moment. When
>> >> >> working with an Axiom tree, this is already the case: the call to
>> >> >> Attachments#getDataHandler(String) occurs when OMText#getDataHandler()
>> >> >> is called (and not when the OMText node is built). However, when
>> >> >> using a data binding, one would have to create some sort of
>> >> >> DataHandler proxy that defers the call to
>> >> >> Attachments#getDataHandler(String).
>> >> >>
>> >> >> Andreas
>> >> >>
>> >> >> On Mon, Jul 25, 2011 at 09:21, Sadeep Jayasumana
>> >> >> <gayansadeep@gmail.com>
>> >> >> wrote:
>> >> >> > Hi Devs,
>> >> >> > Here is the benefit of this feature from Axis2's perspective.
>> >> >> >
>> >> >> > Currently, when I use Axis2 to deploy a service class as follows,
>> >> >> > public class MTOMService {
>> >> >> >     public void useBinaryData(String username, DataHandler dataHandler) {
>> >> >> >         try {
>> >> >> >             System.out.println("Name : " + username); // line1
>> >> >> >             OutputStream out = new FileOutputStream(new File("output.zip"));
>> >> >> >             dataHandler.writeTo(out); // line2
>> >> >> >             System.out.println("Saving done!");
>> >> >> >         } catch (IOException e) {
>> >> >> >             throw new RuntimeException(e);
>> >> >> >         }
>> >> >> >     }
>> >> >> > }
>> >> >> > the entire attachment is loaded before the useBinaryData() method
>> >> >> > is called. Therefore, when a large attachment is used, execution
>> >> >> > of line 1 will be significantly delayed.
>> >> >> > However, when the suggested feature is implemented, the stream
>> >> >> > will be read only when it is absolutely needed (i.e., in line 2).
>> >> >> > Therefore, execution of line 1 will happen right after receiving
>> >> >> > the client request.
>> >> >> > Enabling attachment streaming will be similar to enabling file
>> >> >> > caching of attachments [1]: an axis2.xml parameter or a
>> >> >> > MessageContext property could be used.
>> >> >> >
>> >> >> > [1] http://axis.apache.org/axis2/java/core/docs/mtom-guide.html#a41
>> >> >> > Thanks,
>> >> >> > Sadeep
>> >> >> >
>> >> >> > On Mon, Jul 25, 2011 at 11:46 AM, Sadeep Jayasumana
>> >> >> > <gayansadeep@gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Devs,
>> >> >> >> In Apache Synapse, we have a requirement to proxy an MTOM enabled
>> >> >> >> web service with minimum overhead. Large files (even in the GB
>> >> >> >> range) should be able to go through Synapse without running it
>> >> >> >> OOM.
>> >> >> >> To satisfy this requirement, Synapse should be able to forward an
>> >> >> >> incoming SOAP message with MTOM attachments to the backend
>> >> >> >> service without building the attachments. Synapse might
>> >> >> >> read/modify the SOAP envelope but not the attachments. Therefore,
>> >> >> >> it should be possible to stream attachments directly from
>> >> >> >> Synapse's client to the backend service.
>> >> >> >> However, in the current implementation of AXIOM and Axis2, MTOM
>> >> >> >> attachments are built (in memory or in a file) by
>> >> >> >> SOAPMessageFormatter. This caused Synapse to run OOM in the
>> >> >> >> above-mentioned scenario.
>> >> >> >> I have come up with a fix for this. It is to introduce a new
>> >> >> >> org.apache.axiom.attachments.impl.AbstractPart implementation
>> >> >> >> which streams non-SOAP MIME parts without building them.
>> >> >> >> To introduce this new feature without breaking existing stuff,
>> >> >> >> I'm planning to introduce a new message context property which
>> >> >> >> enables MTOM streaming. The org.apache.axis2.builder.BuilderUtils
>> >> >> >> class will check this property in the message context and create
>> >> >> >> the org.apache.axiom.attachments.Attachments object accordingly.
>> >> >> >> Does this sound like the correct way of introducing this feature?
>> >> >> >> Appreciate your feedback.
>> >> >> >> Thanks,
>> >> >> >> --
>> >> >> >> Sadeep Jayasumana
>> >> >> >> Software Engineer,
>> >> >> >> WSO2 Inc.
>> >> >> >
>> >> >> > --
>> >> >> > Sadeep Jayasumana
>> >> >> > Software Engineer,
>> >> >> > WSO2 Inc.
>> >> >>
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: java-dev-unsubscribe@axis.apache.org
>> >> >> For additional commands, e-mail: java-dev-help@axis.apache.org
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Sadeep
>> >> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ws.apache.org
For additional commands, e-mail: dev-help@ws.apache.org

