[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-2d-dev
Subject:    Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased text on translucent backgroun
From:       Philip Race <philip.race () oracle ! com>
Date:       2016-05-05 22:59:43
Message-ID: 572BD05F.5040300 () oracle ! com
[Download RAW message or body]

Yes, this is ready to push.

-phil.

On 5/3/16, 12:23 AM, Jayathirth D V wrote:
> Hi Prahalad,
> 
> Changes are working fine.+1.
> In test case please make jtreg comments as multiline comments and move them below \
> copyright and above import statements for readability before pushing. 
> Thanks,
> Jay
> -----Original Message-----
> From: Jim Graham
> Sent: Tuesday, May 03, 2016 1:42 AM
> To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net
> Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased \
> text on translucent backgrounds gets bright artifacts 
> Thanks Prahalad,
> 
> Looks great - +1
> 
> 			...jim
> 
> On 5/2/2016 8:29 AM, Prahalad Kumar Narayanan wrote:
> > Dear Jim&  Everyone on Java2D Community
> > 
> > First, thanks to Jim for every word in the feedback.
> > 
> > Java is used by millions of devices and users. We cannot compromise on quality.
> > It 's through these reviews and iterations that we ensure a consistent solution \
> > gets checked in. 
> > I 've incorporated the suggestions from last review and changes are available in \
> > webrev. Review Link: http://cr.openjdk.java.net/~jdv/Prahalad/8015070/webrev.06/
> > 
> > Quick Brief on Changes :
> > 1.  The macros- Declare ## DST ## SrcOverDstBlendFactor and Store ## DST ## \
> > SrcOverDstBlendFactor are removed. 2.  Their usage is now replaced with
> > DeclareAlphaVarFor4ByteArgb : For declaring the blend factor
> > SrcOver ## DST ## BlendFactor(dstF, dstA) : To return the appropriate blend \
> > factor between dstF and dstA 3.  Most of the macro names contain image TYPE in \
> > the earlier part and ForSTRATEGY at the end of the name. (Note: This is just an \
> > observation not a Thumb rule) Hence SrcOver ## DST ## BlendFactor macro name is \
> > being used. 4.  As with every change in the native code, Internal automated build \
> > system was used to ensure successful compilation across all platforms. 
> > Kindly review the changes and provide your opinion.
> > 
> > Thank you
> > Have a good day
> > 
> > Prahalad N.
> > 
> > 
> > -----Original Message-----
> > From: Jim Graham
> > Sent: Thursday, April 28, 2016 2:05 AM
> > To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net
> > Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased \
> > text on translucent backgrounds gets bright artifacts 
> > Thanks Prahalad,
> > 
> > First, the macro design issues for all of these LoopMacros.h et al files
> > are pretty complex, so I apologize if this is a big learning step here
> > and if the feedback can look like nit-picking at times.  But the system
> > is complicated enough that we should take care to make sure the new work
> > remains consistent to what is already there and to keep this complicated
> > system maintainable in the long run.
> > 
> > The only issue I have with the new macros is that Store*() naming
> > convention is typically used when the macro itself does the assignment.
> > In the case of the new Store*BlendFactor() macros, they simply return
> > the value and the caller does the assignment, so the name is off.  This
> > could be fixed either by moving "blendF" to an argument to the macro, or
> > renaming it to one of the suggestions I gave earlier that don't imply
> > assignment inside the macro.
> > 
> > Another thing to consider is that the type of the blend factor is
> > actually determined by the alpha blend token "STRATEGY", not by the
> > image type.  Many of the alpha blend types are dependent on the image
> > types being used, but it is not solely determined by the image type,
> > sometimes it depends on the pair of src/dst image types.  In any case,
> > the STRATEGY used for alpha blending is not directly tied to an image
> > type.  Other alpha loop macros take a STRATEGY as an argument and invoke
> > the proper alpha blending value declaration and manipulation macros with
> > "For ## Strategy" macro name expansions.  But we are hard-coding
> > "For4ByteArgb" in this particular macro, which means we are hard-coding
> > the particular type of alpha math.  Since that determination is done by
> > hard-coding in this macro and not by delegating to the image type, then
> > it is inappropriate to ask the image-type-based macro to decide how to
> > declare the blend factor.  If anything, we should use
> > "DeclareAlphaVarFor4ByteArgb()" to declare the variable, not an
> > image-based macro.  Note that the Declare() macros for the alpha math
> > strategies take an argument called "VAR" so then you can assume that
> > they've named the variable the same token that you supplied.
> > 
> > My recommendation would be to delete the new Declare macros and replace
> > their use in LoopMacros.h with a usage of the existing
> > DeclareAlphaVarFor4ByteArgb() macro, and then I would move the resulting
> > variable into the argument list for the Store ## BlendFactor macro and
> > move the assignment part of the statement inside of the macro...
> > 
> > 			...jim
> > 
> > On 4/27/16 6:37 AM, Prahalad Kumar Narayanan wrote:
> > > Dear Jim and Everyone on Java2D Community
> > > 
> > > Good day to you.
> > > 
> > > First, Thanks to Jim for his feedback.
> > > I 've incorporated his feedback in latest version of code and webrev link is \
> > > shared below Webrev Link: \
> > > http://cr.openjdk.java.net/~aghaisas/prahalad/8015070/webrev.05/ 
> > > Quick Description on Changes :
> > > 1. Removing redundant code MultiplyAndStore4ByteArgbComps(res, resA, \
> > > SRC_PREFIX) The redundant code within if and else blocks of if ( mixValSrc != \
> > > 0xff ) has been moved into block that does color blending. As Jim rightly said, \
> > > the change would execute Multiply operation only if blending is required On the \
> > > other hand, If resA reaches maximum value, the fast path is executed. 
> > > 2. Fast path execution
> > > The fast path that directly sets foreground color has been moved into else \
> > > block after checking if ( resA != MaxValFor4ByteArgb ) Conceptually, resA would \
> > > be maximum only when glyphAlpha (mixValSrc) and srcAlpha are maximum. 
> > > 3. Isolated Declare and Store macros.
> > > A single macro DeclareAndInit... has been split into two macros Declare ## DST \
> > > ## SrcOverDstBlendFactor and Store ## DST ## SrcOverDstBlendFactor. This would \
> > > indeed address some compilers that do not allow declaration in the middle of \
> > > the block. 
> > > 4. Other Relevant Information:
> > > The changes have been verified to build on all platforms through JPRT auto \
> > > build system. Java2D benchmarks were run with the changes and no degradation on \
> > > performance was seen compared to webrev.04 Regression was run and no new \
> > > regression failures have been introduced with the change. 
> > > 5. To reduce reviewer 's effort in review, code has been expanded with comments \
> > > herewith. 
> > > #define GlyphListAABlend4ByteArgb(DST, GLYPH_PIXELS, PIXEL_INDEX, DST_PTR, \
> > > FG_PIXEL, PREFIX, SRC_PREFIX) \
> > > do { \
> > > DeclareAlphaVarFor4ByteArgb(resA) \
> > > DeclareCompVarsFor4ByteArgb(res) \
> > > jint mixValSrc = GLYPH_PIXELS[PIXEL_INDEX]; \
> > > if (mixValSrc) { \
> > > /* Evaluating srcAlpha component */ \
> > > if (mixValSrc != 0xff) { \
> > > /* No-op. Retained to match reference as SRCOVER_MASK_FILL */ \
> > > PromoteByteAlphaFor4ByteArgb(mixValSrc); \
> > > /* Glyph mask determines visibility of the srcColor */ \
> > > resA = MultiplyAlphaFor4ByteArgb(mixValSrc, SRC_PREFIX ## A); \
> > > } else { \
> > > /* This is understood easier with floating point logic. */ \
> > > /* 1.0f is max value used in floating point calculation */ \
> > > /* (1.0f * srcA) gives srcA. Hence we directly consider */ \
> > > /* srcA without Multiply with glyph mask. */ \
> > > resA = SRC_PREFIX ## A; \
> > > } \
> > > /* Blending srcColor and dstColor */ \
> > > /* This is required only when resA(i.e., srcA) is not maximum */ \
> > > if (resA != MaxValFor4ByteArgb) { \
> > > DeclareAndInvertAlphaVarFor4ByteArgb(dstF, resA) \
> > > DeclareAndClearAlphaVarFor4ByteArgb(dstA) \
> > > DeclareCompVarsFor4ByteArgb(dst) \
> > > DeclareCompVarsFor4ByteArgb(tmp) \
> > > /* Redundant statement moved from previous if -else block */ \
> > > MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX); \
> > > /* Based on the pixelFormat we could reduce calculations */ \
> > > /* done to load the color and alpha components */ \
> > > if (!(DST ## IsPremultiplied)) { \
> > > Load ## DST ## To4ByteArgb(DST_PTR, pix, PIXEL_INDEX, \
> > > dstA, dstR, dstG, dstB); \
> > > Store4ByteArgbCompsUsingOp(tmp, =, dst); \
> > > } else { \
> > > Declare ## DST ## AlphaLoadData(DstPix) \
> > > jint pixelOffset = PIXEL_INDEX * (DST ## PixelStride); \
> > > DST ## DataType *pixelAddress = PtrAddBytes(DST_PTR, \
> > > pixelOffset); \
> > > /* The macro's implementation loads color components  */ \
> > > /* without divide by alpha adjustment as required for */ \
> > > /* subsequent calculations. Note: This is used only   */ \
> > > /* with preMultiplied alpha based pixel formats */ \
> > > LoadAlphaFrom ## DST ## For4ByteArgb(pixelAddress, \
> > > DstPix, \
> > > dst); \
> > > Postload4ByteArgbFrom ## DST(pixelAddress, \
> > > DstPix, \
> > > tmp); \
> > > } \
> > > /* Avoid blending operatons if dst is fully transparent */ \
> > > if (dstA) { \
> > > Declare ## DST ## SrcOverDstBlendFactor(blendF) \
> > > /* dstA would be 0 if either of the following is true.  */ \
> > > /* 1. srcA is max. Parent if condition validates this.  */ \
> > > /* 2. dstA is zero. The current if condition validates  */ \
> > > /* Henceforth, the following Multiply need not be moved */ \
> > > /* ahead of the if condition. This also helps to better */ \
> > > /* performance */ \
> > > dstA = MultiplyAlphaFor4ByteArgb(dstF, dstA); \
> > > resA += dstA; \
> > > /* Declares blendF variable and assigns appropriate */ \
> > > /* alpha value. The definitions are contained in the*/ \
> > > /* header files of respective pixel formats */ \
> > > blendF = Store ## DST ## SrcOverDstBlendFactor(dstF, \
> > > dstA); \
> > > /* This is understood easier with floating point logic.*/ \
> > > /* 1.0f is max value used in floating point calculation*/ \
> > > /* (1.0f * tmp) gives tmp. Hence we avoid Multiply     */ \
> > > /* operation and directly add loaded color to result*/ \
> > > if (blendF != MaxValFor4ByteArgb) { \
> > > MultiplyAndStore4ByteArgbComps(tmp, \
> > > blendF, \
> > > tmp); \
> > > } \
> > > Store4ByteArgbCompsUsingOp(res, +=, tmp); \
> > > } \
> > > } else { \
> > > /* resA is maximum only when srcA and glyphAlpha are max   */ \
> > > /* Hence the fast path to store foreground pixel color and */ \
> > > /* break to the next pixel. */ \
> > > Store ## DST ## PixelData(DST_PTR, PIXEL_INDEX, \
> > > FG_PIXEL, PREFIX); \
> > > break; \
> > > } \
> > > /* In the above calculations, color values are multiplied with */ \
> > > /* with alpha. This is perfectly fine for pre-Multiplied alpha */ \
> > > /* based pixel formats. For non pre-multiplied alpha based     */ \
> > > /* pixel formats, the alpha is removed from color components   */ \
> > > /* and then stored to the resulting color */ \
> > > if (!(DST ## IsOpaque)&&  \
> > > !(DST ## IsPremultiplied)&&  resA&&  \
> > > resA<  MaxValFor4ByteArgb) \
> > > { \
> > > DivideAndStore4ByteArgbComps(res, res, resA); \
> > > } \
> > > Store ## DST ## From4ByteArgbComps(DST_PTR, pix, \
> > > PIXEL_INDEX, res); \
> > > } \
> > > } while (0);
> > > 
> > > Kindly review the changes present in the webrev and provide your views.
> > > 
> > > Thank you once again for your effort in review
> > > Have a great day
> > > 
> > > Prahalad N.
> > > 
> > > 
> > > -----Original Message-----
> > > From: Jim Graham
> > > Sent: Wednesday, April 27, 2016 2:12 AM
> > > To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net
> > > Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070: Antialiased \
> > > text on translucent backgrounds gets bright artifacts 
> > > Hi Prahalad,
> > > 
> > > One potential portability issue - you have a "DeclareAndInit" macro for the new \
> > > "choose which blend factor" macro in the middle of a block. Some C compilers \
> > > allow declaring a new variable in the middle of a block, but not all.  Since it \
> > > would be hard to move the declaration to the top of the block, since it depends \
> > > on a value computed in the first couple of lines, it might be better to change \
> > > it from a "DeclareAndInit" style macro into a regular declaration, and a macro \
> > > to get the value, so "jint blendF;" at the top of the block and "blendF = \
> > > SrcOverDstBlendFactorFor ## DST(...);" in the middle of the block.  (Or name it \
> > > "SrcOver ## DST ## BlendFactor(...)?" 
> > > I just noticed something about this .04 version of the function.  At the top \
> > > you have a test for mixValSrc != 255 and then both clauses of the if statement \
> > > end with essentially the same code, multiplying the source by the value in \
> > > resA.  (The version in the else clause uses SRC_PREFIX ## A, but it could just \
> > > as easily also use resA since they are the same value at that point.) 
> > > This means you could potentially move both "MultiplyAndStore" macros until \
> > > after the if and use resA as the multiplier.  Now, if you look, the immediate \
> > > next statement tests if resA is maxVal.  If it is maxVal, then you don't need \
> > > to do that multiply at all, so the macro could be moved even further to be \
> > > inside the "if (resA != maxVal)" block. 
> > > At that point, you could then reinstate the "else" on that if block and move \
> > > the Store##DST##PixelData into that else, to save you another test of resA.  \
> > > This would essentially look like this: 
> > > if (mixValSrc != 255) {
> > > PromoteByteAlpha...;
> > > resA = MultiplyAlpha...;
> > > } else {
> > > resA = SRC_PREFIX ## A;
> > > }
> > > if (resA != MaxVal) {
> > > Declare,declare,declare,declare...;
> > > Multiply...Comps(res, resA, SRC_PREFIX); } else {
> > > Store ## DST ## PixelData...;
> > > }
> > > 
> > > It shortens the code a little, but I'm not sure if it would really help \
> > > performance other than not having to do the test for maxVal twice. Still, tests \
> > > are fairly expensive in code like this so it could be worth a shot at testing, \
> > > and it would simplify the code a bit in either case... 
> > > 			...jim
> > > 
> > > On 4/15/16 12:25 AM, Prahalad Kumar Narayanan wrote:
> > > > Hello Jim&  Everyone on Java2D Community
> > > > 
> > > > Good day to you.
> > > > 
> > > > This is a review request and a follow-up to the bug-fix for the issue
> > > > Bug     : JDK-8015070
> > > > Link    : https://bugs.openjdk.java.net/browse/JDK-8015070
> > > > 
> > > > Webrev Link :
> > > > http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.04/
> > > > 
> > > > Quick Inferences on Changes in Current -Webrev.04
> > > > 
> > > > 1 ) Subtle changes to the blend loop-
> > > > . The subtle changes taken up, have helped to improve the performance.
> > > > . With the current logic in webrev.04, Java2DBench reports better performance \
> > > >                 than Un-Optimized solution that was present in webrev.00
> > > > . J2DBench was run for Font Styles { Plain, Bold, Italic, Bold n Italic } and \
> > > >                 Font Sizes { 12, 20, 36, 72 }
> > > > . My sincere thanks to Jim for all his detailed feedback through the multiple \
> > > > reviews that has evolved the solution today. 
> > > > (Details on changes)
> > > > 1.a.  Loading of Color components
> > > > . When modelled as per SRCOVER_MASK_FILL code, the logic required few \
> > > >                 additional calculations to load color components.
> > > > . The extra calculations indeed impacted performance figures.
> > > > . This could be offset in  two possible ways
> > > > a. Inspect parent macro- NAME_SOLID_DRAWGLYPHLISTAA and advance by pixel \
> > > > address and not by pixel index. The parent macro invokes \
> > > > GlyphListAABlendFourByteArgb through this macro- GlyphListAABlend ## \
> > > > STRATEGY(DST, pixels, x, pPix, fgpixel, solidpix, src); Changing parent macro \
> > > > will cause spurious changes across GlyphListAABlend ## other pixel formats. \
> > > > There is additional risk of breaking the stable and well optimized  \
> > > > LoopMacros.h. b. Load color components based on pre-Multiplication status
> > > > This has been taken up and change is limited to the function being modified.
> > > > Thankfully J2DBench has still reported improvement in performance.
> > > > 
> > > > 1.b.  New Macro to avoid if (DST ## IsPremultiplied) {
> > > > . A new macro- DeclareAndInit ## DST ## SrcOverMaskBlendFactor has been \
> > > >                 introduced to choose between dstF, or dstA
> > > > . The implementation is available in the header files of pixel formats ( Eg: \
> > > >                 IntArgb.h IntArgbPre.h )
> > > > . There are 29 different pixel formats known to Java2D, and
> > > > . Hence, the new macro's implementation is added only to those pixel formats \
> > > > that require the current glyph blending logic. 
> > > > 2 ) Testing across different formats
> > > > . The Regression test code has been modified to test anti-aliased text \
> > > >                 rendering on 7 different pixel formats-
> > > > . IntArgb, IntArgb_Pre, FourByteAbgr, FourByteAbgr_Pre, IntRGB, IntBGR, \
> > > >                 3ByteBGR.
> > > > . As expected, the test fails without the fix on JDK 8 and JDK 7 versions&  \
> > > > passes with JDK 9-internal containing the fix. 
> > > > 3 ) Explanation on Code Changes :
> > > > . With multiple reviews and changes, today the code fixes the bug and is well \
> > > >                 optimized as well.
> > > > . For ease of reviewer and effort in review, I 've explained the logic with \
> > > >                 /* comment statements */ herewith.
> > > > . Note: These comments don't figure in the webrev.
> > > > As one cannot guarantee how /* comments */ within macros would be perceived \
> > > > by compiler across different platforms. 
> > > > #define GlyphListAABlend4ByteArgb(DST, GLYPH_PIXELS, PIXEL_INDEX, DST_PTR, \
> > > > FG_PIXEL, PREFIX, SRC_PREFIX) \
> > > > do { \
> > > > DeclareAlphaVarFor4ByteArgb(resA) \
> > > > DeclareCompVarsFor4ByteArgb(res) \
> > > > jint mixValSrc = GLYPH_PIXELS[PIXEL_INDEX]; \
> > > > if (mixValSrc) { \
> > > > /* Evaluating srcColor components */ \
> > > > if (mixValSrc != 0xff) { \
> > > > /* No-op. Retained to match reference as SRCOVER_MASK_FILL */ \
> > > > PromoteByteAlphaFor4ByteArgb(mixValSrc); \
> > > > /* Glyph mask determines visibility of the srcColor */ \
> > > > resA = MultiplyAlphaFor4ByteArgb(mixValSrc, SRC_PREFIX ## A); \
> > > > MultiplyAndStore4ByteArgbComps(res, resA, SRC_PREFIX); \
> > > > } else { \
> > > > /* If mixValSrc and srcA are maximum, then result color is */ \
> > > > /* opaque. Hence the fast path to store foreground pixel   */ \
> > > > /* color and return. */ \
> > > > if (SRC_PREFIX ## A == MaxValFor4ByteArgb) { \
> > > > Store ## DST ## PixelData(DST_PTR, PIXEL_INDEX, \
> > > > FG_PIXEL, PREFIX); \
> > > > break; \
> > > > } \
> > > > /* This is understood easier with floating point logic. */ \
> > > > /* 1.0f is max value used in floating point calculation */ \
> > > > /* (1.0f * srcA) gives srcA. Hence we directly consider */ \
> > > > /* srcA without Multiply with glyph mask. */ \
> > > > resA = SRC_PREFIX ## A; \
> > > > MultiplyAndStore4ByteArgbComps(res, \
> > > > SRC_PREFIX ## A, \
> > > > SRC_PREFIX); \
> > > > } \
> > > > /* Evaluating dstColor components */ \
> > > > /* This is required only when resA(i.e., srcA) is not maximum */ \
> > > > if (resA != MaxValFor4ByteArgb) { \
> > > > DeclareAndInvertAlphaVarFor4ByteArgb(dstF, resA) \
> > > > DeclareAndClearAlphaVarFor4ByteArgb(dstA) \
> > > > DeclareCompVarsFor4ByteArgb(dst) \
> > > > DeclareCompVarsFor4ByteArgb(tmp) \
> > > > /* Based on the pixelFormat we could reduce calculations */ \
> > > > /* done to load the color and alpha components */ \
> > > > if (!(DST ## IsPremultiplied)) { \
> > > > Load ## DST ## To4ByteArgb(DST_PTR, pix, PIXEL_INDEX, \
> > > > dstA, dstR, dstG, dstB); \
> > > > Store4ByteArgbCompsUsingOp(tmp, =, dst); \
> > > > } else { \
> > > > Declare ## DST ## AlphaLoadData(DstPix) \
> > > > jint pixelOffset = PIXEL_INDEX * (DST ## PixelStride); \
> > > > DST ## DataType *pixelAddress = PtrAddBytes(DST_PTR, \
> > > > pixelOffset); \
> > > > /* The macro's implementation loads color components  */ \
> > > > /* without divide by alpha adjustment as required for */ \
> > > > /* subsequent calculations. Note: This is used only   */ \
> > > > /* with preMultiplied alpha based pixel formats */ \
> > > > LoadAlphaFrom ## DST ## For4ByteArgb(pixelAddress, \
> > > > DstPix, \
> > > > dst); \
> > > > Postload4ByteArgbFrom ## DST(pixelAddress, \
> > > > DstPix, \
> > > > tmp); \
> > > > } \
> > > > /* Avoid blending operatons if dst is fully transparent */ \
> > > > if (dstA) { \
> > > > /* dstA would be 0 if either of the following is true. */ \
> > > > /* 1. srcA is max. Parent if condition validates this. */ \
> > > > /* 2. dstA is zero. The current if condition validates */ \
> > > > /* Henceforth, the following Multiply need not be moved*/ \
> > > > /* ahead of the if condition. This also helps to better*/ \
> > > > /* performance */ \
> > > > dstA = MultiplyAlphaFor4ByteArgb(dstF, dstA); \
> > > > resA += dstA; \
> > > > /* Declares blendF variable and assigns appropriate */ \
> > > > /* alpha value. The definitions are contained in the*/ \
> > > > /* header files of respective pixel formats */ \
> > > > DeclareAndInit ## DST ## SrcOverDstBlendFactor(dstF, \
> > > > dstA, \
> > > > blendF); \
> > > > /* This is understood easier with floating point logic.*/ \
> > > > /* 1.0f is max value used in floating point calculation*/ \
> > > > /* (1.0f * tmp) gives tmp. Hence we avoid 3 Multiply   */ \
> > > > /* operations and add the loaded color to result       */ \
> > > > if (blendF != MaxValFor4ByteArgb) { \
> > > > MultiplyAndStore4ByteArgbComps(tmp, \
> > > > blendF, \
> > > > tmp); \
> > > > } \
> > > > Store4ByteArgbCompsUsingOp(res, +=, tmp); \
> > > > } \
> > > > } \
> > > > /* In the above calculations, color values are multiplied with */ \
> > > > /* with alpha. This is perfectly fine for pre-Multiplied alpha */ \
> > > > /* based pixel formats. For non pre-multiplied alpha based     */ \
> > > > /* pixel formats, the alpha is removed from color components   */ \
> > > > /* and then stored to the resulting color.  */ \
> > > > if (!(DST ## IsOpaque)&&  \
> > > > !(DST ## IsPremultiplied)&&  resA&&  \
> > > > resA<  MaxValFor4ByteArgb) \
> > > > { \
> > > > DivideAndStore4ByteArgbComps(res, res, resA); \
> > > > } \
> > > > Store ## DST ## From4ByteArgbComps(DST_PTR, pix, \
> > > > PIXEL_INDEX, res); \
> > > > } \
> > > > } while (0);
> > > > 
> > > > My apologies if the above code did not appear on the final webrev email.
> > > > ( In few instances, the newlines don't appear in plain-text format )
> > > > 
> > > > Kindly review the changes present in webrev and provide your views.
> > > > If the changes are good, we could take up for the code check-in.
> > > > 
> > > > Thank you for your time in review
> > > > Have a good day
> > > > 
> > > > Prahalad N.
> > > > 
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: Jim Graham
> > > > Sent: Tuesday, April 05, 2016 3:07 AM
> > > > To: Prahalad Kumar Narayanan; Sergey Bylokhov; Philip Race
> > > > Cc: Praveen Srivastava
> > > > Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
> > > > Antialiased text on translucent backgrounds gets bright artifacts
> > > > 
> > > > Hi Prahalad,
> > > > 
> > > > Can I see the webrev diffs for the "today's experiment" code with the new \
> > > > macro? 
> > > > Also, Did you test this with opaque destinations?  The most common use
> > > > of text is on an opaque destination, so that would matter more.  I
> > > > would
> > > > suggest: INT_RGB, THREE_BYTE_BGR, INT_ARGB, INT_ARGB_PRE in that order of \
> > > > precedence of importance.  Also, test with translucent colors, though those \
> > > > are less important than opaque colors. 
> > > > I'm still looking at why the non-pre would be slower than the pre.
> > > > About the only difference is the one line "if (!PRE) { DSTF = DSTA; }".
> > > > One suggestion might be to move the test for transparent destination up a \
> > > > couple of lines, and to get rid of the extra "DSTF = dstA" assignement with \
> > > > the following structure: 
> > > > 	dstA = Mult...();
> > > > 	if (dstA) {
> > > > 	    resA += dstA;
> > > > 	    Declare...
> > > > 	    Postload...
> > > > 	    if (DST ## IsPremultiplied) {
> > > > 	        MultiplyAndStore(..., DSTF, ...);
> > > > 	    } else {
> > > > 	        MultiplyAndStore(..., dstA, ...);
> > > > 	    }
> > > > 	    Store...
> > > > 	}
> > > > 
> > > > Basically, dstA == 0 is the actual test for "do we need to even try to blend \
> > > > the destination in here".  If it is zero then there is no need to add dstA to \
> > > > resA and there is no need to adjust the factor we blend by \
> > > > (MultiplyAndStore).  It can be triggered by either a transparent destination \
> > > > pixel or an opaque source pixel, but either way, dstA is the right test, not \
> > > > DSTF.  The second part, eliminating the DSTF=dstA assignment, gets rid of a \
> > > > line that might trip up the optimizer by simply having the macro expand \
> > > > differently for the two types.  To be even more streamlined, we could create \
> > > > a new set of macros: 
> > > > // In the header files for PRE types:
> > > > #define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA)   (dF)
> > > > // In the header files for non-PRE types:
> > > > #define SRCOVER_DST_BLEND_FACTOR_<TYPE>(dF, dA)   (dA)
> > > > 
> > > > Then we wouldn't need the test above for "if (DST ## Pre)", we would just \
> > > > expand the macro with:  MultiplyAndStore(..., SRCOVER_DST_BLEND_FACTOR ## \
> > > > DST(DSTF, dstA), ...) which would hopefully confuse the optimizer even less. 
> > > > If you want to really eliminate the pixel address computations, you could \
> > > > rewrite the calling loop so that it steps along a pixel pointer rather than \
> > > > using indexing.  Have the calling function/macro set up a pRas pointer to the \
> > > > pixel and step that along the row by TYPE##PixelStride as it iterates across \
> > > > the glyph, then hand that into the Glyph blend macro as DST_PTR (and \
> > > > eliminate PIXEL_INDEX as it would be "0" in this case)... 
> > > > 			...jim
> > > > 
> > > > On 4/4/16 4:37 AM, Prahalad Kumar Narayanan wrote:
> > > > > Dear Jim
> > > > > 
> > > > > Good day to you.
> > > > > 
> > > > > ( Just in-case, you had missed my long Friday 's email )
> > > > > 
> > > > > Quick Recap of Proceedings
> > > > > 
> > > > > 1.On Friday, I had profiled two solutions that fix the bug-
> > > > > JDK-8015070, using Java2D Bench
> > > > > 
> > > > > 2.Profiling was done for 16 test cases with different combinations of
> > > > > 
> > > > > a.Font Style: Plain, Bold, Italic, Bold-Italic
> > > > > 
> > > > > b.Font Size: 12, 20, 36, 72
> > > > > 
> > > > > 3.Result from Friday 's profiling experiments:
> > > > > 
> > > > > a.Regular Solution (Un-optimized) : This was observed to be faster
> > > > > for IntArgb pixel format
> > > > > 
> > > > > b.Optimized Solution (based on SrcOver_MaskFill with fast path) :
> > > > > This was observed to be faster for IntArgb_Pre pixel format
> > > > > 
> > > > > Update from Today's Experiments
> > > > > 
> > > > > 1.First, I understood that new calculations introduced (pixelAddress
> > > > > computation) impacted performance of optimized solution in IntArgb format.
> > > > > 
> > > > > 2.Henceforth, I made the following changes, while loading destination \
> > > > > color: 
> > > > > a.Check if the pixel format is PreMultiplied
> > > > > 
> > > > > b.If the format is preMultiplied, then>  take up new calculations and
> > > > > use LoadAlphaFrom ## DST ## For4ByteArgb macro that does *Not* cause
> > > > > divide by alpha adjustment
> > > > > 
> > > > > c.If the format is regular Argb, then>  take up loading of colors
> > > > > using standard Load ## DST ## To4ByteArgb
> > > > > 
> > > > > 3.Once the release build was available, Java2D Bench was re-run
> > > > > (using pre-saved options file)
> > > > > 
> > > > > Result from Bench metrics:
> > > > > 
> > > > > a.In most of the test cases, the optimized solution has higher metric :
> > > > > Avg characters/ second for both IntArgb and IntArgb_Pre formats
> > > > > 
> > > > > b.In 6 / total-16 test cases, optimized solution was marginally lower
> > > > > than the metrics of Regular un-optimized algorithm (only for
> > > > > IntArgb_Pre)
> > > > > 
> > > > > c.However, J2DAnalyzer reported that even these 6-test cases were
> > > > > within 10% deviation. Hence the algorithms were categorized to be
> > > > > 'same' in performance.
> > > > > 
> > > > > Suggestion Required
> > > > > 
> > > > > 1.The attached zip file, contains Algorithms.cpp - Which lists down
> > > > > different algorithms profiled so far.
> > > > > 
> > > > > 2.I 've introduced comments within the macro to explain the change.
> > > > > 
> > > > > a.Note: These comments will be removed from the final source code to
> > > > > be checked in.
> > > > > 
> > > > > 3.Kindly review the latest algorithm (for any issues/ bugs) and
> > > > > provide your suggestions.
> > > > > 
> > > > > a.latest algorithm can be easily traced by searching for
> > > > > "LoadOptimized Algorithm v3" within the file.
> > > > > 
> > > > > Thank you for your time in review&  detailed feedback that you get
> > > > > every time.
> > > > > 
> > > > > Every such review improves the quality of code&  the solution
> > > > > 
> > > > > Prahalad N.
> > > > > 
> > > > > *From:* Prahalad Kumar Narayanan
> > > > > *Sent:* Friday, April 01, 2016 5:07 PM
> > > > > *To:* Jim Graham; Sergey Bylokhov; Philip Race
> > > > > *Cc:* Praveen Srivastava
> > > > > *Subject:* RE: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
> > > > > Antialiased text on translucent backgrounds gets bright artifacts
> > > > > 
> > > > > Dear Jim
> > > > > 
> > > > > Good day to you.
> > > > > 
> > > > > Thanks for your suggestions in the reviews that has evolved the fix
> > > > > to the bug.
> > > > > 
> > > > > As guided by you, I measured the performance of Text Rendering with
> > > > > Text Antialiasing Hint on using Java2D Bench Tool.
> > > > > 
> > > > > Solutions Profiled:
> > > > > 
> > > > > We have two solutions - Un optimized solution and Optimized
> > > > > solution modelled as per SRCOVER_MASKFILL
> > > > > 
> > > > > ( Both the solutions were profiled on windows x86_64 with
> > > > > JDK 9-internal Release build )
> > > > > 
> > > > > Test Cases Profiled:
> > > > > 
> > > > > With Font set to : Lucida sans, different combinations of
> > > > > 
> > > > > Font Styles : Plain, Bold, Italic, Bold Italic&&
> > > > > 
> > > > > Font Sizes   : 12, 20, 36, 72 points were tested.
> > > > > 
> > > > > Attached herewith: JDK8015070-Profiling Data.zip
> > > > > 
> > > > > The archive contains:
> > > > > 
> > > > > 1. Algorithms.cpp     : Just to have a quick glance of all the
> > > > > algorithms that we have tried so far.
> > > > > 
> > > > > 2. *.txt Files               : For each test, Java2D bench
> > > > > reports the average metrics/second.
> > > > > 
> > > > > The text file
> > > > > contains collection of all such average metric for nearly 16
> > > > > different test cases.
> > > > > 
> > > > > 3. res Output            : .res output from each of the test runs.
> > > > > 
> > > > > Observation from J2DBench Reports
> > > > > 
> > > > > 1.  I could not get time to run the Analyzer tool across
> > > > > each of these *.res files. I shall do them on Monday.
> > > > > 
> > > > > 2.  From the summary text ( average chars per. Second ) that
> > > > > J2DBench reported,
> > > > > 
> > > > > Un-optimized solution seems to be better for
> > > > > IntArgb pixel format and
> > > > > 
> > > > > Optimized solution is better for IntArgb_Pre
> > > > > pixel format by significant margin.
> > > > > 
> > > > > 3.  I tried to improve the optimized algorithm (based on
> > > > > SRCOVER_MASKFILL ) further by adding a if (dstA) { ...
> > > > > 
> > > > > Though there is a marginal improvement, the
> > > > > optimized solution could not exceed numbers of regular algorithm (for
> > > > > IntArgb pixel format)
> > > > > 
> > > > > This could be due to the extra calculations that
> > > > > we do in-order to load color components separately.
> > > > > 
> > > > > However, for IntArgb_Pre pixel format, the
> > > > > optimized solution is way-ahead and gives better performance.
> > > > > 
> > > > > 4. In the summary reports, you will find
> > > > > CompatibleBufferedImage( Translucent ) as well.
> > > > > 
> > > > > In a separate experiment, I found that the pixel
> > > > > format for compatible buffered image got mapped IntArgb_Pre.
> > > > > 
> > > > > Thus, the performance numbers match with that of
> > > > > IntArgb_Pre pixel format
> > > > > 
> > > > > At the present moment, I 'm confused on the solution would be better
> > > > > to fix the Bug.
> > > > > 
> > > > > Kindly share your views and thoughts
> > > > > 
> > > > > Thank you
> > > > > 
> > > > > Have a good day
> > > > > 
> > > > > Prahalad N.
> > > > > 
> > > > > -----Original Message-----
> > > > > From: Jim Graham
> > > > > Sent: Thursday, March 31, 2016 1:46 AM
> > > > > To: Prahalad Kumar Narayanan; 2d-dev@openjdk.java.net
> > > > > <mailto:2d-dev@openjdk.java.net>; Sergey Bylokhov
> > > > > Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
> > > > > Antialiased text on translucent backgrounds gets bright artifacts
> > > > > 
> > > > > Hi Prahalad,
> > > > > 
> > > > > Benchmarks must run for a significant period of time to be valid.
> > > > > 
> > > > > Measuring with nanoTime() is fine, but the run times should exceed at
> > > > > least a couple of seconds, not a few nanoseconds.  You might want to
> > > > > use Java2DBench instead (in src/demo/share/java2d in the java.desktop
> > > > > repo), which calibrates test times, does multiple runs, and includes
> > > > > an analyzer to compare results from multiple test runs...
> > > > > 
> > > > > ...jim
> > > > > 
> > > > > On 3/30/2016 4:27 AM, Prahalad Kumar Narayanan wrote:
> > > > > 
> > > > > > Hello Jim and Everyone on Java2D Group
> > > > > > Good day to you.
> > > > > > A quick follow-up to Review Request on bug:
> > > > > > Bug          : JDK-8015070
> > > > > > Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070
> > > > > > Thank you Jim for the detailed feedback.
> > > > > > It takes a lot of time not only in performing the review, but also in \
> > > > > > getting the feedback with clear words. In each review, the solution \
> > > > > > definitely gets better&  better. I 'm happy about it...! :)
> > > > > > Quick Inferences from previous feedback:
> > > > > > Incorporating the fast path into current logic:
> > > > > > 1.  I agree with you on this point and I noticed this when we modelled \
> > > > > > the solution as per the mask fill. 2.  I ignored it initially for two \
> > > > > > reasons, a.  The statement - if (resA != MaxValFor4ByteArgb)...  follows \
> > > > > > srcColor pre-multiplication step and this will ensure to skip most of the \
> > > > > > blending calculations pertaining to destination. b.  Any addition / \
> > > > > > tweaks to current logic, might deviate from the SRCOVER_MASKFILL \
> > > > > > reference model. Many a time, managing code with similar logic across \
> > > > > > implementation helps a lot. 3.   As you said, including the fast path \
> > > > > > will avoid few multiplications and if checks too. The changes are \
> > > > > > available in the current webrev. Link:
> > > > > > http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/
> > > > > > Profiling method, and Metrics:
> > > > > > 1.  The profiling method that was done yesterday was mere
> > > > > > execution of the regression test (available in the webrev) and time
> > > > > > measured with System.currentTimeMillis API
> > > > > > Since only one iteration was run, the time soared into milli seconds. \
> > > > > > This could be due to internal glyph caching mechanism. 2.  Today, I \
> > > > > > evolved the regression test, into a benchmark that does the following: a. \
> > > > > > Select Font style : {Plain, Bold, Italic, Bold Italic}
> > > > > > b.  Select Font size : {20, 40, 60, 80}
> > > > > > c.  Trigger drawstring once before benchmark is run. This is to ensure, \
> > > > > > the font subsystem is done with glyph caching internally. d.  Use famous \
> > > > > > string that has all characters-" The quick brown fox jumps over the lazy \
> > > > > > dog. 0123456789. " e.  For every style-size combination - run the test
> > > > > > for 20 iterations and take the average. (Note: Font is fixed to
> > > > > > 'verdana' )
> > > > > > f.  Modify the precision from System.currentTimeMillis to \
> > > > > > System.nanoTime() and reduce elapsedTime to micro seconds. 3.  With the \
> > > > > > above setup in code, my observation on windows is as follows: a.  In many \
> > > > > > cases, The optimized logic is either equal-to (or) better in performance \
> > > > > > than the un-optimized logic. The difference is very minimal - few tens to \
> > > > > > few hundreds of micro-seconds. b.  The optimized algorithm improves \
> > > > > > performance for Pre-multiplied alpha based destination buffer. c.   There \
> > > > > > are occasional huge deviations where optimized logic seems to take longer \
> > > > > > time. These could be due to external factors
> > > > > > like- stalls for memory, bus io etc.,
> > > > > > Since, the deviation is in micro seconds, I believe, it may not be a \
> > > > > > concern. d.  The complete list of time measurement taken up
> > > > > > on windows x86_64 release build is as-follows-
> > > > > > I 'm not sure, how the data appears in the final webrev-email.
> > > > > > Kindly excuse, if the data is un-readable.
> > > > > > Platform : Windows x86_64 Release Build Algorithm : Unoptimized.
> > > > > > webrev.00
> > > > > > ````````````````````````````````````````````````````````````````````
> > > > > > `
> > > > > > `
> > > > > > ```` Executing Bench For Image Type: IntArgb
> > > > > > -Font Style: Plain /Font Size: 20 /Time consumed (us): 84.742
> > > > > > -Font Style: Plain /Font Size: 40 /Time consumed (us): 318.395
> > > > > > -Font Style: Plain /Font Size: 60 /Time consumed (us): 657.474
> > > > > > -Font Style: Plain /Font Size: 80 /Time consumed (us): 813.079
> > > > > > -Font Style: Bold /Font Size: 20 /Time consumed (us): 386.380
> > > > > > -Font Style: Bold /Font Size: 40 /Time consumed (us): 339.301
> > > > > > -Font Style: Bold /Font Size: 60 /Time consumed (us): 492.631
> > > > > > -Font Style: Bold /Font Size: 80 /Time consumed (us): 625.812
> > > > > > -Font Style: Italic /Font Size: 20 /Time consumed (us): 235.059
> > > > > > -Font Style: Italic /Font Size: 40 /Time consumed (us): 320.180
> > > > > > -Font Style: Italic /Font Size: 60 /Time consumed (us): 474.558
> > > > > > -Font Style: Italic /Font Size: 80 /Time consumed (us): 1188.169
> > > > > > -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
> > > > > > 426.988
> > > > > > -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
> > > > > > 374.064
> > > > > > -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
> > > > > > 732.375
> > > > > > -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
> > > > > > 864.68
> > > > > > Executing Bench For Image Type: IntArgb_Pre
> > > > > > -Font Style: Plain /Font Size: 20 /Time consumed (us): 129.768
> > > > > > -Font Style: Plain /Font Size: 40 /Time consumed (us): 206.299
> > > > > > -Font Style: Plain /Font Size: 60 /Time consumed (us): 249.941
> > > > > > -Font Style: Plain /Font Size: 80 /Time consumed (us): 362.372
> > > > > > -Font Style: Bold /Font Size: 20 /Time consumed (us): 145.096
> > > > > > -Font Style: Bold /Font Size: 40 /Time consumed (us): 151.589
> > > > > > -Font Style: Bold /Font Size: 60 /Time consumed (us): 240.972
> > > > > > -Font Style: Bold /Font Size: 80 /Time consumed (us): 331.894
> > > > > > -Font Style: Italic /Font Size: 20 /Time consumed (us): 95.028
> > > > > > -Font Style: Italic /Font Size: 40 /Time consumed (us): 245.300
> > > > > > -Font Style: Italic /Font Size: 60 /Time consumed (us): 270.379
> > > > > > -Font Style: Italic /Font Size: 80 /Time consumed (us): 398.139
> > > > > > -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
> > > > > > 93.243
> > > > > > -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
> > > > > > 475.406
> > > > > > -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
> > > > > > 280.085
> > > > > > -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
> > > > > > 357.486
> > > > > > Platform : Windows x86_64 Release Build Algorithm : Optimized.
> > > > > > webrev.03
> > > > > > ````````````````````````````````````````````````````````````````````
> > > > > > `
> > > > > > `
> > > > > > ```` Executing Bench For Image Type: IntArgb
> > > > > > -Font Style: Plain /Font Size: 20 /Time consumed (us): 120.954
> > > > > > -Font Style: Plain /Font Size: 40 /Time consumed (us): 364.871
> > > > > > -Font Style: Plain /Font Size: 60 /Time consumed (us): 561.799
> > > > > > -Font Style: Plain /Font Size: 80 /Time consumed (us): 653.390
> > > > > > -Font Style: Bold /Font Size: 20 /Time consumed (us): 261.566
> > > > > > -Font Style: Bold /Font Size: 40 /Time consumed (us): 311.054
> > > > > > -Font Style: Bold /Font Size: 60 /Time consumed (us): 490.735
> > > > > > -Font Style: Bold /Font Size: 80 /Time consumed (us): 656.559
> > > > > > -Font Style: Italic /Font Size: 20 /Time consumed (us): 314.289
> > > > > > -Font Style: Italic /Font Size: 40 /Time consumed (us): 378.750
> > > > > > -Font Style: Italic /Font Size: 60 /Time consumed (us): 491.181
> > > > > > -Font Style: Italic /Font Size: 80 /Time consumed (us): 770.172
> > > > > > -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
> > > > > > 375.336
> > > > > > -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
> > > > > > 571.371
> > > > > > -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
> > > > > > 548.300
> > > > > > -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
> > > > > > 714.526
> > > > > > Executing Bench For Image Type: IntArgb_Pre
> > > > > > -Font Style: Plain /Font Size: 20 /Time consumed (us): 45.026
> > > > > > -Font Style: Plain /Font Size: 40 /Time consumed (us): 219.016
> > > > > > -Font Style: Plain /Font Size: 60 /Time consumed (us): 279.617
> > > > > > -Font Style: Plain /Font Size: 80 /Time consumed (us): 282.829
> > > > > > -Font Style: Bold /Font Size: 20 /Time consumed (us): 51.809
> > > > > > -Font Style: Bold /Font Size: 40 /Time consumed (us): 117.563
> > > > > > -Font Style: Bold /Font Size: 60 /Time consumed (us): 508.049
> > > > > > -Font Style: Bold /Font Size: 80 /Time consumed (us): 402.802
> > > > > > -Font Style: Italic /Font Size: 20 /Time consumed (us): 79.320
> > > > > > -Font Style: Italic /Font Size: 40 /Time consumed (us): 227.473
> > > > > > -Font Style: Italic /Font Size: 60 /Time consumed (us): 330.488
> > > > > > -Font Style: Italic /Font Size: 80 /Time consumed (us): 353.782
> > > > > > -Font Style: Bold-Italic /Font Size: 20 /Time consumed (us):
> > > > > > 54.687
> > > > > > -Font Style: Bold-Italic /Font Size: 40 /Time consumed (us):
> > > > > > 235.505
> > > > > > -Font Style: Bold-Italic /Font Size: 60 /Time consumed (us):
> > > > > > 227.205
> > > > > > -Font Style: Bold-Italic /Font Size: 80 /Time consumed (us):
> > > > > > 324.308
> > > > > > Updated webrev with changes for the fast-path :
> > > > > > http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.03/
> > > > > > Kindly review and provide your suggestions.
> > > > > > Thank you once again for detailed review and feedback Have a good
> > > > > > day
> > > > > > Prahalad N.
> > > > > > -----Original Message-----
> > > > > > From: Jim Graham
> > > > > > Sent: Wednesday, March 30, 2016 2:46 AM
> > > > > > To: Prahalad Kumar Narayanan;2d-dev@openjdk.java.net
> > > > > > <mailto:2d-dev@openjdk.java.net>; Sergey Bylokhov
> > > > > > Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
> > > > > > Antialiased text on translucent backgrounds gets bright artifacts
> > > > > > Hi Prahalad,
> > > > > > This latest version looks like it should produce correct answers.
> > > > > > I'd like to see benchmark results on more platforms, particularly Windows \
> > > > > > since the different compilers may optimize these things differently. \
> > > > > > Also, you didn't mention what data set you used for benchmarking. I'd
> > > > > > like to see benchmark results for small, medium and large font
> > > > > > sizes,
> > > > > > and possibly bold and italic fonts as well.  The reason is that the
> > > > > > relative ratios of "empty glyph pixels" to "partial glyph pixels" to
> > > > > > "fully covered glyph pixels" changes depending on the font type and
> > > > > > size so if you made one of those faster and another slower then the
> > > > > > results may be seen as a gain in one type of test if you only test
> > > > > > one
> > > > > > font type and size and it happens to match the part of the code that
> > > > > > is more streamlined.  Also, for small font sizes the per-glyph
> > > > > > overhead might hide per-pixel issues.  Please share which fonts and
> > > > > > sizes you used for testing and consider using some different sizes,
> > > > > > including something very large like 36 or 48 points (something with
> > > > > > large areas of opaque
> > > > > > pixels) as well as more normal sizes that appear in GUIs.  Also, bold \
> > > > > > fonts can have a higher percentage of opaque pixels. In particular...
> > > > > > This latest version is missing the "fast path" from the existing code for \
> > > > > > the case of srcA == 255 and glyphA == 255 where it just stores the \
> > > > > > FG_PIXEL values directly.  For large fonts/glyphs that case may be as \
> > > > > > important as detecting empty glyph pixels. On the other hand, an \
> > > > > > additional "if" may cause the compiler to generate less efficient code as \
> > > > > > per Sergey's concern.  Since this "if" eliminates some multiplies and \
> > > > > > possibly some divides, I'd hope it would be a win-win. You could add the \
> > > > > > fast path back inside the case where mixValSrc is 255 and just test srcA \
> > > > > > for 255 and do the Store ## DST ## PixelData() macro that used to be at \
> > > > > > the end of the block in the existing code, and then use "break;" to \
> > > > > > escape out of the do/while surrounding the whole macro so it moves on to \
> > > > > > the next pixel. (Arguably, we might want to consider teaching our \
> > > > > > SRCOVER_MASKFILL to
> > > > > > do the same thing.  I think that was one of the added values of
> > > > > > having
> > > > > > a separate GLYPH loop, but really both should be optimizing that
> > > > > > case...)
> > > > > > I can see now that the macro switch to use the same macro set as \
> > > > > > SRCOVER_MASKFILL required you to compute the pixel address, as you noted \
> > > > > > in your summary.  It makes the new macro more cumbersome and makes it \
> > > > > > look like it's doing a bit more work per-pixel, but in reality I think \
> > > > > > the overall operations end up being the same as long as the compiler \
> > > > > > optimizes the deliberate multiplications the same way it did for implicit \
> > > > > > multiplications in the "pRas[foo]" and "pRas[foo*4]" code that was being \
> > > > > >                 generated previously.  Benchmarks will tell us if we're \
> > > > > >                 good there...
> > > > > > ...jim
> > > > > > On 3/28/16 5:33 AM, Prahalad Kumar Narayanan wrote:
> > > > > > > Hello Everyone on Java2D Group
> > > > > > > Good day to you.
> > > > > > > This is a follow-up to Review Request on bug:
> > > > > > > Bug          : JDK-8015070
> > > > > > > Bug Link :https://bugs.openjdk.java.net/browse/JDK-8015070
> > > > > > > First, Thanks to Jim and Sergey for their feedback on the changes so \
> > > > > > > far. Inferences from Jim 's Feedback on Loading destination colors:
> > > > > > > 1.  The API or Macro that I had earlier used to Load DST colors, \
> > > > > > > indeed, adjusted destination color components with divide-by-alpha if \
> > > > > > > destination was already pre-multiplied. My apologies.. I should have \
> > > > > > > spotted this error at the first iteration itself. 2.  Due to the \
> > > > > > > divide-by-alpha adjustment, the remaining logic would become incorrect. \
> > > > > > > ( Especially, the multiplication with dstF based on pre-mulitplication \
> > > > > > > status ) 3.  Correct API is being used now, and the dstColor components \
> > > > > > > are loaded directly without any divide-by-alpha adjustment. Inferences \
> > > > > > > from Sergey's Feedback on Performance. 1.  To set the context for \
> > > > > > > everyone, the logic present in the current webrev.02 is modelled as per \
> > > > > > > SRCOVER_MASKFILL. There are multiple if (...) conditions that remove \
> > > > > > > un-necessary blending calculations. Ideally this should improve \
> > > > > > > performance. However, since some data are not readily available (as \
> > > > > > > present in SRCOVER_MASKFILL), few additional calculations have been \
> > > > > > > added. They are: pre-multiplying srcColor with alpha and assigning to \
> > > > > > > res. Finding the correct address of Pixel using DST_PTR and \
> > > > > > > PixelStride. Henceforth, as Sergey suggests, Observation on performance \
> > > > > > > will be beneficial. 2.  The performance of the new logic was measured \
> > > > > > > with linux-x86_64-normal-server-release config and compared with the
> > > > > > > logic used in un-optimized code in webrev.00
> > > > > > > 3.  Result: The newer logic provides a fractional gain (about 15 - 20 \
> > > > > > > ms) over the older logic. Other Subtle Changes:
> > > > > > > 1.  The test file has been renamed from
> > > > > > > AADrawStringArtifact.java to more meaningful -
> > > > > > > AntialiasedTextArtifact.java
> > > > > > > 2.  The test file tests for both TYPE_INT_ARGB and TYPE_INT_ARGB_PRE \
> > > > > > > BufferedImage formats. The code has been well commented to explain the \
> > > > > > > logic used in every function. Kindly take your time to review the \
> > > > > > > changes in the webrev link mentioned below and provide your \
> > > > > > > suggestions. Webrev Link:
> > > > > > > http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.02/
> > > > > > > Thank you for your time in review
> > > > > > > Have a good day
> > > > > > > Prahalad N.
> > > > > > > -----Original Message-----
> > > > > > > From: Jim Graham
> > > > > > > Sent: Thursday, March 24, 2016 7:57 AM
> > > > > > > To: Prahalad Kumar Narayanan;2d-dev@openjdk.java.net
> > > > > > > <mailto:2d-dev@openjdk.java.net>
> > > > > > > Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
> > > > > > > Antialiased text on translucent backgrounds gets bright artifacts
> > > > > > > Hi Prahalad,
> > > > > > > (On a side note - ouch!  I came up with these macros in the first
> > > > > > > place, but 20 years later I'm now realizing just how hard they are
> > > > > > > to
> > > > > > > navigate and review.  My apologies...)
> > > > > > > The macro you are still using to load the destination data is one that \
> > > > > > > loads it to non-premultiplied components, which means it will divide \
> > > > > > > out the alpha if the destination is PRE.  The rest of the logic assumes \
> > > > > > > that the components were loaded without any adjustment of their \
> > > > > > > premultiplication so not only is that division an unnecessary \
> > > > > > > operation, it makes the following math wrong. The SRCOVER_MASKFILL \
> > > > > > > macro seems to use "Postload ## STRATEGY ## From ## TYPE" which seems \
> > > > > > > to load them into separate components without any adjustment of their \
> > > > > > > pre-multiplication status.  This is paired with "LoadAlphaFrom ## TYPE \
> > > > > > > ## For ## STRATEGY" to load just the destination alpha for computing \
> > > > > > >                 dstF...
> > > > > > > ...jim
> > > > > > > On 3/22/16 4:35 AM, Prahalad Kumar Narayanan wrote:
> > > > > > > > Hello Everyone on Java2D Group
> > > > > > > > Good day to you.
> > > > > > > > This is a Follow-up to Review Request on the bug:
> > > > > > > > Bug          : JDK-8015070   Anti-aliased Text on Translucent \
> > > > > > > > background gets bright artifacts Bug Link \
> > > > > > > > :https://bugs.openjdk.java.net/browse/JDK-8015070 First, Sincere \
> > > > > > > > thanks to Jim for his valuable feedback. 1.  As Jim correctly \
> > > > > > > > mentioned, SRCOVER_MASKFILL has a similar logic to blend two \
> > > > > > > > Translucent colors based on an Alpha mask. 2.  The calculations are \
> > > > > > > > exactly the same as the changes in previous webrev. However the logic \
> > > > > > > > of SRCOVER_MASKFILL is 'Optimized' to reduce the number of \
> > > > > > > > computations. 3.  This optimization is definitely required because, \
> > > > > > > >                 the logic is executed for every single pixel in a \
> > > > > > > >                 glyph.
> > > > > > > > Example: If a string is made up of 5 English
> > > > > > > > characters with each character having 32 x 32 pixels,
> > > > > > > > The anti-aliasing logic will be executed for 5 x 32 x 32 iterations.
> > > > > > > > Reducing computation per pixel will imply a huge benefit for complete \
> > > > > > > > drawString operation. Observation from SRCOVER_MASKFILL
> > > > > > > > 1.  The mask fill reduces computations by through multiple if(...) \
> > > > > > > > conditions. Each if condition affirms whether the next set of \
> > > > > > > > computations are required. 2.  Optimization 1: If mask value is 0- \
> > > > > > > > skip entire logic. 3.  Optimization 2: If mask value is maximum, say \
> > > > > > > > 255, take srcA directly without multiplying with maskAlpha ( Reason: \
> > > > > > > > 1 * srcAlpha = srcAlpha )
> > > > > > > > 4.  Optimization 3: Compute pre-multiplied resColor in two steps. \
> > > > > > > > First with srcColor and then with dstColor. If the resAlpha from 1st \
> > > > > > > > step (i.e., srcColor) is fully opaque, avoid blending dstColor \
> > > > > > > > altogether. Changes in Current Webrev.01
> > > > > > > > 1.  The fix for the current bug is modelled based on \
> > > > > > > > SRCOVER_MASKFILL. 2.  The changes have been verified to work on \
> > > > > > > > windows, linux and mac operating systems. 3.  The automated Test \
> > > > > > > > file- AADrawStringArtifact.java runs as expected
> > > > > > > > Identifies artifact&  throws exception when run on JDK 7 and 8.
> > > > > > > > With JDK9, the test file returns without error.
> > > > > > > > 3.  JPRT build has been run to ensure, changes build on all supported \
> > > > > > > > platforms. JPRT job link :
> > > > > > > > http://scaaa637.us.oracle.com//archive/2016/03/2016-03-22-070604.p
> > > > > > > > ra
> > > > > <http://scaaa637.us.oracle.com/archive/2016/03/2016-03-22-070604.pra>
> > > > > 
> > > > > > > > h
> > > > > > > > n
> > > > > > > > ara-linux.client
> > > > > > > > Kindly review the changes in the below mentioned link and provide
> > > > > > > > your views
> > > > > > > > Webrev Link :
> > > > > > > > http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.01/
> > > > > > > > Thank you for your time in review
> > > > > > > > Have a good day
> > > > > > > > Prahalad N.
> > > > > > > > -----Original Message-----
> > > > > > > > From: Jim Graham
> > > > > > > > Sent: Friday, March 18, 2016 6:07 AM
> > > > > > > > To: Prahalad Kumar Narayanan;2d-dev@openjdk.java.net
> > > > > > > > <mailto:2d-dev@openjdk.java.net>
> > > > > > > > Subject: Re: [OpenJDK 2D-Dev] [2D-Dev] Review Request: JDK-8015070:
> > > > > > > > Antialiased text on translucent backgrounds gets bright artifacts
> > > > > > > > Hi Prahalad,
> > > > > > > > This basically boils down to "alpha blending math needs to be \
> > > > > > > > performed in premultiplied form" and the existing code was not doing \
> > > > > > > > that. Your changes do add a pre-multiplication step to the math in \
> > > > > > > > two places
> > > > > > > > - when you mix the src alpha and the glyph alpha at the top of the \
> > > > > > > > macro, and again when you do the Multiply(dst, dstA, dst) step in the \
> > > > > > > > middle.  In that sense, the new math is correct. However, it is not \
> > > > > > > > the optimal way to implement this.  In particular, the macro used \
> > > > > > > > here to load the data from the destination is the one that loads it \
> > > > > > > > into 4 ARGB non-premultiplied values.  If the destination is non-PRE, \
> > > > > > > > then your added multiply step is exactly what is needed, but it could \
> > > > > > > > be combined with the multiply that will happen later in the blending \
> > > > > > > > equation, so it is an added step rather than a fixed fraction in the \
> > > > > > > > existing MultMultAdd parameters. Additionally, if the destination is \
> > > > > > > > already PRE, then the macro being used to load the dst pixel data \
> > > > > > > > there will have done a divide
> > > > > > > > step to divide out the alpha from the destination, only to have
> > > > > > > > you
> > > > > > > > reverse that math with your new
> > > > > > > > Multiply() step.  That's a lot of math to end up with a NOP.
> > > > > > > > The MUL8 you added for the srcA and glyph value is needed, that \
> > > > > > > > change is good.  Since it is common for glyph pixels to have zero \
> > > > > > > > alpha, you might want to test the glyph alpha for 0 and skip the \
> > > > > > > > pixel before you do the multiply, though.  This would add one more \
> > > > > > > > if, but it would be a common case. The trickier part is to load the \
> > > > > > > > destination components without un-premultiplying them.  Unfortunately \
> > > > > > > > there is no "Load...ToArgbPre" macro to parallel the Load macro used \
> > > > > > > > in the function.  Perhaps there should be, but you'd still end up \
> > > > > > > > with an extra multiply step as I mentioned above because you can fold \
> > > > > > > > the premultiplication of the dst data into the MultMultAdd by \
> > > > > > > > carefully choosing the parameters you use in the math there. The good \
> > > > > > > > news is that the way that the SRCOVER_MASKFILL uses the various \
> > > > > > > > type-specific macros works around this a bit and minimizes the number \
> > > > > > > > of multiplies that happen.  You could check out \
> > > > > > > > DEFINE_SRCOVER_MASKFILL and see how it works in the case where pMask \
> > > > > > > > is not null (pMask is an alpha mask with values very similar to the \
> > > > > > > > glyphAA data).  Modeling this code on that code would correct the \
> > > > > > > >                 math and minimize it as well...
> > > > > > > > ...jim
> > > > > > > > On 3/17/16 3:00 AM, Prahalad Kumar Narayanan wrote:
> > > > > > > > > Hello Everyone on Java2D Group
> > > > > > > > > Good day to you.
> > > > > > > > > Herewith, I 'm sharing the webrev for two identical Java2D Bugs.
> > > > > > > > > Bug ID : JDK-8015070
> > > > > > > > > Title     : Antialiased text on translucent backgrounds gets
> > > > > > > > > bright artifacts
> > > > > > > > > Link      :https://bugs.openjdk.java.net/browse/JDK-8015070
> > > > > > > > > Bug ID : JDK-8013564 ( currently closed as duplicate )
> > > > > > > > > Title     : Font rendering anti-aliasing in translucent
> > > > > > > > > BufferedImages broken
> > > > > > > > > Link      :https://bugs.openjdk.java.net/browse/JDK-8013564
> > > > > > > > > Webrev Link :
> > > > > > > > > http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
> > > > > > > > > Quick Summary on Bugs :
> > > > > > > > > ````````````````````````````````````````````````
> > > > > > > > > 1.  Artifacts appear on Edges of text characters when
> > > > > > > > > anti-aliased
> > > > > > > > > text is drawn on Translucent background
> > > > > > > > > 2.  The issue is reproducible on all platforms - windows, linux and \
> > > > > > > > > mac os. 3.  Besides, the issue is reproduced with the commonly used \
> > > > > > > > > pixel format- 4ByteArgb.
> > > > > > > > > Root Cause&  Solution :
> > > > > > > > > ````````````````````````````````````````````````
> > > > > > > > > 1.  The Macro: GlyphListAABlend4ByteArgb in File: LoopMacros.h
> > > > > > > > > uses
> > > > > > > > > the standard blending algorithm
> > > > > > > > > dstColor = [ srcColor * glyphAlpha + dstColor * (1 -
> > > > > > > > > glyphAlpha) ] / dstAlpha
> > > > > > > > > 2.  The above equation works only when the srcColor and dstColor \
> > > > > > > > > are Opaque. 3.  When srcColor and dstColor are Translucent, their
> > > > > > > > > alphaComponent will influence the visibility of the color, and
> > > > > > > > > visibility of the color below.
> > > > > > > > > 4.  The new set of calculations for blending Translucent source
> > > > > > > > > and
> > > > > > > > > destination colors is given as
> > > > > > > > > resAlpha = 1 - ((1 - srcAlpha) * (1 - dstAlpha))
> > > > > > > > > resColor = [ srcColor * srcAlpha + dstColor *
> > > > > > > > > dstAlpha
> > > > > > > > > *
> > > > > > > > > (1
> > > > > > > > > -
> > > > > > > > > srcAlpha) ] / resAlpha
> > > > > > > > > 5.  Reference text for the equation:
> > > > > > > > > https://en.wikipedia.org/wiki/Alpha_compositing
> > > > > > > > > 6.  With the above modification to the blending logic, the
> > > > > > > > > artifacts do not appear and issues are fixed.
> > > > > > > > > Jtreg&  Jprt Results :
> > > > > > > > > ````````````````````````````````````````````````
> > > > > > > > > 1.  A simple test-file: AADrawStringArtifact.java has been
> > > > > > > > > created
> > > > > > > > > to be a part of jtreg test cases.
> > > > > > > > > The test file is well commented to explain - nature
> > > > > > > > > of
> > > > > > > > > artifact and how the test tries to identify them.
> > > > > > > > > As required, the test case fails with Jdk 7, Jdk 8
> > > > > > > > > and
> > > > > > > > > succeeds with Jdk 9-internal (built with changes for the bug fix)
> > > > > > > > > 2.  The changes to blending logic lies within
> > > > > > > > > java.desktop/src/share/native/...
> > > > > > > > > Henceforth, JPRT was used to ensure successful build
> > > > > > > > > across all supported platforms
> > > > > > > > > Jprt Job Link :
> > > > > > > > > http://scaaa637.us.oracle.com//archive/2016/03/2016-03-17-072001.
> > > > > > > > > pr
> > > > > <http://scaaa637.us.oracle.com/archive/2016/03/2016-03-17-072001.pr>
> > > > > 
> > > > > > > > > a
> > > > > > > > > h
> > > > > > > > > n
> > > > > > > > > ara-linux.client
> > > > > > > > > The build succeeds on all platforms.
> > > > > > > > > Kindly review the webrev link and provide your views and \
> > > > > > > > > suggestions. Webrev Link :
> > > > > > > > > http://cr.openjdk.java.net/~psadhukhan/prahlad/8015070/webrev.00/
> > > > > > > > > If the changes look good, we could take up the changes for source \
> > > > > > > > > checkin. Thank you for your time in review
> > > > > > > > > Have a good day
> > > > > > > > > Prahalad N.


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic