List:       mozilla-i18n
Subject:    Options for numeric shaping
From:       smontagu () il ! ibm ! com
Date:       2001-02-20 17:05:02

The following is my understanding of the Bidi options for numeric shaping.
I hope it will go some way to answering Frank's questions. I have also
posed some questions about the existing code, hoping for feedback.

I will use the following terminology:

European numerals: numerals in the Unicode range 0030-0039
Hindi numerals: numerals in the Unicode range 0660-0669

This isn't really correct, but talking about "Arabic numerals" and "Arabic
letters" in the same sentence always confuses me.

The option flag IBMBIDI_NUMERAL has 4 possible values:
IBMBIDI_NUMERAL_HINDI: Convert all European numerals to Hindi numerals.
IBMBIDI_NUMERAL_ARABIC: Convert all Hindi numerals to European numerals.
IBMBIDI_NUMERAL_REGULAR == Regular Contextual: if the last alphabetic
character before a European numeral is an Arabic letter, convert the
numeral to a Hindi numeral; if there is no preceding alphabetic character,
make no conversion by default.
IBMBIDI_NUMERAL_HINDICONTEXT == Hindi Contextual: if the last alphabetic
character before a European numeral is an Arabic letter, convert the
numeral to a Hindi numeral; if there is no preceding alphabetic character,
convert to Hindi numerals by default.
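
To make the four options concrete, here is a rough sketch of the behaviour
as I have described it, not the existing Mozilla code. The names
NumeralMode, ShapeNumerals, IsArabicLetter and IsLatinLetter are
hypothetical, and the letter tests are deliberately crude approximations
(basic Arabic letters 0621-064A and ASCII letters only). It leaves Hindi
numerals after non-Arabic text untouched in the contextual modes, which is
only one possible reading.

```cpp
#include <string>

// Hypothetical stand-ins for the four IBMBIDI_NUMERAL_* values.
enum class NumeralMode { Hindi, Arabic, Regular, HindiContext };

static bool IsEuropeanDigit(char16_t c) { return c >= 0x0030 && c <= 0x0039; }
static bool IsHindiDigit(char16_t c)    { return c >= 0x0660 && c <= 0x0669; }

// Assumption: only the basic Arabic letters count as "Arabic letters".
static bool IsArabicLetter(char16_t c) { return c >= 0x0621 && c <= 0x064A; }

// Assumption: only ASCII letters count as non-Arabic alphabetic characters.
static bool IsLatinLetter(char16_t c) {
  return (c >= u'A' && c <= u'Z') || (c >= u'a' && c <= u'z');
}

// The two digit ranges are 0x0630 code points apart.
static char16_t ToHindi(char16_t c) {
  return IsEuropeanDigit(c) ? char16_t(c + 0x0630) : c;
}
static char16_t ToEuropean(char16_t c) {
  return IsHindiDigit(c) ? char16_t(c - 0x0630) : c;
}

std::u16string ShapeNumerals(std::u16string text, NumeralMode mode) {
  // Contextual state: true while the last letter seen was Arabic.
  // The starting value is the only difference between the two
  // contextual modes.
  bool hindiContext = (mode == NumeralMode::HindiContext);
  for (char16_t& c : text) {
    switch (mode) {
      case NumeralMode::Hindi:
        c = ToHindi(c);
        break;
      case NumeralMode::Arabic:
        c = ToEuropean(c);
        break;
      default:  // Regular and HindiContext
        if (IsArabicLetter(c))
          hindiContext = true;
        else if (IsLatinLetter(c))
          hindiContext = false;
        else if (hindiContext)
          c = ToHindi(c);  // no effect on non-digits
        break;
    }
  }
  return text;
}
```

With this sketch, the string "123" on its own stays as it is under
NumeralMode::Regular but becomes Hindi numerals under
NumeralMode::HindiContext.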

I am not clear whether in the "contextual" options, Hindi numerals after
non-Arabic text should be converted to European numerals.

Here is the HandleNumbers method from nsBidiUtilsImp.cpp:

NS_IMETHODIMP nsBidiUtilsImp::HandleNumbers(PRUnichar* aBuffer,
                                            PRUint32 aSize,
                                            PRUint32 aNumFlag)
{
  uint32 i;
  // IBMBIDI_NUMERAL_REGULAR *
  // IBMBIDI_NUMERAL_HINDICONTEXT
  // IBMBIDI_NUMERAL_ARABIC
  // IBMBIDI_NUMERAL_HINDI
//ahmed
  mNumflag=aNumFlag;

  switch (aNumFlag) {
    case IBMBIDI_NUMERAL_HINDI:
      for (i=0;i<aSize;i++)
        nsBidiUtilsImp::NumbersToHindi(&(aBuffer[i]));
      break;
    case IBMBIDI_NUMERAL_ARABIC:
      for (i=0;i<aSize;i++)
        nsBidiUtilsImp::NumbersToArabic(&(aBuffer[i]));
      break;
    default : // IBMBIDI_NUMERAL_REGULAR, IBMBIDI_NUMERAL_HINDICONTEXT
      for (i=0;i<aSize;i++) {
        if (i>0) // not 1st char
          if (IS_ARABIC_CHAR(aBuffer[i-1]))
            nsBidiUtilsImp::NumbersToHindi(&(aBuffer[i]));
          else
            nsBidiUtilsImp::NumbersToArabic(&(aBuffer[i]));
      }
      break;
  }
  return NS_OK;
}

and here is where it is called from nsBidiPresUtils.cpp:

if (IBMBIDI_NUMERAL_HINDI == mBidioptions.mNumeral)
  mUnicodeUtils->HandleNumbers(aText,aTextLength,IBMBIDI_NUMERAL_HINDI);
else if (IBMBIDI_NUMERAL_ARABIC == mBidioptions.mNumeral)
  mUnicodeUtils->HandleNumbers(aText,aTextLength,IBMBIDI_NUMERAL_ARABIC);
else if ( (IBMBIDI_NUMERAL_REGULAR == mBidioptions.mNumeral)
         || (IBMBIDI_NUMERAL_HINDICONTEXT == mBidioptions.mNumeral) ) {
  if (U_EUROPEAN_NUMBER == aTextClass)
    mUnicodeUtils->HandleNumbers(aText,aTextLength,IBMBIDI_NUMERAL_ARABIC);
  else if (U_ARABIC_NUMBER == aTextClass)
    mUnicodeUtils->HandleNumbers(aText,aTextLength,IBMBIDI_NUMERAL_HINDI);
}

N.B. aTextClass is the "resolved text class" after performing the Bidi
algorithm, so a European number after Arabic text will have the text class
U_ARABIC_NUMBER. However, a Hindi number after non-Arabic text also has
the text class U_ARABIC_NUMBER, so in the two "contextual" options Hindi
numbers are never converted to European numbers. As I said above, I'm not
sure if this is correct or not.
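
The effect of the call site can be modelled in isolation. What follows is
a simplified stand-in, not the Mozilla code: NumeralMode, TextClass and
ShapeRun are hypothetical names, and the resolved text class is passed in
by hand rather than computed by the Bidi algorithm.

```cpp
#include <string>

// Hypothetical stand-ins for IBMBIDI_NUMERAL_* and the resolved text class.
enum class NumeralMode { Hindi, Arabic, Regular, HindiContext };
enum class TextClass { EuropeanNumber, ArabicNumber, Other };

// Convert every European digit (0030-0039) in the run to a Hindi digit.
static void AllToHindi(std::u16string& s) {
  for (char16_t& c : s)
    if (c >= 0x0030 && c <= 0x0039) c = char16_t(c + 0x0630);
}

// Convert every Hindi digit (0660-0669) in the run to a European digit.
static void AllToEuropean(std::u16string& s) {
  for (char16_t& c : s)
    if (c >= 0x0660 && c <= 0x0669) c = char16_t(c - 0x0630);
}

// Mirrors the shape of the dispatch quoted above: the two contextual
// modes share one branch and look only at the resolved text class.
std::u16string ShapeRun(std::u16string run, TextClass cls, NumeralMode mode) {
  if (mode == NumeralMode::Hindi) {
    AllToHindi(run);
  } else if (mode == NumeralMode::Arabic) {
    AllToEuropean(run);
  } else {  // Regular or HindiContext
    if (cls == TextClass::EuropeanNumber)
      AllToEuropean(run);
    else if (cls == TextClass::ArabicNumber)
      AllToHindi(run);
  }
  return run;
}
```

In this model a run of Hindi digits always arrives with
TextClass::ArabicNumber, so in either contextual mode it goes through
AllToHindi and comes out unchanged, and the two contextual modes give the
same result for every input.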

I have two questions about all this.
At the moment, there is no difference between IBMBIDI_NUMERAL_REGULAR and
IBMBIDI_NUMERAL_HINDICONTEXT. How could we change this?

Why do we have a method HandleNum as well as HandleNumbers? I see that
HandleNum is called from DoCopy in nsPresShell.cpp when the clipboard
mode is logical, the document is visual, the charset is Arabic, and we are
not on a Bidi system. Why is this necessary?

Best regards,
Simon
