'[jira] [Created] (AVRO-3834) [Python] Incorrect decimal encoding/decoding'

List:       avro-dev
Subject:    [jira] [Created] (AVRO-3834) [Python] Incorrect decimal encoding/decoding
From:       "Steve Stagg (Jira)" <jira () apache ! org>
Date:       2023-08-17 9:16:00
Message-ID: JIRA.13547698.1692263726000.11300.1692263760084 () Atlassian ! JIRA

Steve Stagg created AVRO-3834:
---------------------------------

             Summary: [Python] Incorrect decimal encoding/decoding
                 Key: AVRO-3834
                 URL: https://issues.apache.org/jira/browse/AVRO-3834
             Project: Apache Avro
          Issue Type: Bug
          Components: logical types, python
    Affects Versions: 1.11.2
         Environment: Python 3.10.3, Avro 1.11.2

  
            Reporter: Steve Stagg


When encoding `decimal.Decimal` values using the Python avro library, the exponent of
the value is largely ignored.

This means that incorrect two's-complement values are calculated, and incorrect Avro
data is produced.

Here's a reasonably compact reproducer:

```python
import avro
import avro.io
import avro.schema  # imported explicitly, since avro.schema.parse is used below
from decimal import Decimal
from io import BytesIO

TESTS = [
    '314',
    '31',
    '3',
    '3.1',
    '31.4',
    '3.14',
    '3.141',
    '3.1415',
]

if __name__ == '__main__':
    schema_text = '''{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 8,
  "scale": 4
    }'''
    print(f"AVRO VERSION: {avro.__version__}")
    schema = avro.schema.parse(schema_text)
    writer = avro.io.DatumWriter(schema)
    reader = avro.io.DatumReader(schema)

    for val in TESTS:
        buf = BytesIO()

        val = Decimal(val)
        writer.write(val, avro.io.BinaryEncoder(buf))
        buf.seek(0)
        decoded_val = reader.read(avro.io.BinaryDecoder(buf))
        
        match = val == decoded_val
        result = 'PASS' if match else 'FAIL'
        print(f'Encoded: {val} -> {buf.getvalue()} -> {decoded_val}   {result}')
        
```
Which outputs:
```
AVRO VERSION: 1.11.2
Encoded: 314 -> b'\x04\x01:' -> 0.0314    FAIL
Encoded: 31 -> b'\x02\x1f' -> 0.0031    FAIL
Encoded: 3 -> b'\x02\x03' -> 0.0003    FAIL
Encoded: 3.1 -> b'\x02\x1f' -> 0.0031    FAIL
Encoded: 31.4 -> b'\x04\x01:' -> 0.0314    FAIL
Encoded: 3.14 -> b'\x04\x01:' -> 0.0314    FAIL
Encoded: 3.141 -> b'\x04\x0cE' -> 0.3141    FAIL
Encoded: 3.1415 -> b'\x04z\xb7' -> 3.1415    PASS
```
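
To read the byte strings above: in Avro's binary encoding a `bytes` value is a zigzag-encoded length followed by the payload, so the leading `\x04` means a 2-byte payload and `\x02` a 1-byte payload. The following is a minimal sketch in plain Python (not the avro library's code; `decode_payload` is a hypothetical helper) that interprets a payload as a big-endian two's-complement integer and applies the schema's scale of 4:

```python
from decimal import Decimal


def decode_payload(payload: bytes, scale: int) -> Decimal:
    """Hypothetical helper: read a decimal payload as unscaled * 10**-scale."""
    # The payload is a big-endian two's-complement integer (the unscaled value).
    unscaled = int.from_bytes(payload, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# Payload shared by the '314', '31.4' and '3.14' rows (0x013A == 314).
print(decode_payload(b"\x01:", 4))   # 0.0314
# Payload of the only passing row (0x7AB7 == 31415).
print(decode_payload(b"z\xb7", 4))   # 3.1415
```

The decoder side behaves consistently here; the written payload only matches the input when the input already has exactly four fractional digits.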

The problem is that the code here:
https://github.com/apache/avro/blob/5bd2bc7a492a611382cddc5db3b5bf0b1b7d2b83/lang/py/avro/io.py#L468
does not use `exp` to shift the digits; `exp` is only checked to ensure it's not
greater than the scale, for validation purposes.

If you look at the output, the Avro bytes produced for '31.4' and '3.14' are
identical, because `exp` is ignored.
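
For comparison, here is a rough sketch of what exponent-aware encoding could look like. It is an illustration in plain Python, not a patch for `avro/io.py`; the helper names `unscaled_value` and `to_twos_complement` are made up for this example, and rejecting values with more fractional digits than the scale is an assumption about the desired behaviour.

```python
from decimal import Decimal


def unscaled_value(value: Decimal, scale: int) -> int:
    """Hypothetical helper: return the integer n such that n * 10**-scale == value."""
    # Shift the decimal point right by `scale` places, honouring the exponent.
    shifted = value.scaleb(scale)
    if shifted != shifted.to_integral_value():
        # Assumption: more fractional digits than the scale allows is an error.
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(shifted)


def to_twos_complement(n: int) -> bytes:
    """Hypothetical helper: big-endian two's-complement bytes with room for the sign bit."""
    length = (n.bit_length() + 8) // 8
    return n.to_bytes(length, byteorder="big", signed=True)


for text in ("31.4", "3.14", "3.1415"):
    n = unscaled_value(Decimal(text), 4)
    print(text, n, to_twos_complement(n))
```

With the exponent taken into account, '31.4' and '3.14' map to the unscaled integers 314000 and 31400 respectively, so their payloads no longer collide.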



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

