List: avro-dev
Subject: [jira] [Created] (AVRO-3834) [Python] Incorrect decimal encoding/decoding
From: "Steve Stagg (Jira)" <jira () apache ! org>
Date: 2023-08-17 9:16:00
Message-ID: JIRA.13547698.1692263726000.11300.1692263760084 () Atlassian ! JIRA
Steve Stagg created AVRO-3834:
---------------------------------
Summary: [Python] Incorrect decimal encoding/decoding
Key: AVRO-3834
URL: https://issues.apache.org/jira/browse/AVRO-3834
Project: Apache Avro
Issue Type: Bug
Components: logical types, python
Affects Versions: 1.11.2
Environment: Python 3.10.3, Avro 1.11.2
Reporter: Steve Stagg
When encoding `decimal.Decimal` values with the Python avro library, the exponent of the value is largely ignored.
This means that incorrect two's-complement values are calculated, and incorrect Avro data is produced.
Here's a reasonably compact reproducer:
```python
import avro
import avro.io
from decimal import Decimal
from io import BytesIO
TESTS = [
    '314',
    '31',
    '3',
    '3.1',
    '31.4',
    '3.14',
    '3.141',
    '3.1415',
]

if __name__ == '__main__':
    schema_text = '''{
        "type": "bytes",
        "logicalType": "decimal",
        "precision": 8,
        "scale": 4
    }'''
    print(f"AVRO VERSION: {avro.__version__}")
    schema = avro.schema.parse(schema_text)
    writer = avro.io.DatumWriter(schema)
    reader = avro.io.DatumReader(schema)
    for val in TESTS:
        buf = BytesIO()
        val = Decimal(val)
        # Round-trip the value through the binary encoder and decoder.
        writer.write(val, avro.io.BinaryEncoder(buf))
        buf.seek(0)
        decoded_val = reader.read(avro.io.BinaryDecoder(buf))
        match = val == decoded_val
        result = 'PASS' if match else 'FAIL'
        print(f'Encoded: {val} -> {buf.getvalue()} -> {decoded_val} {result}')
```
Which outputs:
```
AVRO VERSION: 1.11.2
Encoded: 314 -> b'\x04\x01:' -> 0.0314 FAIL
Encoded: 31 -> b'\x02\x1f' -> 0.0031 FAIL
Encoded: 3 -> b'\x02\x03' -> 0.0003 FAIL
Encoded: 3.1 -> b'\x02\x1f' -> 0.0031 FAIL
Encoded: 31.4 -> b'\x04\x01:' -> 0.0314 FAIL
Encoded: 3.14 -> b'\x04\x01:' -> 0.0314 FAIL
Encoded: 3.141 -> b'\x04\x0cE' -> 0.3141 FAIL
Encoded: 3.1415 -> b'\x04z\xb7' -> 3.1415 PASS
```
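To make the output easier to read, here is a small sketch that interprets the payload bytes by hand. It assumes the first byte of each encoded value is the zigzag-encoded length of the bytes field (`\x04` for 2 bytes, `\x02` for 1 byte) and that the rest is the big-endian two's-complement unscaled integer. The helper name `payload_to_decimal` is made up for this illustration and is not part of the avro library.

```python
from decimal import Decimal

SCALE = 4  # matches "scale": 4 in the reproducer's schema


def payload_to_decimal(payload: bytes, scale: int = SCALE) -> Decimal:
    """Interpret payload as a big-endian two's-complement unscaled integer."""
    unscaled = int.from_bytes(payload, "big", signed=True)
    # The decoded value is unscaled * 10**-scale.
    return Decimal(unscaled).scaleb(-scale)


# Payloads copied from the output above, with the leading length byte removed.
samples = {"314": b"\x01:", "3.141": b"\x0cE", "3.1415": b"z\xb7"}

for text, payload in samples.items():
    unscaled = int.from_bytes(payload, "big", signed=True)
    print(f"{text}: unscaled={unscaled} decoded={payload_to_decimal(payload)}")
```

In each case the unscaled integer is just the input's digits (314, 3141, 31415), so only the input that already has four fractional digits decodes correctly.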
The problem is that the code here:
https://github.com/apache/avro/blob/5bd2bc7a492a611382cddc5db3b5bf0b1b7d2b83/lang/py/avro/io.py#L468
does not use `exp` to shift the digits; `exp` is only checked to ensure it is not greater than the scale, for validation purposes.
In the output above, the Avro bytes produced for '31.4' and '3.14' are identical, because `exp` is ignored.
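As a rough sketch of what taking the exponent into account could look like (not the library's actual code and not a proposed patch), the digits can be shifted by `exp + scale` before the two's-complement conversion. The function names below are hypothetical, and the sketch covers only the payload bytes, not the length prefix or any precision check.

```python
from decimal import Decimal


def decimal_to_payload(value: Decimal, scale: int) -> bytes:
    """Encode value as a big-endian two's-complement unscaled integer."""
    sign, digits, exp = value.as_tuple()
    if not isinstance(exp, int):
        raise ValueError("NaN and Infinity cannot be encoded")
    shift = exp + scale
    if shift < 0:
        raise ValueError("value has more fractional digits than the scale")
    # Shift the digits so the integer represents value * 10**scale.
    unscaled = int("".join(map(str, digits))) * 10 ** shift
    if sign:
        unscaled = -unscaled
    # Enough bytes to hold the sign bit; for a few negative values this is
    # one byte longer than strictly necessary, which still decodes correctly.
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, "big", signed=True)


def payload_to_value(payload: bytes, scale: int) -> Decimal:
    """Decode the payload back into a Decimal with the given scale."""
    unscaled = int.from_bytes(payload, "big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))


for text in ["314", "31", "3", "3.1", "31.4", "3.14", "3.141", "3.1415"]:
    value = Decimal(text)
    decoded = payload_to_value(decimal_to_payload(value, 4), 4)
    print(f"{text} -> {decoded} {'PASS' if value == decoded else 'FAIL'}")
```

With the shift applied, '31.4' and '3.14' produce different bytes, and every value from the reproducer round-trips under scale 4.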