List: python-list
Subject: Re: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
From: "hongy... () gmail ! com" <hongyi ! zhao () gmail ! com>
Date: 2021-09-30 3:53:44
Message-ID: aa18ac06-02d2-4b26-976a-0f38ab8b5c6cn () googlegroups ! com
On Thursday, September 30, 2021 at 9:20:37 AM UTC+8, hongy...@gmail.com wrote:
> On Thursday, September 30, 2021 at 5:20:04 AM UTC+8, Peter J. Holzer wrote:
> > On 2021-09-29 01:22:03 -0700, hongy...@gmail.com wrote:
> > > I tried to convert a xls file into csv with the following command, but failed:
> > >
> > > $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
> > > XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
> > > The above testing file is located at here [1].
> > >
> > > [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls
> > Why is that file name .xls when it's obviously an HTML file?
> Good catch! Thank you for pointing this out. This file is automatically exported
> from my university's teaching management system, and it was assigned the .xls
> extension by default.
Following the comment above, I changed the file extension to .html, and the
following Python code does the trick:
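
    import sys
    import pandas as pd

    if len(sys.argv) != 2:
        print('Usage: ' + sys.argv[0] + ' input-file')
        sys.exit(1)

    # read_html returns a list of DataFrames, one per <table> in the file
    myhtml_pd = pd.read_html(sys.argv[1])
    #In [25]: len(myhtml_pd)
    #Out[25]: 3

    # The third table holds the data; skip row 0 and columns 0-1,
    # and print every remaining non-empty cell.
    for i in myhtml_pd[2].index:
        if i > 0:
            for j in myhtml_pd[2].columns:
                if j > 1 and not pd.isnull(myhtml_pd[2].loc[i, j]):
                    print(myhtml_pd[2].loc[i, j])
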
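For anyone else hitting the same XLRDError: checking the first bytes of the file would probably have revealed the mismatch before trying any converter. Below is a minimal sketch of such a check. The helper names (sniff_spreadsheet, sniff_file) and the list of HTML tag prefixes are my own assumptions for illustration, not part of in2csv, xlrd, or pandas. The two magic numbers are the well-known signatures of OLE2 compound files (legacy .xls) and ZIP containers (.xlsx).

```python
# Signatures of the two real Excel containers.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx is a ZIP archive

# Assumption: an HTML export starts with one of these tags after any
# leading whitespace. Extend the tuple if your export looks different.
HTML_PREFIXES = (b"<html", b"<!doctype", b"<table", b"<head", b"<meta")


def sniff_spreadsheet(head: bytes) -> str:
    """Guess the real format from the first bytes of a file (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # HTML exports may begin with blank lines, like the b'\r\n\r\n\r\n\r\n'
    # reported in the error message, so strip whitespace first.
    if head.lstrip().lower().startswith(HTML_PREFIXES):
        return "html"
    return "unknown"


def sniff_file(path: str, nbytes: int = 512) -> str:
    """Read the first nbytes of path and classify them."""
    with open(path, "rb") as fh:
        return sniff_spreadsheet(fh.read(nbytes))


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, sniff_file(name))
```

If this reports "html" for a file named .xls, pd.read_html is the tool to reach for rather than an Excel reader.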
HZ
--
https://mail.python.org/mailman/listinfo/python-list