List:       mercurial
Subject:    Re: Encoding problem on Subversion import
From:       Matt Mackall <mpm () selenic ! com>
Date:       2007-11-29 23:57:03
Message-ID: 20071129235703.GZ19691 () waste ! org

On Thu, Nov 29, 2007 at 04:04:49PM -0600, Russ Brown wrote:
> Russ Brown wrote:
> > Hi
> > 
> > I'm trying to import trunk of our subversion repository, and have hit an
> > unhandled exception:
> > 
> > ** unknown exception encountered, details follow
> > ** report bug details to http://www.selenic.com/mercurial/bts
> > ** or mercurial@selenic.com
> > ** Mercurial Distributed SCM (version 0.9.5)
> > Traceback (most recent call last):
> >   File "/usr/bin/hg", line 14, in ?
> >     mercurial.dispatch.run()
> >   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line
> > 20, in run
> >     sys.exit(dispatch(sys.argv[1:]))
> >   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line
> > 29, in dispatch
> >     return _runcatch(u, args)
> >   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line
> > 45, in _runcatch
> >     return _dispatch(ui, args)
> >   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line
> > 348, in _dispatch
> >     ret = _runcommand(ui, options, cmd, d)
> >   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line
> > 401, in _runcommand
> >     return checkargs()
> >   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line
> > 357, in checkargs
> >     return cmdfunc()
> >   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line
> > 342, in <lambda>
> >     d = lambda: func(ui, *args, **cmdoptions)
> >   File "/usr/lib/python2.4/site-packages/hgext/convert/__init__.py",
> > line 380, in convert
> >     c.convert()
> >   File "/usr/lib/python2.4/site-packages/hgext/convert/__init__.py",
> > line 270, in convert
> >     self.copy(c)
> >   File "/usr/lib/python2.4/site-packages/hgext/convert/__init__.py",
> > line 214, in copy
> >     changes = self.source.getchanges(rev)
> >   File "/usr/lib/python2.4/site-packages/hgext/convert/subversion.py",
> > line 226, in getchanges
> >     files, copies = self.expandpaths(rev, paths, parents)
> >   File "/usr/lib/python2.4/site-packages/hgext/convert/subversion.py",
> > line 395, in expandpaths
> >     self.ui.debug("Copied to %s from %s@%s\n" % (entry, copyfrom_path,
> > ent.copyfrom_rev))
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 31:
> > ordinal not in range(128)
> > 
> > 
> > 
> > The log message for the revision being imported is:
> > 
> > "renamed file with pound sign"
> > 
> > which I remember committing myself: I was renaming a file that had been
> > given a name containing a British Pound symbol. The oddly-named file was
> > committed by Windows users, but caused problems for us svk users hence
> > the need to rename it at the time. I also seem to remember Trac having
> > some problems with it too.
> > 
> > Is there anything I can try to get this working? Thanks!
> > 
> 
> Responding to my own post here, but commenting out the debug line the
> error occurs on allows the conversion to continue. I don't know Python
> well enough to know how to actually *fix* the line though or I'd submit
> a patch. Most likely the line needs to be changed to be Unicode-aware or
> something like that.

Urgh. The problem is in the SVN Python interface. It's handing us
Unicode strings when we want raw bytes.

The insidious thing is that Python tries to pretend that Unicode
strings are normal strings, silently converting between the two with
the ASCII codec, so you don't know it's doing stupid things behind the
scenes until you throw something non-ASCII at it. Then it barfs all
over your poor user.
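
To see the failure in isolation, here is a minimal sketch, not the
convert code itself. It is written for Python 3, where the coercion has
to be spelled out; in Python 2 the ASCII decode happens silently when a
byte string and a Unicode string meet in one % expression. The path, the
helper names implicit_coerce and format_debug, and the choice of which
argument arrives as bytes are all made up for illustration.

```python
# Hypothetical path containing a pound sign; in UTF-8 it is the bytes 0xc2 0xa3.
path_bytes = "trunk/docs/price in \u00a3.txt".encode("utf-8")


def implicit_coerce(raw):
    """Mimic Python 2's silent str -> unicode promotion (an ASCII decode)."""
    return raw.decode("ascii")


def format_debug(entry, copyfrom_path, rev):
    # entry is text and copyfrom_path is bytes (an assumption for this sketch),
    # so the bytes have to be promoted before they can be formatted together.
    return "Copied to %s from %s@%s\n" % (entry, implicit_coerce(copyfrom_path), rev)


# Pure-ASCII input hides the problem completely.
ok = format_debug("trunk/a.txt", b"trunk/b.txt", 7)

# One non-ASCII byte is enough to blow up.
try:
    format_debug("trunk/a.txt", path_bytes, 7)
    error = None
except UnicodeDecodeError as exc:
    error = exc
```

Here error.encoding is "ascii" and error.start points at the 0xc2 byte,
which is the same shape of failure as in the traceback above.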

The other insidious thing is that the original raw bytes are now no
longer retrievable. You can't convert a Unicode string back into the
original raw bytes because it could have been encoded in any number of
character sets. Probably the best we can do is coerce all these
filenames to UTF-8 and hope for the best.
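
A rough sketch of both halves of that, again in Python 3 and with a
made-up filename. The same Unicode string maps to different bytes under
different character sets, and nothing in the string says which one it
came from. The to_utf8 helper is hypothetical, not something that exists
in the convert extension.

```python
name = "price in \u00a3.txt"  # hypothetical filename with a pound sign

# Three plausible "original" byte strings for the very same text.
candidates = {enc: name.encode(enc) for enc in ("utf-8", "latin-1", "cp437")}

# Each one decodes back to an identical Unicode string, so the string alone
# cannot tell us which bytes were on disk.
roundtrips = {enc: raw.decode(enc) for enc, raw in candidates.items()}


def to_utf8(value):
    """Coerce text to UTF-8 bytes; pass byte strings through untouched."""
    if isinstance(value, str):
        return value.encode("utf-8")
    return value


coerced = to_utf8(name)
```

If the repository really stored the name in something other than UTF-8,
the coerced bytes will not match what was there originally, which is the
"hope for the best" part.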

(There's far too much wishful thinking and hand-waving in Unicodeland.)

-- 
Mathematics is the supreme nostalgia of our time.