List:       taglib-devel
Subject:    Re: Question about taglib abilities
From:       Антон Сергунов <setosha () gmail ! com>
Date:       2012-07-15 17:55:25
Message-ID: CAPdia6vRRdLoJChbvUNaPTxcxVE6a-D8h-HOPY+UTTPHy9RqaA () mail ! gmail ! com

Yes.

Usually an MP3 file carries both ID3v1 and ID3v2 tags.

ID3v1 should have all text fields encoded as Latin-1.
But for historical reasons (thanks to the Winamp player) they usually hold
text in the local Windows encoding,
because that was the only way to save non-Latin-1 strings.

So TagLib has the TagLib::ID3v1::Tag::setStringHandler() function to override
the ID3v1 string encoder.


ID3v2 has an encoding field and can use UTF-16 or UTF-8.
But I have seen with my own eyes ID3v2 fields saved the same way: marked as
Latin-1 but holding text in a local Windows encoding.
I think some software copies ID3v1 tags to ID3v2 as is.
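
To make the encoding field concrete, here is a minimal sketch in Python (not TagLib code) of how an ID3v2 text frame body is laid out: one byte stating the encoding, followed by the raw text bytes. The names split_text_frame and ENCODING_NAMES are hypothetical, and the sketch assumes the frame body has already been extracted from the file.

```python
# Text-encoding byte values defined by the ID3v2 specifications.
# 0 and 1 exist in ID3v2.2/2.3; 2 and 3 were added in ID3v2.4.
ENCODING_NAMES = {
    0: "ISO-8859-1 (Latin-1)",
    1: "UTF-16 with BOM",
    2: "UTF-16BE",
    3: "UTF-8",
}


def split_text_frame(body: bytes):
    """Split a text frame body into (declared encoding name, raw text bytes).

    `body` is the frame content after the frame header. Nothing is decoded,
    so the raw bytes can be compared against what the encoding byte claims.
    """
    if not body:
        raise ValueError("empty frame body")
    declared = ENCODING_NAMES.get(body[0], "unknown (%d)" % body[0])
    return declared, body[1:]


# A frame that claims Latin-1 but actually holds cp1251 bytes.
declared, raw = split_text_frame(b"\x00\xcf\xf0\xe8\xe2\xe5\xf2")
print(declared, raw)
```

The point is only that the file states one encoding per field, and nothing forces the bytes that follow to actually be in it.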

That's why I told you to check TagLib::String::isLatin1() and then convert to
Unicode from the user's local Windows encoding.
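
A rough sketch of that check-and-convert step, written in Python rather than against the TagLib API so it stays self-contained. The function name redecode_mislabeled_latin1 is hypothetical, and the default code page (cp1251) is only an assumption; in practice it has to be guessed from the user's locale, because the tag does not record it.

```python
def redecode_mislabeled_latin1(text: str, codepage: str = "cp1251") -> str:
    """Reinterpret a string that was decoded as Latin-1 using `codepage`.

    If the string has characters outside Latin-1 it was real Unicode to
    begin with, so it is returned unchanged.
    """
    try:
        # Latin-1 maps code points 0-255 to the same byte values,
        # so this recovers the original bytes from the tag.
        raw = text.encode("latin-1")
    except UnicodeEncodeError:
        return text
    try:
        return raw.decode(codepage)
    except UnicodeDecodeError:
        # The bytes are not valid in the guessed code page; keep the input.
        return text


# Bytes CF F0 E8 E2 E5 F2 read as Latin-1, then reinterpreted as cp1251.
print(redecode_mislabeled_latin1("\xcf\xf0\xe8\xe2\xe5\xf2"))
```

Pure ASCII text comes back unchanged, since ASCII is the same in both encodings, so it is safe to apply this to every field that passes the Latin-1 check.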

2012/7/16 Christian Convey <christian.convey@gmail.com>

> So is the following correct regarding libtag, and mp3 files using
> ID3v2.2?  (Please forgive me if I'm getting too off-topic of libtag.
> I'll stop asking whenever you like.)
>
> The "latin1" coding, strictly speaking, refers to ISO 8859-1.
>
> What Microsoft calls "cp1250" and "cp1252" are also eight-bit
> character encodings.  These two encodings are consistent with ISO
> 8859-1, but they are supersets of 8859-1.
>
> When an ID3v2.2 tag has a field (such as Title) encoded in cp1250 or
> cp1252, the field's "type" byte will indicate that the "latin1"
> encoding is being used.  However, there is not enough information in
> the ID3 metadata to know with certainty whether the actual code page
> is cp1250, cp1252, or something else.
>
> Is that it?
>
> Thanks again.
> - Christian
>
> On Sun, Jul 15, 2012 at 12:40 PM, Антон Сергунов <setosha@gmail.com>
> wrote:
> > Windows cp1250 or cp1252 for German
> >
> >
> > 2012/7/15 Christian Convey <christian.convey@gmail.com>
> >>
> >> Thanks very much, that's a big help.
> >>
> >> Do you happen to know if it's common for MP3 tagging software to use
> >> character encodings *other than* the five valid ID3v2 encodings
> >> (latin1, UTF16, UTF16BE, UTF16LE, and UTF8) ?
> >>
> >> I'm trying to anticipate how many different character encodings I'll
> >> have to try out when debugging this MP3 file.
> >>
> >> Thanks,
> >> Christian
> >>
> >> On Sun, Jul 15, 2012 at 11:27 AM, Антон Сергунов <setosha@gmail.com>
> >> wrote:
> >> > TagLib doesn't convert strings. It reads the encoding (String::Type) and
> >> > the raw data (ByteArray) from the file.
> >> > You can then perform a conversion with String::toWString(), but before
> >> > that it contains the raw byte data from the file.
> >> >
> >> > But I can't find function to get type enum here.
> >> > So you can get raw data with String::data(Type t)
> >> >
> >> >
> >> > 2012/7/15 Christian Convey <christian.convey@gmail.com>
> >> >>
> >> >> Thanks.  But this is actually a podcast run by someone else:
> >> >> http://www.dw.de/dw/0,,2548,00.html
> >> >>
> >> >> So actually fixing the problem is outside of my power.  What I'd like
> >> >> to do is research the problem with their mp3 files carefully, so that
> >> >> I can tell them precisely what the problem is.
> >> >>
> >> >> (For example, "Your mp3 tagging software is claiming that the text is
> >> >> encoded using UTF-8, but it's actually UTF-16.")
> >> >>
> >> >> On Sun, Jul 15, 2012 at 10:52 AM, Антон Сергунов <setosha@gmail.com>
> >> >> wrote:
> >> >> > The most common ID3 encoding problem is the use of a local 8-bit
> >> >> > Windows encoding in Latin1 fields. You can use a special Latin1
> >> >> > handler or (works better for me), if the string is in Latin1,
> >> >> > reinterpret it as the local 8-bit Windows encoding.
> >> >> >
> >> >> > On 15.07.2012 21:35, user "Christian Convey"
> >> >> > <christian.convey@gmail.com> wrote:
> >> >> >>
> >> >> >> I'm new to ID3 tag handling.  Can you tell me if taglib can be used
> >> >> >> to solve a particular problem?
> >> >> >>
> >> >> >> I have MP3 files from a podcast, and I suspect that there's an
> >> >> >> inconsistency between the actual encoding of the ID3v2.2 Title
> >> >> >> field and the byte that states what encoding is used for that string.
> >> >> >>
> >> >> >> Can taglib tell me which encoding the file *claims* to have for that
> >> >> >> field?
> >> >> >>
> >> >> >> And can I get taglib to give me the bytes in the ID3v2.2 Title field
> >> >> >> *without* taglib automatically performing some kind of
> >> >> >> character-encoding translation?
> >> >> >> _______________________________________________
> >> >> >> taglib-devel mailing list
> >> >> >> taglib-devel@kde.org
> >> >> >> https://mail.kde.org/mailman/listinfo/taglib-devel



