List:       kfm-devel
Subject:    Re: \n before <html
From:       David Faure <david.faure () insa-lyon ! fr>
Date:       1999-07-26 8:31:39

On Sat, Jul 24, 1999 at 06:32:35PM +0200, Stephan Goetter wrote:
> Hi,
> 
> I try to fix a bug in KMimeMagic.cpp before kde-1.1.2.
> I hope you can help me.
> 
> This is my test file.
> --------
> \n
> <html><head>
> </head><body></body></html>
> --------
> 
> Because the first character is a newline, text/html is not recognized.
> The newline would be "eaten up" (set to \0) in mconvert()
> case STRING:
> /* Null terminate and eat the return */
> p->s[sizeof(p->s) - 1] = '\0';
> if ((rt = strchr(p->s, '\n')) != NULL)
> *rt = '\0';
> return 1;
> 
> Because "\0<html" isn't the same like "<html" mcheck() fails :(

Yes, I came across this before.
But could some HTML expert out there comment on this? Is it valid HTML
if the document doesn't start with <html> as the very first thing?
Your analysis of why it fails is right, but see below for more.
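
To make the failure concrete, here is a minimal sketch of the effect described
above. It is not the KMimeMagic code: eat_newline() only mirrors the quoted
STRING case, and matches_magic() and recognized_after_convert() are
hypothetical helpers standing in for the comparison done later in mcheck().

```c
#include <string.h>

/* Mirrors the quoted STRING case: NUL-terminate the buffer,
 * then cut it at the first newline. */
static void eat_newline(char *s, size_t size)
{
    char *rt;

    s[size - 1] = '\0';
    if ((rt = strchr(s, '\n')) != NULL)
        *rt = '\0';
}

/* Hypothetical stand-in for the string comparison: does the
 * converted value start with the magic string? */
static int matches_magic(const char *s, const char *magic)
{
    return strncmp(s, magic, strlen(magic)) == 0;
}

/* Returns 1 if the data still matches the magic after conversion. */
int recognized_after_convert(const char *data, const char *magic)
{
    char buf[32];

    strncpy(buf, data, sizeof(buf));  /* may truncate; eat_newline() terminates */
    eat_newline(buf, sizeof(buf));
    return matches_magic(buf, magic);
}
```

With "\n<html><head>" as input, the first newline sits at offset 0, so the
converted value is the empty string and the comparison with "<html" fails.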

> --------
> \n
> <html>
> <head>
> </head><body></body></html>
> --------
> 
> This file works, not because match(), but in a method somewhere called
> from finishResults() or deeper, text/html is set. Don't know why.

Because there is another check, used when the file doesn't start with
anything known. This second check uses the names[] array to look for some
known keywords in the file: <html>, <head>, <title>, <h1>, <!--, and
<!DOCTYPE HTML. Then, using the weights given in the types[] array,
ascmagic() determines the mimetype from that. I don't see why the additional
linefeed between <html> and <head> makes a difference, though. This might be
the bug you should concentrate on...
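
One possible explanation, offered as an assumption and not checked against
the KMimeMagic source: in file(1) and Apache's mod_mime_magic, the keyword
check splits the buffer on whitespace and compares each whole token against
the keyword table. If the code here does the same, "<html><head>" is a single
token that equals no keyword, while "<html>" on a line of its own does. The
sketch below illustrates that behaviour; html_keywords[] and
count_keyword_tokens() are hypothetical names, and the weighting step via
types[] is left out.

```c
#include <string.h>

/* Keywords as listed in this mail; the real names[] table may differ. */
static const char *const html_keywords[] = {
    "<html>", "<head>", "<title>", "<h1>", "<!--", "<!DOCTYPE HTML"
};

/* Count whitespace-separated tokens that exactly equal a keyword. */
int count_keyword_tokens(const char *data)
{
    static const char delims[] = " \t\n\r\f";
    char buf[256];
    char *token;
    int hits = 0;
    size_t i;

    /* strtok() modifies its input, so work on a bounded copy. */
    strncpy(buf, data, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (token = strtok(buf, delims); token != NULL;
         token = strtok(NULL, delims)) {
        for (i = 0; i < sizeof(html_keywords) / sizeof(html_keywords[0]); i++) {
            if (strcmp(token, html_keywords[i]) == 0)
                hits++;
        }
    }
    return hits;
}
```

Under that assumption, the first test file yields no keyword hits and the
second yields two (<html> and <head>), which would account for the difference.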

> I think the "\n" and "\r\n" should be removed before calling mcheck().
> Perhaps in mconvert() (really eaten up ?)
> No data of type string in magic does contain any \r or \n,
> so I think it's ok.
> 
> What do you think ?
No idea, should be tested.
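
For whoever tests it, a minimal sketch of the proposed idea: skip any leading
"\r" and "\n" before the data reaches the comparison. skip_leading_newlines()
is a hypothetical helper, not an existing function in KMimeMagic, and where
exactly it would be called (mconvert() or just before mcheck()) is left open.

```c
#include <string.h>

/* Return a pointer to the first character that is not '\r' or '\n'. */
const char *skip_leading_newlines(const char *s)
{
    return s + strspn(s, "\r\n");
}
```

The existing "eat the return" step would then cut at the first newline after
"<html><head>" and no longer at offset 0.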

> BTW: The code in KMimeMagic is really a dirty hack. 
I don't think so. It comes from Apache (and is not far from the code in
"file"). But I agree, it's not 100% perfect.

> And the same code is in libkio (2.0) :(
Sure.

-- 
David FAURE
david.faure@insa-lyon.fr, faure@kde.org
http://www.insa-lyon.fr/People/AEDI/dfaure/index.html 
KDE, Making The Future of Computing Available Today

