
List:       kfm-devel
Subject:    Re: Using KIO to retrive HTTP Headers [GSoC student help request]
From:       Dawit A <adawit () kde ! org>
Date:       2012-07-13 18:49:20
Message-ID: CALa28R4isYn0S-Zpu+rpquFnGN2E6V2exyY6ZuKSdOOQ56JS8g () mail ! gmail ! com



Ahh... My fault. I should not have taken back what I said in point #1. It
is indeed the case that the "HTTP-Headers" meta-data is NOT set when the
redirection signal is emitted by kio_http. However, that is not a problem:
since internal redirection handling is disabled, you get a result signal
right after the redirection signal, and that is where you need to look for
the headers. See the changes in the attached files.
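The order of events described above (with internal redirection handling disabled, redirection() carries only the target URL, and the "HTTP-Headers" meta-data is read once result() fires) can be modelled in plain C++ without KDE. This is a hypothetical sketch: `RedirectChecker`, `onRedirection()` and `onResult()` stand in for slotRedirection() and slotHeaderResult() in the attached files and are not KIO API; the header strings in the checks are made-up test data.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the signal order kio_http produces when
// setRedirectionHandlingEnabled(false) is used:
//   redirection(url) -> only the target URL, "HTTP-Headers" not set yet
//   result()         -> job finished; the headers can be read now
class RedirectChecker {
public:
    explicit RedirectChecker(const std::string &url) : m_url(url) {}

    // Mirrors slotRedirection(): remember where the server wants us to go.
    void onRedirection(const std::string &target)
    {
        assert(!target.empty());
        m_redirectionUrl = target;
    }

    // Mirrors slotHeaderResult(): keep the headers of the response that just
    // finished, then either follow the pending redirection or stop.
    // Returns the next URL to fetch, or an empty string when done.
    std::string onResult(const std::string &httpHeaders)
    {
        m_collected.push_back(httpHeaders);
        if (m_redirectionUrl.empty()) {
            return std::string();
        }
        m_url = m_redirectionUrl;
        m_redirectionUrl.clear();
        return m_url;
    }

    const std::vector<std::string> &collectedHeaders() const { return m_collected; }
    const std::string &currentUrl() const { return m_url; }

private:
    std::string m_url;
    std::string m_redirectionUrl;
    std::vector<std::string> m_collected;
};
```

In terms of the attached code, a non-empty return value corresponds to calling checkMetalinkHttp() again for the new URL, and an empty one to leaving the event loop.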


On Fri, Jul 13, 2012 at 12:44 PM, Aish Raj Dahal <dahalaishraj@gmail.com> wrote:

> On Fri, Jul 13, 2012 at 8:18 PM, Dawit A <adawit@kde.org> wrote:
> >
> >
> > On Fri, Jul 13, 2012 at 2:41 AM, Dawit A <adawit@kde.org> wrote:
> >>
> >>
> >>
> >> On Thu, Jul 12, 2012 at 11:16 PM, Aish Raj Dahal <dahalaishraj@gmail.com>
> >> wrote:
> >>>
> >>> On Thu, Jul 12, 2012 at 9:09 PM, David Faure <faure@kde.org> wrote:
> >>> > On Wednesday 11 July 2012 11:53:51 Aish Raj Dahal wrote:
> >>> >> 1) Case One: when the mimetype signal emitted by KIO::TransferJob is used
> >>> >>
> >>> >> To clarify, let me take an example file:
> >>> >>
> >>> >> https://github.com/ardahal/kio-learner/blob/ard-dev/metalinkHttp/metalinkHttp.cpp
> >>> >>
> >>> >> The given file uses the mimetype signal (at line 44) to get the
> >>> >> headers as soon as the mimetype is emitted. The catch is that, since we
> >>> >> do not want the redirected HTTP headers but the original HTTP
> >>> >> headers, setRedirectionHandlingEnabled has been set to false. When
> >>> >> this program is run, the mimetype signal is not emitted at all, and as
> >>> >> a result the qDebugs at lines 51 and 52 are never executed. This
> >>> >> behavior is seen not only for URLs which redirect (like
> >>> >> http://www.example.com) but also for URLs which have no redirection
> >>> >> (like http://www.google.com.np).
> >>> >
> >>> > This is the part that makes no sense to me ;-)
> >>> >
> >>> > redirectionHandlingEnabled is a KIO::SimpleJob setting; the slave has
> >>> > no idea about that setting. If there's no redirection, then none of the
> >>> > code in SimpleJob that checks for redirectionHandlingEnabled actually
> >>> > runs, so it can't possibly make any difference for a URL without
> >>> > redirection.
> >>> >
> >>> > I think your testcase is a bit wrong: http://www.google.com.np
> >>> > redirects. I can see it in the konqueror debug output:
> >>> >
> >>> >  KonqRun::slotRedirection: KUrl("http://www.google.com.np") -> KUrl("http://www.google.com.np/")
> >>> >
> >>> > So if you want to test a URL that doesn't redirect, add the trailing
> >>> > slash upfront.
> >>> >
> >>> > If you can confirm this, then we'll be down to: no HTTP headers
> >>> > emitted when a redirection happens, which would be a kio_http issue.
> >>> > Dawit?
> >>> >
> >>>
> >>> Thanks a lot for the heads up about the test case :-)
> >>>
> >>> It does indeed run well as expected with
> >>> KUrl("http://www.google.com.np/") as the test URL. However for those
> >>> URLs that do have redirection, no headers were emitted.
> >>>
> >>> Once again, thanks a lot.
> >>
> >>
> >> I will try to clarify some things as much as I can:
> >>
> >> #1. Without some changes in kio_http, you will never see the redirection
> >> headers received from the HTTP server. This can probably be addressed by
> >> delaying the redirection request until after the HTTP headers have been
> >> set. However, the last time I attempted to fix this, it caused a
> >> regression. See bug#150904.
> >>
> >>
> >> #2. When a redirection is requested, kio_http will never emit the mimeType
> >> signal because the type is not yet known. This should be obvious, because
> >> a redirection request is the server telling us the actual location of the
> >> content we just requested. As such, connecting to KIO's mimeType signal is
> >> of no use in such circumstances.
> >>
> >> #3. If you call setRedirectionHandlingEnabled(false) in order to handle
> >> redirections yourself instead of letting KIO do it, then you have to
> >> connect to KIO's redirection signal and retrieve the redirect URL. IOW,
> >> you have to do the same thing you are doing in your "output" function
> >> from the slot connected to the redirection signals.
> >>
> >> However, I suspect what you want to do is get any and all headers,
> >> including those that have to do with redirection requests. If so, then
> >> we have to find a way for kio_http to set the HTTP headers before sending
> >> the redirection request without causing a regression.
> >
> >
> > Actually I take back what I said in #1 and the last paragraph. I just did
> > some testing and you can indeed retrieve the redirection headers by simply
> > connecting to KIO::TransferJob's redirection signal if you disable the
> > internal handling of redirections. All you have to do is check for the
> > "HTTP-Headers" meta-data in the slot you connected to the redirection
> > signal.
> >
> > Also, although there are two redirection signals, "redirection" and
> > "permanentRedirection", you need only connect to "redirection" unless
> > you want to keep track of permanent redirections.
> >
>
>
> Thank you very much for looking into the issue.
>
> Now, following your suggestion of connecting KIO::TransferJob's redirection
> signal to a slot and then querying the "HTTP-Headers" metadata, I've run
> into an issue.
> Before I get to the issue, here is the pastebin of what I'm doing to
> test it: http://paste.kde.org/517220/42196913/ .
> The issue is that although I am able to verify the redirection URL (if
> the site was redirecting, of course), querying the KIO::Job for the
> "HTTP-Headers" metadata key left me with an empty QString("") as the
> result. I've tried several URLs and had the same issue (which is quite
> strange). I've also tried both setting and unsetting the
> PropagateHttpHeader metadata key, which again had no effect on the
> result.
>
> I hope you'll look into this matter and provide your valuable guidance.
>
> Thanks once again.
>
> Regards,
> Aish Raj Dahal
>
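David's advice to "add the trailing slash upfront" amounts to giving a bare `scheme://host` URL an explicit root path, so kio_http has no reason to redirect http://www.google.com.np to http://www.google.com.np/. In KDE code this would be done on the KUrl itself; the std::string helper below, `withRootPath()`, is hypothetical and only handles the simple form used in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append "/" to URLs of the form "scheme://host" so the
// request already targets the root path. URLs that have anything after the
// host (a path, query or fragment) are returned unchanged.
std::string withRootPath(const std::string &url)
{
    const std::string::size_type schemeEnd = url.find("://");
    if (schemeEnd == std::string::npos) {
        return url;  // not an absolute URL; leave it alone
    }
    const std::string::size_type hostStart = schemeEnd + 3;
    // A path, query or fragment after the host starts with '/', '?' or '#'.
    if (url.find_first_of("/?#", hostStart) != std::string::npos) {
        return url;
    }
    return url + "/";
}
```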

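To confirm that the "HTTP-Headers" value read in the result slot really belongs to the redirecting response, it helps to look at its status line and Location header. The sketch below assumes, as parseHeaders() in the attached file does, that the meta-data value is the HTTP status line followed by one "Name: value" header per line; `RawResponse` and `parseRawHeaders()` are hypothetical names, and the inputs in the checks are made up.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical summary of one response taken from the "HTTP-Headers" value.
struct RawResponse {
    int status = 0;        // e.g. 302; stays 0 if there is no status line
    std::string location;  // value of the Location header, if present
};

static std::string trimWs(const std::string &s)
{
    const std::string ws = " \t\r";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return std::string();
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

static std::string toLowerAscii(std::string s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

RawResponse parseRawHeaders(const std::string &httpHeaders)
{
    RawResponse r;
    std::istringstream in(httpHeaders);
    std::string line;
    // First line, e.g. "HTTP/1.1 302 Found": the status code is the 2nd token.
    if (std::getline(in, line)) {
        std::istringstream statusLine(line);
        std::string version;
        statusLine >> version >> r.status;  // status is 0 if not a number
    }
    // Remaining lines: "Name: value". Header names are case-insensitive.
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        if (toLowerAscii(trimWs(line.substr(0, colon))) == "location")
            r.location = trimWs(line.substr(colon + 1));
    }
    return r;
}
```

An empty meta-data value, like the QString("") reported above when reading it in the redirection slot, simply yields status 0 and no location, which makes the "headers not set yet" case easy to detect.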


["metalinkHttp.h" (text/x-chdr)]

/*
Copyright 2012  Aish Raj Dahal < dahalaishraj at gmail.com >

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

*/

#ifndef METALINKHTTP_H
#define METALINKHTTP_H

#include <QtCore/QObject>
#include <QtCore/QEventLoop>
#include <QtCore/QMap>
#include <KIO/Job>
#include <KIO/SimpleJob>
#include <KIO/Scheduler>

class metalinkHttp : public QObject
{
    Q_OBJECT
public:
    explicit metalinkHttp(const KUrl&);
    ~metalinkHttp();
    bool isMetalinkHttp();

private slots:
    void slotHeaderResult(KJob* kjob);
    void checkMetalinkHttp();
    void detectMime(KIO::Job* job, const QString& type);
    void slotRedirection(KIO::Job*, const KUrl&);

private:
    KUrl m_Url;
    KUrl m_redirectionUrl;
    bool m_MetalinkHSatus;
    QEventLoop m_loop;
    QMultiMap<QString, QString> m_headerInfo;
    void parseHeaders(const QString&);
    void setMetalinkHSatus();
};

#endif // METALINKHTTP_H

["metalinkHttp.cpp" (text/x-c++src)]

/*
Copyright 2012  Aish Raj Dahal < dahalaishraj at gmail.com >

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

*/

#include "metalinkHttp.h"
#include <kdebug.h>
#include <QtCore/QString>

metalinkHttp::metalinkHttp(const KUrl& Url)
    : m_Url(Url),
      m_MetalinkHSatus(false)

{
    checkMetalinkHttp();
}

metalinkHttp::~metalinkHttp()
{

}

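void metalinkHttp::checkMetalinkHttp()
{
    KIO::TransferJob *job = KIO::get(m_Url);
    job->addMetaData("PropagateHttpHeader", "true");
    // Handle redirections ourselves so the headers of the original
    // (redirecting) response can be inspected.
    job->setRedirectionHandlingEnabled(false);
    connect(job, SIGNAL(result(KJob*)), this, SLOT(slotHeaderResult(KJob*)));  // Finished
    connect(job, SIGNAL(redirection(KIO::Job*,KUrl)), this, SLOT(slotRedirection(KIO::Job*,KUrl)));  // Redirection
    connect(job, SIGNAL(mimetype(KIO::Job*,QString)), this, SLOT(detectMime(KIO::Job*,QString)));  // Mime detection
    qDebug() << "Verifying Metalink/HTTP status";

    // When called again from slotHeaderResult() to follow a redirection, the
    // loop is already running; a second exec() would only warn and return.
    if (!m_loop.isRunning()) {
        m_loop.exec();
    }
}

void metalinkHttp::detectMime(KIO::Job* job, const QString& type)
{
    Q_UNUSED(job)
    // Not emitted for redirections, because the type is not known yet. The
    // headers are read in slotHeaderResult(), so the loop is not left here.
    qDebug() << "Mime type signal received:" << type;
}

void metalinkHttp::slotHeaderResult(KJob* kjob)
{
    // With internal redirection handling disabled, result() is emitted right
    // after redirection(), and "HTTP-Headers" is available at this point.
    KIO::Job* job = qobject_cast<KIO::Job*>(kjob);
    const QString httpHeaders = job ? job->queryMetaData("HTTP-Headers") : QString();
    parseHeaders(httpHeaders);
    setMetalinkHSatus();

    // Handle the redirection... (Comment out if not desired)
    if (m_redirectionUrl.isValid()) {
        m_Url = m_redirectionUrl;
        m_redirectionUrl = KUrl();
        checkMetalinkHttp();
        return;
    }

    // No further redirection: all headers have been collected.
    m_loop.exit();
}

void metalinkHttp::slotRedirection(KIO::Job* job, const KUrl& url)
{
    Q_UNUSED(job)
    m_redirectionUrl = url;
}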

bool metalinkHttp::isMetalinkHttp()
{
    foreach(QString mapval, m_headerInfo) {
        qDebug() << mapval ;
    }

    return m_MetalinkHSatus;
}

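void metalinkHttp::parseHeaders(const QString &httpHeader)
{
    // Skip the status line (e.g. "HTTP/1.1 302 Found").
    const QString trimmedHeader = httpHeader.mid(httpHeader.indexOf('\n') + 1).trimmed();

    foreach (const QString &line, trimmedHeader.split('\n')) {
        const int colon = line.indexOf(':');
        if (colon < 0) {
            continue;  // not a "Name: value" line
        }
        // HTTP header names are case-insensitive; store them in lower case so
        // the lookups for "link" and "digest" in setMetalinkHSatus() match.
        const QString headerName = line.left(colon).trimmed().toLower();
        QString headerValue = line.mid(colon + 1).trimmed();
        const int lessthan_pos = line.indexOf('<');
        if (lessthan_pos >= 0) {
            headerValue = line.mid(lessthan_pos + 1).trimmed();
        }
        m_headerInfo.insertMulti(headerName, headerValue);
    }
}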

void metalinkHttp::setMetalinkHSatus()
{
    bool linkStatus, digestStatus;
    linkStatus = digestStatus = false;
    if (m_headerInfo.contains("link")) {
        QList<QString> linkValues = m_headerInfo.values("link");

        foreach(QString linkVal, linkValues) {
            if (linkVal.contains("rel=duplicate")) {
                linkStatus = true;
                break;
            }
        }
    }

    if (m_headerInfo.contains("digest")) {
        QList<QString> digestValues = m_headerInfo.values("digest");

        foreach(QString digestVal, digestValues) {
            if (digestVal.contains("sha-256", Qt::CaseInsensitive)) {
                digestStatus = true;
                break;
            }
        }
    }

    if ((linkStatus) && (digestStatus)) {
        m_MetalinkHSatus = true;
    }

}

#include "metalinkHttp.moc"
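The check performed by parseHeaders() and setMetalinkHSatus() above can be expressed without Qt, which makes it easy to try against sample headers. This is a sketch, not the attached implementation: `isMetalinkHttpResponse()` is a hypothetical name, header names are compared case-insensitively, and, like the attached code, it only looks for a Link header containing `rel=duplicate` together with a Digest header mentioning SHA-256 (the combination the attached code treats as Metalink/HTTP).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <utility>

namespace {

std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

std::string trimWs(const std::string &s)
{
    const std::string ws = " \t\r";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return std::string();
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

}  // namespace

// Hypothetical helper mirroring parseHeaders() + setMetalinkHSatus().
bool isMetalinkHttpResponse(const std::string &httpHeaders)
{
    typedef std::multimap<std::string, std::string> HeaderMap;
    HeaderMap headers;
    std::istringstream in(httpHeaders);
    std::string line;
    std::getline(in, line);  // skip the status line, e.g. "HTTP/1.1 200 OK"
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        // Lower-case the name so "Link" and "link" are treated alike.
        headers.insert(std::make_pair(toLowerAscii(trimWs(line.substr(0, colon))),
                                      trimWs(line.substr(colon + 1))));
    }

    bool hasDuplicateLink = false;
    bool hasSha256Digest = false;
    std::pair<HeaderMap::const_iterator, HeaderMap::const_iterator> r;
    r = headers.equal_range("link");
    for (HeaderMap::const_iterator it = r.first; it != r.second; ++it)
        if (it->second.find("rel=duplicate") != std::string::npos) hasDuplicateLink = true;
    r = headers.equal_range("digest");
    for (HeaderMap::const_iterator it = r.first; it != r.second; ++it)
        if (toLowerAscii(it->second).find("sha-256") != std::string::npos) hasSha256Digest = true;
    return hasDuplicateLink && hasSha256Digest;
}
```

Lower-casing the names at insertion time keeps the lookups simple, and it is the same reason the lookups for "link" and "digest" in setMetalinkHSatus() need case-normalized keys.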


