List: kfm-devel
Subject: Re: Using KIO to retrieve HTTP Headers [GSoC student help request]
From: Dawit A <adawit () kde ! org>
Date: 2012-07-13 18:49:20
Message-ID: CALa28R4isYn0S-Zpu+rpquFnGN2E6V2exyY6ZuKSdOOQ56JS8g () mail ! gmail ! com
Ahh... My fault. I should not have taken back what I said in point #1. It
is indeed the case that the "HTTP-Headers" meta-data is NOT set when
kio_http emits the redirection signal. However, that is not a problem.
Since internal redirection handling is disabled, the job emits its result
signal right after the redirection signal, and that is where you need to
look for the headers. See the changes in the attached files.
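The sketch below is a minimal illustration in plain C++ (no KIO) of the flow described above and used by slotHeaderResult() in the attached metalinkHttp.cpp. Each hop's headers are taken only once that hop has *finished*, which mirrors reading "HTTP-Headers" in the result slot rather than in the redirection slot. If a redirect target was recorded, the request is re-issued. `HopResult`, `fetchOnce` and `collectHeadersAlongRedirects` are hypothetical names that stand in for one KIO::get() job with redirection handling disabled. They are not KIO API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one finished request with redirection handling
// disabled: the raw header block is only read at "result" time, together
// with the redirect target recorded earlier (empty if there was none).
struct HopResult {
    std::string headers;
    std::string redirectUrl;
};

// Follows redirects by hand: record the headers of each finished hop, then
// re-issue the request for the recorded redirect URL. maxHops guards against
// redirect loops, which the caller must handle once KIO no longer does.
template <typename FetchFn>
std::vector<std::string> collectHeadersAlongRedirects(std::string url,
                                                      FetchFn fetchOnce,
                                                      int maxHops = 5) {
    std::vector<std::string> perHopHeaders;
    for (int hop = 0; hop < maxHops; ++hop) {
        const HopResult r = fetchOnce(url);
        perHopHeaders.push_back(r.headers);
        if (r.redirectUrl.empty()) {
            break;  // final (non-redirecting) response reached
        }
        url = r.redirectUrl;
    }
    return perHopHeaders;
}
```

Keeping the headers of every hop lets the caller inspect the redirecting response's headers, which is what the thread is after. It can also inspect the final response's headers.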
On Fri, Jul 13, 2012 at 12:44 PM, Aish Raj Dahal <dahalaishraj@gmail.com> wrote:
> On Fri, Jul 13, 2012 at 8:18 PM, Dawit A <adawit@kde.org> wrote:
> >
> >
> > On Fri, Jul 13, 2012 at 2:41 AM, Dawit A <adawit@kde.org> wrote:
> >>
> >>
> >>
> >> On Thu, Jul 12, 2012 at 11:16 PM, Aish Raj Dahal <dahalaishraj@gmail.com> wrote:
> >>>
> >>> On Thu, Jul 12, 2012 at 9:09 PM, David Faure <faure@kde.org> wrote:
> >>> > On Wednesday 11 July 2012 11:53:51 Aish Raj Dahal wrote:
> >>> >> 1) Case One: When the mimetype signal emitted by KIO::TransferJob is used
> >>> >>
> >>> >> To clarify further, let me take an example file,
> >>> >> https://github.com/ardahal/kio-learner/blob/ard-dev/metalinkHttp/metalinkHttp.cpp .
> >>> >> The given file uses the mimetype signal (at line 44) to get the
> >>> >> headers as soon as the mimetype is emitted. The catch is that, since
> >>> >> we do not want the redirected HTTP headers but the original HTTP
> >>> >> headers, setRedirectionHandlingEnabled has been set to false. When
> >>> >> this program runs, the mimetype signal is not emitted at all, and as
> >>> >> a result the qDebugs at lines 51 and 52 are never executed. This
> >>> >> behavior is seen not only for URLs which redirect (like
> >>> >> http://www.example.com) but also for URLs which have no redirection
> >>> >> (like http://www.google.com.np).
> >>> >
> >>> > This is the part that makes no sense to me ;-)
> >>> >
> >>> > redirectionHandlingEnabled is a KIO::SimpleJob setting; the slave has
> >>> > no idea about that setting. If there's no redirection, then none of the
> >>> > code in simplejob that checks for redirectionHandlingEnabled actually
> >>> > runs. So it can't possibly make any difference for a URL without
> >>> > redirection.
> >>> >
> >>> > I think your test case is a bit wrong: http://www.google.com.np
> >>> > redirects. I can see it in the konqueror debug output:
> >>> >
> >>> > KonqRun::slotRedirection: KUrl("http://www.google.com.np") ->
> >>> > KUrl("http://www.google.com.np/")
> >>> >
> >>> > So if you want to test a URL that doesn't redirect, add the trailing
> >>> > slash upfront.
> >>> >
> >>> > If you can confirm this, then we'll be down to: no HTTP headers emitted
> >>> > when a redirection happens, which would be a kio_http issue. Dawit?
> >>> >
> >>>
> >>> Thanks a lot for the heads up about the test case :-)
> >>>
> >>> It does indeed run as expected with
> >>> KUrl("http://www.google.com.np/") as the test URL. However, for those
> >>> URLs that do redirect, no headers were emitted.
> >>>
> >>> Once again, thanks a lot.
> >>
> >>
> >> I will try and clarify some things as much as I can:
> >>
> >> #1. Without some changes in kio_http, you will never see the redirection
> >> headers received from the HTTP server. This can probably be addressed by
> >> delaying the redirection request until after the HTTP headers have been
> >> set. However, the last time I attempted to fix this, it caused a
> >> regression. See bug #150904.
> >>
> >>
> >> #2. When a redirection is requested, kio_http will never emit the
> >> mimetype signal, because the MIME type is not yet known. This should be
> >> very obvious: a redirection request is the server telling us the actual
> >> location of the content we just requested. As such, connecting to KIO's
> >> mimetype signal in such circumstances is of no use.
> >>
> >> #3. If you call setRedirectionHandlingEnabled(false) in order to handle
> >> redirections yourself instead of letting KIO do it, then you have to
> >> connect to KIO's redirection signal and retrieve the redirect URL. IOW,
> >> you have to do the same thing you are doing in your "output" function
> >> from the slot connected to the redirection signals.
> >>
> >> However, I suspect what you want is to get any and all headers,
> >> including those that have to do with redirection requests. If so, then
> >> we have to find a way for kio_http to set the HTTP headers before
> >> sending the redirection request without causing a regression.
> >
> >
> > Actually I take back what I said in #1 and the last paragraph. I just did
> > some testing, and you can indeed retrieve the redirection headers by
> > simply connecting to KIO::TransferJob's redirection signal if you disable
> > the internal handling of redirections. All you have to do is check for
> > the "HTTP-Headers" meta-data in the slot you connected to the
> > redirection signal.
> >
> > Also, although there are two redirection signals, "redirection" and
> > "permanentRedirection", you need only connect to "redirection" unless
> > you want to keep track of permanent redirections.
> >
>
>
> Thank you very much for looking into the issue.
>
> Now, following your suggestion to connect KIO::TransferJob's redirection
> signal to a slot and then query the "HTTP-Headers" metadata, I've run
> into an issue.
> Before I get to the issue, here is the pastebin of what I'm doing to
> test it: http://paste.kde.org/517220/42196913/ .
> The issue is that although I am able to verify the redirection URL (if
> the site was redirecting, of course), querying the KIO::Job for the
> "HTTP-Headers" metadata key left me with QString("") as the result.
> I've tried several URLs and had the same issue (which is quite
> strange). I've also tried both setting and unsetting the
> PropagateHttpHeader metadata key, which again had no effect on the
> result.
>
> I hope you'll look into this matter and provide your valuable guidance.
>
> Thanks once again.
>
> Regards,
> Aish Raj Dahal
>
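Dawit mentions two signals, "redirection" and "permanentRedirection". The split between them presumably reflects the status code of the redirecting response. In HTTP, 301 and 308 are permanent redirects, while 302, 303 and 307 are temporary. The minimal sketch below classifies the status line at the start of a raw header block, such as the "HTTP-Headers" value read in the result slot. Exactly which codes kio_http maps to permanentRedirection is an assumption this thread does not confirm. `RedirectKind` and `classifyRedirect` are hypothetical names.

```cpp
#include <sstream>
#include <string>

enum class RedirectKind { None, Temporary, Permanent };

// Reads the status line at the start of a raw header block,
// e.g. "HTTP/1.1 301 Moved Permanently", and classifies the status code:
// 301/308 are permanent redirects, 302/303/307 are temporary ones.
RedirectKind classifyRedirect(const std::string& rawHeaders) {
    std::istringstream in(rawHeaders);
    std::string version;
    int status = 0;
    if (!(in >> version >> status)) {
        return RedirectKind::None;  // empty input or no parsable status line
    }
    switch (status) {
    case 301:
    case 308:
        return RedirectKind::Permanent;
    case 302:
    case 303:
    case 307:
        return RedirectKind::Temporary;
    default:
        return RedirectKind::None;
    }
}
```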
["metalinkHttp.h" (text/x-chdr)]
/*
Copyright 2012 Aish Raj Dahal < dahalaishraj at gmail.com >
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef showHttp_H
#define showHttp_H
#include <QtCore/QEventLoop>
#include <QtCore/QMap>
#include <QtCore/QObject>
#include <QtCore/QString>
#include <KUrl>
#include <KIO/Job> // KIO::Job, KIO::TransferJob and KIO::get()
class metalinkHttp : public QObject
{
Q_OBJECT
public:
metalinkHttp(const KUrl&);
~metalinkHttp();
bool isMetalinkHttp();
private slots:
void slotHeaderResult(KJob* kjob);
void checkMetalinkHttp();
void detectMime(KIO::Job * job, const QString & type);
void slotRedirection(KIO::Job*, const KUrl&);
private:
KUrl m_Url;
KUrl m_redirectionUrl;
bool m_MetalinkHSatus;
QEventLoop m_loop;
QMultiMap<QString, QString> m_headerInfo;
void parseHeaders(const QString&);
void setMetalinkHSatus();
};
#endif // showHttp_H
["metalinkHttp.cpp" (text/x-c++src)]
/*
Copyright 2012 Aish Raj Dahal < dahalaishraj at gmail.com >
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include "metalinkHttp.h"
#include <kdebug.h>
#include <QtCore/QDebug>
#include <QtCore/QString>
metalinkHttp::metalinkHttp(const KUrl& Url)
: m_Url(Url),
m_MetalinkHSatus(false)
{
checkMetalinkHttp();
}
metalinkHttp::~metalinkHttp()
{
}
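void metalinkHttp::checkMetalinkHttp()
{
KIO::TransferJob *job = KIO::get(m_Url);
job->addMetaData("PropagateHttpHeader", "true");
// Handle redirections ourselves so that the headers of the redirecting
// response are not replaced by those of the redirection target.
job->setRedirectionHandlingEnabled(false);
connect(job, SIGNAL(result(KJob*)), this, SLOT(slotHeaderResult(KJob*))); // Finished
connect(job, SIGNAL(redirection(KIO::Job*,KUrl)), this, SLOT(slotRedirection(KIO::Job*,KUrl))); // Redirection
connect(job, SIGNAL(mimetype(KIO::Job*,QString)), this, SLOT(detectMime(KIO::Job*,QString))); // Mime detection
qDebug() << " Verifying Metalink/HTTP Status" ;
// When following a redirection, this function is re-entered from
// slotHeaderResult() while m_loop is already running; do not nest exec().
if (!m_loop.isRunning()) {
m_loop.exec();
}
}
void metalinkHttp::detectMime(KIO::Job* job, const QString& type)
{
Q_UNUSED(job)
// Only log here: the headers are read in slotHeaderResult(), so leaving
// the loop at this point would return before they have been parsed.
qDebug() << type ;
qDebug() << "Mime Type signal received" ;
}
void metalinkHttp::slotHeaderResult(KJob* kjob)
{
if (kjob->error()) {
qDebug() << "Job failed:" << kjob->errorString();
}
KIO::Job* job = qobject_cast<KIO::Job*>(kjob);
// With redirection handling disabled, "HTTP-Headers" is available here,
// not in slotRedirection().
const QString httpHeaders = job ? job->queryMetaData("HTTP-Headers") : QString();
parseHeaders(httpHeaders);
setMetalinkHSatus();
// Handle the redirection... (Comment out if not desired)
if (m_redirectionUrl.isValid()) {
m_Url = m_redirectionUrl;
m_redirectionUrl = KUrl();
checkMetalinkHttp();
} else {
m_loop.exit(); // All done: let the constructor return.
}
}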
void metalinkHttp::slotRedirection(KIO::Job* job, const KUrl& url)
{
Q_UNUSED(job)
m_redirectionUrl = url;
}
bool metalinkHttp::isMetalinkHttp()
{
foreach(QString mapval, m_headerInfo) {
qDebug() << mapval ;
}
return m_MetalinkHSatus;
}
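void metalinkHttp::parseHeaders(const QString &httpHeader)
{
// Skip the status line (e.g. "HTTP/1.1 200 OK"); one header per line follows.
QString trimmedHeader = httpHeader.mid(httpHeader.indexOf('\n') + 1).trimmed();
foreach(const QString &line, trimmedHeader.split('\n')) {
int colon = line.indexOf(':');
if (colon < 0) {
continue; // Not a "Name: value" line (e.g. empty input).
}
// Header names are case-insensitive; store them lower-cased so that the
// lookups of "link" and "digest" in setMetalinkHSatus() also match "Link"/"Digest".
QString headerName = line.left(colon).trimmed().toLower();
QString headerValue = line.mid(colon + 1).trimmed();
int lessthan_pos = line.indexOf('<');
if (lessthan_pos >= 0) {
headerValue = line.mid(lessthan_pos + 1).trimmed();
}
m_headerInfo.insertMulti(headerName, headerValue);
}
}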
void metalinkHttp::setMetalinkHSatus()
{
bool linkStatus, digestStatus;
linkStatus = digestStatus = false;
if (m_headerInfo.contains("link")) {
QList<QString> linkValues = m_headerInfo.values("link");
foreach(QString linkVal, linkValues) {
if (linkVal.contains("rel=duplicate")) {
linkStatus = true;
break;
}
}
}
if (m_headerInfo.contains("digest")) {
QList<QString> digestValues = m_headerInfo.values("digest");
foreach(QString digestVal, digestValues) {
if (digestVal.contains("sha-256", Qt::CaseInsensitive)) {
digestStatus = true;
break;
}
}
}
if ((linkStatus) && (digestStatus)) {
m_MetalinkHSatus = true;
}
}
#include "metalinkHttp.moc"
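HTTP header field names are case-insensitive. A server may send `Link:` and `Digest:`, so a robust check should normalize names before looking up "link" and "digest". The standard-C++ sketch below is not part of the attachment, and `parseHeaderBlock` and `looksLikeMetalinkHttp` are hypothetical names. It parses a raw "HTTP-Headers" block in the spirit of parseHeaders(): it skips the status line, then reads one "Name: value" pair per line. Unlike the attached code, it lower-cases the names and does not strip the leading `<` of Link values. It then applies the same two checks as setMetalinkHSatus(): a Link value containing `rel=duplicate` and a Digest value mentioning SHA-256.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Trims spaces, tabs and CR/LF from both ends.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Parses the lines after the status line into a multimap keyed by the
// lower-cased header name; lines without a colon are skipped.
std::multimap<std::string, std::string> parseHeaderBlock(const std::string& raw) {
    std::multimap<std::string, std::string> headers;
    std::istringstream in(raw);
    std::string line;
    std::getline(in, line);  // skip the status line, e.g. "HTTP/1.1 200 OK"
    while (std::getline(in, line)) {
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        headers.emplace(toLower(trim(line.substr(0, colon))),
                        trim(line.substr(colon + 1)));
    }
    return headers;
}

// Same criteria as setMetalinkHSatus(): a Link value containing
// "rel=duplicate" and a Digest value mentioning SHA-256 (any case).
bool looksLikeMetalinkHttp(const std::multimap<std::string, std::string>& h) {
    bool link = false, digest = false;
    const auto links = h.equal_range("link");
    for (auto it = links.first; it != links.second; ++it)
        if (it->second.find("rel=duplicate") != std::string::npos) link = true;
    const auto digests = h.equal_range("digest");
    for (auto it = digests.first; it != digests.second; ++it)
        if (toLower(it->second).find("sha-256") != std::string::npos) digest = true;
    return link && digest;
}
```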