
List:       kde-core-devel
Subject:    Re: KIO: Mass Copy of Files from Different Sources to Different
From:       "Dawit A." <adawit () kde ! org>
Date:       2009-08-27 2:37:40
Message-ID: 200908262237.40198.adawit () kde ! org

Here is a modified version of the original patch you posted with the fix for 
::doJob I suggested below...

On Tuesday 25 August 2009 10:10:46 Dawit A. wrote:
> After going back and refamiliarizing myself with the scheduler code, I have
> no objection in principle to the patch. Actually, I completely
> misunderstood the patch as changing the scheduler's behavior when it
> does not. The scheduler indeed allocates N ioslaves per protocol, where N is
> obtained from the "maxInstances" property in the *.protocol files. For
> example, for the protocols in kdelibs this value is set to "3" for http,
> "4" for file and "2" for ftp.
>
> The only remaining concern that needs to be investigated with this change,
> AFAIC, is a potential bottleneck at the scheduler. How does this impact
> multiple applications making requests to distinct remote servers, since the
> limits are per protocol and not per host? Is limiting the file ioslave to a
> maximum of "4" instances appropriate? What about http, etc.? In other
> words, we have to test some use cases and, if needed, adjust these limits
> accordingly. One additional note... The documentation for
> KProtocolInfo::maxSlaves(...), the function the scheduler uses to find out
> the number of ioslaves it can spawn, states that the limits set by this
> value are 'soft' limits and can be exceeded under certain circumstances,
> but I cannot figure out under what circumstance that might happen to
> scheduled jobs in KIO::Scheduler.
>
> Anyhow, the only change missing from the patch is a modification of the
> KIO::Scheduler::doJob function to remove a job from the scheduled jobs
> list when it is called, just as is done in the ::scheduleJob
> function now. This can then be used by any download manager that wants to
> control the scheduling of jobs. So by default jobs get scheduled, but
> applications that want more control can simply call
> KIO::Scheduler::doJob(...) directly to bypass the scheduling process and
> ignore the limits set in the *.protocol files...
>
> On Thursday 20 August 2009 11:04:22 David Faure wrote:
> > On Thursday 20 August 2009, Dawit A. wrote:
> > > "Direct scheduling" guarantees that if I make 4 requests one
> > > after the other, all those requests will immediately spawn a kioslave.
> >
> > Which is a dangerous guarantee, if you end up with "FTP server: too many
> > connections". So we need this to be configurable somehow (and I think
> > the somehow is "by the user" rather than "through the kio api", since
> > apps shouldn't have to care whether they're talking to often-limited-FTP
> > or to fish/sftp/smb/file/etc.).
> >
> > > In the scheduled case, no matter what everything is queued and
> > > processed serially one after the other.
> >
> > Are you sure? That would be bad, we don't want that, we want "process
> > things at the pace of N slaves per protocol or host". But this might be
> > what maelcum started working on, which isn't finished yet.
> > In that case, we would indeed have to postpone this change.
> >
> > For kio_file I certainly want the behavior where N is quite big, so that
> > things happen in parallel -- but not too much, cf the initial post in
> > this thread. N should be maybe 50, but not infinite.
> >
> > > Well, the "not waiting much" is a very important point here. As an
> > > example, take a download manager that gives the user the
> > > option to set the number of connections to make per site. How would it
> > > be able to do that if the default is changed to "Scheduled" and
> > > "Direct" is completely gone ?
> >
> > By having that option being honoured by the scheduler ;-)
> >
> > But this looks like the heart of our disagreement here; I'm thinking of a
> > situation where the kio scheduler would implement that, and therefore
> > -all- apps benefit from "N connections per site" limitations (which would
> > fix a very old kde bug), rather than the user _having_ to use a download
> > manager in order to get that feature.
> >
> > > In addition to the fact that KIO::Scheduler::scheduleJob does not allow
> > > anything but a SimpleJob (dunno why that restriction) to be scheduled,
> >
> > Because only a SimpleJob is talking to an actual kioslave.
> > A CopyJob is a high-level job which takes care of subjobs: one for doing
> > a stat (file or directory?), one for doing a recursive listing of that
> > directory, one job for copying each file, one mkdir job per directory to
> > create, etc. That "high-level" job is totally unknown to the scheduler,
> > since the scheduler is about giving slaves to jobs ;) But the highlevel
> > CopyJob doesn't need any slave for itself, it's just a qobject with
> > timers and slots and subjobs.
> >
> > > the function name itself is the source of confusion. It would have been
> > > better if the function had been named queueJobs or something along that line
> > > and allowed you to provide the maximum number of ioslaves that can be
> > > spawned to service any request per site/location.
> >
> > As I said, I don't think that decision belongs to the application. It
> > belongs to the KDE configuration (i.e. the user, with sensible defaults).
> > The fact that ftp.foo.org only accepts 1 connection per IP is not
> > something that all apps should have to care about. Or the fact that the
> > KIO scheduler shouldn't see 150 incoming jobs at the same time. We can
> > take care of such problems inside KIO, without duplicating logic in the
> > apps.
> >
> > An app which -wants- to control this, like a download manager, should of
> > course be able to tune settings in the kio scheduler, I'm not rejecting
> > that possibility of course, but this is only for special kinds of apps,
> > not something all apps should have to do.
> >
> > Oh and with my patch all kio jobs automatically get scheduled so
> > scheduleJob() becomes completely internal, no need to call it from
> > anywhere, and therefore its name doesn't really matter :-))

["schedule_all_modified.diff" (text/x-patch)]

Index: job.cpp
===================================================================
--- job.cpp	(revision 1015276)
+++ job.cpp	(working copy)
@@ -308,6 +308,7 @@
     }
 
     Scheduler::doJob(q);
+    Scheduler::scheduleJob(q); // experiment
 }
 
 
Index: scheduler.cpp
===================================================================
--- scheduler.cpp	(revision 1015276)
+++ scheduler.cpp	(working copy)
@@ -382,9 +382,12 @@
     }
 }
 
 void SchedulerPrivate::doJob(SimpleJob *job) {
     JobData jobData;
     jobData.protocol = KProtocolManager::slaveProtocol(job->url(), jobData.proxy);
+    ProtocolInfo *protInfo = protInfoDict.get(jobData.protocol);
+    if (protInfo)
+      protInfo->joblist.removeOne(job);
 //    kDebug(7006) << "protocol=" << jobData->protocol;
     if (jobCommand(job) == CMD_GET)
     {
@@ -401,15 +404,17 @@
 }
 
 void SchedulerPrivate::scheduleJob(SimpleJob *job) {
+
     newJobs.removeOne(job);
     const JobData& jobData = extraJobData.value(job);
 
     QString protocol = jobData.protocol;
 //    kDebug(7006) << "protocol=" << protocol;
     ProtocolInfo *protInfo = protInfoDict.get(protocol);
-    protInfo->joblist.append(job);
-
-    slaveTimer.start(0);
+    if (!protInfo->joblist.contains(job)) { // scheduleJob already called for this job?
+        protInfo->joblist.append(job);
+        slaveTimer.start(0);
+    }
 }
 
 void SchedulerPrivate::cancelJob(SimpleJob *job) {

