From kde-core-devel Fri Aug 28 21:25:54 2009 From: "Dawit A." Date: Fri, 28 Aug 2009 21:25:54 +0000 To: kde-core-devel Subject: RFC: KIO::Scheduler changes... [PATCH included] Message-Id: <200908281725.54124.adawit () kde ! org> X-MARC-Message: https://marc.info/?l=kde-core-devel&m=125149481030576 MIME-Version: 1 Content-Type: multipart/mixed; boundary="--Boundary-00=_itEmKmSQ0phKHtB" --Boundary-00=_itEmKmSQ0phKHtB Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit This is a follow-up to the recent discussions about what KIO::Scheduler does by default and how we can go about improving the current default behavior, both to resolve some long-standing issues in KDE and to limit the number of new ioslaves (read: new connections) that get spawned. For further details please read the threads linked below or the explanation at the bottom of this email. The patches provided here are my attempt to address the current shortcomings of the scheduler and are based on a patch David posted in [1], with the following additions/modifications: 1.) Add support for an optional host-based limit in the .protocol files (maxInstancesPerHost) and, if set, use it in KIO::Scheduler::scheduleJob. 2.) Modify job.cpp to ensure all redirected requests are re-scheduled. 3.) Remove scheduled jobs from the queue if they are to be used in connection-oriented scheduling mode, KIO::Scheduler::assignJobToSlave. --------------------------------------------------------------------------------------------------------------------------------------------- What follows is a brief explanation of how jobs are currently handled by the scheduler and how they will be handled once the attached patches are applied, for those who do not want to be bothered to read the other threads linked below. First, for the benefit of those not familiar with the inner workings of KIO, here is roughly how a job is handled internally: 1.) A call to KIO::get, KIO::post, KIO::mimetype comes in... 2.) 
A job is created and, if it is of type SimpleJob, it is handed over to KIO::Scheduler... 3.) KIO::Scheduler simply checks to see if there is an idle ioslave that can be reused to handle the request. If one is found, it is reused. 4.) If none was found, the scheduler will always attempt to create a brand new ioslave, with no limit on the number of ioslaves it will create. The issue with this is that the scheduler by default does not impose any limit on the number of ioslaves (read: new connections) that it spawns. Instead it forces you to call an additional function, KIO::Scheduler::scheduleJob, if you want to limit the number of ioslave instances. In practice the problems with this default behavior only affect users under certain circumstances, namely where servers actually put a limit on the number of connections they allow from a given address. This is especially true for more than a few ftp servers out there. In fact, there is a long-standing FTP-related bug against Konqueror caused by this very issue. However, just because users do not directly see the side effects does not mean there is no problem with what the scheduler does by default. Creating too many connections by spawning many ioslaves is a costly affair for both the client and the server machines, since it unnecessarily results in the creation of many sockets. Hence, we end up using too many resources. Okay, so what do the attached patches do? 1.) Schedule all simple jobs created through KIO:: by default. 2.) Add an optional "maxInstancesPerHost" property to the *.protocol files that is honored when jobs are scheduled, which is now the default. KIO::Scheduler::scheduleJob currently honors the existing "maxInstances" property set in each *.protocol file as a 'soft limit'. However, if #1 is made the default, then this will cause a bottleneck at the scheduler, since the "maxInstances" property is per protocol (ftp, http). 
For example, for the ioslaves included in kdelibs, this value is set to 4, 3, and 2 in the file, http and ftp protocol files respectively. Having only this many instances of ioslaves KDE-wide is not sufficient; hence the new property. After that it is only a matter of setting both the "maxInstances=" and "maxInstancesPerHost=" properties to some sane default values so that the scheduler can do a better job of scheduling requests per host. For example, for the http protocol the "maxInstances" property can be raised to "15", 5x its original value, and "maxInstancesPerHost" set to "3". This means that there cannot be more than 15 instances of the http ioslave at any given time and no more than 3 connections to any given host. All jobs will be queued until these criteria are fulfilled... What issues remain? 1.) Unfortunately the patch to job.cpp will break applications that might want to have direct control over the number of ioslaves (read: connections), e.g. a download manager. The change to scheduling jobs by default will make direct control impossible, since all jobs will be queued. So, either an additional function is required in KIO::Scheduler to remove a scheduled job from the queue, or such applications need to call KIO::Scheduler::doJob again and we can add the remove-from-queue check there??? Somehow the last option seems very inefficient to me: doJob(...), scheduleJob(...) and then doJob(...) again... 2.) Should the other scheduling modes, besides "scheduled mode", be forced to honor the "maxInstances" property? My own view is no, since the default mode does, and the other modes are meant to give the users of KIO complete control to schedule their jobs how they see fit. Anyhow, comments, suggestions, improvements and everything in between are welcome... 
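To make the queueing rule above concrete, here is a minimal, self-contained sketch of the combined "maxInstances"/"maxInstancesPerHost" decision. This is NOT the actual KIO code; the names ProtocolLimits, maySpawnFor and spawned are made up for illustration, and the real scheduler of course tracks Slave objects rather than plain counters:

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the per-protocol bookkeeping: one instance
// per protocol (http, ftp, ...), counting active ioslaves per host.
struct ProtocolLimits {
    int maxSlaves;          // "maxInstances=" from the *.protocol file
    int maxSlavesPerHost;   // "maxInstancesPerHost="; < 1 means no per-host cap
    int activeTotal = 0;
    std::map<std::string, int> activePerHost;

    // A new ioslave may be spawned for 'host' only if both the global
    // protocol-wide limit and the per-host limit still allow it;
    // otherwise the job stays queued.
    bool maySpawnFor(const std::string& host) const {
        if (activeTotal >= maxSlaves)
            return false;
        if (maxSlavesPerHost < 1)   // 0: unlimited connections per host
            return true;
        auto it = activePerHost.find(host);
        const int count = (it == activePerHost.end()) ? 0 : it->second;
        return count < maxSlavesPerHost;
    }

    // Record that an ioslave was created for 'host'.
    void spawned(const std::string& host) {
        ++activeTotal;
        ++activePerHost[host];
    }
};
```

With the suggested http defaults (maxInstances=15, maxInstancesPerHost=3), a fourth request to the same host is held back while a request to a different host can still get a fresh slave.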
[1] http://lists.kde.org/?t=124531188100002&r=1&w=2 [2] http://lists.kde.org/?t=125042610900001&r=1&w=2 --Boundary-00=_itEmKmSQ0phKHtB Content-Type: text/x-patch; charset="UTF-8"; name="scheduler.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="scheduler.patch" Index: kio/scheduler.cpp =================================================================== --- kio/scheduler.cpp (revision 1016510) +++ kio/scheduler.cpp (working copy) @@ -162,13 +162,17 @@ class KIO::SchedulerPrivate::ProtocolInfo { public: - ProtocolInfo() : maxSlaves(1), skipCount(0) + ProtocolInfo() : maxSlaves(1), maxSlavesPerHost(0), skipCount(0) { } ~ProtocolInfo() { - qDeleteAll(allSlaves()); + qDeleteAll(joblist.begin(), joblist.end()); + qDeleteAll(activeSlaves.begin(), activeSlaves.end()); + qDeleteAll(idleSlaves.begin(), idleSlaves.end()); + qDeleteAll(coSlaves.begin(), coSlaves.end()); + qDeleteAll(coIdleSlaves.begin(), coIdleSlaves.end()); } // bad performance, but will change this later @@ -181,12 +185,30 @@ return ret; } + int activeSlavesCountFor(const QString& host) const + { + int count = 0; + + if (!host.isEmpty()) + { + QListIterator<Slave *> it (activeSlaves); + while (it.hasNext()) + { + if (host == it.next()->host()) + count++; + } + } + + return count; + } + QList<SimpleJob *> joblist; SlaveList activeSlaves; SlaveList idleSlaves; CoSlaveMap coSlaves; SlaveList coIdleSlaves; int maxSlaves; + int maxSlavesPerHost; int skipCount; QString protocol; }; @@ -200,6 +222,7 @@ info = new ProtocolInfo; info->protocol = protocol; info->maxSlaves = KProtocolInfo::maxSlaves( protocol ); + info->maxSlavesPerHost = KProtocolInfo::maxSlavesPerHost( protocol ); insert(protocol, info); } @@ -382,9 +405,9 @@ } } -void SchedulerPrivate::doJob(SimpleJob *job) { +void SchedulerPrivate::doJob(SimpleJob *job) { JobData jobData; - jobData.protocol = KProtocolManager::slaveProtocol(job->url(), jobData.proxy); + jobData.protocol = KProtocolManager::slaveProtocol(job->url(), jobData.proxy); // 
kDebug(7006) << "protocol=" << jobData->protocol; if (jobCommand(job) == CMD_GET) { @@ -401,15 +424,17 @@ } void SchedulerPrivate::scheduleJob(SimpleJob *job) { + newJobs.removeOne(job); const JobData& jobData = extraJobData.value(job); QString protocol = jobData.protocol; // kDebug(7006) << "protocol=" << protocol; ProtocolInfo *protInfo = protInfoDict.get(protocol); - protInfo->joblist.append(job); - - slaveTimer.start(0); + if (!protInfo->joblist.contains(job)) { // scheduleJob already called for this job? + protInfo->joblist.append(job); + slaveTimer.start(0); + } } void SchedulerPrivate::cancelJob(SimpleJob *job) { @@ -520,7 +545,7 @@ SimpleJob *job = 0; Slave *slave = 0; - + if (protInfo->skipCount > 2) { bool dummy; @@ -528,7 +553,10 @@ // 2 times in a row. The protInfo->skipCount = 0; job = protInfo->joblist.at(0); - slave = findIdleSlave(protInfo, job, dummy ); + + if (protInfo->maxSlavesPerHost < 1 || + protInfo->maxSlavesPerHost > protInfo->activeSlavesCountFor(job->url().host())) + slave = findIdleSlave(protInfo, job, dummy ); } else { @@ -538,7 +566,11 @@ for(int i = 0; (i < protInfo->joblist.count()) && (i < 10); i++) { job = protInfo->joblist.at(i); - slave = findIdleSlave(protInfo, job, exact); + + if (protInfo->maxSlavesPerHost < 1 || + protInfo->maxSlavesPerHost > protInfo->activeSlavesCountFor(job->url().host())) + slave = findIdleSlave(protInfo, job, exact); + if (!firstSlave) { firstJob = job; @@ -561,10 +593,20 @@ if (!slave) { - if ( protInfo->maxSlaves > static_cast<int>(protInfo->activeSlaves.count()) ) + KUrl url (job->url()); + +// kDebug() << "# of slave allowed : " << protInfo->maxSlaves; +// kDebug() << "# of active ioslaves: " << protInfo->activeSlaves.count(); +// kDebug() << "# of active ioslaves allowed per host: " << protInfo->maxSlavesPerHost; +// if (!url.isLocalFile()) +// kDebug() << "# of active ioslaves for " << url.host() << ": " << protInfo->activeSlavesCountFor(url.host()); + + if ( protInfo->maxSlaves > 
protInfo->activeSlaves.count() && + (protInfo->maxSlavesPerHost < 1 || + protInfo->maxSlavesPerHost > protInfo->activeSlavesCountFor(url.host())) ) { newSlave = true; - slave = createSlave(protInfo, job, job->url()); + slave = createSlave(protInfo, job, url); if (!slave) slaveTimer.start(0); } @@ -955,16 +997,15 @@ { // kDebug(7006) << "_assignJobToSlave( " << job << ", " << slave << ")"; QString dummy; - if ((slave->slaveProtocol() != KProtocolManager::slaveProtocol( job->url(), dummy )) - || - (!newJobs.removeAll(job))) + ProtocolInfo *protInfo = protInfoDict.get(slave->protocol()); + if ((slave->slaveProtocol() != KProtocolManager::slaveProtocol( job->url(), dummy )) || + (!protInfo->joblist.removeAll(job) && !newJobs.removeAll(job))) { kDebug(7006) << "_assignJobToSlave(): ERROR, nonmatching or unknown job."; job->kill(); return false; } - ProtocolInfo *protInfo = protInfoDict.get(slave->protocol()); JobList *list = protInfo->coSlaves.value(slave); assert(list); if (!list) --Boundary-00=_itEmKmSQ0phKHtB Content-Type: text/x-patch; charset="UTF-8"; name="job.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="job.patch" Index: kio/job.cpp =================================================================== --- kio/job.cpp (revision 1016510) +++ kio/job.cpp (working copy) @@ -308,6 +308,7 @@ } Scheduler::doJob(q); + Scheduler::scheduleJob(q); } @@ -627,6 +628,7 @@ // Return slave to the scheduler d->slaveDone(); Scheduler::doJob(this); + Scheduler::scheduleJob(this); } } @@ -834,6 +836,7 @@ // Return slave to the scheduler d->slaveDone(); Scheduler::doJob(this); + Scheduler::scheduleJob(this); } } @@ -999,6 +1002,7 @@ // Return slave to the scheduler d->slaveDone(); Scheduler::doJob(this); + Scheduler::scheduleJob(this); } } @@ -1597,6 +1601,7 @@ // Return slave to the scheduler d->slaveDone(); Scheduler::doJob(this); + Scheduler::scheduleJob(this); } } @@ -2420,6 +2425,7 @@ // Return slave to the scheduler d->slaveDone(); 
Scheduler::doJob(this); + Scheduler::scheduleJob(this); } } @@ -2671,6 +2677,7 @@ d->m_url = d->m_waitQueue.first().url; d->slaveDone(); Scheduler::doJob(this); + Scheduler::scheduleJob(this); } } } --Boundary-00=_itEmKmSQ0phKHtB Content-Type: text/x-patch; charset="UTF-8"; name="kprotocolinfo.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="kprotocolinfo.patch" Index: kprotocolinfo_p.h =================================================================== --- kprotocolinfo_p.h (revision 1016419) +++ kprotocolinfo_p.h (working copy) @@ -58,6 +58,7 @@ //KUrl::URIMode uriMode; QStringList capabilities; QString proxyProtocol; + int maxSlavesPerHost; }; Index: kprotocolinfo.cpp =================================================================== --- kprotocolinfo.cpp (revision 1016419) +++ kprotocolinfo.cpp (working copy) @@ -69,6 +69,7 @@ m_icon = config.readEntry( "Icon" ); m_config = config.readEntry( "config", m_name ); m_maxSlaves = config.readEntry( "maxInstances", 1); + d->maxSlavesPerHost = config.readEntry( "maxInstancesPerHost", 0); QString tmp = config.readEntry( "input" ); if ( tmp == "filesystem" ) @@ -151,7 +152,7 @@ >> d->capabilities >> d->proxyProtocol >> i_canRenameFromFile >> i_canRenameToFile >> i_canDeleteRecursive >> i_fileNameUsedForCopying - >> d->archiveMimetype; + >> d->archiveMimetype >> d->maxSlavesPerHost; m_inputType = (Type) i_inputType; m_outputType = (Type) i_outputType; @@ -230,7 +231,7 @@ << capabilities << proxyProtocol << i_canRenameFromFile << i_canRenameToFile << i_canDeleteRecursive << i_fileNameUsedForCopying - << archiveMimetype; + << archiveMimetype << maxSlavesPerHost; } @@ -282,6 +283,15 @@ return prot->m_maxSlaves; } +int KProtocolInfo::maxSlavesPerHost( const QString& _protocol ) +{ + KProtocolInfo::Ptr prot = KProtocolInfoFactory::self()->findProtocol(_protocol); + if ( !prot ) + return 0; + + return prot->d_func()->maxSlavesPerHost; +} + bool KProtocolInfo::determineMimetypeFromExtension( 
const QString &_protocol ) { KProtocolInfo::Ptr prot = KProtocolInfoFactory::self()->findProtocol( _protocol ); Index: ksycoca.cpp =================================================================== --- ksycoca.cpp (revision 1016419) +++ ksycoca.cpp (working copy) @@ -55,7 +55,7 @@ * If the existing file is outdated, it will not get read * but instead we'll ask kded to regenerate a new one... */ -#define KSYCOCA_VERSION 143 +#define KSYCOCA_VERSION 144 /** * Sycoca file name, used internally (by kbuildsycoca) Index: kprotocolinfo.h =================================================================== --- kprotocolinfo.h (revision 1016419) +++ kprotocolinfo.h (working copy) @@ -218,7 +218,19 @@ */ static int maxSlaves( const QString& protocol ); + /** + * Returns the limit on the number of slaves for this protocol per host. + * + * This corresponds to the "maxInstancesPerHost=" field in the protocol description file. + * The default is 0 which means there is no per host limit. + * + * @param protocol the protocol to check + * @return the maximum number of slaves per host, or 0 if unknown + */ + static int maxSlavesPerHost( const QString& protocol ); + + /** * Returns whether mimetypes can be determined based on extension for this * protocol. For some protocols, e.g. http, the filename extension in the URL * can not be trusted to truly reflect the file type. --Boundary-00=_itEmKmSQ0phKHtB--