List: xen-users
Subject: Re: [Xen-users] whether xen scheduler supports preemption
From: <zhangwqh () 126 ! com>
Date: 2013-06-30 15:00:48
Message-ID: 63677370.c228.13f9599c076.Coremail.zhangwqh () 126 ! com
Thank you very much for your guidance!
At 2013-06-27 18:30:05,"Dario Faggioli" <dario.faggioli@citrix.com> wrote:
> So, first of all... Can you use plain text instead of HTML for e-mails?
>
> On Wed, 2013-06-26 at 21:16 +0800, 张伟 wrote:
> > Thank you very much for your detailed explanation! See below.
> >
> You're welcome. Although, at this point, I'm curious about why you're
> interested in this... What is it that you want to achieve?
At first, I misunderstood preemption in the Xen scheduler: I thought it
was not supported. Last week my advisor corrected me. So I want to know,
given that the scheduler supports preemption, which key parts of the code
I would need to modify. I started with a simple change to the Xen
scheduler, but my modification causes some virtual machines to starve.
Now I want to reduce the starvation, which means adding more code.

I have hit a serious problem: if I access a field of the csched_dom
structure in schedule() or csched_schedule(), the system reboots
automatically. E.g., if I add
printk("The domain weight is %d", sdom->weight);
to csched_schedule() or schedule(), the system reboots and cannot finish
booting. Do you know why? It is very strange, because in these two
functions I can successfully access the fields of the csched_vcpu and
csched_private structures.
>
> > > ... Yes, that is at least most of it. In fact, when a vcpu wakes up, it
> > > is added to a specific runq, and the 'tickling' mechanism is there right
> > > to ensure that the said vcpu starts to run as soon as possible, either
> > > if there are idle pcpus, or the running vcpus have lower priority, the
> > > latter case being the definition of preemption.
> > When a vcpu wakes up, it is added to a specific runq. Is that specific
> > runq the runnable queue?
> Well, the vcpu wakes-up, so yes, it is the runnable queue of a specific
> pCPU. Which 'specific pCPU' depends, and I suggest you look more
> deeply into the scheduler code. Off the top of my head, I'd say it is the
> runqueue of the pCPU where the vCPU was when it went to sleep.
>
> > either if there are idle pcpus, or the running vcpus have lower priority?
> >
> In credit1, it works like this:
> - you (the vCPU) wake-up and I (Xen scheduler) queue you on the runq
> of the pCPU where you were before going to sleep;
> - if that pCPU is busy, I poke other pCPUs to see if you can run there
> (that's the meaning of 'tickling');
> - if the above is not possible, I check if preemption is required. If
> yes, I preempt the vCPU running on the runq, if not, you have to wait
> for your turn (or for some other pCPU becoming idle and picking you
> up) in the runq.
>
> Does that make sense?
>
Yes, now it makes sense. Thank you very much for taking the time to help
me understand what you said.
> > I do not understand your meaning. Do you mean that if there are idle
> > pcpus, the woken-up vcpu will be scheduled to run on one of them?
> For sure, the scheduler will try as hard as it can to achieve this, yes.
>
> > If not, it will preempt the currently running vcpu if the woken-up vcpu
> > has a higher priority than the current one. Is my understanding right?
> I believe it is. Actually, I believe this is either the definition or,
> in any case, the only sensible thing that a reasonable enough
> preemptible scheduler should do. :-)
>
> For the deep technicalities of how this is implemented in credit1,
> please refer to my hopefully accurate explanation above, or, even
> better, to sched_credit.c.
>
> > > If you, for instance, avoid raising the SCHEDULE_SOFTIRQ for busy
> > > pcpus
> > > (I would still tickle the idle ones, or you'll get funny results! :-O),
> > > you definitely are making the (credit) scheduler less preemptible.
> > I cannot understand this part: "still tickle the idle ones, or you'll
> > get funny results!" What does it mean?
> The meaning is that, given the explanation above, inhibiting preemption
> by, for instance, not tickling the busy pCPUs might actually work. On
> the other hand, if you have idle pCPUs, having them running the woken-up
> task is not a preemption, right? Well, if you do not tickle those pCPUs
> you won't get there, and you not only will get rid of preemption on busy
> pCPUs, you will also have idle pCPUs that remain idle, even if there
> are vCPUs waiting to be executed.
>
> This means you're killing not only preemption, but also work
> conserving-ness, and that might not be among your original goals (or was
> it?).
>
> > > Of course, wake-ups are not the only cause of SCHEDULE_SOFTIRQ being
> > > raised. E.g., it fires periodically at the scheduling time slice
> > > boundaries. If you want to avoid vcpus being interrupted by others with
> > > higher priority for this case too, you probably have more paths to tweak
> > > than just the csched_vcpu_wake() function.
> > >
> > Yes. I cannot remember exactly how many places raise SCHEDULE_SOFTIRQ.
> > A long time ago I checked them, and there were about seven.
> Fine. Then, to be sure, I'd check all of them and see what they end up
> doing. I know they're all calling csched_schedule(), what I mean is I'd
> check the conditions and the parameters, to verify which ones of these 7
> possible situations could lead to preemption.
>
> What you can be quite sure of is that there's not going to be a
> preemption without a call to csched_schedule() being involved, so you
> may even try to instrument the code at that level. It really all
> depends on your final purpose.
>
> > > And here I'm failing at understanding what you mean again... When a
> > > SCHEDULE_SOFTIRQ is raised for a given pcpu, that pcpu will deal with
> > > it, well, ASAP (look at how softirqs & tasklets work in the hypervisor
> > > source code). What do you mean by "give up the physical cpu"?
> > I mean: after SCHEDULE_SOFTIRQ is raised, will the handler function
> > schedule() execute right away, or does it need to wait for the current
> > vcpu to be scheduled out? Which part decides the priority between them?
> Mmm... I spot some confusion here. Why should the scheduling out of a
> vcpu be involved in all this? I mean, raising a SCHEDULE_SOFTIRQ and,
> most importantly, handling it, happens in Xen code. That means there is a
> pCPU executing hypervisor code, independently of which one is the vCPU
> that is or was running on that same pCPU. Well, this same hypervisor
> code will get to execute, at some point, csched_schedule(), make the
> scheduling decision and, if that is the case, deschedule the running vCPU
> and schedule another one (and there you have a preemption).
>
> Actually, we really can't wait for a vCPU to be descheduled to execute
> the Xen scheduler, since it's the Xen scheduler itself that deschedules
> vCPUs! :-O
>
> Perhaps, with "scheduled out" you mean something like block, i.e., you
> want to know if Xen is able to interrupt the vCPUs or if it always runs
> them to completion or blocking. In which case, the former: we interrupt
> the vCPUs, just like a (preemptible) OS scheduler interrupts the OS's
> tasks. Whether or not that will result in a preemption depends both
> on the scheduler and on the circumstances.
>
> Sounds better now?
>
> > Can you give me some guidance on where the code for softirqs &
> > tasklets is?
> >
> Well, grep and find are usually good friends, when the question is where
> is the code! :-P
>
> Both
>
> $ grep -r tasklet xen.git/xen/
>
> and
>
> $ grep -r softirq xen.git/xen/
>
> produce a lot of output here. Also, I'd try something like this... You
> know, programmers usually have rather little imagination:
>
> $ find ./xen.git/xen/ -iname 'tasklet*'
> ./xen/include/xen/tasklet.h
> ./xen/common/tasklet.c
>
> $ find ./xen.git/xen/ -iname 'softirq*'
> ./xen/include/asm-x86/softirq.h
> ./xen/include/xen/softirq.h
> ./xen/include/asm-arm/softirq.h
> ./xen/common/softirq.c
>
> > Another question:
> > In the schedule() function in schedule.c, the first thing it does is
> > set the flag tasklet_work_scheduled according to whether there is
> > tasklet work. What is tasklet work?
> After having inspected at least some of the sources above, look for the
> do_tasklet() function, and review what it does. If it's the concept of
> tasklet and softirq that you're unfamiliar with, well, very quickly it's
> just one way of deferring work in an OS (or, in our case, a hypervisor,
> but still).
>
> Linux makes use of this kind of thing pretty heavily (although the
> names, the implementation, and the number of different variants of them
> change with kernel versions). I trust/hope you can find enough
> documentation about that online. :-)
>
> > In csched_schedule() in sched_credit.c, the idle vcpu is given boost
> > priority if tasklet_work_scheduled is set. I have some difficulty
> > understanding this part. Maybe my confusion comes from not knowing what
> > tasklet work is. Can you explain why it is designed like this?
> Again, a tasklet is deferred work. That means there is this pretty
> function you want to call, but you can't call it right now. A typical
> example is when you have interrupts disabled and the pretty function
> in question wants interrupts enabled, or when you don't want to
> keep interrupts disabled for too long, or any other reason.
>
> Ok, what you do is to make a note about calling that function later, and
> that's exactly what tasklet does. The reason why we execute them in idle
> domain's context is, well, because we have to execute them
> somewhere! :-)
>
> Seriously, our scheduler schedules vCPUs, not 'functions', so you either
> call a function from where you are (and we already said you can't) or,
> when you're done, the scheduler will pick a vCPU and get on with it, and
> your function will never be called. What we hence do is make sure it
> is one of the idle domain's vCPUs that is scheduled, and also make
> sure that such a vCPU will call your function as part of 'its workload'.
>
> Check out the idle_loop() function, it's in xen/arch/x86/domain.c.
Thank you very much once again for your detailed description!
>
> Regards,
> Dario
>
>
_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
http://lists.xen.org/xen-users