List:       tortoisesvn-dev
Subject:    Statsdlg - first patch: data gathering upgrade
From:       "Andreas Nicolai" <Andreas.Nicolai () gmx ! net>
Date:       2007-10-07 15:37:36
Message-ID: op.tzt20ycjt8lo91 () helium

Hi there,

While I'm hacking away on the stats dialog, I've created (and attached) a
first patch that reworks the stats data gathering algorithm.

The patch only touches StatGraphDlg.h and StatGraphDlg.cpp and was created
against revision 10908.

Here's a brief review of the code changes:

1. week count:
old: The previous implementation took the first and last dates in the
array as the time span.
new: The new implementation searches for the min and max dates, then
aligns the earliest date to the beginning of its week and stores that
date in a new member variable, m_minDate.
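A minimal C++ sketch of the week-count step, assuming timestamps are plain
seconds since the Unix epoch in UTC and that weeks start on Monday; the
patch itself may use local time and a different first weekday.
AlignToWeekStart() and this int64-based GetWeekCount() signature are
illustrative, not the patch's actual code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumptions for this sketch: timestamps are seconds since the Unix epoch
// (UTC) and a week starts on Monday 00:00 UTC.
constexpr std::int64_t kSecondsPerDay  = 24 * 60 * 60;
constexpr std::int64_t kSecondsPerWeek = 7 * kSecondsPerDay;

// Returns Monday 00:00 UTC of the week containing t (t >= 0).
// 1970-01-01 was a Thursday, i.e. weekday 3 when Monday is 0.
std::int64_t AlignToWeekStart(std::int64_t t)
{
    std::int64_t day = t / kSecondsPerDay;
    std::int64_t weekday = (day + 3) % 7;   // 0 = Monday ... 6 = Sunday
    return (day - weekday) * kSecondsPerDay;
}

// Number of weeks covered by dates. Uses the min and max dates rather than
// the first and last array elements, so unsorted input is handled.
// The aligned earliest date is returned in minDate (cf. m_minDate).
int GetWeekCount(const std::vector<std::int64_t>& dates, std::int64_t& minDate)
{
    if (dates.empty())
    {
        minDate = 0;
        return 0;
    }
    auto range = std::minmax_element(dates.begin(), dates.end());
    minDate = AlignToWeekStart(*range.first);
    return static_cast<int>((*range.second - minDate) / kSecondsPerWeek) + 1;
}
```

Using min/max instead of first/last keeps the result correct when the log
entries are not sorted by date, e.g. after a later history import.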

2. data gathering:
old: The previous implementation executed a lot of binary searches (using
lower_bound) for _each_ commit. This caused the noticeable delay when
opening the stats dialog for a large number of revisions (e.g. try "Show
all" in the TSVN repository and open the stats dialog). Also, when
revision histories are imported later, the same week can reappear further
down the log; such recurring weeks were treated as new weeks, so the stats
were incorrect.

new: The new implementation loops over all weeks in the interval
determined in GetWeekCount() and stores, for each week, the number of
commits and file changes per author; it also keeps track of the total
commit count and the total file change count. At the same time, the
commits for each author are stored in a mapping. Then a list of author
names is created and sorted by commit count. For that purpose I wrote a
binary predicate class, MoreCommitsThan, which compares authors by their
commit count. As a result, all the sorting during data gathering is no
longer necessary, and the time-expensive CountCommits() function can be
removed altogether. Finally, the required stats are obtained for the
min/max author (first and last in the sorted list), and the dialog's
statistics can be shown.
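The single-pass gathering described above might look roughly like the
following. This is a sketch, not the patch: LogEntry, Stats and
GatherStats() are hypothetical names, std::string and long stand in for
stdstring and LONG, and only the MoreCommitsThan predicate name comes from
the mail.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

constexpr std::int64_t kSecondsPerWeek = 7 * 24 * 60 * 60;

// Hypothetical log entry; the real dialog reads this from the log data.
struct LogEntry {
    std::int64_t date;        // seconds since epoch
    std::string  author;
    long         fileChanges; // number of changed paths in this commit
};

struct Stats {
    std::map<int, std::map<std::string, long>> commitsPerWeekAndAuthor;
    std::map<int, std::map<std::string, long>> filechangesPerWeekAndAuthor;
    std::map<std::string, long> commitsPerAuthor;
    long totalCommits = 0;
    long totalFileChanges = 0;
    std::vector<std::string> authors;  // most commits first
};

// Binary predicate: orders author names by descending commit count.
class MoreCommitsThan {
public:
    explicit MoreCommitsThan(const std::map<std::string, long>& counts)
        : m_counts(&counts) {}
    bool operator()(const std::string& a, const std::string& b) const
    {
        return m_counts->at(a) > m_counts->at(b);
    }
private:
    const std::map<std::string, long>* m_counts;
};

// One pass over all entries; no binary searches per commit.
Stats GatherStats(const std::vector<LogEntry>& entries, std::int64_t minDate)
{
    Stats s;
    for (const LogEntry& e : entries)
    {
        // The week index is relative to the aligned start date, so a week
        // that reappears later in the log lands in the same bucket.
        int week = static_cast<int>((e.date - minDate) / kSecondsPerWeek);
        ++s.commitsPerWeekAndAuthor[week][e.author];
        s.filechangesPerWeekAndAuthor[week][e.author] += e.fileChanges;
        ++s.commitsPerAuthor[e.author];
        ++s.totalCommits;
        s.totalFileChanges += e.fileChanges;
    }
    for (const auto& kv : s.commitsPerAuthor)
        s.authors.push_back(kv.first);
    std::stable_sort(s.authors.begin(), s.authors.end(),
                     MoreCommitsThan(s.commitsPerAuthor));
    return s;
}
```

std::stable_sort keeps authors with equal commit counts in alphabetical
order (the order of the std::map), so the authors taken from the front and
back of the list are deterministic.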

I documented the new code in fair detail, so it shouldn't be too hard to
follow (I hope).

Just one thing I noticed... Because of the alignment to the beginning/end
of the week, revision intervals that start and end in the middle of a week
may be reported as one week longer than the time span actually is.
However, if I don't align the interval with the start of the week, the
weekly interval may start on a Wednesday and last until the next week's
Tuesday. For a different revision range (maybe including the previous 200
revs), the interval may run from Friday to the next week's Thursday. This,
however, results in different min/max commit and file change counts. So I
guess I can't get around the aligning, and the improved data gathering
algorithm needs m_minDate.
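To illustrate why the alignment matters, the following sketch buckets the
same commits into 7-day intervals that start either at the first commit of
the range or at the Monday of that week. The helper names, the UTC/Monday
convention and the day numbers (days since 1970-01-01) are assumptions
made for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

constexpr std::int64_t kDay  = 24 * 60 * 60;
constexpr std::int64_t kWeek = 7 * kDay;

// Hypothetical helper: Monday 00:00 UTC of t's week (1970-01-01 was a Thursday).
std::int64_t AlignToWeekStart(std::int64_t t)
{
    std::int64_t day = t / kDay;
    return (day - (day + 3) % 7) * kDay;
}

// Largest number of commits in any 7-day bucket starting at `start`.
// `start` must not be later than any date in `dates`.
long MaxCommitsPerWeek(const std::vector<std::int64_t>& dates, std::int64_t start)
{
    std::map<std::int64_t, long> perWeek;
    long best = 0;
    for (std::int64_t t : dates)
        best = std::max(best, ++perWeek[(t - start) / kWeek]);
    return best;
}
```

Range a has commits on Wed, Thu, Thu, Sat, Sat of one week; range b adds
an earlier Friday commit. With unaligned buckets the maximum commits per
week is 5 for a but 3 for b; with aligned buckets both report 5.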


Design questions:
1. The data structures created in ShowStats() need to be used in the other
statistics functions as well. Re-gathering the data would be a waste of
time, so I propose making them member variables of the dialog that get
populated when the dialog is first shown. All other statistics views can
then use this information to obtain/calculate their specific data. Would
it make sense to have these mappings and lists as member variables?

2. The maps for the commit and file change data are currently of type
map<int, map<stdstring, LONG> >, so that data can be accessed by:

LONG commits = commitsPerAuthorAndWeek[week_nr][author_name];
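One thing to keep in mind with this layout: operator[] on a std::map
inserts a default entry for a missing key, so reading counts that way for
weeks or authors without commits grows the maps, and it cannot be used on
a const map. A small read-only helper avoids that; std::string and long
stand in for stdstring and LONG here, and GetCount() is a hypothetical
name.

```cpp
#include <map>
#include <string>

// Layout from the mail, with std::string/long standing in for stdstring/LONG.
typedef std::map<int, std::map<std::string, long>> WeekAuthorMap;

// Read-only lookup: returns 0 for missing weeks/authors without
// inserting anything, and works on a const map.
long GetCount(const WeekAuthorMap& data, int week, const std::string& author)
{
    auto w = data.find(week);
    if (w == data.end())
        return 0;
    auto a = w->second.find(author);
    return a == w->second.end() ? 0 : a->second;
}
```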

However, the memory needed for storing the data could be reduced if the
authors were identified by a number instead of a string, with the
name/number connection made via yet another mapping. The statement above
would then look like:

LONG commits = commitsPerAuthorAndWeek[week_nr][authorNumber[author_name]];
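A sketch of the numbered-author alternative, assuming a hypothetical
AuthorIndex class that assigns consecutive numbers to names. Each name is
then stored once in the index, while the per-week maps only hold integers.
Note that a plain std::map<stdstring, int> used as
authorNumber[author_name] would insert and return 0 for an unknown author,
colliding with whoever already owns number 0; Intern() avoids that by
handing out a fresh number.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical name <-> number mapping for authors.
class AuthorIndex
{
public:
    // Returns the number for `name`, assigning the next free one if new.
    int Intern(const std::string& name)
    {
        auto it = m_numbers.find(name);
        if (it != m_numbers.end())
            return it->second;
        int id = static_cast<int>(m_names.size());
        m_numbers.emplace(name, id);
        m_names.push_back(name);
        return id;
    }

    // Reverse lookup, e.g. for labels in the graphs.
    const std::string& Name(int id) const { return m_names.at(id); }

private:
    std::map<std::string, int> m_numbers;  // name -> number
    std::vector<std::string>   m_names;    // number -> name
};

// Per-week counts keyed by author number instead of author name.
typedef std::map<int, std::map<int, long>> WeekAuthorIdMap;
```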

Since the memory footprint of the statistics dialog is rather low compared
to the log dialog, I would probably postpone this change until later. It
would also hurt the readability of the code, so I'd prefer the way the
data is stored now. What are your thoughts on this?


Bye,
Andreas

-- 
Andreas Nicolai                         anicolai@syr.edu
PhD Candidate, M.A.M.E                  (315) 443-2641
Syracuse University
151 Link Hall
Syracuse, NY, 13244
["statsdlg_improvement_1.zip" (statsdlg_improvement_1.zip)]




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tortoisesvn.tigris.org
For additional commands, e-mail: dev-help@tortoisesvn.tigris.org

