[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [leafnode-list] 2.0b8_ma3 leafnode very slow



Stefan Wiens <s.wi@xxxxxxx> writes:

> Libc asks isatty() to find the appropriate buffering mode.

Might be, GNU libc 2.1.3 and FreeBSD 4.3 libc (I did not check others)
are documented like this, however, setting a 16 kB buffer helped speed
things up a little here (GNU libc), for whatever reason, I don't know.

> When connected via inetd(8), it chooses full buffering (4096 chars).
> Maybe some inetd replacement makes stdout look like a tty?

My xinetd doesn't, isatty(0) -> 0. 

It seems relying on libc to set fully-buffered mode is not a good idea.

> But in most cases it's completely useless. When parsing .overview, one
> could at no cost check the lines for ascending order and skip
> sorting. Besides that, when adding new lines, some kind of insertion
> sort is more efficient than qsort(). I am currently testing a separate
> program for updating .overview files (after fetchnews). It takes the
> same time as fixxover(), but with only 5% of its memory usage.

What is the highest memory usage you see in fixxover that makes you
optimize this function which is sort of a one-shot use, runs for a
couple of seconds?

> The same considerations apply for active information. Because
> groupinfo is already sorted (normally), qsort() is overkill.
> With 20,000 groups, qsort() needs more than 200,000 strcmp() calls
> where 20,000 would be sufficient.

My current version calls mergegroups less often than _ma4, which helps
already. I believe the readlocalgroups() in rereadactive() is the
culprit here. I'll see what I can do to cut the qsort invocations down.

I'd rather not see what happens when a newly-fetched active is sorted
with a "weaker" sort. You cannot rely on your servers sending sorted
lists. (t-online, for instance, doesn't).

It might be worthwhile looking at a natural mergesort to cut the best
(and common) case to O(n), taking advantage of "almost sorted"
condition, and in the same run restrict the worst case to O(n log n) --
quick sort has O(n^2) worst-case. Not all implementations use a median
selection to avoid this case.

Mergesort OTOH requires an additional pointer array.

That would certainly upset you :-) (although my newsgroup names usually
take much more space than the pointers to the names).

A heap sort variant (bottom-up, weak-heap sort) might be another
candidate, taking less additional memory than mergesort but is less
simple than mergesort is. If you find a BSD or MIT-licensed variant of
weak-heapsort, best used with qsort-style arguments, give it a shot and
report back.

Finally, there's radixsort, but that only suits for replacing qsort
where it calls upon strcmp or strcasecmp.

-- 
Matthias Andree

-- 
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list