
Re: [leafnode-list] Bandwidth limiter for fetch downloads?



Dieter Rohlfing wrote:

[Why is fetchnews so slow?]

> IMHO it's the way fetch communicates with the news-server. After it
> knows which articles are new, it sends for each new article a HEADER and
> - if the filtering doesn't fail - a BODY command to the news-server.
> 
> I use fetch with the delaybody option. Many of my news-groups are high
> traffic and I read only a small percentage of the articles. So, I'm
> mainly interested in the headers.
> 
> Instead of issuing a HEADER command for each article, you can send an
> XOVER command to the news-server and you'll get the most important
> headers (not all headers) for a specified range of articles, that means
> 1 command for N articles. The bigger N, the bigger the performance gain.

You mean that you write only the XOVER headers to a file?

Personally, I find filtering based only on the XOVER headers
unsatisfactory. I read alt.religion.scientology, which is currently
overwhelmed by roughly a thousand forged postings per day (so-called
"sporgeries"), and the only way to reliably kill them is by killing on
NNTP-Posting-Host. Therefore, I am very reluctant to use only XOVER
information for filtering. (BTW, leafnode+ works exactly like this.)
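
To illustrate why XOVER alone falls short here, below is a minimal Python
sketch (not leafnode code) that splits one tab-separated XOVER line into
its standard fields. The names parse_xover_line and can_filter_on, and the
sample data, are made up for illustration. A server may append optional
headers such as Xref, but NNTP-Posting-Host is normally not among them.

```python
# Standard overview fields, in the order a typical XOVER reply lists them.
OVERVIEW_FIELDS = ("number", "subject", "from", "date",
                   "message-id", "references", "bytes", "lines")


def parse_xover_line(line):
    """Split one tab-separated XOVER response line into a dict."""
    parts = line.rstrip("\r\n").split("\t")
    overview = dict(zip(OVERVIEW_FIELDS, parts))
    # Trailing fields are optional "Name: value" headers chosen by the server.
    for extra in parts[len(OVERVIEW_FIELDS):]:
        name, sep, value = extra.partition(":")
        if sep:
            overview[name.strip().lower()] = value.strip()
    return overview


def can_filter_on(overview, header):
    """True if the overview data carries the given header at all."""
    return header.lower() in overview


# Hypothetical sample line; the References field is empty.
sample = ("4711\tBuy now\tspam@example.invalid\t1 Jan 2000 00:00:00 GMT\t"
          "<x@example.invalid>\t\t1234\t20\t"
          "Xref: news.example.invalid alt.test:4711")
ov = parse_xover_line(sample)
```

With this sample, can_filter_on(ov, "NNTP-Posting-Host") is false, so a
kill rule on that header still needs the full header of each article.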

If you think that filtering speed is unsatisfactory in 1.10b2, you
should, however, have a look at 1.9.3b6. It uses PCRE instead of libc
regexp, which increases filtering speed by at least a factor of three.
(I didn't do benchmarks, but running applyfilter on my
alt.religion.scientology spool used to take at least half an hour,
whereas now it takes roughly five minutes. I did some other optimizations
in applyfilter as well, so I guess that about half of the speed increase
is due to PCRE.) Furthermore, PCRE's regexps are documented :-)
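
As a rough illustration of this kind of header filtering, here is a small
Python sketch. It uses Python's own re module (Perl-style syntax, but not
PCRE itself) and does not reproduce leafnode's filter file format; the
pattern, the host name and the function should_kill are hypothetical.

```python
import re

# Hypothetical kill pattern, compiled once and matched against the
# complete header block of an article. "^" anchors at the start of a line.
KILL_PATTERNS = [
    re.compile(r"^NNTP-Posting-Host:.*\.forger\.example\.invalid\s*$",
               re.MULTILINE | re.IGNORECASE),
]


def should_kill(headers, patterns=KILL_PATTERNS):
    """Return True if any kill pattern matches somewhere in the headers."""
    return any(p.search(headers) for p in patterns)


forged = ("Subject: hello\n"
          "NNTP-Posting-Host: dialup1.forger.example.invalid\n")
genuine = ("Subject: hello\n"
           "NNTP-Posting-Host: host.example.org\n")
```

Compiling the patterns once, outside the per-article loop, matters when
the same rules are applied to a whole spool.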

Another reason that fetchnews is so slow is that it issues a HEAD command,
waits for the reply, then issues a BODY command, waits for that reply,
issues the next HEAD command, and so on. It should be much faster to issue
all the HEAD commands first and then all the BODY commands, possibly with
some overlap. The next major version of fetchnews will hopefully do this
(and possibly also parallelize getting news, although I've heard that some
providers don't like this at all and drop the connection when they detect
it).
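
The idea of sending all HEAD commands before reading any reply can be
sketched as follows. This is a minimal Python illustration, not fetchnews
code; read_head_reply and fetch_heads_pipelined are made-up names, and the
"connection" is simulated with in-memory buffers. It assumes the server
answers in request order and ignores the risk of filling socket buffers
when very many commands are sent at once.

```python
import io


def read_head_reply(rfile):
    """Read one HEAD reply: a status line, plus header lines on a 221."""
    raw = rfile.readline()
    if not raw:
        raise EOFError("connection closed while waiting for a status line")
    status = raw.decode("latin-1").rstrip("\r\n")
    if not status.startswith("221"):
        return status, None            # error reply: no multi-line part
    lines = []
    while True:
        raw = rfile.readline()
        if not raw:
            raise EOFError("connection closed inside a multi-line reply")
        line = raw.decode("latin-1").rstrip("\r\n")
        if line == ".":                # a lone dot ends the reply
            break
        if line.startswith(".."):      # undo dot-stuffing
            line = line[1:]
        lines.append(line)
    return status, lines


def fetch_heads_pipelined(wfile, rfile, numbers):
    """Send every HEAD command first, then collect the replies in order."""
    for n in numbers:
        wfile.write(f"HEAD {n}\r\n".encode("ascii"))
    wfile.flush()
    return {n: read_head_reply(rfile) for n in numbers}


# Canned server output standing in for a real connection.
canned = (b"221 1 <a@example.invalid> head\r\nSubject: one\r\n.\r\n"
          b"423 no such article number\r\n"
          b"221 3 <c@example.invalid> head\r\nSubject: three\r\n.\r\n")
sent = io.BytesIO()
replies = fetch_heads_pipelined(sent, io.BytesIO(canned), [1, 2, 3])
```

On a real connection the gain comes from paying the round-trip delay once
per batch instead of once per command.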

--Cornelius.

-- 
/* Cornelius Krasel, U Wuerzburg, Dept. of Pharmacology, Versbacher Str. 9 */
/* D-97078 Wuerzburg, Germany   email: phak004@xxxxxxxxxxxxxxxxxxxxxx  SP4 */
/* "Science is the game we play with God to find out what His rules are."  */
