Re: [leafnode-list] Filtering revisited.

On Fri, 1 Oct 1999, Joerg Dietrich <uzsv7x@xxxxxxxxxxx> wrote:
> IMHO filtering should be done with the newsreader and not with any
> tool belonging to the server. The only exception may be the
> aforementioned single-user dial-up box with a *slow* newsfeed.

Given enough "spam" and worthless postings in a newsgroup, or
large enough articles and a small enough proportion of interesting
articles, any newsfeed can become unnecessarily slow.

Consider someone who wants only certain specific articles from
binary-only newsgroups.

Consider the fate of the poor fellow who wants to download
alt.sex.stories.* without downloading JPEGs, GIFs, and "SEE OUR XXX
HOT ADULY WEBSITE!!1!" messages.  Setting

    maxcrosspost = 4

alone gets most of this case, but that can be harmful in other
newsgroups where large cross-posts are reasonable.
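
To make the trade-off concrete, here is a rough Python sketch of what
a crosspost limit checks.  The function names are made up for
illustration, and treating "more than the limit" as the rejection
condition is an assumption, not something taken from the leafnode
source.

```python
def crosspost_count(newsgroups_value: str) -> int:
    """Count the groups named in the value of a Newsgroups: header."""
    return len([g for g in newsgroups_value.split(",") if g.strip()])


def exceeds_limit(newsgroups_value: str, maxcrosspost: int = 4) -> bool:
    # Assumption: an article is rejected when it names strictly more
    # groups than the limit.
    return crosspost_count(newsgroups_value) > maxcrosspost
```

Because the limit is a single number, a legitimate announcement
posted to five related groups is rejected exactly like a spam posted
to five unrelated ones.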

> > (2)  Allow for the current regular-expression matching to be
> >      wrapped by some sort of new filtering language that
> >      implements nesting, boolean logic, etc.
> Sounds very slow. Exactly the opposite of what you want if you
> filter with fetchnews.
> To emphasize it again: If you ask me increasing the speed of
> fetchnews is the top priority task in leafnode development.

But the important speed to me and others is connect time, not CPU
time, and disk space may be important too.  With a 56K modem, that's
about 5.6 kilobytes per second at most.  For example, suppose the
average article in a given newsgroup is 112KB long.  That's 20 seconds
per article at least.  If the filter gets rid of only 10% of all
articles, for example, that saves 2 seconds of connect time per
article on average, so filtering is a net win if it takes less than 2
seconds per article.  It's utterly absurd to think that

    /^Newsgroups: .*alt\.sex/ &&
    (/^Newsgroups: [^,]*,[^,]*,[^,]*,/ ||
     /^Lines: .$/ ||
     /^Lines: 1.$/ ||
     /^Subject: .*\.(jpg|htm|html|img|video|rm)/)

would take more than milliseconds on an in-memory header on any
remotely modern machine.
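
As a rough illustration, the expression and the break-even arithmetic
above can be written out in Python.  This is only a sketch: the names
is_junk and break_even_seconds are invented here, the header is
assumed to be one string with one header field per line, and the
break-even figure assumes the rejected articles are of average size
and that fetching the header itself costs nothing.

```python
import re

# Patterns transcribed from the expression above.
ALT_SEX = re.compile(r"^Newsgroups: .*alt\.sex")
JUNK = [
    re.compile(r"^Newsgroups: [^,]*,[^,]*,[^,]*,"),  # three commas: 4+ groups
    re.compile(r"^Lines: .$"),                       # one-character line count
    re.compile(r"^Lines: 1.$"),                      # 10 to 19 lines
    re.compile(r"^Subject: .*\.(jpg|htm|html|img|video|rm)"),
]


def any_line(pattern, header):
    # Match line by line so that [^,]* cannot run across header lines.
    return any(pattern.match(line) for line in header.splitlines())


def is_junk(header):
    """True if the header matches the boolean expression above."""
    return any_line(ALT_SEX, header) and any(any_line(p, header) for p in JUNK)


def break_even_seconds(article_kb, link_kb_per_s, fraction_rejected):
    """Filter time per article below which filtering saves connect time."""
    download_s = article_kb / link_kb_per_s  # 112 / 5.6 is about 20 s
    return download_s * fraction_rejected    # 10% of that is about 2 s
```

Timing is_junk with timeit on a sample header is an easy way to
compare the real cost on your own machine against the figure that
break_even_seconds(112, 5.6, 0.10) gives.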

Hell, I have a script that traverses my entire alt/sex/stories
directory tree, and searches for that and more, and (if not matched)
checks the entire article body for certain patterns, and it deletes
hundreds of files per second on an AMD 486-2.
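
The script itself is not shown here, but its general shape (check the
header first, and only if that does not match, check the body) might
look something like the following Python sketch.  Everything in it is
an assumption for illustration: the function name prune_spool, the
dry_run switch, the one-article-per-file layout with a blank line
between header and body, and whatever patterns the caller passes in.

```python
import os
import re


def prune_spool(root, header_is_junk, body_patterns, dry_run=True):
    """Walk root and return the paths judged junk; delete unless dry_run."""
    removed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # latin-1 decodes any byte sequence, so odd articles do not crash.
            with open(path, "r", encoding="latin-1") as fh:
                text = fh.read()
            # Assumption: the header ends at the first blank line.
            header, _sep, body = text.partition("\n\n")
            # The cheap header test runs first; the body is only
            # searched when the header did not match.
            junk = header_is_junk(header) or any(
                p.search(body) for p in body_patterns
            )
            if junk:
                removed.append(path)
                if not dry_run:
                    os.remove(path)
    return removed
```

Running it once with dry_run=True and reading the returned list
before deleting anything is the cautious way to try new patterns.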

                   *** NEW HOME E-MAIL ADDRESS ***
Tim McDaniel (home); Reply-To: tmcd@xxxxxxxx; 
if that fail, my work addresses are  tmcd@xxxxxxxxxxxxxx and tmcd@xxxxxxxxxxx
tmcd@xxxxxxxxxxxxxxxxx is a lie; tmcd@xxxxxxx is old and will go away.

leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list