
Re: [leafnode-list] Filtering revisited.



krasel@xxxxxxxxxxxxxxxxxxxxxxxxxxxx (Cornelius Krasel) writes:

> [ ... ]
> 
> With regard to Lloyd's proposal, I would prefer a version of Leafnode
> in which I could hook cleanfeed which probably comes closest to point
> 3 on his list. However, so far I have been unable to exactly find out
> how cleanfeed works :-)

I checked it out, and it seems to go something like this:

Cleanfeed has two modes.  In the first, it gets invoked whenever the
newsfeed software (for example, INN) needs it.  This seems rather
inefficient to me, but perhaps there are INN-specific protocols that I
don't understand which would make this run faster than it seems.

The second mode is "standalone mode".  Cleanfeed uses that with
news servers such as Typhoon, and this one seems the most promising for
use with leafnode.  In standalone mode, cleanfeed runs as a
separate process and expects to get articles piped to it via stdin.
It then writes its results back to the news server by means of another
pipe via its stdout.

In order to use cleanfeed in this fashion with leafnode, there would
only need to be a way to optionally pipe articles through another
process which would pipe back the filtering results.  In this way,
leafnode itself wouldn't need to be modified very much, and there
would be the added benefit of people plugging in their own filters if
they don't like cleanfeed for some reason ... and of other people
not plugging in *any* filters if they prefer not to do filtering.
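
A minimal sketch of such an optional filter hook, in Python purely for
illustration (leafnode itself is written in C).  The protocol shown here
is an assumption, not cleanfeed's actual one: the filter reads one
article on stdin and prints "accept" or "reject".  The names
passes_filter and TOY_FILTER are hypothetical.

```python
import subprocess
import sys

def passes_filter(article, filter_cmd=None):
    """Return True if the article should be kept.

    filter_cmd is an argv list for an external filter program, or None
    when no filter is configured.  ASSUMED protocol (not cleanfeed's real
    one): the filter reads one article on stdin and prints "accept" or
    "reject" on stdout.
    """
    if filter_cmd is None:
        # Filtering is optional: with no filter configured, keep everything.
        return True
    result = subprocess.run(
        filter_cmd, input=article, capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "accept"

# Toy stand-in for a real filter: rejects articles containing a spam phrase.
TOY_FILTER = [
    sys.executable, "-c",
    "import sys; "
    "print('reject' if 'MAKE MONEY FAST' in sys.stdin.read() else 'accept')",
]
```

Any program that follows the same convention could be substituted for
TOY_FILTER, and passing no filter at all leaves every article untouched.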


The fact that cleanfeed runs as a separate filter process means that
its startup time (which could be significant because it's written in
Perl) would only be incurred once, and in a process separate from the
leafnode process.
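
To illustrate the "start once, feed many articles" idea, here is a rough
Python sketch.  The framing is an assumption made up for this example
(a line giving the article length in bytes, then the article itself);
the information cleanfeed really exchanges over its pipes still has to
be determined.  FilterProcess and TOY_FILTER_SRC are hypothetical names.

```python
import subprocess
import sys

# Toy stand-in for a long-running filter.  ASSUMED framing (hypothetical,
# not cleanfeed's): one line with the article length in bytes, then the
# article.  It answers each article with "accept" or "reject".
TOY_FILTER_SRC = r'''
import sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    header = inp.readline()
    if not header:
        break
    body = inp.read(int(header))
    out.write(b"reject\n" if b"MAKE MONEY FAST" in body else b"accept\n")
    out.flush()
'''

class FilterProcess:
    """Keeps one filter process alive so its startup cost is paid once."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )

    def accepts(self, article):
        # Send the length line and the article, then wait for the verdict.
        self.proc.stdin.write(b"%d\n" % len(article))
        self.proc.stdin.write(article)
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip() == b"accept"

    def close(self):
        # Closing stdin signals end of input; the filter then exits.
        self.proc.stdin.close()
        self.proc.wait()
```

A caller would create one FilterProcess when a fetch run starts, call
accepts() for each article, and call close() at the end.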

Some small amount of further investigation would be necessary to
determine the exact information that cleanfeed sends back through its
stdout pipe, and then any leafnode changes (adding the capability to
optionally pipe articles through a filter) should be fairly quick and
straightforward to add.

And yes, this indeed is close to my option number 3.


As a general response to some of the points that have been raised
in reply to my original message: I also consider article filtering
to be very worthwhile.  I agree that, for the kind of filtering that I
and probably many other people would do, the CPU time and clock
time needed to do the filtering tends to be far less than the
resources (time included) needed to process the articles that are
normally filtered out.  This effect is even more pronounced for people
like me with slow upstream connections.

I am only downloading news onto my personal computer at home (a Linux
box).  If I were managing the news at an ISP or some other kind of
multi-user site, I wouldn't do the same kind of filtering, but neither
would I object to using a version of leafnode that has the
*capabilities* to do sophisticated filtering ... I would just opt out of
using most of it in the multi-user case.

In no way was I advocating re-writing leafnode so that everyone has
to use some sort of expensive and complicated filtering mechanism,
whether they like it or not.  All 4 of my options were just that:
*options*, and people who wouldn't want to make use of the filtering
would never have to incur the costs of doing so ... and those who
would want filtering could then make use of it.


I wrote a Perl-based newsgroup downloader (a poor-man's `leafnode')
which filters out a goodly percentage of spam, based only on the
results of `xhdr' and `xover' queries.  On my 56K line, this filtering
decreased my network utilization by something like 80 percent compared
with the non-filtering case.  For me, this shows that filtering is quite
worthwhile.
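
The general idea of overview-based filtering can be sketched as follows
(in Python here, not the Perl of my downloader, and not its actual
code).  It assumes the usual `xover' line layout of tab-separated
fields beginning with article number and subject; the spam pattern and
the name wanted_articles are purely illustrative.

```python
import re

# Illustrative pattern only; a real filter would use many more rules.
SPAM_SUBJECT = re.compile(r"make money fast|\$\$\$", re.IGNORECASE)

def wanted_articles(overview_lines):
    """Return the article numbers worth downloading.

    Each overview line is assumed to be tab-separated, starting with the
    article number and the subject (followed by from, date, message-id,
    references, byte count and line count, which are ignored here).
    """
    wanted = []
    for line in overview_lines:
        fields = line.split("\t")
        number, subject = fields[0], fields[1]
        if not SPAM_SUBJECT.search(subject):
            wanted.append(int(number))
    return wanted
```

Only the articles whose numbers come back from wanted_articles() are
then fetched in full, which is where the bandwidth saving comes from.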


I'll look more into the details of cleanfeed and post my results a
little later.  Perhaps this "optional standalone filter" mechanism
would indeed be useful for leafnode.

-- 
 Lloyd Zusman
 ljz@xxxxxxxxxx

-- 
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list