[leafnode-list] Filtering revisited.
Filtering is very useful for those of us who use leafnode to manage a
small news spool behind a relatively slow connection to an upstream
server. However, leafnode's current filtering capabilities are not
expressive enough for the kind of filtering that would be ideal for
me.
I wrote and submitted a score-based filtering module for leafnode as a
proof of concept several months ago, and this module was able to do
the more sophisticated filtering that I would like to use. I based
the scoring capabilities very loosely on those used by `slrnpull',
which I've been using instead of leafnode lately solely because of its
better filtering. I'd like to make use of leafnode, however, since it
allows me to merge upstream servers, and `slrnpull' does not support
this.
I'm wondering, therefore, if perhaps we could re-open some discussions
about ways that we could enhance the filtering capabilities that
leafnode offers. I tend to like score-based systems, although I'd be
in favor of anything that would allow me to do the following kind of
filtering (this is an actual subset of the filtering I do today
via `slrnpull', except I changed specific names of specific people
to avoid causing possible offense):
  For all newsgroups:
    Always avoid these articles:
         From:.*person_a@xxxxxxxxxx
      or From:.*person_b@xxxxxxxxxx
      or Subject:.*FREE\>
      or Subject:.*fast.?money
      or Subject:.*\$MONEY
      or Subject:.*\$\$
      or Number of lines in article exceeds 500

  If an article has made it this far then
  do these further checks:

    For newsgroup news.software.readers:
      Avoid all articles unless ...
           Subject:.*slrn
        or Subject:.*gnus
        or From:.*certain_person_a
        or From:.*certain_person_b
        or (From contains certain.domain AND subject contains "hate")

    For other newsgroups, similar but not identical logic.
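
As an illustration only, the two-stage logic above could be written
roughly like this in Python. This is neither leafnode nor `slrnpull'
syntax. The article representation (a dict of headers plus a "Lines"
count) and all function names are assumptions made up for the sketch,
the redacted address domains are simply left out of the patterns, and
matching is assumed to be case-sensitive. The GNU-style `\>' (end of
word) is approximated with Python's `\b'.

```python
import re

# Stage 1: patterns that reject an article in every newsgroup.
# (header name, regular expression); redacted domains are omitted.
GLOBAL_REJECT = [
    ("From", r"person_a@"),
    ("From", r"person_b@"),
    ("Subject", r"FREE\b"),        # approximates FREE\> (end of word)
    ("Subject", r"fast.?money"),
    ("Subject", r"\$MONEY"),
    ("Subject", r"\$\$"),
]
MAX_LINES = 500


def header_matches(article, header, pattern):
    """True if the named header exists and the pattern occurs in it."""
    return re.search(pattern, article.get(header, "")) is not None


def globally_rejected(article):
    """Stage 1: reject on size or on any global pattern."""
    if article.get("Lines", 0) > MAX_LINES:
        return True
    return any(header_matches(article, h, p) for h, p in GLOBAL_REJECT)


def wanted_in_readers_group(article):
    """Stage 2 for news.software.readers: keep only on a positive match."""
    return (
        header_matches(article, "Subject", r"slrn")
        or header_matches(article, "Subject", r"gnus")
        or header_matches(article, "From", r"certain_person_a")
        or header_matches(article, "From", r"certain_person_b")
        or (
            header_matches(article, "From", r"certain\.domain")
            and header_matches(article, "Subject", r"hate")
        )
    )


# Hypothetical per-group rule table; groups without an entry keep
# everything that survived stage 1.
GROUP_RULES = {"news.software.readers": wanted_in_readers_group}


def keep_article(group, article):
    if globally_rejected(article):
        return False
    rule = GROUP_RULES.get(group)
    return True if rule is None else rule(article)
```

The nested AND inside an OR in the last clause is the part that a
flat list of independent regex rules cannot express.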
As I mentioned, I've been able to encode this sort of thing using the
scoring capabilities of `slrnpull', but only a subset of this
filtering is possible using the regex-based matching capabilities that
leafnode currently offers.
So ... I'd like to offer some suggestions about some possible methods
for more sophisticated scoring within leafnode, and perhaps we could
discuss the merits and drawbacks of each as a possible leafnode
enhancement in the future:
(1) Score-based filtering rules, with the current filtering
capabilities being a subset of this so that existing
filter files will still work (I already wrote and
submitted a prototype of this a few months ago).
(2) Allow for the current regular-expression matching to be wrapped
by some sort of new filtering language that implements nesting,
boolean logic, etc.
(3) Allow for pluggable, optional filtering modules that could
be written in C and dynamically linked with the leafnode
executables. A library of convenience routines could be
supplied to aid the writers of these optional modules
to make it easier to do things like locating and extracting
information about headers, article size, newsgroups, etc. This
would allow the leafnode administrators to create arbitrarily
sophisticated filtering mechanisms that run relatively quickly.
(4) Allow for some sort of embedded scripting language to be
optionally built into leafnode ... python comes to mind, as does
perl. This would allow for scripts to be written that serve the
same function as the pluggable, dynamically linked modules I
described above in (3). This would run a bit slower than the
approach that uses the dynamically linked C modules, but it would
probably be easier to use.
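
To give a feel for option (1), here is a minimal sketch of how
score-based rules could subsume plain regex reject filters. It is not
the prototype mentioned above and not `slrnpull' score-file semantics.
The `ScoreRule' type, the `KILL' sentinel, the threshold of 0, and the
function names are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass

KILL = -9999  # hypothetical sentinel: reject immediately, stop scoring


@dataclass
class ScoreRule:
    header: str   # header field to test, e.g. "Subject"
    pattern: str  # regular expression searched for in that header
    score: int    # added to the article's total when the pattern matches


def legacy_rule(header, pattern):
    """Express an existing plain-regex reject filter as a score rule."""
    return ScoreRule(header, pattern, KILL)


def score_article(article, rules):
    """Sum the scores of all matching rules; a KILL rule short-circuits."""
    total = 0
    for rule in rules:
        if re.search(rule.pattern, article.get(rule.header, "")):
            if rule.score == KILL:
                return KILL
            total += rule.score
    return total


def passes_score(article, rules, threshold=0):
    """Keep the article if its total score reaches the threshold."""
    return score_article(article, rules) >= threshold
```

With this shape, an old-style filter file is just a list of
`legacy_rule(...)' entries, and positive and negative scores can be
layered on top to approximate "avoid unless ..." logic.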
At any rate, these four things come to mind, but I'm sure that there
also are other possibilities which could be just as useful.
What do all you folks think? I'm optimistic that we could agree on
something that would be quite useful and not all that hard to
implement.
--
Lloyd Zusman
ljz@xxxxxxxxxx
--
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list