
[leafnode-list] Filtering revisited.



Filtering is very useful for those of us who use leafnode to manage a
small news spool behind a relatively slow connection to an upstream
server.  However, leafnode's current filtering capabilities don't
allow me to do all of the filtering that would be ideal for me.

I wrote and submitted a score-based filtering module for leafnode as a
proof of concept several months ago, and this module was able to do
the more sophisticated filtering that I would like to use.  I based
the scoring capabilities very loosely on those used by `slrnpull',
which I've been using instead of leafnode lately solely because of its
better filtering.  I'd like to make use of leafnode, however, since it
allows me to merge upstream servers, and `slrnpull' does not support
this.

I'm wondering, therefore, if perhaps we could re-open some discussions
about ways that we could enhance the filtering capabilities that
leafnode offers.  I tend to like score-based systems, although I'd be
in favor of anything that would allow me to do the following kind of
filtering (this is an actual subset of the filtering I do today via
`slrnpull', except that I changed the names of specific people to
avoid causing possible offense):

  For all newsgroups:

    Always avoid these articles:

        From:.*person_a@xxxxxxxxxx
    or  From:.*person_b@xxxxxxxxxx
    or  Subject:.*FREE\>
    or  Subject:.*fast.?money
    or  Subject:.*\$MONEY
    or  Subject:.*\$\$
    or  Number of lines in article exceeds 500

  If an article has made it this far then
  do these further checks:

   For newsgroup news.software.readers:

     Avoid all articles unless ...

         Subject:.*slrn
     or  Subject:.*gnus
     or  From:.*certain_person_a
     or  From:.*certain_person_b
     or  (From contains certain.domain AND subject contains "hate")

   For other newsgroups, similar but not identical logic.

As I mentioned, I've been able to encode this sort of thing using the
scoring capabilities of `slrnpull', but only a subset of this
filtering is possible using the regex-based matching capabilities that
leafnode currently offers.
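
To make the boolean structure of the rules above concrete, here is a
minimal sketch in Python.  It is an illustration only, not leafnode or
`slrnpull' code: the Article class, the function names and the
case-sensitive matching are all assumptions made for this example, and
because the addresses are elided above, only their local parts are
matched.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical container for the few fields the rules look at."""
    newsgroup: str
    sender: str    # value of the From: header
    subject: str   # value of the Subject: header
    lines: int     # number of lines in the article


# Global "always avoid" rules.  \b stands in for the \> word-end anchor.
REJECT_FROM = [r"person_a@", r"person_b@"]
REJECT_SUBJECT = [r"FREE\b", r"fast.?money", r"\$MONEY", r"\$\$"]
MAX_LINES = 500


def rejected_everywhere(a: Article) -> bool:
    """True if any of the all-newsgroup rejection rules matches."""
    return (
        any(re.search(p, a.sender) for p in REJECT_FROM)
        or any(re.search(p, a.subject) for p in REJECT_SUBJECT)
        or a.lines > MAX_LINES
    )


def wanted_in_readers_group(a: Article) -> bool:
    """The 'avoid all articles unless ...' test for news.software.readers."""
    return bool(
        re.search(r"slrn|gnus", a.subject)
        or re.search(r"certain_person_a|certain_person_b", a.sender)
        or ("certain.domain" in a.sender and "hate" in a.subject)
    )


def keep(a: Article) -> bool:
    """Apply the global rejections first, then the per-group whitelist."""
    if rejected_everywhere(a):
        return False
    if a.newsgroup == "news.software.readers":
        return wanted_in_readers_group(a)
    return True  # other groups would get similar, but not identical, logic
```

The nested AND inside an OR in the last whitelist test, and the
numeric line-count test, are the parts that go beyond matching a
single regular expression against a header.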
     
So ... I'd like to offer some suggestions about some possible methods
for more sophisticated scoring within leafnode, and perhaps we could
discuss the merits and drawbacks of each as a possible leafnode
enhancement in the future:

(1)  Score-based filtering rules, with the current filtering
     capabilities being a subset of this so that existing
     filter files will still work (I already wrote and
     submitted a prototype of this a few months ago).

(2)  Allow for the current regular-expression matching to be wrapped
     by some sort of new filtering language that implements nesting,
     boolean logic, etc.

(3)  Allow for pluggable, optional filtering modules that could
     be written in C and dynamically linked with the leafnode
     executables.  A library of convenience routines could be
     supplied to aid the writers of these optional modules and
     make it easier to do things like locating and extracting
     information about headers, article size, newsgroups, etc.  This
     would allow the leafnode administrators to create arbitrarily
     sophisticated filtering mechanisms that run relatively quickly.

(4)  Allow for some sort of embedded scripting language to be
     optionally built into leafnode ... Python comes to mind, as does
     Perl.  This would allow for scripts to be written that serve the
     same function as the pluggable, dynamically linked modules I
     described above in (3).  This would run a bit slower than the
     approach that uses the dynamically linked C modules, but it would
     probably be easier to use.

At any rate, these four things come to mind, but I'm sure that there
also are other possibilities which could be just as useful.
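
As a rough illustration of how option (1) might work, here is a small
score-based sketch in Python.  It is not the prototype mentioned above
and does not reproduce the `slrnpull' score-file format; the Rule
class, the score values and the threshold of 0 are assumptions chosen
for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical scoring rule: add `score` when the conditions match."""
    score: int
    conditions: list         # list of (header name, regex) pairs
    match_all: bool = True   # True: AND the conditions; False: OR them
    group: str = "*"         # "*" means the rule applies to every newsgroup

    def applies(self, group, headers):
        if self.group not in ("*", group):
            return False
        hits = (re.search(rx, headers.get(name, "")) is not None
                for name, rx in self.conditions)
        # An empty condition list with match_all=True always applies.
        return all(hits) if self.match_all else any(hits)


def total_score(rules, group, headers):
    """Sum the scores of every rule that applies to this article."""
    return sum(r.score for r in rules if r.applies(group, headers))


def keep(rules, group, headers, threshold=0):
    """Keep the article when its total score reaches the threshold."""
    return total_score(rules, group, headers) >= threshold


NSR = "news.software.readers"
RULES = [
    # Global kill rule: any one match is enough to sink the article.
    Rule(-9999, [("From", r"person_[ab]@"),
                 ("Subject", r"FREE\b|fast.?money|\$MONEY|\$\$")],
         match_all=False),
    # In this group everything starts below the threshold ...
    Rule(-100, [], group=NSR),
    # ... and is lifted back up by any of the "unless" rules.
    Rule(100, [("Subject", r"slrn|gnus")], group=NSR),
    Rule(100, [("From", r"certain\.domain"), ("Subject", r"hate")], group=NSR),
]
```

In this sketch a plain regex filter is just a one-condition rule with
a large negative score, which is one way existing filters could remain
a subset of a scoring scheme.  Non-regex tests such as the line-count
limit would need an additional kind of condition that is not shown
here.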

What do all you folks think?  I'm optimistic that we could agree on
something that would be quite useful and not all that hard to
implement.

-- 
 Lloyd Zusman
 ljz@xxxxxxxxxx

-- 
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list