
Re: [leafnode-list] Filtering revisited.



Lloyd Zusman wrote (already some time ago):

> I'm wondering, therefore, if perhaps we could re-open some discussions
> about ways that we could enhance the filtering capabilities that
> leafnode offers.  I tend to like score-based systems, although I'd be
> in favor of anything that would allow me to do the following kind of
> filtering (this is an actual subset of some of the filtering I do today
> via `slrnpull', except I changed specific names of specific people
> to avoid causing possible offense):
> 
>   For all newsgroups:
> 
>     Always avoid these articles:
> 
>         From:.*person_a@xxxxxxxxxx
>     or  From:.*person_b@xxxxxxxxxx
>     or  Subject:.*FREE\>
>     or  Subject:.*fast.?money
>     or  Subject:.*\$MONEY
>     or  Subject:.*\$\$
>     or  Number of lines in article exceeds 500
> 
>   If an article has made it this far then
>   do these further checks:
> 
>    For newsgroup news.software.readers:
> 
>      Avoid all articles unless ...
> 
>          Subject:.*slrn
>      or  Subject:.*gnus
>      or  From:.*certain_person_a
>      or  From:.*certain_person_b
>      or  (From contains certain.domain AND subject contains "hate")
> 
>    For other newsgroups, similar but not identical logic.

Two things bothered me about Lloyd's first patch. First, the flex
source code didn't work on all machines tested. If I recall correctly,
I couldn't get it to work on Solaris with GNU flex. Therefore it might
be desirable (if cumbersome) to use a handwritten parser instead.

Second (I guess I won't make friends with this one :-), I had an awfully
hard time remembering the syntax of the new filterfile (I found the
brackets very confusing). Therefore, I would like to propose a different
one. I suggest that an expression look like the following (borrowed
from tin):

Newsgroups: [newsgroups glob]
Regex: [some arbitrary regex]
Action: kill|keep|[number]

An example for Lloyd's wishes above:

# For all newsgroups
Newsgroups: *
Regex: From:.*person_a@xxxxxxxxxx
Action: kill
Newsgroups: *
Regex: From:.*person_b@xxxxxxxxxx
Action: kill

and so on. (One might instead consider a syntax that allows several
Regex lines per rule, like this:

Newsgroups: *
Regex: From:.*person_a@xxxxxxxxxx
Regex: From:.*person_b@xxxxxxxxxx
Regex: Subject:.*FREE\>
Action: kill

but this would be more difficult to program.)

# For newsgroup news.software.readers:

Newsgroups: news.software.readers
Regex: Subject:.*slrn
Action: keep
Newsgroups: news.software.readers
Regex: Subject:.*gnus
Action: keep
Newsgroups: news.software.readers
Regex: .*
Action: kill
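
To make the proposed format concrete, here is a minimal parser sketch in
Python. It is only an illustration, not leafnode code: the names Rule,
parse_action and parse_filterfile are hypothetical, and it assumes that
"#" starts a comment, that an Action line closes a rule, and that several
Regex lines in one rule are alternatives. The address in the sample is a
placeholder.

```python
from typing import List, NamedTuple, Union


class Rule(NamedTuple):
    newsgroups: str          # glob taken from the Newsgroups: line
    regexes: List[str]       # one or more Regex: lines (alternatives, an assumption)
    action: Union[str, int]  # "kill", "keep", or a score delta


def parse_action(value: str) -> Union[str, int]:
    """Return 'kill'/'keep' unchanged, otherwise parse a signed number."""
    if value in ("kill", "keep"):
        return value
    return int(value)  # accepts "+2" and "-5"; raises ValueError otherwise


def parse_filterfile(text: str) -> List[Rule]:
    """Parse Newsgroups:/Regex:/Action: stanzas into a list of rules."""
    rules: List[Rule] = []
    group = None
    regexes: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split at the first colon only, so "Regex: From:.*x" keeps its colon.
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if not sep or not value:
            raise ValueError(f"line {lineno}: expected 'Key: value'")
        if key in ("newsgroups", "newsgroup"):  # singular accepted as a lenient alias
            group, regexes = value, []
        elif key == "regex":
            regexes.append(value)
        elif key == "action":
            if group is None or not regexes:
                raise ValueError(f"line {lineno}: Action needs Newsgroups and Regex first")
            rules.append(Rule(group, regexes, parse_action(value)))
            group, regexes = None, []
        else:
            raise ValueError(f"line {lineno}: unknown key {key!r}")
    return rules


SAMPLE = """\
# For all newsgroups
Newsgroups: *
Regex: From:.*person_a@example.invalid
Regex: Subject:.*fast.?money
Action: kill
Newsgroups: leafnode.local
Regex: Subject:.*scoring
Action: +2
"""

for rule in parse_filterfile(SAMPLE):
    print(rule)
```

The whole thing is a single pass over the lines with no lexer, which is
the point of keeping the format line-oriented.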

More complicated stuff could be achieved by scoring. For example,
consider a hypothetical newsgroup "leafnode.local" which contains
lots of boring things which you don't want to read. However, you
want to read everything about scoring but not if it comes from
foo@bar because he is an obnoxious guy. There are at least two
ways to do this.

1) use only "kill" and "keep"

# first, kill everything from foo@bar
Newsgroups: leafnode.local
Regex: From:.*foo@bar
Action: kill
# second, keep everything else about scoring
Newsgroups: leafnode.local
Regex: Subject:.*scoring
Action: keep
# finally, kill the rest
Newsgroups: leafnode.local
Regex: .*
Action: kill
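
The kill/keep variant only works if the rules are tried in order and the
first matching rule decides. That ordering is implied by the comments
above rather than stated, so the following Python sketch treats it as an
assumption. The function decide, the default of "keep" when nothing
matches, and the address alice@example.org are all hypothetical. The
catch-all rule is written as ".*" so that it is a valid regex.

```python
import re
from fnmatch import fnmatchcase

# (newsgroups glob, regex, action), in file order
RULES = [
    ("leafnode.local", r"From:.*foo@bar", "kill"),
    ("leafnode.local", r"Subject:.*scoring", "keep"),
    ("leafnode.local", r".*", "kill"),
]


def decide(group, headers, rules=RULES, default="keep"):
    """Return the action of the first rule that matches (assumed semantics)."""
    for glob, pattern, action in rules:
        # "." does not match a newline, so each pattern stays within one header line.
        if fnmatchcase(group, glob) and re.search(pattern, headers):
            return action
    return default  # assumption: articles no rule matches are kept


print(decide("leafnode.local", "From: foo@bar\nSubject: scoring ideas"))
print(decide("leafnode.local", "From: alice@example.org\nSubject: scoring ideas"))
print(decide("leafnode.local", "From: alice@example.org\nSubject: weather"))
```

With this ordering the article from foo@bar is killed even though its
subject mentions scoring, because the kill rule is reached first.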

2) use scoring

# first, score everything in the newsgroup below zero
Newsgroups: leafnode.local
Regex: .*
Action: -1
# second, score everything from foo@bar even lower
Newsgroups: leafnode.local
Regex: From:.*foo@bar
Action: -5
# third, score everything about scoring up
Newsgroups: leafnode.local
Regex: Subject:.*scoring
Action: +2

That would leave the usual article in "leafnode.local" at -1, articles
from foo@bar in "leafnode.local" at -6 except when they are about scoring
(at -4) and all other articles in "leafnode.local" about scoring at +1.
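
The arithmetic can be checked with a few lines of Python. This is only a
sketch: it assumes that every matching rule adds its number to the
article's score, and that the second rule matches the From line of
foo@bar as its comment says. The function score and the address
alice@example.org are hypothetical. Which score leads to an article being
dropped is not specified here, so the sketch stops at computing the sums.

```python
import re

# (regex, score delta) for leafnode.local, in file order
SCORE_RULES = [
    (r".*", -1),                 # everything in the group
    (r"From:.*foo@bar", -5),     # everything from foo@bar
    (r"Subject:.*scoring", +2),  # everything about scoring
]


def score(headers, rules=SCORE_RULES):
    """Sum the deltas of all rules whose regex matches the headers."""
    return sum(delta for pattern, delta in rules if re.search(pattern, headers))


print(score("From: alice@example.org\nSubject: weather"))        # usual article
print(score("From: foo@bar\nSubject: weather"))                  # from foo@bar
print(score("From: foo@bar\nSubject: scoring ideas"))            # foo@bar on scoring
print(score("From: alice@example.org\nSubject: scoring ideas"))  # others on scoring
```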

IMO, this format has these advantages:

1) A parser for this should be very easy to program. No need for lex.
2) The format is (to me, of course your mileage may vary :-) quite
   self-explanatory.
3) More flexibility and speed than with the current setup (because
   currently every filter is applied to all newsgroups)
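
Point 3 could be realized by selecting, once per newsgroup, only the
rules whose Newsgroups glob matches, so that the regexes of unrelated
groups are never run against an article. A minimal Python sketch,
assuming shell-style globs as provided by the standard fnmatch module;
the function rules_for_group and the rule list are hypothetical:

```python
from fnmatch import fnmatchcase

# (newsgroups glob, regex, action)
ALL_RULES = [
    ("*", r"Subject:.*fast.?money", "kill"),
    ("news.software.readers", r"Subject:.*slrn", "keep"),
    ("news.software.readers", r".*", "kill"),
    ("leafnode.local", r"Subject:.*scoring", "keep"),
]


def rules_for_group(group, rules=ALL_RULES):
    """Return, in file order, only the rules that apply to this newsgroup."""
    return [rule for rule in rules if fnmatchcase(group, rule[0])]


for name in ("news.software.readers", "leafnode.local", "comp.lang.c"):
    print(name, len(rules_for_group(name)))
```

The selection is done once per group rather than once per article, so
the per-article cost depends only on the rules that can actually match.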

--Cornelius.

-- 
/* Cornelius Krasel, U Wuerzburg, Dept. of Pharmacology, Versbacher Str. 9 */
/* D-97078 Wuerzburg, Germany   email: phak004@xxxxxxxxxxxxxxxxxxxxxx  SP4 */
/* "Science is the game we play with God to find out what His rules are."  */

-- 
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list