Re: [leafnode-list] Filtering revisited.
Lloyd Zusman wrote (already some time ago):
> I'm wondering, therefore, if perhaps we could re-open some discussions
> about ways that we could enhance the filtering capabilities that
> leafnode offers. I tend to like score-based systems, although I'd be
> in favor of anything that would allow me to do the following kind of
> filtering (this is an actual subset of some of the filtering I do today
> via `slrnpull', except I changed specific names of specific people
> to avoid causing possible offense):
> For all newsgroups:
> Always avoid these articles:
> or From:.*person_b@xxxxxxxxxx
> or Subject:.*FREE\>
> or Subject:.*fast.?money
> or Subject:.*\$MONEY
> or Subject:.*\$\$
> or Number of lines in article exceeds 500
> If an article has made it this far then
> do these further checks:
> For newsgroup news.software.readers:
> Avoid all articles unless ...
> or Subject:.*gnus
> or From:.*certain_person_a
> or From:.*certain_person_b
> or (From contains certain.domain AND subject contains "hate")
> For other newsgroups, similar but not identical logic.
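Lloyd's two-stage logic (global kill rules first, then per-group "avoid unless" allow-lists) can be sketched roughly as follows. This is a minimal illustration, not leafnode code. The first condition of each quoted list is missing above and is not reconstructed. The addresses are placeholders. `\>` (end of word in GNU/vi regexes) is written as Python's `\b`.

```python
import re

# Stage 1: kill patterns for every newsgroup (visible subset of the
# quoted rules; person_b@example.invalid is a placeholder address).
GLOBAL_KILL = [
    re.compile(r"^From:.*person_b@example\.invalid", re.M),
    re.compile(r"^Subject:.*FREE\b", re.M),
    re.compile(r"^Subject:.*fast.?money", re.M),
    re.compile(r"^Subject:.*\$MONEY", re.M),
    re.compile(r"^Subject:.*\$\$", re.M),
]
MAX_LINES = 500

# Stage 2: per-group "avoid all articles unless ..." allow-lists.
GROUP_ALLOW = {
    "news.software.readers": [
        re.compile(r"^Subject:.*gnus", re.M),
        re.compile(r"^From:.*certain_person_a", re.M),
        re.compile(r"^From:.*certain_person_b", re.M),
    ],
}


def wanted(group, headers, lines):
    """Return True if an article survives both filtering stages."""
    if lines > MAX_LINES:
        return False
    if any(p.search(headers) for p in GLOBAL_KILL):
        return False
    allow = GROUP_ALLOW.get(group)
    if allow is None:  # no stage-2 rules for this group: keep it
        return True
    if any(p.search(headers) for p in allow):
        return True
    # (From contains certain.domain AND Subject contains "hate")
    return bool(re.search(r"^From:.*certain\.domain", headers, re.M)
                and re.search(r"^Subject:.*hate", headers, re.M))
```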
There are two things that bothered me about Lloyd's first patch. First,
the flex source code didn't work on every machine it was tested on. If I
recall correctly, I couldn't get it to work on Solaris with GNU flex.
Therefore it might be desirable (if cumbersome) to use a handwritten
parser instead.
Second (I guess I won't make friends with this one :-), I had an awfully
hard time remembering the syntax of the new filter file (the brackets
confused me). Therefore, I would like to propose a different syntax. I
suggest that an expression should look like the following (borrowed ...):
Newsgroups: [newsgroups glob]
Regex: [some arbitrary regex]
An example implementing Lloyd's wishes above:
# For all newsgroups
and so on (instead, one might consider constructing a syntax like this:
but that would be more difficult to program).
# For newsgroup news.software.readers:
More complicated filtering could be achieved by scoring. For example,
consider a hypothetical newsgroup "leafnode.local" that contains lots
of boring things you don't want to read. However, you do want to read
everything about scoring, except when it comes from foo@bar, because he
is an obnoxious guy. There are at least two ways to do this.
1) use only "kill" and "keep"
# first, kill everything from foo@bar
Regex: From: foo@bar
# second, keep everything else about scoring
# finally, kill the rest
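Option 1 might be evaluated along the lines of the sketch below. It assumes that rules are checked in file order, that the first matching rule decides, and that an article no rule matches is kept. None of these semantics is spelled out above. The tuple layout is hypothetical: a newsgroups glob, a header regex and an action.

```python
import fnmatch
import re

# Hypothetical rule set for option 1: (newsgroups glob, regex, action).
RULES = [
    ("leafnode.local", r"^From: foo@bar", "kill"),
    ("leafnode.local", r"^Subject:.*scoring", "keep"),
    ("leafnode.local", r"", "kill"),  # empty regex matches every article
]


def decide(group, headers, rules=RULES):
    """Return the action of the first matching rule (assumed semantics)."""
    for glob, regex, action in rules:
        if fnmatch.fnmatchcase(group, glob) and re.search(
                regex, headers, re.M | re.I):
            return action
    return "keep"  # assumed default when no rule matches
```

Because the first match wins, the foo@bar rule has to come before the "keep scoring" rule. Otherwise foo@bar's articles about scoring would slip through.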
2) use scoring
# first, score everything in the newsgroup below zero
# second, score everything from foo@bar even lower
# third, score everything about scoring up
That would leave an ordinary article in "leafnode.local" at -1 and
articles from foo@bar in "leafnode.local" at -6, except when they are
about scoring (-4); all other articles in "leafnode.local" about scoring
would end up at +1.
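The totals above work out if the three rules add -1, -5 and +2 respectively. Those increments are inferred from the quoted totals, not stated in the proposal. A minimal additive-scoring sketch under that assumption, with a hypothetical threshold (fetch only articles scoring at least 0) added for illustration:

```python
import fnmatch
import re

# Hypothetical rules for option 2; increments inferred from the totals.
SCORE_RULES = [
    ("leafnode.local", r"", -1),                    # whole group below zero
    ("leafnode.local", r"^From: foo@bar", -5),      # foo@bar even lower
    ("leafnode.local", r"^Subject:.*scoring", +2),  # scoring up
]


def score(group, headers, rules=SCORE_RULES):
    """Sum the increments of ALL matching rules (not first match)."""
    total = 0
    for glob, regex, delta in rules:
        if fnmatch.fnmatchcase(group, glob) and re.search(
                regex, headers, re.M | re.I):
            total += delta
    return total


def fetch(group, headers, threshold=0):
    """Hypothetical policy: fetch articles scoring at least `threshold`."""
    return score(group, headers) >= threshold
```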
IMO, this format has these advantages:
1) A parser for this should be very easy to program. No need for lex.
2) The format is (to me, of course your mileage may vary :-) quite
3) More flexibility and speed than the current setup (because
currently, every filter is applied to all newsgroups)
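Point 1 can be made concrete with a handwritten, line-based parser that needs no lex. The sketch below assumes the following: blank lines and `#` comments are ignored, every expression starts with a `Newsgroups:` line, and `Score:` (holding kill, keep or a number) is a hypothetical key standing in for the action line, which the proposal does not show.

```python
def parse_filters(text):
    """Parse 'Key: value' filter expressions (hypothetical format)."""
    rules = []
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the FIRST colon only, so "Regex: From: foo@bar"
        # keeps "From: foo@bar" intact as the value.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'Key: value'")
        key, value = key.strip().lower(), value.strip()
        if key == "newsgroups":
            # A Newsgroups: line opens a new expression; an empty regex
            # matches everything, and no action is assumed by default.
            current = {"newsgroups": value, "regex": "", "score": None}
            rules.append(current)
        elif key in ("regex", "score"):
            if current is None:
                raise ValueError(f"line {lineno}: {key} before Newsgroups")
            current[key] = value
        else:
            raise ValueError(f"line {lineno}: unknown key {key!r}")
    return rules
```

Because each line is handled on its own, error messages can point at the offending line number, which helps with hand-edited filter files.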
/* Cornelius Krasel, U Wuerzburg, Dept. of Pharmacology, Versbacher Str. 9 */
/* D-97078 Wuerzburg, Germany email: phak004@xxxxxxxxxxxxxxxxxxxxxx SP4 */
/* "Science is the game we play with God to find out what His rules are." */
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode