
[leafnode-list] Overzealous filtering: Configuration hints



I am currently investigating a report that leafnode was filtering
overzealously. I recently added more verbose XOVER messages, and it
turned out that most articles were discarded only after the download.

I have received the user's filter file and take the liberty of
quoting three lines here:

newsgroups = *
pattern = ^Subject:.*[\.](?i)mp3|mpg|mpeg|avi|wmv|rm|asf
action = kill

This filter may look OK at first glance, but it is not. In particular,
the "rm" part will catch a lot of otherwise unsuspicious headers.

There are some things to note here:

The | operator separates whole alternatives, and without parentheses
nothing limits its scope: everything to the left of a | is one
alternative, everything to the right is another. That is, the filter
above could also have been written as:

newsgroups = *
pattern = ^Subject:.*[\.](?i)mp3
action = kill
pattern = mpg|mpeg|avi|wmv|rm|asf
action = kill

Note also that leafnode does NOT restrict the patterns to the "contents"
of a header, so if the NAME of a header contains "rm", it is also
matched.
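
To see the effect, here is a small Python sketch. It is not leafnode
code: Python's re module stands in for PCRE, the header lines are made
up, and I assume the pattern is searched in each header line separately.
Python only accepts (?i) at the very start of a pattern, so the flag
re.IGNORECASE is used instead.

```python
import re

# Unparenthesized alternation, as in the user's filter. re.IGNORECASE
# stands in for the inline (?i); this is an approximation of the PCRE
# behaviour, not an exact port.
BROKEN = re.compile(r"^Subject:.*[.]mp3|mpg|mpeg|avi|wmv|rm|asf", re.IGNORECASE)

# Hypothetical header lines, made up for illustration.
HEADERS = [
    "Subject: New song.mp3",
    "Subject: Terms of use",
    "From: David <david@example.invalid>",
    "X-Confirm-Reading-To: someone@example.invalid",
    "Subject: leafnode release",
]

def killed(pattern, lines):
    """Return the lines on which the pattern matches anywhere."""
    return [line for line in lines if pattern.search(line)]

for line in killed(BROKEN, HEADERS):
    print(line)
```

Only the last line survives. "Terms" and "Confirm" contain "rm", and
"David" contains "avi", so those lines match even though none of them
names a media file.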

The correct pattern line would read (you can omit the backslash in
character classes, that is, inside square brackets):

pattern = ^Subject:.*[.](?i)(mp3|mpg|mpeg|avi|wmv|rm|asf)

or, slightly more efficiently, because the group is non-capturing:

pattern = ^Subject:.*[.](?i:mp3|mpg|mpeg|avi|wmv|rm|asf)
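
A quick way to check the grouped form is the following Python sketch.
Again, this is not leafnode code: Python's re module stands in for PCRE
(both accept the scoped (?i:...) group), the header lines are made up,
and matching line by line is an assumption.

```python
import re

# Grouped alternation: the extensions only count after "Subject:" and a dot.
FIXED = re.compile(r"^Subject:.*[.](?i:mp3|mpg|mpeg|avi|wmv|rm|asf)")

# Hypothetical header lines, made up for illustration.
SAMPLE_HEADERS = [
    "Subject: New song.MP3",
    "Subject: holiday clip.avi",
    "Subject: Terms of use",
    "From: David <david@example.invalid>",
    "X-Confirm-Reading-To: someone@example.invalid",
]

def killed(pattern, lines):
    """Return the lines on which the pattern matches anywhere."""
    return [line for line in lines if pattern.search(line)]

for line in killed(FIXED, SAMPLE_HEADERS):
    print(line)
```

With the parentheses in place, only the two Subject lines that really
carry a media extension are matched.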

So: mind your parentheses when using the alternation operator in PCRE.

-- 
Matthias Andree
