
[leafnode-list] Re: got some lua scripting examples?



On Mon-2009/04/13-08:47 Troy Piggins wrote:

> Finally have been able to compile leafnode 2 with --enable-lua with
> these latest couple of versions.  But I have no idea about the lua
> scripting language yet.
>
> Just curious what scripts have been used so far.  What do you think are
> most useful?
>
> In particular I think it'd be useful for filtering/scoring based on
> body content.  What do you think?  Examples?  Pointers?

Actually, leafnode2 comes with a Lua module called "ln2_distmod.lua"
("leafnode2 distribution module"), which is configured through variables
in "scripthooks.lua".  That file is installed as "scripthooks.lua.dist";
normally you would copy it to "scripthooks.lua" and edit it to taste.

The module works by searching an article and scoring it up or down based
on pattern specifications.  For scoring on the body, see the variable
"ScoredMatches".  It is a nested table structure.  Every top-level entry
in it can be recognized by its leading >>["group-pattern"]<< key, a Lua
pattern that tells the code which groups the following rules apply to.

Example:

    ["^de%.etc%.finanz"] = {
        -- rule 1: the body mentions the spammer's web site
        function(g, a, s)
            if art_search(a, search_body, {
                "www%.Gamblingworld%.de",
            }) then return 5*reject_cutoff else return 0 end
        end,
        -- rule 2: posted via aktienboard (newsreader, user agent or message-ID)
        function(g, a, s)
            if art_search(a, "X-Newsreader", {"aktienboard.forums",}) or
               art_search(a, "User-Agent", {"aktienboard.forums",}) or
               art_search(a, "Message-ID", {"@msgid%.aktienboard%.com>",})
               then return 5*reject_cutoff else return 0 end
        end,
        -- rule 3: known spammer From: addresses
        function(g, a, s)
            if art_search(a, "from", {
                "TruckerMagazin@googlemail%.com",
                "PokerKarte@googlemail%.com", "roulette_magazin@yahoo%.",
            }) then return 5*reject_cutoff else return 0 end
        end,
    },
 
This is an excerpt of what I do to the poor articles in group
"de.etc.finanz".  The table entry consists of a number of Lua functions,
which call "art_search()" to find certain characteristics of an article.
Each function is itself called with three arguments: the current group
("g"), the current article ("a") and the body of the article ("s").
"art_search()" uses only the "a" argument, since it contains everything
needed, including the article headers and body.  Note the search term
"search_body": it instructs "art_search()" to look for patterns in the
body, in this case mentions of the spammer's site gamblingworld.de.  If
it is found, matching articles are scored down to five times the reject
cutoff threshold.  I set it that far beyond the cutoff so that matching
articles are rejected even when other patterns score them up.

For spam removal I run articles through bogofilter, which can be enabled
in scripthooks.lua.  For this you need a bogofilter token database
accessible to the "news" system user.  So basically I have "knockout"
patterns in "ScoredMatches" plus the bogofilter.  To keep articles from
falling victim to aggressive filtering, there are a number of options:
groups can be made "immune" by switching off bogofiltering for them, by
setting their "Scorelimit" very high, by biasing them in group-specific
"ScoredMatches" entries and so on.  You can also make groups more
sensitive using the same mechanisms.
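Biasing a group through "ScoredMatches" could look like the fragment below, written in the style of the de.etc.finanz excerpt.  It is a sketch only: both group patterns are hypothetical examples, and it assumes, as in that excerpt, that negative scores push an article toward rejection, so a constant positive return makes a group harder to knock out and a constant negative one makes it more sensitive.

```lua
    -- hypothetical: make this group harder to knock out
    ["^de%.comm%.software%.newsserver"] = {
        function(g, a, s) return 50 end,
    },
    -- hypothetical: make this group more sensitive
    ["^de%.rec%.spiele%.poker"] = {
        function(g, a, s) return -20 end,
    },
```

Use each group pattern only once in the table: if a key is repeated in a Lua table constructor, only one of the entries survives, and which one is unspecified.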

If you want to see what the filters do, and to catch the articles that
shouldn't disappear, you need a bunch of local groups where the losers
are collected.  I have this in "/etc/leafnode/local.groups":

    local.archive.default	y	local archive, Xpost everything else here
    local.archive.freebsd	y	local archive, Xpost freebsd-stuff here
    local.archive.leafnode	y	local archive, Xpost leafnode-stuff here
    local.archive.linux	y	local archive, Xpost linux-stuff here
    local.archive.tronix	y	local archive, Xpost electronix-stuff here
    local.test	y	local test group
    local.spam	y	local container for spam caught on USENET
    local.reject	y	local container for spam caught on USENET

"ln2_distmod.lua" will put rejected articles into "local.reject" and
spammish articles above the rejection threshold into "local.spam".
I also have a cron job expiring these groups much more aggressively than
the others, while local.archive.* is not expired at all.

If you don't want this kind of scoring, you can also use keyword
searches.  They use the same article search, but you get to specify the
groups that matching articles are moved to.  I use this feature for
archiving, not spam scoring.  The configuration variable in
scripthooks.lua is called "KeywordSearches".  It too is a table, indexed
by group names.  Its entries are lists, each headed by a string naming
the group(s) where matching articles should go, followed by any number
of search patterns for art_search(), which are OR'ed to determine a
match.  Keyword searches let you change the receiving group, and they
can even save entire threads once a matching article has been found.
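The keyword-search layout described above can be sketched in the same standalone way.  Everything here is an assumption for illustration: the exact shape of the entries in scripthooks.lua may differ, the group "comp.os.linux.misc" is only an example, each search spec is modelled as a {where, patterns} pair handed to a stand-in art_search(), and the target string is taken to be a comma-separated list of groups.  Specs within one rule are OR'ed, so the first hit is enough.

```lua
-- Standalone sketch of KeywordSearches-style routing. The table layout,
-- the stand-in art_search() and the comma-separated target string are
-- assumptions for illustration, not the real scripthooks.lua format.
local search_body = {}   -- unique sentinel meaning "search the body"

-- An article is { headers = {...}, body = "..." }, header names in lower case.
local function art_search(a, where, patterns)
  local text
  if where == search_body then
    text = a.body
  else
    text = a.headers[string.lower(where)]
  end
  if text == nil then return false end
  for _, p in ipairs(patterns) do
    if string.find(text, p) then return true end
  end
  return false
end

-- Indexed by group name; each rule = { targets, spec, spec, ... } where a
-- spec is a { where, patterns } pair handed to art_search().
local KeywordSearches = {
  ["comp.os.linux.misc"] = {
    { "local.archive.linux,local.archive.leafnode",
      { "Subject", { "[Ll]eafnode" } },
      { search_body, { "leafnode%-list" } },
    },
  },
}

-- Return the groups a matching article should go to.
local function keyword_targets(g, a)
  local targets = {}
  for _, rule in ipairs(KeywordSearches[g] or {}) do
    local hit = false
    for i = 2, #rule do              -- rule[1] is the target string
      local spec = rule[i]
      if art_search(a, spec[1], spec[2]) then   -- specs are OR'ed
        hit = true
        break
      end
    end
    if hit then
      for grp in string.gmatch(rule[1], "[^,]+") do
        targets[#targets + 1] = grp
      end
    end
  end
  return targets
end

local art = { headers = { subject = "Leafnode and Lua" }, body = "..." }
print(table.concat(keyword_targets("comp.os.linux.misc", art), " "))
--> local.archive.linux local.archive.leafnode
print(#keyword_targets("de.comp.misc", art))   --> 0
```

Saving whole threads, which the real module can do, is not modelled in this sketch.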

Documentation is in "README-scripting.txt" and "README-tech.txt" in the
distribution.  The configuration file scripthooks.lua is heavily
documented as well.  The "standard" book on Lua is "PIL": "Programming
in Lua" by Roberto Ierusalimschy, but a number of other books have been
published recently.  Lua is a little like JavaScript and easy to learn.

I hope this was helpful and not "too much"  8-)


clemens

-- 
_______________________________________________
leafnode-list mailing list
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
https://www.dt.e-technik.uni-dortmund.de/mailman/listinfo/leafnode-list
http://leafnode.sourceforge.net/