
Re: [leafnode-list] Filtering revisited.



On Fri, Oct 01, 1999 at 06:21:31AM +0100, Mark Brown wrote:
> On Fri, Oct 01, 1999 at 04:29:05AM +0200, Joerg Dietrich wrote:
> > effective for small leaf sites. These sites are admittedly often dial-up
> > boxes used by single person. This is the only case in which filtering
> > by the news service  providing package can be useful. In all other cases
> > different users will have different criteria to unselect articles.
> 
> I can imagine that in a lot of other cases sufficient communication
> between the users is possible to work out some mutually acceptable
> arrangement ("You don't read alt.sex.hamsters, so leave it alone." or
> "Joe is a real idiot." "Sure is - let's plonk him.").

Yes, you can block entire newsgroups with the current leafnode filter.
No, you don't want to do this. Offering a group, subscribing to it and
filtering it by downloading all headers and checking the Newsgroups:
line would probably qualify you as the idiot of the week (sorry if
anybody actually does this). Leafnode currently has no sensible
mechanism for blocking entire groups (although a comparison against a
blacklist in the groupinfo code should be relatively easy to implement;
see the sketch below).
	Community plonks may occur from time to time, but they are very
unlikely. I probably couldn't even agree with my girlfriend (she
doesn't read a lot of news, so I don't know) on whom to plonk.
	Exceptions may always occur, but filtering as I understand it is
aimed at single-user dial-up boxes.
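
To illustrate the blacklist idea mentioned above, here is a very rough
sketch. The file name, the one-group-per-line format and the function
name are all assumptions made up for this example; nothing like it
exists in leafnode today.

#include <stdio.h>
#include <string.h>

/* return nonzero if "group" is listed in the (hypothetical) blacklist
 * file, one newsgroup name per line */
int group_blacklisted(const char *group)
{
    FILE *f = fopen("/etc/leafnode/blacklist.groups", "r"); /* assumed path */
    char line[512];
    int hit = 0;

    if (!f)
        return 0;                       /* no blacklist: allow every group */
    while (fgets(line, sizeof(line), f)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the newline */
        if (strcmp(line, group) == 0) {
            hit = 1;
            break;
        }
    }
    fclose(f);
    return hit;
}

The place where fetchnews merges the upstream group list into groupinfo
could then simply skip any group for which group_blacklisted() returns
true, so a blocked group is never offered and never fetched.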

> > I believe that if Leafnode was able to either open multiple connections,
> > and/or fetch asynchronously, or use UUCP, filtering would be
> > superfluous because the filtering process would consume more time than
> > the fetching of the article itself.
[...]
> The main problem in Leafnode is that it doesn't preload the connection.
> At present, it sends a command to the server, processes the response,
> sends the next command and so on.  Even in the simple, non-filtering
> case this is suboptimal - there is a pause between each article being
> sent where the connection idles.  A faster system which several other
> news pullers implement is to send commands in advance, before the server is
> ready to read them.  That way you minimise the time the server takes to
> respond - it can be processing a command while the previous results are
> still being transmitted over the network.

Didn't I just say this? Maybe we have a misunderstanding about what
asynchronous fetching is.

> This would produce a speed improvement for all users - it is possible to
> saturate the modem with this method.  It would also mean that instead of
> making the connection idle any filtering can take place while waiting
> for the next batch of data to download which (given the typical relative
> costs of bandwidth and CPU) should give plenty of time for even a very
> complex set of filters to run.

Ooh yes, I had a misconception here. You're right. Thanks for clarifying
this.
	This turns my whole argumentation around. We need a faster
fetchnews so that the filtering does not slow down the fetching (doesn't
shift the priorities :-), provided the filtering is not too complex.
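
To make the pipelining point concrete, here is a minimal sketch of the
kind of loop Mark describes, assuming the server connection is already
wrapped in two stdio streams. The window size, the function name and
the omission of dot-unstuffing are simplifications for illustration
only.

#include <stdio.h>

#define WINDOW 8   /* number of ARTICLE commands kept in flight */

void fetch_articles(FILE *srv_in, FILE *srv_out,
                    const unsigned long *artno, size_t count)
{
    size_t sent = 0, received = 0;
    char line[1024];

    while (received < count) {
        /* keep up to WINDOW requests outstanding instead of the
         * current send-one/wait-one lock step */
        while (sent < count && sent - received < WINDOW) {
            fprintf(srv_out, "ARTICLE %lu\r\n", artno[sent]);
            sent++;
        }
        fflush(srv_out);

        /* read one reply: status line plus article text up to the
         * lone "." line; filtering could run right here while later
         * replies are still coming in over the wire */
        if (!fgets(line, sizeof(line), srv_in))
            break;                        /* connection lost */
        if (line[0] == '2') {             /* 220: article follows */
            while (fgets(line, sizeof(line), srv_in)) {
                if (line[0] == '.' &&
                    (line[1] == '\r' || line[1] == '\n'))
                    break;                /* end of article */
                /* dot-unstuffing and storing omitted for brevity */
            }
        }
        received++;
    }
}

With such a window the modem stays busy, and a per-article filter run
only has to finish before the next reply has arrived, not before the
next command can be sent.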

> > > (2)  Allow for the current regular-expression matching to be wrapped
> > >      by some sort of new filtering language that implements nesting,
> > >      boolean logic, etc.
> one I use) it's a trivial computation - match on the regexp and then add
> an appropriate value to the score.  If after applying all filters the

Of course, boolean logic and integer computations are fast, that's what
computers are all about :-) Problems could arise if you have a lot of
(very complex) regexps. They are slow. I don't dare to make any further
predictions here. We would probably have to try it and run some
benchmarks.
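
For what it's worth, a scoring filter along the lines Mark sketches
could look roughly like this with POSIX regexps. The rules, scores and
threshold are invented for the example; the point is that each pattern
is compiled once at startup, so only regexec() runs per article.

#include <regex.h>
#include <stddef.h>

struct rule { const char *pattern; int score; regex_t re; };

static struct rule rules[] = {
    { "^From:.*spammer@example\\.invalid", -100 },
    { "^Subject:.*\\$\\$\\$",               -60 },
    { "^Subject:.*leafnode",                +20 },
};
#define NRULES (sizeof(rules) / sizeof(rules[0]))
#define KILL_THRESHOLD (-100)

/* compile all patterns once at startup; returns 0 on success */
int compile_rules(void)
{
    for (size_t i = 0; i < NRULES; i++)
        if (regcomp(&rules[i].re, rules[i].pattern,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB | REG_NEWLINE))
            return -1;
    return 0;
}

/* returns nonzero if the article's header block scores low enough to drop */
int article_killed(const char *headers)
{
    int score = 0;

    for (size_t i = 0; i < NRULES; i++)
        if (regexec(&rules[i].re, headers, 0, NULL, 0) == 0)
            score += rules[i].score;
    return score <= KILL_THRESHOLD;
}

The regexec() calls are still the expensive part, so benchmarks would
have to show whether a realistic rule set keeps up with a saturated
modem.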

[pluggable filter module]
> > This sounds interesting. It would leave the true server/fetchnews code
> > intact and would allow everybody to do what he wants to do. If any
> > filtering at all, then this is IMHO the method of choice. The existing
> > filtering could be used as a module and everybody would be happy :-)
> 
> You'd probably want a standard set and some idiot proof way of
> configuring them - one of the really nice things about Leafnode right
> now is that it's utterly simple to get working.  You would probably also

I don't think there is an idiot-proof way of filtering at all. The
current regexps are very complicated.

[same in perl]
> Remember that perl is compiled at startup, so if you keep the
> same copy of the script running throughout the download the overhead is
> pretty low.

Forgot about that. You're right.
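
Assuming the filter really is an external script (perl or anything
else), keeping one copy of it running for the whole download could look
roughly like this. The program name, the OK/KILL protocol and the
blank-line terminator are all inventions for the sketch.

#include <stdio.h>
#include <unistd.h>

static FILE *to_filter, *from_filter;

/* start the external filter once, before the download begins */
int start_filter(const char *filterprog)
{
    int down[2], up[2];   /* down: fetchnews -> filter, up: filter -> fetchnews */

    if (pipe(down) || pipe(up))
        return -1;
    switch (fork()) {
    case -1:
        return -1;
    case 0:                  /* child: become the filter program */
        dup2(down[0], 0);
        dup2(up[1], 1);
        close(down[1]);
        close(up[0]);
        execlp(filterprog, filterprog, (char *)NULL);
        _exit(127);          /* exec failed */
    default:                 /* parent */
        close(down[0]);
        close(up[1]);
        to_filter   = fdopen(down[1], "w");
        from_filter = fdopen(up[0], "r");
        return (to_filter && from_filter) ? 0 : -1;
    }
}

/* returns nonzero if the filter answers "KILL" for this header block */
int filter_rejects(const char *headers)
{
    char verdict[16];

    fputs(headers, to_filter);    /* headers end with a newline... */
    fputs("\n", to_filter);       /* ...so this adds a blank-line terminator */
    fflush(to_filter);
    if (!fgets(verdict, sizeof(verdict), from_filter))
        return 0;                 /* filter died: keep the article */
    return verdict[0] == 'K';
}

The script only pays its compile cost in start_filter(); after that,
each article costs one write and one read on an already running
process.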

> In any case, filtering doesn't need to be fast - it just needs to be
> faster than the download.

If you wait for batches to arrive, yes; if filtering delays the issue of
the next NNTP command (as it is now), no.

> You know you want to make an anonymous CVS repository :-) .

I wholeheartedly second this!

Rgds,
	Jo:rg
