Re: [leafnode-list] Filtering revisited.
On Fri, Oct 01, 1999 at 06:21:31AM +0100, Mark Brown wrote:
> On Fri, Oct 01, 1999 at 04:29:05AM +0200, Joerg Dietrich wrote:
> > effective for small leaf sites. These sites are admittedly often dial-up
> > boxes used by a single person. This is the only case in which filtering
> > by the package providing the news service can be useful. In all other cases
> > different users will have different criteria to unselect articles.
> I can imagine that in a lot of other cases sufficient communication
> between the users is possible to work out some mutually acceptable
> arrangement ("You don't read alt.sex.hamsters, so leave it alone." or
> "Joe is a real idiot." "Sure is - let's plonk him.").
Yes, you can block entire newsgroups with the current leafnode filter.
No, you don't want to do this. Offering a group, subscribing to it, and
filtering it by downloading all headers and checking the Newsgroups:
line would probably qualify you as idiot of the week (sorry, if anybody
actually does this). Leafnode currently has no sensible mechanism for
blocking entire groups (although a comparison against a blacklist in the
groupinfo algorithm should be relatively easy to implement).
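That blacklist comparison could look roughly like this (a hypothetical sketch in Python, not leafnode's actual C code; the patterns and names are made up for illustration):

```python
# Hypothetical sketch of the suggested blacklist check: reject a group
# before fetching anything from it, instead of filtering article headers.
import fnmatch

BLACKLIST = ["alt.sex.*", "alt.flame"]  # example patterns, not real config

def group_allowed(group, blacklist=BLACKLIST):
    """Return False if the group name matches any blacklist pattern."""
    return not any(fnmatch.fnmatch(group, pat) for pat in blacklist)

groups = ["comp.os.linux.misc", "alt.sex.hamsters", "alt.flame"]
wanted = [g for g in groups if group_allowed(g)]
```

Since the check runs once per group rather than once per article, it costs essentially nothing compared to downloading headers.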
Community plonks may occur from time to time but they are very
unlikely. I probably couldn't even agree with my girlfriend (she
doesn't read a lot of news, so I don't know) on whom to plonk.
Exceptions may always occur, but filtering as I understand it is
aimed at single-user dial-up boxes.
> > I believe that if Leafnode was able to either open multiple connections
> > and/or fetch asynchronously, or use UUCP, filtering would be
> > superfluous because the filtering process would consume more time than
> > the fetching of the article itself.
> The main problem in Leafnode is that it doesn't preload the connection.
> At present, it sends a command to the server, processes the response,
> sends the next command and so on. Even in the simple, non-filtering
> case this is suboptimal - there is a pause between each article being
> sent where the connection idles. A faster system which several other
> news pullers implement is to send commands in advance, before the server is
> ready to read them. That way you minimise the time the server takes to
> respond - it can be processing a command while the previous results are
> still being transmitted over the network.
Didn't I just say this? Maybe we have a misunderstanding about what
asynchronous fetching is.
> This would produce a speed improvement for all users - it is possible to
> saturate the modem with this method. It would also mean that instead of
> making the connection idle any filtering can take place while waiting
> for the next batch of data to download which (given the typical relative
> costs of bandwidth and CPU) should give plenty of time for even a very
> complex set of filters to run.
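That pipelining idea can be sketched as follows (a toy illustration with injected send/receive callables, not fetchnews' actual code):

```python
def pipelined_fetch(send, recv, article_numbers):
    """Send every ARTICLE command up front, then read the replies in order.

    Because all commands are already queued at the server, filtering (or
    any other local work) can happen between recv() calls while the
    remaining responses are still travelling over the network.
    """
    for num in article_numbers:          # no waiting between commands
        send(f"ARTICLE {num}\r\n")
    return [recv() for _ in article_numbers]

# Tiny demo with fake network primitives instead of a real NNTP socket.
sent = []
replies = iter(["220 1 <a@b> article", "220 2 <c@d> article"])
articles = pipelined_fetch(sent.append, lambda: next(replies), [1, 2])
```

Contrast this with the current lock-step behaviour, where each `send` waits for the matching `recv` to finish before the next command goes out, leaving the connection idle.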
Ooh yes, I had a misconception here. You're right, thanks for
clarifying. This turns my whole argumentation around. We need a faster
fetchnews in order not to slow down the fetching (this doesn't shift the
priorities :-), provided the filtering is not too complex.
> > > (2) Allow for the current regular-expression matching to be wrapped
> > > by some sort of new filtering language that implements nesting,
> > > boolean logic, etc.
> one I use) it's a trivial computation - match on the regexp and then add
> an appropriate value to the score. If after applying all filters the
Of course, boolean logic and integer computations are fast, that's what
computers are all about :-) Problems could arise if you have a lot of
(very complex) regexps. They are slow. I don't dare to make any further
predictions here. We probably would have to try it and run some benchmarks.
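The score-based approach described above could be sketched like this (the rules and cutoff are purely illustrative, not an existing scorefile format):

```python
import re

# Each rule is (compiled regexp, score delta); an article is kept only if
# its total score stays at or above the cutoff after all rules have run.
RULES = [
    (re.compile(r"^Subject:.*MAKE MONEY FAST", re.I), -100),
    (re.compile(r"^Newsgroups:.*comp\.os\.linux", re.I), 10),
]
CUTOFF = 0

def keep_article(header_lines, rules=RULES, cutoff=CUTOFF):
    """Apply every scoring rule to the article's headers, then decide."""
    score = 0
    for pattern, delta in rules:
        if any(pattern.search(line) for line in header_lines):
            score += delta
    return score >= cutoff
```

The regexps are compiled once up front, so the per-article cost is the matching itself; whether that stays cheaper than the download is exactly what would need benchmarking.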
[pluggable filter module]
> > This sounds interesting. It would leave the true server/fetchnews code
> > intact and would allow everybody to do what he wants to do. If any
> > filtering at all, then this is IMHO the method of choice. The existing
> > filtering could be used as a module and everybody would be happy :-)
> You'd probably want a standard set and some idiot proof way of
> configuring them - one of the really nice things about Leafnode right
> now is that it's utterly simple to get working. You would probably also
I don't think there is an idiot-proof way of filtering at all. The
current regexps are very complicated.
[same in perl]
> Remember that perl is compiled at startup, so if you keep the
> same copy of the script running throughout the download the overhead is
> pretty low.
Forgot about that. You're right.
> In any case, filtering doesn't need to be fast - it just needs to be
> faster than the download.
If you wait for batches to arrive, yes; if filtering delays the issue of
the next NNTP command (as it is now), no.
> You know you want to make an anonymous CVS repository :-) .
I wholeheartedly second this!
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list