
Re: [leafnode-list] Fetchnews: Falling back and regrouping.



I've been a bit slow responding to this - sorry.  

I just added an 80 gig drive for my news spool.  Unfortunately, this box's
motherboard has a legacy BIOS, so I've been fighting lockups and other
disk size issues.  I think I've finally got it whipped, although I now
know _far_ more than I've ever wanted to know about GRUB, the GRand
Unified Bootloader that RedHat is using instead of good old LILO.

At least now I'll be able to wait just a wee bit longer before I need to  
force a drastic expire of everything in the spool.

Also, my incoming mail server was periodically down because of this, so I
may have missed something in this thread.  (Yes, I am currently running
mail and news on the same server.  Foolish, I know...)  If anyone has
posted relating to this and it bounced back, please resend it to me.  I
DID receive two posts from jom1@xxxxxxxxxx, one on this thread, and one
about setting a negative maxfetch to fetch only the OLDEST articles rather
than the newest.

So, back into the fray...

On Tue, 26 Mar 2002, Matthias Andree wrote:

> Michael O'Quinn wrote on Monday, 25 March 2002:
> 
> > I am of course referring to the /var/spool/news/leaf.node/SERVERNAME file.
> > 
> > I can trace most of my heartache and headache with leafnode since I first
> > started using it to this single file.
> > 
> > To wit:
> > 
> > (1) When fetchnews for some unknown reason terminates before the end of
> > its run, the SERVERNAME file is not updated.
> 
> True.
> 
> > (2) When fetchnews manages to complete its run, but for some reason skips
> > 10,000 or so articles in one group (that I've confirmed ARE there), the
> > SERVERNAME file is not updated.
> 
> I have yet to see why, because on my machine, this does not show.

NewsPlex may be suspect here -- the logs seem to be inconclusive, at least
to me.  I've recently posted the output from a recent run.  Perhaps that
will shed some light.

Still, regardless of WHY it terminates, the WHAT of not updating the state
info is still a problem.

> 
> > (3) When I terminate fetchnews early via <CTRL-C>, the SERVERNAME file is
> > not updated.
> 
> Not easily fixed. 

Why?

I'd think you could just trap the signal and get on with saving the 
current state before exiting.
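
To illustrate what I mean, here is a minimal sketch (in Python rather than
leafnode's C, and not taken from the fetchnews source).  The handler only
sets a flag; the fetch loop checks it between articles and saves state on
the way out.  The names request_stop, fetch_group, fetch_article and
save_state are all made up for this example.

```python
import signal

stop_requested = False

def request_stop(signum, frame):
    """Signal handler: only record that a stop was requested."""
    global stop_requested
    stop_requested = True

def install_handlers():
    """Route <CTRL-C> (SIGINT) and SIGTERM to request_stop."""
    signal.signal(signal.SIGINT, request_stop)
    signal.signal(signal.SIGTERM, request_stop)

def fetch_group(first, last, fetch_article, save_state):
    """Fetch articles first..last, stopping early if a signal arrived.

    fetch_article and save_state are hypothetical callbacks standing in
    for the real download and for writing the SERVERNAME file.
    Returns the number of the last article actually fetched.
    """
    done = first - 1
    try:
        for number in range(first, last + 1):
            if stop_requested:
                break
            fetch_article(number)
            done = number
    finally:
        # Runs on normal completion, on an early stop, and on exceptions.
        save_state(done)
    return done
```

Call install_handlers() once at startup.  Keeping the handler down to a
single flag assignment means the actual file writing happens in normal
program flow, not inside the signal handler.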

> In this case, however, the SERVERNAME~ file should be
> there and contain some useful information, can you verify that on your
> machine?

When I press <CTRL-C> the old one is not deleted, updated, or touched in
any way.

> 
> > (4) When a group has been fetched in the past, but is not being fetched in 
> > this run, that group is DELETED ENTIRELY from the SERVERNAME file.
> 
> True. The reasoning behind this is: presume you don't read that group
> for another four weeks. Say, in these four weeks, 28,000 new articles
> arrive. If we kept this state information, we would fetch all 28,000 new
> articles (or however many of them remain on the server) on the next run,
> because no initiallimit would apply. artlimit is just a protection
> against runaway fetchmail.

Which is exactly the behavior I would expect and want.  If I don't want
all 28,000 articles I will set maxfetch to something lower.  Is that not
what it's for?  The deletion is, as far as I can tell, undocumented behavior.  
If you REALLY think it's necessary to automagically delete state info from
the SERVERNAME file, make it configurable so such bad behavior can be 
turned off.

BTW, by "artlimit" do you mean "maxfetch"?

> 
> > (5) No history file ala INN is produced.  Producing a history file
> > containing the message-id's of all retrieved messages would help a lot
> > with the broken SERVERNAME file, but this isn't done.
> 
> True. However, history files reek of file locking, and avoiding file
> locking as far as possible is not a bad thing. Admittedly, if nntpd was
> to NOT use this file, it would not matter. However, I'd not use a plain
> text file for this, but something like gdbm or Berkeley db
> (www.sleepycat.com), and I'm not introducing anything like this into 1.9
> any more. Effectively, 1.9.20.rel is ready and will be released soon,
> without any further changes.

Introducing a history database would be a major change, one I certainly 
wouldn't expect without a lot of soul-searching.  It would have its 
benefits, but it may not be entirely necessary if the SERVERNAME file was 
working better, since that (sort of) keeps track of what articles have and 
have not been retrieved.

I completely agree that such a major change is something for another 
discussion and time.

> 
> > (6) Since the SERVERNAME file is not getting updated and no history file
> > is produced, the only state information about a group is contained within
> > the articles themselves.  Once the articles are expired, all state
> > information from the upstream server about that group is lost.
> 
> That's true. 

O.K.

> It'd be useful to set maxage from the expire time
> automatically, and I can envision that feature for 1.9.21.

Huh?

> 
> > (7) When the news spool fills up, the SERVERNAME file falls off the
> > data-bus.  [This has already been fixed with a patch against 1.9.20.rc9.  I
> > haven't applied this patch, nor have I had the opportunity to test it, so
> > I'm not sure how it currently acts when the spool fills up.]
> 
> The patch makes sure that the old SERVERNAME file remains in place.
> 
> I have been wondering whether I should read both the (older) SERVERNAME and the
> complete lines from the (newer) SERVERNAME~ file and take whichever has
> the higher number. This will not be implemented in 1.9.20, it may be for
> 1.9.21, I have yet to decide this.

That sounds like a good step in the right direction.
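
For what it's worth, here is how I picture that merge.  This is only a
sketch in Python, assuming each line of the state file is simply
"group.name number" (my reading of the file, not something I checked
against the source).  The function names are invented for the example.  A
last line without a trailing newline is treated as an interrupted write
and skipped, which is how I understand "complete lines".

```python
def read_state(path, complete_lines_only=False):
    """Parse 'group number' lines into a dict; a missing file gives {}."""
    state = {}
    try:
        with open(path, "r") as handle:
            for line in handle:
                if complete_lines_only and not line.endswith("\n"):
                    continue  # partial last line from an interrupted write
                fields = line.split()
                if len(fields) != 2:
                    continue  # ignore anything malformed
                try:
                    state[fields[0]] = int(fields[1])
                except ValueError:
                    continue  # second field is not a number
    except FileNotFoundError:
        pass
    return state

def merge_states(old, new):
    """For every group seen in either dict, keep the higher article number."""
    merged = dict(old)
    for group, number in new.items():
        merged[group] = max(number, merged.get(group, 0))
    return merged
```

Usage would be something like
merge_states(read_state("SERVERNAME"), read_state("SERVERNAME~", True)).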

How about this:

Create the SERVERNAME~ file with a lot of extra zeros, like INN, and then 
update it in place after each article, or each ten, or each <Configurable 
Number>.  This allows you to have very frequent updates without the major 
penalty of re-writing the entire file each time.

Then, at the end of the run, roll the updates into the SERVERNAME file and 
delete SERVERNAME~

Finally, if the SERVERNAME~ file exists at the beginning of a run, 
assume we crashed and roll in the updates immediately.
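
To make the in-place idea concrete, here is a rough sketch in Python.  It
is not how INN or leafnode actually do it; the 10-digit field width, the
"group number" line layout and all the function names are assumptions for
the sake of the example.  Because every number occupies the same number of
bytes, an update overwrites only those bytes and the file never changes
size.  Rolling the result into SERVERNAME is left out here.

```python
import os

WIDTH = 10  # digits reserved per article number (an assumption)

def create_journal(path, state):
    """Write 'group 0000000123' lines; return {group: byte offset of number}."""
    offsets = {}
    with open(path, "wb") as handle:
        for group, number in state.items():
            handle.write(group.encode("ascii") + b" ")
            offsets[group] = handle.tell()
            handle.write(f"{number:0{WIDTH}d}\n".encode("ascii"))
    return offsets

def update_in_place(path, offsets, group, number):
    """Overwrite only the digits for one group; the file size never changes."""
    digits = f"{number:0{WIDTH}d}".encode("ascii")
    if len(digits) != WIDTH:
        raise ValueError("article number does not fit in the reserved field")
    with open(path, "r+b") as handle:
        handle.seek(offsets[group])
        handle.write(digits)
        handle.flush()
        os.fsync(handle.fileno())  # push the update out to disk

def read_journal(path):
    """Parse the journal back into {group: number}."""
    state = {}
    with open(path, "rb") as handle:
        for line in handle:
            group, digits = line.split()
            state[group.decode("ascii")] = int(digits.decode("ascii"))
    return state
```

Calling fsync on every update is the expensive part, so in practice it
would be done every N articles rather than after each one.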

Optionally, to avoid the complexity of updating the numbers in place, just 
rewrite the entire SERVERNAME~ file, but also roll it into the SERVERNAME 
file after each group.  That way it never grows very big.  But there is 
the extra overhead of creating, deleting, opening, writing, closing, etc. 
on each update.  Not as efficient, but perhaps a good first round 
compromise.

Having thought about this for a few days, maybe the easiest way would be to 
write a new line to SERVERNAME~ every X articles (where X is configurable) 
with the current state for the current group.  Then, when rolling 
SERVERNAME~ into SERVERNAME, ignore every entry for each group except 
for the last one.  And, of course, fetchnews should check when it starts 
to see if there are any SERVERNAME~ files to roll over (which would mean 
it crashed last time) and do so immediately, before starting to fetch 
articles.

This should be not too difficult to program, would be robust, and wouldn't 
hurt performance very much.
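
Here is a sketch of that last variant, again in Python and again with
invented names (checkpoint, roll_over, and the ".new" temporary suffix are
mine, and the "group number" line format is an assumption).  Checkpoints
are appended to SERVERNAME~; rolling over reads SERVERNAME first and the
journal second, so the last entry for each group wins, and an incomplete
final line is ignored.

```python
import os

def checkpoint(journal, group, number):
    """Append the current position for one group to the open journal."""
    journal.write(f"{group} {number}\n")
    journal.flush()

def roll_over(state_path, journal_path):
    """Fold SERVERNAME~ into SERVERNAME; the last entry per group wins.

    Returns False if there was no journal, True if one was rolled in.
    """
    if not os.path.exists(journal_path):
        return False
    state = {}
    for path in (state_path, journal_path):
        if not os.path.exists(path):
            continue
        with open(path, "r") as handle:
            for line in handle:
                fields = line.split()
                # Skip a torn final line and anything malformed.
                if line.endswith("\n") and len(fields) == 2:
                    state[fields[0]] = fields[1]
    tmp_path = state_path + ".new"
    with open(tmp_path, "w") as handle:
        for group, number in state.items():
            handle.write(f"{group} {number}\n")
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, state_path)  # atomic rename on POSIX
    os.remove(journal_path)
    return True
```

The run would call roll_over() once at startup (a leftover journal means
the previous run died), open the journal in append mode, call checkpoint()
every X articles, and call roll_over() again at the end.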

Consider all of this to be stuff to get the brain-juices flowing.  Let me know what 
you think of the various possibilities.

> 
> > I have verified (1), (2) and (3) under 1.9.20.rc9.  I haven't seen a
> > complete enough run under 1.9.20.rc9 to cause (4) to kick in, so I can't
> > say if it still does it.  I think you said it wasn't going to be fixed any
> > time soon.
> 
> Ad 4: If at all, for reasons given above.

Well, it's going to be a while now before my spool fills up again.  *grin*

> 
> > I don't know.  Maybe I'm trying to make leafnode do more than I should.  
> > But I'm only downloading a few groups -- a couple dozen at the most, and
> > often only one or two.  Some of them are just very active groups with a
> > lot of articles, so the download can run all night.
> > 
> > Is it reasonable for a program to run for hours and hours, and NOT
> > periodically save its state, so that it can gracefully pick up where it
> > left off should the need arise?
> 
> Certainly, checkpointing the state would be helpful, but then again,
> leafnode assumes that you do NOT expire faster than you download. 

Well, assume is a dirty word in some lexicons.  Seriously, all software is
used in ways the original creators never imagined.  This is actually a
reasonable usage pattern for a small personal leaf node news server.  Not
for a large site with many users, of course.

> As
> written above, I can imagine deriving a per-group maxage setting from
> the (per-group) expire time.

I don't understand that last sentence.

> 
> > I am more than willing to help hammer this out and then help debug the
> > changes.  But, I need to see the problem of (extremely) sloppy state
> > saving being dealt with seriously to be willing to do that.
> 
> Ok, here's the plan.
> 
> 1.9.20 will try hard to keep at least the old state. 1.9.19 can, on
> write errors ("disk full" is one such condition) kill the SERVERINFO
> file without notice. 1.9.20 will at least leave the old SERVERINFO where
> it is. 

Good.

> On a spontaneous abort that is not handled, like SIGSEGV, the new
> SERVERINFO~ will be shattered, probably empty, because there is no
> chance to flush the stdio buffers. However, since that file is not
> written too often, I'll make it unbuffered or line buffered, and
> together with the "SERVERNAME~" merge-in, this will save most of the
> state that is important to you. The patch against 1.9.20 is mostly
> trivial and will appear alongside the 1.9.20 release.

It sounds like we are moving in the right direction!

Thanks,
Michael O'Quinn

-- 
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list