[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [leafnode-list] Fetchnews run showing Spontaneous Early Termination.



On Fri, 29 Mar 2002, Matthias Andree wrote:

> Michael O'Quinn schrieb am Montag, den 25. März 2002:
> 
> > Here's the tail end of to console output of a recent run (fetchnews
> > -nvvvv & debugmode = 1), edited for paranoia as before:
> > 
> > =====[ BEGIN ]==========[ BEGIN ]==========[ BEGIN ]=====
> > 
> > {GROUP 1}: receiving article 59883 (2228 more up in the air)
> > ..saw header From:
> > ..saw header Newsgroups:
> > ..saw header Subject:
> > ..saw header Message-ID:
> > ..saw header Date:
> > ..saw header Lines:
> > ..saw header Path:
> > ..saw header Xref:
> > storing <{SOME MESSAGE-ID}>: 
> > {A DIFFERENT GROUP},{GROUP 1},{YET ANOTHER GROUP}
> > . discarding unknown group {A DIFFERENT GROUP}
> > .as article 116171 in {GROUP 1}
> > .as article 27937 in {YET ANOTHER GROUP}
> > Disconnected from {SERVERNAME}.
> > [root@{LEAFNODE SERVER} news]# article /var/spool/news/{GROUP 1}/116171 contained illegal headers
> > 
> > =====[  END  ]==========[  END  ]==========[  END  ]=====
> > 
> > Article 116171 in {GROUP 1} is no longer there.  But, article 27937 in
> > {YET ANOTHER GROUP} is still present, and it is a zero length file with a
> > hard link count of 1.
> 
> Leafnode is being somewhat unelegant here. To understand why it does
> what you see, I need to explain the technical reasons somewhat.
> 
> When storing an article (in leafnode 1.9), leafnode proceeds as follows:
> 
> 1. write the header except the Xref: line (which is filtered, because it
>    is ours anyhow)
> 2. create the hard links, and as these are created, create the Xref:
>    line in memory
> 3. write the XRef: line
> 4. unless delaybody is set: write an empty line and the body.
> 
> However, when we detect write errors, we'd need to trace back the whole
> XRef: path, which I found was too much code to be reasonable. I decided
> to simply empty the file (truncate to 0 bytes) and unlink the message.id
> file -- and let texpire collect the remaining hard links instead. The
> nntpd (leafnode) is aware of this convention and will treat 0 byte files
> as nonexistent.

That's pretty much what I thought was happening.  As long as none of the 
programs in the leafnode suite choke on those zero length files and they 
are expired properly. it's a simple and efficient solution.

> 
> True, the message-id file is deleted.
> 
> > I believe the fact that illegal headers error was output AFTER the shell
> > prompt returned indicates it came from the XOVER process that was spawned  
> 
> "illegal headers" may also mean that the XOVER updater was not able to
> extract any header information -- which is quite normal for 0-length
> files.

That's probably a minor bug.  The XOVER updater should know a little
better what to do with a zero length file.  Certainly not critical, 
though.  

> 
> > Note also that the unknown group message is just one in a long string on
> > this run -- it's not unique to this article.
> 
> That means that you don't create links in all groups for crosspostings,
> and fetchnews has ignored the post for groups that are not marked
> interesting.

Actually, I have set create_all_links = 1, so I assume this means the 
unknown group is simply not in groupinfo, meaning that it not in my 
upstream server's active file.

> > Mar 25 21:23:44 {LEAFNODE SERVER} fetchnews[10013]: updated 142 lines in /var/spool/news/{GROUP 1}/.overview
> > Mar 25 21:23:45 {LEAFNODE SERVER} fetchnews[10013]: wrote /var/spool/news/{GROUP 1}/.overview
> > 
> > =====[  END  ]==========[  END  ]==========[  END  ]=====
> > 
> > Notice how it plowed through all the groups after the first one without 
> > any replies or anything being output to the console.
> 
> Yes. That was visible from former logs you showed, and it may hint me to
> a synchronization problem.

O.K.

What does that mean?  And is there anything I can do at this point to help 
further track this down?

Late Breaking News:

I've been fooling with the upstream NewsPlex server today.  I've brought
it down several times while fetchnews was running, and it caused exactly
the same behavior.  The log instantly ran through all the remaining groups
but no action was taken, basically nothing showed up on the console, I got
the same funny messages from the 0 length files, and SERVERNAME wasn't
updated.  This is with rc9.

I think this pretty much points to the cause of my disconnects - it's 
probably not fetchnews' fault.

HOW WELL fetchnews handles the disconnects is still an issue, of course.

> 
> > To summarize:
> > 
> > (1) fetchnews choked and died on article number (remote) 59883 / (local)
> > 116171.
> > 
> > (2) There were around 2200 more articles to be fetched when this happened.
> > 
> > (3) The log shows fetchnews just kinda bombing through the rest of 
> > the groups after that, but not actually doing anything.
> 
> Well, I'm not sure what it does, because the logging was severed during
> the *rc* series after complaints about excessive logging. I must ask you
> to retry this with 1.9.20 after it is released.

I will, but not until I can disable this new
implicit-maxage-derived-from-groupexpire bug.

Michael O'Quinn





-- 
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list