[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [leafnode-list] Fetchnews run showing Spontaneous Early Termination.



Michael O'Quinn schrieb am Montag, den 25. März 2002:

> Here's the tail end of to console output of a recent run (fetchnews
> -nvvvv & debugmode = 1), edited for paranoia as before:
> 
> =====[ BEGIN ]==========[ BEGIN ]==========[ BEGIN ]=====
> 
> {GROUP 1}: receiving article 59883 (2228 more up in the air)
> ..saw header From:
> ..saw header Newsgroups:
> ..saw header Subject:
> ..saw header Message-ID:
> ..saw header Date:
> ..saw header Lines:
> ..saw header Path:
> ..saw header Xref:
> storing <{SOME MESSAGE-ID}>: 
> {A DIFFERENT GROUP},{GROUP 1},{YET ANOTHER GROUP}
> . discarding unknown group {A DIFFERENT GROUP}
> .as article 116171 in {GROUP 1}
> .as article 27937 in {YET ANOTHER GROUP}
> Disconnected from {SERVERNAME}.
> [root@{LEAFNODE SERVER} news]# article /var/spool/news/{GROUP 1}/116171 contained illegal headers
> 
> =====[  END  ]==========[  END  ]==========[  END  ]=====
> 
> Article 116171 in {GROUP 1} is no longer there.  But, article 27937 in
> {YET ANOTHER GROUP} is still present, and it is a zero length file with a
> hard link count of 1.

Leafnode is being somewhat unelegant here. To understand why it does
what you see, I need to explain the technical reasons somewhat.

When storing an article (in leafnode 1.9), leafnode proceeds as follows:

1. write the header except the Xref: line (which is filtered, because it
   is ours anyhow)
2. create the hard links, and as these are created, create the Xref:
   line in memory
3. write the XRef: line
4. unless delaybody is set: write an empty line and the body.

However, when we detect write errors, we'd need to trace back the whole
XRef: path, which I found was too much code to be reasonable. I decided
to simply empty the file (truncate to 0 bytes) and unlink the message.id
file -- and let texpire collect the remaining hard links instead. The
nntpd (leafnode) is aware of this convention and will treat 0 byte files
as nonexistent.

> The file is not presently in the message.id database, as indicated by
> "find /var/spool/news/message.id |grep {SOME MESSAGE-ID}".  (A very

True, the message-id file is deleted.

> I believe the fact that illegal headers error was output AFTER the shell
> prompt returned indicates it came from the XOVER process that was spawned  

"illegal headers" may also mean that the XOVER updater was not able to
extract any header information -- which is quite normal for 0-length
files.

> Note also that the unknown group message is just one in a long string on
> this run -- it's not unique to this article.

That means that you don't create links in all groups for crosspostings,
and fetchnews has ignored the post for groups that are not marked
interesting.

> As you can see from the first line, there are several thousand articles 
> that did not get retrieved.

> 
> Now the log:
> 
> =====[ BEGIN ]==========[ BEGIN ]==========[ BEGIN ]=====
> 
> Mar 25 20:56:18 {LEAFNODE SERVER} fetchnews[9276]: >GROUP {GROUP 1}
> Mar 25 20:56:18 {LEAFNODE SERVER} fetchnews[9276]: <211 22349 41249 63597 {GROUP 1} selected
> Mar 25 20:56:18 {LEAFNODE SERVER} fetchnews[9276]: {GROUP 1}: considering articles 42400-63597
> Mar 25 20:56:18 {LEAFNODE SERVER} fetchnews[9276]: >XOVER 42400-63597
> Mar 25 20:56:20 {LEAFNODE SERVER} fetchnews[9276]: <224 Overview information follows
> Mar 25 20:57:00 {LEAFNODE SERVER} fetchnews[9276]: >BODY 59710
> Mar 25 20:57:06 {LEAFNODE SERVER} fetchnews[9276]: >HEAD 59711
>     .
>     .
>     .
> Mar 25 21:13:13 {LEAFNODE SERVER} fetchnews[9276]: >HEAD 59883
> Mar 25 21:13:14 {LEAFNODE SERVER} fetchnews[9276]: >BODY 59883
> Mar 25 21:23:17 {LEAFNODE SERVER} fetchnews[9276]: >GROUP {GROUP 2}
> Mar 25 21:23:17 {LEAFNODE SERVER} fetchnews[9276]: >GROUP {GROUP 3}
> Mar 25 21:23:17 {LEAFNODE SERVER} fetchnews[9276]: >GROUP {GROUP 4}
> Mar 25 21:23:17 {LEAFNODE SERVER} fetchnews[9276]: >GROUP {GROUP 5}
> Mar 25 21:23:17 {LEAFNODE SERVER} fetchnews[9276]: >GROUP {GROUP 6}
> Mar 25 21:23:18 {LEAFNODE SERVER} fetchnews[9276]: >GROUP {GROUP 7}
> Mar 25 21:23:18 {LEAFNODE SERVER} fetchnews[9276]: >QUIT
> Mar 25 21:23:21 {LEAFNODE SERVER} fetchnews[10013]: Process forked.
> Mar 25 21:23:22 {LEAFNODE SERVER} fetchnews[10013]: reading XOVER info from /var/spool/news/{GROUP 1}/116030
>     .
>     .
>     .
> Mar 25 21:23:44 {LEAFNODE SERVER} fetchnews[10013]: reading XOVER info from /var/spool/news/{GROUP 1}/116170
> Mar 25 21:23:44 {LEAFNODE SERVER} fetchnews[10013]: reading XOVER info from /var/spool/news/{GROUP 1}/116171
> Mar 25 21:23:44 {LEAFNODE SERVER} fetchnews[10013]: illegal article: /var/spool/news/{GROUP 1}/116171: Is a directory

That message looks bogus, I think I fixed that in 2.0b, and did not
backport the fix to 1.9.

> Mar 25 21:23:44 {LEAFNODE SERVER} fetchnews[10013]: updated 142 lines in /var/spool/news/{GROUP 1}/.overview
> Mar 25 21:23:45 {LEAFNODE SERVER} fetchnews[10013]: wrote /var/spool/news/{GROUP 1}/.overview
> 
> =====[  END  ]==========[  END  ]==========[  END  ]=====
> 
> Notice how it plowed through all the groups after the first one without 
> any replies or anything being output to the console.

Yes. That was visible from former logs you showed, and it may hint me to
a synchronization problem.

> To summarize:
> 
> (1) fetchnews choked and died on article number (remote) 59883 / (local)
> 116171.
> 
> (2) There were around 2200 more articles to be fetched when this happened.
> 
> (3) The log shows fetchnews just kinda bombing through the rest of 
> the groups after that, but not actually doing anything.

Well, I'm not sure what it does, because the logging was severed during
the *rc* series after complaints about excessive logging. I must ask you
to retry this with 1.9.20 after it is released.

> (4) The console output doesn't even show that, it just shows that it all 
> ended (mostly) normally with remote article # 59883.
> 
> (5) The XOVER process seemed to choke on whatever article 116171 actually
> got saved as (file or directory), and appeared to produced two different
> error messages in complaint.
> 
> (6) That same article still exists in /var/spool/news/{YET ANOTHER GROUP}
> and is a zero length file.
> 
> (7) The article is not in /var/spool/news/message.id/*, at least not where
> it's supposed to be, IE filed away with the messaged-id as its file name..
> 
> (8) The zero length file in {YET ANOTHER GROUP} has a hard link count of
> 1.  
> 
> (9) The hard link count of 1 seems to indicate that the XOVER process
> deleted it from /var/spool/news/{GROUP 1}/ and from
> /var/spool/news/message.id/*/<{SOME MESSAGE-ID}>, but left it in
> /var/spool/news/{YET ANOTHER GROUP}.  
> 
> (10) Alternatively, it might have never even been stored in
> /var/spool/news/message.id/*/<{SOME MESSAGE-ID}>.

...or deleted from there by fetchnews itself after a write error.

-- 
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list