
Re: [leafnode-list] Perhaps slightly OT, but a leafnode



On Sun, 29 Dec 2002, Anders Jarnberg wrote:

> On 29 Dec 2002 15:14:57 +0100
> "clemens fischer" <ino-qc@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Anders Jarnberg <linuxdude@xxxxxxxxxxxx>:
> >> Env: SuSe 8.0, leafnode 2.0b8_ma9, T1 connection, /var is 72% used.
> >72% isn't very helpful.  you need a few megabytes free space in your
> >news-spool.
> 
> anders@newsbox:~> df -h /var
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/hdd5              23G   17G  6.6G  72% /var
> 
> >
> >> Update: I just tried going into 
> >> /var/spool/news/alt/binaries/multimedia/babylon5 and remove files by
> >> hand, but I can't even do an ls, since the system then hangs as above.
> >> I'm now thinking maybe it's not a leafnode problem, but can anyone
> >> give me any advice how to either find out what's wrong or give ideas
> >> for fixing it ? I'd rather not reinstall /var...
> >
> >can you do an ls(1) elsewhere?  what about inodes and swap (try df(1))?
> 
> Yes, I can do an ls in /var/spool/news/alt/binaries/multimedia but not
> below that.
> 
> anders@newsbox:~> df -i /var
> Filesystem            Inodes   IUsed   IFree IUse% Mounted on
> /dev/hdd5            4294967295       0 4294967295    0% /var
> 
> I'm not sure what that indicates though...
> 
> I also just did a "du" and got the following:
> 141     ./alt/tv/babylon-5
> 141     ./alt/tv
> du: `./alt/binaries/multimedia/babylon5/43411': Permission denied
> du: `./alt/binaries/multimedia/babylon5/43412': Permission denied
> and then the system is hung big-time.
> 
> I tried to remove the files indicated above (43411&43412) but then
> the system hangs again. Is there anything more I can do that might
> help solve the situation, or should I just wipe it all and start over ?

Well, FWIW I was experiencing similar problems a while back, and I too 
thought they were being caused by leafnode.  It turns out in my case that 
I had exactly ONE corrupt sector on my HD, but it happened to be sitting 
in a meta-data area (where the directories and similar meta-data are 
stored) so it was affecting a bunch of things quite bizarrely.  I too 
was getting to know that reset button quite well.

I wound up losing the entire news spool, but I was lucky in that the only
thing on that particular partition was /var/spool/news, so the news spool
was the only thing lost.  I tried for days to get a working backup, but in
the end it just wasn't worth the time it was taking.

Have you tried the basics, like taking the system to single-user mode, 
unmounting /var, and running fsck on it?  It may simply be some corruption 
in the meta-data on your HD that fsck will fix (maybe with some data loss, 
that being the lesser of two evils compared to regular system crashes).

Running badblocks (IN READ ONLY MODE!!!) on the affected partition might
reveal a physical problem with the drive.  It may return a false NEGATIVE,
since the read-only test isn't all that exhaustive, and might miss a
subtle error, but if you get any positive response, replace that drive,
and quickly.  I would run badblocks (READONLY!!!) BEFORE I ran fsck, 
because I've seen fsck REALLY hose a disk when there are corrupt sectors 
on the drive.  
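
To make that order concrete, here is a minimal sketch, not a tested recipe.
It assumes the device and mount point from the df output quoted above
(/dev/hdd5 on /var), that you are already root in single-user mode, and
that the filesystem is ext2 or ext3, so that fsck hands -n to e2fsck as
"answer no to every repair prompt".  The run and check_disk names are made
up for illustration.  By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: prints the check sequence unless RUN=1 is set.
# Device and mount point default to the values from the quoted df output.

run() {
    # Hypothetical helper: execute the command only when RUN=1.
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

check_disk() {
    device=${1:-/dev/hdd5}
    mountpoint=${2:-/var}

    # 1. Unmount so nothing writes to the partition during the checks.
    run umount "$mountpoint" || return 1

    # 2. Read-only surface scan.  badblocks only reads unless it is given
    #    -n (non-destructive read-write) or -w (destructive write test).
    run badblocks -sv "$device" || return 1

    # 3. Report-only filesystem check: -n makes no changes
    #    (e2fsck behaviour; assumes ext2/ext3).
    run fsck -n "$device"
}
```

Sourced into a shell, check_disk with no arguments prints the three steps
for /dev/hdd5; RUN=1 check_disk /dev/hdd5 /var would actually run them.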

If there are bad sectors, I would back up everything I can, then use fsck 
to try to repair some damage and see if anything else can be backed up.  
This tends to be rather time-consuming, so if you already HAVE a decent 
backup...

Are you getting any messages in your system logs about I/O problems on
your HD?  Here is one example from /var/log/kernel of the errors I was
getting...

Sep 16 07:14:07 shire kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Sep 16 07:14:07 shire kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=935119, sector=934992
Sep 16 07:14:07 shire kernel: end_request: I/O error, dev 16:01 (hdc), sector 934992
Sep 16 07:14:07 shire kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [78163 852469 0x0 SD
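
If you want to search your own logs for the same kind of thing, a rough
sketch follows.  The patterns are lifted from the excerpt above and are
only a starting point, since other drivers and kernel versions word their
errors differently.  The scan_io_errors name is made up, and the default
path is just where my kernel messages land; substitute whatever file your
syslog writes them to (often /var/log/messages).

```shell
#!/bin/sh
# Sketch: list kernel log lines that look like disk I/O trouble.
# Patterns come from the log excerpt above; extend them as needed.

scan_io_errors() {
    logfile=${1:-/var/log/kernel}
    grep -E 'dma_intr:|I/O error|UncorrectableError|i/o failure' "$logfile"
}
```

Call it as scan_io_errors /var/log/messages, for example.  No output means
none of these patterns matched, which does not by itself prove the drive
is healthy.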

BTW, are you using reiserfs?  (If you don't know, you aren't.)  In my 
experience it's easier to recover from HD problems with more traditional 
filesystems such as ext2 or ext3 than with reiserfs.

I don't know if this helps, but it might be someplace to start.  Good
luck!

Michael O'Quinn

-- 
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list