
Re: [leafnode-list] fetchnews in parallel?



On Wed, 10 Jul 2002, Witold Wladyslaw Wojciech Wilk wrote:

> On Wed, 10 Jul 2002 11:39:00 -0500, "Bulgrien, Kevin"
> <Kevin.Bulgrien@xxxxxxxxxxxxxxxxxx> wrote:
> 
> > > > Thrashing contention is considerable.
> > > 
> > > I don't get the meaning here :( Do you mean data corruption? Or loss of
> > > articles?
> > 
> > Thrashing can mean this:
> > 
> > Hard drive heads can only be in one place at a time...  [cut]
> 
> Oh yeah, random seeks and writes. OK, now I get it :)
> 
> > Of course for this to be truly a problem, the incoming data
> > would have to be coming in faster than the drive can handle.
> 
> Yes, with a slow ATA drive a 4-pipe write WOULD be a problem. But for a
> 2-disk SCSI RAID it would be child's play; they probably wouldn't see any
> difference. But as I've said, this option could be defined in the config,
> or even be a ./configure --with-simultanous-write-pipes option or something.
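
Witold's idea of making the number of simultaneous streams a configuration
setting can be sketched with a bounded worker pool. This is a minimal Python
illustration under stated assumptions, not leafnode code: fetch_group,
fetch_all and the max_streams setting are hypothetical names, and the actual
NNTP download is replaced by a short sleep.

```python
"""Sketch: cap simultaneous fetch streams with a (hypothetical) config value."""
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


def fetch_group(group: str) -> tuple[str, int]:
    """Hypothetical stand-in for downloading one newsgroup."""
    time.sleep(0.01)  # simulate waiting on the network; no real NNTP traffic
    return group, len(group)  # dummy "article count" so results are checkable


def fetch_all(groups: list[str], max_streams: int) -> dict[str, int]:
    """Fetch every group using at most max_streams concurrent workers."""
    if max_streams < 1:
        raise ValueError("max_streams must be at least 1")
    with ThreadPoolExecutor(max_workers=max_streams) as pool:
        # pool.map keeps input order; the pool never runs more than
        # max_streams fetches at the same time
        return dict(pool.map(fetch_group, groups))


if __name__ == "__main__":
    groups = ["comp.os.linux.misc", "news.software.nntp", "alt.test"]
    print(fetch_all(groups, max_streams=2))
```

Setting max_streams to 1 reproduces a strictly sequential fetch; raising it
only helps while the incoming link still has spare capacity.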


Um, what does the number of pipes have to do with it?  I think the
limiting factor here is most likely going to be the bandwidth of the
incoming connection.  No amount of parallelism is going to allow you to
exceed the bandwidth of your Internet connection, no matter what.

For example, my DSL connection is rated at 768 kbps incoming, and in Real
Life it maxes out at about 680 kbps (TCP overhead and such eat up a
percentage).  I've established this maximum speed with several speed
benchmarking services, and via the firewall's real-time bandwidth meter.
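
The DSL figures above work out as follows. This short Python sketch only does
the arithmetic on the two numbers in the post (768 kbps rated, about 680 kbps
measured); the helper names are illustrative, and it assumes decimal prefixes
(1 kbit = 1000 bits, 1 KB = 1000 bytes, 1 MB = 1000 KB).

```python
"""Worked arithmetic for the DSL figures: 768 kbps rated, ~680 kbps measured."""

RATED_KBPS = 768     # advertised downstream rate, kilobits per second
MEASURED_KBPS = 680  # observed maximum, kilobits per second


def efficiency(measured_kbps: float, rated_kbps: float) -> float:
    """Fraction of the rated line speed actually achieved."""
    return measured_kbps / rated_kbps


def kbps_to_kbytes_per_sec(kbps: float) -> float:
    """Convert kilobits/s to kilobytes/s (8 bits per byte, same decimal prefix)."""
    return kbps / 8


def megabytes_per_hour(kbps: float) -> float:
    """Volume written to disk per hour if the line stays saturated."""
    return kbps_to_kbytes_per_sec(kbps) * 3600 / 1000


if __name__ == "__main__":
    print(f"efficiency: {efficiency(MEASURED_KBPS, RATED_KBPS):.1%}")
    print(f"write rate: {kbps_to_kbytes_per_sec(MEASURED_KBPS):.0f} KB/s")
    print(f"per hour:   {megabytes_per_hour(MEASURED_KBPS):.0f} MB")
```

Running it prints an efficiency of about 88.5%, a write rate of 85 KB/s and
roughly 306 MB per hour of continuous fetching. That write rate is small next
to what even an old ATA drive can absorb, which fits the observation that the
disk barely flickers.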

When fetchnews is fetching over that line, it typically runs about 200-400
kbps.  I have another program on another Linux box that DOES do parallel
news fetches (using NewsPlex), and it always maxes out the line at
680 kbps.
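
One way to read the 200-400 kbps versus 680 kbps comparison is a simple
capacity model: each stream is held back by something other than the line
(for example, waiting on the server between articles, which is an assumption,
not something measured here), so streams add up until the link itself is
full. The Python sketch below encodes that model; LINK_KBPS comes from the
post, the functions are hypothetical helpers, and it ignores server-side
limits and how TCP shares a link between connections.

```python
"""Idealized model: parallel streams add up until the link is saturated."""
import math

LINK_KBPS = 680  # measured line maximum from the post


def aggregate_kbps(streams: int, per_stream_kbps: float,
                   link_kbps: float = LINK_KBPS) -> float:
    """Combined rate of n streams, capped by the link capacity."""
    return min(link_kbps, streams * per_stream_kbps)


def streams_to_saturate(per_stream_kbps: float,
                        link_kbps: float = LINK_KBPS) -> int:
    """Smallest stream count whose combined rate reaches the link limit."""
    return math.ceil(link_kbps / per_stream_kbps)


if __name__ == "__main__":
    for per_stream in (200, 300, 400):  # the single-stream range reported above
        n = streams_to_saturate(per_stream)
        print(per_stream, "kbps/stream ->", n, "streams to fill the line;",
              "4 streams give", aggregate_kbps(4, per_stream), "kbps")
```

With 200, 300 or 400 kbps per stream, the model needs 4, 3 or 2 streams
respectively to fill a 680 kbps line, consistent with a 4-stream fetcher
always hitting the ceiling. Past saturation, extra streams add nothing.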

In neither case does disk bandwidth even remotely become a problem.  The
machine doing the parallel fetches is a P-133 with something like 32 or 64 MB
of memory and an old 6 GB ATA drive.  Even if the drive is capable of ATA-33,
the motherboard isn't, and the drive is only 5400 RPM.

Even when my Internet bandwidth is totally maxed out, the disk on either
machine barely flickers.

Now when I am retrieving this cached news across my internal 10 Base TX
network, server limitations do become an issue.  But surprisingly, it's
still not disk bandwidth; it's the server's CPU.

I did have a problem with thrashing on the NewsPlex box.  It wasn't a disk
problem, though; any drive(s) would have done the same.  It turned out
that I didn't have enough memory (NewsPlex is a REAL memory pig), and when
fetching with 4 streams it wound up swapping so much it 100% thrashed the
swap partition on the disk.  When I threw some memory at the problem, it
went away.  (Mostly; it still crops up on very long runs.  I suspect it's
a memory leak in NewsPlex.)

So, IMHO, disk bandwidth is kind of a moot point, given that leafnode's
target audience is most likely NOT going to have multi-megabit Internet
bandwidth.


Michael O'Quinn


-- 
leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list