
Re: [leafnode-list] fetchnews in parallel?

On Tue, 9 Jul 2002 13:31:21 +0200, Matthias Andree
<matthias.andree@xxxxxxxxxxxxxxxxxxxx> wrote:

> > The problem would probably be those file formats - more memory,
> > wouldn't it?
> Not necessarily. An on-disk data base with fine grained locking should
> be sufficient and actually change memory usage: less data in memory, but
> more program in memory (the data base software ;-)

It just depends on how many servers fetchnews is fetching from
simultaneously ;)

> > (just popped into my mind)
> > also - prebuffering in leafnode - if data comes in faster than we can
> > write it - leafnode does prebuffering inside of itself? if it does the
> Not really. Reading and writing are lock-step operations as of now, and
> especially on safe filesystems (that excludes ext2fs on Linux), it can
> be somewhat slow.

Yes, it would be, BUT if it did happen, then it could work as a
"burst" buffer. It would work on the assumption that you cannot
further slow down a slow disk ;)
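
A rough sketch of what I mean by a "burst" buffer, in Python rather than
leafnode's C, just to show the idea. Nothing here is leafnode code:
`burst_copy`, `max_chunks` and the two stand-ins (`chunks` for the
network side, `out` for the spool file) are names made up for the
example. A writer thread drains a bounded queue to disk while the reader
keeps filling it, so the reader only stalls once the buffer is full.

```python
import queue
import threading

def burst_copy(chunks, out, max_chunks=4):
    """Copy an iterable of byte chunks to `out` through a bounded buffer.

    `chunks` stands in for the network side, `out` for the spool file.
    Hypothetical helper, not part of leafnode.
    """
    buf = queue.Queue(maxsize=max_chunks)  # the "burst" buffer
    done = object()                        # sentinel: reader is finished
    errors = []

    def writer():
        failed = False
        while True:
            item = buf.get()
            if item is done:
                return
            if failed:
                continue  # keep draining so the reader never blocks forever
            try:
                out.write(item)
            except OSError as exc:
                errors.append(exc)
                failed = True

    t = threading.Thread(target=writer)
    t.start()
    total = 0
    for chunk in chunks:
        buf.put(chunk)  # blocks only when the buffer is full
        total += len(chunk)
    buf.put(done)
    t.join()
    if errors:
        raise errors[0]  # report the first write error to the caller
    return total         # number of bytes handed to the writer
```

Note that a write error only surfaces after the whole input has been
consumed, which is exactly the error-propagation problem with pipelined
writes.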

> > buffer must be 8kB... so if it would be set higher or there could be a
> > config option - this one could help. Right now I'm having a maxtor
> > d540x
> There's also the kernel TCP receive buffer, one might think that this
> also helps.

Probably yes, but is this one buffer for all of
input/output/forward, or one per connection... hmmm... I'll look into
that in linux-2.4.18/Documentation or somewhere else tonight.
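
As far as I know, the receive buffer size is a per-socket option
(SO_RCVBUF), so it can at least be inspected per connection. A small
Python sketch of that; the helper names `rcvbuf_size` and
`request_rcvbuf` are made up, and the kernel is free to clamp or adjust
whatever size is requested, so treat the returned number as "what the
kernel reports", nothing more.

```python
import socket

def rcvbuf_size(sock):
    """Return the receive buffer size the kernel reports for this socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

def request_rcvbuf(sock, size):
    """Ask for a receive buffer of `size` bytes and return what we got.

    The kernel may clamp or adjust the value, so the result can differ
    from `size`.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return rcvbuf_size(sock)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("default:", rcvbuf_size(s))
    print("after request:", request_rcvbuf(s, 64 * 1024))
    s.close()
```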

> OTOH, I'm running leafnode on a rather fast Fujitsu MAH3182MP (U160 SCSI
> drive) that supports tagged command queueing, unlike ATA drives (NOTE:
> some BSDs support ATA tagged command queueing, but Maxtor ATA drives do
> not).

Yeah, something like that would help some. But on the other hand - how
many rpm does your Fujitsu have? 10k rpm? Just that helps a lot, not
counting the pros of the SCSI interface over IDE.

Secondly, not everybody builds a small LAN/home server on four RAIDed
15k rpm Cheetahs :)

> > the only problem that I see right now is "what happens if they cut the
> > power...". 
> When you pipeline writes, how do you propagate errors back to know where
> to pick up?

Hmmmm... again I'm going into theory mode ;) Let's lay out some steps:

1) fetchnews connects to every news.server.com
2) fetchnews downloads the article headers for every group that is
marked as interesting
3) it disconnects from each server after finishing that task
4) after the last header download it gets invoked again. This time it
has marked articles to fetch. How about making a buffer divided into
pipes, let's say, ahh... 4 of them.
5) fetchnews writes into a file what is going to be downloaded from
where and in what order (the ordering is something that will have to be
discussed too, but maybe some other time, when I finally get time to do
more than just reply to mail ;) )
6) fetchnews starts downloading, buffering into the 4 pipes with
simultaneous disk access (the number of pipes could be set via the
config, just like the size of every pipe)
7) while downloading, fetchnews doesn't even have to record what it has
already written; more about that later. Now:

8a) we complete without error. fetchnews checks that every article from
the list is downloaded and then deletes that list. We're safe, aren't we?

8b) we lose power. The system reboots. fetchnews starts and sees that
the file with the list IS in the directory. It then checks which
articles are present and downloaded. When it sees the place where the
download ended, it discards the last downloaded article and downloads
it again, just in case, and then downloads the rest. Go back to 8a. Or
8b ;)
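
To make 8a/8b concrete, here is a minimal Python sketch of the recovery
logic. It is only an illustration under my own assumptions: the list
file is one article file name per line in download order, each article
is stored as one file in the spool directory, and `plan_resume`, `run`
and the `fetch` callback are hypothetical names, not anything that
exists in fetchnews.

```python
import os

def plan_resume(wanted, spool_dir):
    """Return the article files that still need fetching after a restart.

    `wanted` is the ordered list of file names taken from the list file.
    The last article found on disk is treated as possibly truncated,
    so it is deleted and scheduled again.
    """
    present = [name for name in wanted
               if os.path.exists(os.path.join(spool_dir, name))]
    if present:
        suspect = present[-1]
        os.remove(os.path.join(spool_dir, suspect))
        present.remove(suspect)
    have = set(present)
    return [name for name in wanted if name not in have]

def run(list_path, spool_dir, fetch):
    """Steps 8a/8b: resume from the list file if it exists, then delete it."""
    if not os.path.exists(list_path):
        return []  # nothing pending
    with open(list_path) as f:
        wanted = [line.strip() for line in f if line.strip()]
    todo = plan_resume(wanted, spool_dir)
    for name in todo:
        with open(os.path.join(spool_dir, name), "wb") as out:
            out.write(fetch(name))  # fetch() stands in for the NNTP download
    # Only when every listed article is on disk is the list removed (8a).
    if all(os.path.exists(os.path.join(spool_dir, n)) for n in wanted):
        os.remove(list_path)
    return todo
```

With 4 pipes running at once, up to 4 articles could be half-written
when the power goes, so a real version would have to discard the last
article of every pipe, not just the very last one. It also silently
assumes that whatever is on disk before the last article really reached
the platters, which is the "safe filesystem" question again.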


It's just the thing that popped into my mind; probably there's a better
way to do it. Now to sleep ;)

|GIT d- s+:- a--- C++ UL++++ P+ L+++ E- W N++ o? K? w-- !O !M !V|
|_PS+ PE+++ Y+ PGP !t !5 !X R+ !tv b++++ !DI D+++ G e- h! r- y++|

leafnode-list@xxxxxxxxxxxxxxxxxxxxxxxxxxxx -- mailing list for leafnode
To unsubscribe, send mail with "unsubscribe" in the subject to the list