<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: A Million Little Files</title>
	<atom:link href="http://stuartsierra.com/2008/04/24/a-million-little-files/feed" rel="self" type="application/rss+xml" />
	<link>http://stuartsierra.com/2008/04/24/a-million-little-files</link>
	<description>From programming to everything else</description>
	<lastBuildDate>Thu, 02 Sep 2010 08:06:42 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: wenxiu</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-43211</link>
		<dc:creator>wenxiu</dc:creator>
		<pubDate>Thu, 02 Sep 2010 08:06:42 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-43211</guid>
		<description>Oh, sorry, ignore my last post; it is just a warning. It works! Thanks~
BTW, do you happen to have a tool that converts multiple files in a directory directly into a single sequence file? I mean, to skip the separate tar compression and decompression step.</description>
		<content:encoded><![CDATA[<p>Oh, sorry, ignore my last post; it is just a warning. It works! Thanks~<br />
BTW, do you happen to have a tool that converts multiple files in a directory directly into a single sequence file? I mean, to skip the separate tar compression and decompression step.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: wenxiu</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-43210</link>
		<dc:creator>wenxiu</dc:creator>
		<pubDate>Thu, 02 Sep 2010 07:48:21 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-43210</guid>
		<description>Sorry, how do I make it work? I got an error:

prod2@ot-9h30d06:~/QSense/hadoop/hadoop-0.20.2/tar-to-seq$ java -jar tar-to-seq.jar ../change.tar.gz seq_file
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).
log4j:WARN Please initialize the log4j system properly.</description>
		<content:encoded><![CDATA[<p>Sorry, how do I make it work? I got an error:</p>
<p>prod2@ot-9h30d06:~/QSense/hadoop/hadoop-0.20.2/tar-to-seq$ java -jar tar-to-seq.jar ../change.tar.gz seq_file<br />
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).<br />
log4j:WARN Please initialize the log4j system properly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jan</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-43107</link>
		<dc:creator>Jan</dc:creator>
		<pubDate>Fri, 04 Jun 2010 15:19:51 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-43107</guid>
		<description>Hi,

I tried to create a seq file containing text files using your tool, but unfortunately, when I later open it in Hadoop, I get different content:
...6b 20 74 68 65 20 61 72 65 61 73 20 77 68 65 72 65 20 77 65 20 65 73 74 69 6d 61 74 65 64 20 74 68 65 20 74 72 61 69 6c 20 75 70 70 65 72 2d 6c 69 6d 69 74 20 66 72 6f 6d 20 74 68 65 20 73 74 61 6e 64 61 72 64 20 64 65 76 69 61 74 69 6f 6e 20 6f 66 20 61 6e 20 31 31 20 c3 97 20 31 31 20 70 69 78 65 6c 20 61 70 65 72 74 75 7...

What could be the reason?

Thank you,
Jan</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I tried to create a seq file containing text files using your tool, but unfortunately, when I later open it in Hadoop, I get different content:<br />
&#8230;6b 20 74 68 65 20 61 72 65 61 73 20 77 68 65 72 65 20 77 65 20 65 73 74 69 6d 61 74 65 64 20 74 68 65 20 74 72 61 69 6c 20 75 70 70 65 72 2d 6c 69 6d 69 74 20 66 72 6f 6d 20 74 68 65 20 73 74 61 6e 64 61 72 64 20 64 65 76 69 61 74 69 6f 6e 20 6f 66 20 61 6e 20 31 31 20 c3 97 20 31 31 20 70 69 78 65 6c 20 61 70 65 72 74 75 7&#8230;</p>
<p>What could be the reason?</p>
<p>Thank you,<br />
Jan</p>
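<p>The dump above is most likely the original text, only displayed as hexadecimal bytes: "6b 20 74 68 65" decodes to "k the", and "c3 97" is the UTF-8 encoding of the multiplication sign "×". One possible cause (an assumption, not confirmed in this thread) is that the file contents are stored as raw bytes, for example in Hadoop's BytesWritable, whose toString() may print them as hex. The class HexDumpDecoder below is a hypothetical, plain-Java sketch (no Hadoop required) for turning such a dump back into text to check that nothing was lost.</p>

```java
import java.nio.charset.StandardCharsets;

public class HexDumpDecoder {
    // Converts a space-separated hex dump such as "6b 20 74" into raw bytes.
    static byte[] parseHex(String dump) {
        String[] tokens = dump.trim().split("\\s+");
        byte[] bytes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Parse each two-digit token as base 16; values above 127 wrap to negative bytes.
            bytes[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        return bytes;
    }

    // Interprets the recovered bytes as UTF-8 text.
    static String decodeUtf8(String dump) {
        return new String(parseHex(dump), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A fragment of the dump quoted above; prints "11 × 11 pixel".
        System.out.println(decodeUtf8("31 31 20 c3 97 20 31 31 20 70 69 78 65 6c"));
    }
}
```

<p>If the decoded text matches the original file, the data survived intact and only the way it is displayed differs.</p>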
]]></content:encoded>
	</item>
	<item>
		<title>By: The Case for Babar: A Tool for Creating Hadoop Sequence Files &#171; Ryan Balfanz</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-43072</link>
		<dc:creator>The Case for Babar: A Tool for Creating Hadoop Sequence Files &#171; Ryan Balfanz</dc:creator>
		<pubDate>Mon, 22 Feb 2010 08:38:15 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-43072</guid>
		<description>[...] are they large, but we have a lot of them. We are using Hadoop after all. Stuart Sierra&#8217;s Tar-to-Seq utility was working quite nicely until this new input set, as those files were much [...]</description>
		<content:encoded><![CDATA[<p>[...] are they large, but we have a lot of them. We are using Hadoop after all. Stuart Sierra&#8217;s Tar-to-Seq utility was working quite nicely until this new input set, as those files were much [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ryan</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-42900</link>
		<dc:creator>Ryan</dc:creator>
		<pubDate>Tue, 10 Nov 2009 10:52:25 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-42900</guid>
		<description>Stuart,

Thanks for posting this. It will help me a lot! It would be nice to see some options to set SequenceFile.CompressionType, or perhaps guess based on the incoming extension like you do in openInputFile().

Thanks,
Ryan</description>
		<content:encoded><![CDATA[<p>Stuart,</p>
<p>Thanks for posting this. It will help me a lot! It would be nice to see some options to set SequenceFile.CompressionType, or perhaps guess based on the incoming extension like you do in openInputFile().</p>
<p>Thanks,<br />
Ryan</p>
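<p>The extension-based guess suggested above could look roughly like the hypothetical helper below. It is only a sketch: guessFromInputName and the BLOCK-if-compressed policy are assumptions, not part of the posted tool, and the local CompressionType enum merely mirrors the constant names of Hadoop's SequenceFile.CompressionType (NONE, RECORD, BLOCK) so that the example compiles without Hadoop.</p>

```java
import java.util.Locale;

public class CompressionGuess {
    // Local stand-in mirroring the constant names of Hadoop's
    // SequenceFile.CompressionType, so this sketch compiles without Hadoop.
    enum CompressionType { NONE, RECORD, BLOCK }

    // Hypothetical policy: a compressed input tar suggests the user wants a
    // compressed output too, so choose BLOCK; otherwise write uncompressed.
    static CompressionType guessFromInputName(String inputName) {
        String name = inputName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".gz") || name.endsWith(".tgz")
                || name.endsWith(".bz2") || name.endsWith(".tbz2")) {
            return CompressionType.BLOCK;
        }
        return CompressionType.NONE;
    }

    public static void main(String[] args) {
        System.out.println(guessFromInputName("change.tar.gz")); // prints BLOCK
        System.out.println(guessFromInputName("archive.tar"));   // prints NONE
    }
}
```

<p>With Hadoop on the classpath, the local enum would be replaced by SequenceFile.CompressionType and the result passed to a SequenceFile.createWriter overload that accepts a compression type.</p>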
]]></content:encoded>
	</item>
	<item>
		<title>By: Stuart</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-42136</link>
		<dc:creator>Stuart</dc:creator>
		<pubDate>Tue, 10 Feb 2009 16:11:48 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-42136</guid>
		<description>Mark,

I&#039;m afraid I don&#039;t know.  From my understanding of HDFS, it depends on a lot of factors -- the size of the files, the bandwidth to and within the cluster, and the hardware itself.  Try the Hadoop mailing list.  One thing I do know is that copying lots of small files to HDFS will be slower than copying a few big files.

-Stuart</description>
		<content:encoded><![CDATA[<p>Mark,</p>
<p>I&#8217;m afraid I don&#8217;t know.  From my understanding of HDFS, it depends on a lot of factors &#8212; the size of the files, the bandwidth to and within the cluster, and the hardware itself.  Try the Hadoop mailing list.  One thing I do know is that copying lots of small files to HDFS will be slower than copying a few big files.</p>
<p>-Stuart</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Kerzner</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-42135</link>
		<dc:creator>Mark Kerzner</dc:creator>
		<pubDate>Tue, 10 Feb 2009 02:46:09 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-42135</guid>
		<description>Stuart,

I have written code to copy files from the PC file system to HDFS, and I also noticed that it is very slow. Instead of 40M/sec, as promised by Tom White&#039;s book, it seems to be 40 sec/Meg. Your tars would work about 5 times faster. But still, why is it so slow? Is there a way to speed this up?

Thanks!</description>
		<content:encoded><![CDATA[<p>Stuart,</p>
<p>I have written code to copy files from the PC file system to HDFS, and I also noticed that it is very slow. Instead of 40M/sec, as promised by Tom White&#8217;s book, it seems to be 40 sec/Meg. Your tars would work about 5 times faster. But still, why is it so slow? Is there a way to speed this up?</p>
<p>Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Cloudera Hadoop &#38; Big Data Blog &#187; Blog Archive &#187; The Small Files Problem</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-42125</link>
		<dc:creator>Cloudera Hadoop &#38; Big Data Blog &#187; Blog Archive &#187; The Small Files Problem</dc:creator>
		<pubDate>Mon, 02 Feb 2009 16:11:12 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-42125</guid>
		<description>[...] to create a collection of SequenceFiles in parallel. (Stuart Sierra has written a very useful post about converting a tar file into a SequenceFile &#8212; tools like this are very useful, and it [...]</description>
		<content:encoded><![CDATA[<p>[...] to create a collection of SequenceFiles in parallel. (Stuart Sierra has written a very useful post about converting a tar file into a SequenceFile &#8212; tools like this are very useful, and it [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stuart</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-42120</link>
		<dc:creator>Stuart</dc:creator>
		<pubDate>Thu, 29 Jan 2009 15:23:41 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-42120</guid>
		<description>You don&#039;t need to do anything special.  The code I posted here produces Hadoop &lt;a href=&quot;http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html&quot; rel=&quot;nofollow&quot;&gt;SequenceFile&lt;/a&gt;s.  You can use the built-in Hadoop class SequenceFile.Reader to read them.  Normally, all you need to do is:

&lt;pre&gt;
yourJobConf.setInputFormat(SequenceFileInputFormat.class);
&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>You don&#8217;t need to do anything special.  The code I posted here produces Hadoop <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html" rel="nofollow">SequenceFile</a>s.  You can use the built-in Hadoop class SequenceFile.Reader to read them.  Normally, all you need to do is:</p>
<pre>
yourJobConf.setInputFormat(SequenceFileInputFormat.class);
</pre>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rasit</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-42119</link>
		<dc:creator>Rasit</dc:creator>
		<pubDate>Thu, 29 Jan 2009 08:38:41 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-42119</guid>
		<description>Stuart, I mean: which InputReader should I use (the InputReader that sends key-value pairs to the Mapper class)?
Does Hadoop offer one, or should I extend one of the existing ones?</description>
		<content:encoded><![CDATA[<p>Stuart, I mean: which InputReader should I use (the InputReader that sends key-value pairs to the Mapper class)?<br />
Does Hadoop offer one, or should I extend one of the existing ones?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
