<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: A Million Little Files</title>
	<atom:link href="http://stuartsierra.com/2008/04/24/a-million-little-files/feed" rel="self" type="application/rss+xml" />
	<link>http://stuartsierra.com/2008/04/24/a-million-little-files</link>
	<description>From programming to everything else</description>
	<lastBuildDate>Sat, 04 Feb 2012 20:39:31 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: net_ma</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-44385</link>
		<dc:creator>net_ma</dc:creator>
		<pubDate>Wed, 18 Jan 2012 15:32:43 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-44385</guid>
		<description>Hi Stuart,

I downloaded your tool. But when I tried to convert a tar file, I got the following error.

The tar file I tried to convert is about 4.5GB in size and contains about 200 files.

Could you tell me what I should do?

Thank you.

java -jar tar-to-seq.jar /home/hduser/sample-archive/2011/a-250.tar a-250.seq
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).
log4j:WARN Please initialize the log4j system properly.
Exception in thread &quot;main&quot; java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2786)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:78)
	at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1224)
	at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1247)
	at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1297)
	at org.altlaw.hadoop.TarToSeqFile.execute(TarToSeqFile.java:95)
	at org.altlaw.hadoop.TarToSeqFile.main(TarToSeqFile.java:165)</description>
		<content:encoded><![CDATA[<p>Hi Stuart,</p>
<p>I downloaded your tool. But when I tried to convert a tar file, I got the following error.</p>
<p>The tar file I tried to convert is about 4.5GB in size and contains about 200 files.</p>
<p>Could you tell me what I should do?</p>
<p>Thank you.</p>
<p>java -jar tar-to-seq.jar /home/hduser/sample-archive/2011/a-250.tar a-250.seq<br />
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).<br />
log4j:WARN Please initialize the log4j system properly.<br />
Exception in thread &#8220;main&#8221; java.lang.OutOfMemoryError: Java heap space<br />
	at java.util.Arrays.copyOf(Arrays.java:2786)<br />
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)<br />
	at java.io.DataOutputStream.write(DataOutputStream.java:90)<br />
	at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:78)<br />
	at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)<br />
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)<br />
	at java.io.DataOutputStream.write(DataOutputStream.java:90)<br />
	at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1224)<br />
	at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1247)<br />
	at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1297)<br />
	at org.altlaw.hadoop.TarToSeqFile.execute(TarToSeqFile.java:95)<br />
	at org.altlaw.hadoop.TarToSeqFile.main(TarToSeqFile.java:165)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stuart</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-44235</link>
		<dc:creator>Stuart</dc:creator>
		<pubDate>Fri, 05 Aug 2011 21:28:44 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-44235</guid>
		<description>Chris: the source code is included in the .tar.gz download. Apache license.</description>
		<content:encoded><![CDATA[<p>Chris: the source code is included in the .tar.gz download. Apache license.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-44233</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Fri, 05 Aug 2011 19:39:06 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-44233</guid>
		<description>Would you be willing to share the source code?  It would be very helpful if I could rewrite it to have your Seq file as Key: Filename, Value: File Text instead of a BytesWritable for the Value.</description>
		<content:encoded><![CDATA[<p>Would you be willing to share the source code?  It would be very helpful if I could rewrite it to have your Seq file as Key: Filename, Value: File Text instead of a BytesWritable for the Value.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stuart</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-44160</link>
		<dc:creator>Stuart</dc:creator>
		<pubDate>Wed, 09 Mar 2011 22:25:29 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-44160</guid>
		<description>Sorry, I don&#039;t know anything else about Hadoop Streaming.</description>
		<content:encoded><![CDATA[<p>Sorry, I don&#8217;t know anything else about Hadoop Streaming.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bowen</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-44159</link>
		<dc:creator>Bowen</dc:creator>
		<pubDate>Wed, 09 Mar 2011 21:41:08 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-44159</guid>
		<description>Thanks for the prompt reply.

So is there any way for Hadoop Streaming to take binary files as input? Currently, I have to first download those files from HDFS to the local machine and then process them. It&#039;s super slow...

Thanks,
Bowen</description>
		<content:encoded><![CDATA[<p>Thanks for the prompt reply.</p>
<p>So is there any way for Hadoop Streaming to take binary files as input? Currently, I have to first download those files from HDFS to the local machine and then process them. It&#8217;s super slow&#8230;</p>
<p>Thanks,<br />
Bowen</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stuart</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-44158</link>
		<dc:creator>Stuart</dc:creator>
		<pubDate>Wed, 09 Mar 2011 13:52:38 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-44158</guid>
		<description>The file contents are stored as a normal Hadoop BytesWritable object.  You would access it as you would any other Hadoop Writable datatype.  But if I recall correctly, Hadoop Streaming only supports text.</description>
		<content:encoded><![CDATA[<p>The file contents are stored as a normal Hadoop BytesWritable object.  You would access it as you would any other Hadoop Writable datatype.  But if I recall correctly, Hadoop Streaming only supports text.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bowen</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-44157</link>
		<dc:creator>Bowen</dc:creator>
		<pubDate>Wed, 09 Mar 2011 05:26:01 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-44157</guid>
		<description>Stuart,

I&#039;m using Hadoop Streaming (C code) to process binary input files. After using your code to get the sequence file, how do I read the data in my C code? Should I read it byte by byte?

Thanks,
Bowen</description>
		<content:encoded><![CDATA[<p>Stuart,</p>
<p>I&#8217;m using Hadoop Streaming (C code) to process binary input files. After using your code to get the sequence file, how do I read the data in my C code? Should I read it byte by byte?</p>
<p>Thanks,<br />
Bowen</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hadoop binary files processing entroduced by image duplicates finder &#171; eldad levy&#039;s playground</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-44116</link>
		<dc:creator>Hadoop binary files processing entroduced by image duplicates finder &#171; eldad levy&#039;s playground</dc:creator>
		<pubDate>Sat, 05 Feb 2011 09:26:08 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-44116</guid>
		<description>[...] all the images as a tar file and using the tool written by Stuart Sierra to convert it to a sequence [...]</description>
		<content:encoded><![CDATA[<p>[...] all the images as a tar file and using the tool written by Stuart Sierra to convert it to a sequence [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: wenxiu</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-43211</link>
		<dc:creator>wenxiu</dc:creator>
		<pubDate>Thu, 02 Sep 2010 08:06:42 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-43211</guid>
		<description>Oh, sorry, ignore my last post; it is just a warning. It works! Thanks~
BTW, do you happen to have a tool that converts multiple files from a directory into a single sequence file directly? I mean, to avoid the cost of creating and then unpacking the tar archive.</description>
		<content:encoded><![CDATA[<p>Oh, sorry, ignore my last post; it is just a warning. It works! Thanks~<br />
BTW, do you happen to have a tool that converts multiple files from a directory into a single sequence file directly? I mean, to avoid the cost of creating and then unpacking the tar archive.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: wenxiu</title>
		<link>http://stuartsierra.com/2008/04/24/a-million-little-files/comment-page-1#comment-43210</link>
		<dc:creator>wenxiu</dc:creator>
		<pubDate>Thu, 02 Sep 2010 07:48:21 +0000</pubDate>
		<guid isPermaLink="false">http://stuartsierra.com/?p=151#comment-43210</guid>
		<description>Sorry, how do I make this work? I got an error:

prod2@ot-9h30d06:~/QSense/hadoop/hadoop-0.20.2/tar-to-seq$ java -jar tar-to-seq.jar ../change.tar.gz seq_file
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).
log4j:WARN Please initialize the log4j system properly.</description>
		<content:encoded><![CDATA[<p>Sorry, how do I make this work? I got an error:</p>
<p>prod2@ot-9h30d06:~/QSense/hadoop/hadoop-0.20.2/tar-to-seq$ java -jar tar-to-seq.jar ../change.tar.gz seq_file<br />
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).<br />
log4j:WARN Please initialize the log4j system properly.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

