Daily Archives: April 24, 2008

A Million Little Files

My PC-oriented brain says it’s easier to work with a million small files than with one gigantic file. Hadoop says the opposite: big files are stored contiguously on disk, so they can be read and written efficiently. UNIX tar files work on …
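The post is cut off before it explains the tar approach. As a minimal sketch of the general idea, assuming Python's standard `tarfile` module (not necessarily what the author goes on to describe), here is how many small files can be packed into one large archive and then read back in a single sequential pass. The helper names `pack_small_files` and `read_all` are hypothetical, chosen for illustration.

```python
import io
import tarfile


def pack_small_files(files: dict[str, bytes]) -> bytes:
    """Concatenate many small in-memory files into one tar archive (one big blob)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)  # per-member header: name, size, etc.
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))  # data follows its header in the stream
    return buf.getvalue()


def read_all(archive: bytes) -> dict[str, bytes]:
    """Stream through the archive once, front to back, collecting each member."""
    out = {}
    # "r|" opens the archive as a non-seekable stream: members are visited in order.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|") as tar:
        for member in tar:
            if member.isfile():
                f = tar.extractfile(member)
                out[member.name] = f.read()  # must read before advancing to the next member
    return out


# Hypothetical usage: 1,000 tiny "files" become one archive, read back in one pass.
files = {f"part-{i:06d}.txt": f"record {i}\n".encode() for i in range(1000)}
blob = pack_small_files(files)
assert read_all(blob) == files
```

The trade-off is that tar stores each member behind a 512-byte header and pads its data to 512-byte boundaries, so very small files carry noticeable overhead. In exchange, the whole collection becomes one file that can be read front to back without opening a million separate files.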

Posted in Programming | 20 Comments