Discussion thread in a LinkedIn group:
Benchmarking and Stress Testing an Hadoop Cluster With TeraSort, TestDFSIO & Co.
Project Gutenberg (approximately 30,000 books)
Wikipedia (full download)
Public datasets available through Amazon Web Services, such as the Human Genome Project and the US Census database