http://www.philwhln.com/how-to-get-experience-working-with-large-datasets
www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-public -- very good collection of links
http://www.kdnuggets.com/datasets/
http://www.researchpipeline.com/mediawiki/index.php?title=Main_Page
http://www.google.com/publicdata/directory
https://delicious.com/pskomoroch/dataset
https://delicious.com/judell/publicdata?networkaddconfirm=judell
Amazon provides the following data sets: Ensembl annotated genome data, US Census data, UniGene, and a Freebase dump.
Data transfer is 'free' within the Amazon ecosystem (within the same zone).
InfoChimps has a data marketplace with a wide variety of data sets.
CKAN (ckan.org) is an open source data portal platform.
Data sets are available on datahub.io, which runs on CKAN.
http://snap.stanford.edu/data/index.html
Crowdsourced flight data: http://openflights.org/
http://stat-computing.org/dataexpo/2009/the-data.html
OpenStreetMap is a free worldwide map created by its users. The geo and map data is available for download.
openstreetmap.org
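As a rough illustration of working with a downloaded OpenStreetMap extract, the sketch below streams the nodes out of an OSM XML file using only the Python standard library. The helper name iter_nodes and the inline sample data are made up for this example; a real extract is far larger and is usually distributed compressed, so it would need to be decompressed (or opened through a decompressing file object) first.

```python
import io
import xml.etree.ElementTree as ET


def iter_nodes(source):
    """Yield (id, lat, lon, tags) for each <node> in an OSM XML stream.

    `source` is a filename or a binary file-like object. iter_nodes is a
    hypothetical helper for this example, not part of any OSM tool.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            # Child <tag k="..." v="..."/> elements carry the key/value metadata.
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            yield (
                int(elem.get("id")),
                float(elem.get("lat")),
                float(elem.get("lon")),
                tags,
            )
            # Release the element's children so large files do not pile up in memory.
            elem.clear()


# Tiny invented sample in the OSM XML layout (not real map data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="51.5" lon="-0.1">
    <tag k="name" v="Example Cafe"/>
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="48.85" lon="2.35"/>
</osm>
"""

nodes = list(iter_nodes(io.BytesIO(SAMPLE)))
for node_id, lat, lon, tags in nodes:
    print(node_id, lat, lon, tags)
```

Streaming with iterparse rather than loading the whole tree is a deliberate choice here, since full extracts can run to many gigabytes.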
http://www.naturalearthdata.com/downloads/
http://data.geocomm.com/drg/index.html
Available from http://libremap.org/
Web crawl data
A variety of data is available from http://www.freebase.com/
http://blog.stackoverflow.com/category/cc-wiki-dump/
Proceedings from Statistical Machine Translation
www.google.com/googlebooks/uspto.html
http://datacatalog.worldbank.org/
http://phpartners.org/health_stats.html
http://projectreporter.nih.gov/reporter.cfm
http://data.un.org/Explorer.aspx