OCC Plans to Host Common Crawl’s Data
The Open Cloud Consortium is in discussions with Common Crawl about publicly hosting data from their crawler on the Open Science Data Cloud. We're very excited about this potential partnership with Common Crawl's team. We think our shared approaches to community-based science and to sharing public information make us a natural fit.
The Common Crawl Foundation is a California 501(c)(3) registered non-profit founded by Gil Elbaz. Its goal is to democratize access to web information by producing and maintaining an open repository of web crawl data that is universally accessible. Common Crawl's vision is of a truly open web that allows universal open access to information and enables greater innovation in research, business and education.
From Common Crawl’s site:
As the largest and most diverse collection of information in human history, the web grants us tremendous insight if we can only understand it better. For example, web crawl data can be used to spot trends and identify patterns in politics, economics, health, popular culture and many other aspects of life. It provides an immensely rich corpus for scientific research, technological advancement, and innovative new businesses. It is crucial for our information-based society that the web be openly accessible to anyone who desires to utilize it.
Stay tuned for more news as our partnership with Common Crawl develops.