In this article, Benjamin F. Kuo interviews Gil Elbaz about the $25M in new funding for Factual, his open database platform that provides location data as a service: Gil Elbaz On Factual’s New Warchest
What’s behind the idea? Using machine learning to extract information from web-crawled data:
Gil Elbaz: The way we do things, we are assimilating millions of sources of information into one, aggregated database. Those sources can be an end user directly, through crowdsourcing, or it can be based on a partnership–where we’re working with partners who are providing us with data from crowdsourced or other sources. That data can also be via web crawls, where we are applying natural language, machine learning, and other techniques to extract or validate facts from web pages. In order to assimilate and aggregate that information, it requires a high degree of data cleaning and resolution, which you might also know as de-duping.
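The interview does not describe how Factual actually resolves records, but the basic idea of de-duping data from several sources can be sketched. The example below is a minimal illustration under stated assumptions: the record fields (`source`, `name`, `address`, `phone`) and the helpers `normalize` and `resolve` are hypothetical, and matching is a naive exact match on a normalized (name, address) key. Real systems would use fuzzy matching and learned similarity models instead.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def resolve(records):
    """Group records sharing a normalized (name, address) key and merge them.

    Hypothetical scheme: each merged entity keeps the first non-empty value
    seen for every field and lists which sources contributed to it.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["address"]))
        groups[key].append(rec)

    entities = []
    for recs in groups.values():
        merged = {"sources": sorted({r["source"] for r in recs})}
        for r in recs:
            for field, value in r.items():
                # Skip the per-record source tag and empty values;
                # the first non-empty value for a field wins.
                if field != "source" and value and field not in merged:
                    merged[field] = value
        entities.append(merged)
    return entities


# Hypothetical records from crowdsourcing, a partner feed, and a web crawl.
records = [
    {"source": "crowd", "name": "Joe's Cafe", "address": "12 Main St.", "phone": ""},
    {"source": "partner", "name": "JOE'S CAFE", "address": "12 Main St", "phone": "555-0100"},
    {"source": "crawl", "name": "Blue Bottle", "address": "1 Market St", "phone": "555-0199"},
]

entities = resolve(records)
print(len(entities))  # 2
```

Here the crowdsourced and partner records for the same cafe collapse into one entity, and the phone number missing from the crowdsourced entry is filled in from the partner record.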