{{short description|Knowledge repository integrating open datasets}}
{{Infobox website
| name = Data Commons
| logo = Data Commons logo 2025.png
| logo_size = 
| logo_alt = 
| logo_caption = 
| screenshot = Data Commons screenshot.png
| screenshot_size = 
| screenshot_alt = Screenshot of a query in Data Commons
| collapsible = <!-- set as "on", "y", etc, otherwise omit/leave blank. Does nothing for mobile users. -->
| collapsetext = <!-- collapsible area's heading (default "Screenshot"); omit/leave blank if collapsible not set -->
| caption = Results for a query in Data Commons
| former_name = 
| company_type = 
| type = <!-- or: | website_type = -->
| language = 
| language_count = 
| language_footnote = 
| founded = 
| predecessor = 
| headquarters = <!-- or: | location = -->
| location_city = 
| location_country = <!-- or: | country = -->
| country_of_origin = 
| owner = <!-- or: | owners = -->
| author = <!-- or: | authors / creator / creators -->
| founder = Ramanathan V. Guha
| parent = Google
| url = {{URL|Datacommons.org}}
| ipv6 = 
| advertising = 
| commercial = <!-- "Yes", "No" or leave blank -->
| registration = <!-- or: | reg = -->
| num_users = <!-- or: | users = -->
| launch_date = {{start date and age|2018|05}}
| current_status = 
| content_license = 
| footnotes = 
| key_people = Prem Ramaswami (Head of Data Commons)
}}
'''Data Commons''' is an open-source platform<ref>{{cite web |title=Custom Data Commons |url=https://docs.datacommons.org/custom_dc/ |website=Docs - Data Commons |access-date=16 July 2024}}</ref> created by Google<ref name="Google 0923">{{cite news |title=Data Commons is using AI to make the world's public data more accessible and helpful |url=https://blog.google/technology/ai/google-data-commons-ai/ |access-date=16 July 2024 |work=Google |date=13 September 2023 |language=en-us}}</ref> that provides an open knowledge graph, combining economic, scientific and other public datasets into a unified view.<ref 
name=":0">{{Citation|last1=Fensel|first1=Dieter|title=Introduction: What Is a Knowledge Graph?|date=2020|url=http://link.springer.com/10.1007/978-3-030-37439-6_1|work=Knowledge Graphs|pages=1–10|place=Cham|publisher=Springer International Publishing|language=en|doi=10.1007/978-3-030-37439-6_1|isbn=978-3-030-37438-9|access-date=2020-10-16|last2=Şimşek|first2=Umutcan|last3=Angele|first3=Kevin|last4=Huaman|first4=Elwin|last5=Kärle|first5=Elias|last6=Panasiuk|first6=Oleksandra|last7=Toma|first7=Ioan|last8=Umbrich|first8=Jürgen|last9=Wahler|first9=Alexander|s2cid=213620389|author-link=Dieter Fensel|url-access=subscription}}</ref> Ramanathan V. Guha, a creator of web standards including RDF,<ref>{{cite journal |last1=Guns |first1=Raf |date=2013 |title=Tracing the origins of the semantic web |journal=Journal of the American Society for Information Science and Technology |volume=64 |issue=10 |pages=2173–2181 |doi=10.1002/asi.22907 |hdl-access=free |hdl=10067/1111170151162165141}}</ref> RSS, and Schema.org,<ref>{{cite news |last1=Funke |first1=Daniel |date=7 December 2017 |title=This website helps you find related fact checks - and it was built by a 17-year-old |url=https://www.poynter.org/fact-checking/2017/this-website-helps-you-find-related-fact-checks-%C2%97-and-it-was-built-by-a-17-year-old/ |access-date=16 July 2024 |work=Poynter}}</ref> founded the project,<ref>{{Cite web |last=Guha |first=Ramanathan V. |author-link=Ramanathan V. 
Guha |date=15 October 2020 |title=Data Commons, now accessible on Google Search |url=https://docs.datacommons.org/2020/10/15/search_launch.html |access-date=2020-10-16 |website=docs.datacommons.org}}</ref> which is now led by Prem Ramaswami.<ref>{{cite news |last1=O'Donnell |first1=James |date=12 September 2024 |title=Google's new tool lets large language models fact-check their responses |url=https://www.technologyreview.com/2024/09/12/1103926/googles-new-tool-lets-large-language-models-fact-check-their-responses/ |access-date=17 September 2024 |work=MIT Technology Review |language=en}}</ref>

The Data Commons website was launched in May 2018 with an initial dataset consisting of fact-checking data published in Schema.org "ClaimReview" format by several fact checkers from the International Fact-Checking Network.<ref>{{cite web |url=http://www.datacommons.org/factcheck/ |title=Fact Checks |date=29 March 2019 |website=datacommons.org |access-date=14 October 2020}}</ref><ref>{{Cite book|last1=Jiang|first1=Shan|last2=Baumgartner|first2=Simon|last3=Ittycheriah|first3=Abe|last4=Yu|first4=Cong|title=Proceedings of the Web Conference 2020 |chapter=Factoring Fact-Checks: Structured Information Extraction from Fact-Checking Articles |date=2020-04-20|chapter-url=https://dl.acm.org/doi/10.1145/3366423.3380231|series=WWW '20|language=en|location=Taipei Taiwan|publisher=ACM|pages=1592–1603|doi=10.1145/3366423.3380231|isbn=978-1-4503-7023-3|s2cid=215882520}}</ref> Google has worked with partners such as the United Nations (UN) to populate the repository,<ref name="Google 0923"/> which also includes data from the United States Census, the World Bank, the US Bureau of Labor Statistics,<ref>{{Cite web|last=Raghavan|first=Prabhakar|author-link=Prabhakar Raghavan|date=2020-10-15|title=How AI is powering a more helpful Google|url=https://blog.google/products/search/search-on/|access-date=2020-10-16|website=Google|language=en}}</ref> Wikipedia, the National Oceanic and Atmospheric Administration and the Federal Bureau of Investigation.<ref name=":1">{{Cite journal|last1=Sheth|first1=Amit|last2=Padhee|first2=Swati|last3=Gyrard|first3=Amelie|last4=Sheth|first4=Amit|date=2019-07-01|title=Knowledge Graphs and Knowledge Networks: The Story in Brief|journal=IEEE Internet Computing|volume=23|issue=4|pages=67–75|doi=10.1109/MIC.2019.2928449|arxiv=2003.03623|bibcode=2019IIC....23d..67S |s2cid=204820800|issn=1089-7801}}</ref>

The service expanded during 2019 to include an RDF-style knowledge graph populated from a number of largely statistical open datasets. The service was announced to a wider audience in 2019.<ref>{{cite web|last1=Luong|first1=Daphne|last2=Chou|first2=Charina|date=5 March 2019|title=Doing our part to share open data responsibly|url=https://www.blog.google/technology/ai/sharing-open-data/|access-date=14 October 2020|website=The Keyword}}</ref> In 2020 the service improved its coverage of non-US datasets, while also increasing its coverage of bioinformatics and coronavirus.<ref>{{cite news |last=Ramasubramanian |first=Sowmya |date=21 September 2020 |title=Google's open source data to study impact of COVID-19 |url=https://www.thehindu.com/sci-tech/technology/googles-open-source-data-to-study-impact-of-covid-19/article32660642.ece |work=The Hindu | access-date=14 October 2020}}</ref> In 2023, the service relaunched with a natural-language front end powered by a large language model.<ref name="Google 0923"/> It also launched as the back end to the UN data portal with Sustainable Development Goals data.<ref>{{cite news |last1=Manyika |first1=James |title=Using data and AI to track progress toward the UN Global Goals |url=https://blog.google/technology/ai/google-ai-data-un-global-goals/ |access-date=22 July 2024 |work=Google |date=19 September 2023 |language=en-us}}</ref>

== Features ==
Data Commons places more emphasis on statistical data than is common for linked data and knowledge graph initiatives. It includes geographical, demographic, weather and real estate data alongside other categories,<ref name=":0" /> describing states, Congressional districts, and cities in the United States as well as biological specimens, power plants, and elements of the human genome via the Encyclopedia of DNA Elements (ENCODE) project.<ref name=":1" /> It represents data as semantic triples, each of which can have its own provenance.<ref name=":0" /> It centers on the entity-oriented integration of statistical observations from a variety of public datasets. Although it supports a subset of the W3C SPARQL query language,<ref>{{cite web |url=https://docs.datacommons.org/api/python/query.html |title=Query the Data Commons Knowledge Graph using SPARQL |website=datacommons.org |access-date=14 October 2020}}</ref> its APIs<ref>{{cite web |url=https://docs.datacommons.org/api/ |title=Overview |website=datacommons.org |access-date=14 October 2020}}</ref> also include tools oriented towards data science, statistics and data visualization, such as a Pandas dataframe interface.
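
The idea of triples that each carry their own provenance can be illustrated with a minimal Python sketch. It is not Data Commons code and does not use its APIs: the <code>Triple</code> type, the <code>describe</code> helper, the identifiers and the values are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One statement in the graph, tagged with the dataset it came from."""
    subject: str
    predicate: str
    obj: str
    provenance: str  # hypothetical label for the source dataset


# Made-up placeholder statements; these are not real Data Commons data.
TRIPLES = [
    Triple("place/Example", "typeOf", "City", "source-A"),
    Triple("place/Example", "name", "Exampleville", "source-A"),
    Triple("place/Example", "population", "1000", "source-B"),
    Triple("place/Other", "typeOf", "City", "source-A"),
]


def describe(entity: str, triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Collect every statement about one entity, grouped by predicate.

    Each value is kept together with its provenance, so statements that
    originate in different datasets remain distinguishable after merging.
    """
    view = defaultdict(list)
    for triple in triples:
        if triple.subject == entity:
            view[triple.predicate].append((triple.obj, triple.provenance))
    return dict(view)


print(describe("place/Example", TRIPLES))
```

In this sketch, statements from two hypothetical sources are merged into a single view of one entity, which mirrors the entity-oriented integration described above while keeping the origin of every statement visible.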

Data Commons is integrative, meaning that it does not provide a hosting platform for different datasets, but rather attempts to consolidate much of the information provided by the datasets into a single data graph.

== Technology ==
Data Commons is built on a graph data model. The graph can be accessed through a browser interface and several APIs,<ref name=":0" /><ref name=":1" /> and is expanded by loading data, typically through CSV and MCF-based templates.<ref>{{cite web |title=Contributing to Data Commons – Adding datasets |url=https://docs.datacommons.org/contributing/adding_datasets.html |website=datacommons.org |publisher=Data Commons |access-date=2020-10-14 |archive-date=2020-09-19 |archive-url=https://web.archive.org/web/20200919001318/https://docs.datacommons.org/contributing/adding_datasets.html |url-status=dead }}</ref> The graph can also be accessed by natural language queries in Google Search.<ref>{{Cite web|last=Guha|first=Ramanathan V.|author-link=Ramanathan V. Guha|date=15 October 2020|title=Data Commons, now accessible on Google Search|url=https://docs.datacommons.org/2020/10/15/search_launch.html|access-date=2020-10-16|website=docs.datacommons.org}}</ref> The data vocabulary used to define the datacommons.org graph is based upon Schema.org.<ref name=":0" /> In particular, the Schema.org terms StatisticalPopulation<ref>{{cite web |url=https://schema.org/StatisticalPopulation |title=StatisticalPopulation type at Schema.org |website=schema.org |access-date=14 October 2020}}</ref> and Observation<ref>{{cite web |url=https://schema.org/Observation |title=Observation type at Schema.org |website=schema.org |access-date=14 October 2020}}</ref> were proposed to Schema.org to support Data Commons-like use cases.<ref>{{cite web |url=https://github.com/schemaorg/schemaorg/issues/2291 |title=Proposal for representing Aggregate Statistical Data |date=25 June 2019 |website=GitHub – Schema.org repository |access-date=14 October 2020}}</ref>
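
How a statistical population and its observations relate can be sketched in Python as follows. This is a simplified illustration only: the class names echo the Schema.org terms, but the field names, the <code>time_series</code> helper, the identifiers and the numbers are assumptions made for this example and are not taken from the Schema.org definitions or from Data Commons itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StatisticalPopulation:
    """A set of things of one kind in one place (fields are illustrative)."""
    location: str         # hypothetical place identifier
    population_type: str  # kind of thing being counted or measured


@dataclass(frozen=True)
class Observation:
    """One measured value about a population at one point in time."""
    observed_node: StatisticalPopulation
    measured_property: str
    observation_date: str  # year as a string, for simplicity
    value: float


def time_series(
    observations: Iterable[Observation],
    population: StatisticalPopulation,
    measured_property: str,
) -> Dict[str, float]:
    """Return date -> value for one property of one population, ordered by date."""
    matching = [
        o for o in observations
        if o.observed_node == population and o.measured_property == measured_property
    ]
    matching.sort(key=lambda o: o.observation_date)
    return {o.observation_date: o.value for o in matching}


# Made-up placeholder values; these are not real statistics.
PEOPLE = StatisticalPopulation("place/Example", "Person")
OBSERVATIONS = [
    Observation(PEOPLE, "count", "2020", 1000.0),
    Observation(PEOPLE, "count", "2010", 900.0),
    Observation(PEOPLE, "medianAge", "2020", 35.0),
]

print(time_series(OBSERVATIONS, PEOPLE, "count"))
```

Separating the population from its observations lets many measurements, taken at different dates or of different properties, attach to the same described group.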

Software from the project is available on GitHub under the Apache License 2.0.<ref>{{cite web |url=https://github.com/datacommonsorg/ |title=datacommons.org GitHub|website=GitHub }}</ref>

== References ==
{{reflist}}

== External links ==
* {{Official website}}
* [https://github.com/datacommonsorg/ GitHub repository]

{{Google LLC}}

Category:Google
Category:Knowledge graphs
Category:Open data