Progress in artificial intelligence

{{Short description|none}}
{{see also|History of artificial intelligence|Timeline of artificial intelligence}}
{{Artificial intelligence}}
[[File:Test_scores_of_AI_systems_on_various_capabilities_relative_to_human_performance_-_Our_World_in_Data.png|thumb|Artificial intelligence, especially foundation models, has made rapid progress, surpassing human capabilities in various benchmarks.]]
'''Progress in artificial intelligence''' ('''AI''') refers to the advances, milestones, and breakthroughs that have been achieved in the field of artificial intelligence over time. AI is a branch of computer science that aims to create machines and systems capable of performing tasks that typically require human intelligence. AI applications have been used in a wide range of fields including medical diagnosis, finance, robotics, law, video games, agriculture, and scientific discovery. Because of this potential, society at large expects artificial intelligence to be a key factor in the coming years. However, many AI applications are not perceived as AI: "A lot of cutting-edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."<ref>[http://www.cnn.com/2006/TECH/science/07/24/ai.bostrom/ AI set to exceed human brain power] {{Webarchive|url=https://web.archive.org/web/20080219001624/http://www.cnn.com/2006/TECH/science/07/24/ai.bostrom/ |date=2008-02-19 }} CNN.com (July 26, 2006)</ref><ref name=andreas>{{cite journal|doi=10.1016/j.bushor.2018.08.004|title=Siri, Siri, in my hand: Who's the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence|journal=Business Horizons|volume=62|pages=15–25|year=2019|last1=Kaplan|first1=Andreas|last2=Haenlein|first2=Michael|s2cid=158433736}}</ref>

"Many thousands of AI applications are deeply embedded in the infrastructure of every industry."<ref name="Kurzweil2005p264">{{Harvnb|Kurzweil|2005|p=264}}</ref> In the late 1990s and early 2000s, AI technology became widely used as elements of larger systems,<ref name="Kurzweil2005p264" /><ref>{{Citation|last=National Research Council|title=Funding a Revolution: Government Support for Computing Research|year=1999|author-link=United States National Research Council|chapter=Developments in Artificial Intelligence|publisher=National Academy Press|isbn=978-0-309-06278-7|oclc=246584055|chapter-url-access=registration|chapter-url=https://archive.org/details/fundingrevolutio00nati}} under "Artificial Intelligence in the 90s"</ref> but the field was rarely credited for these successes at the time.

Kaplan and Haenlein structure artificial intelligence along three evolutionary stages:

# Artificial narrow intelligence – AI capable only of specific tasks;
# Artificial general intelligence – AI with ability in several areas, able to autonomously solve problems it was never designed for;
# Artificial superintelligence – AI capable of general tasks, including scientific creativity, social skills, and general wisdom.<ref name=andreas/>

To allow comparison with human performance, artificial intelligence can be evaluated on constrained and well-defined problems. Such tests have been termed subject-matter expert Turing tests. Smaller problems also provide more achievable goals, and there is an ever-increasing number of positive results.

In 2023, humans still substantially outperformed GPT-4 and other models tested on the ConceptARC benchmark. The models scored 60% on most categories and 77% on one, while humans scored 91% on all categories and 97% on one.<ref name="Nature-20230725">{{cite news |last=Biever |first=Celeste |title= ChatGPT broke the Turing test — the race is on for new ways to assess AI |url= https://www.nature.com/articles/d41586-023-02361-7? |date=25 July 2023 |work=Nature |access-date=26 July 2023 }}</ref> However, later research in 2025 found that human-generated output grids achieved an overall accuracy of only 73%, while the top AI models available that year scored above 77%.<ref name="ConceptARC2025">{{Cite arXiv|last1=Beger |first1=Claas |last2=Yi |first2=Ryan |last3=Fu |first3=Shuhao |last4=Moskvichev |first4=Arseny |last5=Tsai |first5=Sarah W. |last6=Rajamanickam |first6=Sivasankaran |last7=Mitchell |first7=Melanie |title=Do AI Models Perform Human-like Abstract Reasoning Across Modalities? |ref=S3.SS1.p3 |eprint=2510.02125 |date=6 October 2025 |class=cs.AI |quote=we found that human-generated output grids achieved an overall pass@1 accuracy of 73% on the 480 ConceptARC tasks, lower than that of the top reasoning models in the textual modality.}}</ref>

== History ==
Efforts to increase, promote, or constrain AI progress have often worked by increasing or controlling the amount of available compute.<ref name=":3">{{Cite web |title="The Main Resource is the Human"|url=https://cset.georgetown.edu/publication/the-main-resource-is-the-human/|access-date=2026-05-27|website=Center for Security and Emerging Technology|language=en-US|date=2023-04-01}}</ref><ref name=":02">{{Cite web |date=2018|title=AI and compute|url=https://openai.com/index/ai-and-compute/|access-date=2026-05-27|website=OpenAI|language=en-US}}</ref>

==Current performance in specific areas==
{| class="wikitable floatright" style="text-align: center"
|-
!width=70| Game
!width=30| Champion year<ref>Approximate year AI started beating top human experts</ref>
!align=center width=30| Legal states (log<sub>10</sub>)<ref name=solved>{{cite journal|last1=van den Herik|first1=H.Jaap|last2=Uiterwijk|first2=Jos W.H.M.|last3=van Rijswijck|first3=Jack|title=Games solved: Now and in the future|journal=Artificial Intelligence|date=January 2002|volume=134|issue=1–2|pages=277–311|doi=10.1016/S0004-3702(01)00152-7|doi-access=free}}</ref>
!align=center width=30| Game tree complexity (log<sub>10</sub>)<ref name=solved/>
!width=40| Game of perfect information?
!width=15|{{Reference column heading}}
|-
| Draughts (checkers) || 1994 || 21 || 31 || Perfect || <ref>{{cite news|last1=Madrigal|first1=Alexis C.|title=How Checkers Was Solved|url=https://www.theatlantic.com/technology/archive/2017/07/marion-tinsley-checkers/534111/|access-date=6 May 2018|work=The Atlantic|date=2017|archive-date=6 May 2018|archive-url=https://web.archive.org/web/20180506124806/https://www.theatlantic.com/technology/archive/2017/07/marion-tinsley-checkers/534111/|url-status=live}}</ref>
|-
| Othello (reversi) || 1997 || 28 || 58 || Perfect ||<ref name=":0">{{Cite web|url=http://berg.earthlingz.de/ocd/programs.php?lang=en|title=www.othello-club.de|website=berg.earthlingz.de|access-date=2018-07-15|archive-date=2018-07-15|archive-url=https://web.archive.org/web/20180715181721/http://berg.earthlingz.de/ocd/programs.php?lang=en|url-status=live}}</ref>
|-
| Chess || 1997 || 46 || 123 || Perfect ||
|-
| Scrabble || 2006 || || || || <ref name="time top 10"/>
|-
| Shogi || 2017 || 71 || 226 || Perfect ||<ref name=":1">{{Cite news|url=https://www.japantimes.co.jp/opinion/2017/06/29/editorials/shogi-prodigy-breathes-new-life-game/#.W0svONUza01|title=Shogi prodigy breathes new life into the game {{!}} The Japan Times|work=The Japan Times|access-date=2018-07-15|language=en-US|archive-date=2018-07-15|archive-url=https://web.archive.org/web/20180715181446/https://www.japantimes.co.jp/opinion/2017/06/29/editorials/shogi-prodigy-breathes-new-life-game/#.W0svONUza01|url-status=live}}</ref>
|-
| Go || 2017 || 172 || 360 || Perfect ||
|-
| 2p no-limit {{nowrap|hold 'em}}|| 2017 || || || Imperfect || <ref name=libratus/>
|-
| ''StarCraft'' ||style="background:#e5d1cb"|-|| 270+|| || Imperfect || <ref>{{cite magazine|title=Facebook Quietly Enters StarCraft War for AI Bots, and Loses|url=https://www.wired.com/story/facebook-quietly-enters-starcraft-war-for-ai-bots-and-loses/|access-date=6 May 2018|magazine=WIRED|date=2017|archive-date=7 May 2018|archive-url=https://web.archive.org/web/20180507091035/https://www.wired.com/story/facebook-quietly-enters-starcraft-war-for-ai-bots-and-loses/|url-status=live}}</ref>
|-
| ''StarCraft II'' ||style="background:#e5d1cb"|2019 || || || Imperfect || <ref>{{cite news |last1=Sample |first1=Ian |title=AI becomes grandmaster in 'fiendishly complex' StarCraft II |url=https://www.theguardian.com/technology/2019/oct/30/ai-becomes-grandmaster-in-fiendishly-complex-starcraft-ii |access-date=28 February 2020 |work=The Guardian |date=30 October 2019 |archive-date=29 December 2020 |archive-url=https://web.archive.org/web/20201229185547/https://www.theguardian.com/technology/2019/oct/30/ai-becomes-grandmaster-in-fiendishly-complex-starcraft-ii |url-status=live }}</ref>
|}

Many useful abilities can be described as showing some form of intelligence. Examining performance on each of these abilities separately gives better insight into the comparative success of artificial intelligence in different areas.

AI, like electricity or the steam engine, is a general-purpose technology. There is no consensus on how to characterize which tasks AI tends to excel at.<ref>{{cite journal|last1=Brynjolfsson|first1=Erik|last2=Mitchell|first2=Tom|title=What can machine learning do? Workforce implications|url=https://www.science.org/doi/10.1126/science.aap8062|access-date=7 May 2018|journal=Science|date=22 December 2017|volume=358|issue=6370|pages=1530–1534|language=en|doi=10.1126/science.aap8062|pmid=29269459 |bibcode=2017Sci...358.1530B|s2cid=4036151 |archive-date=29 September 2021|archive-url=https://web.archive.org/web/20210929161800/https://www.science.org/doi/10.1126/science.aap8062|url-status=live|url-access=subscription}}</ref> Some versions of Moravec's paradox observe that humans are more likely to outperform machines in areas such as physical dexterity that have been the direct target of natural selection.<ref>{{cite news|title=IKEA furniture and the limits of AI|url=https://www.economist.com/news/leaders/21740735-humans-have-had-good-run-most-recent-breakthrough-robotics-it-clear|access-date=24 April 2018|newspaper=The Economist|date=2018|language=en|archive-date=24 April 2018|archive-url=https://web.archive.org/web/20180424014106/https://economist.com/news/leaders/21740735-humans-have-had-good-run-most-recent-breakthrough-robotics-it-clear|url-status=live}}</ref> While projects such as AlphaZero have succeeded in generating their own knowledge from scratch, many other machine learning projects require large training datasets.<ref>{{cite news|last1=Sample|first1=Ian|title='It's able to create knowledge itself': Google unveils AI that learns on its own|url=https://www.theguardian.com/science/2017/oct/18/its-able-to-create-knowledge-itself-google-unveils-ai-learns-all-on-its-own|access-date=7 May 2018|work=The Guardian|date=18 October 2017|language=en|archive-date=19 October 2017|archive-url=https://web.archive.org/web/20171019213849/https://www.theguardian.com/science/2017/oct/18/its-able-to-create-knowledge-itself-google-unveils-ai-learns-all-on-its-own|url-status=live}}</ref><ref>{{cite news|title=The AI revolution in science|url=https://www.science.org/content/article/ai-revolution-science|access-date=7 May 2018|work=Science {{!}} AAAS|date=5 July 2017|language=en|archive-date=14 December 2021|archive-url=https://web.archive.org/web/20211214221104/https://www.science.org/content/article/ai-revolution-science|url-status=live}}</ref> Researcher Andrew Ng has suggested, as a "highly imperfect rule of thumb", that "almost anything a typical human can do with less than one second of mental thought, we can probably now or in the near future automate using AI."<ref>{{cite news|title=Will your job still exist in 10 years when the robots arrive?|url=https://www.scmp.com/tech/innovation/article/2098164/robots-are-coming-here-are-some-jobs-wont-exist-10-years|access-date=7 May 2018|work=South China Morning Post|date=2017|language=en|archive-date=7 May 2018|archive-url=https://web.archive.org/web/20180507221346/http://www.scmp.com/tech/innovation/article/2098164/robots-are-coming-here-are-some-jobs-wont-exist-10-years|url-status=live}}</ref>

Games provide a high-profile benchmark for assessing rates of progress; many games have a large professional player base and a well-established competitive rating system. AlphaGo brought the era of classical board-game benchmarks to a close in 2016, when DeepMind's AlphaGo program proved the competitive edge of artificial intelligence over humans by defeating top professional Go player Lee Sedol.<ref>{{Cite journal|last=Mokyr|first=Joel|date=2019-11-01|title=The Technology Trap: Capital, Labor, and Power in the Age of Automation. By Carl Benedikt Frey. Princeton: Princeton University Press, 2019. Pp. 480. $29.95, hardcover.|journal=The Journal of Economic History|volume=79|issue=4|pages=1183–1189|doi=10.1017/s0022050719000639|s2cid=211324400|issn=0022-0507}}</ref> Games of imperfect information provide new challenges to AI in the area of game theory; the most prominent milestone in this area was Libratus' poker victory in 2017.<ref>{{cite news|last1=Lien|first1=Tracey|last2=Borowiec|first2=Steven|title=AlphaGo beats human Go champ in milestone for artificial intelligence|url=https://www.latimes.com/world/asia/la-fg-korea-alphago-20160312-story.html|access-date=7 May 2018|work=Los Angeles Times|date=2016|archive-date=13 May 2018|archive-url=https://web.archive.org/web/20180513234132/http://www.latimes.com/world/asia/la-fg-korea-alphago-20160312-story.html|url-status=live}}</ref><ref>{{cite journal|last1=Brown|first1=Noam|last2=Sandholm|first2=Tuomas|title=Superhuman AI for heads-up no-limit poker: Libratus beats top professionals|journal=Science|date=26 January 2018|volume=359 |issue=6374 |pages=418–424|language=en|doi=10.1126/science.aao1733|pmid=29249696 |bibcode=2018Sci...359..418B |s2cid=5003977 |doi-access=free}}</ref> E-sports continue to provide additional benchmarks; Facebook AI, DeepMind, and others have engaged with the popular ''StarCraft'' franchise of video games.<ref>{{cite journal|last1=Ontanon|first1=Santiago|last2=Synnaeve|first2=Gabriel|last3=Uriarte|first3=Alberto|last4=Richoux|first4=Florian|last5=Churchill|first5=David|last6=Preuss|first6=Mike|title=A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft|journal=IEEE Transactions on Computational Intelligence and AI in Games|date=December 2013|volume=5|issue=4|pages=293–311|doi=10.1109/TCIAIG.2013.2286295|citeseerx=10.1.1.406.2524|s2cid=5014732}}</ref><ref>{{cite magazine|title=Facebook Quietly Enters StarCraft War for AI Bots, and Loses|url=https://www.wired.com/story/facebook-quietly-enters-starcraft-war-for-ai-bots-and-loses/|access-date=7 May 2018|magazine=WIRED|date=2017|archive-date=2 February 2023|archive-url=https://web.archive.org/web/20230202181319/https://www.wired.com/story/facebook-quietly-enters-starcraft-war-for-ai-bots-and-loses/|url-status=live}}</ref>

Broad classes of outcome for an AI test may be given as:
* '''optimal''': it is not possible to perform better (note: some of these entries were solved by humans)
* '''super-human''': performs better than all humans
* '''high-human''': performs better than most humans
* '''par-human''': performs similarly to most humans
* '''sub-human''': performs worse than most humans

===Optimal===
{{See also|Solved game}}

* Tic-tac-toe
* Connect Four: 1988
* Checkers (aka 8x8 draughts): Weakly solved (2007)<ref>{{Cite journal| last1 = Schaeffer | first1 = J.| last2 = Burch | first2 = N.| last3 = Bjornsson | first3 = Y.| last4 = Kishimoto | first4 = A.| last5 = Muller | first5 = M.| last6 = Lake | first6 = R.| last7 = Lu | first7 = P.| last8 = Sutphen | first8 = S.| title = Checkers is solved| journal = Science| volume = 317| issue = 5844| pages = 1518–1522| year = 2007| pmid = 17641166| doi = 10.1126/science.1144079| citeseerx = 10.1.1.95.5393| bibcode = 2007Sci...317.1518S| s2cid = 10274228}}</ref>
* Rubik's Cube: Mostly solved (2010)<ref>{{cite web|url=http://www.cube20.org/|title=God's Number is 20|access-date=2011-08-07|archive-date=2013-07-21|archive-url=https://web.archive.org/web/20130721182026/http://www.cube20.org/|url-status=live}}</ref>
* Heads-up limit hold'em poker: Statistically optimal in the sense that "a human lifetime of play is not sufficient to establish with statistical significance that the strategy is not an exact solution" (2015)<ref>{{Cite journal | doi = 10.1126/science.1259433| title = Heads-up limit hold'em poker is solved| journal = Science| volume = 347| issue = 6218| pages = 145–9| year = 2015| last1 = Bowling| first1 = M.| last2 = Burch| first2 = N.| last3 = Johanson| first3 = M.| last4 = Tammelin| first4 = O.| pmid=25574016| bibcode = 2015Sci...347..145B| citeseerx = 10.1.1.697.72| s2cid = 3796371}}</ref>

===Super-human===
* Othello (aka reversi): c. 1997<ref name=":0" />
* Scrabble:<ref>{{cite magazine|title=In Major AI Breakthrough, Google System Secretly Beats Top Player at the Ancient Game of Go|url=https://www.wired.com/2016/01/in-a-huge-breakthrough-googles-ai-beats-a-top-player-at-the-game-of-go/|access-date=28 December 2017|magazine=WIRED|archive-date=2 February 2017|archive-url=https://web.archive.org/web/20170202211927/https://www.wired.com/2016/01/in-a-huge-breakthrough-googles-ai-beats-a-top-player-at-the-game-of-go/|url-status=live}}</ref><ref>{{Cite journal| last1 = Sheppard | first1 = B.| title = World-championship-caliber Scrabble| journal = Artificial Intelligence| volume = 134| issue = 1–2| pages = 241–275| year = 2002| doi = 10.1016/S0004-3702(01)00166-7| doi-access = free}}</ref> 2006<ref name="time top 10">{{cite magazine|last1=Webley|first1=Kayla|title=Top 10 Man-vs.-Machine Moments|url=https://content.time.com/time/specials/packages/article/0,28804,2049187_2049195_2049083,00.html|access-date=28 December 2017|magazine=Time|date=15 February 2011|archive-date=26 December 2017|archive-url=https://web.archive.org/web/20171226005745/http://content.time.com/time/specials/packages/article/0,28804,2049187_2049195_2049083,00.html|url-status=live}}</ref>
* Backgammon: c. 1995–2002<ref>{{cite journal |last=Tesauro |first=Gerald |url=http://www.research.ibm.com/massive/tdl.html |title=Temporal difference learning and TD-Gammon |journal=Communications of the ACM |volume=38 |issue=3 |date=March 1995 |pages=58–68 |doi=10.1145/203330.203343 |s2cid=8763243 |access-date=2008-03-26 |archive-date=2013-01-11 |archive-url=https://web.archive.org/web/20130111060444/http://www.research.ibm.com/massive/tdl.html |url-status=live |doi-access=free }}</ref><ref>{{cite journal|last1=Tesauro|first1=Gerald|title=Programming backgammon using self-teaching neural nets|journal=Artificial Intelligence|date=January 2002|volume=134|issue=1–2|pages=181–199|doi=10.1016/S0004-3702(01)00110-2|quote=...at least two other neural net programs also appear to be capable of superhuman play}}</ref>
* Chess: Supercomputer (c. 1997); Personal computer (c. 2006);<ref>{{Cite news|url=https://en.chessbase.com/post/kramnik-vs-deep-fritz-computer-wins-match-by-4-2|title=Kramnik vs Deep Fritz: Computer wins match by 4:2|date=2006-12-05|work=Chess News|access-date=2018-07-15|language=en-US|archive-date=2018-11-25|archive-url=https://web.archive.org/web/20181125095823/https://en.chessbase.com/post/kramnik-vs-deep-fritz-computer-wins-match-by-4-2|url-status=live}}</ref> Mobile phone (c. 2009);<ref>{{Cite web|url=http://theweekinchess.com/html/twic771.html#13|title=The Week in Chess 771|website=theweekinchess.com|access-date=2018-07-15|archive-date=2018-11-15|archive-url=https://web.archive.org/web/20181115020103/http://theweekinchess.com/html/twic771.html#13|url-status=live}}</ref> Computer defeats human + computer (c. 2017)<ref>{{Cite web|url=http://www.infinitychess.com/Page/Public/Article/DefaultArticle.aspx?id=322|title=Zor Winner in an Exciting Photo Finish|last=Nickel|first=Arno|date=May 2017|website=www.infinitychess.com|publisher=Innovative Solutions|access-date=2018-07-17|quote=... on third place the best centaur ...|archive-date=2018-08-17|archive-url=https://web.archive.org/web/20180817111551/http://infinitychess.com/Page/Public/Article/DefaultArticle.aspx?id=322|url-status=live}}</ref>
* ''Jeopardy!'': Question answering, although the machine did not use speech recognition (2011)<ref>{{Cite news |last=Markoff |first=John |date=2011-02-16 |title=Computer Wins on 'Jeopardy!': Trivial, It's Not |language=en-US |work=The New York Times |url=https://www.nytimes.com/2011/02/17/science/17jeopardy-watson.html |access-date=2023-02-22 |issn=0362-4331}}</ref><ref name="contentpages">{{cite news|url=https://www.pcworld.com/article/219893/ibm_watson_vanquishes_human_jeopardy_foes.html|title=IBM Watson Vanquishes Human Jeopardy Foes|last=Jackson|first=Joab|publisher=PC World|agency=IDG News|access-date=2011-02-17|archive-date=2011-02-20|archive-url=https://web.archive.org/web/20110220020908/http://www.pcworld.com/article/219893/ibm_watson_vanquishes_human_jeopardy_foes.html|url-status=live}}</ref>
* Arimaa: 2015<ref>{{Cite web|url=http://arimaa.com/arimaa/challenge/|title=The Arimaa Challenge|website=arimaa.com|access-date=2018-07-15|archive-date=2010-03-22|archive-url=https://web.archive.org/web/20100322201944/http://arimaa.com/arimaa/challenge/|url-status=live}}</ref><ref>{{cite news|last1=Roeder|first1=Oliver|title=The Bots Beat Us. Now What?|url=https://fivethirtyeight.com/features/the-bots-beat-us-now-what/|access-date=28 December 2017|work=FiveThirtyEight|date=10 July 2017|archive-date=28 December 2017|archive-url=https://web.archive.org/web/20171228171720/https://fivethirtyeight.com/features/the-bots-beat-us-now-what/|url-status=dead}}</ref>
* Shogi: c. 2017<ref name=":1" />
* Go: 2017<ref>{{Cite news|url=https://www.theverge.com/2017/5/25/15689462/alphago-ke-jie-game-2-result-google-deepmind-china|title=AlphaGo beats Ke Jie again to wrap up three-part match|work=The Verge|access-date=2018-07-15|archive-date=2018-07-15|archive-url=https://web.archive.org/web/20180715181631/https://www.theverge.com/2017/5/25/15689462/alphago-ke-jie-game-2-result-google-deepmind-china|url-status=live}}</ref>
* Heads-up no-limit hold'em poker: 2017<ref name=libratus>{{Cite journal| last1 = Brown | first1 = Noam| last2 = Sandholm| first2 = Tuomas|title = Superhuman AI for heads-up no-limit poker: Libratus beats top professionals| journal = Science| volume = 359| issue = 6374| pages = 418–424| year = 2017| doi = 10.1126/science.aao1733| pmid = 29249696| bibcode = 2018Sci...359..418B| doi-access = free}}</ref>
* Six-player no-limit hold'em poker: 2019<ref>{{cite journal |last1=Blair |first1=Alan |last2=Saffidine |first2=Abdallah |title=AI surpasses humans at six-player poker |journal=Science |date=30 August 2019 |volume=365 |issue=6456 |pages=864–865 |doi=10.1126/science.aay7774 |pmid=31467208 |bibcode=2019Sci...365..864B |s2cid=201672421 |url=https://www.science.org/doi/10.1126/science.aay7774 |access-date=30 June 2022 |archive-date=18 July 2022 |archive-url=https://web.archive.org/web/20220718200144/https://www.science.org/doi/10.1126/science.aay7774 |url-status=live |url-access=subscription }}</ref>
* ''Gran Turismo Sport'': 2022<ref>{{Cite news|url=https://www.theverge.com/2022/2/9/22925420/sony-ai-gran-turismo-driving-gt-sophy-nature-paper|title=Sony's new AI driver achieves 'reliably superhuman' race times in Gran Turismo|work=The Verge|access-date=2022-07-19|archive-date=2022-07-20|archive-url=https://web.archive.org/web/20220720023339/https://www.theverge.com/2022/2/9/22925420/sony-ai-gran-turismo-driving-gt-sophy-nature-paper|url-status=live}}</ref>

===High-human===
* Crosswords: c. 2012<ref>{{cite conference |last1=Keim |first1=Greg A. |last2=Shazeer |first2=Noam |last3=Littman |first3=Michael L. |title=Proverb: The probabilistic cruciverbalist |book-title=Proceedings of the Sixteenth National Conference on Artificial Intelligence |year=1999 |pages=710–717}}</ref><ref>{{cite news|title='Dr. Fill' vies for crossword solving supremacy, but still comes up short|url=https://theworld.org/stories/2014/09/24/dr-fill-vies-crossword-solving-supremacy|first=Adam|last=Wernick|date=24 Sep 2014|publisher=Public Radio International|url-status=live|archive-url=https://web.archive.org/web/20171228175445/https://www.pri.org/stories/2014-09-24/dr-fill-vies-crossword-solving-supremacy-still-comes-short|archive-date=2017-12-28}}</ref>
* ''Freeciv'': 2016<ref>{{cite news|title=Arago's AI can now beat some human players at complex civ strategy games.|url=https://techcrunch.com/2016/12/06/aragos-ai-can-now-beat-some-human-players-at-complex-civ-strategy-games/|work=TechCrunch|date=6 December 2016|archive-url=https://web.archive.org/web/20220605151457/https://techcrunch.com/2016/12/06/aragos-ai-can-now-beat-some-human-players-at-complex-civ-strategy-games/|archive-date=5 June 2022}}</ref>
* ''Dota 2'': 2018<ref>{{cite news|title=AI bots trained for 180 years a day to beat humans at Dota 2.|url=https://www.theverge.com/2018/6/25/17492918/openai-dota-2-bot-ai-five-5v5-matches|work=The Verge|date=25 June 2018|archive-url=https://web.archive.org/web/20180625183203/https://www.theverge.com/2018/6/25/17492918/openai-dota-2-bot-ai-five-5v5-matches|archive-date=25 June 2018}}</ref>
* Bridge card-playing: According to a 2009 review, "the best programs are attaining expert status as (bridge) card players", excluding bidding.<ref>{{cite journal |last=Bethe |first=Paul M. |title=The state of automated bridge play |journal=Computers and Games |year=2009 |publisher=Springer}}</ref>
* ''StarCraft II'': 2019<ref>{{cite web|url=https://deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii|title=AlphaStar: Mastering the Real-Time Strategy Game StarCraft II|date=24 January 2019|archive-url=https://web.archive.org/web/20220722140448/https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii|archive-date=2022-07-22}}</ref>
* ''Mahjong'': 2019<ref>{{cite web|url=https://www.microsoft.com/en-us/research/project/suphx-mastering-mahjong-with-deep-reinforcement-learning/|title=Suphx: The World Best Mahjong AI|website=Microsoft Research|archive-url=https://web.archive.org/web/20220719233921/https://www.microsoft.com/en-us/research/project/suphx-mastering-mahjong-with-deep-reinforcement-learning/|archive-date=2022-07-19}}</ref>
* ''Stratego'': 2022<ref>{{cite journal |title=Mastering Stratego with Model-Free Multiagent Reinforcement Learning |journal=Science |year=2022}}</ref>
* No-Press ''Diplomacy'': 2022<ref>{{cite arXiv |eprint=2210.05492 |title=Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning |date=11 October 2022 |class=cs.GT |last1=Bakhtin |first1=Anton |last2=Wu |first2=David |last3=Lerer |first3=Adam |last4=Gray |first4=Jonathan |last5=Jacob |first5=Athul |last6=Farina |first6=Gabriele |last7=Miller |first7=Alexander |last8=Brown |first8=Noam }}</ref>
* ''Hanabi'': 2022<ref>{{cite arXiv |eprint=2210.05125 |title=Human-AI Coordination via Human-Regularized Search and Learning |date=11 October 2022 |class=cs.AI |last1=Hu |first1=Hengyuan |last2=Wu |first2=David |last3=Lerer |first3=Adam |last4=Foerster |first4=Jakob |last5=Brown |first5=Noam }}</ref>
* Natural language processing<ref>{{cite journal |last1=Devlin |first1=Jacob |last2=Chang |first2=Ming-Wei |last3=Lee |first3=Kenton |last4=Toutanova |first4=Kristina |title=BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |journal=Naacl-HLT |year=2019}}</ref>

=== Par-human ===

* Optical character recognition for ISO 1073-1:1976 and similar special characters.{{citation needed|date=January 2023}}
* Classification of images<ref>{{cite web|url=https://venturebeat.com/2015/02/09/microsoft-researchers-say-their-newest-deep-learning-system-beats-humans-and-google/|title=Microsoft researchers say their newest deep learning system beats humans -- and Google - VentureBeat - Big Data - by Jordan Novet|work=VentureBeat|date=2015-02-10|access-date=2017-09-08|archive-date=2017-08-09|archive-url=https://web.archive.org/web/20170809060105/https://venturebeat.com/2015/02/09/microsoft-researchers-say-their-newest-deep-learning-system-beats-humans-and-google/|url-status=live}}</ref>
* Handwriting recognition<ref>{{cite arXiv |eprint=1605.06065 |title=One-shot Learning with Memory-Augmented Neural Networks |at=p. 5, Table 1 |date= 19 May 2016 |quote=4.2. Omniglot Classification: "The network exhibited high classification accuracy on just the second presentation of a sample from a class within an episode (82.8%), reaching up to 94.9% accuracy by the fifth instance and 98.1% accuracy by the tenth." |last1=Santoro |first1=Adam |last2=Bartunov |first2=Sergey |last3=Botvinick |first3=Matthew |last4= Wierstra |first4=Daan |last5=Lillicrap |first5=Timothy |class=cs.LG }}</ref>
* Facial recognition<ref>{{cite web|url=https://neurosciencenews.com/man-machine-facial-recognition-120191/|title=Man Versus Machine: Who Wins When It Comes to Facial Recognition?|work=Neuroscience News|date=2018-12-03|access-date=2022-07-20|archive-date=2022-07-20|archive-url=https://web.archive.org/web/20220720033355/https://neurosciencenews.com/man-machine-facial-recognition-120191/|url-status=live}}</ref>
* Visual question answering<ref>{{cite arXiv |eprint=2111.08896 |title=Achieving Human Parity on Visual Question Answering |date= 17 November 2021 |class=cs.CL |last1=Yan |first1=Ming |last2=Xu |first2=Haiyang |last3=Li |first3=Chenliang |last4=Tian |first4=Junfeng |last5=Bi |first5=Bin |last6=Wang |first6=Wei |last7=Chen |first7=Weihua |last8=Xu |first8=Xianzhe |last9=Wang |first9=Fan |last10=Cao |first10=Zheng |last11=Zhang |first11=Zhicheng |last12=Zhang |first12=Qiyu |last13=Zhang |first13=Ji |last14=Huang |first14=Songfang |last15=Huang |first15=Fei |last16=Si |first16=Luo |last17=Jin |first17=Rong }}</ref>
* SQuAD 2.0 English reading-comprehension benchmark (2019)<ref name="ai index 2021"/>
* SuperGLUE English-language understanding benchmark (2020)<ref name="ai index 2021">Zhang, D., Mishra, S., Brynjolfsson, E., Etchemendy, J., Ganguli, D., Grosz, B., ... & Perrault, R. (2021). The AI index 2021 annual report. AI Index (Stanford University). arXiv preprint arXiv:2103.06312.</ref>
* Some school science exams (2019)<ref>{{cite news |last1=Metz |first1=Cade |title=A Breakthrough for A.I. Technology: Passing an 8th-Grade Science Test |url=https://www.nytimes.com/2019/09/04/technology/artificial-intelligence-aristo-passed-test.html |access-date=5 January 2023 |work=The New York Times |date=4 September 2019 |archive-date=5 January 2023 |archive-url=https://web.archive.org/web/20230105051148/https://www.nytimes.com/2019/09/04/technology/artificial-intelligence-aristo-passed-test.html |url-status=live }}</ref>
* Some tasks based on Raven's Progressive Matrices<ref name="how much intelligence 2020"/>
* Many Atari 2600 games (2015)<ref>{{cite magazine |last1=McMillan |first1=Robert |title=Google's AI Is Now Smart Enough to Play Atari Like the Pros |url=https://www.wired.com/2015/02/google-ai-plays-atari-like-pros/ |access-date=5 January 2023 |magazine=Wired |date=2015 |archive-date=5 January 2023 |archive-url=https://web.archive.org/web/20230105053006/https://www.wired.com/2015/02/google-ai-plays-atari-like-pros/ |url-status=live }}</ref>

===Sub-human===
* Optical character recognition for printed text (nearing par-human for Latin-script typewritten text)<ref>{{cite book |last=Smith |first=Ray |chapter=An Overview of the Tesseract OCR Engine |pages=629–633 |title=Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2 |year=2007 |doi=10.1109/ICDAR.2007.4376991 |isbn=978-0-7695-2822-9 }}</ref>
* Object recognition{{clarify|date=December 2017}}<ref>{{cite book |last1=He |first1=Kaiming |last2=Zhang |first2=Xiangyu |last3=Ren |first3=Shaoqing |last4=Sun |first4=Jian |chapter=Deep Residual Learning for Image Recognition |pages=770–778 |title=2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) |year=2016 |doi=10.1109/CVPR.2016.90 |isbn=978-1-4673-8851-1 |url=https://repositorio.unal.edu.co/handle/unal/81443 }}</ref>
* Various robotics tasks that may require advances in robot hardware as well as AI, including:
** Stable bipedal locomotion: bipedal robots can walk, but are less stable than human walkers (as of 2017)<ref>{{cite news|title=Robots with legs are getting ready to walk among us|url=https://www.theverge.com/2017/2/22/14635530/bipedal-legged-robots-mobility-advantages|work=The Verge|date=2017-02-22}}</ref>
** Humanoid soccer<ref>{{cite news|last=Hurst|first=Nathan|title=Why Funny, Falling, Soccer-Playing Robots Matter|url=https://www.smithsonianmag.com/innovation/why-funny-falling-soccer-playing-robots-matter-180964260/|work=Smithsonian}}</ref>
* Speech recognition: "nearly equal to human performance" (2017)<ref>{{cite news|title=The Business of Artificial Intelligence|url=https://hbr.org/cover-story/2017/07/the-business-of-artificial-intelligence|work=Harvard Business Review|date=2017-07-18}}</ref>
* Explainability: current medical systems can diagnose certain medical conditions well, but cannot explain to users why they made the diagnosis.<ref>{{cite journal |last1=Brynjolfsson |first1=Erik |last2=Mitchell |first2=Tom |title=What can machine learning do? Workforce implications |journal=Science |volume=358 |issue=6370 |pages=1530–1534 |year=2017 |doi=10.1126/science.aap8062 |pmid=29269459 |bibcode=2017Sci...358.1530B }}</ref>
* Many tests of fluid intelligence (2020)<ref name="how much intelligence 2020">{{cite journal |last1=van der Maas |first1=Han L.J. |last2=Snoek |first2=Lukas |last3=Stevenson |first3=Claire E. |title=How much intelligence is there in artificial intelligence? A 2020 update |journal=Intelligence |volume=87 |year=2021 |article-number=101548 |doi=10.1016/j.intell.2021.101548}}</ref>
* Bongard visual cognition problems, such as the Bongard-LOGO benchmark (2020)<ref name="how much intelligence 2020"/><ref>{{cite journal |last1=Nie |first1=Weili |last2=Yu |first2=Zhaowei |last3=Mao |first3=Liqiang |last4=Patel |first4=Ankit B. |last5=Zhu |first5=Yuke |last6=Anandkumar |first6=Anima |title=Bongard-LOGO: A New Benchmark for Human-level Concept Learning and Reasoning |journal=Advances in Neural Information Processing Systems |volume=33 |year=2020}}</ref>
* Visual Commonsense Reasoning (VCR) benchmark (as of 2020)<ref>{{cite journal |last1=Zhang |first1=Daniel |last2=Mishra |first2=Saurabh |last3=Brynjolfsson |first3=Erik |last4=Etchemendy |first4=John |last5=Ganguli |first5=Deep |last6=Grosz |first6=Barbara |last7=Lyons |first7=Terah |last8=Manyika |first8=James |author9=Juan Carlos Niebles |last10=Sellitto |first10=Michael |last11=Shoham |first11=Yoav |last12=Clark |first12=Jack |last13=Perrault |first13=Raymond |title=The AI Index 2021 Annual Report |journal=Stanford AI Index |year=2021 |arxiv=2103.06312 }}</ref>
* Stock market prediction: financial data collection and processing using machine learning algorithms<ref>{{cite journal |last1=Fischer |first1=Thomas |last2=Krauss |first2=Christopher |title=Deep learning with long short-term memory networks for financial market predictions |journal=European Journal of Operational Research |volume=270 |issue=2 |pages=654–669 |year=2018 |doi=10.1016/j.ejor.2017.11.054}}</ref>
* ''Angry Birds'' video game, as of 2020<ref>{{cite journal |last1=Stephenson |first1=Matthew |last2=Renz |first2=Jochen |last3=Ge |first3=Xiaoyu |title=The computational complexity of Angry Birds |journal=Artificial Intelligence |volume=280 |year=2020 |article-number=103232 |doi=10.1016/j.artint.2019.103232 |url=https://cris.maastrichtuniversity.nl/en/publications/7f080df4-bbf1-4bd0-a063-1ef39efbe72d |arxiv=1812.07793 }}</ref>
* Various tasks that are difficult to solve without contextual knowledge, including:
** Translation<ref>{{cite book |last=Koehn |first=Philipp |title=Neural Machine Translation |publisher=Cambridge University Press |year=2020}}</ref>
** Word-sense disambiguation<ref>{{cite journal |last=Navigli |first=Roberto |title=Word Sense Disambiguation: A Survey |journal=ACM Computing Surveys |volume=41 |issue=2 |year=2009 |doi=10.1145/1459352.1459355}}</ref>

== Proposed tests of artificial intelligence ==
{{See also|Turing test#Versions}}
In his famous Turing test, Alan Turing chose language, the defining feature of human beings, as its basis.<ref>{{Turing 1950}}</ref> Some researchers now consider the Turing test too exploitable to serve as a meaningful benchmark.<ref>{{cite journal|last1=Schoenick|first1=Carissa|last2=Clark|first2=Peter|last3=Tafjord|first3=Oyvind|last4=Turney|first4=Peter|last5=Etzioni|first5=Oren|date=23 August 2017|title=Moving beyond the Turing Test with the Allen AI Science Challenge|journal=Communications of the ACM|volume=60|issue=9|pages=60–64|arxiv=1604.04315|doi=10.1145/3122814|s2cid=6383047}}</ref>

The Feigenbaum test, proposed by Edward Feigenbaum, often called the father of expert systems, tests a machine's knowledge and expertise in a specific subject.<ref>{{cite journal|last=Feigenbaum|first=Edward A.|date=2003|title=Some challenges and grand challenges for computational intelligence|journal=Journal of the ACM|volume=50|issue=1|pages=32–40|doi=10.1145/602382.602400|s2cid=15379263}}</ref> A 2003 paper by Jim Gray of Microsoft suggested extending the Turing test to speech understanding, speaking, and recognizing objects and behavior.<ref>{{cite journal|last=Gray|first=Jim |year=2003|title=What Next? A Dozen Information-Technology Research Goals|journal=Journal of the ACM|volume=50|issue=1|pages=41–57|bibcode=1999cs.......11005G |arxiv=cs/9911005 |doi=10.1145/602382.602401|s2cid=10336312 }}</ref>

Proposed "universal intelligence" tests aim to compare how well machines, humans, and even non-human animals perform on problem sets that are as generic as possible. At an extreme, the test suite can contain every possible problem, weighted by Kolmogorov complexity; however, these problem sets tend to be dominated by impoverished pattern-matching exercises in which a tuned AI can easily exceed human performance levels.<ref>{{cite journal|last=Hernandez-Orallo|first=Jose|year=2000|title=Beyond the Turing Test|journal=Journal of Logic, Language and Information|volume=9|issue=4|pages=447–466|doi=10.1023/A:1008367325700|s2cid=14481982}}</ref><ref>{{cite journal|last1=Dowe|first1=D. L.|last2=Hajek|first2=A. R.|year=1997|title=A computational extension to the Turing Test|url=http://www.csse.monash.edu.au/publications/1997/tr-cs97-322-abs.html|journal=Proceedings of the 4th Conference of the Australasian Cognitive Science Society|archive-url=https://web.archive.org/web/20110628194905/http://www.csse.monash.edu.au/publications/1997/tr-cs97-322-abs.html|archive-date=28 June 2011|df=dmy-all}}</ref><ref>{{cite journal|last1=Hernandez-Orallo|first1=J.|last2=Dowe|first2=D. 
L.|year=2010|title=Measuring Universal Intelligence: Towards an Anytime Intelligence Test|journal=Artificial Intelligence|volume=174|issue=18|pages=1508–1539|citeseerx=10.1.1.295.9079|doi=10.1016/j.artint.2010.09.006}}</ref><ref>{{cite journal|last1=Hernández-Orallo|first1=José|last2=Dowe|first2=David L.|last3=Hernández-Lloreda|first3=M.Victoria|date=March 2014|title=Universal psychometrics: Measuring cognitive abilities in the machine kingdom|journal=Cognitive Systems Research|volume=27|pages=50–74|doi=10.1016/j.cogsys.2013.06.001|hdl-access=free|hdl=10251/50244|s2cid=26440282}}</ref>
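One well-known way to make "weighted by Kolmogorov complexity" precise is Shane Legg and Marcus Hutter's 2007 universal intelligence measure. It is given here only as an illustrative assumption, not as the definition used by the works cited above. It scores an agent <math>\pi</math> by its expected performance <math>V_{\mu}^{\pi}</math> in each environment <math>\mu</math> of a set <math>E</math> of computable environments, weighted by <math>2^{-K(\mu)}</math>, where <math>K(\mu)</math> is the Kolmogorov complexity of <math>\mu</math>:

<math display="block">\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}</math>

Because the weight halves with every additional bit of description length, the sum is dominated by the simplest environments. This illustrates why such suites tend to favor short pattern-matching problems. <math>K</math> is also uncomputable, so any practical test must substitute a computable approximation or restriction.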

== Exams ==
According to OpenAI, in 2023 GPT-4 achieved high scores on several standardized and professional examinations, including around the 90th percentile on the Uniform Bar Exam, the 89th percentile on the mathematics section of the SAT, the 93rd percentile on SAT Reading and Writing, the 54th percentile on the analytical writing section of the GRE, the 88th percentile on GRE quantitative reasoning, and the 99th percentile on GRE verbal reasoning. OpenAI also reported that GPT-4 scored in the 99th to 100th percentile on the 2020 USA Biology Olympiad semifinal exam and earned top scores on several AP exams.<ref name="insider">{{cite news |last1=Varanasi |first1=Lakshmi |title=AI models like ChatGPT and GPT-4 are acing everything from the bar exam to AP Biology. Here's a list of difficult exams both AI versions have passed |url=https://www.businessinsider.com/list-here-are-the-exams-chatgpt-has-passed-so-far-2023-1 |access-date=22 June 2023 |work=Business Insider |date=March 2023}}</ref>

Independent researchers found in 2023 that ChatGPT based on GPT-3.5 performed "at or near the passing threshold" on all three parts of the United States Medical Licensing Examination (USMLE), suggesting that large language models could reach passing-level performance on some medical knowledge assessments even without domain-specific fine-tuning.<ref>{{cite journal |last1=Kung |first1=Tony H. |last2=Cheatham |first2=Morgan |last3=Medinilla |first3=Arielle |last4=Sillos |first4=Czarina |last5=Leon |first5=Lorie de |last6=Elepaño |first6=Camille |last7=Madriaga |first7=Maria |last8=Aggabao |first8=Rimsha |last9=Diaz-Candido |first9=Giezel |last10=Maningo |first10=James |last11=Tseng |first11=Vera |title=Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models |journal=PLOS Digital Health |volume=2 |issue=2 |article-number=e0000198 |year=2023 |doi=10.1371/journal.pdig.0000198 |doi-access=free|pmid=36812645 |pmc=9931230}}</ref> GPT-3.5 was also reported to attain a low but passing grade on examinations for four law school courses at the University of Minnesota.<ref name="insider" />

Further studies reported that GPT-4 passed a text-based radiology board-style examination.<ref>{{cite journal |last1=Bhayana |first1=Rajesh |last2=Bleakney |first2=Robert R. |last3=Krishna |first3=Satheesh |title=GPT-4 in Radiology: Improvements in Advanced Reasoning |journal=Radiology |date=1 June 2023 |volume=307 |issue=5 |article-number=e230987 |doi=10.1148/radiol.230987 |pmid=37191491 |s2cid=258716171 |url=https://pubs.rsna.org/doi/10.1148/radiol.230987 |url-access=subscription}}</ref> Later radiology studies in 2024–2025 continued to find strong performance by newer models on exam-style questions, including image-based and student radiology examinations, while also noting persistent weaknesses and variation by task type.<ref>{{cite journal |last1=Gotta |first1=Jan |last2=Brendlin |first2=Anna |last3=Hempel |first3=Jan Matthias |last4=Weikert |first4=Thomas |last5=Gassenmaier |first5=Thomas |title=Large language models (LLMs) in radiology exams for medical students: results of GPT-3.5, GPT-4, Perplexity and Bing |journal=European Radiology |year=2025 |volume=197 |issue=9 |pages=1057–1067 |doi=10.1007/s00330-024-11218-0 |pmid=39496293}}</ref><ref>{{cite journal |last1=Sun |first1=Shenghan |last2=Yoon |first2=Hye Mi |last3=Lee |first3=Jin Woo |last4=Kim |first4=Jin Mo |title=Large Language Models with Vision on Diagnostic Radiology Board Examination Questions |journal=Academic Radiology |year=2025 |volume=32 |issue=5 |pages=3096–3102 |doi=10.1016/j.acra.2024.11.028 |pmid=39632215|doi-access=free }}</ref>

By 2025, comparative studies found substantial variation in medical-exam performance across models rather than a uniform "passing" level. A 2025 benchmarking study on publicly available USMLE sample questions reported that newer models such as ChatGPT and DeepSeek outperformed some rivals, but also made distinct errors and still showed limitations in clinical reasoning and domain-specific understanding.<ref>{{cite journal |last1=Siam |first1=Md Kamrul |last2=Islam |first2=M. Mahmudul |last3=Ahmed |first3=Shadman |last4=Rahman |first4=Md Mizanur |title=Benchmarking large language models on the United States medical licensing examination for clinical reasoning and medical licensing scenarios |journal=Scientific Reports |year=2025 |volume=16 |issue=1 |article-number=1387 |doi=10.1038/s41598-025-31010-4 |pmid=41339739 |pmc=12796295 }}</ref>

Newer legal benchmarks published in 2025 likewise suggested that exam performance remained uneven. The LEXam benchmark, built from 340 law exams across 116 law school courses, found that long-form legal reasoning remained challenging for contemporary large language models, especially on open-ended questions requiring structured, multi-step analysis.<ref>{{cite arXiv |eprint=2505.12864 |title=LEXam: Benchmarking Legal Reasoning on 340 Law Exams |date=2025 |class=cs.CL |last1=Fan |first1=Yu |last2=Ni |first2=Jingwei |last3=Merane |first3=Jakob |last4=Salimbeni |first4=Etienne |last5=Tian |first5=Yang |last6=Hermstrüwer |first6=Yoan |last7=Huang |first7=Yinya |last8=Akhtar |first8=Mubashara |last9=Geering |first9=Florian |last10=Dreyer |first10=Oliver |last11=Brunner |first11=Daniel |last12=Leippold |first12=Markus |last13=Sachan |first13=Mrinmaya |last14=Stremitzer |first14=Alexander |last15=Engel |first15=Christoph |last16=Ash |first16=Elliott |last17=Niklaus |first17=Joel}}</ref>

By 2026, broader work on expert-level academic testing emphasized that many older benchmarks and exam-style tasks were becoming saturated. A 2026 ''Nature'' paper introducing ''Humanity's Last Exam'' argued that state-of-the-art systems had surpassed 90% accuracy on several popular benchmarks, while still showing low accuracy on a more difficult benchmark designed to test the frontier of expert human knowledge.<ref>{{cite journal |last1=Phan |first1=Long |last2=Gatti |first2=Alice |last3=Li |first3=Nathaniel |last4=Khoja |first4=Adam |last5=Kim |first5=Ryan |last6=Ren |first6=Richard |last7=Hausenloy |first7=Jason |last8=Zhang |first8=Oliver |last9=Mazeika |first9=Mantas |last10=Hendrycks |first10=Dan |last11=Han |first11=Ziwen |last12=Hu |first12=Josephina |last13=Zhang |first13=Hugh |last14=Zhang |first14=Chen Bo Calvin |last15=Shaaban |first15=Mohamed |last16=Ling |first16=John |last17=Shi |first17=Sean |last18=Choi |first18=Michael |last19=Agrawal |first19=Anish |last20=Chopra |first20=Arnav |last21=Nattanmai |first21=Aakaash |last22=McKellips |first22=Gordon |last23=Cheraku |first23=Anish |last24=Suhail |first24=Asim |last25=Luo |first25=Ethan |last26=Deng |first26=Marvin |last27=Luo |first27=Jason |last28=Zhang |first28=Ashley |last29=Jindel |first29=Kavin |last30=Paek |first30=Jay |display-authors=1 |title=A benchmark of expert-level academic questions to assess AI capabilities |journal=Nature |year=2026 |volume=649 |issue=8099 |pages=1139–1146 |doi=10.1038/s41586-025-09962-4 |pmid=41606155 |pmc=12851929 |bibcode=2026Natur.649.1139C }}</ref> Stanford HAI also cautioned in 2025 that benchmark and exam performance should not be treated as equivalent to reliable real-world performance or trustworthy decision-making.<ref>{{cite web |title=Validating Claims About AI: A Policymaker's Guide |url=https://hai.stanford.edu/policy/validating-claims-about-ai-a-policymakers-guide |website=Stanford HAI |date=2025-09-24 |access-date=2026-03-20}}</ref>

== Competitions ==
{{Main|Competitions and prizes in artificial intelligence}}

Many competitions and prizes, such as the ImageNet Challenge, promote research in artificial intelligence. The most common areas of competition include general machine intelligence, conversational behavior, data mining, robotic cars, and robot soccer, as well as conventional games.<ref>{{Cite web|title=ILSVRC2017|url=http://image-net.org/challenges/LSVRC/2017/|access-date=2018-11-06|website=image-net.org|language=en|archive-date=2018-11-02|archive-url=https://web.archive.org/web/20181102131747/http://www.image-net.org/challenges/LSVRC/2017/|url-status=live}}</ref>

==Past and current predictions==
An expert poll around 2016, conducted by Katja Grace of the Future of Humanity Institute and associates, gave median estimates of 3 years for championship ''Angry Birds'', 4 years for the World Series of Poker, and 6 years for ''StarCraft''. On more subjective tasks, the poll gave 6 years for folding laundry as well as an average human worker, 7–10 years for expertly answering 'easily Googleable' questions, 8 years for average speech transcription, 9 years for average telephone banking, and 11 years for expert songwriting, but over 30 years for writing a ''New York Times'' bestseller or winning the Putnam math competition.<ref name=bbc/><ref name=ns/><ref name=grace/>

Subsequent developments in the late 2010s and early 2020s showed rapid progress in several benchmark tasks, particularly in games and structured problem domains. Systems such as AlphaGo, AlphaZero, and later large language models achieved or exceeded human-level performance on a range of established benchmarks.<ref>{{cite arXiv|last1=Silver|first1=David|last2=Hubert|first2=Thomas|last3=Schrittwieser|first3=Julian|title=Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm|class=cs.AI |year=2017|eprint=1712.01815}}</ref><ref>{{cite journal|last1=Gibney|first1=Elizabeth|title=Google AI algorithm masters ancient game of Go|journal=Nature|date=28 January 2016|volume=529|issue=7587|pages=445–446|doi=10.1038/529445a |pmid=26819021 |bibcode=2016Natur.529..445G }}</ref><ref>{{cite journal|last1=Mallik |first1=Anik |last2=Wang |first2=Haoxin |last3=Xie |first3=Jiang |last4=Chen |first4=Dawei |last5=Han |first5=Kyungtae |title=The AI Index 2023 Annual Report|journal=Stanford Institute for Human-Centered Artificial Intelligence|year=2023|arxiv=2303.01509 }}</ref>

At the same time, researchers have noted that performance on narrow benchmarks can saturate as systems are optimized for specific tasks, and that success on such evaluations does not necessarily generalize to broader forms of intelligence.<ref>{{cite journal|last1=Bowman|first1=Samuel R.|title=Challenges in Measuring Progress in Natural Language Understanding|journal=Communications of the ACM|year=2022|volume=65|issue=1|pages=60–68|doi=10.1145/3491209}}</ref>

===Chess===
In 1988, the chess program Deep Thought became the first AI to defeat a grandmaster in a regulation tournament game; its successor, Deep Blue, beat the reigning human world chess champion in 1997 (see Deep Blue versus Garry Kasparov).<ref>{{cite news|last1=McClain|first1=Dylan Loeb|title=Bent Larsen, Chess Grandmaster, Dies at 75|url=https://www.nytimes.com/2010/09/11/world/americas/11larsen.html|access-date=31 January 2018|work=The New York Times|date=11 September 2010|archive-date=25 March 2014|archive-url=https://archive.today/20140325101752/http://www.nytimes.com/2010/09/11/world/americas/11larsen.html|url-status=live}}</ref>

By the 2010s, chess engines running on consumer hardware had surpassed top human players by a wide margin. Neural-network-based systems such as AlphaZero demonstrated that superhuman performance could be achieved through reinforcement learning from self-play without reliance on human expert data.<ref>{{cite arXiv|last1=Silver|first1=David|last2=Hubert|first2=Thomas|last3=Schrittwieser|first3=Julian|title=Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm|class=cs.AI |year=2017|eprint=1712.01815}}</ref> Modern engines are widely used in preparation and analysis, and unaided human play is no longer competitive with top computer systems.

{| class="wikitable"
|+ style="text-align: left;" | Estimates when computers would exceed humans at Chess
|-
! scope="col" | Year prediction made !! scope="col" | Predicted year !! scope="col" | Number of years !! scope="col" | Predictor !! scope="col" | Contemporaneous source
|-
! scope="row" | 1957
| 1967 or sooner || 10 or less || Herbert A. Simon, economist<ref>{{cite news|title=The Business of Artificial Intelligence|url=https://hbr.org/cover-story/2017/07/the-business-of-artificial-intelligence|access-date=31 January 2018|work=Harvard Business Review|date=18 July 2017|language=en|archive-date=18 January 2018|archive-url=https://web.archive.org/web/20180118021138/https://hbr.org/cover-story/2017/07/the-business-of-artificial-intelligence|url-status=live}}</ref> ||
|-
! scope="row" | 1990
| 2000 or sooner || 10 or less || Ray Kurzweil, futurist || ''Age of Intelligent Machines''<ref>{{cite news|title=4 Crazy Predictions About the Future of Art|url=https://www.inc.com/kevin-j-ryan/ray-kurzweil-future-of-storytelling-sxsw-2017.html|access-date=31 January 2018|work=Inc.com|date=2017|language=en|archive-date=12 September 2017|archive-url=https://web.archive.org/web/20170912004046/https://www.inc.com/kevin-j-ryan/ray-kurzweil-future-of-storytelling-sxsw-2017.html|url-status=live}}</ref>
|}

===Go===
AlphaGo defeated European Go champion Fan Hui in October 2015 and, in March 2016, Lee Sedol, one of the world's top players (see AlphaGo versus Lee Sedol). According to Scientific American and other sources, most observers had expected superhuman computer Go performance to be at least a decade away.<ref>{{cite news|last1=Koch|first1=Christof|title=How the Computer Beat the Go Master|url=https://www.scientificamerican.com/article/how-the-computer-beat-the-go-master/|access-date=31 January 2018|work=Scientific American|date=2016|language=en|archive-date=6 September 2017|archive-url=https://web.archive.org/web/20170906224946/https://www.scientificamerican.com/article/how-the-computer-beat-the-go-master/|url-status=live}}</ref><ref>{{cite news|title='I'm in shock!' How an AI beat the world's best human at Go|url=https://www.newscientist.com/article/2079871-im-in-shock-how-an-ai-beat-the-worlds-best-human-at-go/|access-date=31 January 2018|work=New Scientist|date=2016|archive-date=13 May 2016|archive-url=https://web.archive.org/web/20160513193612/https://www.newscientist.com/article/2079871-im-in-shock-how-an-ai-beat-the-worlds-best-human-at-go/|url-status=live}}</ref><ref>{{cite news|last1=Moyer|first1=Christopher|title=How Google's AlphaGo Beat a Go World Champion|url=https://www.theatlantic.com/technology/archive/2016/03/the-invisible-opponent/475611/|access-date=31 January 2018|work=The Atlantic|date=2016|archive-date=31 January 2018|archive-url=https://web.archive.org/web/20180131200838/https://www.theatlantic.com/technology/archive/2016/03/the-invisible-opponent/475611/|url-status=live}}</ref>

Subsequent systems such as AlphaGo Zero and AlphaZero demonstrated that superhuman performance could be achieved without human training data, using reinforcement learning from self-play.<ref>{{cite journal|last1=Silver|first1=David|last2=Schrittwieser|first2=Julian|last3=Simonyan|first3=Karen|title=Mastering the game of Go without human knowledge|journal=Nature|volume=550|pages=354–359|year=2017|issue=7676 |doi=10.1038/nature24270 |pmid=29052630 |bibcode=2017Natur.550..354S }}</ref> By the late 2010s, computer Go programs had surpassed human champions by a substantial margin, and Go ceased to be a primary frontier benchmark for AI research.

{| class="wikitable"
|+ style="text-align: left;" | Estimates when computers would exceed humans at Go
|-
! scope="col" | Year prediction made !! scope="col" | Predicted year !! scope="col" | Number of years !! scope="col" | Predictor !! scope="col" | Affiliation !! scope="col" | Contemporaneous source
|-
! scope="row" | 1997
| 2100 or later || 103 or more || Piet Hut, physicist and Go fan || Institute for Advanced Study || ''New York Times''<ref>{{cite news|last1=Johnson|first1=George|title=To Test a Powerful Computer, Play an Ancient Game|url=https://www.nytimes.com/1997/07/29/science/to-test-a-powerful-computer-play-an-ancient-game.html?pagewanted=all|access-date=31 January 2018|work=The New York Times|date=29 July 1997|archive-date=31 January 2018|archive-url=https://web.archive.org/web/20180131140930/http://www.nytimes.com/1997/07/29/science/to-test-a-powerful-computer-play-an-ancient-game.html?pagewanted=all|url-status=live}}</ref><ref>{{cite news|last1=Johnson|first1=George|title=To Beat Go Champion, Google's Program Needed a Human Army|url=https://www.nytimes.com/2016/04/05/science/google-alphago-artificial-intelligence.html|access-date=31 January 2018|work=The New York Times|date=4 April 2016|archive-date=31 January 2018|archive-url=https://web.archive.org/web/20180131200740/https://www.nytimes.com/2016/04/05/science/google-alphago-artificial-intelligence.html|url-status=live}}</ref>
|-
! scope="row" | 2007
| 2017 or sooner || 10 or less || Feng-Hsiung Hsu, Deep Blue lead || Microsoft Research Asia || ''IEEE Spectrum''<ref>{{cite news|title=Cracking GO|url=https://spectrum.ieee.org/cracking-go|access-date=31 January 2018|work=IEEE Spectrum: Technology, Engineering, and Science News|date=2007|language=en|archive-date=31 January 2018|archive-url=https://web.archive.org/web/20180131202330/https://spectrum.ieee.org/computing/software/cracking-go|url-status=live}}</ref><ref name="wired go"/>
|-
! scope="row" | 2014
| 2024 || 10 || Rémi Coulom, computer Go programmer || Crazy Stone || ''Wired''<ref name="wired go">{{cite magazine|title=The Mystery of Go, the Ancient Game That Computers Still Can't Win|url=https://www.wired.com/2014/05/the-world-of-computer-go/|access-date=31 January 2018|magazine=WIRED|date=2014|archive-date=31 January 2016|archive-url=https://web.archive.org/web/20160131132350/http://www.wired.com/2014/05/the-world-of-computer-go/|url-status=live}}</ref><ref>{{cite journal|last1=Gibney|first1=Elizabeth|title=Google AI algorithm masters ancient game of Go|journal=Nature|date=28 January 2016|volume=529|issue=7587|pages=445–446|language=en|doi=10.1038/529445a|pmid=26819021 |bibcode=2016Natur.529..445G|s2cid=4460235 |doi-access=free}}</ref>
|}

===Human-level artificial general intelligence (AGI)===
AI pioneer and economist Herbert A. Simon inaccurately predicted in 1965: "Machines will be capable, within twenty years, of doing any work a man can do". Similarly, in 1970 Marvin Minsky wrote that "Within a generation... the problem of creating artificial intelligence will substantially be solved."<ref name=superintelligence>{{cite book|last1=Bostrom|first1=Nick|title=Superintelligence|date=2013|publisher=Oxford University Press|location=Oxford|isbn=978-0-19-967811-2|language=en|title-link=Superintelligence (book)}}</ref>

Four polls conducted in 2012 and 2013 suggested that the median estimate among experts for when AGI would arrive was 2040 to 2050, depending on the poll.<ref name=newyorker>{{cite magazine|last1=Khatchadourian|first1=Raffi|title=The Doomsday Invention|url=https://www.newyorker.com/magazine/2015/11/23/doomsday-invention-artificial-intelligence-nick-bostrom|access-date=31 January 2018|magazine=The New Yorker|date=16 November 2015|archive-date=29 April 2019|archive-url=https://web.archive.org/web/20190429183807/https://www.newyorker.com/magazine/2015/11/23/doomsday-invention-artificial-intelligence-nick-bostrom|url-status=live}}</ref><ref>{{cite book|last1=Müller|first1=Vincent C.|last2=Bostrom|first2=Nick|chapter=Future progress in artificial intelligence: A survey of expert opinion|title=Fundamental Issues of Artificial Intelligence|series=Synthese Library |publisher=Springer|location=Cham|year=2016|volume=376 |pages=555–572|doi=10.1007/978-3-319-26485-1_33 |isbn=978-3-319-26483-7 }}</ref>

The Grace poll around 2016 found that results varied depending on how the question was framed. Respondents asked to estimate "when unaided machines can accomplish every task better and more cheaply than human workers" gave an aggregated median answer of 45 years and a 10% chance of it occurring within 9 years. Other respondents, asked to estimate "when all occupations are fully automatable. That is, when for any occupation, machines could be built to carry out the task better and more cheaply than human workers", gave a median of 122 years and a 10% probability within 20 years. The median response for when the occupation of "AI researcher" could be fully automated was around 90 years. No link was found between seniority and optimism, but Asian researchers were on average much more optimistic than North American researchers: Asian respondents predicted 30 years on average for "accomplish every task", compared with 74 years for North American respondents.<ref name=bbc>{{cite news|last1=Gray|first1=Richard|title=How long will it take for your job to be automated?|url=http://www.bbc.com/capital/story/20170619-how-long-will-it-take-for-your-job-to-be-automated|access-date=31 January 2018|work=BBC|date=2018|language=en|archive-date=11 January 2018|archive-url=https://web.archive.org/web/20180111134529/http://www.bbc.com/capital/story/20170619-how-long-will-it-take-for-your-job-to-be-automated|url-status=live}}</ref><ref name=ns>{{cite news|title=AI will be able to beat us at everything by 2060, say experts|url=https://www.newscientist.com/article/2133188-ai-will-be-able-to-beat-us-at-everything-by-2060-say-experts/|access-date=31 January 2018|work=New Scientist|date=2018|archive-date=31 January 2018|archive-url=https://web.archive.org/web/20180131202306/https://www.newscientist.com/article/2133188-ai-will-be-able-to-beat-us-at-everything-by-2060-say-experts/|url-status=live}}</ref><ref name=grace>{{cite 
arXiv|last1=Grace|first1=Katja|last2=Salvatier|first2=John|last3=Dafoe|first3=Allan|last4=Zhang|first4=Baobao|last5=Evans|first5=Owain|title=When Will AI Exceed Human Performance? Evidence from AI Experts|eprint=1705.08807|class=cs.AI|year=2017}}</ref>

A larger survey of 2,778 researchers who had published in top AI venues, fielded in 2023 and published in 2025, found shorter timelines for what it called "high-level machine intelligence". In that survey, the aggregate forecast assigned a 10% chance to unaided machines outperforming humans at every task by 2027 and a 50% chance by 2047. The same survey estimated that the full automation of all human occupations would reach a 10% probability by 2037 and a 50% probability by 2116.<ref>{{cite journal|last1=Grace|first1=Katja|last2=Thomas|first2=Stephen|last3=Stein-Perlman|first3=Zach|last4=Brauner|first4=Jan|last5=Korzekwa|first5=Richard C.|title=Thousands of AI Authors on the Future of AI|journal=Journal of Artificial Intelligence Research|volume=82|pages=1–58|year=2025|doi=10.1613/jair.1.19087|doi-access=free}}</ref>

Despite increasingly short timelines in some surveys, there was still no consensus in late 2025 and early 2026 that AGI was imminent. In Stanford HAI's predictions for 2026, co-director James Landay said: "there will be no AGI this year".<ref>{{cite web|title=Stanford AI Experts Predict What Will Happen in 2026|url=https://hai.stanford.edu/news/stanford-ai-experts-predict-what-will-happen-in-2026|website=Stanford HAI|date=15 December 2025|access-date=20 March 2026}}</ref>

{| class="wikitable"
|+ style="text-align: left;" | Estimates of when AGI will arrive
|-
! scope="col" | Year prediction made !! scope="col" | Predicted year !! scope="col" | Number of years !! scope="col" | Predictor !! scope="col" | Contemporaneous source
|-
! scope="row" | 1965
| 1985 or sooner || 20 or less || Herbert A. Simon || ''The shape of automation for men and management''<ref name=superintelligence/><ref>{{cite book|last1=Muehlhauser|first1=Luke|last2=Salamon|first2=Anna|chapter=Intelligence explosion: Evidence and import|title=Singularity Hypotheses|series=The Frontiers Collection |publisher=Springer|location=Berlin, Heidelberg|year=2012|pages=15–42|doi=10.1007/978-3-642-32560-1_2 |isbn=978-3-642-32559-5 }}</ref>
|-
! scope="row" | 1993
| 2023 or sooner || 30 or less || Vernor Vinge, science fiction writer || "The Coming Technological Singularity"<ref>{{cite news|last1=Tierney|first1=John|title=Vernor Vinge's View of the Future - Is Technology That Outthinks Us a Partner or a Master ?|url=https://www.nytimes.com/2008/08/26/science/26tier.html|access-date=31 January 2018|work=The New York Times|date=25 August 2008|archive-date=24 December 2017|archive-url=https://web.archive.org/web/20171224122936/http://www.nytimes.com/2008/08/26/science/26tier.html|url-status=live}}</ref>
|-
! scope="row" | 1995
| 2040 or sooner || 45 or less || Hans Moravec, robotics researcher || ''Wired''<ref>{{cite magazine|title=Superhumanism|url=https://www.wired.com/1995/10/moravec/|access-date=31 January 2018|magazine=Wired|date=1995|archive-date=2 September 2017|archive-url=https://web.archive.org/web/20170902005717/https://www.wired.com/1995/10/moravec/|url-status=live}}</ref>
|-
! scope="row" | 2008
| Never / Distant future<ref group=note>''IEEE Spectrum'' attributes to Moore both "Never" and "I don't believe this kind of thing is likely to happen, at least for a long time"</ref> || || Gordon E. Moore, inventor of Moore's Law || ''IEEE Spectrum''<ref>{{cite news|title=Tech Luminaries Address Singularity|url=https://spectrum.ieee.org/tech-luminaries-address-singularity|access-date=31 January 2018|work=IEEE Spectrum|date=2008|language=en|archive-date=30 April 2019|archive-url=https://web.archive.org/web/20190430150019/https://spectrum.ieee.org/computing/hardware/tech-luminaries-address-singularity|url-status=live}}</ref>
|-
! scope="row" | 2017
| 2029 || 12 || Ray Kurzweil || Interview<ref>{{cite news|last1=Molloy|first1=Mark|title=Expert predicts date when 'sexier and funnier' humans will merge with AI machines|url=https://www.telegraph.co.uk/technology/2017/03/17/expert-predicts-date-sexier-funnier-humans-will-merge-ai-machines/|access-date=31 January 2018|work=The Telegraph|date=17 March 2017|archive-date=31 January 2018|archive-url=https://web.archive.org/web/20180131225240/http://www.telegraph.co.uk/technology/2017/03/17/expert-predicts-date-sexier-funnier-humans-will-merge-ai-machines/|url-status=live}}</ref>
|}

==References==
{{Reflist|30em}}

==Notes==
{{Reflist|group=note}}

==External links==
* [https://aiimpacts.org/miri-ai-predictions-dataset/ MIRI database of predictions about AGI]

[[Category:Artificial intelligence]]
[[Category:Progress]]