Difference between revisions of "AI and Humans"
KevinYager (talk | contribs) (→See Also) |
KevinYager (talk | contribs) (→Human Sentiment towards AI) |
(48 intermediate revisions by the same user not shown) | |||
Line 26: | Line 26: | ||
* [https://www.deeplearning.ai/the-batch/gpt-4-boosts-remote-tutors-performance-in-real-time-study-finds/ LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds] | * [https://www.deeplearning.ai/the-batch/gpt-4-boosts-remote-tutors-performance-in-real-time-study-finds/ LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds] | ||
** [https://arxiv.org/abs/2410.03017 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise] | ** [https://arxiv.org/abs/2410.03017 Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise] | ||
+ | * 2025-06: Gallup & The Walton Family Foundation: [https://www.gallup.com/file/analytics/691922/Walton-Family-Foundation-Gallup-Teachers-AI-Report.pdf Teaching for Tomorrow: Unlocking Six Weeks a Year With AI] | ||
==AI harms learning== | ==AI harms learning== | ||
Line 98: | Line 99: | ||
** LLMs can be creative | ** LLMs can be creative | ||
* 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments] | * 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments] | ||
+ | * 2025-06: [https://arxiv.org/abs/2506.00794 Predicting Empirical AI Research Outcomes with Language Models] | ||
===Art=== | ===Art=== | ||
Line 136: | Line 138: | ||
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence] | * 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence] | ||
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models] | * 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models] | ||
+ | * 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.13.25329541v1 Automation of Systematic Reviews with Large Language Models] | ||
+ | * 2025-06: [https://microsoft.ai/new/the-path-to-medical-superintelligence/ The Path to Medical Superintelligence] | ||
+ | * 2025-08: [https://www.nature.com/articles/s41591-025-03888-0?utm_source=chatgpt.com A personal health large language model for sleep and fitness coaching] | ||
+ | * 2025-08: [https://arxiv.org/abs/2508.08224 Capabilities of GPT-5 on Multimodal Medical Reasoning] | ||
====Bio==== | ====Bio==== | ||
Line 148: | Line 154: | ||
====Financial==== | ====Financial==== | ||
* 2024-07: [https://arxiv.org/abs/2407.17866 Financial Statement Analysis with Large Language Models] | * 2024-07: [https://arxiv.org/abs/2407.17866 Financial Statement Analysis with Large Language Models] | ||
+ | |||
+ | ====HR==== | ||
+ | * 2025-08: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5395709 Voice AI in Firms: A Natural Field Experiment on Automated Job Interviews] | ||
==AI improves human work== | ==AI improves human work== | ||
Line 158: | Line 167: | ||
* 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise] | * 2025-03: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5188231 The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise] | ||
** 2025-03: Ethan Mollick: [https://www.oneusefulthing.org/p/the-cybernetic-teammate The Cybernetic Teammate]: Having an AI on your team can increase performance, provide expertise, and improve your experience | ** 2025-03: Ethan Mollick: [https://www.oneusefulthing.org/p/the-cybernetic-teammate The Cybernetic Teammate]: Having an AI on your team can increase performance, provide expertise, and improve your experience | ||
+ | * 2025-09: [https://osf.io/preprints/psyarxiv/vbkmt_v1 Quantifying Human-AI Synergy] | ||
+ | * 2025-10: [https://arxiv.org/abs/2510.12049 Generative AI and Firm Productivity: Field Experiments in Online Retail] | ||
===Coding=== | ===Coding=== | ||
Line 163: | Line 174: | ||
* 2024-09: Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566 ] | * 2024-09: Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566 ] | ||
* 2024-11: Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084] | * 2024-11: Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084] | ||
+ | * 2025-09: [https://arxiv.org/abs/2509.19708 Intuition to Evidence: Measuring AI's True Impact on Developer Productivity] | ||
===Forecasting=== | ===Forecasting=== | ||
Line 177: | Line 189: | ||
* 2025-03: [https://journals.lww.com/international-journal-of-surgery/fulltext/2025/03000/chatgpt_s_role_in_alleviating_anxiety_in_total.20.aspx ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study] | * 2025-03: [https://journals.lww.com/international-journal-of-surgery/fulltext/2025/03000/chatgpt_s_role_in_alleviating_anxiety_in_total.20.aspx ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study] | ||
* 2025-05: [https://openai.com/index/healthbench/ Introducing HealthBench] | * 2025-05: [https://openai.com/index/healthbench/ Introducing HealthBench] | ||
+ | * 2025-06: [https://www.medrxiv.org/content/10.1101/2025.06.07.25329176v1 From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis] | ||
+ | * 2025-06: [https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-025-07414-1 Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study] | ||
+ | * 2025-07: [https://cdn.openai.com/pdf/a794887b-5a77-4207-bb62-e52c900463f1/penda_paper.pdf AI-based Clinical Decision Support for Primary Care: A Real-World Study] ([https://openai.com/index/ai-clinical-copilot-penda-health/ blog]) | ||
+ | * 2025-07: [https://arxiv.org/abs/2507.15743 Towards physician-centered oversight of conversational diagnostic AI] | ||
===Translation=== | ===Translation=== | ||
Line 186: | Line 202: | ||
===Creativity=== | ===Creativity=== | ||
* See also: [[AI creativity]] | * See also: [[AI creativity]] | ||
+ | * 2024-02: [https://arxiv.org/abs/2402.01727 Prompting Diverse Ideas: Increasing AI Idea Variance] | ||
* 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content] | * 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content] | ||
* 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity] | * 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity] | ||
Line 196: | Line 213: | ||
* 2024-12: [https://doi.org/10.1080/10400419.2024.2440691 Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality?] ([https://osf.io/preprints/psyarxiv/ygzw6 preprint]): shows that more creative humans produce more creative genAI outputs | * 2024-12: [https://doi.org/10.1080/10400419.2024.2440691 Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality?] ([https://osf.io/preprints/psyarxiv/ygzw6 preprint]): shows that more creative humans produce more creative genAI outputs | ||
* 2025-01: [https://arxiv.org/abs/2501.11433 One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor] | * 2025-01: [https://arxiv.org/abs/2501.11433 One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor] | ||
+ | * 2025-05: [https://arxiv.org/abs/2505.17241 Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis] | ||
===Equity=== | ===Equity=== | ||
* 2025-01: [https://ai.nejm.org/doi/full/10.1056/AIp2400889 Using Large Language Models to Promote Health Equity] | * 2025-01: [https://ai.nejm.org/doi/full/10.1056/AIp2400889 Using Large Language Models to Promote Health Equity] | ||
− | |||
− | |||
− | |||
− | |||
==AI worse than humans== | ==AI worse than humans== | ||
Line 208: | Line 222: | ||
* 2025-04: [https://arxiv.org/abs/2504.18919 Clinical knowledge in LLMs does not translate to human interactions] | * 2025-04: [https://arxiv.org/abs/2504.18919 Clinical knowledge in LLMs does not translate to human interactions] | ||
* 2025-05: [https://royalsocietypublishing.org/doi/10.1098/rsos.241776 Generalization bias in large language model summarization of scientific research] | * 2025-05: [https://royalsocietypublishing.org/doi/10.1098/rsos.241776 Generalization bias in large language model summarization of scientific research] | ||
+ | |||
+ | ==AI lowers human productivity== | ||
+ | * 2025-07: METR: [https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity] ([https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ blog], [https://secondthoughts.ai/p/ai-coding-slowdown commentary/analysis]) | ||
==Human Perceptions of AI== | ==Human Perceptions of AI== | ||
Line 226: | Line 243: | ||
* 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?] Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made. | * 2024-11: [https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-art-turing How Did You Do On The AI Art Turing Test?] Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made. | ||
* 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably] | * 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably] | ||
− | |||
− | |||
− | |||
− | |||
− | |||
=Uptake= | =Uptake= | ||
Line 254: | Line 266: | ||
** US worker usage of AI is increasing rapidly: 30% in 2024-12; 40% in 2025-05 | ** US worker usage of AI is increasing rapidly: 30% in 2024-12; 40% in 2025-05 | ||
* 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence] | * 2025-05: [https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf Trends – Artificial Intelligence] | ||
+ | * 2025-06: [https://arxiv.org/abs/2506.08945 Who is using AI to code? Global diffusion and impact of generative AI] | ||
+ | * 2025-06: [https://www.iconiqcapital.com/growth/reports/2025-state-of-ai 2025 State of AI Report: The Builder’s Playbook] A Practical Roadmap for AI Innovation | ||
+ | * 2025-07: Epoch AI: [https://epochai.substack.com/p/after-the-chatgpt-moment-measuring After the ChatGPT Moment: Measuring AI’s Adoption] How quickly has AI been diffusing through the economy? | ||
+ | * 2025-06: Pew Research: [https://www.pewresearch.org/short-reads/2025/06/25/34-of-us-adults-have-used-chatgpt-about-double-the-share-in-2023/ 34% of U.S. adults have used ChatGPT, about double the share in 2023] | ||
==Usage For== | ==Usage For== | ||
Line 259: | Line 275: | ||
* 2025-03: [https://learn.filtered.com/hubfs/The%202025%20Top-100%20Gen%20AI%20Use%20Case%20Report.pdf How People are Really Using Generative AI Now] ([https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025 writeup]) | * 2025-03: [https://learn.filtered.com/hubfs/The%202025%20Top-100%20Gen%20AI%20Use%20Case%20Report.pdf How People are Really Using Generative AI Now] ([https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025 writeup]) | ||
* 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude] | * 2025-04: [https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude Anthropic Education Report: How University Students Use Claude] | ||
+ | * 2025-09: [https://www.anthropic.com/research/economic-index-geography Anthropic Economic Index: Tracking AI's role in the US and global economy] | ||
+ | * 2025-09: [https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf How People Use ChatGPT] (OpenAI) | ||
==Hiding Usage== | ==Hiding Usage== | ||
* 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910 Underreporting of AI use: The role of social desirability bias] | * 2025-05: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5232910 Underreporting of AI use: The role of social desirability bias] | ||
− | =Sentiment= | + | =Societal Effects/Transformations= |
+ | * 2025-09: [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5425555 Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data] | ||
+ | |||
+ | =Psychological Impact= | ||
+ | * 2025-08: [https://arxiv.org/abs/2508.16628 The Impact of Artificial Intelligence on Human Thought] | ||
+ | |||
+ | ==Human Sentiment towards AI== | ||
* 2025-04: Pew Research: [https://www.pewresearch.org/internet/2025/04/03/how-the-us-public-and-ai-experts-view-artificial-intelligence/ How the U.S. Public and AI Experts View Artificial Intelligence] | * 2025-04: Pew Research: [https://www.pewresearch.org/internet/2025/04/03/how-the-us-public-and-ai-experts-view-artificial-intelligence/ How the U.S. Public and AI Experts View Artificial Intelligence] | ||
+ | * 2025-10: Pew Research: [https://www.pewresearch.org/global/2025/10/15/how-people-around-the-world-view-ai/ How People Around the World View AI: More are concerned than excited about its use, and more trust their own country and the EU to regulate it than trust the U.S. or China] | ||
− | =Persuasion= | + | ==AI Persuasion of Humans== |
(AI can update beliefs, change opinions, tackle conspiracy theories, etc.) | (AI can update beliefs, change opinions, tackle conspiracy theories, etc.) | ||
* 2022-11: [https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences] | * 2022-11: [https://arxiv.org/abs/2211.15006 Fine-tuning language models to find agreement among humans with diverse preferences] | ||
Line 276: | Line 301: | ||
** [https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/ Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users] | ** [https://www.404media.co/researchers-secretly-ran-a-massive-unauthorized-ai-persuasion-experiment-on-reddit-users/ Researchers Secretly Ran a Massive, Unauthorized AI Persuasion Experiment on Reddit Users] | ||
* 2025-05: [https://arxiv.org/abs/2505.09662 Large Language Models Are More Persuasive Than Incentivized Human Persuaders] | * 2025-05: [https://arxiv.org/abs/2505.09662 Large Language Models Are More Persuasive Than Incentivized Human Persuaders] | ||
+ | * 2025-07: [https://arxiv.org/abs/2507.13919 The Levers of Political Persuasion with Conversational AI] | ||
+ | |||
+ | ==AI Effects on Human Psychology== | ||
+ | ===Human well-being=== | ||
+ | * 2024-01: [https://www.nature.com/articles/s44184-023-00047-6 Loneliness and suicide mitigation for students using GPT3-enabled chatbots] | ||
+ | * 2025-03: [https://cdn.openai.com/papers/15987609-5f71-433c-9972-e91131f399a1/openai-affective-use-study.pdf Investigating Affective Use and Emotional Well-being on ChatGPT] | ||
+ | * 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study] | ||
+ | |||
+ | ===Counter loneliness=== | ||
+ | * 2024-07: [https://arxiv.org/abs/2407.19096 AI Companions Reduce Loneliness] | ||
+ | * 2025-03: [https://dam-prod2.media.mit.edu/x/2025/03/21/Randomized_Control_Study_on_Chatbot_Psychosocial_Effect.pdf How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study] | ||
+ | * 2025-06: Anthropic: [https://www.anthropic.com/news/how-people-use-claude-for-support-advice-and-companionship How People Use Claude for Support, Advice, and Companionship] | ||
+ | |||
+ | ===Human mental abilities (creativity, learning)=== | ||
+ | * 2025-03: [https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/ The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers] | ||
+ | * 2025-06: [https://arxiv.org/abs/2506.08872 Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task] | ||
=Simulate Humans= | =Simulate Humans= | ||
* See also: [[Human brain]] | * See also: [[Human brain]] | ||
+ | |||
+ | ==Sociology== | ||
* 2021-10: [https://www.doi.org/10.1007/s10588-021-09351-y Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods] | * 2021-10: [https://www.doi.org/10.1007/s10588-021-09351-y Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods] | ||
* 2023-12: Google: [https://arxiv.org/abs/2312.03664 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia] | * 2023-12: Google: [https://arxiv.org/abs/2312.03664 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia] | ||
Line 289: | Line 332: | ||
* 2025-04: [https://www.nber.org/papers/w33662 Measuring Human Leadership Skills with AI Agents] | * 2025-04: [https://www.nber.org/papers/w33662 Measuring Human Leadership Skills with AI Agents] | ||
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users] | * 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users] | ||
+ | * 2025-07: [https://www.nature.com/articles/s41586-025-09215-4 A foundation model to predict and capture human cognition] ([https://marcelbinz.github.io/centaur code]) | ||
+ | * 2025-07: [https://arxiv.org/abs/2507.15815 LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra] | ||
+ | * 2025-09: [https://benjaminmanning.io/files/optimize.pdf General Social Agents] | ||
+ | |||
+ | ==Theory of Mind== | ||
+ | * 2025-08: [https://www.nature.com/articles/s44387-025-00031-9 How large language models encode theory-of-mind: a study on sparse parameter patterns] | ||
+ | * 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents] | ||
+ | |||
+ | ==Humanlike Vibes== | ||
+ | * 2025-07: [https://arxiv.org/abs/2507.20525 The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?] | ||
+ | * 2025-10: [https://arxiv.org/abs/2510.08338 LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings] | ||
+ | |||
+ | ==Skeptical== | ||
+ | * 2025-08: [https://arxiv.org/abs/2508.06950 Large Language Models Do Not Simulate Human Psychology] | ||
=See Also= | =See Also= |
Latest revision as of 09:17, 17 October 2025
AI in Education
Survey/study of
- 2023-08: Perception, performance, and detectability of conversational artificial intelligence across 32 university courses
- 2023-10: Employees secretly using AI at work.
- 2023-10: Survey shows students using AI more than professors.
- 2023-11: ChatGPT has entered the classroom: how LLMs could transform education
- 2025-04: Anthropic Education Report: How University Students Use Claude
- 2025-05: The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis
AI improves learning/education
- Mollick, Ethan R. and Mollick, Lilach and Bach, Natalie and Ciccarelli, LJ and Przystanski, Ben and Ravipinto, Daniel, AI Agents and Education: Simulated Practice at Scale (June 17, 2024). The Wharton School Research Paper. doi: 10.2139/ssrn.4871171
- Can enable personalized education.
- Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors
- GPT4 can out-perform human tutors.
- Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers (August 13, 2024). doi: 10.2139/ssrn.4924786
- Teachers benefit from using AI as a co-pilot to aid in tasks (planning, how to teach topic, explore ideas).
- There is smaller utility in using AI purely as a text-generator (to make quizzes, workbooks, etc.).
- Effective and Scalable Math Support: Evidence on the Impact of an AI-Tutor on Math Achievement in Ghana
- AI Tutoring Outperforms Active Learning
- From chalkboards to chatbots: Transforming learning in Nigeria, one prompt at a time (writeup)
- 6 weeks of after-school AI tutoring = 2 years of typical learning gains
- outperforms 80% of other educational interventions
- AI Meets the Classroom: When Do Large Language Models Harm Learning?
- Outcomes depend on usage
- LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds
- 2025-06: Gallup & The Walton Family Foundation: Teaching for Tomorrow: Unlocking Six Weeks a Year With AI
AI harms learning
- A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study
- Current grading systems cannot detect AI.
- Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, Generative AI Can Harm Learning (July 15, 2024). The Wharton School Research Paper. doi: 10.2139/ssrn.4895486
- Access to ChatGPT harmed math education outcomes.
- 2024-09: AI Meets the Classroom: When Does ChatGPT Harm Learning?
Software/systems
- GPTutor (code)
- EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education
- Eureka Labs (founded by Andrej Karpathy) aims to create AI-driven courses (first course is Intro to LLMs)
LLMs
Individual tools
- Chatbot (OpenAI ChatGPT, Anthropic Claude, Google Gemini)
- NotebookLM: Enables one to "chat with documents".
- Google Learn About
Systems
AI for grading
- Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education (preprint)
Detection
- 2024-06: Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays
- GenAI can simulate student writing in a way that teachers cannot detect.
- AI essays are assessed more positively than student-written.
- Teachers are overconfident in their source identification.
- Both novice and experienced teachers could not identify texts generated by ChatGPT vs. students
- 2025-01: People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text
AI Text Detectors Don't Work
- 2024-05: RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors
- 2024-06: Testing of Detection Tools for AI-Generated Text
AI/human
Capabilities
Writing
- 2022-12: Re3: Generating Longer Stories With Recursive Reprompting and Revision
- 2023-03: English essays: Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay
- 2023-01: Journalism: Collaborating With ChatGPT: Considering the Implications of Generative Artificial Intelligence for Journalism and Media Education
- 2023-07: Science writing: Artificial intelligence in scientific writing: a friend or a foe?
- 2024-02: Wikipedia style: Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
- 2024-02: LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs (code)
- 2024-08: Scientific papers: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
- 2024-09: PaperQA2: Language Models Achieve Superhuman Synthesis of Scientific Knowledge (𝕏 post, code)
- 2025-03: WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
- 2025-03: Learning to Reason for Long-Form Story Generation
AI out-performs humans
Tests
- 2023-07: SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
- 2024-06: A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study
- AI scores higher than median students.
Creativity
- See also: AI creativity
- 2023-07: Ideas Are Dimes A Dozen: Large Language Models For Idea Generation In Innovation
- 2023-09: Best humans still outperform artificial intelligence in a creative divergent thinking task
- Best humans out-perform AI at creativity. (By implication, median humans may not.)
- 2024-02: The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks
- 2024-02: Felin, Teppo and Holweg, Matthias, Theory Is All You Need: AI, Human Cognition, and Causal Reasoning (February 24, 2024). doi: 10.2139/ssrn.4737265
- Argues that human "theory-based" creativity is better than AI "data-based".
- 2024-07: Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
- Top human (professional author) out-performs GPT4.
- 2024-09: Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
- LLMs can be creative
- 2024-09: Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments
- 2025-06: Predicting Empirical AI Research Outcomes with Language Models
Art
- 2024-11: AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably
- 2024-11: How Did You Do On The AI Art Turing Test?
Business & Marketing
- 2023-11: The power of generative marketing: Can generative AI create superhuman visual marketing content?
- 2024-02: Generative Artificial Intelligence and Evaluating Strategic Decisions
Professions
- Humanity's Last Exam
- Effort to build a dataset of challenging (but resolvable) questions in specific domain areas, to act as a benchmark to test whether AIs are improving in these challenging topics.
Coding
Medical
- 2024-03: Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study
- GPT4 improves medical practitioner work; surprisingly, GPT4 alone scored better than a human with GPT4 as aid (on selected tasks).
- 2024-10: Perspectives on Artificial Intelligence–Generated Responses to Patient Messages
- 2024-10: Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial
- Use of ChatGPT does not strongly improve medical expert work; but AI alone out-scores human or human+AI
- 2024-11: Large language models surpass human experts in predicting neuroscience results (writeup: AI can predict neuroscience study results better than human experts, study finds)
- 2024-12: Superhuman performance of a large language model on the reasoning tasks of a physician
- 2024-12: HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
- 2025-02: Media:
- 2025-02: GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial
- 2025-02: Artificial intelligence for individualized treatment of persistent atrial fibrillation: a randomized controlled trial
- Google AI Clinician:
- 2024-01: Towards Conversational Diagnostic AI (blog: Articulate Medical Intelligence Explorer, AMIE)
- 2025-03: Towards Conversational AI for Disease Management (blog)
- 2025-02: Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
- 2025-03: Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
- 2025-04: Comparison of Initial Artificial Intelligence (AI) and Final Physician Recommendations in AI-Assisted Virtual Urgent Care Visits
- 2025-04: Towards conversational diagnostic artificial intelligence
- 2025-04: Towards accurate differential diagnosis with large language models
- 2025-06: Automation of Systematic Reviews with Large Language Models
- 2025-06: The Path to Medical Superintelligence
- 2025-08: A personal health large language model for sleep and fitness coaching
- 2025-08: Capabilities of GPT-5 on Multimodal Medical Reasoning
Bio
Therapy
- 2025-02: When ELIZA meets therapists: A Turing test for the heart and mind
- 2025-03: Therabot: Randomized Trial of a Generative AI Chatbot for Mental Health Treatment
Financial
HR
AI improves human work
- 2023-07: Experimental evidence on the productivity effects of generative artificial intelligence
- 2023-09: Dell'Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality (September 15, 2023). Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper doi: 10.2139/ssrn.4573321
- 2023-11: Generative AI at Work (National Bureau of Economic Research)
- 2023-12: The Uneven Impact of Generative AI on Entrepreneurial Performance (doi: 10.31219/osf.io/hdjpk)
- 2023-12: Artificial Intelligence in the Knowledge Economy: Non-autonomous AI (chatbot) benefits least knowledgeable workers; autonomous agents benefit the most knowledgeable workers
- 2024-07: Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research
- 2025-03: The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise
- 2025-03: Ethan Mollick: Cybernetic Teammate: Having an AI on your team can increase performance, provide expertise, and improve your experience
- 2025-09: Quantifying Human-AI Synergy
- 2025-10: Generative AI and Firm Productivity: Field Experiments in Online Retail
Coding
- 2023-02: The Impact of AI on Developer Productivity: Evidence from GitHub Copilot
- 2024-09: Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers (September 03, 2024). doi: 10.2139/ssrn.4945566
- 2024-11: Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, Generative AI and the Nature of Work (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, doi: 10.2139/ssrn.5007084
- 2025-09: Intuition to Evidence: Measuring AI's True Impact on Developer Productivity
Forecasting
Finance
- 2024-12: AI, Investment Decisions, and Inequality: Novices see improvements in investment performance; sophisticated investors see even greater improvements.
Law
Medical
- 2025-03: Medical Hallucination in Foundation Models and Their Impact on Healthcare
- 2025-03: ChatGPT’s role in alleviating anxiety in total knee arthroplasty consent process: a randomized controlled trial pilot study
- 2025-05: Introducing HealthBench
- 2025-06: From Tool to Teammate: A Randomized Controlled Trial of Clinician-AI Collaborative Workflows for Diagnosis
- 2025-06: Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study
- 2025-07: AI-based Clinical Decision Support for Primary Care: A Real-World Study (blog)
- 2025-07: Towards physician-centered oversight of conversational diagnostic AI
Translation
Customer service
- 2023-11: Generative AI at Work: Improvements for workers and clients (though also a ceiling to improvement)
Creativity
- See also: AI creativity
- 2024-02: Prompting Diverse Ideas: Increasing AI Idea Variance
- 2024-07: Generative AI enhances individual creativity but reduces the collective diversity of novel content
- 2024-08: An empirical investigation of the impact of ChatGPT on creativity
- 2024-08: Response: ChatGPT decreases idea diversity in brainstorming (pdf)
- 2025-05: Response: Reply to: ChatGPT decreases idea diversity in brainstorming
- 2024-08: The Crowdless Future? Generative AI and Creative Problem-Solving
- 2024-10: Human Creativity in the Age of LLMs
- 2024-11: Artificial Intelligence, Scientific Discovery, and Product Innovation: diffusion model increases "innovation" (patents), boosts the best performers, but also removes some enjoyable tasks.
- 2025-05: Retraction: Assuring an accurate research record
- 2024-12: Using AI to Generate Visual Art: Do Individual Differences in Creativity Predict AI-Assisted Art Quality? (preprint): shows that more creative humans produce more creative genAI outputs
- 2025-01: One Does Not Simply Meme Alone: Evaluating Co-Creativity Between LLMs and Humans in the Generation of Humor
- 2025-05: Generative AI and Creativity: A Systematic Literature Review and Meta-Analysis
Equity
AI worse than humans
- 2025-04: How Good is AI at Twisting Arms? Experiments in Debt Collection
- 2025-04: Clinical knowledge in LLMs does not translate to human interactions
- 2025-05: Generalization bias in large language model summarization of scientific research
AI lowers human productivity
- 2025-07: METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (blog, commentary/analysis)
Human Perceptions of AI
- 2023-09: AI and science: what 1,600 researchers think. A Nature survey finds that scientists are concerned, as well as excited, by the increasing use of artificial-intelligence tools in research.
- 2024-11: Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey (Nature commentary: Quest for AI literacy)
- 2025-03: Users Favor LLM-Generated Content -- Until They Know It's AI
AI passes Turing Test
Text Dialog
- 2023-05: Human or Not? A Gamified Approach to the Turing Test
- 2023-10: Does GPT-4 pass the Turing test?
- 2024-05: People cannot distinguish GPT-4 from a human in a Turing test
- 2024-07: GPT-4 is judged more human than humans in displaced and inverted Turing tests
- 2025-03: Large Language Models Pass the Turing Test
- 2025-04: A Minimal Turing Test
Art
- 2024-11: How Did You Do On The AI Art Turing Test? Differentiation was only slightly above random (60%). AI art was often ranked higher than human-made.
- 2024-11: AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably
Uptake
- 2023-07: ChatGPT: Early Adopters, Teething Issues and the Way Forward
- 2024-03: Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
- 2024-05: Humlum, Anders and Vestergaard, Emilie, The Adoption of ChatGPT. IZA Discussion Paper No. 16992 doi: 10.2139/ssrn.4827166
- 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell'Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., Don't Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics (June 03, 2024). Harvard Business School Technology & Operations Mgt. Unit Working Paper 24-074, Harvard Business Working Paper No. 24-074, The Wharton School Research Paper doi: 10.2139/ssrn.4857373
- 2024-06: Delving into ChatGPT usage in academic writing through excess vocabulary
- 2024-09: The Rapid Adoption of Generative AI
- 2024-10: Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report (executive summary, full report)
- 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023)
- Spending on genAI is up 130% (most companies plan to invest going forward)
- 2024-12: The unequal adoption of ChatGPT exacerbates existing inequalities among workers
- Higher adoption among young and less experienced
- Lower adoption among women and lower-earning workers
- 2025-02: The Widespread Adoption of Large Language Model-Assisted Writing Across Society: 10-25% adoption across a range of contexts
- 2025-02: Local Heterogeneity in Artificial Intelligence Jobs Over Time and Space
- 2025-04: Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming
- 2025-05: ChatGPT Is Still Leading the AI Wars but Google Gemini Is Gaining Ground
- 2025-05: Large Language Models, Small Labor Market Effects
- Significant uptake, but very little economic impact so far
- 2025-05: The Labor Market Effects of Generative Artificial Intelligence
- US worker usage of AI increasing rapidly: 30% in 2024-12; 40% in 2025-05
- 2025-05: Trends – Artificial Intelligence
- 2025-06: Who is using AI to code? Global diffusion and impact of generative AI
- 2025-06: 2025 State of AI Report: The Builder’s Playbook: A Practical Roadmap for AI Innovation
- 2025-07: METR: After the ChatGPT Moment: Measuring AI’s Adoption: How quickly has AI been diffusing through the economy?
- 2025-07: Pew Research: 34% of U.S. adults have used ChatGPT, about double the share in 2023
Usage For
- 2024-12: Clio: A system for privacy-preserving insights into real-world AI use (Anthropic Clio)
- 2025-03: How People are Really Using Generative AI Now (writeup)
- 2025-04: Anthropic Education Report: How University Students Use Claude
- 2025-09: Anthropic Economic Index: Tracking AI's role in the US and global economy
- 2025-09: How People Use ChatGPT (OpenAI)
Hiding Usage
Societal Effects/Transformations
Psychological Impact
Human Sentiment towards AI
- 2025-04: Pew Research: How the U.S. Public and AI Experts View Artificial Intelligence
- 2025-10: Pew Research: How People Around the World View AI: More are concerned than excited about its use, and more trust their own country and the EU to regulate it than trust the U.S. or China
AI Persuasion of Humans
(AI can update beliefs, change opinions, tackle conspiracy theories, etc.)
- 2022-11: Fine-tuning language models to find agreement among humans with diverse preferences
- 2024-08: Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews
- 2024-04: Just the facts: How dialogues with AI reduce conspiracy beliefs
- 2024-09: Durably reducing conspiracy beliefs through dialogues with AI
- 2025-03: Scaling language model size yields diminishing returns for single-message political persuasion
- 2025-04: Can AI Change Your View? Evidence from a Large-Scale Online Field Experiment
- 2025-05: Large Language Models Are More Persuasive Than Incentivized Human Persuaders
- 2025-07: The Levers of Political Persuasion with Conversational AI
AI Effects on Human Psychology
Human well-being
- 2024-01: Loneliness and suicide mitigation for students using GPT3-enabled chatbots
- 2025-03: Investigating Affective Use and Emotional Well-being on ChatGPT
- 2025-03: How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study
Counter loneliness
- 2024-07: AI Companions Reduce Loneliness
- 2025-03: How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study
- 2025-06: Anthropic: How People Use Claude for Support, Advice, and Companionship
Human mental abilities (creativity, learning)
- 2025-03: The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
- 2025-06: Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
Simulate Humans
- See also: Human brain
Sociology
- 2021-10: Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods
- 2023-12: Google: Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
- 2024-02: Are Large Language Models (LLMs) Good Social Predictors?
- 2024-04: Automated Social Science: Language Models as Scientist and Subjects
- 2024-07: Perils and opportunities in using large language models in psychological research
- 2024-08: Predicting Results of Social Science Experiments Using Large Language Models
- 2024-10: Large Language Models based on historical text could offer informative tools for behavioral science
- 2025-04: LLM Social Simulations Are a Promising Research Method
- 2025-04: Measuring Human Leadership Skills with AI Agents
- 2025-04: SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
- 2025-07: A foundation model to predict and capture human cognition (code)
- 2025-07: LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra
- 2025-09: General Social Agents
Theory of Mind
- 2025-08: How large language models encode theory-of-mind: a study on sparse parameter patterns
- 2025-10: Infusing Theory of Mind into Socially Intelligent LLM Agents
Humanlike Vibes
- 2025-07: The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text?
- 2025-10: LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings