Difference between revisions of "AI in education"
KevinYager (talk | contribs) (→Tests) |
KevinYager (talk | contribs) (→AI out-performs humans) |
Line 12: | Line 12: | ||
* [https://arxiv.org/abs/2306.17156 Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors] | * [https://arxiv.org/abs/2306.17156 Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors] | ||
** GPT4 can out-perform human tutors. | ** GPT4 can out-perform human tutors. | ||
+ | * Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4924786 Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers] (August 13, 2024). [http://dx.doi.org/10.2139/ssrn.4924786 doi: 10.2139/ssrn.4924786] | ||
+ | ** Teachers benefit from using AI as a co-pilot to aid in tasks (planning, deciding how to teach a topic, exploring ideas). | ||
+ | ** There is less utility in using AI purely as a text generator (to make quizzes, workbooks, etc.). | ||
+ | * [https://arxiv.org/abs/2402.09809 Effective and Scalable Math Support: Evidence on the Impact of an AI- Tutor on Math Achievement in Ghana] | ||
+ | * [https://doi.org/10.21203/rs.3.rs-4243877/v1 AI Tutoring Outperforms Active Learning] | ||
==AI harms learning== | ==AI harms learning== | ||
− | * [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study] ** Current grading systems cannot detect AI. | + | * [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305354 A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study] |
+ | ** Current grading systems cannot detect AI. | ||
* Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 Generative AI Can Harm Learning] (July 15, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4895486 doi: 10.2139/ssrn.4895486] | * Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 Generative AI Can Harm Learning] (July 15, 2024). The Wharton School Research Paper. [http://dx.doi.org/10.2139/ssrn.4895486 doi: 10.2139/ssrn.4895486] | ||
** Access to ChatGPT harmed math education outcomes. | ** Access to ChatGPT harmed math education outcomes. | ||
+ | * 2024-09: [https://arxiv.org/abs/2409.09047 AI Meets the Classroom: When Does ChatGPT Harm Learning?] | ||
==Software/systems== | ==Software/systems== | ||
* [https://devpost.com/software/gptutor GPTutor] ([https://github.com/mynamegabe/GPTutor code]) | * [https://devpost.com/software/gptutor GPTutor] ([https://github.com/mynamegabe/GPTutor code]) | ||
+ | * [https://arxiv.org/abs/2308.02773 EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education] | ||
+ | * [https://eurekalabs.ai/ Eureka Labs] (founded by [https://en.wikipedia.org/wiki/Andrej_Karpathy Andrej Karpathy]) aims to create AI-driven courses (first course is [https://github.com/karpathy/LLM101n Intro to LLMs]) | ||
+ | ===Individual tools=== | ||
+ | * Chatbot (OpenAI [https://chatgpt.com/ ChatGPT], Anthropic [https://www.anthropic.com/claude Claude], Google [https://gemini.google.com/app Gemini]) | ||
+ | * [https://notebooklm.google.com/ NotebookLM]: Enables one to "chat with documents". | ||
+ | * Google [https://learning.google.com/experiments/learn-about/signup Learn About] | ||
+ | |||
+ | ==AI for grading== | ||
+ | * [https://dl.acm.org/doi/10.1145/3657604.3664693 Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education] ([https://arxiv.org/abs/2405.02985 preprint]) | ||
+ | |||
+ | ==Detection== | ||
+ | * [https://www.sciencedirect.com/science/article/pii/S2666920X24000109 Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays] | ||
+ | ** GenAI can simulate student writing in a way that teachers cannot detect. | ||
+ | ** AI essays are assessed more positively than student-written essays. | ||
+ | ** Teachers are overconfident in their source identification. | ||
+ | ** Neither novice nor experienced teachers could distinguish texts generated by ChatGPT from those written by students. | ||
+ | ===AI Text Detectors Don't Work=== | ||
+ | * 2024-05: [https://arxiv.org/abs/2405.07940 RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors] | ||
+ | * 2024-06: [https://arxiv.org/abs/2306.15666 Testing of Detection Tools for AI-Generated Text] | ||
=AI/human= | =AI/human= | ||
Line 38: | Line 64: | ||
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers] | * 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers] | ||
** LLMs can be creative | ** LLMs can be creative | ||
+ | * 2024-09: [https://docs.iza.org/dp17302.pdf Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments] | ||
+ | |||
+ | ===Various=== | ||
+ | * 2024-11: [https://doi.org/10.1038/s41598-024-76900-1 AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably] | ||
===Professions=== | ===Professions=== | ||
* 2024-03: [https://www.medrxiv.org/content/10.1101/2024.03.12.24303785v1 Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study] | * 2024-03: [https://www.medrxiv.org/content/10.1101/2024.03.12.24303785v1 Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study] | ||
** GPT4 improves medical practitioner work; surprisingly, GPT4 alone scored better than a human with GPT4 as aid (on selected tasks). | ** GPT4 improves medical practitioner work; surprisingly, GPT4 alone scored better than a human with GPT4 as aid (on selected tasks). | ||
+ | * 2024-10: [https://doi.org/10.1001/jamanetworkopen.2024.38535 Perspectives on Artificial Intelligence–Generated Responses to Patient Messages] | ||
+ | * [https://agi.safe.ai/submit Humanity's Last Exam] | ||
+ | ** [https://x.com/alexandr_wang/status/1835738937719140440 Effort to build] a dataset of challenging (but resolvable) questions in specific domain areas, to act as a benchmark to test whether AIs are improving in these challenging topics. | ||
==AI improves human work== | ==AI improves human work== | ||
− | * | + | * 2023-07: [https://www.science.org/doi/10.1126/science.adh2586 Experimental evidence on the productivity effects of generative artificial intelligence] |
+ | * 2023-09: Dell'Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality] (September 15, 2023). Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4573321 doi: 10.2139/ssrn.4573321] | ||
+ | * 2023-11: [https://www.nber.org/papers/w31161 Generative AI at Work] (National Bureau of Economic Research) | ||
+ | * 2023-12: [https://osf.io/hdjpk The Uneven Impact of Generative AI on Entrepreneurial Performance] ([https://doi.org/10.31219/osf.io/hdjpk doi: 10.31219/osf.io/hdjpk]) | ||
+ | * 2024-07: [https://www.microsoft.com/en-us/research/publication/generative-ai-in-real-world-workplaces/ Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research] | ||
+ | |||
+ | ===Coding=== | ||
+ | * 2023-02: [https://arxiv.org/abs/2302.06590 The Impact of AI on Developer Productivity: Evidence from GitHub Copilot] | ||
+ | * 2024-09: Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers] (September 03, 2024). [http://dx.doi.org/10.2139/ssrn.4945566 doi: 10.2139/ssrn.4945566 ] | ||
+ | * 2024-11: Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5007084 Generative AI and the Nature of Work] (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, [http://dx.doi.org/10.2139/ssrn.5007084 doi: 10.2139/ssrn.5007084] | ||
+ | |||
+ | ===Forecasting=== | ||
+ | * 2024-02: [https://arxiv.org/abs/2402.07862 AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy] | ||
+ | ===Creativity=== | ||
+ | * 2024-07: [https://www.science.org/doi/10.1126/sciadv.adn5290 Generative AI enhances individual creativity but reduces the collective diversity of novel content] | ||
+ | * 2024-08: [https://www.nature.com/articles/s41562-024-01953-1 An empirical investigation of the impact of ChatGPT on creativity] | ||
+ | * 2024-11: [https://conference.nber.org/conf_papers/f210475.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]: diffusion model increases "innovation" (patents), boosts the best performers, but also removes some enjoyable tasks. | ||
+ | |||
+ | ===Counter loneliness=== | ||
+ | * 2024-07: [https://arxiv.org/abs/2407.19096 AI Companions Reduce Loneliness] | ||
+ | |||
+ | ==AI passes Turing Test== | ||
+ | * 2023-05: [https://arxiv.org/abs/2305.20010 Human or Not? A Gamified Approach to the Turing Test] | ||
+ | * 2023-10: [https://arxiv.org/abs/2310.20216 Does GPT-4 pass the Turing test?] | ||
+ | * 2024-05: [https://arxiv.org/abs/2405.08007 People cannot distinguish GPT-4 from a human in a Turing test] | ||
+ | * 2024-07: [https://arxiv.org/abs/2407.08853 GPT-4 is judged more human than humans in displaced and inverted Turing tests] | ||
=Uptake= | =Uptake= | ||
− | + | * 2024-03: [https://arxiv.org/abs/2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews] | |
+ | * 2024-05: Humlum, Anders and Vestergaard, Emilie, [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4827166 The Adoption of ChatGPT]. IZA Discussion Paper No. 16992 [http://dx.doi.org/10.2139/ssrn.4827166 doi: 10.2139/ssrn.4827166] | ||
+ | * 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell'Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4857373 Don't Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics] (June 03, 2024). Harvard Business School Technology & Operations Mgt. Unit Working Paper 24-074, Harvard Business Working Paper No. 24-074, The Wharton School Research Paper [http://dx.doi.org/10.2139/ssrn.4857373 doi: 10.2139/ssrn.4857373 ] | ||
+ | * 2024-06: [https://arxiv.org/abs/2406.07016 Delving into ChatGPT usage in academic writing through excess vocabulary] | ||
+ | * 2024-09: [https://static1.squarespace.com/static/60832ecef615231cedd30911/t/66f0c3fbabdc0a173e1e697e/1727054844024/BBD_GenAI_NBER_Sept2024.pdf The Rapid Adoption of Generative AI] | ||
+ | * 2024-10: [https://ai.wharton.upenn.edu/focus-areas/human-technology-interaction/2024-ai-adoption-report/ Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report] ([https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Executive-Summary.pdf executive summary], [https://ai.wharton.upenn.edu/wp-content/uploads/2024/10/AI-Report_Full-Report.pdf full report]) | ||
+ | ** 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023) | ||
+ | ** Spending on genAI is up 130% (most companies plan to invest going forward) | ||
+ | |||
+ | =See Also= | ||
+ | * [https://www.google.com/books/edition/_/cKnYEAAAQBAJ?hl=en&gbpv=1&pg=PA2 UNESCO. Guidance for Generative AI in Education and Research] |
Latest revision as of 11:32, 15 November 2024
AI in Education
Survey/study of
- 2023-08: Perception, performance, and detectability of conversational artificial intelligence across 32 university courses
- 2023-10: Employees secretly using AI at work.
- 2023-10: Survey shows students using AI more than professors.
- 2023-11: ChatGPT has entered the classroom: how LLMs could transform education
AI improves learning/education
- Mollick, Ethan R. and Mollick, Lilach and Bach, Natalie and Ciccarelli, LJ and Przystanski, Ben and Ravipinto, Daniel, AI Agents and Education: Simulated Practice at Scale (June 17, 2024). The Wharton School Research Paper. doi: 10.2139/ssrn.4871171
- Can enable personalized education.
- Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors
- GPT4 can out-perform human tutors.
- Keppler, Samantha and Sinchaisri, Wichinpong and Snyder, Clare, Backwards Planning with Generative AI: Case Study Evidence from US K12 Teachers (August 13, 2024). doi: 10.2139/ssrn.4924786
- Teachers benefit from using AI as a co-pilot to aid in tasks (planning, deciding how to teach a topic, exploring ideas).
- There is less utility in using AI purely as a text generator (to make quizzes, workbooks, etc.).
- Effective and Scalable Math Support: Evidence on the Impact of an AI- Tutor on Math Achievement in Ghana
- AI Tutoring Outperforms Active Learning
AI harms learning
- A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study
- Current grading systems cannot detect AI.
- Bastani, Hamsa and Bastani, Osbert and Sungu, Alp and Ge, Haosen and Kabakcı, Özge and Mariman, Rei, Generative AI Can Harm Learning (July 15, 2024). The Wharton School Research Paper. doi: 10.2139/ssrn.4895486
- Access to ChatGPT harmed math education outcomes.
- 2024-09: AI Meets the Classroom: When Does ChatGPT Harm Learning?
Software/systems
- GPTutor (code)
- EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education
- Eureka Labs (founded by Andrej Karpathy) aims to create AI-driven courses (first course is Intro to LLMs)
Individual tools
- Chatbot (OpenAI ChatGPT, Anthropic Claude, Google Gemini)
- NotebookLM: Enables one to "chat with documents".
- Google Learn About
AI for grading
- Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability To Mark Short Answer Questions in K-12 Education (preprint)
Detection
- Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays
- GenAI can simulate student writing in a way that teachers cannot detect.
- AI essays are assessed more positively than student-written essays.
- Teachers are overconfident in their source identification.
- Neither novice nor experienced teachers could distinguish texts generated by ChatGPT from those written by students.
AI Text Detectors Don't Work
- 2024-05: RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors
- 2024-06: Testing of Detection Tools for AI-Generated Text
AI/human
AI out-performs humans
Tests
- 2023-07: SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
- 2024-06: A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study
- AI scores higher than median students.
Creativity
- 2023-09: Best humans still outperform artificial intelligence in a creative divergent thinking task
- Best humans out-perform AI at creativity. (By implication, median humans may not.)
- 2024-02: The current state of artificial intelligence generative language models is more creative than humans on divergent thinking tasks
- 2024-02: Felin, Teppo and Holweg, Matthias, Theory Is All You Need: AI, Human Cognition, and Causal Reasoning (February 24, 2024). doi: 10.2139/ssrn.4737265
- Argues that human "theory-based" creativity is better than AI "data-based".
- 2024-07: Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
- Top human (professional author) out-performs GPT4.
- 2024-09: Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
- LLMs can be creative
- 2024-09: Creative and Strategic Capabilities of Generative AI: Evidence from Large-Scale Experiments
Various
- 2024-11: AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably
Professions
- 2024-03: Influence of a Large Language Model on Diagnostic Reasoning: A Randomized Clinical Vignette Study
- GPT4 improves medical practitioner work; surprisingly, GPT4 alone scored better than a human with GPT4 as aid (on selected tasks).
- 2024-10: Perspectives on Artificial Intelligence–Generated Responses to Patient Messages
- Humanity's Last Exam
- Effort to build a dataset of challenging (but resolvable) questions in specific domain areas, to act as a benchmark to test whether AIs are improving in these challenging topics.
AI improves human work
- 2023-07: Experimental evidence on the productivity effects of generative artificial intelligence
- 2023-09: Dell'Acqua, Fabrizio and McFowland III, Edward and Mollick, Ethan R. and Lifshitz-Assaf, Hila and Kellogg, Katherine and Rajendran, Saran and Krayer, Lisa and Candelon, François and Lakhani, Karim R., Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality (September 15, 2023). Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 24-013, The Wharton School Research Paper doi: 10.2139/ssrn.4573321
- 2023-11: Generative AI at Work (National Bureau of Economic Research)
- 2023-12: The Uneven Impact of Generative AI on Entrepreneurial Performance (doi: 10.31219/osf.io/hdjpk)
- 2024-07: Generative AI in Real-World Workplaces: The Second Microsoft Report on AI and Productivity Research
Coding
- 2023-02: The Impact of AI on Developer Productivity: Evidence from GitHub Copilot
- 2024-09: Cui, Zheyuan and Demirer, Mert and Jaffe, Sonia and Musolff, Leon and Peng, Sida and Salz, Tobias, The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers (September 03, 2024). doi: 10.2139/ssrn.4945566
- 2024-11: Hoffmann, Manuel and Boysel, Sam and Nagle, Frank and Peng, Sida and Xu, Kevin, Generative AI and the Nature of Work (October 27, 2024). Harvard Business School Strategy Unit Working Paper No. 25-021, Harvard Business Working Paper No. 25-021, doi: 10.2139/ssrn.5007084
Forecasting
- 2024-02: AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy
Creativity
- 2024-07: Generative AI enhances individual creativity but reduces the collective diversity of novel content
- 2024-08: An empirical investigation of the impact of ChatGPT on creativity
- 2024-11: Artificial Intelligence, Scientific Discovery, and Product Innovation: diffusion model increases "innovation" (patents), boosts the best performers, but also removes some enjoyable tasks.
Counter loneliness
- 2024-07: AI Companions Reduce Loneliness
AI passes Turing Test
- 2023-05: Human or Not? A Gamified Approach to the Turing Test
- 2023-10: Does GPT-4 pass the Turing test?
- 2024-05: People cannot distinguish GPT-4 from a human in a Turing test
- 2024-07: GPT-4 is judged more human than humans in displaced and inverted Turing tests
Uptake
- 2024-03: Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
- 2024-05: Humlum, Anders and Vestergaard, Emilie, The Adoption of ChatGPT. IZA Discussion Paper No. 16992 doi: 10.2139/ssrn.4827166
- 2024-06: Kellogg, Katherine and Lifshitz-Assaf, Hila and Randazzo, Steven and Mollick, Ethan R. and Dell'Acqua, Fabrizio and McFowland III, Edward and Candelon, Francois and Lakhani, Karim R., Don't Expect Juniors to Teach Senior Professionals to Use Generative AI: Emerging Technology Risks and Novice AI Risk Mitigation Tactics (June 03, 2024). Harvard Business School Technology & Operations Mgt. Unit Working Paper 24-074, Harvard Business Working Paper No. 24-074, The Wharton School Research Paper doi: 10.2139/ssrn.4857373
- 2024-06: Delving into ChatGPT usage in academic writing through excess vocabulary
- 2024-09: The Rapid Adoption of Generative AI
- 2024-10: Growing Up: Navigating Generative AI’s Early Years – AI Adoption Report (executive summary, full report)
- 72% of leaders use genAI at least once a week (cf. 23% in 2023); 90% agree AI enhances skills (cf. 80% in 2023)
- Spending on genAI is up 130% (most companies plan to invest going forward)
See Also
- UNESCO. Guidance for Generative AI in Education and Research