Difference between revisions of "Human Computer Interaction"

From GISAXS
(Full Desktop GUI)
(AI Computer Use)
 
(3 intermediate revisions by the same user not shown)
Line 48: Line 48:
 
* 2024-11: [https://arxiv.org/abs/2411.10323 The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use] ([https://github.com/showlab/computer_use_ootb code])
 
* 2025-01: [https://arxiv.org/abs/2501.10893 Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments]
 
 +
* 2025-02: [https://arxiv.org/abs/2502.14282 PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC] ([https://www.youtube.com/watch?v=EMbIpzqJld0 video], [https://github.com/X-PLUG/MobileAgent/tree/main code])
  
 
==Browser==
 
Line 61: Line 62:
 
* [https://huggingface.co/blog/Ziyang/screenspot-pro ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use] ([https://likaixin2000.github.io/papers/ScreenSpot_Pro.pdf paper], [https://huggingface.co/datasets/likaixin/ScreenSpot-Pro huggingface], [https://gui-agent.github.io/grounding-leaderboard/ leaderboard])
 
* Microsoft [https://huggingface.co/microsoft/OmniParser-v2.0 OmniParser v2] ([https://github.com/microsoft/OmniParser/tree/master/omnitool code]) for Windows 11 VMs
 
 −
* [https://github.com/bytedance/UI-TARS-desktop UI-TARS Desktop]: GUI agent for controlling computers with natural language
+
* ByteDance [https://github.com/bytedance/UI-TARS-desktop UI-TARS Desktop]: GUI agent for controlling computers with natural language
 
** [https://arxiv.org/abs/2501.12326 UI-TARS: Pioneering Automated GUI Interaction with Native Agents] ([https://github.com/bytedance/UI-TARS code])
 
 +
* [https://manus.im/ Manus AI]
 +
* [https://github.com/camel-ai/owl OWL (Optimized Workforce Learning)]: General Multi-Agent Assistance in Real-World Task Automation
 +
* OpenAI [https://platform.openai.com/docs/api-reference/responses responses API] and [https://platform.openai.com/docs/guides/agents agents SDK]
  
 
==Screen Record==
 

Latest revision as of 08:50, 13 March 2025

A.k.a. HCI

Smart Wearables

Pendants, etc.

Smart Glasses

UIs tailored to AI

Example products with AI-first interfaces

  • Thread of examples
    • Granola: AI notepad for meetings.
    • Attio: A CRM platform that uses AI to automate complex go-to-market tasks.
    • rabbitholes.ai: An AI-powered platform that facilitates deep, explorative conversations on an infinite canvas, enabling users to learn faster and delve deeper into topics.
    • tldraw.com: A free, instant collaborative whiteboarding tool for creating diagrams, flowcharts, and sketches with real-time collaboration.
    • herostuff.com: An AI-driven marketplace that allows users to scan, price, and list items for sale quickly using AI technology.
    • krea.ai: A platform that simplifies generative AI, enabling users to create and enhance images and videos for free.
    • superrandom.studio/venngenn: An AI tool that generates unique and creative images based on user prompts, offering various style and environment modifiers.
    • scrapybara.com/playground: An experimental AI-powered playground that allows users to interact with AI agents for various tasks, including web scraping and data extraction.
    • sdk.vercel.ai: An open-source AI SDK for TypeScript that provides tools to build AI-powered products, supporting multiple AI providers and frameworks.
    • midday.ai: A business management platform designed for freelancers, offering features like invoicing, time tracking, financial overviews, and an AI assistant.
    • dupe.com: A platform that helps users find similar products at lower prices, aiming to provide affordable alternatives to popular items.
  • Granola: The AI notepad for people in back-to-back meetings

AI Computer Use

Research

Browser

  • Helium: Lightweight web automation with Python; a library for automating browser (Chrome, Firefox) actions such as entering values and clicking buttons
  • Browser-Use (app)
  • Lightpanda Browser: open-source browser made for headless usage
  • Steel: The open-source browser API for AI agents & apps (build live web agents and browser automation tools)
  • Stagehand: An AI web browsing framework focused on simplicity and extensibility
  • Skyvern: Automate Browser-based workflows using LLMs and Computer Vision

Full Desktop GUI

Screen Record

Computer Use Agents