Human Computer Interaction
Revision as of 08:50, 13 March 2025
A.k.a. HCI
Smart Wearables
Pendants, etc.
- Rewind pendant
- Compass AI pendant
- Humane AI pin ($700)
- Rabbit R1 ($200)
- OpenInterpreter natural-language control fob ($100, code and schematics available)
- Limitless Pendant for AI transcription ($100)
- Iyo One earbuds ($600)
- Omi open-source wearable ($90)
Smart Glasses
- Meta Ray-Ban smart glasses
- XReal Air2
- RayNeo X2 Lite
- Digilens Argo
- Brilliant Frame smart glasses ($350)
- Based Hardware open-source AI wearables, including glasses
- Snap Spectacles AR glasses (specs)
- Meta Orion Augmented Reality glasses
- Rokid Glasses (release news)
- Even Realities G1
- INMO Air 3 (video)
- Solos AirGo V smart glasses, with vision/camera, ChatGPT integration
- Raven Resonance
UIs tailored to AI
Example products with AI-first interfaces
- Thread of examples
- granola: AI notepad for meetings.
- attio: A next-generation CRM platform that leverages AI to automate complex go-to-market tasks and enhance customer relationship management.
- rabbitholes.ai: An AI-powered platform that facilitates deep, explorative conversations on an infinite canvas, enabling users to learn faster and delve deeper into topics.
- tldraw.com: A free, instant collaborative whiteboarding tool for creating diagrams, flowcharts, and sketches with real-time collaboration.
- herostuff.com: An AI-driven marketplace that allows users to scan, price, and list items for sale quickly using AI technology.
- krea.ai: A platform that simplifies generative AI, enabling users to create and enhance images and videos for free.
- superrandom.studio/venngenn: An AI tool that generates unique and creative images based on user prompts, offering various style and environment modifiers.
- scrapybara.com/playground: An experimental AI-powered playground that allows users to interact with AI agents for various tasks, including web scraping and data extraction.
- sdk.vercel.ai: An open-source AI SDK for TypeScript that provides tools to build AI-powered products, supporting multiple AI providers and frameworks.
- midday.ai: A business management platform designed for freelancers, offering features like invoicing, time tracking, financial overviews, and an AI assistant.
- dupe.com: A platform that helps users find similar products at lower prices, aiming to provide affordable alternatives to popular items.
 
- Granola: The AI notepad for people in back-to-back meetings
AI Computer Use
Research
- 2024-11: The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use (code)
- 2025-01: Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
- 2025-02: PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC (video, code)
Browser
- Helium: Lightweight Python library for web automation; automates browser actions in Chrome and Firefox (entering values, clicking buttons, etc.)
- Browser-Use (app)
- Lightpanda Browser: open-source browser made for headless usage
- Steel: The open-source browser API for AI agents & apps (build live web agents and browser automation tools)
- stagehand: An AI web browsing framework focused on simplicity and extensibility
- Skyvern: Automate Browser-based workflows using LLMs and Computer Vision
Full Desktop GUI
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis (huggingface, paper)
- ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use (paper, huggingface, leaderboard)
- Microsoft OmniParser v2 (code) for Windows 11 VMs
- ByteDance UI-TARS Desktop: GUI agent for controlling computers with natural language
- Manus AI
- OWL (Optimized Workforce Learning): General Multi-Agent Assistance in Real-World Task Automation
- OpenAI Responses API and Agents SDK
Screen Record
- screenpipe (github): AI app store powered by 24/7 desktop history
Computer Use Agents
- Anthropic Computer Use
- OpenAI Operator
- Convergence AI Proxy (examples)

