web-infra-dev/midscene

Forks: 937 Stars: 12630 (updated 2026-04-15 17:20:56)

license: MIT

Language: TypeScript

AI-powered, vision-driven UI automation for every platform.

Latest release: v1.7.3 (2026-04-09 18:07:44)

Midscene.js

Official Website: https://midscenejs.com/

📣 Midscene Skills is here!

Use Midscene Skills to control any platform with OpenClaw

💡 Features

Write Automation with Natural Language

  • Describe your goals and steps, and Midscene will plan and operate the user interface for you.
  • Use the JavaScript SDK or YAML to write your automation script.
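
The natural-language style described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SDK's own code: the `UiAgent` interface, the `searchHeadphones` function, and the `RecordingAgent` stand-in are hypothetical, and the method names `aiAct`, `aiQuery`, and `aiAssert` are modeled on the Midscene agent API and should be checked against the current documentation.

```typescript
// Hypothetical minimal interface. The method names mirror Midscene agent
// methods, but this is NOT the SDK's own type definition.
interface UiAgent {
  aiAct(instruction: string): Promise<void>;
  aiQuery<T>(dataDemand: string): Promise<T>;
  aiAssert(assertion: string): Promise<void>;
}

// A natural-language flow: act on the UI, extract data, then verify.
async function searchHeadphones(agent: UiAgent): Promise<string[]> {
  await agent.aiAct('type "Headphones" in the search box and press Enter');
  const titles = await agent.aiQuery<string[]>(
    "string[], the titles of the products in the result list",
  );
  await agent.aiAssert("the result list contains at least one product");
  return titles;
}

// Stand-in agent that records prompts instead of driving a real UI,
// so the flow can be exercised without a browser or a model.
class RecordingAgent implements UiAgent {
  readonly log: string[] = [];
  private readonly cannedTitles: string[];

  constructor(cannedTitles: string[]) {
    this.cannedTitles = cannedTitles;
  }

  async aiAct(instruction: string): Promise<void> {
    this.log.push(`act: ${instruction}`);
  }

  async aiQuery<T>(dataDemand: string): Promise<T> {
    this.log.push(`query: ${dataDemand}`);
    // Returns canned data; a real agent would ask the model.
    return this.cannedTitles as unknown as T;
  }

  async aiAssert(assertion: string): Promise<void> {
    this.log.push(`assert: ${assertion}`);
  }
}
```

In a real project, an agent created by the SDK around a Puppeteer or Playwright page would presumably take the place of `RecordingAgent`, with a vision-language model configured; the flow function itself would stay the same.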

Web & Mobile App & Any Interface

  • Web Automation: Integrate with Puppeteer or Playwright, or use Bridge Mode to control your desktop browser.
  • Android Automation: Use the JavaScript SDK with adb to control your local Android device.
  • iOS Automation: Use the JavaScript SDK with WebDriverAgent to control your local iOS devices and simulators.
  • Any Interface Automation: Use the JavaScript SDK to control your own interface.

For Developers

  • Three kinds of APIs:
  • MCP: Midscene provides MCP services that expose atomic Midscene Agent actions as MCP tools, so upper-layer agents can inspect and operate UIs with natural language.
  • Caching for Efficiency: Replay your script with cache and get the result faster.
  • Debugging Experience: Midscene.js offers a visual replay report file, a built-in playground, and a Chrome extension to simplify debugging. These are the tools most developers truly need.
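
The idea behind the caching bullet above can be illustrated with a small conceptual sketch. It is not Midscene's cache format or implementation: the `Planner` type, `withPlanCache`, and `slowPlanner` are hypothetical names, and the sketch only shows why replaying an identical prompt can skip the expensive model call.

```typescript
// Hypothetical planner signature: turns a prompt into a list of UI steps.
type Planner = (prompt: string) => Promise<string[]>;

// Wraps a planner so that repeated prompts reuse the stored plan
// instead of calling the underlying planner again.
function withPlanCache(planner: Planner): Planner {
  const cache = new Map<string, string[]>();
  return async (prompt: string): Promise<string[]> => {
    const hit = cache.get(prompt);
    if (hit !== undefined) return hit;
    const plan = await planner(prompt);
    cache.set(prompt, plan);
    return plan;
  };
}

// Stand-in for a slow model call; counts how often it is invoked.
let modelCalls = 0;
const slowPlanner: Planner = async (prompt: string): Promise<string[]> => {
  modelCalls += 1;
  return [`locate target for: ${prompt}`, "tap"];
};
```

Calling the wrapped planner twice with the same prompt invokes `slowPlanner` once. A real cache would also need invalidation when the UI changes, which this sketch leaves out.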

✨ Driven by Visual Language Model

Midscene.js is all-in on the pure-vision route for UI actions: element localization and interactions are based on screenshots only. It supports visual-language models like Qwen3-VL, Doubao-1.6-vision, gemini-3-pro, and UI-TARS. For data extraction and page understanding, you can still opt in to include DOM when needed.

  • Pure-vision localization for UI actions; the DOM extraction mode is removed.
  • Works across web, mobile, desktop, and even <canvas> surfaces.
  • Far fewer tokens by skipping DOM for actions, which cuts cost and speeds up runs.
  • DOM can still be included for data extraction and page understanding when needed.
  • Strong open-source options for self-hosting.

Read more about Model Strategy

📝 Credits

We would like to thank the following projects:

  • Rsbuild and Rslib for the build tool.
  • UI-TARS for the open-source agent model UI-TARS.
  • Qwen-VL for the open-source VL model Qwen-VL.
  • scrcpy and yume-chan for letting us control Android devices from the browser.
  • appium-adb for the JavaScript bridge to adb.
  • appium-webdriveragent for operating XCTest from JavaScript.
  • YADB for the yadb tool which improves the performance of text input.
  • libnut-core for the cross-platform native keyboard and mouse control.
  • Puppeteer for browser automation and control.
  • Playwright for browser automation, control, and testing.

📖 Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation \& Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}

📝 License

Midscene.js is MIT licensed.


If this project helps you or inspires you, please give us a star

Recent releases (data updated 2026-04-15 17:20:31):

2026-04-09 18:07:44 v1.7.3

2026-04-09 11:48:51 v1.7.2

2026-04-08 21:37:04 v1.7.1

2026-04-08 10:51:43 v1.7.0

2026-04-07 16:33:36 v1.6.4

2026-04-07 10:51:50 v1.6.3

2026-04-02 12:25:13 v1.6.2

2026-04-01 09:19:12 v1.6.1

2026-03-26 11:04:53 v1.6.0

2026-03-25 11:28:31 v1.5.8

Topics:

javascript ai testing computer-use ai-test browser-use gpt-operator phone-use
