Had to dig into a new approach for GUI automation lately. Instead of the usual fragile pixel-searching, Screph takes a different route by combining computer vision with structured data export for LLM agents. It's essentially a Windows-based annotator that turns visual UI elements into machine-readable JSON.
The Technical Core
Screph is built on Python, utilizing PySide6 for the interface and a heavy-duty CV pipeline for element detection. It doesn't just look for images; it attempts to understand the GUI hierarchy via YOLO-based detection and standard OpenCV processing. This is particularly useful for environments where traditional APIs don't exist or are heavily protected.
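To make the detect-then-refine idea concrete, here's a minimal sketch of how a YOLO pass plus OpenCV post-processing could turn a screenshot into element regions. This is not Screph's actual code: I'm assuming the YOLO side resolves to the ultralytics package, and the weights file, class names, and output fields are my own guesses.
Code:
import cv2
from ultralytics import YOLO

model = YOLO("ui_elements.pt")  # hypothetical weights trained on UI screenshots

def detect_elements(screenshot_path):
    image = cv2.imread(screenshot_path)
    result = model(image)[0]  # single YOLO pass over the full screenshot
    elements = []
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        crop = image[y1:y2, x1:x2]
        # Classic OpenCV refinement inside the detected region: Otsu threshold + contours
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        elements.append({
            "class": result.names[int(box.cls[0])],
            "confidence": round(float(box.conf[0]), 3),
            "bbox": [x1, y1, x2, y2],
            "inner_contours": len(contours),
        })
    return elements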
Key Architectural Features
- Computer Vision Pipeline: Uses YOLO for detecting UI regions and OpenCV for segmentation, contours, and masks. It supports an extensible architecture for custom CV modes.
- Human-Like Input Emulation: Implements mouse movements with natural trajectories, clicks, and drags. Keyboard events include variable typing speeds and micro-delays to mimic human behavior.
- Hardware Integration: Crucially, it supports Arduino-based modes. If you're working in environments with strict input monitoring, routing mouse/keyboard events through external hardware is a standard move to stay under the radar.
- Data Export: Outputs structured JSON. The idea is to feed this into an LLM agent which then generates the actual automation logic based on the UI flow you've annotated (rough sketch of such an export after this list).
- Backend Support: Includes an optional Django-based backend for remote control and REST API integrations.
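Since the data export is the part that actually connects this to LLM agents, here's a rough Python sketch of what such an annotation might look like on disk. This is illustrative only, not Screph's documented schema; every field name below is my own guess at what an agent would need.
Code:
import json

# Hypothetical annotation for a single annotated screen
annotation = {
    "screen": "login_form",
    "elements": [
        {"id": "username_input", "type": "textbox", "bbox": [412, 310, 720, 348]},
        {"id": "submit_btn", "type": "button", "bbox": [540, 402, 640, 436], "label": "Sign in"},
    ],
    "annotated_flow": ["focus username_input", "type credentials", "click submit_btn"],
}

# Write it out as the JSON an LLM agent would consume
with open("login_form.json", "w", encoding="utf-8") as f:
    json.dump(annotation, f, indent=2)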
Intended Use Cases:
- Interfacing with legacy systems (old terminals, banking software) lacking APIs.
- Building visual compliance scripts that verify UI states.
- Automating workflows across complex creative suites like Adobe or DaVinci.
- Digitizing expert routines by recording and annotating decision points.
Advanced Input Config:
Code:
input_settings = {
    "trajectories": "human",             # human-like mouse trajectories instead of straight jumps
    "micro_movements": True,             # small cursor jitter while moving
    "random_delays": 50,                 # ms, randomized pauses between input events
    "hardware_interop": "arduino_com3"   # route input through an Arduino on COM3
}
A Note on Environment Safety
The author explicitly frames this as a general-purpose automation tool and states it is not intended for bypassing protections or botting in online games. While the tool provides high-level human-like input and hardware emulation via Arduino, any application in "sensitive" environments is entirely on the user. If you're planning to test this on software with active monitoring, keep the hardware isolation in mind.
Getting the Build
You can find the technical documentation and the main repository on GitHub.
The setup requires Python 3.x and the standard PySide6/OpenCV/YOLO stack.
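For the environment itself, something like this should pull the stack in on Windows; I'm assuming the YOLO dependency resolves to the ultralytics package, so adjust to whatever the repo actually pins:
Code:
py -3 -m pip install PySide6 opencv-python ultralytics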
Anyone already trying to link CV-based annotation with local LLMs for decision making?