Tired of your AI aimbot feeling like it is running on a potato?
Most of the garbage being shared lately relies on bloated YOLO architectures that tank your frame times and introduce massive input lag. If you want real performance for an external pixel bot, you need something lightweight. I got my hands on a tiny Convolutional Neural Net (CNN) designed specifically for sub-millisecond head localization.
The Technical Breakdown
This isn't just another paste. It uses a 50x50 RGB input crop and outputs normalized (x, y) coordinates along with a presence logit (basically: is there a player head or not?).
- Input: 50x50 RGB
- Output: normalized head (x, y) + presence logit
- Architecture: Custom CoordConvStem for better spatial awareness
The CoordConvStem is the secret sauce here—it appends normalized coordinate channels before the first convolution stack, which helps the model actually understand where it is looking in the 2D space during regression. This is a massive improvement over basic CNNs that struggle with coordinate precision.
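To make the coordinate-channel idea concrete, here is a minimal standalone sketch (my own illustration, not part of the release) of what the stem appends for a tiny 3x3 input: one channel ramping -1 to 1 left-to-right (x) and one top-to-bottom (y):

```python
import torch

def coord_channels(height: int, width: int) -> torch.Tensor:
    """Build the two normalized coordinate channels a CoordConv stem concatenates."""
    yy = torch.linspace(-1.0, 1.0, steps=height).view(height, 1).expand(height, width)
    xx = torch.linspace(-1.0, 1.0, steps=width).view(1, width).expand(height, width)
    return torch.stack([xx, yy])  # shape: (2, H, W)

chans = coord_channels(3, 3)
print(chans[0])  # x channel: every row is [-1., 0., 1.]
print(chans[1])  # y channel: every column is [-1., 0., 1.]
```

Because these values are fixed per pixel, the convolutions downstream can condition on absolute position, which plain convs cannot.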
Tested inference latency averages (Google Colab/Local):
- ONNX model + CUDA FP32: 0.7ms (The gold standard for external pixel bots)
- ONNX model + CPU FP32: 2.5ms
- ONNX model + CPU quantized (INT8): 1.3ms
- Core ML (MacBook): 0.6ms
Model Definition (model.py)
Code:
import torch
import torch.nn as nn


class CoordConvStem(nn.Module):
    """Appends two normalized coordinate channels (x, y in [-1, 1]) to the input."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch_size, _, height, width = x.shape
        device = x.device
        dtype = x.dtype
        # Build per-axis coordinate ramps, then broadcast them to full feature maps.
        yy = torch.linspace(-1.0, 1.0, steps=height, device=device, dtype=dtype)
        xx = torch.linspace(-1.0, 1.0, steps=width, device=device, dtype=dtype)
        yy = yy.view(1, 1, height, 1).expand(batch_size, 1, height, width)
        xx = xx.view(1, 1, 1, width).expand(batch_size, 1, height, width)
        return torch.cat([x, xx, yy], dim=1)  # 3 RGB + 2 coordinate channels = 5


class TinyPointPresenceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.coord_stem = CoordConvStem()
        # 5 input channels: RGB plus the two coordinate channels from the stem.
        self.features = nn.Sequential(
            nn.Conv2d(5, 24, 3, padding=1), nn.ReLU(),
            nn.Conv2d(24, 24, 3, stride=2, padding=1), nn.ReLU(),  # 50 -> 25
            nn.Conv2d(24, 32, 3, stride=2, padding=1), nn.ReLU(),  # 25 -> 13
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU()
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 13 * 13, 128), nn.ReLU(),
            nn.Linear(128, 3)  # (x, y, presence)
        )

    def forward(self, x: torch.Tensor):
        x = self.coord_stem(x)
        x = self.features(x)
        raw = self.head(x)
        # Sigmoid keeps (x, y) in [0, 1]; presence stays a raw logit.
        return torch.sigmoid(raw[:, :2]), raw[:, 2]
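If you are wondering where `64 * 13 * 13` comes from: the two stride-2 convs shrink the 50x50 crop twice. A quick standalone check of the arithmetic, using the standard conv output-size formula:

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    # Standard conv output-size formula: floor((size + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

s = 50                     # input crop
s = conv_out(s)            # conv1, stride 1 -> 50
s = conv_out(s, stride=2)  # conv2 -> 25
s = conv_out(s, stride=2)  # conv3 -> 13
s = conv_out(s)            # conv4, stride 1 -> 13
print(64 * s * s)  # 10816, matching nn.Linear(64 * 13 * 13, 128)
```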
Inference Logic (inference.py)
Code:
import torch
import time
from model import TinyPointPresenceNet


def run_inference(model, image_tensor):
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        pred_xy, present_logits = model(image_tensor)
        latency = (time.perf_counter() - start) * 1000.0  # ms
    # Convert the raw presence logit to a probability for thresholding.
    return pred_xy, torch.sigmoid(present_logits), latency
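One caveat with timing a single call like this: the first inference pays one-time allocation and setup costs, so quoted averages should come from a warmed-up loop. A benchmarking sketch of mine (stand-in module below, swap in the real net for meaningful numbers):

```python
import time
import torch
import torch.nn as nn

# Stand-in for TinyPointPresenceNet; structure does not matter for the
# benchmarking pattern, only that it accepts the same 1x3x50x50 input.
net = nn.Sequential(
    nn.Conv2d(3, 24, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(24 * 50 * 50, 3),
).eval()
x = torch.randn(1, 3, 50, 50)

with torch.no_grad():
    for _ in range(10):  # warm-up: first calls pay one-time setup costs
        net(x)
    times = []
    for _ in range(100):
        start = time.perf_counter()
        net(x)
        times.append((time.perf_counter() - start) * 1000.0)

times.sort()
print(f"median {times[50]:.2f} ms, p95 {times[95]:.2f} ms")
```

Median and p95 are more honest than a plain mean here, since one scheduler hiccup can skew an average badly.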
Reality Check
Is it perfect? No. The original implementation was trained on a small dataset of 1000 images. The dev notes that x,y localization can be inconsistent. If you are building a serious external, do not rely on this solo. Run it in parallel with traditional HSV color scanning or a target verification check. It is a solid base for anyone looking to move away from slow YOLO weights to something actually built for competitive FPS latency requirements.
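Since the net should not be trusted on its own, the simplest first gate is thresholding the presence probability before you act on the (x, y) output. A minimal sketch (helper name and threshold are my own, not from the release):

```python
import math

def accept_prediction(presence_logit: float, threshold: float = 0.8) -> bool:
    """Gate on sigmoid(presence_logit); below threshold, defer to secondary checks."""
    prob = 1.0 / (1.0 + math.exp(-presence_logit))
    return prob >= threshold

print(accept_prediction(3.0))  # sigmoid(3.0) ~ 0.953 -> True
print(accept_prediction(0.0))  # sigmoid(0.0) = 0.5   -> False
```

Anything that fails the gate falls through to whatever secondary verification you run alongside.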
Who is going to be the first to port this to a C++ inference engine for a proper internal project?