gfuhr's posts

some thoughts about Computer Vision

Deploying state-of-the-art object detectors (DETRs) to AWS.

Forget about YOLO. Transformer-based models are better now, and easy to deploy!

14 min read · January 01, 2025

2025 · computer-vision dfine aws batch sagemaker deploy object detection yolo · posts
ScreenshotQuery: query your screenshots using Vision Language Models.

Describe and talk with your set of screenshots

16 min read · December 04, 2024

2024 · computer-vision VLM image-retrieval image-caption semantic-search · posts
Three random scripts.

I made some cool little code snippets that I'd like to share.

5 min read · August 01, 2024

2024 · computer-vision arxiv real-madrid · posts
The future of OCR is not to use OCR!

End-to-end document understanding will be huge.

4 min read · May 20, 2024

2024 · ocr document-understanding computer-vision · posts
Caution while using Large Language Models

Weird failures and behavior from LLMs.

3 min read · February 11, 2024

2024 · machine-learning large-language-models llm · posts