@sfallah (Contributor) commented Nov 20, 2025

Feature Request: #16676


GGUF Models

sabafallah/DeepSeek-OCR-GGUF
- deepseek-ocr-f32.gguf
- mmproj-deepseek-ocr-f32.gguf

Running the Model

Build llama.cpp (Mac)

cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --config Release

Running llama-mtmd-cli

DeepSeek-OCR Paper (First Page)
build/bin/llama-mtmd-cli \
-m gguf_models/deepseek-ai/deepseek-ocr-f16.gguf \
--mmproj gguf_models/deepseek-ai/mmproj-deepseek-ocr-f16.gguf \
--image tmp/mtmd_test_data/Deepseek-OCR-2510.18234v1_page1.png \
-p "<|grounding|>Convert the document to markdown." \
--chat-template deepseek-ocr --temp 0
Hard Test (Old Newspaper Image)
build/bin/llama-mtmd-cli \
-m gguf_models/deepseek-ai/deepseek-ocr-f16.gguf \
--mmproj gguf_models/deepseek-ai/mmproj-deepseek-ocr-f16.gguf \
--image tools/mtmd/test-1.jpeg \
-p "<|grounding|>Convert the document to markdown." \
--chat-template deepseek-ocr --temp 0

@github-actions github-actions bot added the model (Model specific), examples, and python (python script changes) labels Nov 20, 2025
@sfallah sfallah marked this pull request as draft November 20, 2025 09:12
}

} else {
    if (mtmd_is_deepseekocr(ctx.ctx_vision.get())) {
Collaborator:

many models do not support chat mode - it's not our responsibility to tell the user what to do

@ngxson (Collaborator), Dec 9, 2025:

Besides, model-specific APIs like is_model_abc are not allowed; it's an anti-pattern when designing a public API.
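
For illustration, the general shape of this guideline (a Python sketch only; the names here are illustrative and not the actual mtmd API):

```python
# Illustrative sketch, not the mtmd API. The point: callers should not branch
# on a concrete model name (is_model_abc); instead the context should
# advertise generic capabilities that any caller can query.
class VisionContext:
    def __init__(self, capabilities: set):
        self._caps = set(capabilities)

    def supports(self, capability: str) -> bool:
        return capability in self._caps

# Anti-pattern:  if mtmd_is_deepseekocr(ctx): warn_about_chat_mode()
# Preferred:     query a capability the model may or may not have.
ctx = VisionContext({"ocr", "grounding"})
if not ctx.supports("chat"):
    print("note: this model does not support chat mode")
```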

@bluebread (Contributor), Dec 9, 2025:

Got it. I'll clean it up very soon.

Contributor:

Fixed it.

@ngxson (Collaborator) commented Dec 9, 2025

@sfallah @bluebread Hmm, are you sure that this PR is working?

Using the existing test file in the repo, ./tools/mtmd/test-1.jpeg:

llama-mtmd-cli -m ../models/DeepSeek-OCR/model.gguf --mmproj ../models/DeepSeek-OCR/mmproj-model.gguf \
  -p "<|grounding|>Convert the document to markdown." \
  --image ./tools/mtmd/test-1.jpeg --chat-template deepseek

The beginning of the output looks coherent:

<|ref|>title<|/ref|><|det|>[[185, 109, 717, 220]]<|/det|>
# Che New Jork Cimes 

<|ref|>text<|/ref|><|det|>[[68, 142, 174, 180]]<|/det|>
"All the News That's Fit to Print" 

<|ref|>text<|/ref|><|det|>[[67, 228, 176, 246]]<|/det|>
VOL.CXVIII. No.40,721 

<|ref|>text<|/ref|><|det|>[[292, 228, 374, 246]]<|/det|>
1 an the New York Times 

<|ref|>text<|/ref|><|det|>[[410, 228, 584, 246]]<|/det|>
NEW YORK, MONDAY, JULY 21, 1891 

<|ref|>text<|/ref|><|det|>[[799, 138, 908, 204]]<|/det|>
LATE CITY EDITION
Wednesday, June, seven miles nine
tenths, Brooklyn, Jamaica Plain,
Three miles, Brooklyn, Brooklyn,
One hundred and fifty-sixth Street,
N. C. Congress St. uptown at 9, 12. 

<|ref|>text<|/ref|><|det|>[[839, 220, 903, 240]]<|/det|>
13 CENTES 

<|ref|>title<|/ref|><|det|>[[65, 269, 896, 399]]<|/det|>
# MEN WALK ON MOON 

<|ref|>title<|/ref|><|det|>[[65, 408, 894, 528]]<|/det|>
# ASTRONAUTS LAND ON PLAIN; COLLECT ROCKS, PLANT FLAG 

<|ref|>text<|/ref|><|det|>[[65, 552, 230, 587]]<|/det|>
Voice From Moon: 

<|ref|>text<|/ref|><|det|>[[73, 594, 269, 624]]<|/det|>
‘Eagle Has Lander’

But then it starts repeating:

<|ref|>text<|/ref|><|det|>[[710, 834, 904, 876]]<|/det|>
The second item is that the sun is moving. Assuming that its orbit is circular, the distance between the earth and the sun will increase at the rate of about 1.3 miles per day. 

<|ref|>text<|/ref|><|det|>[[710, 876, 904, 908]]<|/det|>
The third item is that the moon is moving. Assuming that its orbit is circular, the distance between the earth and the moon will increase at the rate of about 1.3 miles per day. 

<|ref|>text<|/ref|><|det|>[[710, 908, 904, 940]]<|/det|>
The fourth item is that the sun is moving. Assuming that its orbit is circular, the distance between the earth and the sun will increase at the rate of about 1.3 miles per day. 

<|ref|>text<|/ref|><|det|>[[710, 940, 904, 972]]<|/det|>
The fifth item is that the moon is moving. Assuming that its orbit is circular, the distance between the earth and the moon will increase at the rate of about 1.3 miles per day. 

<|ref|>text<|/ref|><|det|>[[710, 972, 904, 984]]<|/det|>
The sixth item is that the sun is moving. Assuming that its orbit is circular, the distance between the earth and the sun will increase at the rate of about 1.3 miles per day.

@ngxson (Collaborator) commented Dec 9, 2025

Another run, it repeats:

<|ref|>text<|/ref|><|det|>[[763, 151, 911, 207]]<|/det|> Wednesday, June, seven miles nine tenths, Brooklyn, Jamaica Plain, 7th Avenue, 8th Street, Brooklyn, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8th Avenue, 8th Street, 8

@bluebread (Contributor):
> Another run, it repeats: [long repetitive output quoted in full above]

Here is the script for the reference model and its output, which is also not good:

from transformers import AutoModel, AutoTokenizer
import torch
import os

os.environ["CUDA_VISIBLE_DEVICES"] = '1'
model_name = '/root/DeepSeek-OCR'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='eager', trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = '/root/llama.cpp/tools/mtmd/test-1.jpeg'
output_path = './outputs'
# infer() is the custom entry point shipped with the DeepSeek-OCR checkpoint
# (loaded via trust_remote_code); crop_mode=False forces a single 640x640 view.
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path,
                  base_size=640, image_size=640, crop_mode=False, save_results=True, test_compress=True)
(deepseek-ocr) root@13ca65024005:~# python3 /root/DeepSeek-OCR-vLLM/DeepSeek-OCR-master/DeepSeek-OCR-hf/run_dpsk_ocr.py
You are using a model of type deepseek_vl_v2 to instantiate a model of type DeepseekOCR. This is not supported for all configurations of models and can yield errors.
Some weights of DeepseekOCRForCausalLM were not initialized from the model checkpoint at /root/DeepSeek-OCR and are newly initialized: ['model.vision_model.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
directly resize
Saving tensor global_view (shape: (1, 3, 640, 640), dtype: torch.float32, sum: 354309.21875) to global_view_py.txt
Tensor saved successfully
/root/miniconda3/envs/deepseek-ocr/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
Saving tensor patch_w (shape: (768, 3, 16, 16), dtype: torch.float32, sum: 77.85554504394531) to patch_w_py.txt
Tensor saved successfully
Saving tensor patch_b (shape: (1, 1, 1, 768), dtype: torch.float32, sum: 11.546276092529297) to patch_b_py.txt
Tensor saved successfully
Saving tensor inp_raw (shape: (1, 3, 640, 640), dtype: torch.float32, sum: 354792.75) to inp_raw_py.txt
Tensor saved successfully
Saving tensor inpL (shape: (1, 40, 40, 768), dtype: torch.float32, sum: 52674.0703125) to inpL_py.txt
Tensor saved successfully
=====================
BASE:  torch.Size([1, 100, 1280])
NO PATCHES
=====================
The attention layers in this model are transitioning from computing the RoPE embeddings internally through `position_ids` (2D tensor with the indexes of the tokens), to using externally computed `position_embeddings` (Tuple of tensors, containing cos and sin). In v4.46 `position_ids` will be removed and `position_embeddings` will be mandatory.
<|ref|>title<|/ref|><|det|>[[60, 255, 936, 520]]<|/det|>
# MEN WALK ON MOON  MEN WALK ON MOON  MEN WALK ON MOON  MEN WALK ON MOAN  MEN WALK ON MOON  MEN WALK ON MOON  MEN WALK ON PLANET  MEN WALK ON PLANET  MEN WALK ON PLANET  MEN WALK ON PLANT  MEN WALK ON PLANT  MEN WALK ON PLANT  MEN WALK ON PLANET  MEN WALK ON PLANT  MEN WALK ON PLANET  MEN WALK ON PLANET  MEN WALK ON PLAIN  MEN WALK ON PLAIN  MEN WALK ON PLAIN  MEN WALK ON PLANET  MEN WALK ON PLANET  MEN WALK ON PLEAN  MEN WALK ON PLEAN  MEN WALK ON PLEAN  MEN WALON PLANET  MEN WALON PLANET  MEN WALON PLANET  MEN WALON PLANT  MEN WALON PLANT  MEN WALON PLANT  MEN WALON PLANET  MEN WALON PLANET  MEN WALON PLEAN  MEN WALON PLEAN  MEN WALON PLEAN  MEN WALK ON PLANET  MEN WALK ON PLANET  MEN WALK ON PANEL  MEN WALK ON PANEL  MEN WALK ON PANEL  MEN WALK ON PLANET  MEN WALK ON PLANET  MEN WALK ON PALAN  MEN WALK ON PALAN  MEN WALK ON PALAN  MEN WALK ON PLANET  MEN WALK ON PLANET  MEN WALKON PLANET  MEN WALKON PLANET  MEN WALKON PLANET  MEN WALON PLANET  MEN WALON PLANET  MENWALKON PLANET  MENWALKON PLANET  MENWALKON PLANET  MEN WALKON PLANET  MEN WALKON PLANET  MENE WALKON PLANET  MEN WALKON PLANET  MEN WALKON PLANET
==================================================
image size:  (640, 488)
valid image tokens:  100
output texts tokens (valid):  446
compression ratio:  4.46
==================================================
===============save results:===============
image: 0it [00:00, ?it/s]

Based on my experiments, while DeepSeek-OCR is able to handle clean, well-formatted documents (e.g. academic papers), it struggles with arbitrary image inputs (e.g. photographs of newspapers) due to its limited training dataset. I suspect it's an experimental project, more a proof of concept than a product.

@ngxson (Collaborator) commented Dec 9, 2025

> I suspect it's an experimental project, more a proof of concept than a product.

If that's actually the case, I have doubts about merging this PR, as it could easily flood the project with issues about model quality, over which we have no control.

Not ignoring your efforts here - it's amazing to see such a complicated architecture implemented in GGML. But in the past I myself had many PRs that were not mergeable due to model quality - which wasn't even my fault. The most recent example was the PaddleOCR model.

What I think would be better is to keep the PR as an experiment until more users confirm that it works (or maybe the DeepSeek team will release a better-trained OCR model in the future).

@sfallah (Contributor, Author) commented Dec 9, 2025

@ngxson @bluebread
I don't think there is an issue with the model itself!

Forcing the base image size (by hardcoding it), I get a better result:


<|ref|>text<|/ref|><|det|>[[63, 118, 177, 178]]<|/det|>
"All the News That's Fit to Print" 

<|ref|>text<|/ref|><|det|>[[63, 222, 176, 238]]<|/det|>
VOL. CXVIII. No. 40,721 

<|ref|>text<|/ref|><|det|>[[396, 221, 579, 238]]<|/det|>
NEW YORK, MONDAY, JULY 21, 1969 

<|ref|>text<|/ref|><|det|>[[789, 120, 908, 138]]<|/det|>
LATE CITY EDITION 

<|ref|>text<|/ref|><|det|>[[783, 140, 911, 211]]<|/det|>
Wednesday, July 21, 1969, 10 CENTS
Water: Salt, warm today; clear
tonight, Sunny, pleasant sunshine.
Time, range: today 10:45; Monday
11:45. High: 10:00, Friday 10:00. 
In: Complete U.S. report as P. 51. 

<|ref|>title<|/ref|><|det|>[[60, 263, 907, 540]]<|/det|>
# MEN WALK ON MOON
## ASTRONAUTS LAND ON PLAIN; COLLECT ROCKS, PLANT FLAG 

<|ref|>text<|/ref|><|det|>[[56, 561, 256, 632]]<|/det|>
Voice From Moon:
'Eagle Has Landed' 

<|ref|>text<|/ref|><|det|>[[66, 653, 262, 723]]<|/det|>
EAGLE (the lunar module) Houston, Tranquility
Base here. The Eagle has landed.
HOCHSTON: Major, Tranquility, we envy you on the
ground. You've put a bunch of guys about to turn blue.
We're breaking again. Thanks a lot. 

<|ref|>text<|/ref|><|det|>[[66, 724, 262, 793]]<|/det|>
TRANQUILITY BASE: Thank you.
HOCHSTON: You're looking good here.
TRANQUILITY BASE: A very smooth touchdown.
HOCHSTON: Eagle, you are my life. (The first
step in the lunar operation.) Over. 

<|ref|>text<|/ref|><|det|>[[66, 793, 262, 843]]<|/det|>
TRANQUILITY BASE: Major, stay for T1.
HOCHSTON: Major and I are on my way. (The second step.)
TRANQUILITY BASE: Major, 

<|ref|>text<|/ref|><|det|>[[66, 843, 262, 872]]<|/det|>
HOCHSTON: The command and service module
How do you read me? 

<|ref|>text<|/ref|><|det|>[[66, 872, 262, 902]]<|/det|>
HOCHSTON: Columbia, he has landed Tranquility
Base. Eagle is at Tranquility. I read you first by.
Over. 

<|ref|>text<|/ref|><|det|>[[66, 902, 225, 921]]<|/det|>
COLUMBIA: Yes, I heard the whole thing. 

<|ref|>text<|/ref|><|det|>[[66, 921, 210, 940]]<|/det|>
HOCHSTON: Well, it's a good show. 

<|ref|>text<|/ref|><|det|>[[66, 940, 175, 959]]<|/det|>
TRANQUILITY BASE: Yes. 

<|ref|>text<|/ref|><|det|>[[66, 959, 262, 979]]<|/det|>
COLUMBIA: The most lunar module step you
will be for the 72 event. That is at 21 minutes 26 sec- 

<|ref|>text<|/ref|><|det|>[[66, 979, 190, 988]]<|/det|>
and 21.7 seconds. 

<|ref|>image<|/ref|><|det|>[[264, 543, 693, 968]]<|/det|>
 

<|ref|>text<|/ref|><|det|>[[502, 969, 692, 988]]<|/det|>
The lunar module's main attitude relative to the first step on the surface of the moon. 

<|ref|>text<|/ref|><|det|>[[696, 565, 909, 628]]<|/det|>
A Powdery Surface
Is Closely Explored 

<|ref|>text<|/ref|><|det|>[[754, 647, 852, 666]]<|/det|>
By JOHN NOBLE WILFORD 

<|ref|>text<|/ref|><|det|>[[696, 668, 909, 700]]<|/det|>
HOCHSTON, Monday, July 21—Men have landed and
walked on the moon. 

<|ref|>text<|/ref|><|det|>[[696, 700, 909, 748]]<|/det|>
Two of the astronauts, astronauts of Apollo 11, arrived their
fragile four-legged lunar module safely and smoothly to
the lunar landing yesterday at 4:17:40 P.M., Eastern day-
light time. 

<|ref|>text<|/ref|><|det|>[[696, 748, 909, 787]]<|/det|>
Neil A. Armstrong, the 38-year-old civilian commander,
landed on earth and the astronauts were there. 

<|ref|>text<|/ref|><|det|>[[696, 787, 909, 836]]<|/det|>
"Obviously, Tranquility Base here. The Eagle has landed."
The first step to reach the moon—the Armstrong and
his engineer, Col. Edwin E. Aldrin Jr. of the Aeronau- 
tics Department—was a long one. 

<|ref|>text<|/ref|><|det|>[[696, 836, 909, 884]]<|/det|>
Aldrin and a half hours later, Mr. Armstrong opened
the landing craft's hatch, stepped slowly down the ladder
and descended as he pushed out first American flag on the
lunar crest. 

<|ref|>text<|/ref|><|det|>[[696, 884, 909, 923]]<|/det|>
"I don't see much step for my nose, one giant leap for
the whole world."
He first step on the moon came at 10:56:29 P.M., as
a television camera outside the craft transmitted his every
move to an aerial and visual audience of hundreds of
millions of people on earth. 

<|ref|>text<|/ref|><|det|>[[756, 923, 861, 940]]<|/det|>
TELEVISION: "Slippin' Test Soil"
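
For reference, a rough external approximation of that workaround (a sketch: the 640x640 base size is an assumption taken from base_size=640 in the reference script above, and pre-resizing before the CLI is not identical to hardcoding the size internally):

```python
# Sketch: approximate "force the base image size" by pre-resizing the input
# before handing it to llama-mtmd-cli. BASE_SIZE = 640 is assumed from the
# reference script; the actual workaround hardcodes the size in preprocessing.
from PIL import Image

BASE_SIZE = 640  # assumed base resolution

img = Image.open("tools/mtmd/test-1.jpeg").convert("RGB")
img = img.resize((BASE_SIZE, BASE_SIZE), Image.Resampling.BICUBIC)
img.save("test-1-base.png")
# then pass --image test-1-base.png to llama-mtmd-cli as before
```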

@ngxson (Collaborator) commented Dec 9, 2025

I think we must make sure it works with different input image qualities (e.g. lighting conditions, colors) and different sizes. A test script would be nice to have - it can be pushed to a https://gist.github.com/ and shared here in a comment. Such a test script could be placed at tools/mtmd/test-deepseek-ocr.py, for example.

Any bugs related to output quality should be addressed before I can do any refactoring on the PR; otherwise it's very difficult to trace the bug back to the commit that introduced it.
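
A minimal sketch of what such a script could look like (the paths, the *.expected.txt reference convention, and the loop are assumptions; only the CLI flags come from the commands above):

```python
#!/usr/bin/env python3
# Sketch for a possible tools/mtmd/test-deepseek-ocr.py; adapt paths to your setup.
import subprocess
from pathlib import Path

CLI = "build/bin/llama-mtmd-cli"
MODEL = "gguf_models/deepseek-ai/deepseek-ocr-f16.gguf"            # assumed path
MMPROJ = "gguf_models/deepseek-ai/mmproj-deepseek-ocr-f16.gguf"    # assumed path
PROMPT = "<|grounding|>Convert the document to markdown."

def run_ocr(image: Path) -> str:
    """Run llama-mtmd-cli on one image and return its stdout."""
    result = subprocess.run(
        [CLI, "-m", MODEL, "--mmproj", MMPROJ,
         "--image", str(image), "-p", PROMPT,
         "--chat-template", "deepseek-ocr", "--temp", "0"],
        capture_output=True, text=True, check=True)
    return result.stdout

# Run over images of varying quality and size, comparing against stored
# reference outputs (e.g. produced by the HF reference implementation).
for image in sorted(Path("tmp/mtmd_test_data").glob("*.png")):
    reference = image.with_suffix(".expected.txt")  # assumed convention
    output = run_ocr(image)
    if reference.exists():
        status = "OK" if output.strip() == reference.read_text().strip() else "DIFF"
    else:
        status = "NO-REF"
    print(f"{image.name}: {status}")
```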

@bluebread (Contributor) commented Dec 9, 2025

@sfallah Good job! We need more refined logic for automatic mode selection, as well as comprehensive testing. Could you please take care of it? I'll probably be busy for the next couple of days.

@sfallah (Contributor, Author) commented Dec 9, 2025

> @sfallah Good job! We need more refined logic for automatic mode selection, as well as comprehensive testing. Could you please take care of it? I'll probably be busy for the next couple of days.

I will take care of the tests and the rest, no problem.

setting min-resolution base (1024) max large (1280) for dynamic-resolution
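
For what it's worth, a sketch of what that auto mode selection could look like (only the 1024/1280 thresholds come from the commit message above; the mode names and the exact rule are assumptions, not the PR's final logic):

```python
# Sketch only: pick a resolution mode from the image dimensions, using the
# thresholds in the commit message above ("min-resolution base (1024) max
# large (1280)"). Mode names and the exact rule are assumptions.
def select_resolution_mode(width: int, height: int) -> str:
    longest = max(width, height)
    if longest < 1024:
        return "base"     # small input: single base-resolution view
    if longest <= 1280:
        return "large"    # mid-size input: single large-resolution view
    return "dynamic"      # large input: dynamic resolution with tiling

assert select_resolution_mode(640, 480) == "base"
assert select_resolution_mode(1200, 900) == "large"
assert select_resolution_mode(2048, 1536) == "dynamic"
```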
@ngxson (Collaborator) commented Dec 10, 2025

FYI, some changes were added via this PR: #17909

  • build_vit() with fused qkv support
  • in single-turn mode, image placement in the prompt changed (it now follows the order: image first, then text)

# Conflicts:
#	tools/mtmd/clip.cpp
#	tools/mtmd/mtmd-cli.cpp
added new opt to tests.sh to disable flash-attn
quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR
@ngxson (Collaborator) commented Dec 12, 2025

heads up, sorry for the breaking change but there will be a refactoring (just moving stuff around) in #17965

after finishing this refactoring (and after you're done testing on your side), I'll go back to deepseek-ocr

@sfallah (Contributor, Author) commented Dec 13, 2025

@ngxson

> heads up, sorry for the breaking change but there will be a refactoring (just moving stuff around) in #17965
>
> after finishing this refactoring (and after you're done testing on your side), I'll go back to deepseek-ocr

Merge with #17965 is done.
I have also added deepseek-ocr to tests.sh.
As far as my tests go, it works, but the Python test script is not done yet.
I will finish the test script tomorrow.

python test script for deepseek-ocr
testing OCR on test-1.jpeg newspaper image
checking against expected reference model output for Free-OCR and Markdown