
Conversation

@teonbrooks
Member

An attempt at fixing the overflow issue in the read_raw_cnt reader. This error manifested with a numpy upgrade.

Reference issue

Fixes #13547.

What does this implement/fix?

This follows a pattern suggested in #12907 to cast the integer to int64.
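For reference, the overflow pattern and the cast, as a minimal sketch (the values and names are illustrative, not the reader's actual code):

```python
import numpy as np

# CNT header fields are 32-bit; products of them can exceed the int32 range.
n_samples = np.int32(20_000_000)
n_channels = np.int32(66)

# int32 * int32 stays int32 and wraps on overflow (at most a RuntimeWarning,
# never an error):
# total_bytes = n_samples * n_channels * 4

# The pattern from #12907: promote to int64 before multiplying.
total_bytes = np.int64(n_samples) * n_channels * 4  # exact
```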

@larsoner
Member

To read your file it needs a few more fixes actually... I'll push

@larsoner
Member

Definitely still something wrong here...

```
$ python -uic "import mne; raw = mne.io.read_raw_cnt('~/Desktop/945flankers_ready.cnt', data_format='int16').load_data(); raw.plot(annotation_regex='aaa')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import mne; raw = mne.io.read_raw_cnt('~/Desktop/945flankers_ready.cnt', data_format='int16').load_data(); raw.plot(annotation_regex='aaa')
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "<decorator-gen-190>", line 12, in load_data
  File "/home/larsoner/python/mne-python/mne/io/base.py", line 589, in load_data
    self._preload_data(True)
    ~~~~~~~~~~~~~~~~~~^^^^^^
  File "/home/larsoner/python/mne-python/mne/io/base.py", line 601, in _preload_data
    self._data = self._read_segment(data_buffer=data_buffer)
                 ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<decorator-gen-189>", line 12, in _read_segment
  File "/home/larsoner/python/mne-python/mne/io/base.py", line 420, in _read_segment
    data = _allocate_data(data_buffer, data_shape, dtype)
  File "/home/larsoner/python/mne-python/mne/io/base.py", line 2577, in _allocate_data
    data = np.zeros(shape, dtype)
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 2.06 TiB for an array with shape (66, 4294966564) and data type float64
```

Same error if I use data_format='int32'. If I remove the .load_data and use data_format='int32', the plot at least looks okay:

[screenshot: raw.plot() output, which looks reasonable]

So we need to figure out the n_samples issue; 4294966564 samples for 66 channels is totally unreasonable for a 150 MB file...
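Worth noting: 4294966564 is exactly 2**32 - 732, consistent with a small negative 32-bit value being reinterpreted as unsigned; a quick sanity check:

```python
import numpy as np

# 4294966564 == 2**32 - 732, i.e. the bit pattern of -732 read as a uint32:
print(2**32 - 4294966564)                      # 732
print(np.uint32(4294966564).astype(np.int32))  # -732
```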

@larsoner
Member

@teonbrooks I'm done pushing/looking for now, I hope the changes I made help debugging a bit more. Something is wrong with n_samples here, it gets read as 4294966564 ...

@teonbrooks
Member Author

thanks @larsoner!

Came across this post; adding it here for future reference:
https://paulbourke.net/dataformats/eeg/

@teonbrooks
Member Author

> @teonbrooks I'm done pushing/looking for now, I hope the changes I made help debugging a bit more. Something is wrong with n_samples here, it gets read as 4294966564 ...

according to the link above, it looks like this is not an uncommon occurrence:

> Experience has shown that many (most) of the fields are not filled out correctly by the software. In particular, the best way to work out the number of samples is

it looks like n_samples should be calculated as:

nsamples = SETUP.EventTablePos - (900 + 75 * nchannels) / (2 * nchannels)

@larsoner
Member

Great, can you add some comments / links in the code for the next time we dig into this, and try the suggested fix?

@teonbrooks
Member Author

Added a note. After trying it out and looking more closely at the code, it looks as though the n_samples logic is already there, starting at https://github.com/mne-tools/mne-python/blob/main/mne/io/cnt/cnt.py#L339.

@teonbrooks
Member Author

I actually don't know what to do about the n_samples. It looks like the code is already trying its best to handle the data without knowing the data_format, and without the header having a reliable entry for it.

@larsoner
Member

@teonbrooks do you want me to take a look?

> It looks like the code is already trying its best to handle the data without knowing the data_format, and without the header having a reliable entry for it.

So we have two potential sources of truth:

  1. n_samples, which is 4294966564 for this dataset
  2. SETUP.EventTablePos - (900 + 75 * nchannels) / (2 * nchannels), which is presumably correct for this dataset (right?)

In main we assume (1) is going to be more correct, so we use it, incorrectly, for this file. Does that sound right?

If so, maybe we should prefer to use (2) if it's available, since it's more likely to be correct.

We could add some parameter to control which of these to prefer, too, if needed. We could even make it n_samples="computed" (new default; option 2 above) | "read" (default on main; option 1 above) and change the default without a deprecation cycle, since I think we can consider this a bugfix given the unreliability of "read".
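At the call site, that proposal could look something like this (hypothetical parameter, not an existing API):

```python
import mne

# n_samples="computed"|"read" is the proposal above, not an existing option:
raw = mne.io.read_raw_cnt(
    "~/Desktop/945flankers_ready.cnt",
    data_format="int32",
    n_samples="computed",  # derive the count from event_table_offset
)
```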

@teonbrooks
Member Author

@larsoner, yes, that would be a great help if you could take a look at it. And yep, I agree with your assessment of the number of samples.

@larsoner
Member

larsoner commented Jan 9, 2026

Yikes this is a bit of a nightmare. Looks like in the header for event_table_offset and n_samples:

  1. Sometimes they are both correct, and consistent
  2. Sometimes event_table_offset is correct and n_samples is incorrect (like in the linked docs)
  3. Sometimes event_table_offset is incorrect and n_samples is correct (like in some of our test datasets)
  4. event_table_offset will always be wrong for file sizes > 2 GB

So when I said before:

> If so, maybe we should prefer to use (2) [event_table_offset-computed value] if it's available, since it's more likely to be correct.

I'm no longer convinced this is a good idea, given it's not even the case for our test datasets! To make things worse, all of this stuff interacts with data_format, which we allow to be "auto". So thinking about it more, how about we:

  1. Add n_samples="header" (default) | "computed" where "computed" means "compute it using event_table_offset and data_format"
  2. Only allow data_format="auto" when n_samples="header"; for n_samples="computed", require an explicit data_format and a file size < 2 GB. You can't figure out both data_format and n_samples given just an event_table_pos (and event_table_pos is only even potentially usable for file sizes < 2 GB) -- you can just as easily say there are 2 bytes per sample and X samples as 4 bytes per sample and X//2 samples. See the sketch below.
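A minimal sketch of that validation plus the "computed" path (all names here are hypothetical, not the reader's actual internals):

```python
# Hypothetical sketch of the proposed logic; not cnt.py's actual code.
N_BYTES = {"int16": 2, "int32": 4}

def _compute_n_samples(event_table_pos, n_channels, data_format, file_size):
    """Derive n_samples from event_table_pos instead of trusting the header."""
    if data_format == "auto":
        # event_table_pos alone cannot disambiguate 2 vs 4 bytes per sample
        raise ValueError('n_samples="computed" requires an explicit data_format')
    if file_size >= 2**31:
        # event_table_pos is a 32-bit header field, so it is unusable past 2 GB
        raise ValueError('n_samples="computed" cannot be used for files >= 2 GB')
    n_bytes = N_BYTES[data_format]
    data_start = 900 + 75 * n_channels  # SETUP block + per-channel ELECTLOC blocks
    return (event_table_pos - data_start) // (n_channels * n_bytes)
```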

Also, looking at the docs from https://paulbourke.net/dataformats/eeg/, their example file has the data offset (SETUP+ELECTLOC) at 5550, which runs to EventTablePos=8511950 for 62 channels. They say this span is n_samples * n_channels * 2 (so data_format="int16"). But if you take their math at the top, nsamples = SETUP.EventTablePos - (900 + 75 * nchannels) / (2 * nchannels), you get a very wrong value: 8511950 - (900 + 75*62) / (2*62) = 8511905.241935484. Correcting their algebra to what I think should be right, (EventTablePos - (900 + 75*n_channels)) / (n_channels*n_bytes), we get a more reasonable 68675.0 (note the whole value on this float division, which is good and suggests possible correctness once we convert to integer arithmetic!)... but according to the doc itself the number of samples is 68600! So somehow there are 75 extra values here.

I wondered if this could be the source of #11802, but I also see it in the data @teonbrooks shared -- computing the number of samples from the event_table_offset, I see ~1865 samples toward the end of the data that are almost all (but not all!) zeros. I wouldn't be surprised if this is the result of something writing un-zeroed malloc'ed rather than calloc'ed data... in any case, we should revisit #11802 once we work out the solutions above, since maybe we'll magically fix that issue, too.
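To make the precedence issue concrete, here is the doc's example run both ways (a quick sketch using the numbers quoted above):

```python
# The Bourke doc's example values, as quoted above.
event_table_pos = 8511950
n_channels = 62
n_bytes = 2  # "int16"

# The formula as written in the doc: division binds tighter than subtraction,
# so only the header size gets divided, and the result is nonsense.
as_written = event_table_pos - (900 + 75 * n_channels) / (2 * n_channels)
print(as_written)  # 8511905.241935484, clearly not a sample count

# With the outer parentheses added, the whole data span is divided by the
# bytes per sample across all channels; the result is a whole-valued float.
corrected = (event_table_pos - (900 + 75 * n_channels)) / (n_channels * n_bytes)
print(corrected)  # a whole number, as noted above
```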

@withmywoessner you've worked on this stuff a bit recently... WDYT?

xref previous nightmares #6535 #6537 #11802 #12393

@larsoner larsoner added this to the 1.12 milestone Jan 9, 2026
@teonbrooks
Member Author

> But if you take their math at the top, nsamples = SETUP.EventTablePos - (900 + 75 * nchannels) / (2 * nchannels), you get a very wrong value: 8511950 - (900 + 75*62) / (2*62) = 8511905.241935484. Correcting their algebra to what I think should be right, (EventTablePos - (900 + 75*n_channels)) / (n_channels*n_bytes), we get a more reasonable 68675.0

This was exactly where I got stumped as well! Trying to understand this gave me a massive headache 🤕

For setting the data_format for files >2GB, would this be a matter of the user having to try out both options to see if the data makes sense?

@larsoner
Member

larsoner commented Jan 9, 2026

> For setting the data_format for files >2GB, would this be a matter of the user having to try out both options to see if the data makes sense?

Yeah, I think so. It's not great, but I think it's probably better that they be explicit. Hopefully it's pretty obvious by eye; your data looked completely wrong for int16 but reasonable for int32, for example.
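For example (a sketch; assumes the file path from earlier in the thread):

```python
import mne

# For a large file where data_format can't be inferred, try each format and
# eyeball which scaling looks physiological; the wrong one usually looks
# like noise or wildly discontinuous traces.
for fmt in ("int16", "int32"):
    raw = mne.io.read_raw_cnt("~/Desktop/945flankers_ready.cnt", data_format=fmt)
    raw.plot(title=f"data_format={fmt}", block=True)
```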
