OMC: init heap/hint table across different shard #1171
Conversation
Linked: need to figure out soundness #1178 (TODO)
kunxian-xia
left a comment
1st round of review
```rust
} else {
    system_config
        .mmu_config
        .assign_init_table_circuit(
```
In fact we can just avoid this step as we only assign static init tables in the 0th shard.
Although we could literally omit this for shard id > 0, in some places we need all chips present with an empty RMM, so we still need it; e.g. removing this causes mock_prover to fail.
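A hedged sketch of the point, with hypothetical names (`RowMajorMatrix` and `assign_static_init_table` are illustrative, not the real Ceno API): shards with id > 0 still assign the chip, just with an empty matrix.

```rust
// Hypothetical sketch only; names and shapes are assumed for illustration.
struct RowMajorMatrix {
    rows: Vec<Vec<u64>>,
    width: usize,
}

// Every shard assigns the static init table chip so all chips stay present
// (e.g. mock_prover expects them), but only shard 0 carries real rows.
fn assign_static_init_table(
    shard_id: usize,
    init_rows: Vec<Vec<u64>>,
    width: usize,
) -> RowMajorMatrix {
    if shard_id == 0 {
        RowMajorMatrix { rows: init_rows, width }
    } else {
        RowMajorMatrix { rows: Vec::new(), width } // empty RMM, chip still assigned
    }
}
```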
```rust
// │ │ └─ later rw? YES (rw in >0 exists) -> ShardRAM
// │ │
// │ └─ rw occurs in current shard (current shard may be >0)
// │    ├─ later rw? NO (no rw in later) -> ShardRAM + LocalFinalize
```
typo? It should be "LocalFinalize".
"ShardRAM + LocalFinalize" correct, for example 0th shard is previous shard, then need to global read from ShardRAM
```rust
// │ └─ later rw? YES -> ShardRAM
// │
// └─ NO: init in a previous shard
//    ├─ later rw? NO -> LocalFinalize
```
TODO: will update this to "later rw? NO -> ShardRAM + LocalFinalize".
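Putting the visible branches of the tree together (including the TODO above), a sketch of the classification; the enum and function names are illustrative, and branches not shown in the diff are marked as assumptions.

```rust
#[derive(Debug, PartialEq)]
enum FinalizeKind {
    ShardRam,                 // handled via the shard RAM circuit
    LocalFinalize,            // finalized locally in the owning shard
    ShardRamAndLocalFinalize, // global read from ShardRAM + local finalize
}

// Classify one address by where it was initialized and where it is
// read/written, mirroring the comment tree quoted in the diff above.
fn classify(init_in_current: bool, rw_in_current: bool, rw_in_later: bool) -> FinalizeKind {
    if init_in_current {
        if rw_in_current {
            if rw_in_later {
                FinalizeKind::ShardRam // later rw? YES -> ShardRAM
            } else {
                FinalizeKind::ShardRamAndLocalFinalize // later rw? NO
            }
        } else if rw_in_later {
            FinalizeKind::ShardRam // rw only exists in later shards
        } else {
            FinalizeKind::LocalFinalize // assumption: branch not visible in the diff
        }
    } else {
        // init happened in a previous shard
        if rw_in_later {
            FinalizeKind::ShardRam // assumption: branch not visible in the diff
        } else {
            // currently LocalFinalize; per the TODO above this becomes
            // ShardRAM + LocalFinalize
            FinalizeKind::ShardRamAndLocalFinalize
        }
    }
}
```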
kunxian-xia
left a comment
LGTM 👍
Follow-up on #1171: replace `rkyv` with a custom serde, for 2 reasons:

- Serialization to bytes needs either a `rkyv`-friendly struct or an external library, e.g. "rkyv + bincode". A `rkyv`-friendly struct is not friendly for existing applications, in particular when structs are defined in 3rd-party libraries. Thus we previously integrated bincode for the serialization to bytes. This brings extra effort for the guest program, as we need bincode to deserialize back to an owned struct.
- On deserialize, `rkyv` accesses a high address to retrieve some meta information before the sequential read. Below is an example read pattern:

```
hint address 28000004
hint address 28000008
hint address 280c8770 <- high addr accessed to fetch meta data
hint address 280c876c
hint address 280c876c
hint address 280c8770
hint address 28000010
hint address 28000010
hint address 28000010
hint address 28000010
hint address 28000014
hint address 28000014
hint address 28000014
...
```

The high-address access is not friendly if we want to record the max accessed address in each shard for memory regions initialized across shards.

The new `ceno_serde` crate credits https://github.com/openvm-org/openvm/tree/main/crates/toolchain/openvm/src/serde

## benchmark

Run on 23817600 with K=6, cache trace = none, 1x4090

| Component | base Time | Init-Across-Shard Time | Custom Serde Time | Init-Across-Shard Improvement | Custom Serde Improvement |
|--------------------------------------|-----------|------------------------|-------------------|-------------------------------|--------------------------|
| app.prove | 203 s | 175 s | 160 s | ↓ 13.8% | ↓ 21.2% |
| emulator.preflight-execute | 43.9 s | 35.8 s | 36.6 s | ↓ 18.5% | ↓ 16.6% |
| app_prove.inner | 158 s | 139 s | 123 s | ↓ 12.0% | ↓ 22.2% |
| create_proof_of_shard (shard_id = 0) | 14.9 s | 4.91 s | 8.46 s | ↓ 67.0% | ↓ 43.2% |
| create_proof_of_shard (shard_id = 1) | 16.4 s | 12.7 s | 3.74 s | ↓ 22.6% | ↓ 77.2% |

Overall cycles: `322620252` -> `315955342` (↓ 2.06%)

> Custom Serde updated setting: max cells `(1 << 30) * 8 / 4 / 2` -> `(1 << 30) * 10 / 4 / 2`; keccak blowup K=6 unchanged
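To make the second point concrete, here is a minimal sketch (assumed layout and names, not the real `ceno_serde` API) of a strictly sequential hint reader: the max accessed address always tracks the cursor, so per-shard recording never sees the up-front high-address jump visible in the `rkyv` trace above.

```rust
// Illustrative only; base address, word size and API are assumptions.
struct HintReader<'a> {
    base: u32,      // start of the hint region, e.g. 0x2800_0000
    data: &'a [u8],
    cursor: usize,
    max_addr: u32,  // highest address touched so far
}

impl<'a> HintReader<'a> {
    fn new(base: u32, data: &'a [u8]) -> Self {
        Self { base, data, cursor: 0, max_addr: base }
    }

    // Sequential 4-byte read: max_addr grows monotonically with the cursor,
    // unlike rkyv, which first reads metadata near the end of the buffer.
    fn read_u32(&mut self) -> u32 {
        let bytes: [u8; 4] = self.data[self.cursor..self.cursor + 4]
            .try_into()
            .expect("4 bytes available");
        self.cursor += 4;
        self.max_addr = self.base + self.cursor as u32;
        u32::from_le_bytes(bytes)
    }
}
```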
Related to #1145
Builds on top of #1202
design rationale
Make the init chip work across shards, and assure consecutive addresses across shards without overlap. The mechanism to read from the previous shard works as usual.

Remove a pitfall of `Platform` clone: wrap `Platform -> prog_data` into an `Arc`, otherwise each `Platform` clone carries a massive and hidden prover cost that we were not aware of (see the sketch below).
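A minimal sketch of the fix (field names assumed; not the actual `Platform` definition):

```rust
use std::sync::Arc;

// Before: a Platform holding prog_data by value makes every clone
// deep-copy the whole program image.
#[derive(Clone)]
struct PlatformByValue {
    prog_data: Vec<u8>,
}

// After: prog_data behind an Arc, so Platform::clone is a refcount bump
// and all clones share one allocation.
#[derive(Clone)]
struct PlatformShared {
    prog_data: Arc<Vec<u8>>,
}
```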
benchmark

Run on `23817600` with K=6, cache trace = none, 1x4090

verifier sound TODOs in next PR
This PR contains some verifier changes, but it is not fully sound yet. A refactor to constrain each specific chip's number of instances is on the way, in short to support `pi[<chip_id>].num_instance == chip_proof.num_instance`. This is needed to assure (see the sketch after this list):

- init heap/hint in each shard are constrained within `[platform.heap.start, platform.heap.end)` or `[platform.hint.start, platform.hint.end)`
- `pi[heap_length]`, `pi[hint_length]` equal the number of instances of the respective init chip
- `pi[xxx_start_addr] + pi[xxx_length]` stays within range
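A hedged sketch of the intended checks; all names, the public-input layout, and the per-row address stride are assumptions for illustration, not the actual verifier code.

```rust
struct ChipProof {
    num_instance: u64,
}

struct PublicInputs {
    heap_start_addr: u64,
    heap_length: u64, // instance count of the heap init chip
    hint_start_addr: u64,
    hint_length: u64, // instance count of the hint init chip
}

// One init-table row per 4-byte word is assumed here.
const WORD_SIZE: u64 = 4;

fn verify_init_table_pi(
    pi: &PublicInputs,
    heap_init: &ChipProof,
    hint_init: &ChipProof,
    heap_end: u64,
    hint_end: u64,
) -> bool {
    // pi[<chip_id>].num_instance == chip_proof.num_instance
    pi.heap_length == heap_init.num_instance
        && pi.hint_length == hint_init.num_instance
        // pi[xxx_start_addr] + pi[xxx_length] stays within the region
        && pi.heap_start_addr + pi.heap_length * WORD_SIZE <= heap_end
        && pi.hint_start_addr + pi.hint_length * WORD_SIZE <= hint_end
}
```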
Another TODO for `Hint` read and `rkyv`: `rkyv` serializes data from hint_end -> hint_start, so right now hints still mostly happen in shard one, because on the first read `max_hint_addr` starts from the end address. This makes the first shard still write a bunch of records to the shard RAM circuit. To further improve this, we need to find a way, e.g. make `rkyv` go from the lower address to the high one, so the first-shard issue can be solved entirely.