Conversation
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18247

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 251 Pending as of commit 2e04f71 with merge base eb92cec.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Pull request overview
Updates the LLaMA example LoRA linear implementation to call an nn.Linear module in forward() (instead of torch.nn.functional.linear), presumably to align with module-based patterns.
Changes:
- Replace `torch.nn.functional.linear(x, self.weight, self.bias)` with `self.linear(x)`.
- Introduce `self.linear = nn.Linear(...)` during `LoRALinear` initialization.
Comments suppressed due to low confidence (1)
examples/models/llama/lora.py:36
`linear` and `weight` are now undefined after switching to `self.linear`; `bias = linear.bias ...` and `register_parameter("weight", ...)` will raise at init time. Either derive `bias`/`weight` from `self.linear` (or remove the extra `register_parameter` calls entirely) so construction works.
```python
self.linear = nn.Linear(in_dim, out_dim, bias=use_bias)
bias = linear.bias if self.use_bias else None
self.register_parameter("weight", nn.Parameter(weight))
self.register_parameter(
    "bias", nn.Parameter(bias) if bias is not None else None
)
```
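One way to resolve the reviewer's concern (a minimal sketch, not the PR's actual fix; the class name here is illustrative) is to drop the extra `register_parameter` calls and expose `weight`/`bias` as properties that read off the wrapped module:

```python
import torch.nn as nn


class LoRALinearInit(nn.Module):
    """Sketch: derive weight/bias from the wrapped nn.Linear instead of
    re-registering parameters. Names are illustrative assumptions."""

    def __init__(self, in_dim: int, out_dim: int, use_bias: bool = False):
        super().__init__()
        self.use_bias = use_bias
        # The wrapped linear already owns its weight/bias parameters.
        self.linear = nn.Linear(in_dim, out_dim, bias=use_bias)

    @property
    def weight(self) -> nn.Parameter:
        # Backward-compatible access for callers that expect `.weight`.
        return self.linear.weight

    @property
    def bias(self):
        return self.linear.bias if self.use_bias else None
```

Construction then works without referencing the now-undefined local `linear`/`weight` names, and there is only one copy of each parameter.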
Pull request overview
Updates the Llama LoRA implementation to call an nn.Linear submodule directly (instead of torch.nn.functional.linear) and simplifies quantization filtering accordingly.
Changes:
- Refactors `LoRALinear` to wrap a real `nn.Linear` module and use it in `forward`.
- Adds partial state-dict backward compatibility by remapping the legacy `weight` key.
- Simplifies the 8da*w quantization `filter_fn` to only consider `nn.Linear` modules.
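The legacy-key remapping described above can be sketched as a pure function over the state dict (key names are assumptions based on the review summary, not the PR's exact code):

```python
def remap_legacy_keys(state_dict: dict, prefix: str = "") -> dict:
    """Sketch: map an old checkpoint's flat 'weight'/'bias' keys onto
    the new wrapped-module layout ('linear.weight'/'linear.bias')."""
    legacy_to_new = {
        prefix + "weight": prefix + "linear.weight",
        prefix + "bias": prefix + "linear.bias",
    }
    # Leave every other key (e.g. 'lora_a.weight') untouched.
    return {legacy_to_new.get(k, k): v for k, v in state_dict.items()}
```

Running an old checkpoint's keys through this before `load_state_dict` keeps pre-refactor checkpoints loadable.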
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| examples/models/llama/source_transformation/quantize.py | Simplifies the 8da4w/8da8w quantization filter to target nn.Linear modules only. |
| examples/models/llama/lora.py | Refactors LoRALinear to delegate to an internal nn.Linear and updates forward/state-dict behavior. |
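Because `LoRALinear` now contains a real `nn.Linear`, the simplified filter can be a plain type check. A hedged sketch, assuming the torchao-style `filter_fn(module, fqn)` signature:

```python
import torch.nn as nn


def linear_only_filter(module: nn.Module, fqn: str) -> bool:
    """Sketch of the simplified quantization filter: matching nn.Linear
    alone now covers both LoRA and non-LoRA models, since LoRALinear
    wraps an nn.Linear submodule. The (module, fqn) signature follows
    the torchao filter_fn convention."""
    return isinstance(module, nn.Linear)
```

The previous special-casing of LoRA modules becomes unnecessary: the quantizer reaches the base projection by recursing into submodules.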
Force-pushed 45fd1f3 to 1ebb7d7.
Pull request overview
Updates the LLaMA LoRA implementation to wrap a real nn.Linear submodule (instead of calling torch.nn.functional.linear directly), enabling TorchAO quantization tooling to recognize and quantize the base linear layer consistently across LoRA and non-LoRA models.
Changes:
- Refactors
LoRALinearto containself.linear: nn.Linear, addsweight/biasproperties for backward-compatible access, and remaps old checkpoint keys on load. - Simplifies TorchAO 8da*xw quantization filtering to target
nn.Linearmodules only (no special-casing LoRA modules). - Makes XNNPACK constant serialization keys content-hash-based (removes tensor-name prefix) and updates the LoRA CI expectation string accordingly.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/models/llama/source_transformation/quantize.py | Simplifies TorchAO quantization filter to match nn.Linear modules and apply group-size compatibility logic. |
| examples/models/llama/lora.py | Refactors LoRALinear to wrap an nn.Linear submodule; adds BC weight/bias accessors and state-dict key remapping. |
| backends/xnnpack/operators/node_visitor.py | Changes named constant key generation to be solely SHA256-based to stabilize dedup/indexing behavior. |
| .ci/scripts/test_lora.sh | Updates expected output prefix text for the quantized LoRA test case. |
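The content-hash key change described for `node_visitor.py` can be illustrated as follows; the function name and surrounding usage are hypothetical, only the SHA256-of-bytes idea is taken from the review summary:

```python
import hashlib


def constant_key(data: bytes) -> str:
    """Sketch of a purely content-based key for serialized constants:
    hashing only the bytes (no tensor-name prefix) means identical
    constants deduplicate regardless of what they are named."""
    return hashlib.sha256(data).hexdigest()
```

With a name prefix in the key, the same constant reached under two names would serialize twice; keying on content alone stabilizes dedup and indexing, which is presumably why the LoRA CI expectation string changed.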
```diff
  self.use_bias = use_bias
  self.dropout = dropout

- linear = nn.Linear(in_dim, out_dim, bias=use_bias)
- weight = linear.weight
- bias = linear.bias if self.use_bias else None
- self.register_parameter("weight", nn.Parameter(weight))
- self.register_parameter(
-     "bias", nn.Parameter(bias) if bias is not None else None
- )

+ self.linear = nn.Linear(in_dim, out_dim, bias=use_bias)
+ self.dropout = nn.Dropout(p=dropout) if dropout > 0.0 else nn.Identity()
+ self.lora_a = nn.Linear(in_features=in_dim, out_features=rank, bias=False)
```
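Putting the new layout together, the forward pass presumably becomes a call into the wrapped module plus the low-rank update. A sketch under assumed attribute names: `lora_b` and the `alpha/rank` scaling follow the standard LoRA formulation and are not necessarily this PR's exact code:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Sketch of the refactored module: a real nn.Linear submodule plus
    low-rank adapters, composed in the standard LoRA way."""

    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0, dropout=0.0, use_bias=False):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=use_bias)
        self.dropout = nn.Dropout(p=dropout) if dropout > 0.0 else nn.Identity()
        self.lora_a = nn.Linear(in_features=in_dim, out_features=rank, bias=False)
        self.lora_b = nn.Linear(in_features=rank, out_features=out_dim, bias=False)
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection now goes through the nn.Linear submodule, so
        # quantization filters that match nn.Linear can find it.
        out = self.linear(x)
        lora_out = self.lora_b(self.lora_a(self.dropout(x)))
        return out + self.scaling * lora_out
```

Since `self.linear`, `self.lora_a`, and `self.lora_b` are all `nn.Linear` instances, a filter that matches `nn.Linear` quantizes them uniformly across LoRA and non-LoRA models.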
Summary
Update the LoRA definition to use `nn.Linear` instead of `torch.nn.functional.linear`.
executorch/examples/models/llama/static_attention.py, line 1154 in eb92cec
Test plan
CI