Issue#733 reduce compilation overhead#787

Open
overlorde wants to merge 5 commits into
binpash:mainfrom
overlorde:gh733-reduce-compilation-overhead

@overlorde overlorde commented Apr 17, 2026


Hello all, I am trying to solve issue #733. Please let me know about any modifications I need to make, or any other feedback.
I tried to address the TODO comment at ir.py:546 by constructing shasta types directly and bypassing deepcopy to gain some performance.

  • Added cProfile around compile_ir() in compilation_server.py to find bottlenecks: the round trips at ir.py:547, ir_to_ast.py:114, and ir_to_ast.py:151
  • Replaced make_kv("Tag", [...]) dict construction + to_ast_node(...) round trips with direct shasta typed constructors
  • Bypassed the CommandInvocationWithIOVars.__init__ deepcopy at 8 call sites via __new__, for a further reduction of around 100 ms on top of the shasta type change
  • Result: 4-6x faster compilation at width=200 on heavy one-liners (e.g., nfa-regex drops from 2675 ms to 421 ms)
  • evaluation/tests/test_evaluation_scripts.sh passes 100/100.
  • Bypassing __init__ is safe: no caller mutates these containers after passing them in. The proper fix is to remove the deepcopy in pash-annotations itself.
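
For readers unfamiliar with the `__new__` bypass, here is a minimal sketch of the idea. The `Invocation` class and `make_without_deepcopy` helper are hypothetical stand-ins, not the real `CommandInvocationWithIOVars` code; the only assumption carried over is that `__init__` deep-copies its arguments.

```python
import copy


class Invocation:
    """Hypothetical stand-in for a class whose __init__ deep-copies its arguments."""

    def __init__(self, flags, operands):
        self.flags = copy.deepcopy(flags)
        self.operands = copy.deepcopy(operands)


def make_without_deepcopy(flags, operands):
    """Allocate an Invocation without running __init__, so nothing is copied."""
    obj = Invocation.__new__(Invocation)
    obj.flags = flags
    obj.operands = operands
    return obj


flags = ["-n"]
operands = [["a", "b"]]

copied = Invocation(flags, operands)             # normal path: independent copies
shared = make_without_deepcopy(flags, operands)  # bypass path: aliases the inputs

# A later mutation by the caller is visible only through the bypassed object.
operands[0].append("c")
```

The sketch also shows why the safety argument matters: the bypassed object shares its containers with the caller, so the shortcut is only sound while no caller mutates them afterwards.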

speedup gained for nfa-regex.sh

nfa-regex.sh — compile time (--dry_run_compiler, 3-run avg)

| Width | Original (ms) | PR (ms) | Speedup |
| --- | --- | --- | --- |
| 2 | 42.8 | 10.2 | 4.21x |
| 16 | 218.9 | 36.3 | 6.02x |
| 64 | 834.4 | 132.7 | 6.29x |
| 200 | 2674.9 | 421.4 | 6.35x |

@github-actions


OS:ubuntu-24.04
Fri Apr 17 00:36:55 UTC 2026
intro: 3/3 tests passed.
interface: 43/43 tests passed.
compiler: 100/100 tests passed.

Signed-off-by: Farhan Saif <fsaif@uic.edu>
Signed-off-by: Farhan Saif <fsaif@uic.edu>
Signed-off-by: Farhan Saif <fsaif@uic.edu>
@overlorde overlorde force-pushed the gh733-reduce-compilation-overhead branch from 976ec26 to 30cab20 Compare April 17, 2026 13:18
@github-actions


OS:ubuntu-24.04
Fri Apr 17 13:23:07 UTC 2026
intro: 3/3 tests passed.
interface: 43/43 tests passed.
compiler: 100/100 tests passed.

@angelhof angelhof left a comment

Member

Great work on these changes! Can you fix/respond to my points? In particular, I would really like to avoid bypassing the normal Python object initialization. There must be a way to avoid the deepcopy there, right?

Comment thread src/pash/compiler/ir.py Outdated
@@ -544,8 +555,8 @@ def to_ast(self, drain_streams) -> "list[AstNode]":
asts.append(assignment)

## TODO: Ideally we would like to make them as typed nodes already

Member

Remove this comment now that the TODO is solved :)

operand_list.extend(out_ids)
access_map = {output_id: make_stream_output() for output_id in out_ids}
access_map[input_id] = make_stream_input()
"""

Member

Completely remove this code instead of commenting it out! (Similarly in all other places!)

"""


# Skip __init__ to avoid its deepcopy; inputs are freshly constructed here.

Member

Is there any other way to avoid the deepcopy without bypassing the standard init? It seems a bit janky to me not to use Python's normal object initialization. There must be a way to do this.

Author

@angelhof one way off the top of my head is to modify __init__ at runtime, replacing it with a fast_init inside pash. The existing initialization code across the pash repo would stay the same. A more stable fix requires going into the pash annotations repo, but then the scope becomes larger and side effects need to be checked. Runtime modification (monkeypatching) seems like a way to keep using Python's normal object initialization with minimal code changes. I will push the modifications shortly.
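
A rough sketch of what runtime replacement of `__init__` could look like. Everything here is illustrative: `Invocation` is a hypothetical stand-in for the annotations class, and `_fast_init` is an assumed name for the replacement initializer, not code from this PR.

```python
import copy


class Invocation:
    """Hypothetical third-party class; its __init__ deep-copies every argument."""

    def __init__(self, flags, operands):
        self.flags = copy.deepcopy(flags)
        self.operands = copy.deepcopy(operands)


def _fast_init(self, flags, operands):
    """Replacement initializer that stores the arguments without copying them."""
    self.flags = flags
    self.operands = operands


# Patch once at startup; every later Invocation(...) call uses _fast_init.
original_init = Invocation.__init__
Invocation.__init__ = _fast_init

operands = [["a", "b"]]
patched = Invocation(["-n"], operands)

# Restoring the original initializer brings the deep copy back.
Invocation.__init__ = original_init
restored = Invocation(["-n"], operands)
```

Call sites keep the ordinary `Invocation(...)` syntax, but the patch is global: it changes behavior for every caller in the process, including ones that may rely on the copy.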

Member

I think the proper fix requires modifying the pash annotations repo, which keeps the scope lower than bypassing Python's init mechanism :) I think we should only accept a fix that requires bypassing init if the gains are on the critical path and very very significant (compilation is not on the critical path, so improving it but making maintenance harder is not worth it IMO).

@overlorde overlorde Apr 20, 2026

Author

nfa-regex.sh — compile time (--dry_run_compiler, 3-run avg - with _fast_init)

| Width | Original (ms) | PR (ms) | Speedup |
| --- | --- | --- | --- |
| 2 | 42.33 | 6.12 | 6.92x |
| 16 | 214.51 | 5.92 | 36.23x |
| 64 | 827.98 | 6.95 | 119.13x |
| 200 | 2615.91 | 7.32 | 357.37x |

The speedup is much larger than before because the deepcopy is removed from __init__ by replacing it with _fast_init. This time the effect is global, because __init__ is patched at startup and every instance is affected. The same modification in the pash annotations repo should give the same result. As per the last comment, the proper fix would be modifying the pash annotations repo, and I agree.

Author

Reverted to the original __init__ while keeping the shasta typed objects; the comments are addressed. We need an integration test to check both.

@github-actions


OS:ubuntu-24.04
Wed Apr 22 17:04:28 UTC 2026
intro: 3/3 tests passed.
interface: 43/43 tests passed.
compiler: 100/100 tests passed.

Signed-off-by: Farhan Saif <fsaif@uic.edu>
@overlorde

Author

Back to original __init__, comments addressed.

@github-actions


OS:ubuntu-24.04
Wed Apr 22 17:10:16 UTC 2026
intro: 3/3 tests passed.
interface: 43/43 tests passed.
compiler: 100/100 tests passed.

@overlorde overlorde requested a review from angelhof April 22, 2026 17:21
@angelhof

Member

Great, do we get any speedups now? If not, what is the expected path to get back the deepcopy speedup? What needs to happen in the annotation repo?

@overlorde

Author

> Great, do we get any speedups now? If not, what is the expected path to get back the deepcopy speedup? What needs to happen in the annotation repo?

Yes, we still get speedups from converting to shasta typed objects. The numbers are below:

| Script | Width | Orig (ms) | PR (ms) | Speedup |
| --- | --- | --- | --- | --- |
| nfa-regex.sh | 2 | 42.7 | 8.5 | 5.03x |
| nfa-regex.sh | 16 | 215.3 | 43.8 | 4.92x |
| nfa-regex.sh | 64 | 828.4 | 150.1 | 5.52x |
| nfa-regex.sh | 200 | 2626.7 | 510.9 | 5.14x |

The deepcopy speedup is larger.
The expected path to get the deepcopy speedup back is to modify CommandInvocationWithIOVars.py in the annotations repo, lines 26 to 30: switch from deepcopy to shallow copy, except for the streaming inputs/outputs, which can be assigned directly. That would require rethinking how regression tests work for the annotations repo and writing test cases for the new modifications.
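
As a rough illustration of that direction (not the actual annotations code), a shallow-copying initializer might look like the sketch below. The class name and field names are hypothetical, chosen only to mirror the description above.

```python
import copy


class InvocationSketch:
    """Hypothetical class; field names are illustrative, not the real ones."""

    def __init__(self, flag_options, operands, access_map,
                 streaming_input=None, streaming_output=None):
        # Shallow copies: new outer containers, but the elements are shared.
        self.flag_options = copy.copy(flag_options)
        self.operands = copy.copy(operands)
        self.access_map = copy.copy(access_map)
        # Streaming input/output are assigned directly, with no copy at all.
        self.streaming_input = streaming_input
        self.streaming_output = streaming_output


operands = [["in.txt"], ["out.txt"]]
inv = InvocationSketch(["-n"], operands, {1: "input"}, streaming_input=1)

operands.append(["extra.txt"])  # outer list was copied: inv.operands is unchanged
operands[0].append("mutated")   # elements are shared: inv.operands[0] sees this
```

The last two lines show the kind of behavior a regression test would need to pin down: a shallow copy protects against changes to the outer container but not against mutation of nested elements.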
