fix(python): improve taint receiver resolution#126
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances the Python analyzer by adding support for Tornado input taint flow, improving file UUID resolution, and refining the handling of function receivers and member call closures. It also ensures that Python classes without explicit init methods are correctly instantiated. Feedback indicates that the change to member access matching logic causes a regression in existing tests and that the new symbol table lookup function has performance concerns due to its O(N*M) complexity.
| switch (node.type) { | ||
| case 'MemberAccess': { | ||
| if (!matchPrefix(el, node.property.name)) return false | ||
| if (i === 0) return true |
There was a problem hiding this comment.
This change, which allows suffix matching on member access chains, appears to break an existing test case in test/rules-basic-handler/test-match-field.ts. The test fsig "c" 不匹配 x().c expects matches(member(call(id('x')), 'c'), 'c') to be false, but this change will cause it to return true. This indicates a potential regression or that the test suite has not been updated to reflect this intentional change in matching logic. Please update the tests to align with this new behavior or reconsider the change if it has unintended consequences.
| resolveByQidFromSymbolTable(qid: any): any { | ||
| if (!qid || typeof qid !== 'string') return null | ||
| const symbolMap = this.symbolTable?.getMap?.() | ||
| if (!symbolMap) return null | ||
|
|
||
| for (const val of symbolMap.values()) { | ||
| if (val?.qid === qid || val?.sid === qid) return val | ||
| } | ||
|
|
||
| const parts = qid.split('.').filter(Boolean) | ||
| for (let i = parts.length - 1; i > 0; i--) { | ||
| const baseQid = parts.slice(0, i).join('.') | ||
| let current: any = null | ||
| for (const val of symbolMap.values()) { | ||
| if (val?.qid === baseQid || val?.sid === baseQid) { | ||
| current = val | ||
| break | ||
| } | ||
| } | ||
| if (!current) continue | ||
| for (const fieldName of parts.slice(i)) { | ||
| current = | ||
| current?.value?.[fieldName] || | ||
| current?.members?.get?.(fieldName) || | ||
| current?.getFieldValue?.(fieldName, false) | ||
| if (!current) break | ||
| } | ||
| if (current) return current | ||
| } | ||
|
|
||
| return null | ||
| } |
There was a problem hiding this comment.
The function resolveByQidFromSymbolTable contains a nested loop structure that iterates over all values in symbolMap within another loop. This can lead to significant performance degradation, especially when the symbol table is large, as the complexity would be O(N*M) where N is the number of parts in the qid and M is the number of symbols. This function is also called multiple times in recoverMemberCallClosure, potentially amplifying the performance impact. Consider optimizing this by using a more direct lookup method if available, or by creating an index for qid/sid to avoid iterating through all symbols repeatedly.
No description provided.