Numba typing errors #15

Description

@Drzwioddomu

Hi!

I'm trying to reproduce the experiment from your README, but I keep getting Numba errors that are not very descriptive.

My code:

from gefs import RandomForest
from experiments.prep import get_data, train_test_split

data, ncat = get_data('wine')
X_train, X_test, y_train, y_test, data_train, data_test = train_test_split(data, ncat)
rf = RandomForest(n_estimators=30, ncat=ncat)
rf.fit(X_train, y_train)
gef = rf.topc()

Traceback:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "***/test_gefs.py", line 7, in <module>
    rf.fit(X_train, y_train)
  File "***/gefs/trees.py", line 533, in fit
    self.estimators = build_forest(X, y, self.n_estimators, self.bootstrap,
  File "/opt/conda/lib/python3.9/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/opt/conda/lib/python3.9/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
- Resolution failure for literal arguments:
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in method choice of numpy.random.mtrand.RandomState object at 0x7f2c36da0940>) found for signature:

 >>> choice(array(int64, 1d, C), OptionalType(int64), replace=Literal[bool](False))

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function 'choice': File: numba/cpython/randomimpl.py: Line 1360.
    With argument(s): '(array(int64, 1d, C), OptionalType(int64), replace=bool)':
   Rejected as the implementation raised a specific error:
     TypingError: Failed in nopython mode pipeline (step: nopython frontend)
   No implementation of function Function(<built-in function empty>) found for signature:

    >>> empty(OptionalType(int64), class(int64))

   There are 2 candidate implementations:
         - Of which 2 did not match due to:
         Overload in function 'ol_np_empty': File: numba/np/arrayobj.py: Line 4086.
           With argument(s): '(OptionalType(int64), class(int64))':
          Rejected as the implementation raised a specific error:
            TypingError: Cannot parse input types to function np.empty(OptionalType(int64), class(int64))
     raised from /opt/conda/lib/python3.9/site-packages/numba/np/arrayobj.py:4105
   
   During: resolving callee type: Function(<built-in function empty>)
   During: typing of call at /opt/conda/lib/python3.9/site-packages/numba/cpython/randomimpl.py (1417)
   
   
   File "../../../../../../opt/conda/lib/python3.9/site-packages/numba/cpython/randomimpl.py", line 1417:
           def choice_impl(a, size=None, replace=True):
               <source elided>
               if replace:
                   out = np.empty(size, dtype)
                   ^

  raised from /opt/conda/lib/python3.9/site-packages/numba/core/typeinfer.py:1086

During: resolving callee type: Function(<built-in method choice of numpy.random.mtrand.RandomState object at 0x7f2c36da0940>)
During: typing of call at ***/gefs/split.py (145)


File "gefs/split.py", line 145:
def find_best_split(node, tree, random_state):
    <source elided>
    np.random.seed(random_state)
    vars = np.random.choice(np.arange(tree.X.shape[1]), tree.max_features, replace=False)
    ^

During: resolving callee type: type(CPUDispatcher(<function find_best_split at 0x7f2b8ca93ee0>))
During: typing of call at ***/gefs/trees.py (132)

During: resolving callee type: type(CPUDispatcher(<function find_best_split at 0x7f2b8ca93ee0>))
During: typing of call at ***/gefs/trees.py (132)

During: resolving callee type: type(CPUDispatcher(<function find_best_split at 0x7f2b8ca93ee0>))
During: typing of call at ***/gefs/trees.py (132)


File "gefs/trees.py", line 132:
def build_tree(tree, parent, counts, ordered_ids):
    <source elided>
        node = queue.pop(0)
        split = find_best_split(node, tree, np.random.randint(1e6))
        ^

During: resolving callee type: type(CPUDispatcher(<function build_tree at 0x7f2b8ca9f700>))
During: typing of call at ***/gefs/trees.py (465)

During: resolving callee type: type(CPUDispatcher(<function build_tree at 0x7f2b8ca9f700>))
During: typing of call at ***/gefs/trees.py (465)


File "gefs/trees.py", line 465:
    def fit(self, X, y):
        <source elided>
        ordered_ids = np.arange(X.shape[0], dtype=np.int64)
        self.root, self.n_nodes = build_tree(self, None, counts, ordered_ids)
        ^

- Resolution failure for non-literal arguments:
None

During: resolving callee type: BoundFunction((<class 'numba.core.types.misc.ClassInstanceType'>, 'fit') for instance.jitclass.Tree#7f2b8caad490<X:OptionalType(array(float64, 2d, A)),y:OptionalType(array(int64, 1d, A)),ncat:OptionalType(array(int64, 1d, A)),scope:OptionalType(array(int64, 1d, A)),imp_measure:unicode_type,min_samples_leaf:int64,min_samples_split:int64,n_classes:int64,max_features:OptionalType(int64),n_nodes:int64,root:instance.jitclass.TreeNode#7f2b8caa6b80<id:int64,counts:array(int64, 1d, A),idx:array(int64, 1d, A),split:OptionalType(instance.jitclass.Split#7f2b8ca89bb0<score:float64,var:int64,threshold:array(float64, 1d, A),surr_var:array(int64, 1d, A),surr_thr:array(float64, 1d, A),surr_go_left:array(bool, 1d, A),surr_blind:bool,left_ids:array(int64, 1d, A),right_ids:array(int64, 1d, A),left_counts:array(int64, 1d, A),right_counts:array(int64, 1d, A),type:unicode_type>),parent:OptionalType(DeferredType#139825020508336),left_child:OptionalType(DeferredType#139825020508336),right_child:OptionalType(DeferredType#139825020508336),isleaf:OptionalType(bool),depth:int16>,depth:int16,max_depth:int64,surrogate:bool,random_state:int64>)
During: typing of call at ***/gefs/trees.py (179)


File "gefs/trees.py", line 179:
def build_forest(X, y, n_estimators, bootstrap, ncat, imp_measure,
    <source elided>
                                               estimators[i].random_state)
            estimators[i].fit(Xtree_, ytree_)
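
As far as I can read the traceback, the call that fails to type is `np.random.choice(..., tree.max_features, replace=False)` in `gefs/split.py`: `max_features` is declared as `OptionalType(int64)`, and Numba then cannot type `np.empty(OptionalType(int64), ...)` inside its `choice` implementation. Below is a minimal sketch of what I would expect that line to need, namely a concrete integer size. It is plain NumPy and has not been tested under Numba's nopython mode; `sample_feature_indices` is a hypothetical helper that does not exist in gefs, and treating `None` as "use all features" is my assumption.

```python
import numpy as np


def sample_feature_indices(n_features, max_features=None, seed=0):
    """Draw distinct column indices, always passing a concrete int as the size.

    Hypothetical helper, not part of gefs. Treating max_features=None as
    "use every feature" is an assumption made only for this sketch.
    """
    # Resolve the optional value to a plain integer before sampling.
    size = n_features if max_features is None else int(max_features)
    rng = np.random.RandomState(seed)
    # replace=False guarantees that no index is drawn twice.
    return rng.choice(np.arange(n_features), size, replace=False)


if __name__ == "__main__":
    print(sample_feature_indices(13, 4, seed=1))
```

If the real code resolved `max_features` to an integer before reaching the jitted function, the size argument would presumably no longer be an optional type, but I have not verified that against the repository.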

My guess is that this happens because some dependencies have been updated. I'm running the code in a conda environment with the following versions installed:

numba                     0.56.3
numpy                     1.22.3 
pandas                    1.4.2
scipy                     1.9.0
sklearn                   1.1.2
tqdm                      4.64.0 
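
In case it helps with comparing environments, here is a small sketch that prints the installed versions of these packages as pip-style pins using only the standard library. The package list mirrors the table above; I am assuming `scikit-learn` is the distribution name behind the `sklearn` entry, and the helper names are my own.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names to look up. "scikit-learn" is assumed to be the
# distribution that provides the "sklearn" entry listed above.
PACKAGES = ["numba", "numpy", "pandas", "scipy", "scikit-learn", "tqdm"]


def installed_versions(packages):
    """Map each distribution name to its version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def format_pins(versions):
    """Return 'name==version' lines, skipping packages that are not installed."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


if __name__ == "__main__":
    print("\n".join(format_pins(installed_versions(PACKAGES))))
```

Running the same script in an environment where the README example works would give a list that can be diffed directly against mine.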

Could you upload a solved environment, or a freeze with the specific package versions that allow the code to run properly?

BR,
Maurycy
