Dian SUN edited this page Jul 29, 2020 · 3 revisions
Welcome to the SCALPEL-Analysis wiki!
This guide explains how the SCALPEL-Analysis library works and how to make full use of it.
SCALPEL: A Scalable Pipeline
SCALPEL-Flattening and SCALPEL-Extraction perform batch operations: they read their input data from, and write their output data to, the file system (local or HDFS). They are implemented in Scala in order to access Spark's low-level API and to take advantage of functional programming and static typing, which lend themselves to rigorous automated testing (94% of the Scala code is covered by unit tests). Both can be configured through textual configuration files or used as libraries. SCALPEL-Analysis is a Python module built on PySpark and designed for interactive use, for instance in a Jupyter notebook. This workflow is illustrated in the following figure.