MATLAB implementation of a supervised missing-link prediction pipeline that uses an Ant-Colony-Optimization-derived pheromone (PH) feature, combined with the classical Common Neighbor (CN) and Node-Link Clustering (NLC) features, on eight real-world complex networks.
acolp-link-prediction/
βββ README.md β you are here
βββ CITATION.cff β machine-readable citation
βββ LICENSE β MIT (code)
βββ .gitignore
βββ data/ β 8 benchmark networks (see data/README.md)
βββ Supervised/ β supervised pipeline (28 files)
β βββ Main.m β entry point
βββ Unsupervised/ β unsupervised baselines (12 files)
βββ Unsupervised_CN_V1.m β CN baseline
βββ Unsupervised_NLC_V0.m β NLC baseline
The two implementation folders are kept self-contained on purpose: a few
filenames are shared between them (adj_gen.m, CN.m, NLC.m,
sort_column1.m, β¦) but the supervised and unsupervised versions are
not identical, for example, sort_column1.m sorts ascending for
triangle search in the supervised code and descending for top-L ranking
in the unsupervised code. Keep them separate when you run experiments.
See Supervised/README.md and
Unsupervised/README.md for run instructions.
For each candidate node pair, we compute three structural features: CN (number of common neighbours), NLC (node-link clustering coefficient, Wu et al. 2016), and PH (a pheromone score derived from an Ant-Colony triangle search; see Β§3.3 of the paper). Class imbalance is handled with ADASYN oversampling, the candidate pool is restricted to node pairs at graph distance β€ 2, and a linear-kernel SVM is trained on a 10-fold cross-validation split. Performance is reported in terms of Recall, F1-measure, and AUP (area under the precision-recall curve).
The PH feature is the main contribution. Its design is inspired by the
unsupervised ACO link prediction of Sherkat et al. (2015), but here the
pheromone values are not used to score links directly, they are used as
a feature that a downstream classifier can combine with other
indices. The combined feature vector SVM(NLC, CN, PH) gives the best
result on the largest benchmark (PolBlogs, AUP β 0.90).
You will need MATLAB R2018b or newer with the Statistics and Machine
Learning Toolbox (for fitcsvm, perfcurve, crossvalind) and the
graph functions (graph, adjacency, degree).
% From the repo root, add the code and data to the path
addpath(genpath('Supervised'));
addpath(genpath('Unsupervised'));
addpath('data');
% Run the supervised pipeline (default dataset is 'polblogs')
cd Supervised
MainTo switch datasets, edit the dataset variable near the top of
Main.m. Supported values:
'Karate', 'Dolphins', 'Football', 'NS', % small-scale
'Jazz', 'celegans', 'USAir', 'polblogs' % large-scale
To turn features on/off, edit the three flags in Main.m:
NLC_feat = 1; CN_feat = 0; ACO_feat = 1;Eight publicly available real-world networks are bundled in data/.
Full attribution, edge counts, and reference papers are listed in
data/README.md. Source: SNAP, Newman's network data
collection, KONECT, and the network repository.
If you use this code or the PH feature in your own work, please cite
@misc{ansari2021acolp_code, author = {Ansari, Soheila}, title = {{ACO-LP}: Supervised Link Prediction via Ant-Colony Pheromone Features (MATLAB)}, year = {2018}, howpublished = {{https://github.com//acolp-link-prediction}} }
A `CITATION.cff` file is provided at the repo root so GitHub will show a
**"Cite this repository"** button automatically.
---
## Acknowledgments
This work builds on two prior ideas that should be cited alongside
the code:
* **Sherkat, E., Rahgozar, M., Asadpour, M.** (2015).
*Structural link prediction based on ant colony approach in social
networks.* Physica A 419, 80β94. β origin of the (a)/(b) triangle
sub-graph idea that the PH feature builds on.
* **Wu, Z., Lin, Y., Wan, H., Jamil, W.** (2016).
*Predicting top-L missing links with node and link clustering
information in large-scale networks.* J. Stat. Mech. 083202. β
origin of the NLC feature.
The ADASYN oversampling implementation in `Supervised/ADASYN.m`
follows He, Bai, Garcia & Li (2008).