[$ShortWalk$: an approach to network embedding on directed graphs](https://link.springer.com/article/10.1007/s13278-020-00714-y)

**Fen Zhao**, Yi Zhang & Jianguo Lu

In a network embedding algorithm, long random walks are used to transform the graph into "text" so that node embeddings can be learned by the Skip-gram with Negative Sampling (SGNS) model. However, in a directed graph, long random walks can be trapped or interrupted, leading to low-quality embeddings. This paper proposes ShortWalk to improve network embeddings on directed graphs. ShortWalk performs short random walks that restart more frequently and thus produce shorter traces than long random walks. It also gives nodes equal weights by generating the training pairs from the pairwise combinations of nodes on the traces. We validate our method on eight directed graphs with different sizes and structures. Experimental results suggest that ShortWalk consistently outperforms DeepWalk on all datasets in node classification and link prediction tasks.

## Implementation

You can download our implementation [here](http://zhang18f.myweb.cs.uwindsor.ca/shortwalk/shortwalk.tar.gz). The code is written in Cython; we recommend the Intel Python distribution for better performance. You can compile the code using the following command:

```
python3 setup.py build_ext --inplace
```

Note: If you compile on macOS, make sure you are using GCC (GNU Compiler Collection) instead of the system's built-in compiler (Clang/LLVM). You can install GCC via `brew install gcc`, then make it the default compiler:

```
export CC=gcc-8
export CXX=g++-8
```

### Filelist

The provided implementation has three files:

1. google.ipynb -- An example in Jupyter Notebook using the Google dataset.
2. model.pyx -- Implementation of $ShortWalk$. The code is written in Cython.
3. setup.py -- The compilation configuration file for the Cython model.
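
The idea summarized at the top of this README (short walks that restart often, with training pairs taken as pairwise combinations of the nodes on each trace) can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the code in model.pyx: the function names `short_walks` and `training_pairs`, the parameters `walk_length` and `walks_per_node`, and the choice to emit each pair once in trace order are all hypothetical.

```python
import random
from itertools import combinations


def short_walks(adj, walk_length=3, walks_per_node=2, seed=0):
    """Generate short random-walk traces on a directed graph.

    `adj` maps each node to a list of its out-neighbours. Every node is
    used as a restart point `walks_per_node` times. A walk ends early when
    it reaches a node with no outgoing edges (assumed behaviour).
    """
    rng = random.Random(seed)
    traces = []
    for start in adj:
        for _ in range(walks_per_node):
            trace = [start]
            node = start
            while len(trace) < walk_length:
                successors = adj.get(node, [])
                if not successors:
                    break  # dead end: the walk is interrupted here
                node = rng.choice(successors)
                trace.append(node)
            traces.append(trace)
    return traces


def training_pairs(trace):
    """Return all pairwise combinations of the nodes on one trace.

    Each pair appears once, in trace order (an assumption of this sketch).
    """
    return list(combinations(trace, 2))


# Toy directed graph in which "c" has no outgoing edges.
adj = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
for trace in short_walks(adj, walk_length=3, walks_per_node=1):
    print(trace, training_pairs(trace))
```

Because each trace is short, a dead end such as node `c` costs at most one truncated trace instead of cutting off a long walk.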

## Dataset

More datasets can be found [here](http://zhang18f.myweb.cs.uwindsor.ca/datasets/)

|Dataset | # Nodes | # Edges | Avg degree | Avg shortest path | # Triangles | # Labels |
|--------------------:|-----------:|-------------------:|------------------:|------------:|-----------:|----------:|
|CiteSeer | 2,110 | 3,757 | 1.78 | 1.52 | 1,083 | 6 |
|Cora | 2,485 | 5,209 | 2.10 | 4.57 | 1,558 | 7 |
|wiki Vote | 7,066 | 103,663 | 14.67 | 3.34 | 608,389 | -- |
|Google | 15,763 | 171,206 | 10.86 | 6.33 | 591,156 | 2 |
|PubMed | 19,717 | 44,338 | 2.25 | 4.32 | 12,520 | 3 |
|Cora Citation | 23,166 | 91,500 | 3.95 | 13.82 | 78,791 | 10 |
|Web BerkStan | 654,782 | 7,499,425 | 11.45 | 13.75 | 64,520,617 | -- |
|AMinerV8 | 766,059 | 4,181,905 | 5.46 | -- | -- | 11 |
|AMiner APN | 2,283,309 | 14,949,187 | 6.55 | -- | -- | 7 |
|Meta APN | 58,703,242 | 618,675,900 | 10.54 | -- | -- | 2 |
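
The "Avg degree" column in the table above appears to be the number of edges divided by the number of nodes. This is inferred from the numbers and is not stated explicitly. The short check below reproduces three rows under that assumption; the helper name `average_degree` is hypothetical.

```python
def average_degree(num_nodes, num_edges):
    """Edges per node (assumed definition of the 'Avg degree' column)."""
    return num_edges / num_nodes


# (nodes, edges) copied from the table above.
stats = {
    "CiteSeer": (2110, 3757),
    "Cora": (2485, 5209),
    "Google": (15763, 171206),
}

for name, (nodes, edges) in stats.items():
    print(f"{name}: {average_degree(nodes, edges):.2f}")
# CiteSeer: 1.78
# Cora: 2.10
# Google: 10.86
```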