TABLE OF CONTENTS (H)

SOFTWARE

SumSweep [P1,P5,P9,P10]

SumSweep is the first bound-refinement algorithm that is able to find diameter and radius in directed, not necessarily strongly connected graphs. This algorithm not only is more general than all previous counterparts, but it also outperforms them on directed, strongly connected graphs or undirected graphs. The algorithm is based on a clever combination of the sweep approach with an extension of the bound-refinement techniques developed by Takes and Kosters, in order to compute the diameter of undirected graphs. The Java code for SumSweep along with some sample graphs can be downloaded from the publication section of Michele's home page.

Distance distribution [P2]

You can get the source code used in our experiments, written in ANSI C, as a single tar-gzipped archive. The source code of ANF - Approximate Neighborhood Function - by Palmer et al. (see references, and copyright of the authors in their code), is available here. We provide only a patch file, automatically managed by our Makefile, to make the output consistent with the other methods. Extracting distdist.tar.gz, you create a folder called distdist. If you want to use the method ANF, save anf-1.0.tar.gz inside it, and use the make utility to do all the stuff (extract anf-1.0.tar.gz, patch it and compile the source), simply typing make at the shell prompt. Running the executable with the -h option, you can get a brief explanation on how to pass options and the datafile. Note that the option -g exports the results in a multicolumn formatted text file, where each experiment corresponds to a column, with elements equal to the approximate distance distribution values, at increasing distance starting from  0. The first line is an example of command to plot it with gnuplot. Redirecting the standard output to a file, it can be further processed. Indeed, with AnalysisDD.java you can perform also the error analysis. Use:
javac AnalysisDD.java
to obtain the .class file. For error evaluation, just follow the output of:
java AnalysisDD -h

Collaboration networks [P3]

The main goal of the research on social networks is to understand the processes taking place on the networks themselves. We focus our attention on collaboration networks in which nodes represent entities taking part to some collaboration act, and edges represent the fact that two entities have taken part to at least one common act. In particular, we analyse six different topological properties of the networks (and fourteen quantities related to these properties) in order to represent the idea of closeness between real and simulated data. We experimentally evaluate how well three different evolution models are able to mimic these properties. The three considered models differ only for the recommendation mechanism they use, so that our analysis might suggest what is the recommendation process that takes place in collaboration networks: in other words, who recommends whom when a new collaboration act is initiated.
  • Dataset: our collaboration networks come from different public sources on the Web, and have been preprocessed in order to execute a time-sorted remapping of the identifiers of the projects. Compressed directories with original and preprocessed networks data files can be downloaded here.
  • Source code: for each evolution model, the goal of our experiments has been to determine the best values of their parameters, by analysing six different topological properties and fourteen different measures related to these properties. The free parameters for the three models are the preferential attachment exponent π and the depth factor α in the case of the SI model, π and the recommendation factor ρ in the case of the RF model, and the burning probability β in the case of the FF model. For each network, we have created an equally spaced grid of possible values of the parameters of the model: for instance, for the preferential attachment exponent π we have considered the values 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, and 2, while for the recommendation factor ρ we have considered the values 0, 0.2, 0.4, 0.6, 0.8, and 1. The following zip archives respectively contain C source code, bash and awk scripts for simulations execution, topological measurements on simulated networks, and closeness analysis between simulated and real networks.
  • AOF table: differences between measured quantities of real and simulated networks are obtained by making use of these error definitions
    • mean relative error (MRE)
    • mean quadratic error (MQE)
    Both definitions were applied to the following measures
    • The average MRE of the number of vertices of the greatest connected component,
    • the average and maximum MRE, and the relative error on the exponent of the densification plot,
    • the average and maximum MRE, and the correlation coefficient of the diameter,
    • the average and maximum MRE, and the correlation coefficient of the average distance,
    • the MQE and the correlation coefficient of the whole distance distribution,
    • the MQE and the correlation coefficient of the degree distribution.
    We used a weighted linear combination of the closeness measures listed above, by defining a single aggregate objective function (AOF). The AOF is the sum of the scaled quantities to the [0, 1] interval (giving opposite contributions to correlation coefficients and errors), eventually with different weight. The scaling factor is determined by the minimum and maximum values, for all sets of the closeness measures of the three models and for every specific network. In this way, we take into account how much each parameters setup simulates a specific property not only with respect to the other possible parameters choice, but also with respect to the capability of the other models to mimic the same property; in other words, we avoid to choose the least bad case for one model without comparison with the other ones. Finally, the AOF is re-normalised with respect to the sum of the weights. An ods worksheet table summarize the results and best parameters matching.
  • Acknowledgements: we are grateful to Pierre Fraigniaud and Emmanuelle Lebhar for some interesting discussions concerning a preliminary version of what we call the RF model. We would like to thank José Javier Ramasco for providing his samples of condmat and IMDB networks, and Francesco Salvini for participating to the earlier phases of this work. This research was supported in part by funds of the following projects and institutions: EU-FET grant NADINE (GA 288956), PRIN 2012C4E3KT Italian national research project "AMANDA – Algorithmics for MAssive and Networked DAta", and Florence section of INFN (National Institute of Nuclear Physics).

Lasagne [P2,P4,P5]

LASAGNE is a Java application which allows the user to compute distance measures on graphs by making a clever use either of the breadth-first search or of the Dijkstra algorithm. In particular, the current version of LASAGNE can compute the exact value of the diameter of a graph: the graph can be directed or undirected and it can be weighted or unweighted. Moreover, LASAGNE can compute an approximation of the distance distribution of an undirected unweighted graph. These two features are integrated within a graphical user interface along with other features, such as computing the maximum (strongly) connected component of a graph. You can download LASAGNE starting from its SourceForge project. A user guide is also available. If you use LASAGNE in your research, please cite our corresponding paper(s).

Telling stories [P6]

We have developed a linear-time delay algorithm for enumerating all directed acyclic subgraphs of a directed graph G(V,E) that have their sources and targets included in two subsets S and T of V, respectively. From these subgraphs, called pitches, the maximal ones, called stories, may be extracted in a dramatically more efficient way in relation to a previous story telling algorithm (see the paper by Acuña et al.). The improvement may even increase if a pruning technique is further applied that avoids generating many pitches which have no chance to lead to a story. We experimentally demonstrate these statements by making use of a quite large dataset of real metabolic pathways and networks. In order to evaluate the efficiency of our new algorithm for the enumeration of stories, called Touché, we performed three experiments, two of them comparing Touché with the previous algorithm, called Gobbolino, and the third one to evaluate the effect of the pruning approach. You can get the Java code, used in our experiments, as a single zip archive. Unzipping it, you create a folder called storysoftware. Inside it, you will find two jar files, that is, ciak.jar and gobbolino.jar.
  • Experiment 1: our first experiment consisted in the enumeration of the whole set of stories using both Gobbolino and Touché (with the pruning approach implemented) and the comparison of their running time. The dataset of this experiment consists of 62 pathways with no more than 10 vertices. For each of the examined networks (see the corresponding network page) we run the following two commands (the network file has to be included into a subfolder, called input, of the folder containing the jar file): java -jar -Xmx1000M gobbolino.jar -n="" -q=all and java -jar -Xmx1000M ciak.jar -f="" -verb=0 -bc -sc Each of these commands will return the number of stories and the execution time.
  • Experiment 2: our second experiment consisted in comparing the randomized approach of Gobbolino to Touché (with the pruning approach implemented), giving a fixed amount of time for both algorithms (1 minute, in our experiment) and checking how many stories each method produced. The dataset of the second experiment consists of 118 metabolic networks of various sizes. The dataset may be divided as follows: 8 networks (with size greater than or equal to 10) come from the same metabolic pathways considered in the first experiment; 4 networks are inputs for some experiments in which the set of black vertices came from biological experiments; the remaining 106 were metabolic networks downloaded from the public database MetExplore and with a random set of black vertices (5% of the vertices of the graph were considered to be black). For each of the examined networks (see the corresponding network page) we run the following two commands (the network file has to be included into a subfolder, called input, of the folder containing the jar file): java -jar -Xmx1500M gobbolino.jar -n="" -q=10000 -stop=TIME -stopTime=1 -stopTimeCondition=elapsed and java -jar -Xmx1500M ciak1min.jar -f="" -verb=0 -bc -sc (to run the second command, the time limited version of Touché is needed). Each of these commands will return the number of produced stories.
  • Experiment 3: our third experiment was designed in order to evaluate how effective is the pruning approach described in the previous section. To this aim, we collected the running time of Touché with and without the pruning. The dataset of this experiment consists of 62 pathways with no more than 10 vertices. For each of the examined networks (see the corresponding network page) we run the following two commands (the network file has to be included into a subfolder, called input, of the folder containing the jar file): java -jar -Xmx1500M ciak.jar -f="" -verb=0 and java -jar -Xmx1500M ciak.jar -f="" -verb=0 -allpitches Each of these commands will return the number of stories and the execution time.
  • Results: in the tables contained in this PDF file the results of the three experiments are detailed.

Hyperbolicity [P11]

The (Gromov) hyperbolicity is a topological property of a graph, which has been recently applied in several different contexts, such as the design of routing schemes, network security, computational biology, the analysis of graph algorithms, and the classification of complex networks. Computing the hyperbolicity of a graph can be very time consuming: indeed, the best available algorithm has running-time O(n3.69), which is clearly prohibitive for big graphs. We provide a new and more efficient algorithm: although its worst-case complexity is O(n4), in practice it is much faster, allowing, for the first time, the computation of the hyperbolicity of graphs with up to 200,000 nodes. You can get the source code used in our experiments, written in ANSI C, as a single zip archive, and more than 60 real-world graphs, also in a single zip archive.