Saturday, April 25, 2009

Network Algorithm and Stastics

Hello everyone, this is my first post in the process of working on a Google Summer of Code 2009 project. The organization is Gephi which is an open-source software for visualizing and analyzing large networks graphs. My project (as the title of this blog suggests) is to develop code for some common (and some cutting-edge) network statistics and algorithms for the Gephi platform. My mentor is Sebastien Heymann.

So far I've been able to:

1) Install netbeans
2) Download the 0.7 branch of Gephi (via bzr)
3) Start to examine the new Network Statistics API.

This is my second year in GSOC and I'm hoping this blog will have a better fate than last year's blog which dwindeled off after only a few posts.

Currently the statistics, metrics, etc. that we are planning to implement this summer are:
1) Clustering Coefficient
2) PageRank
3) HITS
4) Graph Distance metrics
a) Average shortest path
b) Network diameter
c) Node Betweeness centrality
d) Node Closeness centrality
5) Modularity (Community Detection)
6) Degree Distribution (scale-free power)
My full proposal can be found at: http://web.ecs.syr.edu/~pjmcswee/gephi.pdf


At present one of my main questions will be whether or not we need to differentiate between large and small networks as some of these algorithms are very time consuming. There are approximations which are faster but less accurate for larger graphs but slower more accurate algorithms exist for smaller graphs. Some investigation may be necessary to evaluate this issue.


I'm also working on setting up a wiki for my fellow Gephi GSOC-ers. I'm not sure where I should post it, so for now I'll set it up on my server, although I'm under the impression a better place may be at gephi.org.

That's all for now... more to come

2 comments:

  1. Salut Patrick!

    I'm one of your fellows Gephi GSOC-ers and I'm working on the vectorial preview project. Thank you for having created the wiki! I'm just adding your GSOC blog in my feed reader to follow your work (wow what a pressure for you lol, you'll have to write a lot hehe).

    See you soon, and have fun with GSOC & Gephi! ;)

    ReplyDelete
  2. Hi Patrick!

    For 1) Clustering Coefficient you should certainly use the algorithms presented and implemented at:
    http://www-rp.lip6.fr/~latapy/Triangles/

    For 4) Graph Distance metrics, in particular diameter computations on huge graphs, have a look at:
    http://www-rp.lip6.fr/~latapy/Triangles/

    And finally, for 5) Modularity (Community Detection) the best known algorithm (by far) is at:
    http://findcommunities.googlepages.com/

    All these algorithms are both very simple and extremely efficient. They are provided with a full description in a paper and an efficient implementation, which should help.

    Please feel free to contact me:
    http://www-rp.lip6.fr/~latapy/

    ReplyDelete