The network of political blogs was first analyzed in “The political blogosphere and the 2004 US Election” by Lada A. Adamic and Natalie Glance, in Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem (2005). This data set, collected before the 2004 American presidential election, records hyperlinks connecting political blogs to one another. These blogs have been labeled manually as either “liberal” or “conservative”. We conduct our analysis on the largest connected component of the graph, and we ignore the direction of the links.
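The preprocessing described above (dropping link direction and keeping the largest connected component) can be sketched in base R. This is only an illustration on a hypothetical six-node toy graph: the matrix `A_dir` and the helper `component_labels` are our own names and are not part of the gsbm package or of the blog data.

```r
# Hypothetical directed toy graph on 6 nodes (not the blog data).
A_dir <- matrix(0, 6, 6)
A_dir[1, 2] <- 1
A_dir[3, 2] <- 1
A_dir[3, 4] <- 1
A_dir[5, 6] <- 1

# Ignore link direction: i and j are adjacent if i -> j or j -> i exists.
A_sym <- 1 * ((A_dir + t(A_dir)) > 0)

# Hypothetical helper: label connected components with a breadth-first search.
component_labels <- function(adj) {
  n <- nrow(adj)
  label <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(label[start])) next
    current <- current + 1L
    label[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      # Unvisited neighbours of v join the current component.
      nb <- which(adj[v, ] > 0 & is.na(label))
      label[nb] <- current
      queue <- c(queue, nb)
    }
  }
  label
}

# Keep only the nodes of the largest component.
labels <- component_labels(A_sym)
largest <- as.integer(names(which.max(table(labels))))
keep <- which(labels == largest)
A_lcc <- A_sym[keep, keep]
```

In this toy graph, nodes 1 to 4 form the largest component and nodes 5 and 6 are dropped. For a graph of realistic size, a dedicated graph library would be a more practical choice than this loop.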
# Load packages
library(Matrix)
library(igraph)
library(gsbm)
# Load data
data(blogosphere)
A <- blogosphere$A
names <- blogosphere$names
opinion <- blogosphere$opinion
We run our algorithm and use our estimator \(\widehat{S}_{\epsilon}\) to detect outliers: a node is considered an outlier if the corresponding column of the matrix \(\widehat{S}_{\epsilon}\) is not identically zero. The two tuning parameters are set to multiples of the square root of the mean degree.
degrees <- colSums(A)
n <- nrow(A)
sqrt_deg <- sqrt(mean(degrees))
# Choice of parameters
lambda_1 <- 10*sqrt_deg
lambda_2 <- 5*sqrt_deg
print(lambda_1)
#> [1] 52.30216
print(lambda_2)
#> [1] 26.15108
# Run the mcgd algorithm
res <- gsbm_mcgd(A, lambda_1, lambda_2)
# Detect the outliers
outliers_detected <- which(colSums(res$S)>0)
s <- length(outliers_detected)
names[outliers_detected]
#> [1] "atrios.blogspot.com" "dailykos.com" "talkingpointsmemo.com"
#> [4] "washingtonmonthly.com" "blogsforbush.com" "drudgereport.com"
#> [7] "instapundit.com" "michellemalkin.com" "powerlineblog.com"
#> [10] "truthlaidbear.com"
Our algorithm detects \(s = 10\) outliers.
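To make the detection rule concrete, here is a minimal sketch on a hypothetical \(5 \times 5\) matrix. `S_toy` is invented for illustration and is not an output of `gsbm_mcgd`.

```r
# Hypothetical 5 x 5 outlier matrix: only columns 2 and 5 carry non-zero entries.
S_toy <- matrix(0, 5, 5)
S_toy[c(1, 3), 2] <- c(0.4, 0.7)
S_toy[4, 5] <- 0.2

# A node is flagged when its column of S_toy is not identically zero.
outliers_toy <- which(colSums(S_toy) > 0)
s_toy <- length(outliers_toy)
print(outliers_toy)
print(s_toy)
```

The test `colSums(S_toy) > 0` is equivalent to "the column is nonzero" only when the entries are non-negative, which we assume here. If negative entries were possible, `colSums(abs(S_toy)) > 0` would be the safer check.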
Then, we use our estimator \(\widehat{L}_{\epsilon}\) to estimate the communities of the remaining nodes. More precisely, we estimate the community of a node by the sign of its coordinate along the second eigenvector of \(\widehat{L}_{\epsilon}\), up to a permutation of the two communities. We compare our results with the labels obtained by manual labeling.
# Estimate the communities of the remaining (inlier) nodes
I <- which(colSums(res$S)==0)
com_est <- matrix(rep(0, (n-s)*2), nrow = 2, ncol = n-s)
sv <- svd(res$L, nu = 2, nv = 2)
com_est[1,] <- floor(sign(sv$u[I,2])/2 + rep(0.5,n - s))
com_est[2,] <- rep(3,n-s) - com_est[1,]
# labels are obtained up to a permutation
best_est <- which.max(c(sum(com_est[1,] == opinion[I]), sum(com_est[2,] == opinion[I])))
# Misclassified nodes
missclassified_nodes <- (com_est[best_est,] != opinion[I])
error <- sum(missclassified_nodes)
print(error)
#> [1] 84
Among the \(n-s = 1212\) remaining nodes, which are considered inliers, \(84\) are misclassified. This number of misclassified nodes is comparable to that of the best-known methods that have been applied to this dataset. We note that the nodes misclassified by our method either have low degree or are well connected to nodes belonging to the other community.
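The community-estimation step can be illustrated on a small, noise-free example. The sketch below builds a hypothetical two-community connection-probability matrix (`L_toy`, with assumed probabilities `p_in` and `p_out`), splits the nodes by the sign of the second left singular vector, and resolves the label permutation by comparing with the known labels. All names are ours, and the setting is far simpler than the blog data.

```r
# Hypothetical two-community connection-probability matrix (not the blog data).
truth <- c(1, 1, 1, 2, 2, 2, 2)
n_toy <- length(truth)
p_in <- 0.8   # assumed within-community connection probability
p_out <- 0.1  # assumed between-community connection probability
L_toy <- ifelse(outer(truth, truth, "=="), p_in, p_out)

# The sign of the second left singular vector splits the nodes into two groups.
sv_toy <- svd(L_toy, nu = 2, nv = 2)
group <- ifelse(sv_toy$u[, 2] > 0, 1, 2)

# Labels are only defined up to a swap, so keep the better of the two labelings.
candidates <- rbind(group, 3 - group)
agreement <- c(sum(candidates[1, ] == truth), sum(candidates[2, ] == truth))
best <- which.max(agreement)
errors <- sum(candidates[best, ] != truth)
print(errors)
```

Because `L_toy` has exact block structure, its second singular vector is constant within each community and changes sign between them, so the split recovers the two groups exactly. On real data the estimated matrix is noisy, which is where misclassified nodes such as those reported above come from.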