Popularity Evaluation Model for Microbloggers Online Social Network
Abstract
Recently, microblogging has been widely studied by researchers in the domain of online social networks (OSNs). Evaluating the popularities of microblogging users is an important research field, with applications in commercial advertising, user behavior analysis, information dissemination, and so forth. Previous evaluation methods cannot effectively and accurately evaluate the popularities of microbloggers. In this paper, we propose an electromagnetic field theory based model to analyze the popularities of microbloggers. First, the concept of a source in the microblogging field is put forward, based on the concept of a source in the electromagnetic field; then, a user’s microblogging flux is calculated according to his/her behaviors (sending or receiving feedback) on the microblogging platform; finally, we use three methods to calculate a user’s microblogging flux density, which represents his/her popularity on the microblogging platform. In the experimental work, we evaluated our model using real microblogging data and selected the best of the three popularity measures. We also compared our model with the classic PageRank algorithm, and the results show that our model evaluates the popularities of microbloggers more effectively and accurately.
1. Introduction
Microblogging is a broadcast medium in the form of blogging. A microblog differs from a traditional blog in that its content is typically smaller in both actual and aggregate file size. Microblogging allows users to exchange small elements of content such as short sentences, individual images, or video links. Twitter and Weibo are well-known microblogging services, each with hundreds of millions of users. The Twitter and Weibo social networks have emerged as a critical factor in information dissemination, search, marketing, expertise and influence discovery, and potentially as an important tool for mobilizing people [1–5]. Social media have made social networks ubiquitous and have also given researchers access to massive quantities of data for empirical analysis. These data sets offer a rich source of evidence for studying the dynamics of individual and group behavior, the structure of networks, and global patterns of the flow of information on them [6–8]. Evaluating the popularity of microbloggers is a very important research problem in social networks such as Sina Weibo. For example, companies choose popular microblogging users to run their commercials, relying on those users to publish and forward the advertisements. Likewise, studies of the roles that users play in online social networks need a model or method for analyzing user popularity. Therefore, a popularity evaluation model for microbloggers can be applied to commercial advertising, user behavior analysis, information dissemination, and so forth.
How to evaluate the popularities of microblogging users is an important research problem for online social networks. Previous evaluation methods cannot effectively and accurately evaluate the popularities of microbloggers [9, 10]. For example, the popularities of microbloggers are hard to evaluate with traditional network structure models (the PageRank algorithm [10]). It is commonly believed that the more fans a user has, the more popular he/she is on the Weibo social network [11]. However, statistics from the actual data show that Sina Weibo contains inactive users, which we refer to as “zombies.” Zombies contribute nothing to a user’s popularity, which is why the number of a user’s fans is not closely related to his/her popularity. Therefore, methods based on the fan list cannot truly reflect one’s connection strength or popularity.
In order to effectively and accurately evaluate the popularities of microbloggers over time, in this paper we propose an electromagnetic field theory based model to analyze the popularities of microbloggers. First, the concept of a source in the microblogging field is put forward, based on the concept of a source in the electromagnetic field; then, a user’s microblogging flux is calculated according to his/her behaviors (sending or receiving comments) on the microblogging platform; finally, we use three methods to calculate a user’s microblogging flux density, which represents his/her popularity on the microblogging platform. The remainder of this paper is organized as follows. We discuss related work in Section 2. In Section 3, by analogy with the electromagnetic field, we define three kinds of user sources in the microblogging field: positive, negative, and neutral sources. For every microblogger, the microblogging flux is calculated according to his/her behaviors (sending or receiving comments). Three methods for calculating a user’s microblogging flux density are put forward in Section 4. In Section 5, we evaluate our model using real microblogging data and select the best of the three measures to evaluate the popularities of microbloggers. Finally, we conclude the paper in Section 6.
2. Related Work
Recently, online social networks [1] have gained significant popularity and are now among the most popular sites on the Web. Online social network researchers mainly focus on network-structure-model construction [2–5], user-behavior analysis [6], information dissemination [7], content recommendation [8], and so forth. These research fields are closely related. For example, information dissemination is mainly influenced by the degree of user interest, the number of a user’s friends, and user behavior [12–15]. Mislove et al. [13] collected large amounts of data from four social network sites and measured and analyzed the structures of online social networks. Zhao et al. [14] found that weak connections in online social networks have a significant influence on the speed and breadth of information transmission. Kwak et al. [15] analyzed the topological structure of Twitter and found that the number of followers of Twitter users is distributed according to a power law followed by an exponential cutoff. Letierce et al. [12] studied the role of labels in the transmission of information between users. At present, among network-structure-model researchers, studies of user connections and measures of their weights are very popular. Yun et al. [16] analyzed five factors that influence mutual connections between users in Twitter. Chen et al. [17] analyzed the equivalence attributes of online users, which are mostly based on user connection strength. In the field of content recommendation, some researchers focus on the category of the content and then recommend the content to users whose interests match it closely. Content recommendation is widely used in electronic commerce systems, video sharing sites, and other fields [18]. Among these applications, collaborative filtering is the most widely used technique; for a given user, it can recommend users with similar interests as his/her potential friends. Saito et al. [19] and Tang et al. [20] conducted a series of experiments and found that, owing to different preferences, users with similar interests behave differently in spreading their topics. Yang and Scott [21] found that the mention rate from relevant users is an important factor that influences many aspects of information transmission, such as speed, scale, and scope. Lerman and Ghosh [9] analyzed the influence of network structure on information transmission based on Digg and Twitter data.
3. User Source in Microblog
In the microblogging social network, users release microblogs to share information, and they can also interact with each other by forwarding or commenting on microblogs. The behaviors of microbloggers are described in Figure 1. As shown in Figure 1(a), the static circle represents a microblogger; lines with positive arrows mean that the microblogger’s blogs are forwarded or commented on by others. As shown in Figure 1(b), lines with negative arrows mean that the microblogger posts microblogs or comments on other blogs. On a microblogging platform, everyone can post microblogs or communicate with others; however, their popularities differ. We often find that even when two microbloggers post similar numbers of blogs, their blogs receive very different responses (the total numbers of followers and comments differ greatly). This agrees with the well-known Matthew effect, namely, “the rich get richer and the poor get poorer.” Celebrities are followed or commented on by others more frequently. To study this Matthew effect in the microblogging environment, we first put forward the concept of the source in the microblogging field and then give the criteria for calculating one’s microblogging flux (activity). Our model, based on electromagnetic field theory, is the first of its kind.


In order to calculate one’s microblogging flux (activity), we first introduce the concept of the source in electromagnetic field theory. An electromagnetic field (also EMF or EM field) is a physical field produced by electrically charged objects. It affects the behavior of charged objects in the vicinity of the field. The electromagnetic field extends indefinitely throughout space and describes the electromagnetic interaction. The field can be viewed as the combination of an electric field and a magnetic field. The electric field is produced by stationary charges and the magnetic field by moving charges (currents); these two are often described as the sources of the field. Because charge and current are distributed unevenly in space, charge density and current density are introduced to describe the distribution of the sources. As shown in Figure 2, point charges fall into three kinds: (a) positive charge, (b) negative charge, and (c) neutral charge.



A point charge is a source that produces an electric field. Similarly, in a microblogging social network, the flux (activity) is produced by microbloggers; therefore, we can regard every microblogger as a source in the microblogging field. In electromagnetism, the magnetic flux (often denoted as Φ) through a surface is the component of the magnetic field B passing through that surface. As shown in Figure 3, for a uniform magnetic field passing through a flat surface, the magnetic flux is

$$\Phi = B S \cos\theta,$$

where B is the magnitude of the magnetic field (the magnetic flux density), with the unit of Φ/S, that is, Wb/m² (tesla); S is the area of the surface; and θ is the angle between the magnetic field lines and the normal (perpendicular) to S.

User_ID | Blog_Num | Forwarded_Num | Commented_Num | Flux | Source type |
---|---|---|---|---|---|
1440451110 | 1354 | 3 | 2 | −1349 | Negative |
1038518860 | 4 | 30 | 0 | 26 | Positive |
1052073395 | 726 | 19 | 62 | −645 | Negative |
1049364404 | 74 | 33 | 0 | −41 | Negative |
1045865362 | 782 | 32 | 43 | −707 | Negative |
1050862900 | 2058 | 143 | 260 | −1655 | Negative |
1050881714 | 5 | 2 | 3 | 0 | Neutral |
1069689715 | 1 | 22 | 4 | 25 | Positive |
1085799815 | 361 | 14 | 52 | −295 | Negative |
1088980567 | 182 | 35 | 5 | −142 | Negative |
1094141601 | 91 | 18 | 0 | −73 | Negative |
1146768700 | 231 | 22 | 55 | −154 | Negative |
1117480841 | 239 | 37 | 83 | −119 | Negative |
1159065410 | 2 | 6 | 7 | 11 | Positive |
1172610962 | 370 | 20 | 12 | −338 | Negative |



Thus, we can classify users on the microblogging platform by their microblogging flux: if one’s microblogging flux is greater than 0, we regard him/her as a positive microblogger; if it is less than 0, we regard him/her as a negative microblogger; otherwise, he/she is a neutral microblogger. Figure 5 shows the distribution of the three kinds of microbloggers in a random sample from Sina Weibo: most microbloggers (59%) are positive, about 41% are negative, and only 1% are neutral.
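The user table above does not state how its Flux column is computed, but every row is consistent with ΦM = Forwarded_Num + Commented_Num − Blog_Num. In other words, incoming feedback (positive arrows) minus posted blogs (negative arrows). For the first row, 3 + 2 − 1354 = −1349. The minimal sketch below assumes that inferred relation and applies the sign rule just described. The class and function names are illustrative, not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Microblogger:
    user_id: str
    blog_num: int       # blogs posted (outgoing activity)
    forwarded_num: int  # times forwarded by others (incoming feedback)
    commented_num: int  # times commented on by others (incoming feedback)


def microblogging_flux(u: Microblogger) -> int:
    # Assumed relation, inferred from the table: incoming feedback minus posted blogs.
    return u.forwarded_num + u.commented_num - u.blog_num


def source_type(flux: int) -> str:
    # Sign rule from the text: > 0 positive, < 0 negative, otherwise neutral.
    if flux > 0:
        return "Positive"
    if flux < 0:
        return "Negative"
    return "Neutral"


def type_shares(users: list[Microblogger]) -> dict[str, float]:
    # Fraction of users that fall into each source type.
    counts = Counter(source_type(microblogging_flux(u)) for u in users)
    return {kind: n / len(users) for kind, n in counts.items()}


# Four rows copied from the table above.
sample = [
    Microblogger("1440451110", 1354, 3, 2),
    Microblogger("1038518860", 4, 30, 0),
    Microblogger("1050881714", 5, 2, 3),
    Microblogger("1069689715", 1, 22, 4),
]

for u in sample:
    phi = microblogging_flux(u)
    print(u.user_id, phi, source_type(phi))
print(type_shares(sample))
```

Applied to the whole Sina Weibo sample, `type_shares` would yield the kind of positive/negative/neutral breakdown reported for Figure 5.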

4. Popularity Evaluation Model for Microbloggers
In Section 3, we introduced the concept of the microblogging source and classified microbloggers into three kinds according to their microblogging flux. We first propose Hypothesis 1 as follows.
Hypothesis 1. The popularity of a microblogger is decided by his/her microblogging flux.
In order to verify Hypothesis 1, we sort the microbloggers by their microblogging flux in descending order. Table 2 shows the microbloggers whose microblogging flux ranks in the top 16. We can manually compare their popularities by browsing their homepages on the microblogging platform. Take the user pair ID1671526850 and ID1660209951 as an example. As shown in Table 2, the microblogging flux of the former is much greater than that of the latter. ID1671526850 has 158,098 blogs and about 5.00 million fans, whereas ID1660209951 has only 57,981 blogs and about 4.66 million fans. In this context, we believe that ID1660209951 is more popular than ID1671526850, because each blog of ID1660209951 attracts more feedback (a microblogging flux per blog of 1,743 versus 933), even though ID1660209951 has fewer fans than ID1671526850. This example suggests that Hypothesis 1 is not entirely reasonable and needs to be improved.
User_ID | Blog_Num | Fan_Num | Microblogging flux | Microblogging flux per blog |
---|---|---|---|---|
1644395354 | 61,879 | 10,775,394 | 321,090,728 | 5,189 |
1671526850 | 158,098 | 4,995,525 | 147,531,344 | 933 |
1660209951 | 57,981 | 4,663,119 | 101,086,971 | 1,743 |
1567852087 | 88,358 | 3,805,427 | 92,930,548 | 1,051 |
1252373132 | 53,299 | 5,051,721 | 90,050,387 | 1,689 |
1657421782 | 46,420 | 4,369,588 | 70,721,633 | 1,523 |
1266286555 | 2,464 | 23,143,090 | 66,814,481 | 27,116 |
1657430300 | 50,027 | 3,373,765 | 63,745,465 | 1,274 |
1644572034 | 33,486 | 4,512,287 | 63,616,947 | 1,899 |
1134796120 | 53,801 | 1,288,228 | 50,956,066 | 947 |
1197161814 | 10,582 | 31,891,066 | 43,980,575 | 4,156 |
1195230310 | 5,060 | 28,644,548 | 43,755,799 | 8,647 |
1644574352 | 35,958 | 1,780,998 | 42,868,683 | 1,192 |
1182389073 | 51,302 | 13,662,475 | 41,258,257 | 804 |
1192329374 | 5,676 | 30,464,892 | 39,550,836 | 6,968 |
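The last column of Table 2 equals the microblogging flux divided by the number of blogs, truncated to an integer (for example, 92,930,548 / 88,358 ≈ 1,051.75 is listed as 1,051). The sketch below reproduces that column for five rows and compares a ranking by raw flux with a ranking by flux per blog. The helper names are illustrative.

```python
# (user_id, blog_num, microblogging_flux) for five rows of Table 2.
TABLE2_ROWS = [
    ("1644395354", 61_879, 321_090_728),
    ("1671526850", 158_098, 147_531_344),
    ("1660209951", 57_981, 101_086_971),
    ("1567852087", 88_358, 92_930_548),
    ("1266286555", 2_464, 66_814_481),
]


def flux_per_blog(blog_num: int, flux: int) -> int:
    # Floor division reproduces the truncated values printed in Table 2.
    return flux // blog_num


def rank_ids(rows, key):
    # User IDs ordered from highest to lowest score under `key`.
    return [row[0] for row in sorted(rows, key=key, reverse=True)]


by_flux = rank_ids(TABLE2_ROWS, key=lambda r: r[2])
by_flux_per_blog = rank_ids(TABLE2_ROWS, key=lambda r: flux_per_blog(r[1], r[2]))

print("by flux:         ", by_flux)
print("by flux per blog:", by_flux_per_blog)
```

Under the per-blog ranking, ID1660209951 moves ahead of ID1671526850. This agrees with the manual comparison in the text.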
Hypothesis 2. One’s microblogging surface A is mainly affected by the number of blogs.
Hypothesis 3. One’s microblogging surface A is mainly affected by the number of fans.
Hypothesis 4. One’s microblogging surface A is mainly affected by the number of blogs and fans.
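By analogy with B = Φ/S, Hypotheses 2 to 4 differ only in the choice of the microblogging surface A in a flux density M = Φ_M/A. Assume that the blog based density MBlog takes A to be the number of blogs, which matches the “Microblogging flux per blog” column of Table 2. The first row of that table then works out as follows. N_blog is our notation for Blog_Num, and this worked case covers only MBlog.

```latex
M = \frac{\Phi_M}{A}, \qquad
M_{\mathrm{Blog}} = \frac{\Phi_M}{N_{\mathrm{blog}}}
  = \frac{321\,090\,728}{61\,879} \approx 5\,189
```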
MBlog_Fan takes into account all the factors above (the numbers of both blogs and fans) that could affect one’s popularity. It reduces the influence of both twuilt IDs and zombie IDs. The three metrics are analyzed in detail in the next section.
5. Experiments and Analysis
In order to validate the effectiveness of the above three popularity evaluation metrics, we use real microblogging data and conduct a series of experiments. The data used in this paper were crawled from Sina Weibo, a Chinese microblogging website. Akin to a hybrid of Twitter and Facebook, it is one of the most popular sites in China, used by well over 30% of Internet users, with a market penetration similar to that of Twitter in the USA. It was launched by SINA Corporation on August 14, 2009, and had 503 million registered users as of December 2012. About 100 million messages are posted on Sina Weibo each day. The collected data include 8,945 users (including 901 VIP users), 20,147,746 blogs, and 925,669,059 comments.
Currently, the performance of popularity evaluation methods is evaluated by manual inspection. For each microblogger, one judges whether he/she is a “real” popular star by browsing his/her homepage on the microblogging platform. For example, we check how many original blogs he/she has posted, how often his/her fans give feedback, and how much his/her fans like the posted blogs. However, such anecdotal evaluation procedures require extensive manual effort, are not comprehensive, and are limited to small networks. On Sina Weibo, some microbloggers may pursue the VIP privilege to improve their popularity, and VIP users are more easily accessed by other bloggers. We found that most popular microbloggers on Sina Weibo are VIP users. Although this does not prove that all VIP bloggers are very popular on Sina Weibo, we can use the number of VIP users as an indirect indicator to verify the effectiveness of our proposed measures.
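The VIP-based check used in the experiments below can be expressed as a small function. It ranks users by a popularity score, keeps the top k, and reports how many VIP users fall in that set and what share of all VIP users they represent. This is a minimal sketch. The function name and the toy IDs and scores are hypothetical, not drawn from the data set.

```python
from collections.abc import Mapping


def vip_coverage(scores: Mapping[str, float], vip_ids: set[str], k: int) -> tuple[int, float]:
    """Return (VIP users among the top-k scorers, share of all VIP users they cover)."""
    if not vip_ids:
        raise ValueError("vip_ids must not be empty")
    # Rank user IDs by score, highest first; ties keep the mapping's iteration order.
    top_k = sorted(scores, key=scores.__getitem__, reverse=True)[:k]
    hits = sum(1 for uid in top_k if uid in vip_ids)
    return hits, hits / len(vip_ids)


# Toy data (hypothetical IDs and scores, not from the Sina Weibo data set).
scores = {"u1": 0.9, "u2": 0.7, "u3": 0.4, "u4": 0.1}
vips = {"u1", "u3", "u4"}
print(vip_coverage(scores, vips, k=2))
```

The paper's setting uses k = 1500 and 901 VIP users. For example, a hit count of 862 corresponds to a share of 862/901 ≈ 0.957.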
5.1. Comparison of Four Hypotheses
To verify the effectiveness of the blog based flux density MBlog, we select the 1500 microbloggers with the highest MBlog (about 17% of the users in the data set). Figure 6 shows these microbloggers with their fan numbers and MBlog values, in which the red nodes represent VIP users. Of these 1500 microbloggers, 348 are VIP users, accounting for about 39% of all VIP users. Some bloggers are not VIP users even though their MBlog values are very high, because their numbers of fans are huge (see the blue crosses in Figure 6).

We conduct a similar experiment to verify the effectiveness of the fan based flux density MFan. Figure 7 shows the top 1500 microbloggers by MFan with their blog numbers, in which the red nodes represent VIP users. This set contains 656 VIP users, accounting for about 73% of all VIP users. Some microbloggers are not VIP users even though their MFan values are very high (see the blue crosses in Figure 7).

Then, we select the top 1500 microbloggers by MBlog_Fan. Figure 8 shows them with their blog numbers, fan numbers, and MBlog_Fan values. The number of VIP users among them (see the red nodes) is 862, accounting for about 95% of all VIP users. This means that most VIP users are included in these 1500 microbloggers.

To further validate Hypothesis 1, we select the top 1500 microbloggers by microblogging flux ΦM. These users are shown in Figure 9, with their blog numbers, fan numbers, and microblogging flux. Only 246 of these 1500 microbloggers are VIP users, accounting for about 18% of VIP users. This means that most VIP users are not included in these 1500 microbloggers.

From the above experiments, we conclude that, in terms of effectiveness, Hypothesis 1 < Hypothesis 2 < Hypothesis 3 < Hypothesis 4. MBlog_Fan takes into account all the factors that could affect one’s popularity, and it is the best metric for evaluating one’s popularity on the microblogging platform.
5.2. Comparison with PageRank Algorithm
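PageRank, the network-structure baseline in this comparison, scores users by the stationary distribution of a random walk over the follower graph with random teleportation. A minimal power-iteration sketch follows. The toy graph, the damping factor 0.85, the tolerance, and the uniform handling of dangling nodes are illustrative assumptions, not the settings used in these experiments.

```python
def pagerank(edges, damping=0.85, tol=1e-12, max_iter=200):
    """Power-iteration PageRank over a directed graph given as (src, dst) edges.

    An edge (a, b) passes rank from a to b, e.g. a follower pointing at a followee.
    Nodes with no outgoing edges (dangling nodes) spread their rank uniformly.
    """
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling_mass = sum(rank[v] for v in nodes if not out_links[v])
        # Teleportation plus redistributed dangling mass, shared equally by all nodes.
        base = (1.0 - damping) / n + damping * dangling_mass / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Toy follower graph (hypothetical users): a and c follow b, b follows a.
ranks = pagerank([("a", "b"), ("c", "b"), ("b", "a")])
print({v: round(r, 4) for v, r in ranks.items()})
```

Because PageRank depends only on who follows whom, an inactive (“zombie”) follower passes rank just as an active one does. The flux-based metrics avoid this by counting actual feedback.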







5.3. The Current Popularities of Microbloggers
Blog_ID | User_ID | Post date | Forwarded_Num | Commented_Num |
---|---|---|---|---|
3342638187376430 | 1087770692 | 2011-08-04 | 686 | 0 |
3342642545392029 | 1087770692 | 2011-08-04 | 1033 | 407 |
3342652108123960 | 1087770692 | 2011-08-04 | 2849 | 380 |
3342956334136552 | 1087770692 | 2011-08-05 | 359 | 146 |
3342969214160990 | 1087770692 | 2011-08-05 | 566 | 300 |
3343021802451966 | 1087770692 | 2011-08-05 | 7153 | 4589 |
3343090463253335 | 1087770692 | 2011-08-05 | 1146 | 420 |
3350669889615619 | 1087770692 | 2011-08-26 | 823 | 1263 |
3350672028496051 | 1087770692 | 2011-08-26 | 465 | 649 |
3350705175929327 | 1087770692 | 2011-08-26 | 801 | 262 |
3351097033072419 | 1087770692 | 2011-08-27 | 572 | 392 |
3577907916358736 | 1087770692 | 2011-08-27 | 387 | 610 |
3577963012832148 | 1087770692 | 2013-05-14 | 770 | 888 |
3578260099304910 | 1087770692 | 2013-05-14 | 1187 | 627 |
3578283050850408 | 1087770692 | 2013-05-15 | 9768 | 2969 |
3578306295561973 | 1087770692 | 2013-05-15 | 560 | 466 |
3578324054064691 | 1087770692 | 2013-05-15 | 916 | 1392 |
We chose four microbloggers to compare their current popularities and analyze how these change over time. As shown in Figure 13, the current popularities of the four microbloggers may fluctuate within a period, but the overall trends are relatively stable. The current popularities of ID1087770692 are mainly concentrated in the range between 10⁻⁵ and 10⁻⁴; those of ID1038330705 grow from 10⁻⁵ to 10⁻³; those of ID1025582437 drop from 10⁻¹ to 10⁻³; and those of ID1041508671 are mainly concentrated around 10⁻⁴. A microblogger may sometimes post an interesting original blog that attracts a lot of feedback from others; his/her current popularity on that day may then be very high. In general, the feedback on one’s blogs comes mainly from his/her close friends; therefore, the amount of feedback on one’s blogs remains stable over time.
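As a rough illustration of the day-to-day variation discussed above, the sketch below aggregates raw feedback per post date for a subset of ID1087770692’s blogs from the table in this section. Raw feedback is taken to be forwards plus comments. It does not implement the current-popularity measure itself, whose normalization yields values around 10⁻⁴. Treating feedback as forwards plus comments is an assumption, and the function name is illustrative.

```python
from collections import defaultdict

# (post_date, forwarded_num, commented_num) for user 1087770692, copied from the table above.
POSTS = [
    ("2011-08-04", 686, 0),
    ("2011-08-04", 1033, 407),
    ("2011-08-04", 2849, 380),
    ("2011-08-05", 359, 146),
    ("2011-08-05", 566, 300),
    ("2011-08-05", 7153, 4589),
    ("2011-08-05", 1146, 420),
    ("2013-05-15", 9768, 2969),
    ("2013-05-15", 560, 466),
    ("2013-05-15", 916, 1392),
]


def daily_feedback(posts):
    """Return {date: (blogs posted, forwards + comments received)}, sorted by date."""
    blogs = defaultdict(int)
    feedback = defaultdict(int)
    for date, forwarded, commented in posts:
        blogs[date] += 1
        feedback[date] += forwarded + commented
    return {d: (blogs[d], feedback[d]) for d in sorted(feedback)}


for date, (n_blogs, total) in daily_feedback(POSTS).items():
    # Date, blogs posted, total feedback, and average feedback per blog.
    print(date, n_blogs, total, round(total / n_blogs, 1))
```

On 2011-08-05, a single blog with 7,153 forwards and 4,589 comments accounts for most of that day’s feedback. The text attributes exactly this kind of spike to an interesting original post.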




6. Conclusion
The popularities of microbloggers are hard to evaluate with traditional network structure models (the PageRank algorithm [10]). Some microbloggers may build very long watch lists, so a network built from fan lists and watch lists cannot truly reflect one’s connection strength or popularity. In this paper, we proposed an electromagnetic field theory based model to analyze the popularities of microbloggers. First, the concept of a source in the microblogging field was put forward, based on the concept of a source in the electromagnetic field; then, a user’s microblogging flux was calculated according to his/her behaviors (sending or receiving feedback) on the microblogging platform; finally, we used three methods (MBlog, MFan, and MBlog_Fan) to calculate a user’s microblogging flux density, which represents his/her popularity on the microblogging platform. In the experimental work, we evaluated our model using real microblogging data and found that MBlog_Fan reflects one’s popularity better than the other two metrics. We also compared our model with the classic PageRank algorithm, and the results show that our model evaluates the popularities of microbloggers more effectively and accurately. The contributions of this paper can be summarized as follows: (1) the proposed popularity evaluation metric, MBlog_Fan, is effective and reliable for evaluating the real influence of bloggers on the Sina microblogging platform; (2) the popularities of microbloggers vary over time, but their overall trends are relatively stable, and large oscillations may occur owing to the content of their released blogs.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work is supported by the NUAA Fundamental Research Funds (NS2013090).