S. Cresci, M. La Polla, A. Marchetti, M. Tesconi · Stefano Cresci, Mariantonietta La Polla, Andrea...

Consiglio Nazionale delle Ricerche
Istituto di Informatica e Telematica (IIT)

Social Sensing: using Social Media for an Early Warning system
S. Cresci, M. La Polla, A. Marchetti, M. Tesconi

Technical report IIT TR-19/2013, October 2013


Social Sensing: using Social Media for an Early Warning system

Stefano Cresci, Mariantonietta La Polla, Andrea Marchetti, Maurizio Tesconi

Keywords: social sensing, social media analysis, early warning, event detection, burst detection, classification

Abstract

Social Media are used today by millions of people, who interact according to new paradigms of communication and participation. These users are therefore an ideal ground for studying how new topics of discussion spread and how new communication dynamics emerge.

Social Media platforms can be used as a starting point for Social Sensing applications, in which users are considered providers of information. Social Sensing is based on the idea that communities or groups of people provide a set of information similar to that obtainable from a single sensor; taken together, this information generates a complex and adequate knowledge of one or more specific issues.

A possible field of application for Social Sensing is Emergency Management. This field is of interest to a variety of stakeholders: government agencies, the information industry, and ordinary citizens. Using Social Media (SM) for Emergency Management, these stakeholders can gather up-to-date information on emerging dangerous situations in order to gain greater situational awareness, alert interested parties promptly, or verify information obtained through other channels. A system able to predict or identify events of social concern can be referred to as an Early Warning system.

In this work we propose a general architecture for an Early Warning system and, as a proof of concept, we describe an implementation of this architecture for a real scenario. We use Twitter as the source of information for the detection of earthquakes in the Italian territory.

We compare our results with official data provided by the National Institute of Geophysics and Volcanology (INGV), the authority responsible for monitoring seismic events in Italy. Results show that the system is highly capable of identifying events of magnitude 3.5 or greater on the Richter scale, with 10% of False Positives.

Table of contents

1. Introduction
   1.1 Social Media and the new paradigms of communication
   1.2 Social Sensing
   1.3 Early Warning Systems for Emergency Management
   1.4 Twitter as source of information for Social Sensing
2. State of the art
   2.1 Generic approaches
   2.2 Sector-based experiences
   2.3 Applications
   2.4 Event Detection
3. Goals
   3.1 Classification of events
   3.2 Framework Design
   3.3 Italian Scenario
4. Framework design and definition
   4.1 Proposed architecture
   4.2 Data Acquisition
       4.2.1 Search Keywords
       4.2.2 Twitter APIs
   4.3 Event Detection
       4.3.1 Data filtering
       4.3.2 Temporal analysis
       4.3.3 Spatial analysis
       4.3.4 Event reliability
   4.4 Event Handling
   4.5 Simulator
   4.6 Damage Assessment
5. Results
   5.1 Collected Data
   5.2 Analysis of the results
6. Conclusions and future works
7. References

1. Introduction

1.1 Social Media and the new paradigms of communication

Sharing information and opinions on the Internet has become an everyday reality. Over the last few years Social Media (SM), web platforms that allow the creation and exchange of user-generated content (UGC), have changed the way in which people communicate: by enabling interaction between people, these platforms are new vectors for the spreading of information. In this sense, SM play an important role in a real democratization of information, transforming people from consumers into publishers of content.

SM are used today by millions of people, who interact according to new paradigms of communication and participation. These users are therefore an ideal ground for studying how new topics of discussion spread and how new communication dynamics emerge.

Figure 1.1 shows an infographic describing the SM world, updated to 2013. The most popular social platforms, such as Facebook, Twitter, YouTube, Wikipedia and MySpace, which already have millions of users, allow the sharing not only of text messages and reviews, but also of photos, videos, links, and more.

The most important SM in terms of number of subscribers is Facebook: in May 2013 it had 1.11 billion registered users.¹

According to statistics,² in July 2013 Twitter had 554 million active users, who publish on average 58 million tweets per day (9,100 tweets per second).

1 Source: http://en.wikipedia.org/wiki/Facebook_statistics
2 Source: statisticbrain.com

Figure 1.1: The world wide overview of Social Media in 2013.

Each SM user leaves traces of their activities and experiences. For this reason, SM have recently attracted increasing attention from researchers who want to exploit the enormous amount of fresh information that is published and shared online every day. All these data, freely shared by users, represent a rich pool of information that can be analyzed from different points of view: from a sociological or marketing perspective to a more social and participatory one, such as alerting in specific situations of distress or emergency.

1.2 Social Sensing

For the reasons mentioned above, SM platforms can be used as a starting point for any analysis that treats users as providers of information. We refer to this kind of study as Social Sensing. Social Sensing is based on the idea that communities or groups of people provide a set of information similar to that obtainable from a single sensor; taken together, this information generates a complex and adequate knowledge of one or more specific issues [1]. SM users can therefore be treated as real sensors that provide information about specific issues in real time.

The important role played by the Internet in communication between people, together with the new role of SM as sensing platforms, poses a new challenge for research into new experimental forms of analysis. The amount of data involved and the increasing number of users and platforms require the adoption of new methods for collecting, storing and processing the data.

1.3 Early Warning Systems for Emergency Management

Among the various fields of study regarding Social Sensing, one is related to Emergency Management. This field is of interest to a variety of stakeholders: government agencies, the information industry, and ordinary citizens. Using SM for Emergency Management, these stakeholders can gather up-to-date information on emerging dangerous situations in order to gain greater situational awareness, alert interested parties promptly, or verify information obtained through other channels [2] [3]. A system capable of performing these functions may be a valuable decision support tool.

In case of an event or a disaster the information needs to be collected as soon as possible. Acquiring the information rapidly leaves more time to make correct decisions about the situation to deal with and allows a timely notification to the involved population.

In this work we aim to collect and analyze spontaneous user reports about phenomena that cause alarm or danger to society, such as hazardous events and natural disasters, on both small and large scales. The idea comes from the intuition that the first people to “announce” these phenomena are those directly involved; this, together with the central role of SM in modern paradigms of communication, can help locate the event and collect important information about it.

For the purposes of this document we will refer to Early Warning systems as those architectures able to predict or identify events of social concern. An Early Warning system is a chain of information communication systems comprising sensor, detection, decision, and broker subsystems, in the given order, working in conjunction, forecasting and signalling disturbances adversely affecting the stability of the physical world; and giving sufficient time for the response system to prepare resources and response actions to minimise the impact on the stability of the physical world [4].

We will focus on a specific type of Social Media: Social Networks (SN). A Social Network is a platform that allows virtual relationships among users, similar to those that form in the real world [5]. Using these platforms, users can exchange messages, share their interests and talk about their daily activities. Social networks are particularly suited to this type of analysis due to the large number of users involved (the two SM with the highest number of subscribers in the world, Facebook and Twitter, are both SN) and the high level of interaction.

1.4 Twitter as source of information for Social Sensing

Comparing the characteristics of different SN, we can notice that Twitter has some peculiarities that make it particularly suitable as a source of data for Social Sensing platforms [6] [7]. An important feature of Twitter for the goal of this work lies in the type of messages that users exchange. While on Facebook people are invited to talk about their thoughts and opinions (Fig. 1.2), Twitter users generally talk about their activities and therefore about what is happening around them (Fig. 1.3). Studying the messages shared by Twitter users can make it easier to acquire good context awareness, that is, an understanding of what is happening and where.

Figure 1.2: Sharing content on Facebook.

Figure 1.3: Sharing content on Twitter.

Moreover, Twitter is more interactive and responsive than other SN: because public messages are limited in length (maximum 140 characters), the lifetime of a tweet (as messages exchanged on Twitter are called) is rather short, and therefore users tend to tweet more frequently.

Figure 1.4: Comparison of tweet and post lifetime.

Figure 1.4 shows, via box plots,³ the results of a study on the lifetime of tweets and posts. The lifetime we refer to is the time that elapses from the creation of a message to the last interaction with it (e.g. like, comment, retweet, etc.). The study analyzes the interactions on a sample of 10,000 posts (Facebook) and 10,000 tweets (Twitter): the results underline a marked difference in longevity between content published on the two SN. Over the whole sample analyzed, the longest-lived post had a lifetime of 21 days, while the longest-lived tweet lasted only 31 hours. In conclusion, Twitter is more immediate than other major SN and gives us a more up-to-date picture of what is happening.

3 http://en.wikipedia.org/wiki/Box-plot

2. State of the art

The use of SM for Social Sensing is currently the subject of many studies. In recent years various initiatives, in both scientific and applied settings, have been developed with the aim of exploiting the information available on these platforms to create new knowledge on specific issues and to offer new services.

In the literature, descriptive and general approaches stand in contrast to practical, sector-based experiences (e.g. epidemiology, earthquakes, etc.). What is missing is a generic architectural approach. A complete Early Warning System is based on four connected elements [8]:

1. knowledge of risk factors;
2. a monitoring system;
3. a system for communication and the spreading of information;
4. the ability to react to the event.

Many state-of-the-art studies focus on points 2 and 3, that is, on the ability to quickly identify the event and/or to effectively alert the users involved.

2.1 Generic approaches

The European initiative Alert4All [8] aims to create a framework that improves the effectiveness of warning messages and communications with the population in case of disasters and/or emergencies at pan-European level. The focus is on the role of SM in emergency communications [9, 10]. The SMART-C project [11] [12] focuses on the identification of an emergency situation. This project describes a high-level, multi-modal framework able to collect and integrate data from different sources such as Social Networks, blogs, landline telephone communications, SMS and MMS. Collected data are cross-checked with the databases of police, hospitals and other public offices in order to gain additional information about the event that occurred. A mobile application is used to notify emergency situations to the users who subscribe to the service. Unlike Alert4All, the SMART-C project focuses on urban events.

2.2 Sector-based experiences

Unlike the approaches described above, the following experiences target specific fields of application; these works address technical issues and propose specific solutions to the problems found.

One of the most interesting works [13] concerns epidemiology: its goal is to predict the level of spread of flu in the United Kingdom. The researchers looked for a correlation between an index derived from tweets and the official flu rates published by the Health Protection Agency (HPA). The results show a prediction accuracy in the region of 95%. It is important to underline that this is not a real-time analysis; it is instead performed on a daily basis.

The work presented in [14] aims to create an Early Warning System for the real-time detection of earthquakes and tornadoes in Japan. The approach chosen by the Japanese researchers is based on Bayesian statistics for both the detection of the event and the estimation of its position. According to the results reported, the system was able to detect 89.7% (70 of 78) of the earthquakes of JMA (Japan Meteorological Agency) scale 2 or more that occurred in two months. The system was able to send event notifications with delays ranging from 20 seconds to a minute, compared with delays in the order of 8 minutes for the official JMA announcement.

The SHIELD system [15] was designed and developed as part of the fight against crime. The system aims to exploit mobile devices to reduce response and rescue times for victims of petty crime. This work focuses on the mechanisms of communication, on the spreading of information and on the ability to react to an event. The case study concerns accidents and episodes of petty crime that occur on U.S. university campuses. In this system the Event Detection component is not present, since victims themselves report the event by interacting with the user interface of the mobile application.

2.3 Applications

From the application point of view, one of the most interesting initiatives is Emergenza24 [16]. The platform, active for two years, is the experimental version of the "Social Network for Emergency Management" for the Italian territory. The initiative, which involves people on a voluntary basis, aims to contribute in real time to the collection, verification and dissemination of information related to different kinds of emergencies. Twitter users can collaborate with the network by bringing news and reports to the attention of the operators via mentions⁴ of the Twitter account @emergenza24.

In particular, the staff of the platform have established a specific format that must be used to signal events, including a mention of the Twitter account @emergenza24. This strict requirement dramatically limits the usefulness of the service. Similarly, the SMEM platform (Social Media Emergency Manager) [17] tries to promote the use of a hashtag, #smem, to report emergencies or other events of social interest. The initiative is currently discontinued. City Sourced [18] is a web and mobile application, widespread in some U.S. cities, that allows citizens to report problems or emergencies at urban level. Reports sent through the mobile application are automatically geo-referenced by the devices and displayed on a map.

Other Italian initiatives oriented to the reporting of urban management issues are ePart [19], Urban Decor [20] and Uptu [21].

4 A mention is an explicit citation of a Twitter account in a tweet.

2.4 Event Detection

One of the most important technical aspects in the study of Early Warning systems is Event Detection. An interesting work on this topic is EventRadar [22], which proposes a novel method to detect local events (those that involve a limited number of people). The detection of an event is based on real-time temporal and spatial analysis of the collected tweets. The temporal analysis compares the current frequency of tweets that contain some specific keywords with the frequency of the same keywords in the last week. The spatial analysis is conducted using a clustering algorithm. Even though EventRadar is nominally able to identify events starting from as few as 3 tweets, the system correctly identified only 68% of the analyzed events.

Starting from the assumption that the detection of events on Twitter is directly related to the recognition of a burst of tweets with the same keywords, we also investigated the different approaches to Burst Detection in the literature. In particular, the work presented in [23] describes approaches that can be used in the field of real-time Event Detection.
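The temporal comparison described for EventRadar (current keyword frequency against the frequency over the previous week) can be illustrated with a minimal Python sketch. It is not the algorithm of [22] or [23]: the function name `is_burst`, the 10-minute window, the ratio of 5 and the minimum of 3 tweets are all assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta


def is_burst(timestamps, now, window=timedelta(minutes=10),
             history=timedelta(days=7), ratio=5.0, min_count=3):
    """Return True if the latest window holds unusually many matching tweets.

    timestamps: creation times of tweets containing a monitored keyword.
    All thresholds are hypothetical defaults, not values from the report.
    """
    # Tweets in the most recent window.
    current = sum(1 for t in timestamps if now - window < t <= now)
    # Tweets in the week that precedes the current window.
    start = now - window - history
    past = sum(1 for t in timestamps if start < t <= now - window)
    # Average number of tweets per window-sized slot during that week.
    baseline = past / (history / window)
    return current >= min_count and current > ratio * baseline
```

A keyword that is mentioned steadily all week produces a high baseline and does not trigger, while a handful of tweets in a normally quiet stream does.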

3. Goals

In the context of Web 2.0, SM represent the most effective, sophisticated and powerful way to collect the preferences, tastes and activities of groups of users [24]. The advantage of social networks over traditional methods of investigation lies in the fact that user participation is not induced or guided in any way. In this scenario, users act as sensors that provide real-time data about events, news and everything that happens in real life.

The analysis of the literature has highlighted the lack of a comprehensive approach to the problem of Emergency Management on SM. What is missing is an architectural approach that details the components addressing the practical problems of typical applications.

3.1 Classification of events

In order to design an event-based system, one must identify in advance the characteristics of the events to be treated. By event we mean a significant occurrence that takes place in a finite period of time. Typically, many of the events that happen in the real world are reported in the SM world. We first need a classification of the different events that can occur. The work presented in [22] proposes a possible classification of events, shown in Figure 3.1.

Events are firstly divided into real and virtual. Virtual events are phenomena that occur just in the online world. Real events can be further divided into Local and Global. A Local event has a specific location both temporally and geographically, while Global events represent events not related to a specific location such as some global celebrations and holidays (Christmas, Valentine's Day, etc.).

Figure 3.1: A possible classification of events [22].

According to this classification, the events processed by an Early Warning System for Emergency Management are local, with a precise time and place.

Local events differ in the number of users and/or the geographical area involved, and this can affect the ability to detect them. Since we exploit people as sensors, events that involve fewer people are more difficult to detect. Moreover, developing a system that can manage both small-scale and medium-to-large-scale events is a challenging task. We also have to take into account the importance of identifying an event quickly.

3.2 Framework Design

The main goal of this work is to define a framework that exploits Twitter as a sensing platform and uses the collected data, properly analyzed, to detect and report events that arouse public alarm.

The system is not designed to deal with a particular event or family of events. This is a challenging task: as demonstrated in [22], identifying generic events is much more complex than detecting a specific event or family of events.

Another important feature of our solution is responsiveness: detecting an event is not sufficient; the detection must happen in real time. Assessing the response time of the system is not an easy task: our objective is to detect an event before the news is published by official channels, such as the offices involved in the management of the event, newspapers, blogs, etc. [4].

3.3 Italian Scenario

We will focus our attention on the Italian scenario. The Italian civil defense groups the risks that typically affect Italy into 8 main categories:⁵

- seismic;
- volcanic;
- weather: hydrogeological and water;
- fire;
- health;
- nuclear;
- environmental;
- industrial.

Among these, the most interesting, both for national authorities and for the media, is the seismic risk. According to data from the Civil Defense,⁶ during the last 2500 years Italy was affected by more than 30,000 earthquakes of medium and high intensity (greater than IV-V on the Mercalli intensity scale), and by about 560 events of intensity equal to or greater than VIII on the Mercalli intensity scale. In the 20th century, 7 earthquakes of Richter magnitude 6.5 or greater (X and XI degree Mercalli) were recorded.

5 Source: http://www.protezionecivile.gov.it/jcms/it/rischi.wp
6 Source: http://www.protezionecivile.gov.it/

Data about the number and frequency of earthquakes in Italy, and about their impact on the territory and the population, become more relevant when cross-checked with data about the use of Twitter in Italy. In 2013, Twitter registered a 40% increase in active users [25]. In Italy the increase reached 50%.

Figure 3.2: World distribution of (1) Twitter users (on the left) and (2) earthquakes (on the right).

Figure 3.2 shows the worldwide distribution of earthquakes and the worldwide use of Twitter. It is interesting to note that there are regions of the globe where a high density of seismic events and heavy use of Twitter overlap. In addition to Japan, as pointed out in [14], other areas show a high level of overlap between the two maps: Indonesia, Turkey, Iran, Italy and some U.S. west coast cities, such as Los Angeles and San Francisco.

For the evaluation of our results, we will use official data published by the National Institute of Geophysics and Volcanology (INGV), the authority responsible for monitoring seismic events in Italy. INGV uses different channels, including Twitter, to collect additional information about all seismic events identified with its equipment. Using the account @INGVterremoti, the authority provides the Twitter community with the following information for each detected event:

- magnitude (type and value);
- date and time of the earthquake;
- geographical coordinates of the epicenter;
- depth;
- administrative province of Italy;
- link to a page with a detailed description of the earthquake, provided by the INGV.
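As an illustration of how these reference records could be held for the evaluation, the following Python sketch models one INGV entry and checks whether a detection falls shortly after the official origin time. The class and function names, the kilometre unit for depth and the 10-minute tolerance are assumptions of this sketch, not details taken from the report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class OfficialEvent:
    """One reference earthquake record with the fields listed above."""
    magnitude_type: str   # label such as "ML"; the label set is assumed
    magnitude: float
    origin_time: datetime
    latitude: float
    longitude: float
    depth_km: float       # the unit is an assumption
    province: str
    details_url: str


def detection_delay(event: OfficialEvent, detected_at: datetime,
                    max_delay: timedelta = timedelta(minutes=10)
                    ) -> Optional[timedelta]:
    """Return the delay if the detection is within max_delay, else None."""
    delay = detected_at - event.origin_time
    if timedelta(0) <= delay <= max_delay:
        return delay
    return None
```

A detection that precedes the official origin time, or that arrives after the tolerance, is treated as unrelated to the record.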


4. Framework design and definition

4.1 Proposed architecture

Figure 4.1 illustrates the proposed architecture. This schema was designed taking into account previous experiences in the field of Emergency Management, and it is intended to solve problems raised in the literature. The architecture is composed of the following modules:

- Data Acquisition: collection and storage of Twitter data;
- Event Detection: detection of an event starting from the analysis of collected data;
- Event Handling: calculation and updating of the information about the detected event as new data are collected;
- Early Warning: publishing of the news related to the detection of the event and related information;
- Damage Assessment: starting from collected data, estimation of the consequences of the event;
- Recalibration: improvement of the data acquisition process.

Figure 4.1: Overall architecture of an Early Warning System for Emergency Management.
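The module chain can be pictured with a deliberately simplified Python skeleton in which each module is a function acting on a shared state. The execution order simply follows the order in which the modules are listed, and every rule inside the functions (substring matching, a fixed tweet threshold, the alert text) is a placeholder assumed for this sketch rather than the behaviour of the actual system.

```python
def data_acquisition(state):
    # Keep only incoming tweets that contain a monitored keyword.
    state["stored"] = [t for t in state["incoming"]
                       if any(k in t.lower() for k in state["keywords"])]


def event_detection(state):
    # Placeholder rule: declare an event once enough tweets are stored.
    state["event"] = len(state["stored"]) >= state["min_tweets"]


def event_handling(state):
    # Summarise what is known about the event so far.
    state["summary"] = ({"tweets": len(state["stored"])}
                        if state["event"] else None)


def early_warning(state):
    # Publish a notification only when an event was detected.
    state["alerts"] = ([f"possible event, {state['summary']['tweets']} reports"]
                       if state["event"] else [])


def damage_assessment(state):
    state["damage"] = None  # left unimplemented in this sketch


def recalibration(state):
    pass  # would tune state["keywords"]; left unimplemented here


MODULES = [data_acquisition, event_detection, event_handling,
           early_warning, damage_assessment, recalibration]


def run(state):
    """Run every module once, in list order, on the shared state."""
    for module in MODULES:
        module(state)
    return state
```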


4.2 Data Acquisition

The role of this module in the overall system is crucial: the subsequent modules rely on the data collected in this phase. Errors at this stage, especially loss of data, have to be minimized.

In order to make this module as independent as possible of the source platform, all collected data are stored in a common format.

Data collection can be split into two phases:

- the selection of the search keywords;
- the use of the interface provided by the SN for the collection of the data.

In our work, we use Twitter as the source of information.

In our application, in addition to tweets related to seismic events, we also collected a wide range of other information in order to conduct a more comprehensive study. To do this, we set up 4 different crawlers that collect the following information:

- tweets with 2 keywords in Italian (terremoto, scossa) related to seismic events;
- tweets with 2 keywords in English (earthquake, shaking) related to seismic events;
- tweets with 54 Italian keywords related to 11 different categories of events;
- tweets published by or addressed to the INGV account (@INGVterremoti).

4.2.1 Search Keywords

Data collection is based on gathering the public messages shared by users. Of all the messages produced (statistics show an average production rate on Twitter in the region of 9100 new tweets per second⁷), only a small part is related to a specific event. Moreover, collecting and storing these data consumes resources.

In order to filter the messages produced on SN and collect only those relevant to our scope, it is necessary to identify some keywords related to the event. The name of the event and its synonyms (e.g. earthquake, seism) are a good starting point. We associate one or more keywords with an event. To detect multiple events, the system maintains an association between each event and its related keywords. The extraction of information from a SM is implemented using the APIs provided by each platform.
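The association between events and keywords can be sketched in Python as a simple mapping. The earthquake keywords are the two terms monitored in this work; the second entry and its keyword are hypothetical and appear only to show how several event types could coexist. The names `EVENT_KEYWORDS` and `events_for` are likewise assumptions.

```python
import re

# Event type -> keywords that signal it.
EVENT_KEYWORDS = {
    "earthquake": {"terremoto", "scossa"},
    "fire": {"incendio"},  # hypothetical entry, not from the report
}


def events_for(text):
    """Return the event types whose keywords appear as words in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return {event for event, keywords in EVENT_KEYWORDS.items()
            if words & keywords}
```

Matching on whole words rather than substrings is a design choice of the sketch; it avoids counting a keyword that merely occurs inside a longer word.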

In our application, in order to select a comprehensive set of keywords, we started both from terms reported in the literature [14] and from other words related to earthquakes in the Italian language. We first collected tweets containing at least one of the following candidate keywords:

1. crollo
2. terremoto
3. scossa
4. sisma
5. magnitudo
6. trema
7. tremando
8. crepa
9. crepe

7 Source: statisticbrain.com

We progressively restricted this set, eliminating the keywords that showed no peaks corresponding to seismic events reported by official agencies and/or those more related to the task of Damage Assessment than to Event Detection (crollo, crepa, crepe), those often used in official communications (sisma, magnitudo), and those too generic and potentially used in many contexts other than the events we want to detect (trema, tremando). At the end of this selection process we identified two terms to monitor: terremoto and scossa.
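The first selection criterion, keeping only keywords whose volume peaks when an official event occurs, can be approximated by the Python sketch below. It covers that criterion alone; the other criteria in the paragraph were editorial judgments. The function names, the per-slot count representation and both thresholds are assumptions introduced for illustration.

```python
def peak_coverage(counts, event_slots, threshold):
    """Fraction of official events whose time slot shows a keyword peak.

    counts: mapping from time-slot id to number of tweets with the keyword.
    event_slots: non-empty list of slot ids in which an official event fell.
    """
    hits = sum(1 for slot in event_slots if counts.get(slot, 0) >= threshold)
    return hits / len(event_slots)


def keep_keywords(counts_by_keyword, event_slots,
                  threshold=10, min_coverage=0.5):
    """Return, sorted, the keywords that peak for enough official events."""
    return sorted(keyword for keyword, counts in counts_by_keyword.items()
                  if peak_coverage(counts, event_slots, threshold) >= min_coverage)
```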

4.2.2 Twitter APIs

We use the Twitter API v1.1 (the latest at the time of writing), which consists of 104 methods divided into 16 categories. The methods that can be used for the acquisition of tweets fall into 4 categories:

- Timelines:⁸ the methods of this class are used to retrieve the tweets of a given user;
- Tweets: this category enables the collection of a tweet starting from its id;
- Search: a single method that returns published tweets matching some criteria;
- Streaming: allows the searching and collection of tweets in real time. Retrieved tweets are filtered using some parameters.

Among these categories, the Search API and the Streaming APIs are interesting for our scope, because they allow a search based on specific keywords.

The Search API performs a search on a set of indexed, recently published tweets that match the specified query. The drawback of this API is that it does not give access to all published tweets.

Unlike the Search API, the Streaming API opens a persistent connection with a Twitter stream: through this connection, new tweets matching the search parameters can be collected. Tweets published before the opening of the connection cannot be retrieved in this way. This limitation is not a problem for our system due to the real-time nature of this study. Another limitation of the Streaming API can potentially affect our solution. Since the connection can potentially access the entire flow of tweets produced in the world, Twitter delivers at most 1% of the total traffic and automatically cuts off the excess, indicating the number of excluded tweets. However, our system collected tweets for over two months and the generated traffic never exceeded the 1% threshold, thus we never encountered this limitation.

8 A timeline is a set of tweets published by a user and ordered by the time of publishing.

Moreover, we need to take into account the problem of falling behind: after receiving the tweets from the API, clients have to process them rapidly. During this processing, the client cannot accept other tweets: if this delay exceeds an unspecified threshold, Twitter disconnects the client. To avoid this problem, incoming tweets are stored in ad-hoc structures to be processed later. In our work we used the Streaming API.
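One common way to decouple reception from processing is a queue shared between a receiving loop and a worker thread. The sketch below illustrates the idea with Python's standard library; it is not the structure actually used in the system, and the names `receive`, `process` and `run`, as well as the in-memory list standing in for the stream, are assumptions made for illustration.

```python
import queue
import threading

def receive(stream, buffer):
    """Receiver: only enqueues, so it is never held up by slow processing."""
    for tweet in stream:
        buffer.put(tweet)
    buffer.put(None)  # sentinel: the stream is closed

def process(buffer, handle):
    """Worker: drains the buffer at its own pace."""
    while True:
        tweet = buffer.get()
        if tweet is None:
            break
        handle(tweet)

def run(stream, handle):
    """Run the receiver in the current thread and the worker in a second one."""
    buffer = queue.Queue()
    worker = threading.Thread(target=process, args=(buffer, handle))
    worker.start()
    receive(stream, buffer)
    worker.join()

processed = []
run(iter(["tweet 1", "tweet 2", "tweet 3"]), processed.append)
print(processed)
```

With a single worker the queue preserves arrival order, which matters for any later analysis based on the timing of messages.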

Since 11 June 2013⁹, access to every Twitter API requires an authentication mechanism. The authentication is implemented using Twitter accounts that exchange messages with the platform using the OAuth protocol¹⁰. Each account can perform a limited number of requests in a time window of 15 minutes. In order to guarantee the robustness and the reliability of the system, we also implemented additional mechanisms that manage rate limits and generic connection problems in the use of the APIs.
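A minimal sketch of how a client could keep track of its request budget within a 15-minute window is given below. The class name `WindowBudget`, the limit of 2 requests and the injectable clock are hypothetical choices made for illustration; the mechanisms actually implemented in the system are not detailed here.

```python
import time

class WindowBudget:
    """Tracks how many requests remain in the current fixed time window."""

    def __init__(self, max_requests, window_seconds=15 * 60, clock=time.monotonic):
        self.max_requests = max_requests      # hypothetical per-window limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def acquire(self):
        """Consume one request if possible and return 0.0;
        otherwise return the seconds left until the window resets."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.used = 0
        if self.used < self.max_requests:
            self.used += 1
            return 0.0
        return self.window_seconds - (now - self.window_start)

# Usage with a fake clock, so that the example does not actually wait.
fake_now = [0.0]
budget = WindowBudget(max_requests=2, clock=lambda: fake_now[0])
waits = [budget.acquire(), budget.acquire(), budget.acquire()]
fake_now[0] = 900.0  # 15 minutes later the window has been reset
after_reset = budget.acquire()
print(waits, after_reset)
```

A caller would sleep for the returned number of seconds before retrying, instead of sending a request that is bound to be rejected.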

4.3 Event Detection

The Event Detection module performs the processing described by the UML Activity Diagram in Figure 4.2.

Figure 4.2: Activity Diagram of the Event Detection module.

9 https://dev.twitter.com/blog/api-v1-retirement-date-extended-to-june-11
10 https://dev.twitter.com/docs/auth/oauth


4.3.1 Data filtering

Using keywords to query the platform allows us to gather data potentially related to an event. However, not all of the messages gathered in this process talk about an ongoing event: some messages can be misleading for the event detection module and can be considered as noise. We identified two different sources of noise in the collected messages: messages in which the keyword is used with a meaning different from the one related to the searched event, and messages in which the keyword actually refers to the type of event we look for, but not to an event in progress. For this reason it is essential to provide data filtering mechanisms aimed at reducing the noise.

In the application developed, even if tweets were collected using targeted keywords, a “cleaning” phase was necessary in order to reduce noise caused by messages not related to earthquakes in progress. This cleaning involves 2 steps:

- a pre-filter phase;
- a phase of filtering by means of a classifier.

In the pre-filtering process we discard tweets having at least one of the following characteristics:

- retweet messages, because we need independent and distinct tweets;
- reply messages, which, like retweets, are not independent;
- tweets published by accounts listed in a blacklist: this blacklist includes 345 accounts of offices and/or official channels that periodically publish information about seismic events;
- tweets containing words listed in a blacklist: since the language detection mechanism offered by Twitter is best-effort and is not always able to filter out tweets in languages other than Italian, we used a list of words belonging to other languages. We filled this list using the most frequent words appearing in wrongly collected tweets in other languages.
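The four pre-filter rules above can be sketched as a single predicate. The sketch assumes that a tweet is available as a dictionary whose field names (`retweeted_status`, `in_reply_to_status_id`, `user.screen_name`, `text`) follow the JSON returned by the Twitter API v1.1; the function name and the contents of the two blacklists are hypothetical placeholders, not the lists used in the study.

```python
import re

# Hypothetical placeholder entries; the real blacklists are not reproduced here.
BLACKLISTED_ACCOUNTS = {"official_agency"}
FOREIGN_WORDS = {"the", "earthquake", "temblor"}

def passes_prefilter(tweet, accounts=BLACKLISTED_ACCOUNTS, foreign=FOREIGN_WORDS):
    """Return True if the tweet survives all four pre-filter rules."""
    if "retweeted_status" in tweet:                     # retweet
        return False
    if tweet.get("in_reply_to_status_id") is not None:  # reply
        return False
    if tweet["user"]["screen_name"].lower() in accounts:
        return False                                    # blacklisted account
    words = set(re.findall(r"\w+", tweet["text"].lower()))
    if words & foreign:                                 # blacklisted word
        return False
    return True

tweet_ok = {"text": "Che scossa!", "user": {"screen_name": "mario"},
            "in_reply_to_status_id": None}
retweet = dict(tweet_ok, retweeted_status={})
reply = dict(tweet_ok, in_reply_to_status_id=123)
official = dict(tweet_ok, user={"screen_name": "Official_Agency"})
foreign_text = dict(tweet_ok, text="earthquake in the city")

print([passes_prefilter(t) for t in (tweet_ok, retweet, reply, official, foreign_text)])
```

The rules are ordered from cheapest to most expensive, so most discarded tweets never reach the tokenization step.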

Another, more sophisticated, filtering phase is necessary to further reduce the amount of noise in the collected set of tweets. We accomplish this task using a classifier built with Weka, as described in Section 5.2.2. As shown in Figure 4.3, the usage of the classifier involves two phases: an offline training phase and a real-time predictive phase.


Figure 4.3: Classifier workflow.

During the offline training phase, the classifier is trained using two distinct sets of tweets: tweets related and tweets not related to a seismic event in progress. The tweets of the training set were manually classified using an ad-hoc interface. The training set contains 1412 tweets: in order to obtain a balanced set, 706 tweets are classified as useful and 706 as not useful.

Then a set of features was defined. Each feature is represented using a numerical expression. We used, in our work, the following set of features:

- URL count: number of URLs in the text of the tweet;
- mention count: number of mentions in the text of the tweet;
- word count: length of the tweet in terms of words used;
- character count: length of the tweet in terms of characters used;
- punctuation count: number of punctuation characters used;
- slang/offensive word count: number of words in the text of the tweet belonging to a dictionary of vulgar terms or expressions of fear.

We then associate with each tweet a set of numbers representing these features. The list containing the features and the class of each tweet (useful or not useful) is then processed using Weka in order to obtain a model that defines the class of a tweet based on its attributes. An example of a Weka file, in Attribute-Relation File Format (ARFF), is shown in Figure 4.4.
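As an illustration, the mapping from a tweet to its numerical features and to a row of an ARFF file could look like the sketch below. The attribute names, the relation name, the mini-lexicon and the way URLs, mentions and punctuation are counted are all assumptions of ours; the exact definitions used to build the training set may differ.

```python
import re
import string

# Hypothetical mini-lexicon standing in for the dictionary of vulgar terms
# and expressions of fear mentioned above.
FEAR_OR_SLANG = {"paura", "aiuto", "oddio"}

# Hypothetical attribute names, one per feature, in the order computed below.
ATTRIBUTES = ["url_count", "mention_count", "word_count",
              "char_count", "punctuation_count", "slang_count"]

def extract_features(text, lexicon=FEAR_OR_SLANG):
    """Map the text of a tweet to the six numerical features."""
    words = text.split()
    return [
        len(re.findall(r"https?://\S+", text)),                # URLs
        len(re.findall(r"@\w+", text)),                        # mentions
        len(words),                                            # words
        len(text),                                             # characters
        sum(ch in string.punctuation for ch in text),          # punctuation
        sum(w.strip(string.punctuation).lower() in lexicon for w in words),
    ]

def to_arff(labelled_texts, relation="tweets"):
    """Build the lines of an ARFF file from (text, class label) pairs."""
    lines = ["@relation " + relation]
    lines += ["@attribute %s numeric" % name for name in ATTRIBUTES]
    lines.append("@attribute class {useful,not_useful}")
    lines.append("@data")
    for text, label in labelled_texts:
        values = [str(v) for v in extract_features(text)]
        lines.append(",".join(values + [label]))
    return lines

arff_lines = to_arff([("Oddio che scossa!!! @amico", "useful")])
print("\n".join(arff_lines))
```

For the example message the data row is `0,1,4,26,4,1,useful`: no URLs, one mention, four words, 26 characters, four punctuation characters (the `@` is counted too in this naive version) and one lexicon word.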


Figure 4.4: Excerpt from the training-set .arff file.

We tested different training algorithms available in Weka: the best result, an accuracy of 90.085% (1272/1412 tweets), was obtained using the decision tree J48, the Java implementation of the C4.5 algorithm [29] (see Figure 4.5).

Figure 4.5: Training phase results.


The obtained model is used during the prediction phase to infer the class of tweets. The prediction is performed by invoking the classifier in real-time every time a pre-filtered tweet is received. Since Weka generally needs less than a second to predict the class of a tweet, the classifier-based filtering can be used in real-time applications.

4.3.2 Temporal analysis

This task is about the analysis of the collected messages that were not discarded in the filtering process, in order to assess the occurrence of an event. To do this, the component must be able to recognize an unexpected growth in the number of messages related to a category of events. A key point in this phase is the identification of the minimum number of messages needed to state that an event has occurred. This number represents a trade-off between the sensitivity of the system (a small number of messages is required) and the remaining noise after the filtering process.

To address the problem of detecting events we investigated numerous techniques, such as the realization of a temporal model based on Bayesian statistics [14], the use of Peak Detection algorithms, the Corrected Conditional Entropy (CCE) [26] and different algorithms belonging to the fields of Change Detection and Burst Detection.

Peak Detection algorithms are targeted at the identification of a peak in a graph, where a peak is similar to a point of relative maximum. Even if this approach guarantees an accurate detection of a peak, it is not the best solution in our scenario: the response time of peak detection algorithms clashes with our real-time requirements.

For similar reasons we discarded the Corrected Conditional Entropy (CCE) approach. This kind of algorithm allows the detection of a disturbance in a time series. In our scenario, when no events are in progress, split times (how much time passes between a message and the next) between the collected messages are regular. When an event occurs, users tend to publish a large number of messages about the event in a short time, and this regularity is perturbed. By measuring the split times between messages and calculating the CCE on this set of data it is possible to detect the disturbance, therefore leading to the detection of a new event. Unfortunately, the calculation of CCE is computationally expensive and, therefore, not suitable for real-time environments.

Another possible way of detecting disturbances on the channel is by monitoring the presence of “bursts” of tweets by means of a Burst Detection algorithm. This family of algorithms is based on the calculation of the frequencies of data in a time window. To locate a burst, the current frequency is compared with a reference frequency or with the frequency in the previous time window. These algorithms are computationally simple and thus lend themselves well to being used in real-time settings.
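The comparison with the previous time window can be illustrated with a few lines of code. This is a generic sketch of the idea rather than one of the algorithms from the cited literature; the function names, the 60-second window and the ratio of 3 are hypothetical values chosen only for the example.

```python
from collections import Counter

def window_counts(timestamps, window_seconds):
    """Count messages per fixed-size window (window index -> count)."""
    return Counter(int(t // window_seconds) for t in timestamps)

def bursty_windows(timestamps, window_seconds, ratio):
    """Return the windows whose count is at least `ratio` times
    the count of the previous window (treated as 1 when empty)."""
    counts = window_counts(timestamps, window_seconds)
    bursts = []
    for window in sorted(counts):
        previous = counts.get(window - 1, 0)
        if counts[window] >= ratio * max(previous, 1):
            bursts.append(window)
    return bursts

# Arrival times in seconds: one message per minute, then six in a minute.
arrivals = [5, 70, 130, 131, 132, 133, 134, 135, 190]
print(bursty_windows(arrivals, window_seconds=60, ratio=3))
```

In the example only the third window (index 2) is flagged, since it holds six messages against one in the window before it.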

Figure 4.6 shows the arrival times of filtered tweets during an earthquake that occurred at 00:09 a.m. on August 27, 2013 in Umbria and Marche. After T1, the occurrence time of the earthquake, a big burst of tweets was recorded by the system.

Figure 4.6: Tweet burst occurring in correspondence with the earthquake (27.09.2013 in Umbria and Marche).

In our application the detection of an event is realized through the recognition of a burst of tweets on the channel. A burst is defined as the occurrence of a large number of events within a time window [30]. According to the time at which calculations are performed, Burst Detection algorithms can be divided into two categories:

1. algorithms running at regular time intervals;
2. algorithms triggered by the receiving of new data (tweets in our scenario).

The algorithms in the second category, such as the one proposed in [23], can be easily implemented in our system because we already use a triggered mechanism, the Streaming API, for the collection of the tweets.

Another interesting feature of some Burst Detection algorithms is the possibility to use more than one window for the detection of a burst. In fact, a small window allows the early detection of large-scale events, but is not efficient for the detection of smaller events. On the other hand, a large window delays the identification of large-scale events. In addition to the beginning of a burst, it is also important to identify its end. Typically, algorithms in the literature “declare” the end of the burst when the current frequency is lower than a specified threshold. The estimated duration of the burst strictly depends on the chosen comparison value. Figure 4.7 shows two bursts, with their related durations, detected using the two algorithms described in [23] and [30]. The comparison value is, respectively, the previous temporal window and a reference value. In our scenario, the best solution is obtained using the comparison with a reference value.
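A detector triggered by the arrival of each tweet, which compares the number of tweets in a sliding window with a reference value and reports both the start and the end of a burst, can be sketched as follows. It is a simplified illustration and not the algorithm of [23] or [30]; the class name, the 60-second window and the reference count of 3 are hypothetical.

```python
from collections import deque

class BurstDetector:
    """Sliding-window burst detector updated on every new arrival."""

    def __init__(self, window_seconds, reference_count):
        self.window_seconds = window_seconds
        self.reference_count = reference_count  # hypothetical reference value
        self.recent = deque()                   # timestamps inside the window
        self.in_burst = False

    def on_tweet(self, timestamp):
        """Register an arrival; return 'start', 'end' or None."""
        self.recent.append(timestamp)
        # Drop the arrivals that have slid out of the window.
        while self.recent[0] <= timestamp - self.window_seconds:
            self.recent.popleft()
        above = len(self.recent) >= self.reference_count
        if above and not self.in_burst:
            self.in_burst = True
            return "start"
        if not above and self.in_burst:
            self.in_burst = False
            return "end"
        return None

detector = BurstDetector(window_seconds=60, reference_count=3)
arrivals = [0, 100, 200, 210, 220, 230, 400]
events = [detector.on_tweet(t) for t in arrivals]
print(events)
```

Because the computation is triggered by arrivals, the end of a burst is only noticed when the next tweet is received; a periodic check would be needed to close a burst on a silent channel.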

Figure 4.7: Burst duration according to the previous temporal window (left) and to a reference value (right).

4.3.3 Spatial analysis

An important piece of information that we need to associate with an event is the event location. Even if Twitter offers the possibility to automatically geocode messages, the georeferenced tweets collected in our study were only 1.5% of the total. Considering only the georeferenced messages would drastically reduce the probability of detecting medium/small-scale events.

To solve this problem, the researchers in [14] propose a solution based on the location associated with the accounts of Twitter users. If a tweet is not georeferenced, the position of the event is inferred from the “position” of the Twitter account. This solution is not reliable: the “location” field of the account is not mandatory and, even if filled, there is no control over it and users can type whatever they want. Out of the 331,000 Twitter accounts we studied, 59.9% have a filled location field, but only 24.6% of the total appear to be filled correctly.

We propose a solution based on the analysis of the content of messages published in relation to an event. To implement our solution we need to extract the text of the messages, recognize names of places and then georeference these names.


To accomplish this task we make use of a tool called TagMe¹¹, a text disambiguation service developed at the University of Pisa [27]: starting from a tweet, this service provides a list of tags with the description of the term and other related information. Moreover, for each disambiguated tag, DBpedia¹² is exploited to extract the coordinates.
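The three steps (extract the text, recognize place names, georeference them) can be illustrated with a toy stand-in for the external services. The sketch below does not call TagMe or DBpedia: it assumes a tiny hand-made gazetteer with approximate coordinates and picks the place mentioned in the largest number of messages, which is our own simplification.

```python
import re
from collections import Counter

# Toy gazetteer with approximate (latitude, longitude) pairs; a stand-in
# for the entity annotation and coordinate lookup described above.
GAZETTEER = {
    "perugia": (43.11, 12.39),
    "ancona": (43.62, 13.52),
}

def locate(texts, gazetteer=GAZETTEER):
    """Return (place, coordinates) for the most mentioned known place, or None."""
    mentions = Counter()
    for text in texts:
        # Count each place at most once per message.
        for token in set(re.findall(r"\w+", text.lower())):
            if token in gazetteer:
                mentions[token] += 1
    if not mentions:
        return None
    place, _ = mentions.most_common(1)[0]
    return place, gazetteer[place]

messages = [
    "Terremoto a Perugia",
    "scossa fortissima qui a Perugia",
    "sentita anche ad Ancona",
]
print(locate(messages))
```

A plain dictionary lookup cannot disambiguate place names that are also common words, which is precisely the problem a disambiguation service addresses.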

4.3.4 Event reliability

The system can report events that did not actually occur (for example as a consequence of a peak of unfiltered noisy tweets). For each detected event the application provides a value that indicates the reliability of the prediction. The reliability of a prediction is expressed as a percentage and depends on the size of the peak that generated the alert. Events identified on the basis of small peaks have lower reliability compared to events characterized by very large peaks or bursts. The mapping between the frequency of tweets in a time window and the percentage expressing the reliability of a forecast is accomplished by means of a 4-parameter logistic function (4PL).
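In its usual form, a 4-parameter logistic function is written as y = d + (a - d) / (1 + (x / c)^b), where a is the value at x = 0, d is the asymptote for large x, c is the inflection point and b controls the steepness. The sketch below applies it to the tweet frequency; the parameter values are hypothetical and are not the ones fitted for the system.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic function.

    a: value at x = 0, d: asymptote for large x,
    c: inflection point, b: steepness of the curve.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)

def reliability(tweets_in_window):
    """Map a tweet frequency to a reliability percentage (hypothetical parameters)."""
    return four_pl(tweets_in_window, a=0.0, b=3.0, c=20.0, d=100.0)

for frequency in (0, 10, 20, 40):
    print(frequency, round(reliability(frequency), 1))
```

With these illustrative parameters the reliability is 0% with no tweets, 50% at 20 tweets per window (the inflection point) and approaches 100% for very large bursts.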

4.4 Event Handling

The goal of this module is to “monitor” an ongoing event in order to (1) collect as much information as possible on it, and (2) stop the monitoring when the event is “terminated”.

The Event Handling module is therefore executed every time a new event is detected. It is responsible for updating the information available on the event as long as it is still in progress, as shown by the UML Activity Diagram in Figure 4.8.

11 http://tagme.di.unipi.it/
12 http://it.dbpedia.org/


Figure 4.8: Activity Diagram of the Event Handling module.

In order to monitor the official channels we implemented a crawler that collects tweets from the official INGV account (@INGVterremoti). The crawler collects all messages from or addressed to that account. We parse the text of the tweet and its associated metadata in order to obtain official information regarding earthquakes. Figure 4.9 shows an example of an INGV tweet reporting a seismic event.


Figure 4.9: Example of INGV tweet reporting an earthquake.

The information reported in the tweet is extracted and used in the validation phase. The validation is performed each time the crawler receives a tweet from INGV. For a detection to be considered valid, two conditions must hold:

1. the event has to be detected by the system AFTER the event is registered by INGV equipment;

2. the event has to be detected by the system BEFORE the publication of the related tweet by INGV.
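The two conditions amount to checking that the detection time falls strictly between the time of the earthquake recorded by INGV and the time of the corresponding INGV tweet. A minimal sketch, with a hypothetical function name and invented timestamps, is:

```python
from datetime import datetime

def is_valid_detection(origin_time, detection_time, ingv_tweet_time):
    """True if the detection follows the recorded event and precedes the INGV tweet."""
    return origin_time < detection_time < ingv_tweet_time

# Invented timestamps, used only to exercise the check.
origin = datetime(2013, 9, 1, 10, 0, 0)
ingv_tweet = datetime(2013, 9, 1, 10, 15, 0)

print(is_valid_detection(origin, datetime(2013, 9, 1, 10, 1, 0), ingv_tweet))
print(is_valid_detection(origin, datetime(2013, 9, 1, 10, 20, 0), ingv_tweet))
```

The first call returns True (detected one minute after the event), while the second returns False because the detection comes after the official tweet.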

4.5 Simulator

The application for the detection of events is real-time and is designed and built to perform its processing each time a new tweet is received. This fundamental characteristic makes it impossible to analyse previously stored data. Since in this study we collected more than 2 months of data and the application has been developed and tuned progressively over time, we considered it useful and interesting to build a system that allows the application to be tested on the whole collected dataset.

For this reason we built a simulator that reads data from the DB instead of the Twitter stream and provides all the tweets already collected as if they had just arrived from an open stream. The use of the simulator allows a comprehensive analysis of the entire dataset in less than 3 hours.
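The replay idea can be sketched with an in-memory SQLite database: stored tweets are read in chronological order and passed to the same callback that would handle live tweets. The table layout (`tweets` with `created_at` and `text` columns), the function name `replay` and the use of SQLite are assumptions made for the example; they do not describe the database actually used.

```python
import sqlite3

def replay(connection, on_tweet):
    """Feed stored tweets, oldest first, to the callback used for live tweets."""
    cursor = connection.execute(
        "SELECT created_at, text FROM tweets ORDER BY created_at"
    )
    count = 0
    for created_at, text in cursor:
        on_tweet({"created_at": created_at, "text": text})
        count += 1
    return count

# Hypothetical schema and rows, created in memory for the example.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tweets (created_at REAL, text TEXT)")
connection.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [(20.0, "scossa!"), (10.0, "terremoto?")],
)

received = []
replayed = replay(connection, received.append)
print(replayed, [tweet["text"] for tweet in received])
```

Since the detection logic should rely on the stored timestamps rather than on the wall clock, the replay can run as fast as the database can be read.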


4.6 Damage Assessment

“The Preliminary Damage Assessment (PDA) process is a mechanism used to determine the impact and magnitude of damage and resulting needs of individuals, businesses, public sector, and community as a whole.”¹³

As stated in the above extract from a report by FEMA (the U.S. Federal Emergency Management Agency), Damage Assessment (DA) is the process that makes it possible to quantify the impact and the consequences of an emergency on communities and infrastructures. Typically, the DA process takes place in the aftermath of an emergency and involves specialized personnel who assess the consequences by visiting the location of the event. We plan to use information shared by people involved in the event to perform DA: in particular we will analyze messages with generic keywords, not related to a specific type of event, but that can be related to the possible consequences of an emergency in terms of damage to infrastructure and/or people involved.

As a proof of concept of this approach, we studied what happened on Twitter during the tragic motorcycle crash that caused the death of the Italian rider Andrea Antonelli, which occurred on July 21, 2013 during the World Supersport Championship. Figure 4.10 shows the trends of the tweets containing the keyword “incidente” (crash) and of the tweets containing generic keywords (pericolo, morto, ferito, danno) published in the same time window as the crash. There is a clear correspondence between the trend of the keyword “incidente” and the trend of the keyword “morto” (dead).

13 www.fema.gov/media-library-data/06a812f5087951d7ed1f955d905f07c2/PDA_Report_FEMA-4135-DR-IA.pdf


Figure 4.10: Temporal evolution of tweets with keyword “incidente” and tweets containing generic keywords for the DA.

Analyzing the tweets collected from 12:30 to 15:30 we extracted the most frequent terms, listed below with their frequencies:

1. Antonelli (375);
2. Andrea (343);
3. Mosca (180);
4. morto (159);
5. pilota (155);
6. supersport (139);
7. gara (123);
8. grave (98);
9. superbike (95);
10. italiano (90).

Consistent with our hypothesis, the extracted terms were related to the crash.

In order to perform this task we semantically analyze the texts extracted from tweets: we use the library Morph-it!¹⁴ to extract N-grams from the tweet corpus. We refer to an N-gram as a sequence of N consecutive terms forming part of the text of a message¹⁵.

Morph-it! is a free tool for the morphological analysis of texts in the Italian language and consists of a dictionary of inflected forms with their related lemmas and morphological characteristics (Fig. 4.11) [28].

Figure 4.11: Information extracted from a text using Morph-it!

After processing the corpus with Morph-it! we extract the most frequent N-grams. Analysis of these N-grams may highlight interesting information, since some terms have a special or stronger meaning if used in combination with others.
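The extraction of the most frequent N-grams can be sketched with the standard library alone. The sketch skips the morphological analysis step (tokens are simply lower-cased and split on whitespace), and the three example messages are invented for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def most_frequent_ngrams(texts, n, top):
    """Count the N-grams over all texts and return the `top` most frequent."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts.most_common(top)

# Invented example messages.
messages = [
    "morto andrea antonelli",
    "andrea antonelli pilota italiano",
    "grave incidente in gara",
]
print(most_frequent_ngrams(messages, n=2, top=1))
```

Here the bigram ("andrea", "antonelli") is the only one appearing twice, which shows how pairs of terms can carry information that single-term frequencies do not.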

14 http://dev.sslmit.unibo.it/linguistics/morph-it.php
15 http://it.wikipedia.org/wiki/N-gramma


5. Results

This work addressed three aspects:

The definition of a general architecture of an Early Warning System for Emergency Management.

The design and development of an application, based on this architecture, aimed at the detection of seismic events in Italy. The application is a prototype.

The implementation of a simulator to assess the functioning of the prototype.

While our Data Acquisition module was running, 426 earthquakes were reported by INGV on Twitter. We chose to limit our analysis to earthquakes that occurred on the Italian territory with a magnitude greater than 2.0 on the Richter scale. After this filtering we have 403 reports used for the evaluation of the system performance.

INGV shares every event on Twitter with a time delay of 10-20 minutes. Our system was very reactive: we provided reports on average within 30 seconds to 1 minute and, in any case, always within the first few minutes of the occurrence of the event.

5.1 Collected Data

The Data Acquisition module was active for more than two months, from July 19, 2013 to September 27, 2013.

During this period we collected tweets containing keywords belonging to all categories of events monitored. The total number of tweets collected is around 1.5 million, for an average of more than 21,400 tweets per day and less than 15 tweets per minute (disk space > 600 MB).

For the category of seismic events we collected 64,878 tweets (average: 926 tweets per day). The filtering process eliminates around 88% of the collected tweets. These numbers help to better understand the phenomenon of noise that affects the channel and the importance of the data filtering phases.

In addition to the actual tweets coming from the stream of Twitter we also stored account information of users posting analyzed tweets (around 331,000 different Twitter users).

The entities extracted from the text of the tweets include photos and videos (media type), URLs, hashtags and mentions. We collected over 1.9 million entities (220 MB).

All collected data occupy more than 1 GB of disk space.

5.2 Analysis of the results

We cross-checked the information produced by the simulator about identified events with official reports from INGV. We identified 3 evaluation metrics of the system:

- True Positives (TP): events recognized by the system before the reporting of INGV and subsequently confirmed by the institution;
- False Positives (FP): events detected by the system, but not confirmed by INGV;
- False Negatives (FN): events not detected by the system but reported by INGV.

These metrics are expressed in absolute values.

To take into account the number of events we also considered the Precision and the Recall measures. Precision indicates the proportion of events correctly identified in relation to the total number of detected events. Recall indicates the proportion of events correctly identified in relation to the total number of occurred events. Moreover, we consider the F-Measure, which represents the harmonic mean of precision and recall.
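In terms of the absolute counts, these measures are Precision = TP / (TP + FP), Recall = TP / (TP + FN) and F-Measure = 2 · Precision · Recall / (Precision + Recall). The short sketch below computes them; the function name is ours, and the example counts (TP = 8, FP = 3, FN = 2) are those reported for events with magnitude greater than 3.5.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall and F-measure from absolute counts.

    Assumes tp + fp > 0 and tp + fn > 0 (at least one detected
    and one official event) and tp > 0 for the F-measure.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

precision, recall, f_measure = precision_recall_f(tp=8, fp=3, fn=2)
print(round(precision * 100, 2), round(recall * 100, 2), round(f_measure * 100, 2))
```

For these counts the three values are 72.73%, 80.00% and 76.19%.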

Figure 5.1 summarizes the results of our study.

Magnitude   Official events   TP   FP   FN    Precision (%)   Recall (%)   F-Measure (%)
> 2.0       403               16   41   387   28.07           3.97         6.96
> 2.5       101               15   41   86    26.79           14.85        19.11
> 3.0       25                12   18   13    40.00           48.00        43.64
> 3.5       10                8    3    2     72.73           80.00        76.19
> 4.0       6                 4    0    2     100.00          66.67        80.00
> 4.5       1                 1    0    0     100.00          100.00       100.00

Figure 5.1: Results.

Events with a magnitude lower than 3.0 on the Richter scale are difficult to detect.

False Positives represent about 10% of the total; it is reasonable to infer that these FP are caused by the noise on the channel.

Results show a high ability of the system in identifying events with a magnitude equal to or greater than 3.5 on the Richter scale. Despite the experimental nature of the system and the small size of the statistical sample, the results obtained are better than those achieved by Japanese researchers in the work described in [14]. We have to note, however, the failure of the system in the detection of two events with a magnitude greater than 4. These events occurred during the night in sparsely populated, mountainous areas where it is reasonable to assume that the use of Twitter is not common.


6. Conclusions and future work

This work demonstrates the feasibility of the proposed approach, defining and implementing a general architecture for an application aimed at Early Warning for Emergency Management in the field of seismic events in Italy. The results achieved by the prototype are overall better than those reported in similar works available in the literature.

The Data Acquisition module is one of the best results of this work.

During the realization some critical issues, requiring additional investigations, have emerged. Among these the task of data filtering and the task of event detection are particularly relevant.

The proposed architecture is very general and can be applied to other contexts (e.g. detection of fires, traffic problems).

Due to the growing interest in the use of social platforms this work can have both scientific and application results and paves the way for further studies and investigations.


7. References

[1] http://beautifuldata.net/2013/01/social-sensors/

[2] http://www.thehomelandsecurityblog.com/2011/01/24/social-media-as-a-sensor-%E2%80%93-leveraging-crowd-sourced-data-for-early-warning-and-response/

[3] http://www.bordermail.com.au/story/1224941/opinion-social-media-a-life-saver/

[4] Nuwan Waidyanatha, “Towards a typology of integrated functional early warning systems”, Int. J. of Critical Infrastructures, 2010

[5] http://it.wikipedia.org/wiki/Rete_sociale

[6] http://fcw.com/articles/2013/07/09/fema-social-media.aspx

[7] http://www.ilmessaggero.it/PRIMOPIANO/CRONACA/terremoto-nord-italia-allarme-twitter-facebook/notizie/294954.shtml

[8] http://www.alert4all.eu/

[9] Nilsson Susanna (et al.), “Making use of New Media for pan-European Crisis Communication”, Proceedings of the Ninth International Conference on Information Systems for Crisis Response and Management (ISCRAM), 2012

[10] Fredrik Johansson, Joel Brynielsson, and Maribel Narganes Quijano, “Estimating Citizen Alertness in Crises Using Social Media Monitoring and Analysis”, Proceedings of the 2012 European Intelligence and Security Informatics Conference (EISIC), 2012

[11] Nabil R. Adam, Basit Shafiq, Robin Staffin, “Spatial Computing and Social Media in the Context of Disaster Management”, IEEE Intelligent Systems, vol. 27, no. 6, pp. 90-96, Nov.-Dec. 2012

[12] N. Adam, J. Eledath, S. Mehrotra, and N. Venkatasubramanian, “Social media alert and response to threats to citizens (SMART-C)”, Proceedings of CollaborateCom, 181-189, 2012

[13] V. Lampos, N. Cristianini, “Tracking the flu pandemic by monitoring the Social Web”, 2nd International Workshop on Cognitive Information Processing, 2010

[14] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo, “Earthquake shakes Twitter users: real-time event detection by social sensors”, Proceedings of the 19th international conference on World wide web (WWW '10), 2010

[15] Gautam S. Thakur, Mukul Sharma, Ahmed Helmy, “SHIELD: Social sensing and Help In Emergency using mobiLe Devices”, GLOBECOM, 2010

[16] http://www.emergenza24.org/

[17] http://www.socialmediaemergencymanager.com

[18] http://www.citysourced.com/

[19] http://www.epart.it/

[20] http://www.decorourbano.org/

[21] http://www.uptu.com

[22] Alexander Boettcher and Dongman Lee, “EventRadar: A Real-Time Local Event Detection Scheme Using Twitter Stream”, Proceedings of the 2012 IEEE International Conference on Green Computing and Communications (GREENCOM '12), 2012

[23] Ryohei Ebina, Kenji Nakamura, and Shigeru Oyanagi, “A Real-Time Burst Detection Method”, Proceedings of the 2011 IEEE 23rd International Conference on Tools with Artificial Intelligence (ICTAI '11), 2011

[24] A. Rosi, M. Mamei, F. Zambonelli, S. Dobson, G. Stevenson, J. Ye, “Social sensors and pervasive services: Approaches and perspectives”, 2nd IEEE Workshop on Pervasive Collaboration and Social Networking, 2011

[25] https://www.globalwebindex.net/twitter-now-the-fastest-growing-social-platform-in-the-world/

[26] A. Porta, G. Baselli, D. Liberati, N. Montano, C. Cogliati, T. Gnecchi-Ruscone, A. Malliani, and S. Cerutti, “Measuring regularity by means of a corrected conditional entropy in sympathetic outflow”, Proceedings of Biological Cybernetics, 71-78, 1998

[27] Paolo Ferragina, Ugo Scaiella, “Fast and Accurate Annotation of Short Texts with Wikipedia Pages”, IEEE Software 29(1): 70-75, 2012

[28] Eros Zanchetta, Marco Baroni, “Morph-it! A free corpus-based morphological resource for the Italian language”, Proceedings of Corpus Linguistics, University of Birmingham, Birmingham, UK, 2005

[29] Quinlan, J. Ross, “C4.5: Programs for Machine Learning”, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993

[30] Zhang, Xin and Shasha, Dennis, “Better Burst Detection”, IEEE Computer Society, 2006

[31] J. Kleinberg, “Bursty and hierarchical structure in streams”, Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), 2002

[32] Y. Zhu and D. Shasha, “Efficient elastic burst detection in data streams”, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), 2003
