Consiglio Nazionale delle Ricerche - IIT - CNR IIT 08-2016... · 2016-05-19 · C Consiglio...

Consiglio Nazionale delle Ricerche. #selfie: mapping the phenomenon. S. Cresci, M. N. La Polla, M. Mazza, M. Tesconi, F. Del Vigna. IIT TR-08/2016, Technical Report, May 2016. Istituto di Informatica e Telematica (IIT).




Consiglio Nazionale delle Ricerche

#selfie: mapping the phenomenon

S. Cresci, M. N. La Polla, M. Mazza, M. Tesconi, F. Del Vigna

IIT TR-08/2016

Technical Report

May 2016

Iit

Istituto di Informatica e Telematica


#selfie: mapping the phenomenon

M. Mazza, S. Cresci, F. Del Vigna, M. N. La Polla, M. Tesconi

Abstract
The selfie era
ALOS project
Data gathering
Selfie statistics
Conclusion
Bibliography
Webography

ACM Classification: H.2.8 Database Applications - Data mining; H.3.1 Content Analysis and Indexing

Abstract

The introduction of smartphones equipped with a front camera and constantly connected to social networks has encouraged the spread of user-generated multimedia content. This has led to the emergence of new social phenomena with a strong impact on society, such as the selfie, a modern evolution of the self-portrait usually taken with a digital camera or a camera phone. Over the last three years the selfie trend has gained popularity, especially on Instagram. In 2013 the Oxford Dictionary added "selfie" to its lexicon, defining it as "a photograph that one has taken of oneself, typically one taken with a smartphone or webcam and shared via social media", and then named it word of the year. This work analyzes the phenomenon within Instagram, a mobile social network created specifically for image sharing. Given the strong social implications, we tried to understand the origin of this practice and the psychological and sociological reasons that gave birth to the trend. Then, through the web-based platform ALOS (A Lot Of Selfies), we collected information from Instagram, performed face recognition and calculated statistics about the trend. As a case study, over 2 million selfies shared on Instagram in January-February 2015 were analyzed. The analysis highlighted how the selfie phenomenon is perceived differently across cultures and societies. In particular, the results show that factors such as religion, sex, customs and geopolitical situations affect the space-time distribution of selfies around the world.


The selfie era

Social Network

The emergence and spread of new technologies affect our daily lives, creating new ways to communicate, relate, think and act. Social networking sites (SNSs) have become an integral channel for communication and self-expression in the lives of many; the term SNS refers to web-based services that allow individuals to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share a connection, and view and traverse their list of connections and those made by others within the system (boyd and Ellison, 2007). We use SNSs to satisfy certain needs, such as association and self-esteem, through individuals who share similar interests and with whom there are often bonds outside the social networks as well, like friends, family and peers (Krasnova et al., 2008). Thanks to social networks we can stay in touch with the outside world, keep up to date on what is happening, talk about ourselves and compare ourselves with others.

The use of social networks has grown exponentially over the last decade and involves an increasing number of individuals: Facebook, the most popular social network, counts 1.5 billion users around the world, with an average of nearly 1 billion daily active users (Facebook, Company Info); Twitter has 316 million monthly active users who produce 500 million tweets per day (Twitter, Press); Instagram, a mobile photo sharing application, has 400 million monthly active users, who upload about 85 million pictures every day, adding to the 40 billion pictures hosted on the platform since October 2010 (Instagram, Press Page).

Unlike most popular social networks, Instagram only allows users to share pictures and videos, to which they can apply filters and attach hashtags (terms preceded by the # character) that categorize the shared media by topic. At the same time it is a social network where you can follow the photographic work of other users, leave comments and show appreciation for specific pictures.

Filters modify the images recorded. Sometimes they are used to make only subtle changes to images; other times the image would simply not be possible without them (Wikipedia, Optical Filter). Instagram filters, unlike traditional filters, are not accessories but a set of settings and software changes applied to the digital image after the shot. The user can choose which of the available filters to apply to the picture, each of which brings different aesthetic alterations. What all filters have in common, however, is the tendency to give the image a more analog and vintage look (Crouch, 2012).

A hashtag is a word or short phrase preceded by the # (hash) character; it works as a label for the media to which it refers, in the case of Instagram a picture or a video. Hashtags make it possible to classify and search for pictures or videos related to a certain subject; they can be used to follow or track an event, but also to make what a user shares accessible to a wider audience by exploiting the popularity of certain words.


The selfie exposes the individual's image on the network, satisfying the need to talk about oneself and to compare oneself with others; it is a self-representation tool, but also a communication tool. This confirms the thesis of Marshall McLuhan, according to which the transmitted message lies in the very nature of the medium, namely that "the medium is the message" (McLuhan, 1967): the selfie has quickly become the most suitable way to convey the message that is "myself". This message is not formed by the self-portrait alone, but is enriched by metadata: additional information that may be provided by the user or generated automatically by the service used. For the purposes of this work, we developed ALOS (A Lot Of Selfies), a web-based platform that collects the selfies users share on Instagram and analyzes them in order to obtain information about the phenomenon and its relationship with today's society.

ALOS project

The aim of this project is to collect the information that Instagram users provide through their selfies, in order to build a database on which to perform analyses of the selfie phenomenon and its distributions. ALOS is a web platform that collects information from the pictures shared on Instagram with the hashtag #selfie and enriches it with data obtained from the analysis of the faces in the pictures and from the geographical locations associated with the media. The project is divided into two functionally distinct parts: data gathering and data analysis.

Data gathering

The gathering process is performed by a crawler that periodically downloads data using the Instagram APIs. The APIs provided by the service are RESTful; they require registration and authentication via the OAuth protocol through a private user key. OAuth is a protocol through which an application (or a web service) can securely manage authorized access to sensitive data, and it is compatible with all types of applications: desktop, web and mobile. The crawler (written in PHP5) queries the API hourly, requesting from the endpoint (a URL through which a user can access a specific online service) data about the most recent pictures shared on the platform with the hashtag #selfie. Instagram API usage is limited: it is not possible to exceed the rate limit imposed by the social network, which corresponds to 5000 calls per hour. We polled the endpoint about once per hour to retrieve media tagged "#selfie", exploiting pagination and maximizing the page size to reduce the number of calls. We observed that the number of requests stays below 700 calls per hour, with an average of 500 calls per hour. Collected data is stored temporarily in a MongoDB instance, a NoSQL database. This choice was driven by the need for fast data storage, with the advantage of using the same format returned by the Instagram API: JSON.

Face recognition

The data collected by the crawler is then processed and integrated with additional metadata obtained through a face recognition and analysis API and with geographical information.
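The paginated polling described under Data gathering can be sketched as follows. This is a minimal illustration in Python rather than the PHP5 crawler used in the project; `fetch_page`, the `data` and `next_cursor` fields and the fake pages are hypothetical stand-ins, not the actual Instagram API response format. Only the 5000 calls per hour budget comes from the text.

```python
from typing import Callable, Dict, Iterator, Optional

HOURLY_CALL_LIMIT = 5000  # rate limit stated in the report


def crawl_recent(fetch_page: Callable[[Optional[str]], Dict],
                 max_calls: int = HOURLY_CALL_LIMIT) -> Iterator[Dict]:
    """Yield media items page by page until the last page or the call budget is reached."""
    cursor: Optional[str] = None
    calls = 0
    while calls < max_calls:
        page = fetch_page(cursor)          # one API call per page
        calls += 1
        yield from page["data"]
        cursor = page.get("next_cursor")   # hypothetical pagination field
        if cursor is None:                 # no further pages
            break


# Hypothetical stand-in for the real endpoint: three pages linked by cursors.
FAKE_PAGES = {
    None: {"data": [{"id": 1}, {"id": 2}], "next_cursor": "a"},
    "a": {"data": [{"id": 3}], "next_cursor": "b"},
    "b": {"data": [{"id": 4}], "next_cursor": None},
}


def fake_fetch(cursor: Optional[str]) -> Dict:
    return FAKE_PAGES[cursor]


ids = [item["id"] for item in crawl_recent(fake_fetch)]
```

Larger pages mean fewer iterations of the loop for the same number of pictures, which is why maximizing the page size keeps the hourly call count well under the limit.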


We used Face++¹, a facial recognition service that provides detection and analysis of faces in images. Integrating this information with the crawled data enriches the media content and enables finer analyses. Given a face, the service provides information about the gender, age, race and smile of the subject, and recognizes whether the subject is wearing eyeglasses or sunglasses. The Face++ API requires authentication through a pair of keys, which restricts access to the endpoints to registered applications. The gender and race attributes are each accompanied by a confidence attribute, a numeric value ranging from 0 to 100 that measures how sure the algorithm is of its response. For the age attribute, instead, the endpoint provides a confidence interval, so smaller values mean greater accuracy.

Face++ API requests are not rate limited, but the analysis takes time: for each face found in a picture, the algorithm needs 10 to 20 seconds to return its results. To speed up execution, ALOS uses a scheduler written in PHP that runs and manages 10 analysis scripts at the same time. Each script analyzes a block of 1000 pictures and, once the results are produced and saved, reports the end of the task to the scheduler. The scheduler checks that the script terminated correctly and launches a new script on the next block of pictures. In case of failure or improper execution of a script, the scheduler runs a recovery script that resumes the analysis from the first picture of the block not yet analyzed. In this way, about 10 faces are analyzed every 15 seconds.

After face recognition, data is stored in a relational database, which makes queries simpler through SQL and structures the data into related tables. The data is then enriched with geographic information, using the coordinates provided by the Instagram API and the service offered by Geonames², a geographical database accessible via web service. The Geonames API is used to perform reverse geocoding on those coordinates. In this way we obtain the country where the selfie was shot and, through it, the time zone, which lets us determine the local time at which the selfie was captured. The Geonames service requires authentication through a username, which is the same for both endpoints we used: one returns the country the coordinates belong to, and the other returns the time zone, expressed as the difference in hours from Greenwich Mean Time.

About the data collected

During the data gathering phase we monitored the time distribution of media and found that they are not evenly distributed throughout the day. Compared with the daily average of about 17,000 selfies shared every hour, the volume is higher by about 2,000 units from 2:00 PM to 10:00 PM.
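The block scheduler described in the face recognition section can be sketched as follows. This is a rough Python illustration under stated assumptions, not the PHP scheduler used in ALOS: threads stand in for the separate analysis scripts, `analyze` is a hypothetical callable wrapping the face analysis request, and the retry cap `max_retries` is an added safeguard that the report does not mention. The block size of 1000 and the 10 concurrent workers come from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

BLOCK_SIZE = 1000   # pictures per script, as in the report
MAX_WORKERS = 10    # scripts running at the same time, as in the report


def split_blocks(items: Sequence, size: int = BLOCK_SIZE) -> List[Sequence]:
    """Cut the picture list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_block(block: Sequence, analyze: Callable, max_retries: int = 3) -> List:
    """Analyze a block; after a failure, resume from the first picture not yet analyzed."""
    results: List = []
    failures = 0
    while len(results) < len(block):
        picture = block[len(results)]      # first picture without a result
        try:
            results.append(analyze(picture))
        except Exception:
            failures += 1
            if failures > max_retries:     # assumed cap, to avoid looping forever
                raise
    return results


def run_all(pictures: Sequence, analyze: Callable) -> List:
    """Run up to MAX_WORKERS blocks concurrently and return results in input order."""
    blocks = split_blocks(pictures)
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        per_block = pool.map(lambda b: run_block(b, analyze), blocks)
        return [r for block_results in per_block for r in block_results]
```

With 10 workers and 10 to 20 seconds per face, roughly 10 faces complete every 15 seconds, which matches the throughput reported above.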

¹ http://www.faceplusplus.com
² http://www.geonames.org/


The crawler ran from 00:00 of 01/18/2015 to 00:00 of 02/15/2015 (4 weeks), collecting a total of 14,155,544 pictures. We selected only the selfies that contain geo data, since they were more significant for the analysis, thus reducing the dataset to 2,889,232 selfies, which correspond to about 20% of the total. During the face analysis phase, we discarded the pictures for which no information about faces was returned. This happened for several reasons: false positives generated by pictures tagged #selfie that were not selfies, algorithmic limits of Face++, low image quality, or pictures removed from Instagram in the time elapsed between the gathering phase and the analysis phase. At the end of the cleaning process the number of selfies decreased to 1,032,535, containing 2,043,273 faces, with a face/picture ratio close to 2:1 (1.98:1).
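The dataset reduction reported above can be checked with a few lines of arithmetic. The four counts are taken from the text; the variable names are only illustrative.

```python
# Counts reported for the four-week crawl.
total_pictures = 14_155_544   # all pictures tagged #selfie
geotagged = 2_889_232         # pictures carrying geo data
with_faces = 1_032_535        # pictures left after face analysis
faces = 2_043_273             # faces detected in those pictures

geotagged_share = geotagged / total_pictures   # fraction kept by the geo filter
retained_share = with_faces / geotagged        # fraction kept by the face filter
faces_per_picture = faces / with_faces         # face/picture ratio
```

The geo filter keeps about 20.4% of the crawl, the face filter keeps about 35.7% of the geotagged pictures, and the ratio of faces to pictures rounds to 1.98.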

Selfie statistics

The first data we examined is the number of selfies shared in each country (tab. 1). The United States is the nation where most selfies were shared during the four weeks; the gap from the second country in the ranking, Italy, is remarkable, at close to 100,000 selfies. One explanation for this significant gap between the United States and the other countries is its large population of over 310 million people, which makes it the third most populous country after China and India. In China, however, access to Instagram is blocked by the government (Ghezzi, 2015), while in India purchasing a smartphone is a luxury reserved for a small part of the population, since nearly 75% of it lives on less than $2 per day (The Times of India, One-third of world's poor in India: Survey).

Country           Number of selfies
United States     175,605
Italy             76,505
Turkey            67,090
Brazil            65,651
United Kingdom    53,414

Table 1. The five countries with most selfies

The per-country selfie volumes were then normalized and plotted on a choropleth map (fig. 1). The color shades of the map are proportional to the "selfie density", namely the ratio between the number of selfies in a country and its population. The map shows that countries with low "selfie density" are generally the most economically depressed areas of the planet, or countries with strong limits on personal freedom. China, for instance, despite being the country with the largest population, has a low "selfie density", ranging from 0.000013 to 0.000029 per inhabitant. We observe the same situation in North Korea, which is ruled by a government that restricts internet usage (Jacobs, 2013). We also observe low "selfie density" values in countries with a very low HDI³ (Human Development Reports, Data), like the countries located in the central part of Africa, in particular Niger, Republic of Congo, Central African Republic, Chad and Sierra Leone, which occupy the last five positions in the world ranking based on the HDI value. Afghanistan, involved in a war (Zurleni, 2014), also exhibits a small number of shared selfies compared to its number of inhabitants. By contrast, countries with a stable economic situation and free internet access have the highest "selfie density". Notably, Italy is the second country for number of SIM cards, with an average of 1.59 SIMs per person (De Ceglia, 2013), and is visited every month by thousands of tourists; these could be some of the reasons why it is the darkest country on the map, with a "selfie density" that ranges from 0.015 to 0.032.

Figure 1. Choropleth map of selfie density
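The "selfie density" behind the map is a simple ratio, sketched below. The function name is hypothetical. The selfie count for the United States comes from Table 1 and the population is rounded to the "over 310 million" quoted in the text, so the result is only indicative; the report does not specify the exact normalization or the bins used for the map, so this value is not directly comparable with the ranges cited for China and Italy.

```python
def selfie_density(selfies: int, population: int) -> float:
    """Selfies shared in a country divided by its number of inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return selfies / population


# United States: count from Table 1, population approximated to 310 million.
us_density = selfie_density(175_605, 310_000_000)
```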

According to the Face++ API, 61.84% of the subjects in the selfies were classified as female and the remaining 38.16% as male (tab. 2). The selfie is evidently practiced more by women, confirming the research of Giuseppe Riva, which found disparities between the number of selfies taken by women and by men and showed that women are more affected by an inner motivation to share a selfie (Riva, 2014). Analyzing the selfies in each country, we observed that in Islamic countries the distribution between sexes is more balanced. This is presumably due to customs in those countries that require women to wear a veil covering their faces, which makes selfies meaningless for any social purpose; in some countries, like Iraq and Syria, the male percentage exceeds 80%.

³ Human Development Index

Country                 Male      Female
Global                  38.16%    61.84%
United States           34.06%    65.94%
Italy                   38.65%    61.35%
Turkey                  46.44%    53.56%
United Arab Emirates    52.44%    47.56%
Syria                   80.75%    19.25%
Iraq                    85.13%    14.87%

Table 2. Sex distribution

The Face++ API found that 60.14% of selfies have only one subject, 23.12% have a couple of subjects and 16.74% have a group of three or more subjects (tab. 3).

Country           Alone     Couple    Group
Global            60.14%    23.12%    16.74%
United States     71.45%    19.66%    8.89%
Italy             57.58%    26.38%    16.04%
United Kingdom    66.01%    23.50%    10.49%
Indonesia         36.61%    20.47%    42.92%

Table 3. Number of subjects distribution
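The three categories of Table 3 follow directly from the number of faces detected in each picture. A minimal sketch, assuming each picture has at least one detected face (pictures without faces were discarded during cleaning); the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def group_label(face_count: int) -> str:
    """Map the number of detected faces to the categories used in Table 3."""
    if face_count == 1:
        return "alone"
    if face_count == 2:
        return "couple"
    return "group"   # three or more subjects


def label_shares(face_counts: Iterable[int]) -> Dict[str, float]:
    """Percentage of pictures falling in each category."""
    counts = Counter(group_label(n) for n in face_counts)
    total = sum(counts.values())
    return {label: 100 * counts[label] / total
            for label in ("alone", "couple", "group")}


shares = label_shares([1, 1, 1, 2, 3, 5, 1, 2, 1, 1])
```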

The Face++ API classified 64.09% of faces as white, 28.38% as Asian and 7.53% as black (tab. 4). Recognition performance may be affected by light conditions and by accessories worn by the subjects that cover the face. In any case, in Central African countries the share of faces classified as black is higher than elsewhere, and the same holds for faces classified as Asian in East Asian countries.

Country          White     Black     Asian
Global           64.09%    7.53%     28.38%
United States    65.01%    12.18%    22.81%
Italy            81.51%    4.83%     13.59%
Nigeria          15.56%    72.23%    12.21%
South Korea      12.52%    1.60%     85.89%

Table 4. Race distribution

The phenomenon was then analyzed through its distribution among different age ranges (fig. 2). The results show that the selfie practice is especially widespread among young people, defined as those aged 18 to 24 (typically college age). The practice is also widespread among young adults, aged 25 to 29, and it is more common among adults aged 30 to 34 than among teenagers aged 13 to 17. The selfie practice noticeably decreases as the age range increases, perhaps because of the difficulties that older people encounter with new technologies such as smartphones and social networks.

Figure 2. Age range distribution

The distribution over the days of the week (fig. 3) shows that the number of selfies decreases during the week (starting from Monday) until the weekend (Friday, Saturday, Sunday), when it begins to rise instead, reaching its highest peak on Sunday, the day of rest in Christianity, which is the most widespread religion (Adherents, Major Religions of the World Ranked By Number of Adherents).


Figure 3. Weekly distribution
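A weekly distribution like the one in Figure 3 can be computed by counting selfies per weekday. The sketch below assumes each selfie has already been assigned a local calendar date (after the time zone correction described in the data gathering section); the function name and the sample dates are illustrative only.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_shares(local_dates: Iterable[date]) -> Dict[str, float]:
    """Percentage of selfies shared on each day of the week."""
    counts = Counter(d.weekday() for d in local_dates)   # Monday is 0
    total = sum(counts.values())
    return {WEEKDAYS[i]: 100 * counts[i] / total for i in range(7)}


# Illustrative sample: two Sundays, one Monday, one Friday in January 2015.
sample = [date(2015, 1, 18), date(2015, 1, 18),
          date(2015, 1, 19), date(2015, 1, 23)]
week = weekday_shares(sample)
```

Running the same function on the selfies of a single country gives the per-country views, such as those for the United Arab Emirates and Israel.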

In this regard, it is significant to look at the countries where the weekly day of rest is not Sunday. For example, in some Islamic countries like Iran, Egypt and the United Arab Emirates, where Friday is a non-working day, the distribution of selfies across the week is more homogeneous. In the case of the United Arab Emirates (fig. 4), where 76% of the population is Muslim (CIA, The world factbook), a higher peak is recorded on Friday, the Islamic prayer day.

Figure 4. Weekly distribution in United Arab Emirates

A relationship between religion and distribution also appears in Israel. There, 75% of the population is Jewish, and according to this religion the day of rest is the Sabbath (Saturday), the same day on which the highest peak of the distribution is recorded (fig. 5). These data highlight a correlation between the daily distributions and the days of rest of the most practiced religion in each country.

Figure 5. Weekly distribution in Israel

The global hourly distribution (fig. 6) shows that selfies are concentrated between 8:00 PM and 11:00 PM, which coincides with dinner and after-dinner time in most countries of the world.

Figure 6. Hourly distribution
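An hourly distribution in local time requires shifting each timestamp by the GMT offset obtained from reverse geocoding. A minimal sketch, assuming the creation time is available as a Unix timestamp in UTC (an assumption about the data format) and the offset is expressed in hours, as described for the Geonames time zone endpoint; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def local_hour(created_utc: int, gmt_offset_hours: float) -> int:
    """Hour of the day (0-23) at the place where the selfie was shot."""
    utc_time = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return (utc_time + timedelta(hours=gmt_offset_hours)).hour


# Example instant: 2015-01-18 20:30 UTC.
ts = int(datetime(2015, 1, 18, 20, 30, tzinfo=timezone.utc).timestamp())
```

For this instant, an offset of +1 gives hour 21, an offset of -5 gives hour 15, and an offset of +5.5 rolls over to hour 2 of the following day.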


We used a function of the Face++ API to measure the degree of smiling of the subjects in selfies and compiled a ranking of the countries with the highest average smile, restricted to countries that shared at least 500 selfies (tab. 5).

Country               Smiling average
El Salvador           57
Brazil                56
Panama                56
Venezuela             56
Dominican Republic    54

Table 5. Top 5 countries with the highest smile average
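A ranking like the one in Table 5 can be obtained by averaging the smile score per country and dropping countries below the minimum volume. The sketch below assumes one (country, smile score) record per selfie, which is a simplification; the function name and the sample records are illustrative, and only the threshold of 500 comes from the text.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

MIN_SELFIES = 500   # minimum volume per country, as in the report


def smile_ranking(records: Iterable[Tuple[str, float]],
                  min_selfies: int = MIN_SELFIES,
                  top: int = 5) -> List[Tuple[str, float]]:
    """Countries sorted by average smile score, highest first."""
    by_country = defaultdict(list)
    for country, smile in records:
        by_country[country].append(smile)
    averages = {c: sum(v) / len(v)
                for c, v in by_country.items()
                if len(v) >= min_selfies}   # drop low-volume countries
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Tiny illustrative sample with a lowered threshold of 2 records.
sample = [("A", 60), ("A", 50), ("B", 90), ("C", 40), ("C", 20)]
ranking = smile_ranking(sample, min_selfies=2)
```

In the sample, country "B" has the highest score but only one record, so the threshold excludes it; this is the effect the 500-selfie cutoff has on countries with very few pictures.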

It is worth noting that the five countries with the highest average smile are all located in Latin America, and that 3 of them appear in the top 10 of the Happy Planet Index list (fig. 7). This list measures the happiness of countries on a scale from 1 to 100 through three factors: life expectancy, wellbeing and ecological footprint.

Figure 7. Top 10 happiest countries according to the Happy Planet Index. Source: Movehub, Happiness Map

The platform is available at the following URL: http://wafi.iit.cnr.it/mappaselfie.


Conclusion

The development of ALOS highlighted how tightly different societies and the selfie phenomenon are linked. The analyses we carried out revealed connections between cultural traits and the way we share selfies. This work also opens future perspectives on the way facial features affect post engagement in social media, in connection with the work of Nicola Bruno on self-portraits and the tendency of subjects to show the right side of the face (Bruno and Bertamini, 2012). Although it is not the aim of the selfie, the environment could affect the frequency of the selfie practice: some locations might be preferred to others, and the geographical data makes it possible to understand whether such preferences exist. The selfie truly looks like a living mirror of the digital era we live in, where everyone can capture their own image, which then joins those of others and gives rise to a reflection of our society.


Bibliography

Boyd, Danah & Nicole Ellison. 2007. Social network sites: Definition, history, and scholarship. "Journal of Computer-Mediated Communication", vol. 13(1), article 11, p. 2.
Bruno, Nicola & Marco Bertamini. 2012. Self-Portraits: Smartphones Reveal a Side Bias in Non-Artists. "PLoS ONE", vol. 8(2).
Crouch, Ian. 2012. Instagram's Instant Nostalgia. "The New Yorker". http://www.newyorker.com/culture/culture-desk/instagrams-instantnostalgia (accessed 12/09/2015).
De Ceglia, Vito. 2013. India e Italia, più Sim che bancomat. "La Repubblica". http://www.repubblica.it/economia/rapporti/cloudeconomy/ricerca/2013/09/03/news/india_e_italia_pi_sim_che_bancomat-65814055/ (accessed 22/10/2015).
Fan, Haoqiang, Cao Z., Jiang Y., Yin Q. & Doudou C. 2014. Learning Deep Face Representation. "CoRR", abs/1403.2802.
Fan, Haoqiang, Yang M., Cao Z., Jiang Y. & Yin Q. 2014. Learning Compact Face Representation: Packing a Face into an int32. "MM".
Ghezzi, Cecilia Attanasio. 2015. Cina, quando la censura diventa protezionismo. "Il fatto quotidiano". http://www.ilfattoquotidiano.it/2015/01/30/cina-quando-la-censura-diventa-protezionismo/1382797/ (accessed 25/09/2015).
Jacobs, Andrew. 2013. Visit by Google Chairman May Benefit North Korea. "The New York Times". http://www.nytimes.com/2013/01/11/world/asia/eric-schmidt-bill-richardson-north-korea.html (accessed 24/10/2015).
Krasnova, Hannah, Hildebrand H., Günther O., Kovrigin A. & Nowobilska A. 2008. Why Participate in an Online Social Network: An Empirical Analysis. "Proc. 16th European Conf. on Information Systems" (ECIS 2008).
McLuhan, Marshall. 1964. Gli strumenti del comunicare. Milano, Il Saggiatore.
Riva, Giuseppe. 2014. The human brain and social media: how are individuals changing? "Fondazione IBSA".
Zhou, Erjin, Fan H., Cao Z., Jiang Y. & Qi Y. 2013. Extensive Facial Landmark Localization with Coarse-to-fine Convolutional Neural Network. "IEEE International Conference on Computer Vision Workshops (ICCVW)", pp. 386-391.
Zhou, Erjin, Cao Z. & Qi Y. 2015. Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not? "CoRR", abs/1501.04690.
Zurleni, Michele. 2014. Afghanistan: finita la guerra, la guerra continua. "Panorama". http://www.panorama.it/news/esteri/obamamania/afghanistanfinita-guerra-guerra-continua/ (accessed 27/09/2015).

Webography

Adherents. Major Religions of the World Ranked by Number of Adherents. http://www.adherents.com/Religions_By_Adherents.html (accessed 24/09/2015).
CIA. The world factbook. https://www.cia.gov/library/publications/theworld-factbook/geos/ae.html (accessed 12/10/2015).
Facebook. Company Info. http://newsroom.fb.com/company-info/ (accessed 12/09/2015).
Human Development Report. Data. http://hdr.undp.org/en/data (accessed 26/10/2015).
Instagram. Press. https://instagram.com/press/ (accessed 12/09/2015).
The Times of India. 2008. One-third of world's poor in India: Survey. http://timesofindia.indiatimes.com/india/One-third-of-worldspoor-in-India-Survey/articleshow/3409374.cms (accessed 14/10/2015).
Twitter. About. https://about.twitter.com/company/press (accessed 12/09/2015).
Wikipedia. Optical Filter. https://en.wikipedia.org/wiki/Optical_filter (accessed 18/09/2015).