Papers & Posters
1. Sustainability Implications of Open Government Data: A Cross-Regional Study
Alison Koczanski, MODUL University Vienna, Vienna, Austria
Marta Sabou, Vienna University of Technology, Vienna, Austria
Complementing studies on the economic impact of Open Government Data (OGD), we investigate how this novel Web-enabled movement supports sustainability. An analysis of OGD-based applications reveals that: (1) OGD supports all three pillars of sustainability; (2) citizens and app developers alike are receptive to and motivated by the sustainability implications of OGD; and (3) few regional differences exist between Vienna and New York City. We derive recommendations for further improving the sustainability impact of OGD.
2. Unveiling the Political Agenda of the European Parliament Plenary: A Topical Analysis
Derek Greene, School of Computer Science & Informatics, University College Dublin, Ireland
James P. Cross, School of Politics & International Relations, University College Dublin, Ireland
This study analyzes political interactions in the European Parliament (EP) by considering how the political agenda of the plenary sessions has evolved over time and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making Parliamentary speeches. It does so by considering the context in which speeches are made, and the content of those speeches. To detect latent themes in legislative speeches over time, speech content is analyzed using a new dynamic topic modeling method, based on two layers of matrix factorization. This method is applied to a new corpus of all English language legislative speeches in the EP plenary from the period 1999-2014. Our findings suggest that the political agenda of the EP has evolved significantly over time, is impacted upon by the committee structure of the Parliament, and reacts to exogenous events such as EU Treaty referenda and the emergence of the Euro-crisis, which have a significant impact on what is being discussed in Parliament.
Paul Laufer, Graz University of Technology, Graz, Austria
Claudia Wagner, GESIS & U. of Koblenz, Cologne, Germany
Fabian Flöck, GESIS, Cologne, Germany
Markus Strohmaier, GESIS & U. of Koblenz, Cologne, Germany
For many people, Wikipedia represents one of the primary sources of knowledge about foreign cultures. Yet, different Wikipedia language editions offer different descriptions of cultural practices. Unveiling diverging representations of cultures provides an important insight, since they may foster the formation of cross-cultural stereotypes, misunderstandings and potentially even conflict. In this work, we explore to what extent the descriptions of cultural practices in various European language editions of Wikipedia differ, using culinary practices as an example, and propose an approach to mine cultural relations between different language communities through their description of and interest in their own and other communities' food culture. We assess the validity of the extracted relations using 1) various external reference data sources (i.e., the European Social Survey, migration statistics), 2) crowdsourcing methods and 3) simulations.
4. Information and Communication Technologies (ICTs) and Peacebuilding: a Conceptual Framework
Jennifer R Welch, Web Science Doctoral Training Centre, University of Southampton, UK
Susan Halford, Department of Sociology, Social Policy and Criminology, University of Southampton, UK
Mark Weal, Electronics and Computer Science, University of Southampton, UK
The emerging practice of using information and communication technologies (ICTs), including the web, SMS, Geographic Information Systems and others, in peacebuilding projects has over the past few years generated growing interest from donors, practitioners and more recently academia. This is in large part due to three trends: the observed role of new media in conflict situations; the attention given to digital data for conflict analysis, humanitarian and development work; and the emerging use of new forms of ICTs in peacebuilding activities. This interest, however, leaves implicit the range of constructive contributions ICTs can make to peacebuilding and conflict transformation processes and ways to conceptualise this emerging field. Moreover, work undertaken in this area often falls within disciplinary silos that are not conducive to gaining a holistic perspective of the wider implications of using ICTs in peacebuilding contexts. Using an interdisciplinary approach, this paper proposes a framework for understanding some of the constructive contributions ICTs have to make to peacebuilding and conflict transformation processes. Grounded in current debates around the ‘liberal peace’, this framework allows us to conceptualise ICTs as sociotechnical phenomena, moving beyond ideas of ‘solving problems’ through technology or a focus on external interventions. Instead, the analytical emphasis shifts to the co-evolutive nature of local and other uses of technology, in situations where complex power dynamics are at play, and as such allows us to better understand the technologies' emergent properties, providing a more comprehensive account of their wider societal impacts.
5. Avoiding Chinese Whispers: Controlling End-to-End Join Quality in Linked Open Data Store
Jan-Christoph Kalo, Technische Universität Braunschweig, Germany
Silviu Homoceanu, Technische Universität Braunschweig, Germany
Jewgeni Rose, Technische Universität Braunschweig, Germany
Wolf-Tilo Balke, Technische Universität Braunschweig, Germany
Today, Linked Open Data is a central trend in information provisioning. Data is collected in distributed data stores, individually curated with high quality, and made available over the Web for a wide variety of Web applications providing their own business logic for data utilization. Thus, the key promise of Linked Open Data is to provide a holistic view for a wide range of data items or entities. But parallel to the problems of database integration or schema matching, linking data over several sources remains a challenge and is currently severely hampering the vision of a working Semantic Web. One possible solution is the use of instance matching systems that automatically create owl:sameAs links between data stores. According to existing benchmarks, the matching quality has even reached a satisfying level. However, our extensive analysis shows that instance matching systems are not yet ready for large-scale data interlinking. This is because query processors joining even via a single incorrectly created link implicitly also use all transitive owl:sameAs links, which may in turn be mismatched. The result is similar to the game Chinese Whispers: watered-down sameAs semantics lead, step by step, to a terrible end-to-end quality of joins. We develop innovative structural mechanisms on top of instance matching systems to significantly improve query processing, avoiding Chinese Whispers.
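The transitivity problem described in this abstract can be illustrated with a small sketch. It is not the authors' mechanism; it only assumes that owl:sameAs links are treated as an equivalence relation, and shows how one mismatched link merges two otherwise separate clusters of entities. The function name and all URIs are hypothetical.

```python
def same_as_clusters(links):
    """Group URIs into the equivalence classes implied by owl:sameAs links."""
    parent = {}

    def find(uri):
        # Register unseen URIs as their own root, then walk up to the root.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return sorted(clusters.values(), key=sorted)


# Hypothetical links: two correct matches, then one incorrect match.
correct_links = [
    ("store_a:Paris", "store_b:Paris_France"),
    ("store_c:Paris_Texas", "store_d:Paris_TX"),
]
one_bad_link = correct_links + [("store_b:Paris_France", "store_c:Paris_Texas")]

clusters_before = same_as_clusters(correct_links)
clusters_after = same_as_clusters(one_bad_link)
```

With only the correct links there are two clusters of two URIs each. After the single incorrect link is added, all four URIs fall into one cluster, so a join that follows sameAs transitively would treat both cities as the same entity.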
6. Big Data? Big Issues: Degradation in Longitudinal Data and Implications for Social Sciences
Matthew S. Weber, Rutgers University, USA
Hai Nguyen, Rutgers University, USA
This article analyzes the issue of degradation of data accuracy in large-scale longitudinal data sets. Recent research points to a number of issues with large-scale data, including problems of reliability, accuracy and quality over time. Simultaneously, large-scale data is increasingly being utilized in the social sciences. As scholars work to produce theoretically grounded research utilizing “small-scale” methods, it is important for researchers to better understand the critical issues associated with the analysis of large-scale data. In order to illustrate the issues associated with this type of research, a case study analysis of archival Internet data is presented, focusing on the degradation of data accuracy over time. Suggestions for future studies are given.
7. A Linked Data Scalability Challenge: Concept Reuse Leads to Semantic Decay
Paolo Pareti, University of Edinburgh, UK
Ewan Klein, University of Edinburgh, UK
Adam Barker, University of St Andrews, UK
The increasing amount of available Linked Data resources is laying the foundations for more advanced Semantic Web applications. One of their main limitations, however, remains the generally low level of data quality. In this paper we focus on a measure of quality which is negatively affected by the increase of the available resources. We propose a measure of semantic richness of Linked Data concepts and we demonstrate our hypothesis that the more a concept is reused, the less semantically rich it becomes. This is a significant scalability issue, as one of the core aspects of Linked Data is the propagation of semantic information on the Web by reusing common terms. We prove our hypothesis with respect to our measure of semantic richness and we validate our model empirically. Finally, we suggest possible future directions to address this scalability problem.
8. Ranking Buildings and Mining the Web for Popular Architectural Patterns
Ujwal Gadiraju, L3S Research Center, Leibniz Universität Hannover, Germany
Stefan Dietze, L3S Research Center, Leibniz Universität Hannover, Germany
Ernesto Diaz-Aviles, IBM Research Dublin Research Lab, Ireland
Knowledge about the reception of architectural structures is crucial for architects and urban planners. Yet obtaining such information has been a challenging and costly activity. However, with the advent of the Web, a vast amount of structured and unstructured data describing architectural structures has become available publicly. This includes information about the perception and use of buildings (for instance, through social media), and structured information about the building's features and characteristics (for instance, through public Linked Data). Hence, (i) first mining the popularity of buildings from the social Web and (ii) then correlating such rankings with certain features of buildings can provide an efficient method to identify successful architectural patterns. In this paper we propose an approach to rank buildings through the automated mining of Flickr metadata. By further correlating such rankings with building properties described in Linked Data we are able to identify popular patterns for particular building types (airports, bridges, churches, halls, and skyscrapers). Our approach combines crowdsourcing with Web mining techniques to establish influential factors, as well as ground truth to evaluate our rankings. Our extensive experimental results show that methods tailored to specific structure types allow an accurate measurement of their public perception.
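As a rough illustration of what correlating a popularity ranking with a building property could look like, the sketch below computes Spearman's rank correlation. The abstract does not say which correlation measure the authors use, so the choice of Spearman is an assumption, and the popularity scores and heights are made-up values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rank correlation for two equally long, tie-free sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Hypothetical data for five skyscrapers: a popularity score mined from
# photo metadata, and the building height in metres.
popularity = [120, 45, 300, 80, 10]
height_m = [310, 180, 420, 250, 150]

rho = spearman(popularity, height_m)
```

A value of `rho` near 1 would suggest that taller buildings in this toy sample also rank as more popular; a value near 0 would suggest no monotonic relationship.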
9. An Ethnomethodologically-Informed Approach to Interface Design to Support Collective Web Practice Around Video
Anna Zawilska, Department of Computer Science, University of Oxford, UK
Steven Albury, Department of Education, University of Oxford, UK
As video on the Web becomes a more interactive medium, as opposed to broadcast only, there is an opportunity to incorporate the interactive features of video annotation into Web video interfaces. Existing studies into collaborative video annotation provide a rather interactionally decontextualized view of collaboration: there exists only minimal understanding of the situated practice of collaborative video annotation, as it may be applied to the design of Web interfaces. At the same time, studies of situated practice in other research areas such as Computer-Supported Collaborative Work have provided substantive improvements in Web interface design to support collaboration. Therefore, we propose there is an opportunity to use an understanding of the situated practice of collaborative video annotation to design a Web video annotation interface. A method that is commonly used for these studies is ethnomethodology, which examines in detail the observable-reportable characteristics of practice of social activity as accomplished by the participants in the activity. We discuss three important issues that need to be addressed so that an ethnomethodologically-informed approach can be applied to the development of a Web video annotation interface: establishing a site for data elicitation, generalization, and the paradox of technomethodology. Having addressed each issue in turn, we then use a fragment of data to illustrate an ethnomethodologically-informed approach to surfacing insights into collaboration, as well as implications for Web video annotation interface design, which would be difficult if not impossible to surface with other approaches not informed by ethnomethodology.
10. Self Curation, Social Partitioning, Escaping from Prejudice and Harassment: the Many Dimensions of Lying Online
Max Van Kleek, Web and Internet Science, University of Southampton, UK
Dave Murray-Rust, School of Informatics, University of Edinburgh, UK
Amy Guy, School of Informatics, University of Edinburgh, UK
Daniel A. Smith, Web and Internet Science, University of Southampton, UK
Kieron O'Hara, Web and Internet Science, University of Southampton, UK
Nigel R. Shadbolt, Web and Internet Science, University of Southampton, UK
Portraying matters as other than they truly are is an important part of everyday human communication. In this paper, we use a survey to examine ways in which people fabricate, omit or alter the truth online. Many reasons are found, including creative expression, hiding sensitive information, role-playing, and avoiding harassment or discrimination. The results suggest lying is often used for benign purposes, and we conclude that its use may be essential to maintaining a humane online society.
11. Anonymity and Online Commenting: The Broken Windows Effect and the End of Drive-by Commenting
Rolf Fredheim, Centre for Research in the Arts, Social Sciences and Humanities, University of Cambridge, UK
Alfred Moore, Centre for Research in the Arts, Social Sciences and Humanities, University of Cambridge, UK
John Naughton, Centre for Research in the Arts, Social Sciences and Humanities, University of Cambridge, UK
In this study we ask how regulations about commenter identity affect the quantity and quality of discussion on commenting fora. In December 2013, the Huffington Post changed the rules for its comment forums to require participants to authenticate their accounts through Facebook. This enabled a large-scale ‘before and after’ analysis. We collected over 42m comments on 55,000 HuffPo articles published in the period January 2013 to June 2014 and analysed them to determine how changes in identity disclosure impacted on discussions in the publication's comment pages. We first report our main results on the quantity of online commenting, where we find both a reduction and a shift in its distribution from politicised to blander topics. We then discuss the quality of discussion. Here we focus on the subset of 18.9m commenters who were active both before and after the change, in order to disentangle the effects of the worst offenders withdrawing and the remaining commenters modifying their tone. We find a ‘broken windows’ effect, whereby comment quality improves even when we exclude interaction with trolls and spammers.
12. Time to Introduce Myself! Impact of Self-disclosure Timing of Newcomers in Online Discussion Forums
Di Lu, University of Pittsburgh, USA
Rosta Farzan, University of Pittsburgh, USA
Newcomers face various difficulties entering any community, and online forums are no exception. Due to their lack of familiarity with and commitment to the group, newcomers are particularly sensitive to their early experiences in the forums. As a support mechanism to help newcomers blend into the group, online forums often encourage newcomers to introduce themselves upon joining the group. In this work we explored how the timing of these introductions influences newcomers' incorporation into the group. We found that providing an introduction after some initial activity in the forum is associated with positive outcomes in terms of newcomers' contribution and commitment.
13. Observing Social Machines Part 2: How to Observe?
David De Roure, Oxford e-Research Centre, University of Oxford, UK
Clare Hooper, IT Innovation Centre, University of Southampton, UK
Kevin Page, Oxford e-Research Centre, University of Oxford, UK
Ségolène Tarte, Oxford e-Research Centre, University of Oxford, UK
Pip Willcox, Bodleian Libraries, University of Oxford, Oxford, UK
Social machines are increasingly attracting study. In our paper “Observing Social Machines Part 1: what to observe?” we scoped the task of observing them. Several exercises that have followed have further informed our thinking and methodologies. Here, in Part 2, we reflect on how to observe. We promote a variety of methodologies that transcend the study of individual social machines, recognizing social machines as co-constituted processes within the evolving Web, and the intersection of social machines with the physical world through the Internet of Things. Our approaches emphasize the importance of sociality and human-centric perspectives.
14. What can be Found on the Web and How: A Characterization of Web Browsing Patterns
Alexey Tikhonov, Yandex, Moscow, Russia
Liudmila Ostroumova Prokhorenkova, Yandex, Moscow, Russia
Arseniy Chelnokov, Yandex, Moscow, Russia
Ivan Bogatyy, Google, Mountain View, USA
Gleb Gusev, Yandex, Moscow, Russia
In this paper, we suggest a novel approach to studying user browsing behavior, i.e., the ways users get to different pages on the Web. Namely, we classified all user browsing paths leading to web pages into several types or browsing patterns. In order to define browsing patterns, we consider several important points of the browsing path: its origin, the last page before the user gets to the domain of the target page, and the target page referrer. Each point can be of several types, which leads to 56 possible patterns. The distribution of the browsing paths over these patterns forms the navigational profile of a web page. We conducted a comprehensive large-scale study of navigational profiles of different web pages. First, we demonstrated that the navigational profile of a web page carries crucial information about the properties of this page (e.g., its popularity and age). Second, we found that the Web consists of several typical non-overlapping clusters formed by pages of similar ranges of incoming traffic. These clusters can be characterized by the functionality of their pages.
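The notion of a navigational profile, a distribution of browsing paths over patterns, can be sketched in a few lines. This is only an illustration of the definition given in the abstract: the pattern labels below are hypothetical and do not reproduce the authors' 56 patterns, and the function name is an assumption.

```python
from collections import Counter


def navigational_profile(path_patterns):
    """Return the share of browsing paths that falls into each pattern."""
    counts = Counter(path_patterns)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pattern: count / total for pattern, count in counts.items()}


# Hypothetical patterns for four paths leading to one page, written as
# (origin type, last page before the target domain, referrer type).
paths_to_page = [
    ("search", "search", "external"),
    ("search", "search", "external"),
    ("bookmark", "none", "none"),
    ("social", "social", "internal"),
]

profile = navigational_profile(paths_to_page)
```

In this toy sample, half of the paths share the first pattern and the remaining two patterns account for a quarter each, so the profile is a probability distribution over the patterns that were observed.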
15. Online Footsteps to Purchase: Exploring Consumer Behaviors on Online Shopping Sites
Munyoung Lee, Dept. of Computer Science and Engineering, Seoul National University, South Korea
Taehoon Ha, Dept. of Computer Science and Engineering, Seoul National University, South Korea
Jinyoung Han, Dept. of Electrical and Computer Engineering, University of California, Davis, USA
Jong-Youn Rha, Dept. of Consumer Science, Seoul National University, South Korea
Ted “Taekyoung” Kwon, Dept. of Computer Science and Engineering, Seoul National University, South Korea
As an important part of the Internet economy, online markets have gained much interest in the research community as well as in industry. Researchers have studied various aspects of online markets, including the motivations behind consumer behaviors on online markets. However, due to the lack of log data on consumers' online behaviors, including their purchases, what drives consumers to purchase products on online markets has not been thoroughly investigated or validated. Our research moves forward from prior studies by analyzing consumers' actual online behaviors that lead to actual purchases, and by using datasets from multiple online shopping sites that allow comparisons across different types of online shopping sites. We analyzed consumers' buying process and constructed consumers' behavior trajectories to gain a deeper understanding of consumer behaviors on online markets. We find that a substantial portion (24%) of consumers in a general-purpose marketplace (like eBay) discover items from external sources (e.g., price comparison sites), while most (>95%) of consumers in a special-purpose shopping site directly access items from the site itself. We also reveal that item browsing patterns and cart usage patterns are important predictors of actual purchases. Using behavioral features identified by our analysis, we developed a prediction model to infer whether a consumer purchases item(s). Our prediction model of purchases achieved over 80% accuracy across four different online shopping sites.
16. Building a Social Machine: Co-designing a TimeBank for Inclusive Research
Clare J. Hooper, IT Innovation Centre, University of Southampton, UK
Melanie Nind, Southampton Education School, University of Southampton, UK
Sarah Parsons, Southampton Education School, University of Southampton, UK
Andrew Power, Geography & Environment, University of Southampton, UK
Anne Collis, Barod Community Interest Company, Bangor, UK
This paper discusses the construction of a Social Machine, a socio-technical system in which people achieve new, creative goals enabled by automated processes that are handled by technology. Specifically, the Social Machine is an online TimeBank, a time-based way for people to give and receive services; it is designed for use in the context of inclusive research (initially) with people with learning disabilities. We describe the use of physical and digital (online) focus groups to gather inputs to drive the construction of the TimeBank, and the processes by which we analysed the data to inform the design of the TimeBank. Our goal is to create an online community with a sense of connectedness, and we discuss this work through that lens, presenting insights gained towards: building the TimeBank itself; methodological implications of related but separate physical and digital focus groups; and building Social Machines.
17. From Chirps to Whistles: Discovering Event-specific Informative Content from Twitter
Debanjan Mahata, Department of Information Science, University of Arkansas at Little Rock, USA
John R. Talburt, Department of Information Science, University of Arkansas at Little Rock, USA
Vivek Kumar Singh, Department of Computer Science, South Asian University, New Delhi, India
Twitter has brought a paradigm shift in the way we produce and curate information about real-life events. Huge volumes of user-generated tweets related to events are produced on Twitter. Not all of them are useful and informative. A sizable number of tweets are spam or colloquial personal status updates, which do not provide any useful information about an event. Thus, it is necessary to identify, rank and segregate event-specific informative content from the tweet streams. In this paper, we develop a novel generic framework, based on the principle of mutual reinforcement, for identifying event-specific informative content from Twitter. Mutually reinforcing relationships between tweets, hashtags, text units, URLs and users are defined and represented using TwitterEventInfoGraph. An algorithm, TwitterEventInfoRank, is proposed that simultaneously ranks tweets, hashtags, text units, URLs and the users producing them in terms of event-specific informativeness, by leveraging the semantics of the relationships between them as represented by TwitterEventInfoGraph. Experiments and observations are reported on approximately four million tweets collected for five real-life events, and evaluated against popular baseline techniques, showing significant improvement in performance.
18. Assembling thefacebook: Using Heterogeneity to Understand Online Social Network Assembly
Abigail Z. Jacobs, University of Colorado Boulder, USA
Samuel F. Way, University of Colorado Boulder, USA
Johan Ugander, Microsoft Research and Stanford University, USA
Aaron Clauset, University of Colorado Boulder and Santa Fe Institute, USA
Online social networks represent a popular and diverse class of social media systems. Despite this variety, each of these systems undergoes a general process of online social network assembly, which represents the complicated and heterogeneous changes that transform newly born systems into mature platforms. However, little is known about this process. For example, how much of a network's assembly is driven by simple growth? How does a network's structure change as it matures? How does network structure vary with adoption rates and user heterogeneity, and do these properties play different roles at different points in the assembly? We investigate these and other questions using a unique dataset of online connections among the roughly one million users at the first 100 colleges admitted to Facebook, captured just 20 months after its launch. We first show that different vintages and adoption rates across this population of networks reveal temporal dynamics of the assembly process, and that assembly is only loosely related to network growth. We then exploit natural experiments embedded in this dataset and complementary data obtained via Internet archaeology to show that different subnetworks matured at different rates toward similar end states. These results shed light on the processes and patterns of online social network assembly, and may facilitate more effective design for online social systems.
19. Taming a Menagerie of Heavy Tails with Skew Path Analysis
Josh Introne, Department of Media & Information, Michigan State University, USA
Sean Goggins, iSchool at The University of Missouri, Columbia, USA
The discovery of stable, heavy-tailed distributions of activity on the web has inspired many researchers to search for simple mechanisms that can cut through the complexity of countless social interactions to yield powerful new theories about human behavior. A dominant mode of investigation involves fitting a mathematical model to an observed distribution, and then inferring the behaviors that generate the modeled distribution. Yet, distributions of activity are not always stable, and the process of fitting a mathematical model to empirical distributions can be highly uncertain, especially for smaller and highly variable datasets. In this paper, we introduce an approach called skew-path analysis, which measures how concentrated information production is along different dimensions in community-generated data. The approach scales from small to large datasets, and is suitable for investigating the dynamics of online behavior. We offer a preliminary demonstration of the approach by using it to analyze six years of data from an online health community, and show that the technique offers interesting insights into the dynamics of information production. In particular, we find evidence for two distinct point attractors within a subset of the forums analyzed, demonstrating the utility of the approach.
20. Developing the ‘Pro-human’ Web
Michael J. Day, University of Southampton, UK
Leslie Carr, University of Southampton, UK
Susan Halford, University of Southampton, UK
Questions about the power relations between individuals, corporations and governments within the Web are increasingly prevalent, introducing unique political and philosophical challenges for a platform that exists beyond nation-states and with few conventional mechanisms of control. Arising from this, a call for a ‘pro-human’ Web by Berners-Lee has led to a campaign to develop a ‘Web We Want’. This proposes individual digital rights and responsibilities, suggesting a globalised, post-national digital reform ‘for humanity’. Whilst such ambitions offer significant appeal, their scope means that a great deal of work must be done to develop them in practical terms. In this paper we suggest that an essential part of this work will be to interrogate the conceptualisations of a ‘pro-human’ Web, highlighting both implications and sociotechnical changes that might move us closer to a ‘Web We Want’.
21. RoboCode-Ethicists – Privacy-friendly robots, an ethical responsibility of engineers?
Christoph Lutz, Institute for Media & Communications Management, University of St. Gallen, Switzerland
Aurelia Tamò, Chair for Information and Communication Law, University of Zurich, Switzerland
This article asks why engineers building robots should consider privacy aspects when programming their gadgets. We start with a definition of robots, differentiating active, social robots from passive, non-social robots. We then discuss the related literature on the privacy implications of social robots. Two aspects are of fundamental concern in this context: the pervasiveness and intrusiveness of robots on the one hand and a general lack of awareness and knowledge about how robots work, collect and process sensitive data on the other hand. We explain how the existing literature on robot ethics provides a suitable framework to address these two issues. In particular, robot ethics are useful to point out how engineers' and regulators' mindset towards privacy protection differs. The paper argues that different – at first sight incommensurable – rationalities exist when it comes to robotic privacy. As a contribution to the emerging field of robotic privacy, we propose an interdisciplinary and collaborative approach that bridges the two rationalities. This approach considers the role of code as the central governing element of robots. RoboCode-Ethicists, trans-disciplinary experts trained in the technical/computational, legal and social aspects of robotics, should lead the way in the discussion on robotic privacy. They could mediate between different stakeholders – mainly regulators, users and engineers – and address emerging privacy issues as early as possible.
22. Insights on Privacy and Ethics from the Web's Most Prolific Storytellers
Christopher Wienberg, Institute for Creative Technologies, University of Southern California, USA
Andrew S. Gordon, Institute for Creative Technologies, University of Southern California, USA
An analysis of narratives in English-language weblogs reveals a unique population of individuals who post personal stories with extraordinarily high frequency over extremely long periods of time. This population includes people who have posted personal narratives every day for more than eight years. In this paper we describe our investigation of this interesting subset of web users, where we conducted ethnographic, face-to-face interviews with a sample of these bloggers (n = 11). Our findings shed light on a culture of public documentation of private life, and provide insight into these bloggers' motivations, interactions with their readers, honesty, and thoughts on research that utilizes their data. We discuss the ethical implications for researchers working with web data, and speak to the relationship between large social media datasets and the real people behind them.
23. Storyscope: Supporting the authoring and reading of museum stories using online data sources
Paul Mulholland, Knowledge Media Institute, The Open University, UK
Annika Wolff, Computing and Communications, The Open University, UK
Eoin Kilfeather, Digital Media Centre, Dublin Institute of Technology, Ireland
Museum staff tell stories to assist visitor interpretation of artworks. Visitors also tell their own stories to articulate their understanding and opinion of artworks. Additional knowledge about the concepts mentioned or tagged in these stories can be found in online data sources. This knowledge could be used to assist reader interpretation or author development of stories. However, the potentially vast network of heterogeneous knowledge that can be created around the tags or annotations of a story could be bewildering for the story reader or author. Here we present Storyscope, a test-bed environment for the authoring, reading and semantic annotation of museum stories. The integration of online knowledge within the task of story authoring or interpretation is facilitated by mapping the available knowledge to a set of facts and simple events related to each story annotation. Narrative principles of theme and setting are used to discover and highlight aspects of the knowledge of potential value to the author or reader. Preliminary studies indicate the potential of the approach for providing a form of semantic navigation across stories and concepts that has a better cognitive fit to story-related tasks than existing forms of navigation.
24. Archetypal Narratives in Social Machines: Approaching Sociality through Prosopography
Ségolène Tarte, e-Research Centre, University of Oxford, UK
Pip Willcox, Bodleian Libraries, University of Oxford, UK
Hugh Glaser, Seme4 & Ethos Valuable Outcomes, UK
David De Roure, e-Research Centre, University of Oxford, UK
Introducing Social Machines as web-enabled entities integrating social energies and computational powers into a sociotechnical system (whether purposeful or not) where social dynamics animate communities, this paper proposes a theoretical framework in which to observe them. Attempting to strike a balance between the roles of humans and nonhumans, and aware of the difficulties that this heterogeneity presents, we propose to approach the questions of capturing the social dynamics of a social machine through prosopography. Prosopography is a method, used in particular by historians, that allows the systematic study of a collection of biographies, be they of persons, artefacts, infrastructures or groups thereof. Systematization is achieved through designing an appropriate questionnaire to gather homogeneous data across the biographies. Our questionnaire design relies on the identification of five archetypal elements in biographical narratives. Illustrating our method with three examples, we demonstrate how our archetypal narratives have the potential to describe at least aspects of the social dynamics in social machines.
25. How much is said in a microblog? A multilingual inquiry based on Weibo and Twitter
Han-Teng Liao, Oxford Internet Institute, University of Oxford, UK
King-wa Fu, Journalism and Media Studies Centre, University of Hong Kong, Hong Kong
Scott A. Hale, Oxford Internet Institute, University of Oxford, UK
This paper presents a multilingual study on, per single post of microblog text, (a) how much can be said, (b) how much is written in terms of characters and bytes, and (c) how much is said in terms of information content in posts by different organizations in different languages. Focusing on three different languages (English, Chinese, and Japanese), this research analyses Weibo and Twitter accounts of major embassies and news agencies. We first establish our criterion for quantifying "how much can be said" in a digital text based on the openly available Universal Declaration of Human Rights and the translated subtitles from TED talks. These parallel corpora allow us to determine the number of characters and bits needed to represent the same content in different languages and character encodings. We then derive the amount of information that is actually contained in microblog posts authored by selected accounts on Weibo and Twitter. Our results confirm that languages with larger character sets such as Chinese and Japanese contain more information per character than English, but the actual information content contained within a microblog text varies depending on both the type of organization and the language of the post. We conclude with a discussion on the design implications of microblog text limits for different languages.
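As a rough illustration of the distinction the abstract draws between characters and bytes, the sketch below measures the same short greeting in two languages under two encodings. It is not the authors' method: the function name `size_profile` and the sample strings are assumptions made only for this example, and it measures storage size, not information content.

```python
def size_profile(text: str) -> dict:
    """Return character and byte counts for a piece of text.

    Hypothetical helper for illustration; it reports storage size only
    and says nothing about how much information the text conveys.
    """
    return {
        "characters": len(text),                     # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),     # variable-width encoding
        "utf16_bytes": len(text.encode("utf-16-le")),  # 2 or 4 bytes per code point, no BOM
    }


if __name__ == "__main__":
    # Sample strings are illustrative, not taken from the paper's corpora.
    for sample in ("Hello", "你好"):
        print(sample, size_profile(sample))
```

The Chinese greeting uses fewer characters than the English one but more UTF-8 bytes per character, which is why a character-based post limit and a byte-based limit constrain the two languages differently.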
26. ‘/Command’ and Conquer: Analysing Discussion in a Citizen Science Game
Ramine Tinati, Web and Internet Science, University of Southampton, UK
Markus Luczak-Roesch, Web and Internet Science, University of Southampton, UK
Elena Simperl, Web and Internet Science, University of Southampton, UK
Nigel Shadbolt, Web and Internet Science, University of Southampton, UK
Wendy Hall, Web and Internet Science, University of Southampton, UK
Citizen science is changing the process of scientific knowledge discovery. Successful projects rely on an active and able collection of volunteers. In order to attract and sustain citizen scientists, designers are faced with the task of transforming complex scientific tasks into something accessible, interesting, and hopefully, engaging. In this paper, we examine the citizen science game EyeWire. Our analysis draws on a dataset of over 4,000,000 completed games and 885,000 chat entries, made by over 90,000 players. The analysis provides a detailed understanding of how features of the system facilitate player interaction and communication alongside completing the gamified scientific task. Based on the analysis we describe a set of behavioural characteristics which identify different types of players within the EyeWire platform.
27. Analyzing Discourse Communities with Distributional Semantic Models
Igor Brigadir, Insight Centre, University College Dublin, Ireland
Derek Greene, Insight Centre, University College Dublin, Ireland
Pádraig Cunningham, Insight Centre, University College Dublin, Ireland
This paper presents a new corpus-driven approach applicable to the study of language patterns in social and political contexts, or Critical Discourse Analysis (CDA), using Distributional Semantic Models (DSMs). This approach considers changes in word semantics, both over time and between communities with differing viewpoints. The geometrical spaces constructed by DSMs or "word spaces" offer an objective, robust exploratory analysis tool for revealing novel patterns and similarities between communities, as well as highlighting when these changes occur. To quantify differences between word spaces built on different time periods and from different communities, we analyze the nearest neighboring words in the DSM, a process we relate to analyzing “concordance lines”. This makes the approach intuitive and interpretable to practitioners. We demonstrate the usefulness of the approach with two case studies, following groups with opposing political ideologies in the Scottish Independence Referendum, and the US Midterm Elections 2014.
28. How much is Wikipedia Lagging Behind News?
Besnik Fetahu, L3S Research Center, Leibniz University of Hannover, Germany
Abhijit Anand, L3S Research Center, Leibniz University of Hannover, Germany
Avishek Anand, L3S Research Center, Leibniz University of Hannover, Germany
Wikipedia, rich in entities and events, is an invaluable resource for various knowledge harvesting, extraction and mining tasks. Numerous resources like DBpedia, YAGO and other knowledge bases are based on extracting entity and event based knowledge from it. Online news, on the other hand, is an authoritative and rich source for emerging entities, events and facts relating to existing entities. In this work, we study the creation of entities in Wikipedia with respect to news by studying how entity and event based information flows from news to Wikipedia. We analyze the lag of Wikipedia (based on the revision history of the English Wikipedia) with 20 years of The New York Times dataset (NYT). We model and analyze the lag of entities and events, namely their first appearance in Wikipedia and in NYT, respectively. In our extensive experimental analysis, we find that almost 20% of the external references in entity pages are news articles, indicating the importance of news to Wikipedia. Second, we observe that the entity-based lag follows a normal distribution with a high standard deviation, whereas the lag for news-based events is typically very low. Finally, we find that events are responsible for the creation of emergent entities, with as many as 12% of the entities mentioned in an event page being created after the creation of the event page.
29. Considering a Wider Web? Employing Multimodal Critical Discourse Analysis in Exploration of Multiple Online Spaces
Rebecca Nash, University of Southampton, UK
What sets the Web apart from ‘traditional’ mass media is almost instantaneous access to diverse spaces that users navigate in customized ways. Users are often bound up as producers and consumers of materials online. As a result, new avenues for research have emerged for both large (‘Big Data’) and small-scale Web studies. Research across this spectrum, however, has tended to focus on singular types of Web platform (e.g. Twitter data, online forums etc.). Web users, conversely, are unlikely to relegate browsing to discrete types of Web space. What will be argued here – with reference to an ongoing case study researching the role of the Web on production and consumption of aesthetic surgery – is the usefulness and significance of multimodal critical discourse analysis (MMCDA) for qualitative research across multiple online spaces. MMCDA examines intersecting visual media and texts to recognize and comprehend (re)production of dominant meanings in various contexts. Employing MMCDA across a selection of different types of websites – assembling a ‘snapshot’ of a topic(s) – enables wider qualitative exploration of complementary, competing, and contradictory visual and textual sources confronting users on an everyday, experiential level. This raises important epistemological and ethical issues pertinent to undertaking qualitative research on the Web. How do different Web spaces contribute to construction of dominant discourses? How do we, as researchers, gather, analyze and use various data ethically? From this emerges potential for developing more intricate understandings of diverse content available at the click of a hyperlink.
30. Habits vs Environment: What Really Causes Asthma?
Mengfan Tang, Department of Computer Science, University of California, Irvine, USA
Pranav Agrawal, Department of Computer Science, University of California, Irvine, USA
Ramesh Jain, Department of Computer Science, University of California, Irvine, USA
Despite a considerable number of studies on risk factors for asthma onset, very little is known about their relative importance. To have a full picture of these factors, both categories, personal and environmental data, have to be taken into account simultaneously, which is missing in previous studies. We propose a framework to rank the risk factors from heterogeneous data sources of the two categories. Established on top of EventShop and Personal EventShop, this framework extracts about 400 features, and analyzes them by employing a gradient boosting tree. The features come from sources including personal profile and life-event data, and environmental data on air pollution, weather and PM2.5 emission sources. The top ranked risk factors derived from our framework agree well with the general medical consensus. Thus, our framework is a reliable approach, and the discovered rankings of relative importance of risk factors can provide insights for the prevention of asthma.
31. Emotional States vs. Emotional Words in Social Media
Asaf Beasley, Indiana University, USA
Winter Mason, Stevens Institute of Technology, USA
A number of social media studies have equated people's emotional states with the frequency with which they use affectively positive and negative words in their posts. We explore how such word frequencies relate to a ground truth measure of both positive and negative emotion for 515 Facebook users and 448 Twitter users. We find statistically significant but very weak (ρ in the 0.1 to 0.2 range) correlations between positive and negative emotion-related words from the Linguistic Inquiry Word Count (LIWC) dictionary and a well validated scale of trait emotionality called the Positive and Negative Affect Schedule (PANAS). We test this for tweets and Facebook status updates, focus on different time slices around the completion of the survey, and consider participants who report expressing emotions frequently on social media. With rare exception, this pattern of low correlation persists, suggesting that for the typical user, dictionary-based sentiment analysis tools may not be sufficient to infer how they truly feel.
32. Assessing the Value of Social Media for Organisations: The Case for Charitable Use
Christopher Phethean, Web Science Institute, University of Southampton, UK
Thanassis Tiropanis, Web Science Institute, University of Southampton, UK
Lisa Harris, Web Science Institute, University of Southampton, UK
Social media offer opportunities for organisations of all sectors to communicate with their audiences. There is little understanding, however, of what value these services actually provide for many of these organisations. Focusing on the charitable sector, this paper brings together the results of a number of studies into a triangulation whose own results and findings are discussed, and an overall model of value assessment for social media is presented. Emphasis is placed on eliciting the motivations and aims of both the charity and their supporters, along with observing the actual behaviour that then occurs from each side. By comparing these phenomena, and appreciating how they all interact with each other, it is argued that greater understanding around how valuable a particular organisation will find social media can be obtained.
P1. Estimating tourism statistics with Wikipedia page views
Christian M Alis, University College London, UK
Adrian Letchford, Data Science Lab, Behavioural Science, Warwick Business School, University of Warwick, UK
Helen Susannah Moat, Data Science Lab, Behavioural Science, Warwick Business School, University of Warwick, UK
Tobias Preis, Data Science Lab, Behavioural Science, Warwick Business School, University of Warwick, UK
Decision makers depend on socio-economic indicators to shape the world we inhabit. Reports of these indicators are often delayed due to the effort involved in gathering and aggregating the underlying data. Our increasing interactions with large scale technological systems are generating vast datasets on global human behaviour which are immediately accessible. Here we analyse whether data on how often people view Wikipedia articles might help us to improve estimates of the current number of tourists leaving the UK. Our analyses suggest that in the absence of sufficient history, Wikipedia page views provide an advantage. We conclude that when using adaptive models, Wikipedia usage opens up the possibility to improve estimates of tourism demand.
P2. DNA: From Search to Observation Revisited
Ian Brown, Web Science Institute, University of Southampton, UK
Lisa Harris, Web Science Institute, University of Southampton, UK
Wendy Hall, Web Science Institute, University of Southampton, UK
In this paper, we describe extensions to the process model first described in the paper “From Search to Observation” based on additional field interview work. This process model forms part of a triad of perspectives under the banner of a methodology known as DNA, which looks at structure (Definition), process (Nature) and motivations of actors (Archetypes) for Web Observatories (hereafter WO) and more generally the class of Social Machines. We discuss the rationale for the model enhancements, enumerate and summarise the changes and close with an introduction to future work around use of open source tools and languages for implementing and analyzing social machine processes using this model. The additional perspectives we are now considering are an extensive revision to the model (which now addresses more than three times the number of factors in the previous model) and hence a revised paper is called for in this space.
P3. Does Dialectal Variation Matter in Term-Based Feature Selection of Sentiment Analysis? An Investigation into Multi-dialectal Chinese Microblogs
Kwun Cheung Chan, Journalism and Media Studies Centre, The University of Hong Kong, Hong Kong
King Wa Fu, Journalism and Media Studies Centre, The University of Hong Kong, Hong Kong
Chung Hong Chan, Journalism and Media Studies Centre, The University of Hong Kong, Hong Kong
This paper examines the feature selection procedures of sentiment analysis on a multi-dialectal language. We analyzed a dataset with over 6 million microblogs in China, a multi-dialectal country, deployed a sentiment classifier to examine the positive/negative emotion carried by the microblogs, and explored the regional variations in the optimal feature vectors. The results suggest that localized feature vectors in some of China's regions can maximize the classification accuracy, and show that geographical distance between provinces and the common dialect used contribute to explaining the provincial difference in the feature vectors. This research can be applied to other multicultural countries for feature vector optimization in sentiment analysis.
P4. Predicting Political Polarization from Cyberbalkanization: Time series analysis of Facebook pages and Opinion Poll during the Hong Kong Occupy Movement
Chung-hong Chan, Journalism and Media Studies Centre, The University of Hong Kong, Hong Kong
King-wa Fu, Journalism and Media Studies Centre, The University of Hong Kong, Hong Kong
The purpose of this study is to investigate the temporal association between cyberbalkanization and real life polarization of public opinion during the Hong Kong Occupy Movement in 2014. 1,387 Facebook Pages about Hong Kong during July 1 to December 15, 2014 were collected, their publicly accessible posts were retrieved, and a post sharing network (1,397 nodes and 41,404 edges) was constructed. Network communities were computationally extracted to determine the community membership for each Facebook page. Daily degree of cyberbalkanization was quantified with the number of shares through strong-tie (intra-community sharing) connections. The level of political polarization was derived from the opinion polls data with the proportion of respondents who gave extreme ratings to the government leader in Hong Kong. In a time series analysis, the daily degree of cyberbalkanization, as measured by the number of shares through the strong ties, was significantly associated with the level of political polarization, particularly with the younger age group's opinion poll result. This is the first study that provides empirical evidence for supporting cyberbalkanization to serve as a leading predictive indicator of the polarization of public opinion for at least 10 days ahead, suggesting that social media data analysis can supplement traditional public opinion research methods, such as phone survey, during social controversy.
P5. Automatic Identification of Personal Life Events in Twitter
Thomas Dickinson, Knowledge Media Institute, Open University, UK
Miriam Fernandez, Knowledge Media Institute, Open University, UK
Lisa A Thomas, Northumbria University, Newcastle upon Tyne, UK
Paul Mulholland, Knowledge Media Institute, Open University, UK
Pam Briggs, Northumbria University, Newcastle upon Tyne, UK
Harith Alani, Knowledge Media Institute, Open University, UK
New social media has led to an explosion in personal digital data that encompasses both those expressions of self chosen by the individual as well as reflections of self provided by other, third parties. The resulting Digital Personhood (DP) data is complex and for many users it is too easy to become lost in the mire of digital data. This paper studies the automatic detection of personal life events in Twitter. Six relevant life events are considered from psychological research including: beginning school; first full time job; falling in love; marriage; having children and parent's death. We define a variety of features (user, content, semantic and interaction) to capture the characteristics of those life events and present the results of several classification methods to automatically identify these events in Twitter.
P6. Cross-Social Network Collaborative Recommendation
Aleksandr Farseev, School of Computing, National University of Singapore, Singapore
Denis Kotkov, Dept. of Comput. Sci. & Inf. Syst, University of Jyvaskyla, Finland
Alexander Semenov, Dept. of Comput. Sci. & Inf. Syst, University of Jyvaskyla, Finland
Jari Veijalainen, Dept. of Comput. Sci. & Inf. Syst, University of Jyvaskyla, Finland
Tat-Seng Chua, School of Computing, National University of Singapore, Singapore
Online social networks have become an essential part of our daily life, and an increasing number of users are using multiple online social networks simultaneously. We hypothesize that the integration of data from multiple social networks could boost the performance of recommender systems. In our study, we perform cross-social network collaborative recommendation and show that fusing multi-source data enables us to achieve higher recommendation performance as compared to various single-source baselines.
P7. Spread and Skepticism: Metrics of Propagation on Twitter
Samantha Finn, Computer Science, Wellesley College, USA
Panagiotis Takis Metaxas, Computer Science, Wellesley College, USA
Eni Mustafaraj, Computer Science, Wellesley College, USA
Social media has become part of modern news reporting, used by journalists to spread information and find sources, or as a news source by individuals. The quest for prominence and recognition on sites like Twitter can sometimes eclipse accuracy and lead to the spread of false information. Could we use the so-called “wisdom of crowds” to predict the likelihood that a claim may be true or false? This paper, part of ongoing research, offers evidence that most false claims do not spread like true ones, and that the reaction of the audience to a claim on Twitter is correlated with its validity.
P8. A values and psychological attribute analysis of the Scottish Independence Referendum context in Twitter
Caroline Halcrow, University of Southampton, UK
Qingpeng Zhang, City University of Hong Kong, Hong Kong
Schwartz (Andrew) argues that inter-disciplinary approaches involving computational linguistics and the social sciences are needed to make sense of big data in social networks. The social psychology tool, the Schwartz (Shalom) Values Model, is used here alongside linguistic psychological attribute analysis to investigate a context in ‘Twitter’. The topic of the Scottish Independence Referendum (September 18th, 2014) was selected as the context because it divided opinion into camps. This study's main hypothesis is that the camps of contexts can be values-profiled. Secondary hypotheses are: the values profiles correlate with psychological attribute profiles in the different voting camps; and the psychological textual analysis adds a wider psychological dimension to topic modeling in ‘Twitter’. The methodology combined two processes: the assignment of values to the camps of the Referendum context using the Schwartz Values Model; and the content analysis of the tweets, using the psychological textual analysis tool, LIWC.
P9. The influence of visual salience on video consumption behavior: A survival analysis approach
Rafael Huber, University of Basel, Switzerland
Benjamin Scheibehenne, University of Basel, Switzerland
Alexandre Chapiro, ETH Zürich, Switzerland
Seth Frey, Disney Research, Zürich, Switzerland
Robert W. Sumner, Disney Research, Zürich, Switzerland
In an increasingly competitive media environment, producers of online content need analytics that can predict the success of a video. In recent years the field of visual computation has produced a variety of mathematical models that quantify an image's salience, that is, its potential to capture attention. To test how a video's content might predict its success, we applied the standard saliency model of Itti, Koch, and Niebur to more than 1000 video clips that were broadcast on a large video streaming website. We also obtained fine-grained data on the viewership of these clips. Based on a survival analysis, we find that people prefer more salient videos. The results were robust towards the inclusion of other predictors such as the genre of the video, but not to video length, which remains correlated with salience even after comparing videos only within show and genre. Our analyses suggest that visual salience provides an objective and easy-to-compute supplement to previously suggested predictors of video consumption behavior.
P10. What do academics ask their online networks? An analysis of questions posed via Academia.edu
Katy Jordan, The Open University, UK
Social networking sites (SNS) aimed at academics have the potential to enhance academic practice through developing an online academic identity and as a portal to further opportunities for collaboration and communication. This paper explores part of the communicative affordance offered by academic SNS through an analysis of the questions posed by academics via the Academia.edu website.
P11. Diversity Analysis of Web Search Results
Suneel Kumar Kingrani, Dept of Computer Science & Information Systems, Birkbeck, University of London, UK
Mark Levene, Dept of Computer Science & Information Systems, Birkbeck, University of London, UK
Dell Zhang, Dept of Computer Science & Information Systems, Birkbeck, University of London, UK
Are web search results usually dominated by major websites and therefore lacking diversity? In this paper, we aim to answer this question by quantitatively modelling the diversity of search results for popular queries using two diversity measures well-studied in ecology, namely Simpson's diversity index and Shannon's diversity index. Our theoretical analysis shows how the diversity of search results is determined by the Zipfian distribution of websites. Our empirical analysis reveals that comparing Google and Bing, the former is more diverse in the top-50 search results, while the latter is more diverse in the top-10 search results.
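The two ecological measures named in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes the Gini-Simpson form of Simpson's index (one minus the sum of squared proportions; other variants exist) and natural logarithms for Shannon's index, and the result list of site names is hypothetical.

```python
import math
from collections import Counter


def simpson_diversity(counts):
    """Gini-Simpson index: 1 - sum(p_i^2). Assumed variant, see lead-in."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def shannon_diversity(counts):
    """Shannon index: -sum(p_i * ln p_i), skipping empty categories."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical top-5 result list: which website each result came from.
    results = ["a.example", "b.example", "a.example", "c.example", "a.example"]
    counts = list(Counter(results).values())
    print("Simpson:", simpson_diversity(counts))
    print("Shannon:", shannon_diversity(counts))
```

Both measures are zero when a single website supplies every result and grow as results are spread more evenly across more websites.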
P12. Prosopography is Greek for Facebook: The SNAP:DRGN Project
K Faith Lawrence, Dept. of Digital Humanities, Kings College, London, UK
Gabriel Bodard, Dept. of Digital Humanities, Kings College, London, UK
In this paper, we present SNAP:DRGN, a pilot project intended to support Ancient World Linked Open Data through the creation of persistent identifiers for person and person-like entities. We introduce the linked data landscape as it exists with respect to the digitized Classical world and SNAP:DRGN's place within it.
P13. The Web Practice of Mathematicians on the Web: An Insight into Significant but Neglected Web Groups
Mandy Lo, University of Southampton, UK
Hugh Davis, University of Southampton, UK
Julie-Ann Edwards, University of Southampton, UK
Christian Bokhove, University of Southampton, UK
In this paper, we describe the findings from a three-year multi-phased investigation into the Web practice of online mathematics communities. Our results indicate that the equivalent technologies that enable text-input or image-uploads without the need to understand programming languages have not been made available for the mathematics/scientific communities to enable fluid communications. Given the global importance of mathematical and scientific collaborations, we argue that the mathematical and scientific communities are significant but neglected groups, and that more attention should be given to the user-interface designs to support fluid online mathematics communications.
P14. Real-time Social Media Analytics through Semantic Annotation and Linked Open Data
Diana Maynard, Dept. of Computer Science, University of Sheffield, UK
Mark A. Greenwood, Dept. of Computer Science, University of Sheffield, UK
Ian Roberts, Dept. of Computer Science, University of Sheffield, UK
George Windsor, Policy and Research, Nesta, London, UK
Kalina Bontcheva, Dept. of Computer Science, University of Sheffield, UK
This paper describes an open source framework for analysing large volume social media content, which comprises semantic annotation, Linked Open Data, semantic search, dynamic result aggregation, and information visualisation. In particular, exploratory search and sense-making are supported through information visualisation interfaces, such as co-occurrence matrices, term clouds, treemaps, and choropleths. There is also an interactive semantic search interface (Prospector), where users can save, refine, and analyse the results of semantic search queries over time. These functionalities are presented in more detail in the context of analysing tweets from UK politicians and party candidates in the run up to the 2015 UK general election.
P15. Webscraping as an Investigation Tool to Identify Potential Human Trafficking Operations in Romania
Ruth McAlister, Ulster University, UK
Information communication technology has enabled criminals to remain distant from the crimes they commit with reduced risk. However, by moving this underground criminal activity online, digital evidence of communication with members of the crime group, and also victims, presents an interesting research opportunity into human trafficking and may reveal actionable information for law enforcement agencies. Specifically, this research paper investigates whether a webscraping tool could be employed to gather intelligence on organized crime groups at the recruitment stage of the trafficking operation as a means to understand their modus operandi. Preliminary findings presented in this paper indicate that the UK is a popular destination country for job advertisements hosted in Romania and further analysis will be undertaken to identify if there are in fact indicators of trafficking evident in these identified websites.
P16. Supply and demand of independent UK music artists on the web
Matt McVicar, Department of Engineering Mathematics, University of Bristol, UK
Cédric Mesnage, Department of Engineering Mathematics, University of Bristol, UK
Jefrey Lijffijt, Department of Engineering Mathematics, University of Bristol, UK
Eirini Spyropoulou, Department of Engineering Mathematics, University of Bristol, UK
Tijl De Bie, Department of Engineering Mathematics, University of Bristol, UK
As in any dynamic market, supply and demand of music are in a constant state of disequilibrium. Music charts have for many years documented the demand for the most popular music, but a more comprehensive understanding of this market has remained beyond reach. In this paper, we provide a proof of concept for how web resources now make it possible to study both demand and supply sides, accounting also for smaller, independent artists.
P17. Reddit.com - A census of subreddits
Richard A. Mills, The Psychometrics Centre, Psychology Department, Cambridge University, UK
The Social News site reddit.com is composed of thousands of independent user-created subreddits where people use the site's submission and voting features in a variety of ways. This paper offers a brief overview of different types of subreddit and how user activity is distributed between these.
P18. Social Influencer Analysis with Factorization Machines
Ming-Feng Tsai, Department of Computer Science & Program in Digital Content and Technology, National Chengchi University, Taiwan
Chuan-Ju Wang, Department of Computer Science, University of Taipei, Taiwan
Zhe-Li Lin, Department of Computer Science, National Chengchi University Taipei, Taiwan
How will the reputations of individuals in a social network be influenced by their communities in a quantitative way? This work attempts to observe the collaborative events occurring at individuals involved in a social network to obtain such crucial knowledge. We propose a Factorization Machine approach to find out the latent social influence among the individuals based on their collaborations. Experiments conducted on a real-world DBLP dataset verify that the proposed approach can discover the latent social influence among individuals and provide a better predictive model than several baselines.
P19. Twitter as a Political Network – Predicting the Following and Unfollowing Behavior of German Politicians
Julia Perl, University of Koblenz-Landau, Germany
Claudia Wagner, University of Koblenz-Landau, Germany
Jérôme Kunegis, University of Koblenz-Landau, Germany
Steffen Staab, University of Koblenz-Landau, Germany
It has widely been observed that many public figures and in particular politicians use Twitter as a medium for communication with their fans or followers. However, Twitter is also used by public figures for communication among themselves, allowing Twitter to be used as a tool to observe the social network among such public figures – a network which is otherwise much more difficult to observe. Accordingly, we study in this paper the behavior of German politicians with respect to their social interconnections on Twitter, asking whether the following and unfollowing between them can be predicted with accuracy. We show which measures are useful for predicting the formation and dissolution of social ties in the network of German politicians, and quantify the added value of unlinking information for both prediction tasks. Our results show that interesting differences exist in the factors that are related to the formation and dissolution of social ties.
P20. Quotes in forum.rpg.net
Mattia Samory, Univ. Padova, Italy
Enoch Peserico, Univ. Padova, Italy
We analyse the usage of quotes in forum.rpg.net, the largest online forum on tabletop roleplaying games. Quote usage appears pervasive and surprisingly consistent over time and users; it seems to have a role in aiding intra-thread navigation; and it reveals an underlying “social” structure in a community that otherwise lacks all trappings (from friends and followers to reputations) of today's social networks. This is the first work to investigate community structure and interaction through the lens of quotes in an online forum.
Sabrine Saad, UIR Web Science CEMAM / FLSH Saint-Joseph University Beirut, Lebanon
Muriel Chamoun, UIR Web Science CEMAM / FLSH Saint-Joseph University Beirut, Lebanon
Stéphane B. Bazan, UIR Web Science CEMAM / FLSH Saint-Joseph University Beirut, Lebanon
The Middle East has witnessed a tremendous increase in Information Warfare Operations on the Web in the last two years. The strategy developed by the ISIS group to increase visibility and reach takes advantage of various core competencies of digital media communication. By identifying actions and observing their impact in the specific context of the Middle East, this ongoing research tries to understand how ISIS conceived its Web communication strategy to target populations and spread its message to the online world.
P22. Linguistic influence patterns within the global network of Wikipedia language editions
Anna Samoilenko, GESIS – Leibniz-Institute for the Social Sciences, Germany
Fariba Karimi, GESIS – Leibniz-Institute for the Social Sciences, Germany
Jérôme Kunegis, University of Koblenz-Landau, Germany
Daniel Edler, Integrated Science Lab, Department of Physics, Umeå University, Sweden
Markus Strohmaier, GESIS – Leibniz-Institute for the Social Sciences, University of Koblenz-Landau, Germany
The Internet is highly multilingual, and its content is created, shared, debated and shaped within many different language-speaking communities. These communities do not exist in isolation, but communicate and influence each other's interests, just as in the offline world. Quantifying this influence is however a non-trivial task, as these communities are usually spread across multiple heterogeneous platforms. In this work, we set out to measure the influence of languages on each other by observing concept overlap between the 110 largest Wikipedia language editions. We describe experiments to test if language overlap in concept coverage is a random process, and find that edition size is a strong predictor of higher concept overlap, with English–German being the most frequently co-occurring pair (45%). Both small and large editions co-occur more frequently than expected with editions of similar size, but co-occurrences across groups are below what is expected by chance. Additionally, by applying network analysis, we find that the hierarchy of language interconnections differs depending on the locality of topics: for interlingually popular topics, the dominance of English, German and French is pronounced, while for topics with a local reach, geographical and cultural proximity as well as common heritage are better explanators of co-occurrence.
P23. Tweet if you will – the real question is, who do you influence?
Johanna Schacht, Karlsruhe Service Research Institute, Karlsruhe Institute of Technology, Germany
Margeret Hall, Karlsruhe Service Research Institute, Karlsruhe Institute of Technology, Germany
Martin Chorley, School of Computer Science & Informatics, Cardiff University, Wales, UK
Large numbers of today's businesses use social media in advertising. There is a belief in a great opportunity, even if return on investment is difficult to quantify. To fill this gap we consider a cross-media-platform analysis across Facebook, Twitter and Foursquare. Rationales for and against different characteristics within social media advertisement are addressed. The paper finds a correlation between posts and tweets and Foursquare check-ins. Results show that posts or tweets containing pictures have a higher return on investment than posts or tweets without, and that when the text of a post or tweet raises curiosity or attracts individuals or groups, Foursquare check-ins increase.
P24. Digitizing »Digital Methods« - The Journey of a Research Domain from a Book into the Semantic Web
Miriam Schmitz, Cologne University of Applied Sciences, Germany
Kristian Fischer, Cologne University of Applied Sciences, Germany
This paper introduces an approach to the classification and formalization of interdisciplinary social research with the web. The research project built upon initial groundwork by Richard Rogers that introduced digital methods as a new form of research with the web as a source of perceptions about society. Our work formalized the digital methods domain by constructing an ontology with the help of the Web Ontology Language (OWL), and interpreted the resulting representation for universal perceptions about web-based social research, such as the identification of accumulations of research activities, and predictions about epistemological shifts in the future.
P25. Exploring Long Running News Stories using Wikipedia
Jaspreet Singh, Forschungszentrum L3S, Germany
Abhijit Anand, Forschungszentrum L3S, Germany
Vinay Setty, Max-Planck-Institut für Informatik, Saarbrücken, Germany
Avishek Anand, Forschungszentrum L3S Hannover, Germany
A significant portion of today's news articles are part of long-running stories. To better understand the context of these stories, journalists, social scientists and other scholars use news collections to find temporal and topical insights. However, these insights are devoid of user impressions, derived from click-through data and query logs, and are only reliable if the collection is complete and consistent. In this work we introduce the notion of combining user impressions from Wikipedia with news-collection-based insights for long-running news story exploration and outline promising new research directions. We also demonstrate our initial attempts with a prototype system called NEWSEX.
P26. Abstractions, Expressions and Online Collectives
Nirmal Kumar Sivaraman, Web Sciences Lab, International Institute of Information Technology Bengaluru, India
Srinath Srinivasa, Web Sciences Lab, International Institute of Information Technology Bengaluru, India
Groups of people, or collectives, possess a number of interesting properties even in the online world. While they are associated with positive connotations like “The Wisdom of the Crowd,” not all collectives are wise. In this paper, we analyze collectives in terms of two cognitive dimensions called abstraction and expression. Based on the extent of “coagulation” of abstractions and expressions in the collective, we identify four extreme points that we call crowds, herds, mobs and gangs, respectively. We also propose and compare two computational models to score collectives along the above characterization.
P27. Prediction of Malware Propagation and Links within Communities in Social Media Based Events
Abinaya Sowriraghavan, School of Computer Science & Informatics, Cardiff University, UK
Pete Burnap, School of Computer Science & Informatics, Cardiff University, UK
This paper studies malware propagation on social media and community link prediction. Twitter is taken as the social media platform, and data is collected using Twitter4j and MongoDB. A high-interaction client honeypot is used to classify benign and malicious URLs. The retweet volume and links between the users are then analyzed. Further to this, the work aims to detect communities that arise from these links between users with the help of the BigCLAM algorithm.
P28. Towards Real-time Lifetime Prediction of Information Diffusion
Io Taxidou, University of Freiburg, Germany
Anas Alzoghbi, University of Freiburg, Germany
Peter M. Fischer, University of Freiburg, Germany
Christoph Schöller, University of Freiburg, Germany
In this paper, we provide the first steps towards real-time, large-scale prediction of the lifetime of information diffusion processes.
P29. Crowdsourcing ground truth for Question Answering using CrowdTruth
Benjamin Timmermans, VU University, Amsterdam, CAS, IBM Netherlands, Netherlands
Lora Aroyo, VU University Amsterdam, Netherlands
Chris Welty, Google, USA
Gathering training and evaluation data for open domain tasks, such as general question answering, is a challenging task. Typically, ground truth data is provided by human expert annotators; however, in an open domain, experts are difficult to define. Moreover, the overall process for annotating examples can be lengthy and expensive. Naturally, crowdsourcing has become a mainstream approach for filling this gap, i.e. gathering human interpretation data. However, similar to the traditional expert annotation tasks, most of those methods use majority voting to measure the quality of the annotations and thus aim at identifying a single right answer for each example, despite the fact that many annotation tasks can have multiple interpretations, which results in multiple correct answers to the same question. We present a crowdsourcing-based approach for efficiently gathering ground truth data called CrowdTruth, where disagreement-based metrics are used to harness the multitude of human interpretation and measure the quality of the resulting ground truth. We exemplify our approach in two semantic interpretation use cases for answering questions.
P30. TweetSense: Context Recovery for Orphan Tweets by Exploiting Social Signals in Twitter
Manikandan Vijayakumar, Arizona State University, USA
Tejas Mallapura Umamaheshwar, Arizona State University, USA
Subbarao Kambhampati, Arizona State University, USA
Kartik Talamadupula, IBM T.J. Watson Research Center, USA
As the popularity of Twitter and the volume of tweets have increased dramatically, hashtags have naturally evolved to become a de facto context providing/categorizing mechanism on Twitter. Despite their widespread adoption, fueled in part by hashtag recommendation systems, lay users continue to generate tweets without hashtags. When such “orphan” tweets show up in a (browsing) user's timeline, it is hard to make sense of their context. In this paper, we present a system called TweetSense which aims to rectify such orphan tweets by recovering their context in terms of their missing hashtags. TweetSense enables this context recovery by using both the content and social network features of the orphan tweet. We characterize the context recovery problem, present the details of TweetSense and present a systematic evaluation of its effectiveness over a 7 million tweet corpus.
P31. Men eat on Mars, Women on Venus? An Empirical Study of Food-Images
Claudia Wagner, GESIS & Univ. of Koblenz, Germany
Luca Maria Aiello, Yahoo Labs, London
Culinary preferences contribute significantly to our sense of self. While gender, race, sexuality and ethnicity describe our “major identity”, preferences in music, style and food define our “minor identity”. However, we find that only certain parts of them can be explained by gender-specific differences in food consumption behavior, while other parts can be better explained by the media portrayal of food consumption. This work sets out to investigate gender, which is part of our major identity, and how it affects the way we define our minor identity online by exploring a large set of user-generated geo-tagged food pictures.
P32. ‘Digital Wildfires’: a challenge to the governance of social media?
Helena Webb, Department of Computer Science, University of Oxford, UK
Marina Jirotka, Department of Computer Science, University of Oxford, UK
Bernd Carsten Stahl, Department of Informatics, De Montfort University, UK
William Housley, School of Social Sciences, Cardiff University, UK
Adam Edwards, School of Social Sciences, Cardiff University, UK
Matthew Williams, School of Social Sciences, Cardiff University, UK
Rob Procter, Department of Computer Science, University of Warwick, UK
Omer Rana, School of Computer Science and Informatics, Cardiff University, UK
Pete Burnap, School of Computer Science and Informatics, Cardiff University, UK
The increasing popularity of social media platforms such as Facebook, Twitter, Instagram and Tumblr has been accompanied by concerns over the growing prevalence of ‘harmful’ online interactions. The term ‘digital wildfire’ has been coined to characterise the capacity for provocative content on social media to propagate rapidly and cause offline harm. The apparent risks posed by digital wildfires create questions over the suitable governance of digital social spaces. This paper provides an overview of some preliminary findings of an ongoing research project that seeks to build an empirically grounded methodology for the study and advancement of the responsible governance of social media.
P33. Wikipedia Page View Reflects Web Search Trend
Mitsuo Yoshida, Toyohashi University of Technology, Japan
Takaaki Tsunoda, University of Tsukuba, Japan
Yuki Arase, Osaka University, Japan
Mikio Yamamoto, University of Tsukuba, Japan
The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different, openly available resource such as Wikipedia page views. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.
P34. Observing Social Web for Smog Disaster Forecasting
Yalin Zhou, College of Computer Science, Zhejiang University, China
Jiaoyan Chen, College of Computer Science, Zhejiang University, China
Huajun Chen, College of Computer Science, Zhejiang University, China
Smog disasters are greatly affected by social activities such as driving. In this poster, we observe the social web to enhance smog disaster forecasting. Different kinds of social indicators are measured from social web data with a social web data processing framework, and then evaluated for smog disaster forecasting in two experiments.
P35. On Publication Usage in a Social Bookmarking System
Daniel Zoller, Data Mining and Information Retrieval Group, University of Würzburg, Germany
Stephan Doerfel, ITeG, Knowledge and Data Engineering (KDE) Group, University of Kassel, Germany
Robert Jäschke, L3S Research Center, Germany
Gerd Stumme, ITeG, Knowledge and Data Engineering Group (KDE), University of Kassel, Germany
Andreas Hotho, Data Mining and Information Retrieval Group, University of Würzburg, Germany
Scholarly success is traditionally measured in terms of citations to publications. With the advent of publication management and digital libraries on the web, scholarly usage data has become a target of investigation, and new impact metrics computed on such usage data have been proposed – so-called altmetrics. In scholarly social bookmarking systems, scientists collect and manage publication metadata and thus reveal their interest in these publications. In this work, we investigate connections between usage metrics and citations, and find posts, exports, and page views of publications to be correlated with citations.