Sociology of Data

    • 00:07

      [Sociology of Data]

    • 00:11

      DR. SUSAN HALFORD: I'm Susan Halford. I'm Professor of Sociology at the University of Southampton and I'm also one of the directors of the Web Science Institute, which is at the University of Southampton. [Dr. Susan Halford, Professor of Sociology and Director of Web Science, University of Southampton] So the worldwide web is now embedded across the world in our economic systems, in markets, in business, in social life, in culture. And if we're going to understand the web and how it's changing

    • 00:33

      DR. SUSAN HALFORD [continued]: and how it's impacting on our world, we need to draw together experts, not just from computer science, but also from sociology, from political science, from mathematics, from economics, from business and law. In this case study, I'm going to be talking about some of the data that are generated on the web and the data that have driven the growth of the web, actually, which is the social media data that we

    • 00:55

      DR. SUSAN HALFORD [continued]: all generate every day when we use Facebook or Twitter or Instagram. And these data are potentially really exciting for social science researchers. But at the moment, we don't have very good methods for engaging with these data at scale or dealing with the fact that they're very dynamic, that they change over time, and that they should give us insights into emergent social phenomena.

    • 01:16

      DR. SUSAN HALFORD [continued]: So emergent identities, social groups, social practices, political movements, and so on. This case study will explore the development of a new method for social media research built as a collaboration between computational science and social science. The case study is based on Twitter, the microblogging

    • 01:37

      DR. SUSAN HALFORD [continued]: platform which was first started in 2006, and which by 2014 had over 300 million active monthly users and around 500 million tweets every day. [Using Big Data in Social Science Research] So social scientists' response to these data has been quite mixed. On the one hand, some social scientists

    • 01:58

      DR. SUSAN HALFORD [continued]: have been very excited about the data. Bruno Latour, for instance, famously said, it's as if the inner worlds of individuals have been prized open for social scientists to research. Whereas other people have said that really these data are quite insignificant and ephemeral compared with some of the usual forms of data and the usual kinds of questions

    • 02:19

      DR. SUSAN HALFORD [continued]: that social scientists are used to asking. So what's particularly exciting about social media data for social research is this, that they promise to tell us something about the social at scale. It's a much bigger scale than we're used to having, usually. But also, something about the social as it emerges over time. Social media data give us the digital traces of the things that people say and do every day, rather than

    • 02:41

      DR. SUSAN HALFORD [continued]: the things they say that they do in our questionnaires and surveys when we come along and ask them later. Social media data also, of course, are digital data. So they allow us to interrogate those data at scale and very quickly using computational techniques. Existing research on social media, and in particular on Twitter, tends

    • 03:02

      DR. SUSAN HALFORD [continued]: to use one of two methods. It tends to take samples of the data, so deciding in advance who the interesting actors are, and taking a sample of those actors. And it tends to use small scale content analysis looking at a number of tweets or a number of users, and using linguistic or conversational techniques or discourse analysis to explore what's going on in those tweets.

    • 03:23

      DR. SUSAN HALFORD [continued]: And those are both perfectly good methods depending on what kind of research questions you have. But those methods don't allow you to look at the data at scale. And they don't allow you to trace the data as it emerges over time to see which the important actors are, and what kind of information is flowing across the network in real time. Another group of academics who have

    • 03:44

      DR. SUSAN HALFORD [continued]: some interesting techniques for social media analytics, and who are very interested in social media is computer scientists. And unlike social scientists, they do have large scale data mining and data modeling techniques which might be useful. But the problem from this side of the disciplinary divide is that these techniques are not underpinned by social science

    • 04:06

      DR. SUSAN HALFORD [continued]: research questions. They have no theoretical grounding to them. And often they're used in ways that lack systematic interpretation based in the social sciences. So for instance, we see computational studies of social media data which make claims about identity. And yet for those of us who are sociologists or psychologists,

    • 04:30

      DR. SUSAN HALFORD [continued]: we know that identity is a really deeply contested concept. Lots of arguments. Is identity something you're born with? Is identity something that you acquire? Is identity individual or is it social or group based? Do we have one identity or many identities? These are questions that sociologists and psychologists have been concerned with for many, many years,

    • 04:50

      DR. SUSAN HALFORD [continued]: but which don't make it into most of the computational analyses of social media data. [Collaborating Computational and Social Science Methods for Analyzing Twitter] So to try and address this, working together across the social and the computational sciences, we built a new analytics tool that we can use for working with Twitter data.

    • 05:11

      DR. SUSAN HALFORD [continued]: And the tool that we built had three key methodological principles. So the first principle was start with the whole network. Which means that we don't select in advance what we think the important information or the important actors are. The second principle was that the tool should be dynamic, so it should allow us to capture the flow of information

    • 05:31

      DR. SUSAN HALFORD [continued]: and the emergent roles of actors over time, rather than taking snapshots of the network, which is the method most commonly used in social science research. And the third principle that we built into the tool was trying to overcome the division between qualitative and quantitative research. So computer scientists are very good at modeling networks

    • 05:53

      DR. SUSAN HALFORD [continued]: at scale, doing quantitative analysis and producing quantitative metrics. And social scientists are particularly good at the qualitative aspect of social media research and looking into depth at what the actual content of social media is. The tool that we decided to build had to overcome that binary and be

    • 06:14

      DR. SUSAN HALFORD [continued]: able to do both the quantitative analysis and the qualitative analysis. [The Case of the 2011 Student Protest] To show you what our tool does and the kinds of sociological insights that we can gather from it, I'm going to give you an example using a particular case study from Twitter. This is using a hashtag from Twitter which was the hashtag developed around the student

    • 06:36

      DR. SUSAN HALFORD [continued]: protests in 2011 where there was riots in London and some public order issues around those protests. We collected all the tweets that were made using the hashtag of the student protest for over a period of around three weeks, after which the hashtag died away.

    • 06:57

      DR. SUSAN HALFORD [continued]: And the total volume of tweets is around 12,000 tweets from about 5,000 discrete users, 5,000 different users. Our tool, which is called Flow140, allows us to model the flow of information between users as that took place in real time over time

    • 07:18

      DR. SUSAN HALFORD [continued]: during the days before, during, and after the student protests. So now I'm going to tell you a bit more about Flow140 and show you how the method works with the student fees hashtag and the collection of tweets that we made using that hashtag during the period around the protest. So, to give you an example of what a network graph looks

    • 07:40

      DR. SUSAN HALFORD [continued]: like using Flow140, this is a real example using the student fees protest hashtag. And this shows the evolution of the network over time. The network, in this case, is a re-tweet network. So this graph is showing you individual users that are represented by the circular dots in the graph. The size of the dot represents how many times

    • 08:02

      DR. SUSAN HALFORD [continued]: the user has been-- or that particular tweet has been re-tweeted. And the lines between the users show which users have been doing the re-tweets. And you can see, of course, that some tweets have been re-tweeted many times and are picked up by many different users over time. [Analyzing Tweets During the 2011 Student Protest Using Flow140] What I want to do now is to look at some of the analysis of the network graph that we did,

    • 08:25

      DR. SUSAN HALFORD [continued]: and to explore some of the findings that we can make about the student fees protest, and how it's reflected on Twitter, that Flow140, our method, allows us to explore that we wouldn't have found if we'd been using conventional methods for social media analytics. Looking at the network graph, you'll see that there's some different colored nodes, that the dots are represented

    • 08:46

      DR. SUSAN HALFORD [continued]: in different colors. And turning first to the blue nodes. Blue nodes represent the amplifiers. Now these are users who are not necessarily tweeting anything themselves in the first instance. But they're very quick to pick up the tweets that other people have made and to re-tweet them. They're the first re-tweeters and they're pushing information out across the network, sometimes

    • 09:07

      DR. SUSAN HALFORD [continued]: using other hashtags to try and interest other groups of people in the tweets that they are re-tweeting. And sometimes embedding URLs into the tweets, embedding website addresses so that the people who see the tweets are also directed to other sources of information, whether that's about the protests or not. They piggyback the URLs onto the original tweets.
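
The re-tweet network and the amplifier role described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Flow140's actual implementation: the record layout, the function names `build_retweet_network` and `first_retweeters`, and the sample data are all hypothetical, and "amplifier" is approximated here simply as "earliest re-tweeter of a tweet".

```python
from collections import Counter, defaultdict

# Hypothetical records: (seconds since collection start, re-tweeting user,
# original author, tweet id). None of this is real protest data.
retweets = [
    (10, "bea", "ali", "t1"),
    (25, "cal", "ali", "t1"),
    (40, "dee", "ali", "t1"),
    (15, "bea", "eve", "t2"),
    (90, "cal", "eve", "t2"),
]


def build_retweet_network(records):
    """Return directed (re-tweeter, author) edge weights and per-tweet re-tweet counts."""
    edges = defaultdict(int)
    counts = Counter()
    for _, retweeter, author, tweet_id in records:
        edges[(retweeter, author)] += 1  # a line in the graph: who re-tweeted whom
        counts[tweet_id] += 1            # the dot size: how often a tweet was re-tweeted
    return dict(edges), counts


def first_retweeters(records):
    """Count how often each user was the earliest re-tweeter of a tweet."""
    earliest = {}
    for _, retweeter, _, tweet_id in sorted(records):  # sorted by timestamp first
        earliest.setdefault(tweet_id, retweeter)
    return Counter(earliest.values())


edges, counts = build_retweet_network(retweets)
amplifiers = first_retweeters(retweets)
```

In this toy data, `counts` would drive the dot sizes and `amplifiers` would flag `bea` as the user who most often re-tweets first. A real analysis would need a more careful definition of "quick", for example a time window after the original tweet.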

    • 09:30

      DR. SUSAN HALFORD [continued]: The second category that emerges, the second role that emerges over time as the network grows, is that of the aggregators. And the aggregators are shown using the yellow nodes in the network graph. The aggregators are people who draw together different networks of information or different communities who are tweeting within the network. And if you look at the graph, you'll

    • 09:51

      DR. SUSAN HALFORD [continued]: see that right at the beginning, we get some very distinct communities that are not connected at all. You'll see different bursts of activity, but they're not connected to each other. What the aggregators do is draw together the tweets, so that you're putting together perhaps a whole series of communities re-tweeting around a BBC or around a mass media reporting of the protest,

    • 10:13

      DR. SUSAN HALFORD [continued]: with the student organizers themselves, who are re-tweeting each other over here. Or with the arguments about the people tweeting around the police. And you might have three distinct communities. But the aggregators are pulling them together by joining the tweets together. So the second point that we can see in this graph is about the information that is flowing.

    • 10:34

      DR. SUSAN HALFORD [continued]: And whose information is flowing the most widely. So what we see from this graph is that actually, it's only a very small number of people who are very widely re-tweeted. This is a very common phenomenon well known to computer scientists, sometimes referred to as the long tail. But in a more sociological sense, what we see is that only a small number of voices

    • 10:55

      DR. SUSAN HALFORD [continued]: were being very widely taken notice of and re-tweeted across the network. Most people were never re-tweeted at all. And of those who were re-tweeted, most were only re-tweeted once or twice. We can focus down onto a group of around eight users who were particularly highly re-tweeted during the student fees protest.
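
The long-tail pattern described above (most users never re-tweeted, a handful re-tweeted very widely) can be summarised with a few simple proportions. The following is a minimal sketch with made-up data; the function name `retweet_concentration` and its `top_k` parameter are hypothetical and are not taken from Flow140.

```python
from collections import Counter


def retweet_concentration(users, retweeted_authors, top_k=8):
    """Summarise how unevenly re-tweets are spread across users.

    users: every user who tweeted with the hashtag.
    retweeted_authors: one entry per re-tweet, naming the original author.
    """
    counts = Counter(retweeted_authors)
    never = sum(1 for u in users if counts[u] == 0)
    once_or_twice = sum(1 for u in users if 1 <= counts[u] <= 2)
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(top_k))
    return {
        "never_retweeted": never / len(users),
        "once_or_twice": once_or_twice / len(users),
        "top_k_share": top / total if total else 0.0,
    }


# Toy example: five users, nine re-tweets, six of them of user "a".
users = ["a", "b", "c", "d", "e"]
retweeted_authors = ["a"] * 6 + ["b", "b", "c"]
summary = retweet_concentration(users, retweeted_authors, top_k=1)
```

With `top_k=8` on a full collection, `top_k_share` would indicate how much of all re-tweeting is accounted for by a small group of the kind the talk mentions.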

    • 11:15

      DR. SUSAN HALFORD [continued]: The really interesting thing about this group that Flow140 shows us, that none of the other methods would have got at all, is that while several of them are, if you like, the usual suspects, so they're either the BBC, or they are the leaders of the student protest, who you might well have decided to sample in advance. You might have guessed that these would be the important actors. A significant number of them-- three or four of them--

    • 11:36

      DR. SUSAN HALFORD [continued]: and in fact, including the most highly tweeted individual, are completely unknown in that respect. They're not users that you could have guessed or discovered using any well known sampling strategy. The only way you would have discovered these users and their role in the network would have been by tracking the total flow of information over time, which is what we've been doing with this method.

    • 11:57

      DR. SUSAN HALFORD [continued]: The second thing that we can say about the information using this method, and drilling down from the macro level of the whole network to the micro level of the content of the tweets, is we can see that what's being re-tweeted over time changes. So in the days leading up to the protests, we have a lot of organizational issues. We have some quite mainstream political arguments about what

    • 12:21

      DR. SUSAN HALFORD [continued]: the protest is about. During the course of the protest, that changes really dramatically. And the most dominant flow of information is around policing, police tactics, and around alleged forms of police brutality. In quite a political and resistant way. So the most highly re-tweeted tweet across the whole piece is from one of those unknown users, which

    • 12:42

      DR. SUSAN HALFORD [continued]: says something along the lines of, I was warned not to re-tweet this photograph. And it's a photograph of the police involved in kettling the students. The final thing that we can tell from this re-tweet chain is how long information stays in the network. So some tweets are re-tweeted a certain number

    • 13:02

      DR. SUSAN HALFORD [continued]: of times in a few minutes, and then they disappear completely. Other tweets keep being re-tweeted for hours, or even days, or even weeks. So as well as looking at how often something's re-tweeted, we can see the longevity of the tweet, how long it stays in the public sphere, and how long it generates public debate. [Conclusion]
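
The longevity measure described just before the conclusion can be sketched as the time between a tweet's first and last re-tweet. This is an assumption made for illustration: the talk does not say how Flow140 measures longevity (it could equally be counted from the original posting time), and the function name `retweet_longevity` and the sample timestamps are hypothetical.

```python
def retweet_longevity(records):
    """Map each tweet id to the seconds between its first and last re-tweet.

    records: iterable of (timestamp in seconds, tweet id) pairs, in any order.
    """
    first, last = {}, {}
    for ts, tweet_id in records:
        first[tweet_id] = min(ts, first.get(tweet_id, ts))
        last[tweet_id] = max(ts, last.get(tweet_id, ts))
    return {tweet_id: last[tweet_id] - first[tweet_id] for tweet_id in first}


# Toy example: "t1" burns out within five minutes, "t2" is still
# being re-tweeted a day after its first re-tweet.
records = [(0, "t1"), (120, "t1"), (300, "t1"), (50, "t2"), (86450, "t2")]
longevity = retweet_longevity(records)
```

Comparing this value with the raw re-tweet count separates tweets that spike briefly from those that stay in circulation for days.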

    • 13:23

      DR. SUSAN HALFORD [continued]: So in conclusion, the case study that we've offered using Flow140 in the Twitter analytics is designed to show that the methods that we use to work with big data really matter. We need to think very carefully about what kinds of methods we use and what kinds of questions we can ask with those methods and what kinds of results we will get.

    • 13:44

      DR. SUSAN HALFORD [continued]: So using Flow140, we found some results that we wouldn't have found if we'd used other methods. And in particular, these pay attention to the emergent role of actors, the emergent role of information, that we can't necessarily predict in advance who the important actors or important information is going to be.

    • 14:04

      DR. SUSAN HALFORD [continued]: And in particular, it draws attention to two things. One is the temporal nature of Twitter networks. That these things change over time and that taking snapshots will give us a different result to looking at, even if it's sped up, a real time following of a network. And the second thing is the importance of looking at both the quantitative large scale

    • 14:26

      DR. SUSAN HALFORD [continued]: network, and being able to drill down and look qualitatively at what's going on, what that means, and what the content of those information flows is. So lastly, I'd just like to offer two points for you to think about a bit more. The first point is that of course, this is just the beginning.

    • 14:47

      DR. SUSAN HALFORD [continued]: There's lots of other things we could do with Twitter. Those of you who use Twitter will know about followers. And one of the things that we could do is look at the relationship between followers and tweets and re-tweets. Do people get more followers when they tweet, or re-tweet? Who are the followers? Are the followers the people who are doing the re-tweeting? Or is it separate in other individuals

    • 15:09

      DR. SUSAN HALFORD [continued]: who just happen to be on the protest? So if we're interested, for example, in political protest, or in friendship, or in social groups, there's a lot of other things that we could do. And Flow140 just scratches the surface at the moment in terms of investigating those networks. We might also want to think about the relationship between the online and the offline.

    • 15:32

      DR. SUSAN HALFORD [continued]: So of course, Twitter is just one social network. But I mentioned that users are linking into websites. What do people say on Twitter? And how does that compare to Facebook or what they put on public websites? Or what they post on Instagram? We don't live our social media lives just on one platform. We live across multiple platforms.

    • 15:52

      DR. SUSAN HALFORD [continued]: And we need analytics that can start to think about what's going on there. But we also need to think about what's happening offline. So we don't put everything on social media. And we don't necessarily know what people mean or what they do once they've posted something on social media. So it's really important that we do research which actually follows people offline,

    • 16:13

      DR. SUSAN HALFORD [continued]: and looks at how social media are used, and what the meaning and practices of social media are, as well as the digital traces that are produced by those social media. And there's one final point that I want to make. Which is, it's critical that we understand that the data produced by social media are technically produced.

    • 16:35

      DR. SUSAN HALFORD [continued]: So sometimes people talk about these as naturally occurring data, which is not the case. These are data that are generated within particular social media platforms. Twitter only allows you 140 characters. It only allows you to re-tweet or direct message or mention somebody. So the technology itself is shaping the kind

    • 16:56

      DR. SUSAN HALFORD [continued]: of data that are produced. And we need to think about that, just as the methods that we use are shaping the kinds of data that are produced. And network analytics are a particular kind of method which produce a particular kind of data. So as social scientists and critical methodologists, it's really important that we think

    • 17:17

      DR. SUSAN HALFORD [continued]: about how technology has shaped the data and shaped the methods.

Abstract

Professor Susan Halford discusses the opportunity that online communication offers to researchers. She explains problems with current internet research, then describes a new tool her team has designed to aid collaboration between computer science and social science researchers.

SAGE Video Cases