The SAGE Handbook of Online Research Methods
This handbook is the first to provide comprehensive, up-to-the-minute coverage of contemporary and developing Internet and online social research methods, spanning both quantitative and qualitative research applications. The editors have brought together leading names in the field of online research to give thoroughly up-to-date, practical coverage, richly illustrated with examples. The chapters cover both methodological and procedural themes, offering readers a sophisticated treatment of the practice and uses of Internet and online research that is grounded in the principles of research methodology. Beginning with an examination of the significance of the Internet as a research medium, the book goes on to cover research design, data capture, online surveys, virtual ethnography, and the Internet as an archival resource, and concludes by looking at ...
- Front Matter
- Back Matter
- Subject Index
- The Internet as a Research Medium: An Editorial Introduction to The Sage Handbook of Online Research Methods
- The Ethics of Internet Research
- Understanding and Managing Legal Issues in Internet Research
- Research Design and Tools for Internet Research
- General Approaches to Data Quality and Internet-generated Data
- Middleware for Distributed Data Management
- Distilling Digital Traces: Computational Social Science Approaches to Studying the Internet
- Analyzing Social Networks via the Internet
- Nonreactive Data Collection on the Internet
- Overview: Online Surveys
- Sampling Methods for Web and E-mail Surveys
- Internet Survey Design
- Internet Survey Software Tools
- Virtual Ethnography: Modes, Varieties, Affordances
- Internet-based Interviewing
- Online Focus Groups
- Fieldnotes in Public: Using Blogs for Research
- Research Uses of Multi-user Virtual Environments
- Distributed Video Analysis in Social Research
- The Provision of Access to Quantitative Data for Secondary Analysis
- Secondary Qualitative Analysis Using Internet Resources
- Finding and Investigating Geographical Data Online
- Data Mining, Statistical Data Analysis, or Advanced Analytics: Methodology, Implementation, and Applied Techniques
- Artificial Intelligence and the Internet
- Longitudinal Statistical Modelling on the Grid
- Qualitative e-Social Science/Cyber-Research
- New Cartographies of ‘Knowing Capitalism’ and the Changing Jurisdictions of Empirical Sociology
- The Internet and the Future of Social Science Research
- Online Research Methods and Social Theory
Editorial arrangement © Nigel Fielding, Raymond M. Lee, Grant Blank 2008
Chapter 1 © Raymond M. Lee, Nigel Fielding, Grant Blank 2008
Chapter 2 © Rebecca Eynon, Jenny Fry, Ralph Schroeder 2008
Chapter 3 © Andrew Charlesworth 2008
Chapter 4 © Claire Hewson, Dianna Laurent 2008
Chapter 5 © Karsten Boye Rasmussen 2008
Chapter 6 © Alvaro A.A. Fernandes 2008
Chapter 7 © Howard T. Welser, Marc Smith, Danyel Fisher, Eric Gleave 2008
Chapter 8 © Bernie Hogan 2008
Chapter 9 © Dietmar Janetzko 2008
Chapter 10 © Vasja Vehovar, Katja Lozar Manfreda 2008
Chapter 11 © Ronald D. Fricker Jr 2008
Chapter 12 © Samuel J. Best, Brian S. Krueger 2008
Chapter 13 © Lars Kaczmirek 2008
Chapter 14 © Christine Hine 2008
Chapter 15 © Henrietta O'Connor, Clare Madge, Robert Shaw, Jane Wellens 2008
Chapter 16 © Ted Gaiser 2008
Chapter 17 © Nina Wakeford, Kris Cohen 2008
Chapter 18 © Ralph Schroeder, Jeremy Bailenson 2008
Chapter 19 © Jon Hindmarsh 2008
Chapter 20 © Keith Cole, Jo Wathan, Louise Corti 2008
Chapter 21 © Patrick Carmichael 2008
Chapter 22 © David Martin, Samantha Cockings, Samuel Leung 2008
Chapter 23 © Bert Little, Michael Schucking 2008
Chapter 24 © Ed Brent 2008
Chapter 25 © Rob Crouchley, Rob Allan 2008
Chapter 26 © Nigel Fielding, Raymond M. Lee 2008
Chapter 27 © Michael Hardey, Roger Burrows 2008
Chapter 28 © Michael Fischer, Stephen Lyon, David Zeitlyn 2008
Chapter 29 © Grant Blank 2008
First Published 2008
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
SAGE Publications Ltd
1 Oliver's Yard
55 City Road
London EC1Y 1SP
SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320
SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, Post Bag 7
New Delhi 110 044
SAGE Publications Asia-Pacific Pte Ltd
33 Pekin Street #02-01
Far East Square
Library of Congress Control Number: 2007941069
British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library
Typeset by CEPHA Imaging Pvt. Ltd., Bangalore, India
Printed in Great Britain by The Cromwell Press, Trowbridge, Wiltshire
Printed on paper from sustainable resources
Review Panel
Bill Dutton, Director, Oxford Internet Institute, 1 St Giles, Oxford, UK
Niall O'Dochartaigh, National University of Ireland, Galway, Republic of Ireland
Barry Wellman, Department of Sociology, University of Toronto, Toronto, Canada
Dave Garson, College of Humanities and Social Sciences, North Carolina State University, Raleigh, North Carolina, USA
Ron Anderson, Professor Emeritus, University of Minnesota, Minnesota, USA
Karl M. Van Meter, LASMAS/CNRS, Paris, France
Alan Bryman, Management Centre, University of Leicester, University Road, Leicester, UK
Nina Wakeford, Department of Sociology, Goldsmith's College, University of London, UK
Michael D. Fischer, Professor of Anthropological Sciences, Director, Centre for Social Anthropology and Computing, Department of Anthropology, University of Kent at Canterbury, Canterbury, UK
List of Contributors
Rob Allan is leader of the Grid Technology Group at STFC Daresbury Lab. Rob is a Chartered Physicist with a background in high-performance computational modelling. More recently he has developed a practical knowledge of Grid and Web services middleware and helped to set up STFC's e-Science Centre and the UK's Grid Support Centre. Rob is currently acting as e-Research representative on the Joint Information Systems Committee Working Group for the international e-Framework for Education and Research and is chair of the Operations Board of the NW-GRID. He is also technical architect and deputy programme project manager for the STFC Facilities e-Science Programme and e-Infrastructure Deployment Project at the Diamond Light Source. His group is also contributing to deployment of the UK e-Infrastructure for e-Social Science. To see Rob's recent publications go to http://epubs.cclrc.ac.uk/search?st=browse-by-person&pid=1338.
Jeremy Bailenson earned a PhD in cognitive psychology from Northwestern University in 1999. After receiving his doctorate, he spent four years at the Research Center for Virtual Environments and Behavior at the University of California, Santa Barbara as a Post-Doctoral Fellow and then as Assistant Research Professor. He currently is the director of Stanford's Virtual Human Interaction Lab, where his main area of interest is the phenomenon of digital human representation, especially in the context of immersive virtual reality.
Samuel J. Best, PhD, is Director of the Center for Survey Research and Analysis, University of Connecticut. He has conducted and overseen numerous surveys both in his own research and on behalf of the Center, and is co-author with Brian S. Krueger of Internet Data Collection, a title in the Sage Quantitative Applications in the Social Sciences series.
Grant Blank is with Applied Social Research Associates, Bethesda, Maryland, and has previously held academic positions at American University in Washington DC. His previous books include New Technology in Sociology (1989), and Critics, Ratings and Society (2006).
Edward Brent is Professor of Sociology at the University of Missouri and president of Idea Works, Inc, a software company specialising in software using artificial intelligence strategies for teaching and research. He is the Associate Editor for Sociology of the Social Science Computer Review. Recent publications include SAGrader™, Qualrus™ and coursepacks for SAGrader in introductory sociology, social research methods and social psychology.
Roger Burrows is Professor of Sociology in the Department of Sociology at the University of York and the co-Director of the Social Informatics Research Unit (SIRU). He has research interests in: urban and housing studies; the sociology of health and illness; and social informatics. He was the Coordinator of the UK ESRC e-Society Research Programme 2005-2007.
Patrick Carmichael has a PhD in science education from Leeds University and worked for many years in UK primary, secondary and special schools before moving into higher education. He is currently Senior Research Associate at the Centre for Research into Education Technologies at the University of Cambridge where he directs research programmes in technology-enhanced learning and technology-enhanced research. His particular interests are the role of technology in social science, particularly in participatory approaches to research and evaluation; the nature, representation and transfer of professional expertise across disciplinary contexts; and the role of new technologies in developing ideas and strategies for the preservation of collective memory.
Andrew Charlesworth is Senior Research Fellow in IT and Law, and Director of the Centre for IT and Law (CITL) at the University of Bristol, where he holds a joint post in the Law School and the Department of Computer Science. He has been researching and writing on legal issues arising from the use of ICTs in teaching and research for over a decade, and has presented on a range of IT and E-commerce Law topics at conferences and seminars in Europe, North America, the Middle East and Australia.
Samantha Cockings is Lecturer in Geography at the University of Southampton. She has extensive experience in research and teaching concerned with the collation, manipulation, analysis and presentation of spatially referenced population-related data. She has provided specialist advice on the handling of geographically referenced data to a range of researchers including epidemiologists, statisticians, environmental scientists, clinical medics and public health specialists and has also coordinated various activities aimed at improving awareness of georeferencing issues and skills amongst academic and non-academic researchers and practitioners working in the health field. She is a past Secretary/Treasurer of the Royal Geographical Society (with the Institute of British Geographers) Geography of Health Research Group.
Kris Cohen is a PhD student in Art History at the University of Chicago. His work focuses on how contemporary images and image-making practices engender feelings of belonging and un-belonging, how images become sites for attachment and gathering, and how images have become scenes in which normative personhood is both produced and interrupted. While working as a research fellow at INCITE, Kris completed a one-year ESRC project about ordinary photography and the Internet, entitled ‘Photos Leave Home’. He has also written about contemporary arts in the UK.
Keith Cole is Director of Mimas, a Joint Information Systems Committee and Economic and Social Research Council-funded national data centre based at the University of Manchester. Mimas provides the UK academic community with direct access to resource discovery tools, research data, learning and teaching materials and electronic journals covering a wide range of subject areas. Mimas hosts a number of socio-economic data services, which provide access to and support for aggregate statistics from the UK Censuses of Population and key international macro datasets. Mimas is also engaged in research and development particularly in the areas of e-Science, electronic publishing, metadata standards, digital preservation and authentication.
Louise Corti is Associate Director at the UK Data Archive at the University of Essex, where she directs the sections of ESDS Qualidata, Outreach and Promotion and Acquisitions. In the past she has taught sociology, social research methods and statistics, and has undertaken research in the fields of education and health. She worked on design, implementation and analysis of the British Household Panel Study for six years before joining the newly founded Qualidata Centre at Essex, as Manager, then Deputy Director, and in 2000 joining the UK Data Archive to lead the User Services division. Her current research interests include methods, technologies and standards for enhancing, sharing and reusing digital qualitative data, and the application and use of social science data in teaching and learning. Louise has authored a virtual tutorial for social research methods and published articles on the sharing and reuse of qualitative data, statistical literacy and on reusing learning materials.
Rob Crouchley is the Director of Lancaster Centre for e-Science and Professor of Applied Statistics at Lancaster University. Rob has used high performance computers for estimating multivariate random effect generalised linear models for more than 25 years. He has completed many UK Research Council and Joint Information Systems Committee-funded research contracts and published many papers as well as writing and editing books in these and related subject areas. His research interests cover advances in statistical modelling, collaboration tools, and database management both on the desktop and using grid computing systems. He is a member of the Joint Information Systems Committee for the Support of Research.
Rebecca Eynon, PhD, is a Research Fellow at the Oxford Internet Institute (OII), University of Oxford. Prior to working at the OII Rebecca worked at the Centre for Mass Communication Research, University of Leicester; the School of Education, University of Birmingham; the Department of Sociology and Cultural Studies at Birmingham and the Department of Sociology, City University. Rebecca's current research and teaching interests are focused on the use of ICTs in education – particularly in the contexts of higher education and adult learning – and the use of online methods in the social sciences.
Alvaro A.A. Fernandes, BSc, MSc, PhD, is a senior lecturer at the School of Computer Science of the University of Manchester. From 1974 to 1989, he worked as an IT professional in Brazil, in the areas of database, and database application, design methodologies. He was a research associate at Heriot-Watt University, and a lecturer in Computer Science at Goldsmiths College, University of London, where he stayed from 1996 to 1998. He has been in Manchester since 1998, a senior lecturer since 2005. His most recent research falls in the areas of adaptive, distributed query processing in advanced infrastructures (such as service-based Grids and sensor networks).
Nigel Fielding is Professor of Sociology and Associate Dean of Arts and Human Sciences at the University of Surrey. With Ray Lee, he co-directs the CAQDAS Networking Project, which provides training and support in the use of computers in qualitative data analysis. His research interests are in new technologies for social research, qualitative research methods, and mixed method research design. He has authored or edited 20 books, over 50 journal articles and over 200 other publications. In research methodology his books include a study of methodological integration (Linking Data, 1986, Sage; with Jane Fielding), an influential book on qualitative software (Using computers in qualitative research, 1991, Sage; editor, with Ray Lee), a study of the role of computer technology in qualitative research (Computer Analysis and Qualitative Research, 1998, Sage, with Ray Lee) and a four volume set, Interviewing (2002, Sage; editor). He is presently researching the application of high performance computing applications to qualitative methods.
Danyel Fisher is a researcher at Microsoft Research. His work emphasises online communication with visualization techniques, social network analysis, and qualitative and quantitative analysis. He uses these forms of analysis to design systems that improve users’ experiences with computer systems.
Michael Fischer is Professor of Anthropological Sciences at the University of Kent, Canterbury, UK. After a misspent youth in the music and computer businesses in Memphis and Austin, he became an anthropologist, doing his doctoral research in Lahore, Pakistan on arranged marriages. Following his doctorate at UT Austin, in 1985 he joined Kent as Lecturer in Anthropology and Computing. Fischer promotes the use of computers in anthropological fieldwork, with his fieldwork mostly in Pakistan and the Cook Islands. In 1986, he initiated the first interactive anthropology resource on the IPSS network, which was viewed by the Queen in 1988. Fischer is thus occasionally referred to by students at Kent as the only person at Kent who has met both the King and the Queen. In 1989, he became Director of the Centre for Social Anthropology and Computing at Kent. In April 1993, he initiated the first anthropology website, which became the Ethnographics Gallery in January 1994. In 2005 AnthroGrid was initiated in hopes it will someday become a major resource for anthropologists, perhaps through the thin veneer of AnthroSpace in coming years.
Ronald D. Fricker, Jr is an associate professor in the Operations Research Department of the US Naval Postgraduate School. He holds a PhD and an MS in statistics from Yale University, an MS in Operations Research from The George Washington University and a bachelor's degree from the United States Naval Academy. Professor Fricker's current research interests include Internet-based surveys and survey development, the evaluation of various statistical methods for biosurveillance, and the development of statistical process control methodologies. He has published in the Journal of the Royal Statistical Society, Environmental and Ecological Statistics, Journal of Quality Technology, Naval Research Logistics, Teaching Statistics and Chance Magazine.
Jenny Fry is a lecturer in the Department of Information Science, Loughborough University. Her research is concerned with the disciplinary shaping of networked digital resources and scholarly communication on the Internet. Her publications in this area include, ‘Scholarly Research and Information Practices: A Domain Analytic Approach’, Information Processing and Management, 42, pp. 299-316. She also writes about legal and ethical issues relating to the Internet and is author of the following editorial: ‘Google's Privacy Responsibilities at Home and Abroad’. Journal of Librarianship and Information Science. 38(3), pp. 135-9.
Ted J. Gaiser is an entrepreneur and adjunct faculty member in the Sociology Department at Boston College, where he received his PhD. His experience includes consulting, technology management, and leading and supporting academic online research endeavours. His research has been on online social forms and online research methods. He was one of the first to present and publish on online focus groups, earning him the Founder's Award of Merit from The Social Science Computing Association.
Eric Gleave is a sociology graduate student at the University of Washington. His research spans sociological fields concerned with early modern revolts, simulation studies of cooperation and corruption, social network methods, norms of war, formal models of religious choice, and collective action in online communities.
Michael Hardey is a reader in Sociology at Hull/York Medical School. He has previously worked at the University of Surrey, the University of Southampton and the University of Newcastle. His main research interests are in mediated information and relationships. This work falls into three broad areas: e-health and in particular the role of the Internet in shaping health beliefs and behaviours; e-body and identity (particularly the representation of the self through new media); and e-relationships (particularly the role of Internet dating and embodied relationships). He is working on a Department of Health-funded research project with Brian Loader (University of York) and Leigh Keeble (CIRA) on ‘Wired for the Third Age? Electronic Service Delivery for Older People’. He is also involved in an ESRC e-Society project ‘Sorting Places Out? Classification and its Consequences in an E-Society’ with Roger Burrows (University of York), Nicholas Gane (Brunel University), Nick Ellison (University of Durham), Simon Parker (University of York) and Brian Woods (University of York). An ongoing collaboration with the Laboratory of Computational Engineering, Helsinki University of Technology is developing methodologies to understand mediated interactions and relationships. In the medical school he teaches courses on health, the Internet, the social body and other aspects of sociology.
Claire Hewson is Lecturer in Psychology at the Open University, Milton Keynes. She received her PhD in cognitive science/psychology from the University of Edinburgh, Scotland, in 1996. After working as a research associate in the Human Communication Research Centre, University of Edinburgh, for several years, she took up a lectureship in psychology at the University of Bolton (formerly Bolton Institute), until September 2007 when she moved to the Open University. Her research interests include ‘folk psychology’, lay theories and beliefs, use of the Internet as a data gathering tool in social and behavioural research, and the use of IT in teaching and learning. She has co-authored the book Internet Research Methods (2003), and has published various articles on this topic.
Jon Hindmarsh is a senior lecturer in the Department of Management at King's College London. His research involves video-based field studies of social interaction and work practice in settings such as control centres, operating theatres, dental clinics, research labs and museums and galleries. In addition he engages in interdisciplinary research that explores the potential for field studies to contribute to the development of new technologies. His recent publications include articles in Organization Studies, The Sociological Review and The Sociological Quarterly. He co-edited Workplace Studies (Cambridge University Press, 2000) and is currently co-authoring a text on video-based methods for Sage and co-editing a collection on interactional studies of work practice for Cambridge University Press.
Christine Hine is Senior Lecturer in Sociology at the University of Surrey. Her research in sociology of science and technology focuses on the development of ethnographic methodologies for scientific and technical settings including the Internet, and on the role of information and communication technologies in science. She is the author of Virtual Ethnography (Sage, 2000) and Systematics as Cyberscience (MIT, 2008) and editor of Virtual Methods (Berg, 2005) and New Infrastructures for Knowledge Production (Idea Group, 2006). She is President of the European Association for the Study of Science and Technology (http://www.easst.net).
Bernie Hogan is a doctoral candidate in sociology and a Research Coordinator at NetLab, both at the University of Toronto. His dissertation focuses on how individuals use information technologies to maintain personal relationships. Additionally, Bernie has a keen interest in usable methodologies. He has recently published (with Juan Carrasco and Barry Wellman) articles in Field Methods and Environment and Planning B detailing a paper-based social network capture technique. He has reviewed the state of online social networks in IEEE Data Engineering Bulletin. Additionally, he is managing the development of a suite of software for personal network analysis – ‘Egotistics’.
Dietmar Janetzko received his PhD in psychology from Freiburg University, Germany, in 1996. In 2006, he acquired a PhD in learning sciences from the Technical University of Kaiserslautern, Germany. He is currently a lecturer at the School of Informatics of the National College of Ireland in Dublin, Ireland. His work is centred on Internet research, in particular on methodology and techniques of data collection, on probabilistic data analysis methods (data mining) and on tutorial dialogues.
Lars Kaczmirek graduated in psychology from the University of Mannheim. Since 2003, he has been working as a survey research methodologist at GESIS-ZUMA (Center for Survey Research and Methodology). His primary research interests are online surveys, reducing survey error, website evaluation and usability issues. He has also worked on the Web Survey Methodology Site project (http://WebSM.org).
Brian S. Krueger is an associate professor of Political Science at the University of Rhode Island. His research focuses on the potential of new technologies to alter conventional patterns of political behaviour.
Dianna Laurent, PhD, teaches in the English Department at Southeastern Louisiana University. She co-authored Internet Research Methods: A Practical Guide for the Social and Behavioural Sciences with Claire Hewson, Peter Yule and Carl Vogel. Dr Laurent publishes on a variety of subjects involving the Internet. Her most recent publication is a chapter on E-Zines, co-authored with Joe Burns, for The Handbook of Research on Computer Mediated Communication, edited by Sigrid Kelsey and Kirk St. Amant. She also assists in compiling the yearly ATTW bibliography for editor Paul R. Sawyer and is the business manager of 19th Century Studies for the Nineteenth Century Studies Association.
Raymond M. Lee is Professor of Social Research Methods in the Department of Health and Social Care at Royal Holloway University of London. He has written extensively about a range of methodological topics, including the problems and issues involved in research on ‘sensitive’ topics, research in physically dangerous environments, the role of new technologies in the research process, and the history of the interview. He coordinates the UK Economic and Social Research Council's Researcher Development Initiative, a nationwide programme to develop an advanced training infrastructure for social researchers.
Samuel Leung joined the University of Southampton as a Research Assistant in 2004. He has been working on various e-learning projects in the School of Geography. His research involves supporting the implementation of innovative approaches to learning and teaching based on the use of digital assets embedded in online learning activities. Samuel is highly skilled in authoring and transferring interoperable and reusable online learning resources. Since 2006, Samuel has been working on the ESRC-funded Geo-Refer project where he creates and develops geographical referencing resources for social science researchers making use of an online and adaptive learning environment.
Bert Little is Professor of Computer Science and Mathematics, and Executive Director of the Center for Agribusiness Excellence and Texas Data Mining Research Institute, Tarleton State University, Texas A&M University System. He has a PhD in biological anthropology and applied mathematics from the University of Texas at Austin, an MA in primatology from Ball State University, and BA in anthropology from Appalachian State University. He and Michael Schucking have collaborated for more than 25 years on various computing and simulation projects. Most recently, they have worked together on a data warehousing/data mining project for the United States Department of Agriculture Risk Management Agency since 2000 aimed at improving program integrity through combating waste, fraud and abuse. Dr Little has testified before the United States Congress (Senate and House of Representatives) several times regarding the use of data mining in improvement of governmental programs. In 2008, the USDA data mining project enters its eighth year, and has saved the government an estimated $1.5 billion in cost avoidance.
Stephen Lyon, BSc Goldsmiths’ College, London; PhD Kent, is Senior Lecturer in Anthropology at Durham University. He has conducted research in rural Pakistan on local politics, patron-client networks and cultural systems. In addition, he has collaborated on interdisciplinary projects employing e-Science methods. He is the author of Anthropological Analysis of Local Politics and Patronage in a Pakistani Village (Edwin Mellen Press, 2004). He has made extensive use of IRCT throughout his anthropological research and was one of the pioneers of the fieldwork blog in the 1990s (http://www.dur.ac.uk/s.m.lyon).
Clare Madge is a senior lecturer in Geography at the University of Leicester. Although originally a development geographer, in recent years her attention has turned to cybergeographies. Through her cybergeographies research she has been particularly interested in the development of online research methods, especially online focus groups, and has published widely on this topic. She is currently involved in the development of training in online research methods with a team based at the University of Leicester (see http://www.geog.le.ac.uk/orm/).
Katja Lozar Manfreda, PhD, is an Assistant Professor of Statistics and Social Informatics at the Faculty of Social Sciences, University of Ljubljana, Slovenia. Her research interests include survey methodology, new technologies in social science data collection and web surveys. She has been involved in WebSM site developments from its beginnings in 1998. She is also a member of the ESRA (European Survey Research Association) committee and is the secretary of RC-33 (Research committee on Logic and Methodology) of the International Sociological Association.
David Martin is Professor of Geography at the University of Southampton. The first edition of his influential text Geographic Information Systems and their Socioeconomic Applications was published in 1991. He has researched and published widely on census and health care applications of geographic information systems (GIS) and pioneered the system of output area design adopted for the 2001 census in England and Wales. He has edited texts on census data and research methods. David is Director of the Economic and Social Research Council's Census Programme and a co-Director of the National Centre for Research Methods.
Henrietta O'Connor is a lecturer in Employment Studies at the Centre for Labour Market Studies at the University of Leicester. Henrietta's interest in online research methods began in 1998 when, together with Clare Madge, she carried out online interviews with new parents. Following on from this she was a member of a team funded by the ESRC Research Methods Programme (Phase 2) to produce a website providing training in the use of online research methods (http://www.geog.le.ac.uk/orm/). Henrietta has also published widely in the field of online research methods.
Karsten Boye Rasmussen is Associate Professor at the Department of Marketing and Management at the University of Southern Denmark (SDU) in the area of organization and information technology. He coordinates the graduate programme in ‘IT, communication and organization’, teaches Business Intelligence, and does research in aspects of IT and organization, virtual organization and data mining. He qualified as a sociologist at the University of Copenhagen and, before joining SDU in 1998, worked in social science support in data archiving, specialising in social science metadata. He is the editor of the IASSIST Quarterly.
Ralph Schroeder is a research fellow at the Oxford Internet Institute at Oxford University. He is an investigator on the Oxford e-Social Science (OeSS) Project: Ethical, Legal and Institutional Dynamics of e-Sciences. His publications include Rethinking Science, Technology and Social Change (Stanford University Press, 2007), Possible Worlds: The Social Dynamic of Virtual Reality Technology (Westview, 1996) and the edited collections The Social Life of Avatars: Presence and Interaction in Shared Virtual Environments (Springer 2002) and (co-edited with Ann-Sofie Axelsson) Avatars at Work and Play: Collaboration and Interaction in Shared Virtual Environments (Springer 2005).
Michael Schucking is Technical Director of the Center for Agribusiness Excellence and Texas Data Mining Research Institute, Tarleton State University, Texas A&M University System. He has a BA in sociology and a BS in computer science with a minor in mathematics from the University of Texas at Austin, and a Post-graduate Certificate from the Software Quality Institute, College of Engineering, the University of Texas at Austin. He also did Ford Foundation sponsored post-graduate study in sociology at the Free University of Berlin. He has collaborated with Bert Little for more than 25 years on various computing and simulation projects. Most recently, they have worked together on a data warehousing/data mining project for the United States Department of Agriculture Risk Management Agency since 2000 aimed at improving program integrity through combating waste, fraud, and abuse. In 2008, the USDA data mining project enters its eighth year, and has saved the government an estimated $1.5 billion in cost avoidance.
Robert Shaw is a learning technologist in the Faculty of Health at Leeds Metropolitan University, where he provides development and training to promote and support the use of learning technologies across the Faculty. He was the Learning Technologist for the ESRC-funded Exploring Online Research Methods/TRI-ORM projects (http://www.geog.le.ac.uk/orm) involving the development, maintenance and enhancement of the website, the production and conversion of training materials for online delivery, and the authoring of guidance and training materials on the technical aspects of carrying out online research.
Marc Smith is a research sociologist at Microsoft Research specialising in the social organization of online communities. His research focuses on the ways group dynamics change when they take place in social cyberspaces. His goal is to visualize these social cyberspaces, mapping and measuring their structure, dynamics and life cycles. He has developed the ‘Netscan’ engine that facilitates Usenet research.
Vasja Vehovar, PhD, is a full Professor of Statistics at the Faculty of Social Sciences, University of Ljubljana, Slovenia. He teaches courses on Sampling, Survey Methodology, and Information Society. Since 1996 he has been the principal investigator of the national project Research on Internet in Slovenia (RIS), the leading source for information society research in Slovenia. He is also involved in various EU information society research projects. He is responsible for the development of the WebSM portal devoted to web survey methodology and was the coordinator of the corresponding EU Framework project. His research interests range from survey methodology to information society issues.
Nina Wakeford is Reader in Sociology at Goldsmiths, University of London. Her research interests are the sociology of technology and design, as well as interdisciplinary collaborations between sociologists, artists and designers. Her first work in Internet studies was a project investigating the construction of identity in a listserv women's community. She has also researched Internet cafes, mobile phone culture, new media companies and the use of social research by designers of new technologies. She currently holds an Economic and Social Research Council Fellowship and is investigating the translations between ethnography and design, visualizations of ‘user experience’, and the development of new methodologies for sociology drawing on art and design practice.
Jo Wathan is a research fellow at the Cathie Marsh Centre, University of Manchester, UK. Most of her time is divided between two national data support services: the ‘Government’ function of the Economic and Social Data Service (ESDS), and the UK Census Samples of Anonymised Records (SARs). She also teaches secondary analysis and statistics software. Previously she has produced learning and teaching materials with the SARs, worked on household classifications for the UK Census 2001 and undertaken spells of research in the public and voluntary sectors. She has a substantive interest in family effects on employment participation.
Jane Wellens is an educational developer at the University of Leicester where she has responsibility for supporting the initial and continuing professional development of research staff. She has been involved in two UK Economic and Social Research Council funded projects to develop support materials and training resources for researchers interested in using online research methods. Jane has also used online questionnaires extensively for research projects exploring different aspects of university students’ learning experiences.
Howard T. Welser is an assistant professor of Sociology at Ohio University. His research investigates how micro-level processes generate collective outcomes, with application to status achievement in avocations, development of institutions and social roles, the emergence of cooperation, and network structure in computer mediated interaction.
David Zeitlyn is a social anthropologist who has been conducting field research with the Mambila in Cameroon and Nigeria regularly since 1985. He has been a pioneer of using the Internet to distribute research results (making the first sound recordings of a non-European language available online in 1993). He is also the Honorary Editor of Anthropological Index Online and has undertaken research in the UK on how researchers use bibliographic databases. A member of the Research Resources Board of the Economic and Social Research Council, he has been involved in e-Social Science since its inception.
Glossary of Key Terms
- Access Grid: See ‘e-Social Science’.
- Algorithm: A finite list of well-defined instructions for accomplishing some task that, given an initial state, will proceed through a well-defined series of successive states, possibly culminating in an end state.
- Application Programming Interface (‘API’): A source code interface that a computer application, operating system or ‘library’ (in computer science, a collection of subprograms used to develop software) provides to support requests for services to be made of it by a computer program. One function of such high-level interfaces is to interact with a database that renders HTML (see separate entry) code. Such interfaces enable other computer applications to interact with survey software (or other kinds of software).
- Arc: A directed relationship between nodes in a network. See also ‘edge’, ‘tie’.
- Artificial Intelligence (‘AI’): The scientific understanding of the mechanisms underlying thought and intelligent behaviour, and their embodiment in machines.
- Authentication: Generic term for a set of procedures for determining that a user has rights to receive a given online service, such as access to an archived database.
- Avatar: A representation of a human, animal or other animate object enabling the representation's participation in some form of online interaction.
- Beeper studies: Experiential time-sampling research whereby participants report by various means their activities in progress at the time a signal is activated by a device carried on or about the person. Responses were originally entered on a paper instrument, but more recently include online response modes.
- Beyond 20/20: A software application enabling exploration of online datasets, supporting display, subsetting, visualisation, charting and downloading.
- Blog: A diary-like genre in which the ‘blogger’ records and/or comments on their own activity/beliefs and/or that of others, often including perspectives on current events, posting the ‘blog’ on the Web. May include audio and visual information as well as text, and the opportunity to comment on what is posted.
- Computer Assisted Qualitative Data Analysis (‘CAQDAS’): Software for the analysis of qualitative data, chiefly text, but also audio, video and still images.
- Cave: An immersive virtual environment in which interaction initiated by users occurs between figurative representations of interactants projected on walls of a room and gives the illusion of 3D. See also ‘Head-mounted display’.
- Chatroom: An online communications environment facilitating discussion between subscribed members.
- Clickstream analysis: Analysis of how users negotiate a path around a website.
- Client-side: Computer resources such as programs or information that are held on the user's computer rather than on or from the server to which the computer is linked. See also ‘server-side’.
- Collaboratory, collaboratories: Distributed research groups that work together via online technologies enabling data exchange, communication and real-time collaboration over data transmitted across networks.
- Common Gateway Interface (‘CGI’): A scripting language that allows various commands to be executed by the researcher's web server based on the actions of the respondent.
- Common Logfile Format (‘CLF’): Hits defined as web elements transferred from a server to the user's browser. Common ‘log fields’ include the user's IP address, timestamp from the server, request for the element or web page, the status of the request, and the number of bytes transferred.
- Computational grid: See ‘e-Social Science’.
- Computer-Supported Cooperative Work: A field of social and behavioural science concerned with the ways in which people apply and relate to information technologies when they are mutually engaged in tasks using those technologies.
- Cookie: An automated code enabling the unique identification of browsers and users’ hypertext pathways.
- Co-presence: Interaction between social actors that takes place in the same physical space, rather than being computer-mediated.
- Coverage error: When some part of a relevant population cannot be included in a survey sample.
- Data controller: Those who are responsible for processing ‘personal data’, information that is held about an identifiable living person. See also ‘Data subject’, an associated legal term.
- Data grid: See ‘e-Social Science’.
- Data integration: A computational process enabling the linking together of different datasets.
- Data mining: A set of procedures, such as clustering and pattern-recognition algorithms, that search large datasets for patterns. It is usually atheoretical, using unsupervised learning and identifying patterns in data and summarising them without reference to a conceptual or theoretical organising framework.
- Data poisoning: Providing wrong or misleading information. A main type is subject fraud, especially where a survey instrument is forwarded to individuals outside the intended sample.
- Data subject: Those who are the subject of personal data and enjoy specified rights in respect of such data. See also ‘Data controller’.
- Digital curation: Preparation of data in ‘future proof’ formats, assigning permanent identifiers to documents and ‘mirroring’ archives across multiple sites.
- Digital Rights Management: Professional and legal regulation of legitimate access to, and use of, digital resources.
- Digital trace: Indicator of human activity created in the course of online interaction, e.g., patterns of search behaviour apparent from web log files (see ‘web log file’).
- Documentality: Extent to which the data used in a research study are recorded and available post hoc, ideally including a description of the research design and how data collection proceeded in practice as well as the characteristics of the data.
- Drop-out: Withdrawal of research subjects from participation in an online research study, especially Internet surveys.
- Edge: An undirected relationship between nodes in a network. See also ‘arc’, ‘tie’.
- Emoticon: A figurative representation formed only by using characters available on a QWERTY keyboard. The most common is the ‘smiley’ [:-)], which celebrated its twenty-fifth birthday in late 2007.
- Encryption: Procedures for coding data in transit so that only those authorised with rights to see and use the data may do so.
- End User License: Conditions and rights associated with the use of online datasets and other resources.
- Entertainment poll: Surveys conducted for their amusement value. These have proliferated on the Internet, where they largely consist of websites where any visitor can respond to a posted survey. As unscientific as are telephone call-in polls.
- e-Social Science: A range of computational resources and procedures using Grid and High Performance Computing to facilitate social science research, comprising the Access Grid (support for using online video teleconferencing), Computational Grid (support for computation of very large and/or complex requirements) and Data Grid (support for discovery, collation and transfer of distributed datasets). In natural science, the equivalent term is ‘e-Science’. The term ‘e-Research’ is also in use as a generic alternative to subject-specific terminology. In the US, the term in use is ‘cyber-research’. These terms are also used to identify policies and programmes promoted by research funding organisations, such as the US National Science Foundation and the UK Research Councils. See also ‘Grid’.
- Expert systems: A sub-field of ‘artificial intelligence’ (see separate entry) that attempts to enable computers to perform a task as well as human experts by using an ‘ontology’ (see separate entry) for a substantive domain to reason about it.
- Extensible Mark-up Language (‘XML’): A flexible text format that is used for data exchange. This general purpose markup language is designed to be readable by humans, while also providing metadata tags for content that can be easily recognised by computers.
- File Transfer Protocol: A protocol enabling computers and servers to transmit data across networks.
- Firewall: A means of providing security of Internet user accounts. There are both hardware and software firewalls. None is 100% effective against hackers (those seeking illegitimate access to users’ accounts).
- Folksonomy: Also called ‘collaborative tagging’ or ‘social tagging’. The practice and method of collaboratively creating and managing tags to annotate and categorise content. Usually, freely chosen keywords are used rather than a controlled vocabulary.
- Geographical Information System: Software handling geographical information and its visual representation. See also ‘raster data’ and ‘vector data’.
- Granularity: The fineness or coarseness of the detail available from a given data source.
- Grid: A distributed computing infrastructure that combines parallel and distributed computer platforms to enable computational operations exceeding the capacities of individual desktop computers.
- Grid-enabling: Adapting a dataset to make it accessible programmatically over the Grid.
- Harvested e-mail: Sets of e-mail addresses collected from postings on the web and from individuals knowingly or not knowingly solicited for their e-mail address.
- Head-mounted Display (‘HMD’): An immersive VRE in which the environment is displayed in 3D glasses. See also ‘Cave’.
- Human-Computer Interaction (‘HCI’): A field of social and behavioural science concerned with the ways that people apply and relate to computer technologies.
- Human Subjects Model (also called ‘Human Subjects Research Model’): A model of ethical guidelines developed in reaction against scientific practice in Nazi Germany. Its key elements are the protection of confidentiality, anonymity and the use of informed consent.
- Hyperlink: A user-assigned or automatically generated connection between two or more points on online documents or other online artefacts.
- Hypertext: An unstructured series of pages and links between pages in a network.
- Hypertext Mark Up Language (HTML): A standard for marking up documents containing text and multimedia objects and linking those documents with hypertext links. Initial basis of the World Wide Web.
- Hypertext Transfer Protocol (HTTP): A text-based protocol that is commonly used for transferring information across the Internet.
- HTTP Tunnelling: A procedure enabling individuals to reach through a corporate firewall via a proxy server. Illegal in some cases.
- Institutional Review Board: A body charged with determining that the potential risks to research subjects are outweighed by the potential benefits of the research. Also called ‘ethics committees’ and ‘research ethics committees’.
- Intellectual Property Rights: The rights in law that the creator of a document, composition, performance, invention or other valued innovation, enjoys over its licensed and legitimate use.
- Intelligent agent: A software program possessing some form of ‘artificial intelligence’ (see separate entry) sufficient to sense changes in a complex environment and act on those changes to achieve goals on behalf of users.
- Intercept survey: Pop-up surveys that often use systematic sampling for every kth visitor to a website or web page.
- Interoperability: Procedures and computer programs enabling the linking of datasets to facilitate analytic inquiries that cannot be satisfied by reference to a single dataset. Involves the assignment of ‘metadata’ tags to archived data.
- IP Address (Internet Protocol Address): The ‘address’ is the identifying number of the computer from which a given Internet transaction has taken place. Internet Service Providers may assign addresses ‘dynamically’ so that two sessions from the same machine show different numbers.
- Internet Relay Chat: An instant messaging protocol for online communication.
- Java: A platform-independent programming language, currently offering survey instruments the highest level of flexibility and interactivity. Like an image file, a Java ‘applet’ can be included in a web page; the applet's code is transferred to the user's browser, which then executes the code. Java is suited to use in complex survey instruments. Since some users disable Java, researchers may also use HTML in tandem with CGI to present interactive forms on the web. See also ‘HTML’, ‘Common Gateway Interface’.
- Mash-up: A collation and correlation of information from a variety of online sources, often quickly done to form a first overview of information available on a topic.
- Measurement error: When survey response differs from the ‘true’ response, for example, because respondents have not candidly answered sensitive questions; see ‘Social desirability effect’.
- Metadata: Data about data. May include references to schemas, provenance and information quality.
- Middleware: Software components that are deployed together with existing software systems on the user's computer platform in order to provide generic services between those systems. The principal use is in data integration, where the software tools are designed to reconcile descriptive and format differences between datasets and/or other computational entities to allow their unimpeded interaction (‘interoperability’). Used on the web as well as the Grid. Examples of middleware include OGSA-DAI (Open Grid Services Architecture – Database Access and Integration), and OGSA-DQP (Open Grid Services Architecture – Distributed Query Processing). A widespread system using middleware is the Globus Toolkit. See also ‘interoperability’, ‘wrapper’.
- MORFing: A form of ‘netiquette’ (see separate entry) by which individuals determine whether those with whom they are in online contact are male or female; may also include exchange of other basic personal information.
- MUDS and MOOS (Multi-User Dungeon/Domain, MUD-Object Oriented): An online virtual environment based purely on the exchange of text rather than figurative representations.
- Multi-user Virtual Environment (‘MUVE’): Technologies allowing users to interact via digital representations of themselves in a virtual space or place.
- Natural Language Processing (‘NLP’): A sub-field of ‘artificial intelligence’ (see separate entry) in which computer software is used to automatically generate and understand natural human language. Natural language generation systems convert information from databases into normal-sounding language, while natural language understanding systems convert normal language into more formal representations of knowledge that a computer can manipulate.
- Netiquette: Norms of appropriate online behaviour.
- Neural network: In computing, an algorithm (see separate entry) that attempts to mimic human reasoning by linking a series of artificial neurons to one another that are exposed to inputs and generate outputs, with a view to creating an adaptive system capable of learning to solve problems.
- Newsgroup: An online forum enabling discussion between subscribers.
- Non-reactive data: Data that are collected for research purposes without the subject of the data being aware that it is being collected. Also called ‘unobtrusive data’; first word was not hyphenated in the original usage.
- Ontology: In computer science, a knowledge base that holds semantic relationships between terms and is used to reason about a substantive domain. In computer science, ontologies generally consist of a ‘semantic network’ linking individual objects, classes of objects, attributes or features describing those objects, and relationships between objects. The meaning of the term is distinct from its usage in philosophy.
- Open Source: Software whose source code is made freely available by the programmer so that others may customise it and/or elaborate its functionality.
- Opt-in panel: These comprise individuals who have volunteered to participate in an ongoing survey or series of surveys, often following a solicitation on a website.
- Opt-out: A source of bias that occurs when survey sample members choose not to participate in a survey. The opt-in samples that feature in non-probability surveys are also a source of bias, because there is seldom any information available about those who chose not to opt in.
- Paradata: Data about the process of data collection, in an online survey context including information like the amount of time to answer a particular question.
- Phishing: A criminal activity in which individuals attempt to fraudulently acquire sensitive personal information, such as passwords and bank card details, by masquerading as a trustworthy entity in an electronic communication.
- Podcast: A digital media file, or a series of such files, that is distributed over the Internet using syndication feeds for playback on portable media players and personal computers. The term can refer either to the content itself or to the method by which it is syndicated.
- Pop up: An associated link that appears on the user's screen when visiting a website, often used to invite users to respond to an online survey.
- Radio button: Response display for participants in web surveys; the participant clicks on the graphical representation of a ‘button’ applicable to their preferred response (the buttons resemble those on 1950s automobile radios).
- Radio Frequency Identification Device (‘RFID’): A remote sensing technology providing position location and other information using small wireless transponders that return a unique ID number when activated by a suitable radio frequency signal. May be embedded in inanimate or animate objects.
- Random Digit Dialing (‘RDD’): A random sampling method used in telephone surveys.
- Raster data: Pixel-based geographical data. Also see ‘Geographical Information System’ and ‘Vector data’.
- Really Simple Syndication (‘RSS’): A set of web feed formats used to publish frequently updated content like blog entries, podcasts or news headlines.
- Redundant Array of Independent Disks (‘RAID’): A security system comprising redundant hard drives and daily backup.
- Relational database: A database that maintains a set of separate, related files (tables), but combines data elements from the files for queries and reports when required. Such databases are organised around a data table in which a row refers to a single case and a column refers to a specific attribute.
- Resource Description Framework (‘RDF’): A format providing the means of representing relationships between elements of document content.
- Resource discovery: Location of datasets satisfying an analytic requirement from repositories and archives, particularly in an online environment.
- Roll-off: Occurs when a respondent exits an online survey instrument before completing all the questions. Increases measurement error.
- Scraper: Automated computer scripts that parse the content of web pages so it is useful as data. See also ‘spider’.
- Search Engine: Facilities by which computer users can search for required information on online networks, notably the Internet and World Wide Web. Major providers like Google, Yahoo! and MSN collect data on searches for unspecified uses that are retained for unspecified periods of time.
- Secondary analysis: Analysis of data conducted by those other than the original collectors of the data.
- Secure Socket Layer: A communication protocol whose primary goal is to provide private and reliable communication between two computer applications.
- Seed set: A set of web pages purposively selected to satisfy a query. See also ‘scraper’ and ‘spider’.
- Semantic Web: The use of formal knowledge representation techniques to comment and annotate web (or Grid) resources. Unlike the already familiar web, in the Semantic Web, resources are commented and annotated explicitly to form expressive knowledge bases, enabling automation of various procedures in application to data resources. Autonomous computer programs (‘agents’) can access the content for selective retrieval and analysis. A key feature of the Semantic Web is the imposition of structure on data to facilitate automated processing.
- Sensor studies: Research involving the collection of data from sensors applied to people or objects (‘tagging’) to track behavioural information, including transactions. Also called ‘remote sensor studies’.
- Server Log File: An automated system recording network search terms and date and time of requests.
- Server-side: Computer resources such as programs or information that are not held on the user's computer but on or from the server to which the computer is linked. See also ‘client-side’.
- SOAP: An XML-based protocol for exchanging structured information in a decentralised, distributed environment. See also ‘Extensible Markup Language’.
- Social affordance: A feature of a system that enables a form of social interaction.
- Social desirability effect: When research subjects provide responses that they think the researcher wants to hear or that they think put them in a good light, rather than their actual views or behaviour.
- Social Network Analysis: Techniques for the analysis of social networks and their role in social behaviour. Not confined to, but greatly enhanced by, online information and resources.
- Social shaping: A social science concept referring to the role that technology has in shaping society and social relations and the way that society and social relations affect the development and application of technologies.
- Spam: Unwanted online communication, usually received via e-mail and generally containing advertising. Named after a 1970s Monty Python TV sketch in which a restaurant offered only Spam, Spam and more Spam. Spam is a formed meat product sold in cans.
- Spider: A special class of scraper that follows links between web pages and collects information along the way. Data for spiders often comes from a seed set. See also ‘scraper’ and ‘seed set’.
- Structured Query Language (‘SQL’): A standard language for inserting data into a database, selecting subsets, aggregating results and producing ‘reports’.
- Text mining: Techniques employing computer applications to analyse large volumes of text in order to identify patterns, concordances (associations) and links between hitherto unrelated information sources in the public domain.
- Tie: Relationship between nodes in a network. See ‘arc’, ‘edge’.
- Triangulation: The use of different research methods in combination, originally to test for ‘convergent validation’, but now more often to enable a fuller, richer account of the phenomenon under study. A frequent combination is of one or more quantitative and one or more qualitative methods. Also called ‘mixed methods’, ‘multiple methods’ or ‘multi-method’.
- URL (Uniform Resource Locator): The unique address of an Internet resource.
- Usability: Level of difficulty of using an online application. Affects response rate and data quality.
- Usenet: A distributed conversation system much used as a source of aggregated message records for social network analysis.
- Vector data: Coordinate-based geographical data. Also see ‘Geographical Information System’ and ‘Raster data’.
- Voice Over Internet Protocol (VOIP): Internet telephony (e.g., Skype or Video MSN), later versions of which enable users to communicate via computers with the advantage that they can see as well as hear each other.
- Web crawler: Also called a ‘Web spider’ or ‘Web robot’. A program or automated script which browses the World Wide Web in a methodical, automated manner. Many sites, in particular search engines, use this means to provide up-to-date data. Web crawlers are mostly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used to automate maintenance tasks on websites, such as checking links or validating HTML code. Can also be used to gather specific types of information from web pages, such as harvesting e-mail addresses. See also ‘Spider’ and ‘Scraper’.
- Web log: Information available from Common Logfile Format data and clickstream analysis. See entries for ‘Common Logfile Format’ and ‘Clickstream analysis’.
- Webometrics: Measures of online activity, both automated and human, that provide information about online behaviour. Techniques and sources expand in tandem with technological development.
- Web Services: A software system designed to support interoperable machine or application-oriented interaction over a network.
- Web survey: A social survey conducted over the web. Also called an ‘Internet Survey’.
- Web 2.0: A development of the original World Wide Web providing features promoting user participation and support for large-scale social networking applications.
- Wiki: A computer application enabling incremental contributions by multiple users to elaborate a resource ranging from single documents to encyclopaedias and dictionaries.
- Wrapper: Components of middleware systems with which datasets can be web-enabled or Grid-enabled. They resolve dataset heterogeneity and thus enable integration of different datasets.
- XML: See ‘Extensible Mark Up Language’.
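
As an illustration of the ‘Common Logfile Format’ and ‘Web log’ entries above, the following is a minimal sketch, in Python, of how one line of such a log might be split into the fields the glossary lists (IP address, timestamp, request, status and bytes transferred). The field layout assumed here is the conventional one (host, identity, user, bracketed timestamp, quoted request, status, size); the function name `parse_clf_line`, the pattern name `CLF_PATTERN` and the sample line are hypothetical and are not drawn from the handbook.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the server timestamp, e.g. 10/Oct/2007:13:55:36 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A size of "-" means no bytes were transferred
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Invented sample line using a documentation-only IP address
sample = ('192.0.2.10 - - [10/Oct/2007:13:55:36 -0700] '
          '"GET /survey/page2.html HTTP/1.0" 200 2326')
record = parse_clf_line(sample)
print(record["host"], record["request"], record["status"], record["size"])
```

Applied to every line of a log file, a parser of this kind yields the ordered sequence of page requests per address that clickstream analysis works from. Because addresses may be assigned dynamically (see ‘IP Address’), the host field alone should not be taken to identify an individual user.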