Data Management

Paul J. Lavrakas

doi:10.4135/9781412963947

Edited by: Paul J. Lavrakas
In: Encyclopedia of Survey Research Methods
Chapter DOI: https://doi.org/10.4135/9781412963947.n123
Subject: Survey Research
Keywords: surveying; surveys

Longitudinal projects and other large surveys generate large, complex data files on thousands of persons, and researchers must manage these files effectively. The preferred data management strategy for such large, complex survey research projects is an integrated database facility built around modern relational databases. For a relatively small, simple questionnaire, many carefully implemented methods of data collection and data management will work. The requirements of technologically complex surveys, however, encompass all the considerations that apply to less complex survey data sets.

As the scale, scope, and complexity of a survey project grow, researchers need to plan carefully for the questionnaire, for how the survey collects the data, for the management of the data it produces, and for making the resulting data readily available for analysis. For these steps to run smoothly and flow from one to the next, they need to be integrated, and relational database management systems (RDBMS) are effective tools for achieving this integration. It is essential that the data file preserve the relationships among the various questions and among the questionnaire, respondent answers, the sampling structure, and respondent relationships. In birth cohort or household panel studies, there are often complex relationships among persons from the same family or household. In longitudinal surveys, there are also complex relationships among the answers in different waves that result from pre-fills (i.e., data carried forward) from previous surveys and from bounded interviewing techniques, which create event histories by integrating lines of inquiry over multiple rounds of interviewing. Failing to use an RDBMS strategy for a longitudinal survey can be considered a serious error that increases administrative costs, and not using RDBMS methods in large, complex cross-sectional surveys can be an equally serious error.
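To make the idea of preserved relationships concrete, the following is a minimal sketch using Python's standard-library `sqlite3` module. The schema (tables `household`, `person`, `answer` and the `mother_id` link), the helper functions, and the demo values are hypothetical illustrations, not a design taken from this entry. Answers are keyed by person, wave, and question, so a pre-fill becomes a lookup of the previous wave's row, and a family relationship becomes a join through the person table.

```python
import sqlite3

# Hypothetical schema for illustration; all table and column names are assumptions.
SCHEMA = """
CREATE TABLE household (
    household_id INTEGER PRIMARY KEY
);
CREATE TABLE person (
    person_id    INTEGER PRIMARY KEY,
    household_id INTEGER NOT NULL REFERENCES household(household_id),
    mother_id    INTEGER REFERENCES person(person_id)  -- within-family link
);
CREATE TABLE answer (
    person_id   INTEGER NOT NULL REFERENCES person(person_id),
    wave        INTEGER NOT NULL,
    question_id TEXT    NOT NULL,
    response    TEXT,
    PRIMARY KEY (person_id, wave, question_id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with one household and two wave-1 answers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO household VALUES (1)")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                     [(10, 1, None), (11, 1, 10)])  # person 11's mother is person 10
    conn.executemany("INSERT INTO answer VALUES (?, ?, ?, ?)", [
        (10, 1, "EMPLOYER", "Acme Mills"),    # made-up demo values
        (11, 1, "EMPLOYER", "City Library"),
    ])
    return conn


def prefill(conn, person_id, wave, question_id):
    """Return the answer carried forward from the previous wave, or None."""
    row = conn.execute(
        "SELECT response FROM answer "
        "WHERE person_id = ? AND wave = ? AND question_id = ?",
        (person_id, wave - 1, question_id)).fetchone()
    return row[0] if row else None


def mothers_answer(conn, child_id, wave, question_id):
    """Follow the within-family link to the mother's answer in the same wave."""
    row = conn.execute(
        """SELECT a.response
           FROM person AS c
           JOIN answer AS a ON a.person_id = c.mother_id
           WHERE c.person_id = ? AND a.wave = ? AND a.question_id = ?""",
        (child_id, wave, question_id)).fetchone()
    return row[0] if row else None
```

With the demo data, `prefill(conn, 11, 2, "EMPLOYER")` returns the wave 1 answer `"City Library"`, and `mothers_answer(conn, 11, 1, "EMPLOYER")` returns `"Acme Mills"`. Because both lookups are ordinary queries over keyed tables, the relationships survive into the analysis file rather than being flattened away.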

Structure

Questionnaires often collect lists, or rosters, of people, employers, insurance plans, medical providers, and so on, and then cycle through these lists, asking a set of questions about each person, employer, insurance plan, or medical provider in the roster. These sets of related answers constitute some of the tables of a larger relational database in which the connections among the tables are defined by the design of the questionnaire. One can think of each question in a survey as a row within a table, with a variety of attributes that are linked flexibly to other tables. The attributes (or columns) of such a question table would contain, at a minimum, the following:

The question identifier and the title(s) of the variable representing the question's answer, with a facility to connect the same question asked in different sweeps or rounds of a longitudinal survey. This same facility is useful in repeated cross-sections.
Descriptors that characterize or index the content of the question (alcohol use, income, etc.).
The question text.
A set of questions or check items that leads into the question (in practice this information is contained in the skip patterns of contingency questions).
A set of allowable responses to the question and data specifications for these allowable responses (whether the answer is a date, time, integer, dollar value, textual response, or a numerical value assigned to a categorical response, such as 1 = Yes, 0 = No).
For multi-lingual surveys, there would be separate tables for question text and pick-lists for each language. This greatly simplifies the preparation and management of different survey versions for different languages that share the same core structure.
Routing instructions to the next question, including branching conditions driven by the response to the current question, or complex check items that are contingent on the response to the current question as well as previous responses.
Real-time edit specifications imposed on dates, currency amounts, and other numerical (i.e., non-pick-list) data, such as values that require interviewer confirmation (soft range checks) or limits on permissible values (hard range checks).
Pre-loaded values.
Text fill specifications.
Instructions to assist the interviewer and respondent in completing the question, and any show cards or audio files used for audio computer-assisted self-interviews.
Date and time stamps for the question, indicators of multiple passes through the question, and time spent in the question (this preserves an audit trail for each step in the questionnaire).
Archival comments about the accuracy or interpretation of the item or its source, or “See also” notes referring users to associated variables available in the data set.
Notes to the support staff about complexities associated with the question to document the internal operation of the survey.
Links to supporting documentation produced by the survey organization or, in the case of standard scales or psychometric items, a URL to more comprehensive documentation on the item.
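A few of these attributes can be sketched as columns of a question table. The sketch below assumes Python's built-in `sqlite3` module, and all table, column, and function names are hypothetical. For brevity it stores wording in a single `question_text` table keyed by language, a simplification of the separate per-language tables described above, and its `range_check` helper shows how soft and hard range checks can be driven by stored limits instead of being hard-coded into the instrument.

```python
import sqlite3

# Hypothetical subset of the question attributes; names are illustrative.
SCHEMA = """
CREATE TABLE question (
    question_id TEXT PRIMARY KEY,  -- stable identifier, reused across waves
    variable    TEXT NOT NULL,     -- name of the variable that stores the answer
    descriptor  TEXT,              -- content index, e.g. 'employment'
    answer_type TEXT NOT NULL,     -- 'integer', 'date', 'pick', ...
    soft_min REAL, soft_max REAL,  -- outside: interviewer must confirm
    hard_min REAL, hard_max REAL   -- outside: value is not accepted
);
CREATE TABLE question_text (       -- one row per question per language
    question_id TEXT NOT NULL REFERENCES question(question_id),
    language    TEXT NOT NULL,
    wording     TEXT NOT NULL,
    PRIMARY KEY (question_id, language)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory instrument with one numeric question in two languages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO question VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                 ("Q_HOURS", "HRS_WORKED", "employment", "integer", 1, 80, 0, 168))
    conn.executemany("INSERT INTO question_text VALUES (?, ?, ?)", [
        ("Q_HOURS", "en", "How many hours did you work last week?"),
        ("Q_HOURS", "es", "¿Cuántas horas trabajó la semana pasada?"),
    ])
    return conn


def question_text(conn, question_id, language):
    """Fetch the wording for one language version of the instrument."""
    row = conn.execute(
        "SELECT wording FROM question_text WHERE question_id = ? AND language = ?",
        (question_id, language)).fetchone()
    return row[0] if row else None


def range_check(conn, question_id, value):
    """Return 'reject' (hard check failed), 'confirm' (soft check failed) or 'ok'."""
    soft_min, soft_max, hard_min, hard_max = conn.execute(
        "SELECT soft_min, soft_max, hard_min, hard_max FROM question "
        "WHERE question_id = ?", (question_id,)).fetchone()

    def outside(lo, hi):
        # A NULL limit means that side of the range is unchecked.
        return (lo is not None and value < lo) or (hi is not None and value > hi)

    if outside(hard_min, hard_max):
        return "reject"
    if outside(soft_min, soft_max):
        return "confirm"
    return "ok"
```

With the demo limits, an answer of 100 hours falls outside the soft range (1 to 80) but inside the hard range (0 to 168), so `range_check` returns `"confirm"`, while 200 returns `"reject"`. Changing a limit is then a data edit to one row, not a change to the interviewing program.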

These attributes of questions are often referred to as “metadata.” With RDBMS methods, these pieces of information describing a question are automatically connected to the variables generated by that question. For example, metadata include which questions lead into a particular question and the questions to which that question branches. These linkages define the flow of control, or skip pattern, of a questionnaire. With a sophisticated set of table definitions capable of describing virtually any questionnaire, one can “join” tables and rapidly produce codebooks, questionnaires, and other traditional pieces of survey documentation. The questionnaire itself is not “programmed”; rather, it is formed by successively displaying each question's characteristics on the screen, with the next question determined either by direct branching or by executing internal check items that are themselves specified in the question records. Sequential queries to the instrument database display the questions using an executable that does not change across surveys but guides the interview process through successive question records.
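The idea that routing lives in question records, read by a driver that does not change between surveys, can be sketched as follows. This is a hypothetical illustration in Python with `sqlite3`, not the system the entry describes: each row of an assumed `route` table names a question, an optional triggering answer, and the next question, and `run_interview` simply follows those rows. It handles only branching on the current answer; check items that depend on earlier responses would need richer stored conditions. The `codebook` function shows how a join over the same tables yields documentation.

```python
import sqlite3

# Hypothetical instrument tables: routing is stored as data, not program code.
SCHEMA = """
CREATE TABLE question (
    question_id TEXT PRIMARY KEY,
    wording     TEXT NOT NULL
);
CREATE TABLE route (
    question_id TEXT NOT NULL REFERENCES question(question_id),
    answer      TEXT,  -- NULL marks the default branch
    next_id     TEXT   -- NULL ends the interview
);
"""


def build_instrument() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO question VALUES (?, ?)", [
        ("Q1", "Any alcohol in the past 30 days? (1 = Yes, 0 = No)"),
        ("Q2", "On how many of those days?"),
        ("Q3", "Total household income last year?"),
    ])
    conn.executemany("INSERT INTO route VALUES (?, ?, ?)", [
        ("Q1", "1", "Q2"),   # Yes: ask the follow-up
        ("Q1", None, "Q3"),  # any other answer: skip to Q3
        ("Q2", None, "Q3"),
        ("Q3", None, None),
    ])
    return conn


def next_question(conn, question_id, answer):
    """Prefer a route matching the answer; otherwise use the default route."""
    row = conn.execute(
        """SELECT next_id FROM route
           WHERE question_id = ? AND (answer = ? OR answer IS NULL)
           ORDER BY answer IS NULL
           LIMIT 1""",
        (question_id, answer)).fetchone()
    return row[0] if row else None


def run_interview(conn, first_id, respond):
    """Generic driver: the same loop runs any instrument stored in these tables."""
    answers, current = {}, first_id
    while current is not None:
        (wording,) = conn.execute(
            "SELECT wording FROM question WHERE question_id = ?",
            (current,)).fetchone()
        answers[current] = respond(current, wording)
        current = next_question(conn, current, answers[current])
    return answers


def codebook(conn):
    """Join questions to their routes to produce a plain-text codebook."""
    rows = conn.execute(
        """SELECT q.question_id, q.wording, r.answer, r.next_id
           FROM question AS q LEFT JOIN route AS r ON r.question_id = q.question_id
           ORDER BY q.question_id, r.answer IS NULL""").fetchall()
    return [f"{qid}: {text} | {'otherwise' if ans is None else 'if ' + ans}"
            f" -> {nxt or 'END'}"
            for qid, text, ans, nxt in rows]
```

A respondent who answers `"0"` to Q1 is routed straight to Q3, so `run_interview` returns answers for Q1 and Q3 only. Adding a question or changing a skip means inserting or editing rows; `run_interview` itself stays the same.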

...