Final test

This is the final test for the course Analyzing Social Science Data Using R at SNS.

The test consists of the four exercises below, to be solved within two hours (until 16:00). All exercises have equal weight in the final score. When scoring the test, we will primarily evaluate:

  • Arriving at the correct result.
  • Mindful use of R commands.

You have to:

  1. Download a script template file. Save it as <yourfamilyname>.R.
  2. For each exercise you will use specified data files. It is best to save them in the same folder as the script template.
  3. Use the script template to solve the exercises. Comment your steps using R comments (lines starting with #). Where an exercise asks for a written answer, write it in the comments.
  4. Remember to save the script file at the end!
  5. After you are done, email the script to (remember to attach the file!). In the email subject please write “R@SNS test”.

Good luck!


Exercise 1

In analyzing data from face-to-face surveys it is often overlooked that the interviewer might influence the responses of the respondent.

The file pgssint.rda contains a data frame pgssint which is an excerpt from PGSS data containing the following variables:

  • pgssyear: PGSS edition year.
  • female: whether the respondent is female.
  • intfemale: whether the interviewer is female.
  • q9age: age of the respondent.
  • q7d: whether the respondent agrees or not (1 = Strongly agree, 2, 3, 4 = Strongly disagree) with the statement that “it is a role of a man to earn money and a woman to take care of the household”.

Special values of the variables have been already recoded to missing data (NA).

Using this dataset, determine to what extent the responses to question q7d depend on the gender of the respondent and the gender of the interviewer. In particular:

  1. Create a cross-tabulation of the response to q7d, respondent's gender, and interviewer's gender.
  2. Compute the percentage distribution of responses to q7d given the gender of the respondent and the interviewer.
  3. Demonstrate graphically (e.g. with a bar chart) how the responses to q7d depend on the gender of the respondent and the interviewer. For simplicity, use the combined percentage of “strongly agree” and “agree” answers.

What can we conclude about the interviewers influencing respondents' answers to that particular survey question? Write your answer.
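
As a reminder of the general technique, the three steps above could be sketched on a small made-up data frame as follows. The data frame toy and its values are invented for illustration, and treating codes 1 and 2 as “strongly agree” and “agree” is an assumption based on the variable description; this is one possible approach, not a model solution.

```r
# Toy data standing in for pgssint (values are made up for illustration)
toy <- data.frame(
  female    = c(0, 0, 1, 1, 0, 1, 0, 1),
  intfemale = c(0, 1, 0, 1, 1, 0, 0, 1),
  q7d       = c(1, 2, 3, 4, 2, 1, 4, 3)
)

# Three-way cross-tabulation: response x respondent gender x interviewer gender
tab <- table(q7d = toy$q7d, female = toy$female, intfemale = toy$intfemale)

# Percentage distribution of q7d within each gender combination:
# margins 2 and 3 are held fixed, so each combination sums to 100
pct <- 100 * prop.table(tab, margin = c(2, 3))

# Combined share of codes 1 and 2 for each gender combination
agree <- apply(pct[c("1", "2"), , ], c(2, 3), sum)

# Grouped bar chart: one group per interviewer gender,
# bars within a group (and the legend) distinguish respondent gender
barplot(agree, beside = TRUE, legend.text = TRUE,
        xlab = "Interviewer female (0 = no, 1 = yes)",
        ylab = "% answering 1 or 2")
```

Note that table() silently drops missing values, so the percentages are computed among valid answers only.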

Exercise 2

Using the data pgssint (the same as in Exercise 1) investigate how the answers to question q7d vary between different cohorts and over time. In particular:

  1. Create a variable “year of birth” based on age and year of study and categorize it into intervals using breakpoints at 1940, 1960, and 1980. These will be the cohorts.
  2. Compute the (conditional) percentage of answers “strongly agree” and “agree” given the cohorts and year of study.
  3. How did opinions of the cohorts evolve during the period covered by PGSS?
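
A minimal sketch of the cohort construction on made-up data is given below. It assumes that “year of study” corresponds to pgssyear and that codes 1 and 2 of q7d count as agreement; whether a person born exactly at a breakpoint belongs to the older or the younger cohort is a choice (here controlled by right = FALSE), not something the exercise prescribes.

```r
# Toy data standing in for pgssint (made-up values)
toy <- data.frame(
  pgssyear = c(1992, 1992, 1999, 1999, 2005, 2005),
  q9age    = c(60, 25, 45, 30, 70, 22),
  q7d      = c(1, 3, 2, 4, 1, 3)
)

# Year of birth = survey year minus age at the time of the survey
toy$birth <- toy$pgssyear - toy$q9age

# Three breakpoints give four cohorts; right = FALSE makes the intervals
# closed on the left, so a person born in 1960 falls into the 1960-1980 cohort
toy$cohort <- cut(toy$birth, breaks = c(-Inf, 1940, 1960, 1980, Inf),
                  right = FALSE, dig.lab = 4)

# Logical indicator of agreement; comparison keeps NA as NA
toy$agree <- toy$q7d <= 2

# Percentage agreeing within each cohort-by-year cell
# (cells without observations come out as NA)
pct <- 100 * tapply(toy$agree, list(toy$cohort, toy$pgssyear),
                    mean, na.rm = TRUE)
pct
```

Reading the resulting table along its rows shows how one cohort changes across survey years; reading it down a column compares cohorts within a single year.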

Exercise 3

status.csv is a dataset containing 690 observations on 5 variables:

  • status: an estimate of subject's social status on a 10-point scale, with larger values indicating higher status
  • earnings: subject's net monthly income in PLN
  • degree: subject's degree of education
  • prestige: prestige category of subject's occupation
  • gender: subject's gender.

The data are written in a text file with variable names in the first row and columns separated by semicolons (;).
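
For illustration, reading a file in this layout might look like the sketch below. It writes a tiny made-up file to a temporary location instead of using status.csv, and it assumes that decimals (if any) use a point; read.csv2() also expects semicolons but additionally assumes a decimal comma.

```r
# Write a tiny semicolon-separated file to a temporary location (made-up rows)
tmp <- tempfile(fileext = ".csv")
writeLines(c("status;earnings;gender",
             "5;2500;female",
             "7;4100;male"), tmp)

# header = TRUE: the first row holds the variable names
# sep = ";":     columns are separated by semicolons
d <- read.table(tmp, header = TRUE, sep = ";")

# Check that column types and the number of rows look as expected
str(d)
```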

Using the status data:

  1. Create a barplot showing mean income by social status. Add appropriate labels and a title to the plot and save it as a PDF.
  2. Estimate a regression model that has status as the dependent variable (DV), and earnings (in thousands of PLN), degree of education, occupational prestige, and gender as independent variables (IVs). Is the DV significantly related to all the IVs?
  3. Update the model by adding an interaction effect between occupational prestige and gender. Does that improve the model's fit? Does the social standing of men and women “respond” differently to changes in prestige?
  4. Create a coplot showing a predicted relationship between status and earnings in categories of occupation and gender. Add titles and labels to the plot and save it as a PDF.
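
The general mechanics of these steps can be sketched on simulated data as below. Everything in the data frame toy is generated at random for illustration (the degree variable is left out for brevity), the output file name is hypothetical, and an F test via anova() is only one of several ways to compare the fit of nested models.

```r
set.seed(1)
n <- 200
# Simulated stand-in for the status data (not the real values)
toy <- data.frame(
  earnings = runif(n, 1000, 8000),
  prestige = factor(sample(c("low", "mid", "high"), n, replace = TRUE)),
  gender   = factor(sample(c("female", "male"), n, replace = TRUE))
)
toy$status <- pmin(10, pmax(1, round(3 + 0.5 * toy$earnings / 1000 + rnorm(n))))

# Baseline model; I() rescales earnings to thousands inside the formula
m1 <- lm(status ~ I(earnings / 1000) + prestige + gender, data = toy)

# Add the prestige-by-gender interaction without retyping the formula
m2 <- update(m1, . ~ . + prestige:gender)

# F test comparing the two nested models
cmp <- anova(m1, m2)
cmp

# Predicted values from the larger model, used in the conditioning plot
toy$pred <- fitted(m2)

# Everything drawn between pdf() and dev.off() goes into the PDF file
out <- file.path(tempdir(), "toy_plots.pdf")  # hypothetical file name
pdf(out)
barplot(tapply(toy$earnings, toy$status, mean),
        main = "Mean earnings by status (toy data)",
        xlab = "Status", ylab = "Mean earnings (PLN)")
coplot(pred ~ earnings | prestige * gender, data = toy,
       ylab = "Predicted status")
dev.off()
```

The coefficient tables of both models are available through summary(m1) and summary(m2).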

Exercise 4

Using PGSS data available in the file JustEarn (a plain text file with columns separated by tabs):

  1. Read the data into R. Select a subset containing complete cases only.
  2. For each subject in the reduced data set, compute the mean perceived earnings and the mean of just earnings. Is there a positive relationship between the two?
  3. Estimate a regression model that has the log of the mean just earnings as the DV and the log of the mean perceived earnings as the IV.
  4. Interpret the values of the regression coefficients.
  5. How strong is the relationship between the DV and IV?
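
The workflow above can be sketched on a tiny made-up file as follows. The column names (perc_a, perc_b, just_a, just_b) are hypothetical and do not come from JustEarn, and using the correlation and R-squared as measures of strength is one reasonable choice among others.

```r
# Tiny tab-separated file with made-up values; the column names are
# hypothetical and are not those of the JustEarn file
tmp <- tempfile()
writeLines(c("perc_a\tperc_b\tjust_a\tjust_b",
             "1000\t3000\t1500\t2500",
             "2000\tNA\t2000\t3000",
             "4000\t8000\t5000\t7000",
             "1500\t2500\t1800\t2400"), tmp)

# read.delim() assumes a header row and tab-separated columns by default
d <- read.delim(tmp)

# Keep only the rows without any missing value
d <- d[complete.cases(d), ]

# Per-subject means across each group of columns
d$perceived <- rowMeans(d[, c("perc_a", "perc_b")])
d$just      <- rowMeans(d[, c("just_a", "just_b")])

# Direction and strength of the linear association
cor(d$perceived, d$just)

# Log-log model: the slope is an elasticity, i.e. the approximate
# percentage change in the DV for a 1% change in the IV
m <- lm(log(just) ~ log(perceived), data = d)
coef(m)

# Share of variance in the DV explained by the model
summary(m)$r.squared
```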