Analysing Social Science Data Using R

Course at the Graduate School for Social Research (GSSR). Spring semester of 2012/2013.

Location: GSSR computer lab, Pałac Staszica, 2nd floor.

Time: 14:00 - 16:00

Recent news and updates:

  • R Color Reference Sheet (2013-04-17 03:49)
    R has a built-in collection of 657 colors that you can use in plotting functions by using color names. There are also various facilities to select color sequences more systematically: Color palettes and ramps available in packages RColorBrewer and colorRamps. R base functions colorRamp and colorRampPalette that you can use to create your own color […]
  • Course meeting cancelled (Analysing Social Science Data Using R) (2013-03-11 11:42)
    Unfortunately, today's course meeting is cancelled. The course will resume as usual next week. I’m sorry for the inconvenience.
  • Errata to tutorial script 3 (Analysing Social Science Data Using R) (2013-03-02 02:17)
    This is the first update post regarding the course “Analysing Social Science Data Using R” in GSSR. There is a dedicated tag “r4sns” and an RSS feed (also visible on the course webpage). If you recall the meeting from February 25, there was one issue in the tutorial script that did not work. It involved […]

Course description

Quantitative data analysis skills are an important part of the curriculum of a contemporary social scientist. The goal of this course is to teach participants how to perform statistical analysis using R. R (http://www.r-project.org) is free and open-source software for statistical computing and data visualization, and is considered a “lingua franca” of data analysis among academics and professionals. It is increasingly popular among social scientists and in commercial environments. R is widely recognized for its power, its data visualization capabilities, and its ability to implement virtually any statistical method or model. The main objective of the course is to train participants in using R for a typical analysis of data in sociology (or other social science disciplines), i.e.: basic manipulation of data (recoding, variable transformation), computing descriptive statistics, visualizing data, and estimating regression models. The basic R skills acquired will make it easier for participants to learn any of the 3000+ supplementary R packages, which implement the vast majority of modern statistical methods.

The course is designed to be rather self-contained. However, should a need arise, the following books provide a handy reference to the covered material:

  • Fox, J. and Weisberg, S. 2011. An R Companion to Applied Regression, 2nd Edition. Sage
  • Muenchen, R. 2009. R for SAS and SPSS Users. Springer
  • Agresti, A. and Finlay, B. Statistical Methods for the Social Sciences. Prentice Hall

Course syllabus: syllabus.pdf

Data files

Data files used in tutorial scripts and in the lab.

File | URL | Description
pgss1999in.tab | http://www.bojanorama.pl/_media/r4sns2012:pgss1999in.tab | Data from PGSS and ISSP 1999 containing a battery of questions on estimated and “fair” incomes for several social/occupational categories, plus some demographic variables.
pgss1.sav | http://www.bojanorama.pl/_media/r4sns2012:pgss1.sav | Data from PGSS editions 1999 and 2008. Selected variables.

If you cannot save a file to disk during the lab, pass the URL from the second column directly to the data-loading functions, e.g.:

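library(foreign)  # provides read.spss()
# SPSS file; add to.data.frame=TRUE to get a data frame instead of a list
spss <- read.spss("http://www.bojanorama.pl/_media/r4sns2012:pgss1.sav")
# Tab-delimited text file with variable names in the first row
tab <- read.table("http://www.bojanorama.pl/_media/r4sns2012:pgss1999in.tab",
                  sep="\t", header=TRUE, as.is=TRUE)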

Schedule

There will be 12 meetings including the final test:

February 11

Organizational matters. What are R and RStudio? Installation. Basics of the interface.

Files: slides

February 18

R basics:

  • R as an advanced calculator
  • Creating objects
  • Numeric and character vectors
  • Functions
  • Help system
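
The topics above can be illustrated with a minimal sketch. The values and object names below (ages, first_names, and so on) are hypothetical and are not taken from the course script.

```r
# R as an advanced calculator
2 + 3 * 4            # multiplication binds tighter than addition
sqrt(16) / 2         # built-in mathematical functions

# Creating objects with the assignment operator
x <- 10
y <- x^2 + 1

# Numeric and character vectors (hypothetical values)
ages        <- c(23, 35, 41, 58)
first_names <- c("Anna", "Piotr", "Ewa", "Jan")
ages * 2             # arithmetic is applied element by element
length(first_names)  # number of elements

# Calling functions, with positional and named arguments
mean_age <- mean(ages)
rounded  <- round(mean_age, digits = 1)

# Help system (commented out so the script runs non-interactively)
# ?mean                       # help page for a function
# help.search("regression")   # search the installed documentation
```

Typing an object's name at the prompt (for example `y`) prints its value.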

Files: slides, R script, Homework template.

February 25

Importing data into R and working with data frames

  • R's working directory.
  • Importing data into R from:
    • plain text files (tab-delimited, CSV)
    • SPSS files
  • Data frames and functions: $, attach, detach, with, and subset.
  • Basic descriptive statistics: computing means, variances, standard deviation, median, quantiles etc.
  • Handling missing data.
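
A short sketch of these steps follows. It assumes a small made-up data frame called survey instead of the PGSS files, so that it runs without downloading anything; all variable names are hypothetical.

```r
# Hypothetical survey extract; NA marks a missing answer
survey <- data.frame(
  age    = c(23, 35, NA, 58, 41),
  income = c(1800, 3200, 2500, NA, 4100),
  sex    = c("F", "M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

getwd()                                    # show the current working directory

# Accessing variables in a data frame
survey$age                                 # one column as a vector
with(survey, mean(income, na.rm = TRUE))   # evaluate an expression inside the data frame
women <- subset(survey, sex == "F")        # keep only rows matching a condition

# Basic descriptive statistics; na.rm = TRUE drops missing values first
mean_age   <- mean(survey$age, na.rm = TRUE)
var_age    <- var(survey$age, na.rm = TRUE)
sd_age     <- sd(survey$age, na.rm = TRUE)
median_inc <- median(survey$income, na.rm = TRUE)
quantile(survey$income, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

# Handling missing data
is.na(survey$age)                          # logical vector flagging missing values
sum(is.na(survey$income))                  # number of missing incomes
complete <- na.omit(survey)                # drop rows with any missing value
```

Without `na.rm = TRUE`, functions such as `mean` return `NA` as soon as one value is missing.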

Files: slides, tutorial script, homework template

March 4

High-level plotting functions (barplot, hist, plot) and their customization including modifying colors, line styles, fonts, titles, legends, etc.
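
As a rough illustration of the three functions named above, the sketch below uses simulated data; the variables (income, age, educ) and all styling choices are assumptions made for the example, not the content of the tutorial script.

```r
# Simulated, hypothetical data
set.seed(1)
income <- rnorm(200, mean = 3000, sd = 800)
age    <- round(runif(200, 18, 80))
educ   <- c(Primary = 40, Secondary = 95, Tertiary = 65)

# Bar plot with custom fill colours, title, and axis label
barplot(educ, col = c("grey80", "steelblue", "tomato"),
        main = "Education level", ylab = "Respondents")

# Histogram with a suggested number of bins and custom colours
h <- hist(income, breaks = 20, col = "lightblue", border = "white",
          main = "Monthly income", xlab = "Income")

# Scatter plot with custom symbol (pch), colour, size, and italic title
plot(age, income, pch = 19, col = "darkgreen", cex = 0.7,
     main = "Income by age", xlab = "Age", ylab = "Income",
     font.main = 3)

# Add a dashed reference line and a matching legend
abline(h = mean(income), lty = 2, lwd = 2, col = "red")
legend("topright", legend = "Mean income", lty = 2, lwd = 2, col = "red")
```

When run with Rscript rather than interactively, the plots are written to a PDF file instead of a window.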

Files: tutorial script, lab assignment, pch reference card

March 18

Summary of vectors and data frames (indexing, recoding), using functions like replace, which, %in%.

New stuff:

  • Factors
  • Categorizing continuous variables with cut
  • Creating ranks
  • Creating cross-tabulations
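
The functions mentioned for this meeting can be sketched as follows. The two vectors are made up for the example, and the age bands are an arbitrary choice.

```r
# Hypothetical data
age  <- c(19, 34, 52, 67, 45, 28)
educ <- c("primary", "tertiary", "secondary",
          "primary", "tertiary", "secondary")

# Indexing and recoding
older     <- which(age > 40)                    # positions of matching elements
educ %in% c("secondary", "tertiary")            # membership test, element by element
age_fixed <- replace(age, age == 67, 65)        # recode one value

# Factors: categorical variables with a fixed, ordered set of levels
educ_f <- factor(educ, levels = c("primary", "secondary", "tertiary"))
levels(educ_f)

# Categorizing a continuous variable; intervals are closed on the right
age_group <- cut(age, breaks = c(0, 30, 50, Inf),
                 labels = c("<=30", "31-50", ">50"))

# Ranks (1 = smallest value)
age_rank <- rank(age)

# Cross-tabulation of two categorical variables
tab <- table(age_group, educ_f)
tab
```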

Files: slides, tutorial script including lab exercise, homework, homework solution.

March 25

Creating and working with frequency tables (aka crosstabs).

  • New object types: matrices and arrays.
  • Creating functions.
  • Performing Chi-square tests.
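
A minimal sketch of these topics, assuming a made-up 2x2 table of counts; the helper row_pct is a hypothetical function written for this example only.

```r
# Hypothetical 2x2 frequency table stored as a matrix
counts <- matrix(c(30, 20,
                   15, 35),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("F", "M"),
                                 vote = c("yes", "no")))

dim(counts)                       # number of rows and columns
prop.table(counts, margin = 1)    # proportions within each row
addmargins(counts)                # append row and column totals

# An array adds further dimensions, e.g. one 2x2 table per survey year
arr <- array(1:8, dim = c(2, 2, 2))

# A user-defined function: row percentages rounded to a given precision
row_pct <- function(tab, digits = 1) {
  round(100 * prop.table(tab, margin = 1), digits)
}
row_pct(counts)

# Chi-square test of independence; correct = FALSE disables Yates' correction
test <- chisq.test(counts, correct = FALSE)
test$statistic
test$p.value
```

The same `chisq.test` call also accepts a table created with `table`.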

Files: slides, data, tutorial script, assignment, assignment solutions

April 8

Conditional descriptions, data aggregation, and merging

  • Computing conditional probabilities, means, variances, etc. Functions: apply and tapply.
  • Aggregating data with aggregate.
  • Merging datasets with merge.
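
The four functions listed above can be sketched on a small made-up data set. The data frames resp and regions, and the 18% tax rate, are hypothetical.

```r
# Hypothetical respondent-level data
resp <- data.frame(
  id     = 1:6,
  region = c("N", "N", "S", "S", "S", "W"),
  sex    = c("F", "M", "F", "M", "F", "M"),
  income = c(2000, 3000, 2500, 3500, 4000, 1500)
)

# Conditional means and variances with tapply
mean_by_region <- tapply(resp$income, resp$region, mean)
var_by_sex     <- tapply(resp$income, resp$sex, var)

# Conditional probabilities: distribution of sex within each region
prop.table(table(resp$region, resp$sex), margin = 1)

# apply works over the rows (1) or columns (2) of a matrix
m <- cbind(income = resp$income, tax = resp$income * 0.18)
col_means <- apply(m, 2, mean)

# aggregate returns the group summaries as a data frame
agg <- aggregate(income ~ region, data = resp, FUN = mean)

# merge joins two data frames on a shared key column
regions <- data.frame(region  = c("N", "S", "W"),
                      capital = c("A", "B", "C"))
merged <- merge(resp, regions, by = "region")
```

`tapply` returns a named vector, while `aggregate` returns a data frame, which is more convenient as input to `merge`.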

Files: slides, script, exercise.

April 22

Advanced data visualization. Creating custom plots from scratch.
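
One possible way to build a plot from scratch is sketched below: start with an empty figure, set up the coordinate system, then add axes, lines, points, and annotations one at a time. The data and all names are hypothetical and do not come from the GUS file.

```r
# Hypothetical yearly indicator for two groups
year <- 2000:2010
a <- c(5, 6, 6, 7, 9, 10, 12, 11, 13, 14, 16)
b <- c(8, 8, 7, 7, 6, 6, 5, 6, 4, 4, 3)

# Two colours taken from a ramp between named colours
pal <- colorRampPalette(c("steelblue", "tomato"))(2)

plot.new()                                            # empty figure
plot.window(xlim = range(year), ylim = range(a, b))   # coordinate system
axis(1, at = seq(2000, 2010, by = 2))                 # x axis with chosen ticks
axis(2, las = 1)                                      # y axis, horizontal labels
box()                                                 # frame around the plot region

lines(year, a, col = pal[1], lwd = 2)
points(year, a, col = pal[1], pch = 19)
lines(year, b, col = pal[2], lwd = 2, lty = 2)
points(year, b, col = pal[2], pch = 17)

title(main = "Custom plot built step by step",
      xlab = "Year", ylab = "Value")
legend("top", legend = c("Group A", "Group B"), col = pal,
       lty = c(1, 2), pch = c(19, 17), lwd = 2, bty = "n")
text(2010, 16, labels = "peak", pos = 2)              # annotate a single point
```

Building the figure in layers gives control over each element that the high-level functions set automatically.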

Files: slides, script, GUS data (tab-delimited), color reference sheet (PDF)

April 29

Linear regression, part 1.
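
As a hedged illustration of fitting a linear model in R, the sketch below uses simulated data with known coefficients. The variables and the data-generating values are assumptions made for the example, not the course data.

```r
# Simulated, hypothetical data: income depends linearly on education and age
set.seed(123)
n      <- 200
educ   <- round(runif(n, 8, 20))     # years of schooling
age    <- round(runif(n, 20, 65))
income <- 500 + 150 * educ + 10 * age + rnorm(n, sd = 300)
d      <- data.frame(income, educ, age)

# Fit the model by ordinary least squares
fit <- lm(income ~ educ + age, data = d)

summary(fit)     # coefficients, standard errors, R-squared
coef(fit)        # estimated coefficients only
confint(fit)     # 95% confidence intervals

# Predicted income for a hypothetical respondent
pred <- predict(fit, newdata = data.frame(educ = 16, age = 40))

# Residuals against fitted values as a basic diagnostic plot
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Because the data are simulated, the estimated coefficient for educ should land close to the true value of 150.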

Files: slides, script, data, assignment.

May 6

Linear regression, part 2.

Files: slides, script, assignment.

May 13

Binary logistic regression.
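
A minimal sketch of a binary logistic regression with `glm` follows. It uses simulated data, not the election data from the course; the outcome voted and the predictors are hypothetical.

```r
# Simulated, hypothetical data: probability of voting rises with age and education
set.seed(42)
n     <- 500
age   <- round(runif(n, 18, 80))
educ  <- round(runif(n, 8, 20))
p     <- plogis(-4 + 0.03 * age + 0.2 * educ)   # inverse logit of the linear predictor
voted <- rbinom(n, size = 1, prob = p)
d     <- data.frame(voted, age, educ)

# Logistic regression: a generalized linear model with a binomial family
fit <- glm(voted ~ age + educ, data = d, family = binomial)

summary(fit)       # coefficients on the log-odds scale
exp(coef(fit))     # odds ratios

# Predicted probability for a hypothetical respondent
pred <- predict(fit, newdata = data.frame(age = 40, educ = 12),
                type = "response")
```

With `type = "response"` the prediction is a probability; the default returns the value on the log-odds scale.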

Files: slides, script, data on the 2005 presidential elections.

May 27

Final test: data, template.