Foundations for Statistical Inference Continued
In a cross-sectional study design, the data are collected at a single time point.
Example: GPA of CSUCI students who do and don’t exercise regularly
In a longitudinal study design, the data are collected on the same subjects at multiple time points.
Example: A study comparing weight-loss diets with a primary outcome of change in body weight after two years.
We will focus on modeling cross-sectional studies in our class
The unit of observation is the entity or individual on which data is collected and studied.
Example: Do CSUCI students who exercise regularly have higher GPAs?
The units of observation are the students.
Our data should contain one row per student, and each column should represent a variable (e.g. one column for regular exercise and one for GPA).
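As a minimal sketch of this layout, the Python snippet below stores one record per student, with one field for each variable, and then summarizes GPA by exercise group. The student IDs, exercise indicators and GPA values are made up for illustration and do not come from any real data set; the function name is likewise hypothetical.

```python
from statistics import mean

# Hypothetical data: one row (dict) per student, one key per variable.
students = [
    {"student_id": 1, "regular_exercise": True,  "gpa": 3.4},
    {"student_id": 2, "regular_exercise": False, "gpa": 3.1},
    {"student_id": 3, "regular_exercise": True,  "gpa": 3.6},
    {"student_id": 4, "regular_exercise": False, "gpa": 2.9},
]

def mean_gpa_by_exercise(rows):
    """Return the mean GPA for each level of the explanatory variable."""
    groups = {}
    for row in rows:
        # Collect GPA values under the exercise level of that student.
        groups.setdefault(row["regular_exercise"], []).append(row["gpa"])
    return {level: mean(values) for level, values in groups.items()}

summary = mean_gpa_by_exercise(students)
print(summary)
```

Because each row is a single unit of observation, grouping by one column and summarizing another is a simple loop over rows.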
In experiments, the researchers assign subjects to treatment groups, which correspond to the levels of the explanatory variable.
Often we include a control group, which serves as a benchmark for comparison.
Example: Treating septic shock
In a study to treat septic shock, Hwang et al. (2020) used two study groups of equal size: one group received an intravenous infusion of vitamin C and thiamine and the other group received intravenous saline.
Explanatory variable: Injection received
Example: Reducing heart attack rate
Suppose we want to learn whether a new drug, drug A, can lower the heart attack rate.
What should be
If we think that subjects knowing which treatment they received will affect their outcome, we can conduct a blinded study, in which the subjects do not know which treatment they received.
Example: Treating septic shock
Subjects are not told which injection they received.
Example: Do pre-workout supplements improve athletic performance?
Subjects are not told if the supplement they receive is pre-workout or a placebo.
It could just be the idea of pre-workout that helps improve performance.
If we think that the people conducting the research knowing which treatment a subject received will affect the outcome, we can conduct a double-blinded study.
Neither the subjects nor the people conducting the research know who received which treatment.
Example: Experimental Alzheimer’s drug
Alzheimer’s is currently irreversible. A doctor might unintentionally put more effort into check-ups for people they know received the drug.
Careful planning is needed to still record who did and did not receive the experimental drug.
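One way to plan for this is to label each subject with a neutral code and keep the key linking codes to treatments in a separate record that only an independent coordinator can open. The Python sketch below illustrates the idea under stated assumptions: the subject IDs, the "KIT-" code format, the random assignment into two equal-size groups, and the function name are all hypothetical and are not taken from any particular study.

```python
import random

def make_blinded_assignment(subject_ids, treatments=("drug", "placebo"), seed=2024):
    """Assign subjects to equal-size groups and split the record in two.

    Returns (blinded, key):
      blinded: what subjects and research staff see (subject -> neutral code)
      key:     kept separately (neutral code -> treatment actually received)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Shuffle the subjects so group membership is decided by chance.
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)

    # Shuffle the codes too, so a code number reveals nothing about the group.
    codes = [f"KIT-{i:03d}" for i in range(1, len(shuffled) + 1)]
    rng.shuffle(codes)

    blinded, key = {}, {}
    for position, (subject, code) in enumerate(zip(shuffled, codes)):
        blinded[subject] = code
        key[code] = treatments[position % len(treatments)]
    return blinded, key

blinded, key = make_blinded_assignment(["S1", "S2", "S3", "S4", "S5", "S6"])
print(blinded)  # codes only, no treatment names
```

Only the `key` mapping identifies who received the experimental drug, so it can be stored apart from the study records and consulted once data collection is complete.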
All research must be ethical, and must meet ethical guidelines, to minimize risk of harm to the environment, property and to participants, and to preserve the well-being, dignity, rights and safety of participants (including animals).
Most research studies require an ethics committee to formally grant ethics approval before research begins. Institutional Review Boards (IRBs) are in place to review and approve or disapprove studies.
The CSUCI IRB states:
A Classroom project that will be used for teaching and learning the research tools in the classroom or for pedagogy and will not be published, or presented beyond the classroom does not require IRB review and approval.
Example: Confidentiality
Example: Storage of data
Example: Consent
Example: Economic risks
Example: Harm or discomfort
Example: Methods
Example: Analysis
Example: Acknowledgements
Example: Plagiarism
Reproducible research enables others to repeat the study and analysis to confirm the findings.
Methods, data (when appropriate), analysis, and relevant computer code must be made available when possible.
There are serious medical consequences to errors attributable to the effects of spreadsheet programs and software operated through a graphical user interface […] that could have been avoided through a reproducible research paradigm…
Simons and Holmes (2019), p. 471
Numerical (or quantitative) variables are mathematically numerical: the numbers have numerical meaning, and represent quantities or amounts. Numerical variables generally arise from counting or measuring.
Discrete numerical variables have a countable number of possible values between any two given values.
Example: Number of computers in a classroom
Continuous numerical variables have an infinite number of possible values between any two given values.
Example: A person’s height
Sometimes, discrete quantitative data with a very large number of possible values may be treated as continuous.
Categorical (or qualitative) variables are not mathematically numerical: they comprise mutually exclusive categories or labels.
Nominal categorical variables are categorical variables where the levels do not have a natural order.
Example: Blood types: Type A, Type B, Type AB, Type O
Ordinal categorical variables are categorical variables where the levels do have a natural order.
Example: Likert scale response: Strongly agree, agree, neutral, disagree, strongly disagree
Example: Age groups: 0-18, 19-55, 56+
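The practical difference between nominal and ordinal variables is that only ordinal levels can be meaningfully sorted or compared. As a rough sketch, the Python snippet below stores the Likert levels in their natural order and uses that order to sort some made-up responses. The response values and the helper name are hypothetical.

```python
# Ordinal variable: the order of this list encodes the natural order of the levels.
LIKERT_LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Nominal variable: the levels are just labels, so any ordering would be arbitrary.
BLOOD_TYPES = {"A", "B", "AB", "O"}

def sort_ordinal(values, levels):
    """Sort values by their position in the ordered list of levels."""
    rank = {level: position for position, level in enumerate(levels)}
    return sorted(values, key=lambda value: rank[value])

responses = ["agree", "strongly disagree", "neutral", "agree", "strongly agree"]
ordered = sort_ordinal(responses, LIKERT_LEVELS)
print(ordered)
```

Sorting the blood types the same way would require inventing an order that the data do not have, which is why they are kept in an unordered set here.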
Categorical types:
character: takes string values (e.g. a person’s name, address)
factor: categorical variables with different levels
logical: TRUE (1), FALSE (0)
Numerical types:
numeric: integer or double
integer: integer (for discrete)
double: floating decimal (for continuous)
Microbiome of clothing items worn for a single day in a non-healthcare setting