saoter committed on
Commit
4a54679
1 Parent(s): 7b0b579

Upload questions_working.csv

Files changed (1)
  1. questions_working.csv +78 -0
questions_working.csv ADDED
@@ -0,0 +1,78 @@
1
+ Unnamed: 0,ID,Question,Level,Count,Status,Answer
2
+ 1,V1,What is the difference between a sample and population?,Very Easy,0,Not Asked,"The main difference between a sample and a population is their size and scope. A population refers to the entire group or collection of individuals, items, or data points that you are interested in studying and drawing conclusions about. It is the complete set of all possible observations relevant to a particular question or problem. In contrast, a sample is a subset of the population, typically smaller in size, that is selected for the purpose of conducting research or analysis. Sampling involves choosing a representative group from the population to make inferences about the entire population."
3
+ 2,V2,Why do we collect or analyse a sample for inferential statistics? Why not study the whole population?,Very Easy,0,Not Asked,"We collect and analyze samples for inferential statistics rather than studying the whole population for several reasons. First, it is often impractical or impossible to study an entire population due to factors like cost, time constraints, or logistical challenges. Second, taking a sample allows us to make inferences about the population as a whole while reducing the resources required. Third, when done properly, sampling can provide accurate and reliable estimates of population parameters. Statistical techniques are then used to generalize the findings from the sample to the larger population, allowing us to draw conclusions and make predictions with a degree of confidence. This approach is not only more feasible but also more efficient for many real-world research scenarios."
4
+ 3,V3,"What are the mean and the median? Why should we care about these statistics?",Very Easy,0,Not Asked,"Mean and median are measures of central tendency in statistics. The mean is the average of a set of values, calculated by adding up all the values and dividing by the number of values. The median is the middle value in a dataset when the values are arranged in ascending order (with an even number of values, it is the average of the two middle values). We care about these statistics because they summarize the typical or central value of a dataset. The mean is sensitive to extreme outliers and is pulled toward the tail of a skewed distribution, while the median is robust to outliers, so reporting both gives a fuller summary of the data."
5
+ 4,V4,Define numerical variable and categorical variable. Provide examples.,Very Easy,0,Not Asked,"A numerical variable is a type of variable that represents quantities and can be measured on a numerical scale. Examples include age, height, and temperature. A categorical variable, on the other hand, represents categories or labels and cannot be measured on a numerical scale. Examples include gender (male, female), color (red, blue, green), or type of vehicle (car, bike, bus)."
6
+ 5,V5,"If one flips a fair coin many times, what is the probability of getting heads?",Very Easy,0,Not Asked,"The probability of getting heads when flipping a fair coin is 0.5 or 50%. This is because there are two equally likely outcomes: heads and tails, and each has a 50% chance of occurring."
7
+ 6,V6,"If one flips a fair coin many times, what is the probability of getting tails?",Very Easy,0,Not Asked,"The probability of getting tails when flipping a fair coin is also 0.5 or 50%. Like the probability of getting heads, it's 50% because there are two equally likely outcomes, and each has an equal chance of happening."
8
+ 7,V7,"If one rolls a fair six-sided die {1, 2, 3, 4, 5, 6}, what is the probability of each face occurring?",Very Easy,0,Not Asked,"When rolling a fair six-sided die, each face has an equal probability of occurring, which is 1/6 or approximately 16.67%. This is because there are six possible outcomes, and they are all equally likely."
9
+ 8,V8,What are the null hypothesis and the alternative hypothesis? Give an example.,Very Easy,0,Not Asked,"The null hypothesis (H0) is a statement or assumption that there is no significant difference or effect, and any observed differences are due to random variation. The alternative hypothesis (H1 or Ha) is a statement that there is a significant difference or effect. For example, in a drug trial, the null hypothesis might be that the new drug has no effect on patients' symptoms, while the alternative hypothesis would be that the drug does have a significant effect."
10
+ 9,V9,"If we want to investigate a population's variability, what parameter will we investigate?",Very Easy,0,Not Asked,"To investigate a population's variability, we typically look at the parameter known as the population variance. Variance measures how spread out or dispersed the values in a population are from the population mean. It quantifies the extent of variability in the data."
11
+ 10,V10,"In the linear regression equation $ Y = \alpha + \beta X + \epsilon $, what are variables X and Y?",Very Easy,0,Not Asked,"In the linear regression equation $ Y = \alpha + \beta X + \epsilon $, variable X represents the independent variable or predictor, which is used to predict the dependent variable Y. Y is the dependent variable or the outcome we are trying to predict."
12
+ 11,V11,"In the linear regression equation $ Y = \alpha + \beta X + \epsilon $, what are parameters alpha and beta?",Very Easy,0,Not Asked,"In the linear regression equation $ Y = \alpha + \beta X + \epsilon $, parameter $\alpha$ (alpha) represents the y-intercept, which is the value of Y when X is 0. Parameter $\beta$ (beta) represents the slope of the regression line, indicating how much Y changes for a unit change in X."
13
+ 12,V12,The slope parameter $ \hat{\beta} $ of a simple linear regression is +2. Interpret this number.,Very Easy,0,Not Asked,"A slope parameter, such as $\hat{\beta} = 2$, in a simple linear regression means that for every one-unit increase in X, Y is expected to increase by 2 units. This indicates a positive linear relationship between X and Y. The slope alone does not measure the strength of the relationship, since its size depends on the units of X and Y."
14
+ 13,V13,The slope parameter $ \hat{\beta} $ of a simple linear regression is -2. Interpret this number.,Very Easy,0,Not Asked,"A slope parameter, such as $\hat{\beta} = -2$, in a simple linear regression means that for every one-unit increase in X, Y is expected to decrease by 2 units. This indicates a negative linear relationship between X and Y. The slope alone does not measure the strength of the relationship, since its size depends on the units of X and Y."
15
+ 14,E1,What are the major types of data statisticians work with?,Easy,0,Not Asked,"Statisticians work with two major types of data: categorical data, which includes labels or categories, and numerical data, which includes measurable values that can be either discrete or continuous."
16
+ 15,E2,What are some of the challenges or issues in sampling of data? List out any three.,Easy,0,Not Asked,"Some challenges in sampling data include sampling bias, which can lead to unrepresentative samples; non-response bias, where selected individuals do not participate; and determining an adequate sample size to ensure statistical reliability."
17
+ 16,E3,"Why do we use sampling in statistics? What is the meaning of random sampling, can you provide an example?",Easy,0,Not Asked,"We use sampling in statistics to make inferences about a larger population efficiently. Random sampling is a method where each individual or element in the population has an equal and independent chance of being selected, which helps make the sample representative of the population. For example, drawing 100 student ID numbers at random from a university's full enrollment list gives a simple random sample of its students."
18
+ 17,E4,What does standard deviation in a dataset tell us? What does it mean when standard deviation is small or large?,Easy,0,Not Asked,"The standard deviation in a dataset measures the spread or variability of the data points around the mean. When the standard deviation is small, it indicates that the data points are closely clustered around the mean, suggesting low variability. Conversely, a large standard deviation implies that the data points are more spread out from the mean, indicating higher variability in the dataset."
19
+ 18,E5,Provide examples for discrete and continuous data.,Easy,0,Not Asked,"Examples of discrete data include the number of students in a classroom, the count of cars in a parking lot, and the number of books on a shelf. Examples of continuous data include height, weight, temperature, and time."
20
+ 19,E6,List out one difference between discrete and continuous variables. Provide an example of each.,Easy,0,Not Asked,"One difference between discrete and continuous variables is that discrete variables can only take on specific, distinct values (often counted in whole numbers), while continuous variables can take on any value within a range. For example, the number of children in a family is discrete (1, 2, 3, etc.), while the height of individuals is continuous (can be any value within a range)."
21
+ 20,E7,"What does it mean when one says - ""90th percentile""?",Easy,0,Not Asked,"When one says ""90th percentile,"" it means that 90% of the data falls below or is less than that particular data point. It is a measure of relative standing in a dataset."
22
+ 21,E8,"What does it mean when one says - ""10th percentile""?",Easy,0,Not Asked,"When one says ""10th percentile,"" it means that 10% of the data falls below or is less than that particular data point. It also represents a measure of relative standing in a dataset but at the lower end."
23
+ 22,E9,Explain any symmetric distribution - either discrete or continuous.,Easy,0,Not Asked,"A symmetric distribution is one in which the left and right sides of the distribution are mirror images of each other. In other words, the distribution is balanced around its central point (mean/median). The normal distribution is a classic example of a continuous symmetric distribution."
24
+ 23,E10,"Properties of a binomial distribution, mention any two. Provide an example",Easy,0,Not Asked,"Two properties of a binomial distribution are: 1. It consists of a fixed number of trials, each with only two possible outcomes (success or failure). 2. The probability of success (p) remains constant for each trial. An example is flipping a fair coin multiple times."
25
+ 24,E11,List out properties of the normal distribution. Mention any three.,Easy,0,Not Asked,"Properties of the normal distribution include: A) It is symmetric and bell-shaped. B) The mean, median, and mode are equal and located at the center. C) The empirical rule (68-95-99.7 rule) applies, which specifies the percentages of data within 1, 2, and 3 standard deviations from the mean."
26
+ 25,E12,Properties of continuous random variables. Mention any three.,Easy,0,Not Asked,"Properties of continuous random variables include: A) They can take on an infinite number of values within a range. B) Probability is represented by the area under the probability density function (PDF) curve. C) The probability of any specific value is technically zero, as there are infinitely many possible values."
27
+ 26,E13,What is the area under the curve for a continuous distribution?,Easy,0,Not Asked,The area under the curve for a continuous distribution represents the probability of an event occurring within a certain range of values. It signifies the likelihood of a random variable falling within that range. The total area under the curve equals 1.
28
+ 27,E14,How does mean and standard deviation affect the shape of the normal distribution?,Easy,0,Not Asked,"Mean and standard deviation affect the shape of the normal distribution as follows: A) The mean ($\mu$) determines the central point of the distribution. B) The standard deviation ($\sigma$) determines the spread or width of the distribution. A larger standard deviation leads to a wider distribution, while a smaller standard deviation results in a narrower one."
29
+ 28,E15,"What is the relationship between mode, mean and median in a normal distribution?",Easy,0,Not Asked,"In a normal distribution, the mode, mean, and median are all equal and located at the same central point, which is the peak of the bell-shaped curve."
30
+ 29,E16,What is an outlier and what are the ways of dealing with them?,Easy,0,Not Asked,"An outlier is an observation that is significantly different from the other data points in a dataset. Ways of dealing with outliers include removing them, transforming the data, or conducting robust statistical analyses that are less sensitive to outliers."
31
+ 30,E17,"Under a normal distribution, what interval does 95% of the probability fall within? And for 90%?",Easy,0,Not Asked,"Under a normal distribution, approximately 95% of the probability falls within two standard deviations of the mean (more precisely, 1.96), and about 90% falls within 1.645 standard deviations."
32
+ 31,E18,What is the meaning of null-hypothesis? What is the objective of hypothesis testing?,Easy,0,Not Asked,"The null hypothesis (H0) is a statement of no effect or no difference, typically a default assumption. The objective of hypothesis testing is to determine whether there is enough evidence in the sample data to reject the null hypothesis in favor of an alternative hypothesis (H1 or Ha) that suggests a significant effect or difference."
33
+ 32,E19,What are the steps in hypothesis testing?,Easy,0,Not Asked,The steps in hypothesis testing typically include: A) Formulating null and alternative hypotheses. B) Choosing the significance level (alpha) before looking at the data. C) Collecting data and calculating a test statistic. D) Comparing the test statistic to a critical value or calculating a p-value. E) Making a decision to either reject or fail to reject the null hypothesis. F) Drawing conclusions based on the decision.
34
+ 33,E20,What is the z-score in the standard normal distribution? What does it measure?,Easy,0,Not Asked,The z-score in the standard normal distribution measures how many standard deviations a data point is from the mean. It provides information about the relative position of a data point within a normal distribution.
35
+ 34,E21,"Can you describe when to use what type of test (e.g., two-tailed, left-tailed, right-tailed) for testing a hypothesis?",Easy,0,Not Asked,"The choice of test (two-tailed, left-tailed, or right-tailed) in hypothesis testing depends on the specific research question and the directionality of the effect being studied. A two-tailed test is used when you are interested in any significant difference or effect, while one-tailed tests are used when you are specifically interested in a difference or effect in one direction (either greater than or less than)."
36
+ 35,E22,Write down an example for a null and two-sided alternative hypothesis.,Easy,0,Not Asked,Example of a null and two-sided alternative hypothesis: Null Hypothesis (H0): The mean test scores of Group A and Group B are equal. Alternative Hypothesis (H1 or Ha): The mean test scores of Group A and Group B are not equal.
37
+ 36,E23,Write down an example for a null and one-sided alternative hypothesis.,Easy,0,Not Asked,Example of a null and one-sided alternative hypothesis: Null Hypothesis (H0): The new treatment has no effect on the recovery time. Alternative Hypothesis (H1 or Ha): The new treatment decreases the recovery time.
38
+ 37,E24,State the null and alternative hypothesis when testing variances from two independent populations.,Easy,0,Not Asked,Null and alternative hypotheses when testing variances from two independent populations: Null Hypothesis (H0): The variances of Population 1 and Population 2 are equal ($\sigma_1^2 = \sigma_2^2$). Alternative Hypothesis (H1 or Ha): The variances of Population 1 and Population 2 are not equal ($\sigma_1^2 \neq \sigma_2^2$).
39
+ 38,E25,State the null and alternative hypothesis when testing for difference of means from two independent populations. Either One-tail or two-tail.,Easy,0,Not Asked,Null and alternative hypotheses when testing for the difference of means from two independent populations can be either one-tailed or two-tailed: A) Two-Tailed: Null Hypothesis (H0): The means of Population 1 and Population 2 are equal ($\mu_1 = \mu_2$). Alternative Hypothesis (H1 or Ha): The means of Population 1 and Population 2 are not equal ($\mu_1 \neq \mu_2$). B) One-Tailed (Left): Null Hypothesis (H0): The mean of Population 1 is greater than or equal to the mean of Population 2 ($\mu_1 \geq \mu_2$). Alternative Hypothesis (H1 or Ha): The mean of Population 1 is less than the mean of Population 2 ($\mu_1 < \mu_2$).
40
+ 39,E26,Both the equal-variances and unequal variances techniques require that populations be normally distributed. How can you check if this requirement is satisfied?,Easy,0,Not Asked,"To check if the requirement of normality is satisfied for both equal-variances and unequal variances techniques, you can use statistical tests such as the Shapiro-Wilk test, Anderson-Darling test, or visual methods like a normal probability plot or histogram to assess the normality of the data distribution."
41
+ 40,E27,What is a confidence interval?,Easy,0,Not Asked,"A confidence interval is a range of values, computed from sample data, that is likely to contain an unknown population parameter. The confidence level (e.g., 95%) describes the procedure: if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true parameter."
42
+ 41,E28,"What is the p-value test? Based on the p-value, how would you reject or fail to reject the null hypothesis?",Easy,0,Not Asked,"The p-value test is used in hypothesis testing to determine the strength of evidence against the null hypothesis. Based on the p-value, you would reject the null hypothesis if the p-value is less than or equal to the significance level ($\alpha$) and fail to reject it if the p-value is greater than $\alpha$."
43
+ 42,E29,What is the significance level $ \alpha $ (alpha)? What is the relationship between $ \alpha $ and confidence level?,Easy,0,Not Asked,"The significance level $\alpha$ (alpha) is the threshold used to determine statistical significance in hypothesis testing; it is the probability of making a Type I error (false positive), that is, rejecting the null hypothesis when it is true. The confidence level is its complement, $1 - \alpha$: the probability of not rejecting the null hypothesis when it is true. For example, $\alpha = 0.05$ corresponds to a 95% confidence level."
44
+ 43,E30,"If you were to design a study or analyse a dataset, which level of $ \alpha $ would you choose, and why?",Easy,0,Not Asked,"The choice of significance level $\alpha$ depends on the specific research goals and the acceptable risk of a Type I error (false positive). Common choices are $\alpha = 0.05$ (5% significance) and $\alpha = 0.01$ (1% significance). A lower $\alpha$ demands stronger evidence and reduces false positives, but for a given sample size it increases the risk of a Type II error (false negative)."
45
+ 44,E31,"If the p-value of the statistical analysis is less than 0.01 (or 0.05, 0.1), what does that say about the statistical significance of the result?",Easy,0,Not Asked,"If the p-value is less than 0.01 (or 0.05, 0.1), it indicates that the result is statistically significant at the corresponding significance level. In other words, there is strong evidence to reject the null hypothesis in favor of the alternative hypothesis."
46
+ 45,E32,"If the p-value of the statistical analysis is higher than 0.1, what does that say about the statistical significance of the result?",Easy,0,Not Asked,"If the p-value of the statistical analysis is higher than 0.1, it suggests that the result is not statistically significant at the 10% significance level, and there is insufficient evidence to reject the null hypothesis."
47
+ 46,E33,"When do we use the Student's t-statistic or t-distribution? (with regards to population variance, sample size)",Easy,0,Not Asked,The Student's t-statistic or t-distribution is used when dealing with small sample sizes (typically less than 30) or when the population variance is unknown. It is used for hypothesis testing and confidence interval estimation when the population variance is not known and must be estimated from the sample data.
48
+ 47,E34,"For a study on variance, the alternative hypothesis is $ H_1: \sigma^2 < 1$. What is the null hypothesis for this problem?",Easy,0,Not Asked,"For a study on variance with the alternative hypothesis $ H_1: \sigma^2 < 1$, the null hypothesis would be $H_0: \sigma^2 \geq 1$."
49
+ 48,E35,What are confidence intervals used for? How can you increase or decrease the width of a confidence interval?,Easy,0,Not Asked,"Confidence intervals are used to provide a range of values for an unknown population parameter, allowing for estimation and uncertainty assessment. To increase the width of a confidence interval, you can use a higher confidence level (e.g., 99% instead of 95%); to decrease the width, you can use a lower confidence level (e.g., 90% instead of 95%) or collect a larger sample, which reduces the standard error."
50
+ 49,E36,Why do we use OLS linear regression models?,Easy,0,Not Asked,"Ordinary Least Squares (OLS) linear regression models are used to model and understand the relationship between dependent and independent variables in a dataset. They are valuable for predicting outcomes, identifying relationships, and assessing the impact of independent variables on the dependent variable."
51
+ 50,E37,"An analysis relates the age of used cars (in years) to their price (in USD), using data on a specific type of car. In a linear regression of the price on age, the slope parameter $\hat{\beta}$ is -700. Interpret the coefficient.",Easy,0,Not Asked,"In the linear regression of the price on age for used cars with a slope parameter $\hat{\beta}$ of -700, it means that, on average, the price of the car decreases by 700 USD for each additional year of age."
52
+ 51,E38,"An analysis relates the size of apartments (in square meters) to their price (in USD), using data from one city. In a linear regression of the price on size, the slope parameter $\hat{\beta}$ is +600. Interpret the coefficient.",Easy,0,Not Asked,"In the linear regression of the price on the size of apartments with a slope parameter $\hat{\beta}$ of +600, it means that, on average, the price of apartments increases by 600 USD for each additional square meter in size."
53
+ 52,E39,"In our linear regression model output, an R-squared $R^2$ is reported, what does it mean and what do we use it for?",Easy,0,Not Asked,"The R-squared ($R^2$) in linear regression measures the proportion of the variance in the dependent variable that is explained by the independent variables. It indicates the goodness of fit of the model, with higher values indicating a better fit."
54
+ 53,E40,What are dummy or indicator variables? Provide examples.,Easy,0,Not Asked,"Dummy or indicator variables are used to represent categorical data in regression analysis. They are binary variables (0 or 1) that indicate the presence or absence of a category. For example, in a regression model for car types, you might have a dummy variable ""SUV"" taking the value 1 if it's an SUV and 0 otherwise."
55
+ 54,E41,How do we include nominal independent variables in regression analysis?,Easy,0,Not Asked,"To include nominal independent variables in regression analysis, you create dummy variables for all but one category and include them in the model; the omitted category serves as the reference. For example, if you have a ""Color"" variable with categories ""Red,"" ""Blue,"" and ""Green,"" you would create two dummy variables (e.g., ""BlueDummy"" and ""GreenDummy"") and interpret their coefficients relative to ""Red."" Including all three dummies together with an intercept would cause perfect multicollinearity (the dummy variable trap)."
56
+ 55,E42,What does correlation coefficient explain? Positive and negative correlation?,Easy,0,Not Asked,"The correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. A positive correlation means that as one variable increases, the other tends to increase, while a negative correlation means that as one variable increases, the other tends to decrease."
57
+ 56,E43,"In the linear regression model, why do we require variation in values of X? What happens if there is no variation in values of X?",Easy,0,Not Asked,"In linear regression, we require variation in the values of the independent variable (X) to estimate the relationship with the dependent variable (Y). If there is no variation in the values of X (e.g., all X values are the same), it becomes impossible to estimate the slope: the OLS slope formula divides by the variation in X, which would be zero, so the coefficient is undefined."
58
+ 57,E44,What is an outlier and why can they be a problem in your analysis? Provide 1 or 2 ways in dealing with outliers.,Easy,0,Not Asked,"An outlier is an extreme data point that deviates significantly from the majority of the data. They can be a problem in analysis as they can distort the regression model and lead to incorrect conclusions. Dealing with outliers can involve removing them, transforming the data, or using robust statistical techniques."
59
+ 58,E45,"In experimental design - what is a treatment group, and what is a control group?",Easy,0,Not Asked,"In experimental design, a treatment group is a group of subjects or items that are exposed to a specific treatment or intervention being studied. A control group is a group that is treated identically to the treatment group but does not receive the experimental treatment. The control group serves as a baseline for comparison to assess the impact of the treatment."
60
+ 59,M1,"What is the empirical rule, and when can it be helpful?",Moderate,0,Not Asked,"The empirical rule, also known as the 68-95-99.7 rule, states that in a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, about 95% falls within two standard deviations, and approximately 99.7% falls within three standard deviations. It can be helpful for quickly understanding the distribution of data and assessing the likelihood of values falling within certain ranges."
61
+ 60,M2,Define central limit theorem. Why is the central limit theorem important in statistics?,Moderate,0,Not Asked,"The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution. It is important in statistics because it allows us to make inferences about population parameters using the properties of the normal distribution, even when the population itself may not be normally distributed."
62
+ 61,M3,What is the Law of large numbers?,Moderate,0,Not Asked,"The Law of Large Numbers (LLN) states that as the sample size increases, the sample mean approaches the true population mean. In other words, with a sufficiently large sample, the sample mean becomes a more accurate estimate of the population mean."
63
+ 62,M4,Explain sampling distribution.,Moderate,0,Not Asked,"A sampling distribution is the distribution of a statistic (e.g., sample mean or sample proportion) computed from multiple random samples drawn from the same population. It provides information about the variability of the statistic and allows for statistical inference."
64
+ 63,M5,"Interpret the formula for z-score; $z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$. What are the numerator and denominator measuring? What does the z-score tell us about the sample mean, say if the value of z=1.5 or z=2?",Moderate,0,Not Asked,"The formula for the z-score, $z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$, calculates how many standard errors a sample mean ($\bar{x}$) is away from the population mean ($\mu$), where $\sigma$ is the population standard deviation and n is the sample size. The numerator measures the difference between the sample mean and the population mean, while the denominator measures the standard error of the sample mean. A z-score of 1.5 or 2 indicates that the sample mean is 1.5 or 2 standard errors above the population mean, respectively."
65
+ 64,M6,Design a study (or explain a study) where you will apply difference of means test to compare two independent samples. Also state the hypothesis (You can use an example from lecture slides).,Moderate,0,Not Asked,"In a study comparing two independent samples using a difference of means test, you could investigate whether a new teaching method improves students' test scores compared to a traditional teaching method. The hypothesis could be: Null Hypothesis (H0): The mean test scores of students taught using the new method are equal to the mean test scores of students taught using the traditional method. Alternative Hypothesis (H1 or Ha): The mean test scores of students taught using the new method are not equal to the mean test scores of students taught using the traditional method."
66
+ 65,M7,Can you give an example of when to run a paired t-test and when to run an independent t-test?,Moderate,0,Not Asked,"You would run a paired t-test when you have two related groups or measurements, such as before and after measurements on the same individuals. An independent t-test is used when you are comparing the means of two separate and unrelated groups, like comparing test scores between two different schools."
67
+ 66,M8,"Explain the application of the Chi-square goodness of fit test, and provide an example.",Moderate,0,Not Asked,"Chi-square goodness of fit tests are used to determine if observed categorical data fits an expected distribution. For example, you might use it to test whether the observed distribution of eye colors in a population matches the expected distribution based on a genetic model."
68
+ 67,M9,Explain Type I and Type II errors. Provide an example.,Moderate,0,Not Asked,"Type I error occurs when you reject a true null hypothesis (false positive), while Type II error occurs when you fail to reject a false null hypothesis (false negative). An example of Type I error is convicting an innocent person (rejecting the null hypothesis of innocence), and an example of Type II error is failing to convict a guilty person (failing to reject the null hypothesis of innocence)."
69
+ 68,M10,"In regression diagnostics, what does the following figure tell us: Histogram of residuals.",Moderate,0,Not Asked,"A histogram of residuals in regression diagnostics tells us about the distribution of the differences between observed and predicted values (residuals). It helps assess whether the residuals are approximately normally distributed, which is an assumption of linear regression."
70
+ 69,M11,"In regression diagnostics, what does the following figure tell us: Plot of residuals versus predicted values of y.",Moderate,0,Not Asked,"A plot of residuals versus predicted values of y in regression diagnostics helps identify patterns or trends in the residuals. It is used to check for heteroscedasticity, which is when the variability of the residuals changes across different levels of the independent variable."
71
+ 70,M12,Explain Heteroskedasticity.,Moderate,0,Not Asked,"Heteroskedasticity is a statistical term that describes the situation where the variability of the residuals in a regression model is not constant across different levels of the independent variable(s). In other words, the spread or dispersion of the residuals changes as the values of the predictor(s) change."
72
+ 71,M13,What is multicollinearity? How can we address this issue?,Moderate,0,Not Asked,"Multicollinearity refers to a situation in which two or more independent variables in a regression model are highly correlated with each other. It can be addressed by removing one of the correlated variables, using dimensionality reduction techniques, or combining the correlated variables into a composite variable."
73
+ 72,M14,How do you interpret regression coefficients in a logistic regression?,Moderate,0,Not Asked,"In logistic regression, regression coefficients represent the change in the log-odds of the dependent variable for a one-unit change in the independent variable. They indicate how the probability of the binary outcome changes as the independent variable(s) change."
74
+ 73,M15,"Provide an example of experimental design; what is the outcome of interest, explain treatment and control groups in the study.",Moderate,0,Not Asked,"An example of experimental design might involve testing a new drug's effectiveness for reducing blood pressure. The outcome of interest is blood pressure reduction, with the treatment group receiving the new drug, and the control group receiving a placebo. The study aims to compare the drug's effect on blood pressure between the two groups."
75
+ 74,M16,What is the difference between a correlation matrix and a linear regression model?,Moderate,0,Not Asked,"A correlation matrix shows the pairwise correlations between variables, indicating the strength and direction of linear relationships. A linear regression model, on the other hand, examines the relationship between a dependent variable and one or more independent variables, providing coefficients that represent the effect of the independent variables on the dependent variable."
76
+ 75,M17,Why can we not use linear regression models when the dependent variable is binary (0/1) or a choice variable?,Moderate,0,Not Asked,"Linear regression models are not suitable for dependent variables that are binary (0/1) or choice variables because the assumptions of linearity, constant variance, and normality of residuals do not hold, and the predicted values can fall outside the 0 to 1 range. Instead, logistic regression models are typically used for binary outcomes."
77
+ 76,M18,Can we assume that the regression model with the highest number of predictors is the best model? Why or why not.,Moderate,0,Not Asked,"We cannot assume that the regression model with the highest number of predictors is the best model because including more predictors can lead to overfitting, where the model performs well on the training data but poorly on new data. The choice of the best model should consider model fit, simplicity, and predictive performance on independent data."
78
+ 77,M19,What is the advantage of using multiple regression model as compared to a simple regression model?,Moderate,0,Not Asked,"The advantage of using a multiple regression model compared to a simple regression model is that it allows for the consideration of multiple independent variables simultaneously, capturing the potential joint effects of these variables on the dependent variable. This can lead to a more comprehensive understanding of the relationships in the data and improved predictive accuracy."
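
The z-score formula from question M5 and the confidence-interval width discussed in E35 can be illustrated with a minimal Python sketch. The function names and all input numbers below are hypothetical and chosen only for illustration; they do not come from the uploaded file. The critical values 1.96 and 1.645 are the standard normal values quoted in E17.

```python
import math


def z_score(sample_mean, pop_mean, pop_sd, n):
    """Number of standard errors between the sample mean and the population mean."""
    standard_error = pop_sd / math.sqrt(n)  # denominator of the M5 formula
    return (sample_mean - pop_mean) / standard_error


def ci_width(pop_sd, n, critical_value):
    """Full width of a z-based confidence interval for the mean."""
    return 2 * critical_value * pop_sd / math.sqrt(n)


# Hypothetical inputs: sample mean 103, population mean 100, sigma 10, n = 25.
z = z_score(sample_mean=103.0, pop_mean=100.0, pop_sd=10.0, n=25)

# Same sigma and n, different confidence levels (E35: lower level, narrower interval).
width_95 = ci_width(pop_sd=10.0, n=25, critical_value=1.96)
width_90 = ci_width(pop_sd=10.0, n=25, critical_value=1.645)

# Same confidence level, larger sample (quadrupling n halves the width).
width_95_large_n = ci_width(pop_sd=10.0, n=100, critical_value=1.96)

print(f"z = {z:.2f}")
print(f"95% width (n=25):  {width_95:.2f}")
print(f"90% width (n=25):  {width_90:.2f}")
print(f"95% width (n=100): {width_95_large_n:.2f}")
```

With these inputs the standard error is 10 / 5 = 2, so the sample mean sits 1.5 standard errors above the population mean. Treating sigma as known is an assumption of the sketch; with an estimated standard deviation and a small sample, the t-distribution from E33 would replace the normal critical values.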