Which of the following is a common indicator of multicollinearity when examining a correlation matrix?
High correlation coefficients between the dependent variable and independent variables
Low correlation coefficients between all independent variables
High correlation coefficients between some independent variables
Negative correlation coefficients between the dependent and independent variables
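As a hedged illustration of the correct answer (synthetic data; the variable names x1-x3 and the 0.8 cutoff are arbitrary choices), the sketch below builds a correlation matrix with pandas and flags predictor pairs whose absolute correlation exceeds a common rule-of-thumb threshold:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical predictors: x2 is nearly a linear copy of x1, x3 is independent.
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

corr = X.corr()
print(corr.round(2))

# Flag predictor pairs whose absolute correlation exceeds the cutoff.
threshold = 0.8
flagged = [
    (a, b, round(corr.loc[a, b], 2))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(flagged)  # expect the (x1, x2) pair to appear
```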
What happens to the bias and variance of a linear regression model as the regularization parameter (lambda) increases?
Bias increases, Variance increases
Bias increases, Variance decreases
Bias decreases, Variance increases
Bias decreases, Variance decreases
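A minimal sketch of the shrinkage behind this trade-off, using scikit-learn's Ridge on synthetic data (lambda corresponds to Ridge's alpha parameter): as alpha grows, the coefficients shrink toward zero, trading variance for bias.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Larger alpha shrinks coefficients toward zero: estimates move less with
# the training sample (lower variance) but drift from the true effects (higher bias).
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:<8} coef={np.round(model.coef_, 2)}")
```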
What is a potential drawback of removing a highly correlated independent variable to deal with multicollinearity?
It may lead to an increase in the model's complexity.
It may result in a loss of valuable information and reduce the model's accuracy.
It may improve the model's overall fit but reduce its interpretability.
It has no drawbacks and is always the best solution.
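To make the trade-off concrete, the hedged sketch below (synthetic data; the correlation strength and coefficients are arbitrary) compares cross-validated R-squared before and after dropping a predictor that is correlated with another but still carries its own signal:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.45 * rng.normal(size=n)  # correlated with x1, but not redundant
y = 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

X_full = np.column_stack([x1, x2])
X_drop = x1.reshape(-1, 1)  # "fix" multicollinearity by dropping x2

# If x2 carries unique signal, the reduced model should score worse out of sample.
for name, X in [("full model", X_full), ("x2 dropped", X_drop)]:
    score = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```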
How does the Variance Inflation Factor (VIF) quantify multicollinearity?
By measuring the change in R-squared when an independent variable is added to the model
By determining the difference between the predicted and actual values of the dependent variable
By measuring the correlation between two independent variables
By calculating the proportion of variance in one independent variable explained by all other independent variables
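statsmodels ships this computation directly; a minimal sketch on synthetic collinear data (variable names are made up). The formula is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.2, size=200)  # collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# Values above roughly 5-10 are commonly taken to signal problematic collinearity.
for j, col in enumerate(X.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X.values, j), 2))
```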
How does stepwise selection work in feature selection?
It ranks features based on their correlation with the target variable and selects the top-k features.
It iteratively adds or removes features based on a statistical criterion, aiming to find the best subset.
It uses L1 or L2 regularization to shrink irrelevant feature coefficients to zero.
It transforms the original features into a lower-dimensional space while preserving important information.
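scikit-learn's SequentialFeatureSelector implements a greedy, cross-validation-scored analogue of classical stepwise selection (which traditionally uses p-values or AIC as the criterion); a minimal forward-selection sketch on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

# Forward selection: start from the empty set and greedily add whichever
# feature most improves cross-validated R^2, stopping at the target count.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=5
)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the selected features
```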
Poisson regression, another type of GLM, is particularly well-suited for analyzing which kind of data?
Continuous measurements
Proportions or percentages
Count data of rare events
Ordinal data with a specific order
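A minimal sketch of a Poisson GLM in statsmodels on simulated count data (the coefficient values are arbitrary); under the canonical log link, the Poisson mean is the exponential of the linear predictor:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 2)))

# Simulate counts: mean = exp(linear predictor) via the log link.
mu = np.exp(X @ np.array([0.5, 0.8, -0.3]))
y = rng.poisson(mu)

model = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(model.params)  # estimates on the log scale; exponentiate for rate ratios
```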
In which scenario might you prefer Huber regression over RANSAC for robust regression?
When the proportion of outliers is relatively small
When the outliers are expected to be clustered together
When dealing with high-dimensional data with a large number of features
When it's important to completely discard the outliers from the analysis
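A hedged side-by-side sketch with scikit-learn (synthetic data with a small fraction of injected outliers): HuberRegressor down-weights large residuals smoothly, while RANSACRegressor fits on random inlier subsets and discards suspected outliers outright.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression, RANSACRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.5, size=100)
y[:5] += 15.0  # a small fraction of gross outliers

huber = HuberRegressor().fit(X, y)                                      # down-weights outliers
ransac = RANSACRegressor(LinearRegression(), random_state=0).fit(X, y)  # discards them
print("Huber slope: ", round(huber.coef_[0], 2))
print("RANSAC slope:", round(ransac.estimator_.coef_[0], 2))
```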
How do you interpret a standardized coefficient (beta) in multiple linear regression?
It represents the unstandardized effect of the predictor on the outcome.
It indicates the statistical significance of the predictor.
It determines the goodness of fit of the regression model.
It represents the effect of a one-standard-deviation change in the predictor on the outcome, in standard deviation units.
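A minimal sketch of obtaining standardized betas by z-scoring both predictors and outcome before an ordinary least-squares fit (synthetic data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

# Z-score predictors and outcome; the fitted coefficients are then
# standardized betas: SDs of change in y per one-SD change in each predictor.
Xz = StandardScaler().fit_transform(X)
yz = (y - y.mean()) / y.std()

betas = LinearRegression().fit(Xz, yz).coef_
print(np.round(betas, 3))
```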
A model has a high R-squared but a low Adjusted R-squared. What is a likely explanation?
The model is too simple.
The model is a perfect fit.
The model has high bias.
The model is overfitting, with many predictors that add little explanatory power.
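The sketch below makes this concrete on synthetic data: padding the design matrix with pure-noise predictors can only raise R^2, while adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1) penalizes the extra parameters.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 60
X, y = make_regression(n_samples=n, n_features=2, noise=20.0, random_state=0)

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Pad the design matrix with pure-noise predictors and refit.
for extra in [0, 20, 40]:
    Xp = np.hstack([X, rng.normal(size=(n, extra))]) if extra else X
    r2 = LinearRegression().fit(Xp, y).score(Xp, y)
    print(f"{Xp.shape[1]:>2} predictors: "
          f"R^2={r2:.3f}, adj R^2={adjusted_r2(r2, n, Xp.shape[1]):.3f}")
```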
What is the primary motivation for using robust regression over ordinary least squares (OLS) regression?
To improve the interpretability of the regression coefficients
To handle datasets with non-linear relationships between variables more effectively
To reduce the computational complexity of the regression analysis
To mitigate the impact of outliers on the fitted regression line
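A minimal sketch of that motivation (synthetic data with one injected gross outlier): OLS minimizes squared error, so a single extreme point can drag the fitted slope, while Huber's loss grows only linearly in large residuals and limits that leverage.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X.ravel() + rng.normal(size=50)
y_out = y.copy()
y_out[0] += 100.0  # one gross outlier

print("OLS slope, clean:         ", round(LinearRegression().fit(X, y).coef_[0], 2))
print("OLS slope, with outlier:  ", round(LinearRegression().fit(X, y_out).coef_[0], 2))
print("Huber slope, with outlier:", round(HuberRegressor().fit(X, y_out).coef_[0], 2))
```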