Which of these is NOT a recommended approach for dealing with outliers in linear regression?
Transforming the data to reduce outliers' influence
Using robust regression methods less sensitive to outliers
Automatically removing all outliers without investigation
Investigating the cause of the outlier and correcting errors if possible
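To illustrate the transformation option above: a log transform can compress a heavy right-tail point so it no longer dominates the scale (a minimal sketch on made-up numbers):

```python
import numpy as np

# One extreme value dominates on the raw scale
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
raw_ratio = x.max() / np.median(x)          # ~33x the median

# After a log transform the same point is far less extreme
log_x = np.log(x)
log_ratio = log_x.max() / np.median(log_x)  # ~4x the median
```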
The performance of the Theil-Sen estimator can be sensitive to which characteristic of the data?
The presence of categorical variables
The presence of heteroscedasticity (unequal variances of errors)
The presence of multicollinearity (high correlation between independent variables)
The non-normality of the residuals
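A sketch of that sensitivity on synthetic data: with two near-duplicate predictors (multicollinearity), the overall fit is fine, but how the slope is split between the collinear features becomes unstable. Uses scikit-learn's TheilSenRegressor; the data are made up:

```python
import numpy as np
from sklearn.linear_model import TheilSenRegressor

rng = np.random.RandomState(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)  # near-duplicate of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=200)

model = TheilSenRegressor(random_state=0).fit(X, y)
# The combined slope coef_[0] + coef_[1] stays near 3, but the split
# between the two collinear columns is poorly determined.
```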
What does a high Cook's distance value indicate?
The observation has both high leverage and high influence
The observation has high leverage but low influence
The observation is not an outlier
The observation has low leverage but high influence
What is a key advantage of using Elastic Net Regression over Lasso Regression when dealing with highly correlated features?
Elastic Net can select groups of correlated features together, while Lasso might select only one feature from the group.
Elastic Net tends to outperform Lasso when the number of features is much larger than the number of samples.
Elastic Net is less prone to overfitting than Lasso when dealing with noisy datasets.
Elastic Net is computationally less expensive than Lasso for high-dimensional data.
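A sketch of the grouping effect on synthetic data (the alpha/l1_ratio values are arbitrary choices for the illustration): three nearly identical features form one correlated group; Lasso typically keeps only one of them, while Elastic Net's L2 term spreads weight across the group:

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.RandomState(0)
z = rng.normal(size=100)
# Three nearly identical features: one correlated group
X = np.column_stack([z + rng.normal(scale=0.01, size=100) for _ in range(3)])
y = z + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.3).fit(X, y)

n_active_lasso = int(np.sum(lasso.coef_ != 0))  # typically 1
n_active_enet = int(np.sum(enet.coef_ != 0))    # typically the whole group
```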
How do hierarchical linear models help avoid misleading conclusions in nested data analysis?
By assuming all groups have the same effect on the outcome
By ignoring individual-level variations
By treating all observations as independent
By accounting for the correlation between observations within groups
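A minimal sketch with statsmodels (synthetic nested data, arbitrary effect sizes): observations share a group-level effect, so rows within a group are correlated; a random intercept per group accounts for that correlation instead of treating all rows as independent:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.RandomState(0)
n_groups, n_per = 20, 15
g = np.repeat(np.arange(n_groups), n_per)
group_effect = rng.normal(scale=2.0, size=n_groups)[g]  # shared within group
x = rng.normal(size=n_groups * n_per)
y = 1.5 * x + group_effect + rng.normal(size=n_groups * n_per)
df = pd.DataFrame({"y": y, "x": x, "g": g})

# Random intercept per group models the within-group correlation
# that plain OLS (treating rows as independent) would ignore.
m = smf.mixedlm("y ~ x", df, groups=df["g"]).fit()
```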
When using Principal Component Analysis (PCA) as a remedy for multicollinearity, what is the primary aim?
To create new, uncorrelated variables from the original correlated ones
To introduce non-linearity into the model
To remove all independent variables from the model
To increase the sample size of the dataset
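A short demonstration of that aim on synthetic data: two predictors built from the same signal are almost perfectly correlated, while their principal-component scores are uncorrelated by construction:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
z = rng.normal(size=200)
# Two strongly correlated predictors built from the same signal
X = np.column_stack([z + rng.normal(scale=0.1, size=200),
                     2 * z + rng.normal(scale=0.1, size=200)])
r_original = np.corrcoef(X.T)[0, 1]          # close to 1

scores = PCA(n_components=2).fit_transform(X)
r_components = np.corrcoef(scores.T)[0, 1]   # ~0 by construction
```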
What does heteroscedasticity in a residual plot typically look like?
A random scattering of points
A funnel shape, widening or narrowing along the x-axis
A U-shape or inverted U-shape
A straight line with non-zero slope
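The funnel shape can be produced directly: simulate errors whose standard deviation grows with x, fit a line, and compare the residual spread on the left and right halves of the x-range (synthetic data, arbitrary scale factor):

```python
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(1, 10, 500)
y = 2 * x + rng.normal(scale=0.3 * x)  # error spread grows with x

slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

# Residual spread widens from left to right: the funnel shape
spread_left = resid[:250].std()
spread_right = resid[250:].std()
```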
In multiple linear regression, what does a coefficient of 0 for a predictor variable indicate?
The variable has a non-linear relationship with the outcome.
The variable is not statistically significant.
The variable is perfectly correlated with another predictor.
The variable has no impact on the predicted value.
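A tiny numeric check of that point (made-up coefficients): with a coefficient of exactly 0 on the second feature, changing that feature never moves the prediction:

```python
import numpy as np

# Fitted model y_hat = 1.0 + 2.0*x1 + 0.0*x2 (hypothetical numbers)
coef = np.array([2.0, 0.0])
intercept = 1.0

a = np.array([3.0, 5.0])
b = np.array([3.0, 999.0])  # same x1, wildly different x2

pred_a = intercept + a @ coef  # 7.0
pred_b = intercept + b @ coef  # 7.0 -- x2 never moves the prediction
```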
Huber regression modifies the loss function used in OLS regression. How does this modification help in handling outliers?
It increases the learning rate of the regression model for outlier data points
It transforms all data points to follow a normal distribution
It assigns lower weights to data points that deviate significantly from the predicted values
It completely ignores data points identified as outliers
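A sketch of that down-weighting on synthetic data with a few gross outliers: OLS's squared loss lets the outliers drag the intercept, while Huber's loss grows only linearly for large residuals, so the outlying points count for less:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.3, size=100)
y[:5] += 30  # a few gross outliers

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)
# The Huber fit stays close to the true line (slope 3, intercept 0);
# the OLS intercept is pulled upward by the contaminated points.
```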
Why might centering or scaling independent variables be insufficient to completely resolve multicollinearity?
It doesn't address the fundamental issue of high correlations between the variables.
It only works for linear relationships between variables.
It can make the model more complex and harder to interpret.
It requires a large sample size to be effective.
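The first option can be verified numerically: correlation is invariant to centering and scaling (any affine rescaling of each variable), so standardizing a collinear pair leaves the collinearity untouched (synthetic data):

```python
import numpy as np

rng = np.random.RandomState(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)  # highly collinear pair

r_raw = np.corrcoef(x1, x2)[0, 1]

# Standardize both variables (center and scale to z-scores)
z1 = (x1 - x1.mean()) / x1.std()
z2 = (x2 - x2.mean()) / x2.std()
r_scaled = np.corrcoef(z1, z2)[0, 1]
# The correlation -- and hence the multicollinearity -- is unchanged
```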