# A 2023 Primer on Data Analytics

Part II: Key Analytics Models – Regression analysis, Monte Carlo simulations and Factor analysis

## This is the second in a series of articles on where data analytics stands today and what to look forward to in the coming year. Read Part I here.

Whilst the right approach to analysing data for trends and patterns almost always depends on the kind of data being analysed and the insights expected from it, there are a few key models that any analyst worth their salt should have at their fingertips.

Regression analysis: This type of model estimates the relationship between two or more variables in order to identify crucial patterns and trends between them. You’re essentially checking whether a dependent variable (the variable whose outcome you want to predict or measure) is related to one or more independent variables (the factors that may affect the dependent variable). This is especially useful for making predictions and forecasting future trends.
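
In its common linear form (an assumption here, since regression models can also be non-linear), the model can be written as:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon
$$

where $y$ is the dependent variable, $x_1, \dots, x_k$ are the independent variables, the coefficients $\beta$ are estimated from the data (usually by ordinary least squares, which minimises the sum of squared differences between observed and predicted values), and $\varepsilon$ captures the variation the model does not explain. With a single independent variable, the least-squares estimates are:

$$
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
$$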

For example, an analyst can use regression analysis to relate social spending to sales revenue, to understand what impact social investment has had on sales so far. Here, sales revenue is the dependent variable and social spending is the independent variable.
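
A minimal Python sketch of this example, assuming a single independent variable and ordinary least squares. The spending and revenue figures are hypothetical and were chosen to lie exactly on a straight line so the result is easy to check; `fit_simple_regression` is an illustrative helper, not a library function (Python 3.10+ also ships `statistics.linear_regression` for this simple case).

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    # Covariation of x and y, and variation of x, around their means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical monthly figures in thousands (invented for illustration).
social_spend = [10, 12, 15, 18, 20]          # independent variable
sales_revenue = [120, 130, 145, 160, 170]    # dependent variable

intercept, slope = fit_simple_regression(social_spend, sales_revenue)
# slope = 5.0, intercept = 70.0: each extra unit of spend goes with 5 units of revenue

# Forecast revenue for a hypothetical future spend of 25
forecast = intercept + slope * 25
print(f"revenue = {intercept:.1f} + {slope:.1f} * spend; forecast at 25: {forecast:.1f}")
```

With real data the points will scatter around the line, and the fitted slope describes an association in past data rather than proof that spending causes the revenue change.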

Monte Carlo simulation: Rather than producing a single answer, a Monte Carlo simulation models many probabilistic scenarios. It uses probability distributions for a number of uncertain inputs to estimate the range of possible outcomes and the likelihood of each.

For example, assume your variable of interest is profit. Profits in an organisation may be affected by a number of different variables, such as number of employees, marketing spend and number of sales. If all the input variables had definite values, it would be easy to calculate profit. However, when the inputs are uncertain, a Monte Carlo simulation can be used to calculate profit across many scenarios, i.e., with varying sales, headcount and so on. CareerFoundry writes:

It does so by replacing all uncertain values with functions which generate random samples from distributions determined by you, and then running a series of calculations and recalculations to produce models of all the possible outcomes and their probability distributions.

The Monte Carlo method is one of the most popular techniques for calculating the effect of unpredictable variables on a specific output variable, making it ideal for risk analysis.
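
The profit example can be sketched in a few lines of Python. Every distribution and figure below is a hypothetical assumption: units sold are drawn from a normal distribution (mean 10,000, standard deviation 1,500), headcount is a whole number from 8 to 12, marketing spend is uniform between 20,000 and 40,000, while the price per unit (25) and salary per employee (15,000) are held fixed. `simulate_profit` is an illustrative helper name.

```python
import random
from statistics import mean, quantiles


def simulate_profit(n_trials=100_000, seed=42):
    """Monte Carlo estimate of profit when several inputs are uncertain.

    All distributions and figures are hypothetical assumptions for illustration.
    """
    rng = random.Random(seed)        # seeded so results are reproducible
    price_per_unit = 25.0            # assumed fixed
    salary_per_employee = 15_000.0   # assumed fixed
    profits = []
    for _ in range(n_trials):
        # Replace each uncertain input with a random draw from its distribution
        units_sold = max(0.0, rng.gauss(10_000, 1_500))  # uncertain sales volume
        employees = rng.randint(8, 12)                   # uncertain headcount (inclusive)
        marketing_spend = rng.uniform(20_000, 40_000)    # uncertain marketing budget
        profit = (units_sold * price_per_unit
                  - employees * salary_per_employee
                  - marketing_spend)
        profits.append(profit)
    return profits


profits = simulate_profit()
expected_profit = mean(profits)
prob_of_loss = sum(p < 0 for p in profits) / len(profits)
cuts = quantiles(profits, n=20)      # 19 cut points at 5% steps
p5, p95 = cuts[0], cuts[-1]

print(f"Expected profit: {expected_profit:,.0f}")
print(f"Probability of a loss: {prob_of_loss:.1%}")
print(f"90% of simulated outcomes fall between {p5:,.0f} and {p95:,.0f}")
```

The output is a distribution rather than a single number, which is what makes the method useful for risk analysis: the probability of a loss and the spread between the 5th and 95th percentiles come straight from the simulated outcomes.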

Factor analysis: Factor analysis is particularly useful when a large mass of data needs to be condensed into a smaller, more understandable form. Organisations use it to reduce many variables to a smaller number of factors that capture what those variables have in common. The underlying assumption is that separate observed variables correlate with one another because they share intrinsic features. This helps uncover hidden factors and how those factors overlap with one another.
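
The assumption can be made precise in the simplest case, a one-factor model with standardised variables (a simplifying assumption; real analyses usually extract several factors). Each observed variable $x_i$ is a loading $\ell_i$ times a shared hidden factor $F$, plus an error term unique to that variable:

$$
x_i = \ell_i F + \varepsilon_i, \qquad i = 1, \dots, p
$$

If $F$ has variance 1 and the error terms are uncorrelated with $F$ and with each other, then

$$
\operatorname{corr}(x_i, x_j) = \ell_i \ell_j \quad (i \neq j), \qquad \operatorname{Var}(x_i) = 1 = \ell_i^2 + \psi_i
$$

where $\psi_i = \operatorname{Var}(\varepsilon_i)$ is the variance unique to item $i$. Two variables therefore correlate strongly only when both load heavily on the same factor.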

Consider, for example, an organisation that sends its consumers a survey of a hundred different questions, ranging from their average household income to how much they’re willing to spend on groceries per month. Once the survey data has been collected, rather than looking at each respondent’s hundred separate answers, it is much more useful to group related responses together using factor analysis. CareerFoundry writes:

Factor analysis works by finding survey items that are strongly correlated. This is known as covariance. So, if there’s a strong positive correlation between household income and how much they’re willing to spend on groceries each month (i.e., as one increases, so does the other), these items may be grouped together. Together with other variables (survey responses), you may find that they can be reduced to a single factor such as “consumer purchasing power”. 
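
A minimal Python sketch of the idea, assuming four hypothetical survey items from eight respondents: household income and monthly grocery budget from the example above, plus two invented items (dining-out spend, which also rises with purchasing power, and weekly TV hours, which is unrelated). It extracts a single factor with the principal component method: find the largest eigenvalue and eigenvector of the correlation matrix (here by power iteration) and scale the eigenvector by the square root of the eigenvalue to get each item's loading. This is only one extraction method; dedicated statistics tools typically also offer others, such as maximum likelihood, along with factor rotation. `pearson` and `first_factor_loadings` are illustrative helpers.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_factor_loadings(columns, iterations=200):
    """Loadings on one common factor via the principal component method."""
    k = len(columns)
    # Correlation matrix of the survey items
    r = [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(k)]
         for i in range(k)]
    # Power iteration: repeatedly multiply by R and normalise
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T R v gives the eigenvalue for the unit vector v
    eigenvalue = sum(v[i] * r[i][j] * v[j] for i in range(k) for j in range(k))
    return eigenvalue, [math.sqrt(eigenvalue) * x for x in v]


# Hypothetical responses from eight respondents (invented for illustration).
household_income = [22, 39, 61, 80, 98, 121, 140, 161]     # thousands per year
grocery_budget = [55, 98, 152, 205, 248, 297, 352, 400]    # per month
dining_out = [12, 19, 31, 42, 49, 58, 71, 82]              # per month
tv_hours = [10, 14, 12, 8, 8, 12, 14, 10]                  # per week, unrelated

items = [household_income, grocery_budget, dining_out, tv_hours]
eigenvalue, loadings = first_factor_loadings(items)
share = eigenvalue / len(items)  # share of total variance explained by the factor

for name, loading in zip(["income", "groceries", "dining out", "TV hours"], loadings):
    print(f"{name:>10}: {loading:+.2f}")
print(f"Variance explained by factor 1: {share:.0%}")
```

The three spending-related items load heavily on the extracted factor while TV hours loads near zero, so the first three could be summarised as a single factor, which an analyst might label “consumer purchasing power”.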