The growing variety of data being generated has led to a number of analysis types that every data scientist should know. They range from least to most complex in terms of cost, time, and knowledge. In summary, there are six types of analysis: descriptive, exploratory, inferential, predictive, causal, and mechanistic.
Descriptive analysis revolves around quantitatively describing the key features of a dataset. In short, it summarizes a set of data with respect to the problems or questions at hand.
- Typically, it is the first kind of analysis performed on a dataset.
- It is commonly applied to large datasets, such as population or census data.
- The interpretation and description phases are different from each other.
- Univariate and bivariate analysis are two kinds of descriptive statistical analysis. Univariate analysis looks at a single variable and is the simplest form of analysis. Bivariate analysis looks at two variables together.
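To make the univariate and bivariate distinction concrete, here is a minimal sketch using only Python's standard library. The `ages` and `incomes` values are made-up illustration data, and the helper names `univariate_summary` and `pearson_correlation` are hypothetical, not part of any library.

```python
import statistics

# Hypothetical sample data: ages and annual incomes (in thousands) of six people.
ages = [23, 35, 41, 29, 52, 47]
incomes = [28, 45, 60, 38, 75, 66]

def univariate_summary(values):
    """Describe a single variable with a few standard statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def pearson_correlation(xs, ys):
    """Describe how two variables move together (bivariate)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

age_summary = univariate_summary(ages)                 # one variable at a time
age_income_corr = pearson_correlation(ages, incomes)   # two variables together

print(age_summary)
print(round(age_income_corr, 3))
```

The summary only describes the data. Deciding what a high correlation between age and income means is a separate interpretation step.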
Exploratory analysis consists of examining datasets to find previously unknown relationships. It lets the data analyst explore the different variables and generate visualizations along the way.
- This type of analysis is good for discovering new connections.
- It is also useful for shaping future, model-based predictions.
- It may not fully answer the questions at hand; however, it can give you a good start in your data journey.
- Exploratory analysis alone shouldn't be used for prediction.
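One simple way to look for unknown relationships is to scan every pair of variables and rank the pairs by correlation strength. The sketch below does that with the standard library. The `data` table and the function names `pearson` and `rank_relationships` are hypothetical, and Pearson correlation is just one possible choice of measure.

```python
from itertools import combinations
import statistics

# Hypothetical dataset: each key is a variable, each list holds one value per observation.
data = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 70, 74, 80],
    "shoe_size": [42, 38, 44, 40, 39, 43],
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def rank_relationships(table):
    """Return all variable pairs sorted by absolute correlation, strongest first."""
    pairs = [(a, b, pearson(table[a], table[b])) for a, b in combinations(table, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

ranked = rank_relationships(data)
for a, b, r in ranked:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

A pair at the top of the list is a lead worth investigating further, not a finding or a prediction in its own right.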
Inferential analysis is about testing theories about the nature of the world using samples of variables taken from that world. In other words, it uses a small sample of data to infer characteristics of the larger population.
- Inference is commonly the goal of statistical models.
- Inference depends on both the population and the sample selected.
- Observational data, cross-sectional time studies, and retrospective datasets are generally used.
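As a rough illustration of inferring a population characteristic from a small sample, the sketch below estimates a population mean with an approximate 95% confidence interval. The population is simulated, so every number is an assumption. The function name `mean_confidence_interval` is hypothetical, and the interval uses a normal approximation (z = 1.96) rather than a more careful method.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated heights in cm.
population = [random.gauss(170, 10) for _ in range(10_000)]

# In practice we only observe a small random sample of the population.
sample = random.sample(population, 200)

def mean_confidence_interval(values, z=1.96):
    """Approximate 95% confidence interval for the population mean."""
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / len(values) ** 0.5
    return mean - z * std_err, mean + z * std_err

low, high = mean_confidence_interval(sample)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"approx. 95% interval for the population mean: ({low:.2f}, {high:.2f})")
```

The width of the interval depends on the sample size and the sample's spread, which is one way to see that inference depends on both the population and the sample selected.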
Predictive analysis consists of various methods for analyzing current and historical patterns in order to predict future events. In essence, it uses data on one object to predict values for another object.
- Precise prediction depends on measuring the right variables within a dataset.
- Although there are various prediction models, more data and a simpler model tend to work comparatively better.
- A single dataset is used for prediction. Part of it is used to train the model, and once the model meets a benchmark, the rest of the dataset is used for testing.
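The train-then-test workflow can be sketched with a deliberately simple model. The example below splits one simulated dataset 80/20, fits a straight line on the training part, and measures error on the held-out part. The data, the split ratio, and the helper names `fit_line` and `mean_abs_error` are all assumptions for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical dataset: y is roughly 3x + 5 plus random noise.
xs = [random.uniform(0, 10) for _ in range(100)]
dataset = [(x, 3 * x + 5 + random.gauss(0, 1)) for x in xs]

# Split the single dataset: 80% for training, 20% held out for testing.
random.shuffle(dataset)
split = int(0.8 * len(dataset))
train, test = dataset[:split], dataset[split:]

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(points, slope, intercept):
    """Average absolute difference between predictions and observed values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

slope, intercept = fit_line(train)
train_error = mean_abs_error(train, slope, intercept)
test_error = mean_abs_error(test, slope, intercept)  # error on unseen data

print(f"fitted model: y = {slope:.2f} * x + {intercept:.2f}")
print(f"train MAE: {train_error:.2f}, test MAE: {test_error:.2f}")
```

The test error is the number to judge the model by, since the model never saw those points during fitting.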
Causal analysis is used to find out what happens to one variable when you change another variable within the same dataset.
- Randomized studies are usually used to carry it out.
- There are different ways of inferring causation in studies where non-randomized data is used.
- For data analysis, these causal models are regarded as the gold standard.
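To show why randomization helps, here is a small simulated randomized study. Units are assigned to treatment by a coin flip, and the causal effect is estimated as the difference in mean outcomes. The effect size `TRUE_EFFECT`, the outcome model, and the function name `run_randomized_study` are all assumptions made up for this sketch.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 2.0  # assumed effect of the treatment in this simulation

def run_randomized_study(n_units=4000):
    """Simulate a study where treatment is assigned by a fair coin flip."""
    treated, control = [], []
    for _ in range(n_units):
        baseline = random.gauss(50, 5)           # unit-to-unit variation
        gets_treatment = random.random() < 0.5   # random assignment
        outcome = baseline + (TRUE_EFFECT if gets_treatment else 0.0)
        (treated if gets_treatment else control).append(outcome)
    return treated, control

treated, control = run_randomized_study()

# Because assignment is random, the two groups are comparable on average,
# so the difference in means estimates the effect of the treatment.
estimated_effect = statistics.mean(treated) - statistics.mean(control)
print(f"estimated effect: {estimated_effect:.2f} (simulated truth: {TRUE_EFFECT})")
```

With non-randomized data the two groups could differ in baseline as well, and the simple difference in means would no longer isolate the effect.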
Mechanistic analysis requires the most effort of all the types of analysis. It is about understanding the exact changes in variables that lead to changes in other variables for individual objects.
- Mechanisms are extremely hard to infer, except in simple situations.
- They are usually modeled by a deterministic set of equations from the physical or engineering sciences.
- Normally, the random component of the dataset is measurement error.
- If the equations are known but the parameters are not, the parameters may be inferred with data analysis.
- Randomized trial datasets are used for mechanistic data analysis.
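The case of a known equation with an unknown parameter can be sketched with a textbook physics example. Here the deterministic equation is free fall, d = 0.5 * g * t^2, the only randomness is simulated measurement error, and the parameter g is recovered by least squares. The measurements are simulated, and `TRUE_G`, the noise level, and the function name `estimate_g` are assumptions for illustration.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

TRUE_G = 9.81  # m/s^2, used only to simulate the measurements

# Known deterministic equation: distance fallen d = 0.5 * g * t**2.
times = [0.2 * i for i in range(1, 11)]  # 0.2 s to 2.0 s

# The only random component is measurement error on each distance (in meters).
measured = [0.5 * TRUE_G * t**2 + random.gauss(0, 0.05) for t in times]

def estimate_g(ts, ds):
    """Least-squares estimate of g for the model d = g * u, where u = 0.5 * t**2."""
    us = [0.5 * t**2 for t in ts]
    return sum(u * d for u, d in zip(us, ds)) / sum(u * u for u in us)

g_hat = estimate_g(times, measured)
print(f"estimated g: {g_hat:.3f} m/s^2")
```

The form of the equation comes from physics rather than from the data. The data analysis only pins down the unknown parameter.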