Visualizations

Types of Visualizations

Sankey diagrams are effective for showing how a total breaks down into categories and how quantities flow between them.

 

A ridgeline plot summarises the distribution of a numeric variable for several groups.  

 

Top 10 Data Visualizations of 2022 Worth Looking at!

 

Most Important Plots in Data Science

source: "Most Important Plots in Data Science" by Avi Chawla

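KS Plot: Compares the cumulative distribution function (CDF) of a dataset with that of a theoretical distribution, or the CDFs of two datasets, to assess how the distributions differ. The Kolmogorov-Smirnov (KS) statistic is the largest vertical gap between the two curves.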
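The two-sample comparison described above can be sketched with the standard library alone. This is a minimal illustration, not a replacement for a statistics package: the helper names `ecdf` and `ks_statistic` and the sample values are made up for the example, and no p-value is computed.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical CDFs."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # The gap can only change at observed values, so checking those is enough.
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)

# Hypothetical samples, chosen only to make the gap easy to see.
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
print(ks_statistic(a, b))
```

Plotting `ecdf(a)` and `ecdf(b)` as step curves over the same x-axis gives the KS plot itself; the statistic is the tallest gap between the steps.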

SHAP Plot: Summarizes how much each feature contributes to a model's predictions, taking interactions and dependencies between features into account.

QQ Plot: Assesses how closely the observed data follow a theoretical distribution by plotting the quantiles of the two distributions against each other. Deviations from the straight line indicate a departure from the assumed distribution.
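As a rough sketch of how the points of a QQ plot are built, the snippet below pairs sorted observations with quantiles of a normal distribution fitted to the same data. The normal reference, the plotting positions `(i + 0.5) / n`, the function name `qq_points`, and the sample values are all assumptions made for this example; other conventions exist.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair each sorted observation with the matching normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n avoid the undefined 0 and 1 quantiles.
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Hypothetical measurements used only for illustration.
sample = [2.1, 2.9, 3.4, 3.8, 4.0, 4.3, 4.9, 5.6]
for expected, observed in qq_points(sample):
    print(f"{expected:6.2f}  {observed:6.2f}")
```

If the data are close to normal, the printed pairs fall near the line where both columns are equal.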

Cumulative Explained Variance Plot: Plots the running total of the variance explained by each principal component, which helps decide how many components to keep. The goal is to retain as much of the variation in the data as possible with as few components as possible. The components come from principal component analysis (PCA).
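The curve itself is just a running sum of variance shares. The sketch below assumes the per-component variances (PCA eigenvalues) are already available; the values, the function names, and the 90% target are hypothetical.

```python
from itertools import accumulate

def cumulative_explained_variance(component_variances):
    """Running share of total variance, largest components first."""
    ordered = sorted(component_variances, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]

def components_needed(component_variances, target=0.95):
    """Smallest number of components whose cumulative share reaches target."""
    shares = cumulative_explained_variance(component_variances)
    for count, share in enumerate(shares, start=1):
        if share >= target:
            return count
    return len(shares)

variances = [4.0, 2.0, 1.0, 0.5, 0.5]  # hypothetical PCA eigenvalues
print(cumulative_explained_variance(variances))
print(components_needed(variances, target=0.9))
```

Plotting the first list against the component index gives the cumulative explained variance plot; the second call reads off where the curve crosses the chosen threshold.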

Gini Impurity vs. Entropy: Both measure the impurity (disorder) of a node or split in a decision tree. Plotting them together shows how the two measures compare across class proportions.
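For a two-class node, both measures can be written as functions of the positive-class fraction p. The sketch below tabulates them side by side; restricting to the binary case and measuring entropy in bits are simplifying choices made for this example.

```python
from math import log2

def gini(p):
    """Gini impurity of a binary node with positive-class fraction p."""
    return 2 * p * (1 - p)

def entropy(p):
    """Shannon entropy (bits) of a binary node with positive-class fraction p."""
    if p in (0, 1):
        return 0.0  # a pure node has no disorder
    return -(p * log2(p) + (1 - p) * log2(1 - p))

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:.2f}  gini={gini(p):.3f}  entropy={entropy(p):.3f}")
```

Both curves are zero for a pure node and peak at p = 0.5, where Gini impurity reaches 0.5 and entropy reaches 1 bit.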

Bias-Variance Tradeoff: Plots a model's bias and variance against model complexity to help find the right balance between the two.

ROC Curve: Depicts the trade-off between the true positive rate (TPR) and the false positive rate (FPR) across different classification thresholds.
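A minimal sketch of how the ROC points are obtained: sweep the threshold over the observed scores and record the FPR and TPR at each step. The labels, scores, and function names are hypothetical, and the code assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, strict to lenient."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical ground truth (1 = positive) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```

Plotting the pairs with FPR on the x-axis and TPR on the y-axis gives the ROC curve; a curve that hugs the top-left corner corresponds to an area close to 1.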

Precision-Recall Curve: Depicts the trade-off between Precision and Recall across different classification thresholds.
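The same threshold sweep yields the precision-recall points. This is an illustrative sketch with hypothetical labels and scores; it assumes at least one positive example.

```python
def precision_recall_points(labels, scores):
    """Return (recall, precision) pairs, one per distinct threshold."""
    positives = sum(labels)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Labels of everything the model would flag at this threshold.
        predicted = [y for y, s in zip(labels, scores) if s >= threshold]
        tp = sum(predicted)
        points.append((tp / positives, tp / len(predicted)))
    return points

# Hypothetical ground truth (1 = positive) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for recall, precision in precision_recall_points(labels, scores):
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```

Lowering the threshold can only keep recall the same or raise it, while precision typically falls as more negatives are flagged.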

Elbow Curve: Helps identify the optimal number of clusters for the k-means algorithm.
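To make the idea concrete, the sketch below runs a toy one-dimensional k-means for several values of k and records the within-cluster sum of squares (inertia). The deterministic initialisation, the fixed iteration count, the data, and the function name `kmeans_inertia` are simplifications assumed for this example; real use would normally rely on a library implementation.

```python
def kmeans_inertia(values, k, iterations=20):
    """Run a basic 1-D k-means and return the within-cluster sum of squares."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in ordered:
            nearest = min(range(k), key=lambda i: (v - centers[i]) ** 2)
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sum(min((v - c) ** 2 for c in centers) for v in ordered)

# Hypothetical data with three obvious groups.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
inertias = {k: kmeans_inertia(data, k) for k in range(1, 6)}
for k, inertia in inertias.items():
    print(f"k={k}  inertia={inertia:.3f}")
```

Plotting inertia against k gives the elbow curve. Here the inertia drops sharply up to k = 3 and barely changes afterwards, so the bend sits at three clusters.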