Statistical Analysis on an IMDb Data Set

General Information:

Programming language used: Python

Source: Kaggle

Credibility: 10/10 (Kaggle scale)

Packages used: Pandas, Seaborn, NumPy, Matplotlib

Hypotheses:

  1. Budget and gross income are positively correlated.
  2. Company name and gross income are positively correlated.

Process:

  • Started with some data cleaning, checking for duplicates (there were none).
  • Removed NaN values, replacing them with zeros where they occurred in numerical columns.
  • Extracted the year from the date column into a separate column.
  • Normalized the data and computed the Pearson correlation between the columns.
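
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the sample rows are made up, the column names (name, company, released, budget, gross) are assumptions, and min-max scaling is assumed as the normalization method because the write-up does not name one.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for the Kaggle data; column names are assumptions.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "A"],
    "company": ["X", "Y", "X", "Z", "X"],
    "released": ["1980-06-13", "1980-07-02", "1991-05-01", "2001-03-03", "1980-06-13"],
    "budget": [19.0, 4.5, np.nan, 30.0, 19.0],
    "gross": [47.0, 58.9, 12.0, np.nan, 47.0],
})

# Check for exact duplicate rows and drop them.
duplicate_count = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# Replace NaN with zero in the numerical columns only.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(0)

# Extract the year from the date column into a separate column.
df["year"] = pd.to_datetime(df["released"], errors="coerce").dt.year

# Min-max normalization (an assumed choice), then Pearson correlation.
features = df[["budget", "gross"]]
normalized = (features - features.min()) / (features.max() - features.min())
corr_matrix = normalized.corr(method="pearson")
budget_gross_r = corr_matrix.loc["budget", "gross"]
```

Pearson correlation is unchanged by min-max scaling, so the coefficient computed on the normalized columns equals the one computed on the raw columns; normalization mainly helps when the columns are plotted or compared side by side.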

Findings:

  • Budget and gross income are positively correlated (hypothesis confirmed, with a correlation coefficient of 0.74).
  • A company's name shows no correlation with gross income (contrary to the hypothesis).
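
The write-up does not say how the company name was turned into a number for the correlation. One common approach, assumed here purely for illustration, is to replace each company with an integer category code and correlate that code with gross income. The data and the column names (company, gross, company_code) are hypothetical.

```python
import pandas as pd

# Made-up sample; not taken from the Kaggle data set.
df = pd.DataFrame({
    "company": ["X", "Y", "X", "Z", "Y", "Z"],
    "gross": [47.0, 58.9, 12.0, 30.0, 8.0, 75.0],
})

# Encode each company as an integer code (categories are sorted alphabetically).
df["company_code"] = df["company"].astype("category").cat.codes

# Pearson correlation between the arbitrary code and gross income.
company_gross_r = df["company_code"].corr(df["gross"], method="pearson")

# A per-company average is often easier to interpret than a coefficient
# computed on arbitrary codes.
mean_gross_by_company = df.groupby("company")["gross"].mean()
```

Because the codes only reflect alphabetical order, a coefficient near zero under this encoding says little about whether individual companies differ in gross income; comparing group averages is one way to check that separately.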

Visualization:

Exported the data from the notebook and built a dashboard in Tableau.
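
A minimal sketch of getting the data out of the notebook, assuming a CSV export (a format Tableau can open as a text file). The DataFrame contents and the file name in the comment are hypothetical.

```python
import io

import pandas as pd

# Made-up cleaned data standing in for the notebook's DataFrame.
df = pd.DataFrame({
    "name": ["A", "B"],
    "budget": [19.0, 4.5],
    "gross": [47.0, 58.9],
})

# Write to an in-memory buffer here so the sketch has no side effects.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# In the notebook, the same call with a path writes a file for Tableau:
# df.to_csv("movies_clean.csv", index=False)  # hypothetical file name
```

Passing index=False keeps the pandas row index out of the file, so Tableau does not pick it up as an extra unnamed column.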