Data Preparation and Analysis


Data preparation and analysis involves collecting, cleaning, transforming, and organizing data so that it is suitable for analysis.

  • Data Collection:
    • Sources: Databases, APIs, CSV/Excel files, Web scraping, Surveys
    • Formats: Structured (SQL, CSV), Semi-structured (JSON, XML), Unstructured (Text, Images)
  • Data Cleaning:
    • Handling Missing Values: Imputation (mean, median, mode), Removal
    • Removing Duplicates
    • Handling Outliers: Z-score, IQR method
    • Standardizing Formats: Consistent date/time formats, text normalization
    • Fixing Data Entry Errors
  • Data Transformation:
    • Normalization/Scaling: Min-max scaling, Standardization (z-score)
    • Encoding Categorical Variables: One-hot encoding, Label encoding
    • Feature Engineering: Creating new meaningful variables
  • Data Splitting:
    • Train-Test Split: Typically 80/20 or 70/30 (train/test)
    • Cross-Validation: K-fold, Stratified sampling
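A few of the cleaning, transformation, and splitting steps listed above can be sketched in plain Python. This is a minimal illustration using only the standard library, not a recommended production pipeline: the function names, the toy `ages` column, and the choice of median imputation and the 1.5 × IQR rule are assumptions made for the example. In practice, libraries such as pandas or scikit-learn are commonly used for the same steps.

```python
import random
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def drop_duplicates(rows):
    """Remove repeated rows (hashable, e.g. tuples), keeping first occurrences."""
    return list(dict.fromkeys(rows))


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def min_max_scale(values):
    """Rescale values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Return the sorted category list and one 0/1 indicator row per label."""
    categories = sorted(set(labels))
    return categories, [[int(lab == c) for c in categories] for lab in labels]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and hold out test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy column: one missing value and one implausible age.
ages = impute_median([25, None, 31, 29, 27, 120, 31, 30])
mask = iqr_outlier_mask(ages)
clean_ages = [a for a, is_out in zip(ages, mask) if not is_out]
scaled = min_max_scale(clean_ages)
train, test = train_test_split(scaled, test_fraction=0.2)
```

The order matters here: imputation runs before outlier detection so the quartiles are computed on a complete column, and scaling runs after outlier removal so a single extreme value does not compress the remaining values toward zero.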

Course Detail

Syllabus for Internal Examination

  • Introduction: Data Objects and Attribute Types, Basic Statistical Descriptions of Data, Data Visualization (Pixel-Oriented, Geometric Projection, Icon-Based, Hierarchical), Measuring Data Similarity and Dissimilarity for Different Types of Attributes
  • Data Preprocessing: Data Preprocessing: An Overview, Data Cleaning, Data Integration, Data Reduction, Data Transformation and Data Discretization
  • Correlation and Regression Analysis: Introduction to Correlation, Correlation Coefficients (Pearson and Spearman Rank), Autocorrelation, Introduction to Simple Linear Regression (SLR), SLR Model Building, Estimation of Parameters Using OLS, Interpretation of SLR Coefficients
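As an illustration of the correlation and regression topics in the syllabus, the sketch below computes the Pearson coefficient, the Spearman rank coefficient (as Pearson applied to average ranks), and the OLS estimates for a simple linear regression y = b0 + b1·x. The function names and the sample data are hypothetical, and the code assumes equal-length numeric lists in which neither variable is constant.

```python
import statistics


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ols_slr(x, y):
    """OLS estimates (intercept b0, slope b1) for y = b0 + b1 * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = ols_slr(x, y)
r = pearson(x, y)
rho = spearman(x, [1, 4, 9, 16])  # monotone but non-linear relationship
```

The slope `b1` is read as the expected change in y per unit increase in x, and the intercept `b0` as the expected y at x = 0. Spearman depends only on the ordering of the values, so any strictly increasing relationship yields a coefficient of 1 even when it is not linear.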