Practical Statistics for Data Scientists

R
Python
Data Science
Author

Shitao5

Published

2023-07-02

Modified

2023-07-04

Progress

Learning Progress: 4.29%.

1 Exploratory Data Analysis

1.1 Elements of Structured Data

  • In fact, a major challenge of data science is to harness this torrent of raw data into actionable information.

1.2 Estimates of Location

  • METRICS AND ESTIMATES
    Statisticians often use the term estimate for a value calculated from the data at hand, to draw a distinction between what we see from the data and the theoretical true or exact state of affairs. Data scientists and business analysts are more likely to refer to such a value as a metric. The difference reflects the approach of statistics versus that of data science: accounting for uncertainty lies at the heart of the discipline of statistics, whereas concrete business or organizational objectives are the focus of data science. Hence, statisticians estimate, and data scientists measure.

  • Still, outliers are often the result of data errors such as mixing data of different units (kilometers versus meters) or bad readings from a sensor. When outliers are the result of bad data, the mean will result in a poor estimate of location, while the median will still be valid. In any case, outliers should be identified and are usually worthy of further investigation.
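
To make the point about bad data concrete, here is a minimal sketch in Python using only the standard library. The distances are made-up values for illustration (they do not come from the book); one reading is assumed to have been recorded in meters instead of kilometers.

```python
from statistics import mean, median

# Hypothetical trip distances in kilometers (illustrative values only).
clean = [4.2, 5.1, 4.8, 5.5, 4.9]

# The same data, except the last reading was mistakenly recorded in meters.
with_error = [4.2, 5.1, 4.8, 5.5, 4900.0]

clean_mean, clean_median = mean(clean), median(clean)
bad_mean, bad_median = mean(with_error), median(with_error)

# The mean is dragged far from the bulk of the data by a single bad value,
# while the median moves only to a neighboring observation.
print(f"clean:      mean={clean_mean:.2f}, median={clean_median:.2f}")
print(f"with error: mean={bad_mean:.2f}, median={bad_median:.2f}")
```

With these numbers the mean jumps from about 4.9 to roughly 984, while the median only shifts from 4.9 to 5.1. A gap that large between the two is itself a hint that the data deserve a closer look.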

1.3 Example: Location Estimates of Population and Murder Rates

To be continued