Overview
Dataset statistics
| Statistic | Value |
|---|---|
| Number of variables | 6 |
| Number of observations | 50916 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 10.4 MiB |
| Average record size in memory | 213.8 B |
Variable types
| Type | Count |
|---|---|
| Numeric | 3 |
| Categorical | 3 |
Alerts
| Alert | Type |
|---|---|
| disease has constant value "SMALLPOX" | Constant |
| cases is highly overall correlated with incidence_per_capita | High correlation |
| incidence_per_capita is highly overall correlated with cases | High correlation |
| state is highly overall correlated with state_name | High correlation |
| state_name is highly overall correlated with state | High correlation |
| cases has 32590 (64.0%) zeros | Zeros |
| incidence_per_capita has 32576 (64.0%) zeros | Zeros |
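The Zeros alerts give a count and a rounded share. As a quick sanity check, the shares can be recomputed from those counts and the number of observations listed under Dataset statistics. This is a minimal sketch: `zero_share` is an illustrative helper name, not a ydata-profiling function.

```python
# Recompute the share of zeros reported in the alerts.
# Counts and the row total are copied from this report; `zero_share` is a
# hypothetical helper, not part of ydata-profiling.
N_OBSERVATIONS = 50916


def zero_share(zero_count: int, n_rows: int = N_OBSERVATIONS) -> float:
    """Return the percentage of rows equal to zero, rounded to one decimal."""
    return round(100 * zero_count / n_rows, 1)


cases_share = zero_share(32590)
incidence_share = zero_share(32576)
print(f"cases: {cases_share}%  incidence_per_capita: {incidence_share}%")
```

Both shares round to 64.0%, but the underlying counts differ by 14 rows, so the two columns do not agree on every zero.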
Reproduction
| Item | Value |
|---|---|
| Analysis started | 2024-11-08 01:59:57.153969 |
| Analysis finished | 2024-11-08 02:00:00.821324 |
| Duration | 3.67 seconds |
| Software version | ydata-profiling v4.12.0 |
| Download configuration | config.json |
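The reported duration follows directly from the two timestamps. The sketch below parses them with the standard library and takes the difference; the format string is an assumption based on how the timestamps are printed here.

```python
from datetime import datetime

# Format inferred from the timestamps as printed in this report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-11-08 01:59:57.153969", FMT)
finished = datetime.strptime("2024-11-08 02:00:00.821324", FMT)

# Elapsed wall-clock time in seconds, at microsecond resolution.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```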
Variables
week
Real number (ℝ)
| Statistic | Value |
|---|---|
| Distinct | 1145 |
| Distinct (%) | 2.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 193809.85 |
| Minimum | 192801 |
| Maximum | 195250 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 397.9 KiB |
Quantile statistics
| Minimum | 192801 |
|---|---|
| 5-th percentile | 192904 |
| Q1 | 193312 |
| median | 193819 |
| Q3 | 194324 |
| 95-th percentile | 194726 |
| Maximum | 195250 |
| Range | 2449 |
| Interquartile range (IQR) | 1012 |
Descriptive statistics
| Standard deviation | 591.48989 |
|---|---|
| Coefficient of variation (CV) | 0.0030519083 |
| Kurtosis | -1.1870546 |
| Mean | 193809.85 |
| Median Absolute Deviation (MAD) | 506 |
| Skewness | -0.00381597 |
| Sum | 9.8680224 × 10^9 |
| Variance | 349860.29 |
| Monotonicity | Increasing |
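Several of the figures for `week` are derived from others in the same tables: the range and IQR come from the quantiles, the variance and coefficient of variation from the standard deviation and mean, and the mean from the sum and row count. A minimal sketch of those relationships, using only values copied from this report (the variable names are illustrative):

```python
# Values copied from the `week` tables in this report.
mean, std = 193809.85, 591.48989
q1, q3 = 193312, 194324
minimum, maximum = 192801, 195250
n_rows, total = 50916, 9.8680224e9

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
variance = std ** 2               # Variance is the squared standard deviation
cv = std / mean                   # Coefficient of variation
mean_from_sum = total / n_rows    # Mean recomputed from the (rounded) sum

print(value_range, iqr, round(variance, 2), round(cv, 10))
```

The recomputed mean differs slightly from the reported one because the sum is printed with only eight significant digits.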
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 194821 | 49 | 0.1% |
| 194333 | 49 | 0.1% |
| 194310 | 49 | 0.1% |
| 194311 | 49 | 0.1% |
| 194312 | 49 | 0.1% |
| 194313 | 49 | 0.1% |
| 194314 | 49 | 0.1% |
| 194316 | 49 | 0.1% |
| 194317 | 49 | 0.1% |
| 194318 | 49 | 0.1% |
| Other values (1135) | 50426 | |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 192801 | 44 | |
| 192802 | 43 | |
| 192803 | 43 | |
| 192804 | 47 | |
| 192805 | 47 | |
| 192806 | 47 | |
| 192807 | 47 | |
| 192808 | 45 | |
| 192809 | 48 | |
| 192810 | 45 | |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 195250 | 1 | |
| 195244 | 1 | |
| 195239 | 1 | |
| 195227 | 1 | |
| 195226 | 1 | |
| 195224 | 1 | |
| 195221 | 1 | |
| 195220 | 1 | |
| 195219 | 1 | |
| 195217 | 2 | |
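The `week` values look like a year followed by a two-digit week number (192801 through 195250), although the report itself profiles the column as a plain real number and does not state the encoding. Assuming that YYYYWW layout, the two parts can be separated with integer division. `split_week` is a hypothetical helper introduced only for this illustration.

```python
def split_week(code: int) -> tuple[int, int]:
    """Split an integer such as 192801 into (year, week).

    Assumes a YYYYWW layout, which this report does not confirm.
    """
    year, week = divmod(code, 100)
    return year, week


first = split_week(192801)  # smallest value in the column
last = split_week(195250)   # largest value in the column
print(first, last)
```

If this reading is right, statistics such as Range (2449) and the standard deviation describe the encoded integers rather than elapsed weeks, so they should be interpreted with care.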
state
Categorical
High correlation 
| Distinct | 49 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.9 MiB |
| Value | Count |
|---|---|
| KS | 1067 |
| MO | 1064 |
| TX | 1064 |
| AZ | 1064 |
| KY | 1063 |
| Other values (44) | |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | AL |
|---|---|
| 2nd row | AR |
| 3rd row | AZ |
| 4th row | CA |
| 5th row | CO |
Common Values
| Value | Count | Frequency (%) |
| KS | 1067 | 2.1% |
| MO | 1064 | 2.1% |
| TX | 1064 | 2.1% |
| AZ | 1064 | 2.1% |
| KY | 1063 | 2.1% |
| MS | 1063 | 2.1% |
| SD | 1063 | 2.1% |
| OK | 1062 | 2.1% |
| WI | 1061 | 2.1% |
| ID | 1061 | 2.1% |
| Other values (39) | 40284 |
Length (histogram of lengths of the category)
Words
| Value | Count | Frequency (%) |
| ks | 1067 | 2.1% |
| tx | 1064 | 2.1% |
| az | 1064 | 2.1% |
| mo | 1064 | 2.1% |
| ky | 1063 | 2.1% |
| ms | 1063 | 2.1% |
| sd | 1063 | 2.1% |
| ok | 1062 | 2.1% |
| wi | 1061 | 2.1% |
| id | 1061 | 2.1% |
| Other values (39) | 40284 |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 11407 | 11.2% |
| N | 10953 | 10.8% |
| M | 9518 | 9.3% |
| I | 7394 | 7.3% |
| T | 6347 | 6.2% |
| D | 6331 | 6.2% |
| C | 6324 | 6.2% |
| O | 5300 | 5.2% |
| S | 4248 | 4.2% |
| W | 4231 | 4.2% |
| Other values (14) | 29779 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 101832 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 11407 | 11.2% |
| N | 10953 | 10.8% |
| M | 9518 | 9.3% |
| I | 7394 | 7.3% |
| T | 6347 | 6.2% |
| D | 6331 | 6.2% |
| C | 6324 | 6.2% |
| O | 5300 | 5.2% |
| S | 4248 | 4.2% |
| W | 4231 | 4.2% |
| Other values (14) | 29779 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 101832 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| A | 11407 | 11.2% |
| N | 10953 | 10.8% |
| M | 9518 | 9.3% |
| I | 7394 | 7.3% |
| T | 6347 | 6.2% |
| D | 6331 | 6.2% |
| C | 6324 | 6.2% |
| O | 5300 | 5.2% |
| S | 4248 | 4.2% |
| W | 4231 | 4.2% |
| Other values (14) | 29779 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 101832 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| A | 11407 | 11.2% |
| N | 10953 | 10.8% |
| M | 9518 | 9.3% |
| I | 7394 | 7.3% |
| T | 6347 | 6.2% |
| D | 6331 | 6.2% |
| C | 6324 | 6.2% |
| O | 5300 | 5.2% |
| S | 4248 | 4.2% |
| W | 4231 | 4.2% |
| Other values (14) | 29779 |
state_name
Categorical
High correlation 
| Distinct | 49 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.2 MiB |
| Value | Count |
|---|---|
| KANSAS | 1067 |
| MISSOURI | 1064 |
| TEXAS | 1064 |
| ARIZONA | 1064 |
| KENTUCKY | 1063 |
| Other values (44) | |
Length
| Max length | 20 |
|---|---|
| Median length | 12 |
| Mean length | 8.8091759 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | ALABAMA |
|---|---|
| 2nd row | ARKANSAS |
| 3rd row | ARIZONA |
| 4th row | CALIFORNIA |
| 5th row | COLORADO |
Common Values
| Value | Count | Frequency (%) |
| KANSAS | 1067 | 2.1% |
| MISSOURI | 1064 | 2.1% |
| TEXAS | 1064 | 2.1% |
| ARIZONA | 1064 | 2.1% |
| KENTUCKY | 1063 | 2.1% |
| MISSISSIPPI | 1063 | 2.1% |
| SOUTH DAKOTA | 1063 | 2.1% |
| OKLAHOMA | 1062 | 2.1% |
| WISCONSIN | 1061 | 2.1% |
| IDAHO | 1061 | 2.1% |
| Other values (39) | 40284 |
Length (histogram of lengths of the category)
Words
| Value | Count | Frequency (%) |
| new | 4214 | 6.6% |
| dakota | 2120 | 3.3% |
| south | 2118 | 3.3% |
| north | 2112 | 3.3% |
| carolina | 2110 | 3.3% |
| virginia | 1901 | 3.0% |
| kansas | 1067 | 1.7% |
| missouri | 1064 | 1.7% |
| arizona | 1064 | 1.7% |
| texas | 1064 | 1.7% |
| Other values (43) | 44721 |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 58709 | |
| I | 46895 | 10.5% |
| N | 44564 | 9.9% |
| O | 40160 | 9.0% |
| S | 33852 | 7.5% |
| E | 28894 | 6.4% |
| R | 24060 | 5.4% |
| T | 22175 | 4.9% |
| M | 15843 | 3.5% |
| L | 15820 | 3.5% |
| Other values (16) | 117556 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 435889 | |
| Space Separator | 12639 | 2.8% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 58709 | |
| I | 46895 | |
| N | 44564 | |
| O | 40160 | 9.2% |
| S | 33852 | 7.8% |
| E | 28894 | 6.6% |
| R | 24060 | 5.5% |
| T | 22175 | 5.1% |
| M | 15843 | 3.6% |
| L | 15820 | 3.6% |
| Other values (15) | 104917 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 12639 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 435889 | |
| Common | 12639 | 2.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| A | 58709 | |
| I | 46895 | |
| N | 44564 | |
| O | 40160 | 9.2% |
| S | 33852 | 7.8% |
| E | 28894 | 6.6% |
| R | 24060 | 5.5% |
| T | 22175 | 5.1% |
| M | 15843 | 3.6% |
| L | 15820 | 3.6% |
| Other values (15) | 104917 |
Common
| Value | Count | Frequency (%) |
| (space) | 12639 | |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 448528 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| A | 58709 | |
| I | 46895 | 10.5% |
| N | 44564 | 9.9% |
| O | 40160 | 9.0% |
| S | 33852 | 7.5% |
| E | 28894 | 6.4% |
| R | 24060 | 5.4% |
| T | 22175 | 4.9% |
| M | 15843 | 3.5% |
| L | 15820 | 3.5% |
| Other values (16) | 117556 |
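The character counts for `state_name` tie back to the length and word statistics reported earlier for this variable. The sketch below recomputes the mean length and two of the percentages from counts copied out of this report. One step is an assumption: the total word count is taken as rows plus spaces, which holds only if names are separated by single spaces with no leading or trailing whitespace.

```python
# Counts copied from the state_name tables in this report.
n_rows = 50916
letters, spaces = 435889, 12639

total_chars = letters + spaces          # matches the ASCII block total
mean_length = total_chars / n_rows      # compare with "Mean length"
space_share = round(100 * spaces / total_chars, 1)

# Assumption: one space between words, so words = rows + spaces.
total_words = n_rows + spaces
new_share = round(100 * 4214 / total_words, 1)  # share of the word "new"

print(total_chars, round(mean_length, 7), space_share, new_share)
```

Under that assumption the share of "new" comes out at the 6.6% shown in the word table, which suggests the word frequencies are relative to all words rather than to rows.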
disease
Categorical
Constant 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.2 MiB |
| SMALLPOX |
|---|
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | SMALLPOX |
|---|---|
| 2nd row | SMALLPOX |
| 3rd row | SMALLPOX |
| 4th row | SMALLPOX |
| 5th row | SMALLPOX |
Common Values
| Value | Count | Frequency (%) |
| SMALLPOX | 50916 |
Length (histogram of lengths of the category)
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
| smallpox | 50916 |
Most occurring characters
| Value | Count | Frequency (%) |
| L | 101832 | |
| S | 50916 | |
| M | 50916 | |
| A | 50916 | |
| P | 50916 | |
| O | 50916 | |
| X | 50916 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 407328 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| L | 101832 | |
| S | 50916 | |
| M | 50916 | |
| A | 50916 | |
| P | 50916 | |
| O | 50916 | |
| X | 50916 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 407328 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| L | 101832 | |
| S | 50916 | |
| M | 50916 | |
| A | 50916 | |
| P | 50916 | |
| O | 50916 | |
| X | 50916 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 407328 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| L | 101832 | |
| S | 50916 | |
| M | 50916 | |
| A | 50916 | |
| P | 50916 | |
| O | 50916 | |
| X | 50916 |
cases
Real number (ℝ)
High correlation  Zeros 
| Statistic | Value |
|---|---|
| Distinct | 200 |
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.5727866 |
| Minimum | 0 |
| Maximum | 350 |
| Zeros | 32590 |
| Zeros (%) | 64.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 397.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 2 |
| 95-th percentile | 25 |
| Maximum | 350 |
| Range | 350 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 15.062277 |
|---|---|
| Coefficient of variation (CV) | 3.2938946 |
| Kurtosis | 66.318942 |
| Mean | 4.5727866 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.769109 |
| Sum | 232828 |
| Variance | 226.87218 |
| Monotonicity | Not monotonic |
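Because 64.0% of the `cases` values are zero, the overall mean says little about weeks in which smallpox was actually reported. A minimal sketch, using only figures copied from this report, recomputes the overall mean from the sum and then the mean over the non-zero rows alone (the variable names are illustrative):

```python
# Figures copied from the `cases` tables in this report.
n_rows = 50916
zeros = 32590
total_cases = 232828
std = 15.062277

mean_all = total_cases / n_rows       # compare with the reported Mean
cv = std / mean_all                   # compare with the reported CV
nonzero_rows = n_rows - zeros
mean_nonzero = total_cases / nonzero_rows  # mean over rows with cases > 0

print(round(mean_all, 7), round(cv, 4), nonzero_rows, round(mean_nonzero, 2))
```

The mean over non-zero rows is roughly 12.7, well above both the overall mean and Q3 (2), which is consistent with the strong right skew reported for this variable.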
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 32590 | 64.0% |
| 1 | 5075 | 10.0% |
| 2 | 2095 | 4.1% |
| 3 | 1369 | 2.7% |
| 4 | 1067 | 2.1% |
| 5 | 800 | 1.6% |
| 6 | 675 | 1.3% |
| 7 | 561 | 1.1% |
| 8 | 474 | 0.9% |
| 9 | 424 | 0.8% |
| Other values (190) | 5786 | 11.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 32590 | 64.0% |
| 1 | 5075 | 10.0% |
| 2 | 2095 | 4.1% |
| 3 | 1369 | 2.7% |
| 4 | 1067 | 2.1% |
| 5 | 800 | 1.6% |
| 6 | 675 | 1.3% |
| 7 | 561 | 1.1% |
| 8 | 474 | 0.9% |
| 9 | 424 | 0.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 350 | 1 | |
| 290 | 1 | |
| 277 | 1 | |
| 271 | 1 | |
| 269 | 1 | |
| 254 | 1 | |
| 247 | 1 | |
| 245 | 1 | |
| 243 | 1 | |
| 242 | 1 | |
incidence_per_capita
Real number (ℝ)
High correlation  Zeros 
| Statistic | Value |
|---|---|
| Distinct | 673 |
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.24910794 |
| Minimum | 0 |
| Maximum | 50.36 |
| Zeros | 32576 |
| Zeros (%) | 64.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 397.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.09 |
| 95-th percentile | 1.46 |
| Maximum | 50.36 |
| Range | 50.36 |
| Interquartile range (IQR) | 0.09 |
Descriptive statistics
| Standard deviation | 0.82433133 |
|---|---|
| Coefficient of variation (CV) | 3.3091331 |
| Kurtosis | 322.39074 |
| Mean | 0.24910794 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 10.060931 |
| Sum | 12683.58 |
| Variance | 0.67952214 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 32576 | 64.0% |
| 0.03 | 1079 | 2.1% |
| 0.04 | 973 | 1.9% |
| 0.05 | 784 | 1.5% |
| 0.01 | 638 | 1.3% |
| 0.02 | 638 | 1.3% |
| 0.06 | 554 | 1.1% |
| 0.08 | 505 | 1.0% |
| 0.09 | 420 | 0.8% |
| 0.11 | 404 | 0.8% |
| Other values (663) | 12345 | 24.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 32576 | 64.0% |
| 0.01 | 638 | 1.3% |
| 0.02 | 638 | 1.3% |
| 0.03 | 1079 | 2.1% |
| 0.04 | 973 | 1.9% |
| 0.05 | 784 | 1.5% |
| 0.06 | 554 | 1.1% |
| 0.07 | 385 | 0.8% |
| 0.08 | 505 | 1.0% |
| 0.09 | 420 | 0.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 50.36 | 1 | |
| 21.9 | 1 | |
| 18.61 | 1 | |
| 15.88 | 1 | |
| 15.07 | 1 | |
| 13.75 | 1 | |
| 12.9 | 1 | |
| 12.73 | 1 | |
| 12.67 | 1 | |
| 12.56 | 1 | |
Interactions
Correlations
| | cases | incidence_per_capita | state | state_name | week |
|---|---|---|---|---|---|
| cases | 1.000 | 0.989 | 0.102 | 0.102 | -0.425 |
| incidence_per_capita | 0.989 | 1.000 | 0.054 | 0.054 | -0.427 |
| state | 0.102 | 0.054 | 1.000 | 1.000 | 0.032 |
| state_name | 0.102 | 0.054 | 1.000 | 1.000 | 0.032 |
| week | -0.425 | -0.427 | 0.032 | 0.032 | 1.000 |
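The report does not say which coefficient fills this matrix. One plausible reading, stated here as an assumption, is that numeric pairs such as `cases` and `incidence_per_capita` are compared with Spearman's rank correlation. The sketch below implements that coefficient in plain Python and applies it to the ten first rows shown under Sample, so its result is only illustrative and will not reproduce the full-data value of 0.989.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# First ten sample rows from this report (week 192801).
cases = [1, 7, 0, 18, 31, 26, 0, 1, 0, 58]
incidence = [0.04, 0.38, 0.00, 0.34, 3.06, 1.65, 0.00, 0.07, 0.00, 2.37]
rho = spearman(cases, incidence)
print(round(rho, 3))
```

A rank-based measure is a reasonable fit for these columns because both are heavily zero-inflated and right-skewed, which makes a linear coefficient sensitive to the few very large values.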
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.
Sample
First rows
| | week | state | state_name | disease | cases | incidence_per_capita |
|---|---|---|---|---|---|---|
| 0 | 192801 | AL | ALABAMA | SMALLPOX | 1 | 0.04 |
| 1 | 192801 | AR | ARKANSAS | SMALLPOX | 7 | 0.38 |
| 2 | 192801 | AZ | ARIZONA | SMALLPOX | 0 | 0.00 |
| 3 | 192801 | CA | CALIFORNIA | SMALLPOX | 18 | 0.34 |
| 4 | 192801 | CO | COLORADO | SMALLPOX | 31 | 3.06 |
| 5 | 192801 | CT | CONNECTICUT | SMALLPOX | 26 | 1.65 |
| 6 | 192801 | DE | DELAWARE | SMALLPOX | 0 | 0.00 |
| 7 | 192801 | FL | FLORIDA | SMALLPOX | 1 | 0.07 |
| 8 | 192801 | GA | GEORGIA | SMALLPOX | 0 | 0.00 |
| 9 | 192801 | IA | IOWA | SMALLPOX | 58 | 2.37 |
Last rows
| | week | state | state_name | disease | cases | incidence_per_capita |
|---|---|---|---|---|---|---|
| 50906 | 195217 | NV | NEVADA | SMALLPOX | 1 | 0.55 |
| 50907 | 195219 | AZ | ARIZONA | SMALLPOX | 2 | 0.24 |
| 50908 | 195220 | NE | NEBRASKA | SMALLPOX | 1 | 0.08 |
| 50909 | 195221 | KS | KANSAS | SMALLPOX | 1 | 0.05 |
| 50910 | 195224 | WI | WISCONSIN | SMALLPOX | 1 | 0.03 |
| 50911 | 195226 | NM | NEW MEXICO | SMALLPOX | 1 | 0.14 |
| 50912 | 195227 | NV | NEVADA | SMALLPOX | 1 | 0.55 |
| 50913 | 195239 | MT | MONTANA | SMALLPOX | 1 | 0.17 |
| 50914 | 195244 | SD | SOUTH DAKOTA | SMALLPOX | 1 | 0.15 |
| 50915 | 195250 | IA | IOWA | SMALLPOX | 1 | 0.04 |
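The report does not state the unit of `incidence_per_capita`. The sample rows are consistent with a rate per 100,000 population (for example, 1 case in Alabama against an incidence of 0.04), but that scaling is an assumption, not something the report confirms. Under that assumption, the population implied by a row can be backed out as below; `implied_population` and `PER` are hypothetical names used only for this illustration.

```python
from typing import Optional

PER = 100_000  # assumed scaling; the report does not state the unit


def implied_population(cases: int, incidence: float) -> Optional[int]:
    """Back out the population implied by a row, or None when incidence is 0."""
    if incidence == 0:
        return None
    return round(cases / incidence * PER)


# Rows taken from the sample tables in this report.
alabama_1928 = implied_population(1, 0.04)
iowa_1928 = implied_population(58, 2.37)
arizona_1928 = implied_population(0, 0.00)
print(alabama_1928, iowa_1928, arizona_1928)
```

Because incidence is printed to two decimals, rows with only one or two cases give a coarse estimate; rows with larger counts constrain the implied population more tightly.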