Overview
Dataset statistics
Number of variables | 6 |
---|---|
Number of observations | 50916 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 10.4 MiB |
Average record size in memory | 213.8 B |
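The two memory figures above are linked by the row count. A minimal sketch in Python, assuming that MiB means 2^20 bytes and that both reported values are rounded, checks that the average record size times the number of observations gives the total size:

```python
# Figures copied from the dataset statistics table.
N_OBSERVATIONS = 50916
AVG_RECORD_BYTES = 213.8   # rounded in the report
BYTES_PER_MIB = 2 ** 20    # assumption: 1 MiB = 1,048,576 bytes


def total_size_mib(n_rows: int, avg_record_bytes: float) -> float:
    """Total in-memory size in MiB implied by the average record size."""
    return n_rows * avg_record_bytes / BYTES_PER_MIB


total_mib = total_size_mib(N_OBSERVATIONS, AVG_RECORD_BYTES)
print(f"{total_mib:.1f} MiB")  # agrees with the reported 10.4 MiB after rounding
```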
Variable types
Numeric | 3 |
---|---|
Categorical | 3 |
disease has constant value "SMALLPOX" | Constant |
cases is highly overall correlated with incidence_per_capita | High correlation |
incidence_per_capita is highly overall correlated with cases | High correlation |
state is highly overall correlated with state_name | High correlation |
state_name is highly overall correlated with state | High correlation |
cases has 32590 (64.0%) zeros | Zeros |
incidence_per_capita has 32576 (64.0%) zeros | Zeros |
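The two "Zeros" alerts can be reproduced from the counts alone. The sketch below, with counts copied from the alerts, computes the share of zero values per column; the dictionary and function names are illustrative only:

```python
# Counts copied from the alerts above.
N_OBSERVATIONS = 50916
ZERO_COUNTS = {"cases": 32590, "incidence_per_capita": 32576}


def zero_share(zero_count: int, n_rows: int) -> float:
    """Percentage of rows whose value is exactly zero."""
    return 100.0 * zero_count / n_rows


shares = {name: zero_share(count, N_OBSERVATIONS)
          for name, count in ZERO_COUNTS.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1f}% zeros")

# The two zero counts are not identical; they differ by this many rows.
zero_gap = ZERO_COUNTS["cases"] - ZERO_COUNTS["incidence_per_capita"]
```

Both shares round to 64.0%, yet the counts differ by 14 rows, so the two columns are not zero in exactly the same places.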
Reproduction
Analysis started | 2024-11-08 01:59:57.153969 |
---|---|
Analysis finished | 2024-11-08 02:00:00.821324 |
Duration | 3.67 seconds |
Software version | ydata-profiling v4.12.0 |
Variables
week
Real number (ℝ)
Distinct | 1145 |
---|---|
Distinct (%) | 2.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 193809.85 |
Minimum | 192801 |
---|---|
Maximum | 195250 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 397.9 KiB |
Quantile statistics
Minimum | 192801 |
---|---|
5-th percentile | 192904 |
Q1 | 193312 |
median | 193819 |
Q3 | 194324 |
95-th percentile | 194726 |
Maximum | 195250 |
Range | 2449 |
Interquartile range (IQR) | 1012 |
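The last two rows of the quantile table follow from the others: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A minimal check in Python, with the values copied from the table:

```python
# Quantiles of `week`, copied from the table above.
quantiles = {
    "min": 192801,
    "q1": 193312,
    "median": 193819,
    "q3": 194324,
    "max": 195250,
}

value_range = quantiles["max"] - quantiles["min"]   # spread of the whole column
iqr = quantiles["q3"] - quantiles["q1"]             # spread of the middle 50%

print(value_range, iqr)
```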
Descriptive statistics
Standard deviation | 591.48989 |
---|---|
Coefficient of variation (CV) | 0.0030519083 |
Kurtosis | -1.1870546 |
Mean | 193809.85 |
Median Absolute Deviation (MAD) | 506 |
Skewness | -0.00381597 |
Sum | 9.8680224 × 10^9 |
Variance | 349860.29 |
Monotonicity | Increasing |
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%) |
194821 | 49 | 0.1% |
194333 | 49 | 0.1% |
194310 | 49 | 0.1% |
194311 | 49 | 0.1% |
194312 | 49 | 0.1% |
194313 | 49 | 0.1% |
194314 | 49 | 0.1% |
194316 | 49 | 0.1% |
194317 | 49 | 0.1% |
194318 | 49 | 0.1% |
Other values (1135) | 50426 |
Minimum 10 values
Value | Count | Frequency (%) |
192801 | 44 | |
192802 | 43 | |
192803 | 43 | |
192804 | 47 | |
192805 | 47 | |
192806 | 47 | |
192807 | 47 | |
192808 | 45 | |
192809 | 48 | |
192810 | 45 |
Maximum 10 values
Value | Count | Frequency (%) |
195250 | 1 | |
195244 | 1 | |
195239 | 1 | |
195227 | 1 | |
195226 | 1 | |
195224 | 1 | |
195221 | 1 | |
195220 | 1 | |
195219 | 1 | |
195217 | 2 |
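The `week` values look like a four-digit year followed by a two-digit week number, so 192801 would be week 1 of 1928. The report does not state this, so treat the layout as an assumption. If it holds, statistics such as the mean or the range of 2449 mix years and weeks and should not be read as durations. A small sketch of decoding the field, where `split_week` is a hypothetical helper:

```python
def split_week(code: int) -> tuple[int, int]:
    """Split an integer assumed to be laid out as YYYYWW into (year, week)."""
    year, week = divmod(code, 100)
    if not 1 <= week <= 53:
        raise ValueError(f"week number out of range in {code}")
    return year, week


# Smallest and largest values reported for the column.
first = split_week(192801)
last = split_week(195250)
print(first, last)
```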
state
Categorical
High correlation 
Distinct | 49 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 2.9 MiB |
KS | 1067 |
---|---|
MO | 1064 |
TX | 1064 |
AZ | 1064 |
KY | 1063 |
Other values (44) | 45594 |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | AL |
---|---|
2nd row | AR |
3rd row | AZ |
4th row | CA |
5th row | CO |
Common Values
Value | Count | Frequency (%) |
KS | 1067 | 2.1% |
MO | 1064 | 2.1% |
TX | 1064 | 2.1% |
AZ | 1064 | 2.1% |
KY | 1063 | 2.1% |
MS | 1063 | 2.1% |
SD | 1063 | 2.1% |
OK | 1062 | 2.1% |
WI | 1061 | 2.1% |
ID | 1061 | 2.1% |
Other values (39) | 40284 |
Length
Histogram of lengths of the category (plot not shown)
Words
Value | Count | Frequency (%) |
ks | 1067 | 2.1% |
tx | 1064 | 2.1% |
az | 1064 | 2.1% |
mo | 1064 | 2.1% |
ky | 1063 | 2.1% |
ms | 1063 | 2.1% |
sd | 1063 | 2.1% |
ok | 1062 | 2.1% |
wi | 1061 | 2.1% |
id | 1061 | 2.1% |
Other values (39) | 40284 |
Most occurring characters
Value | Count | Frequency (%) |
A | 11407 | 11.2% |
N | 10953 | 10.8% |
M | 9518 | 9.3% |
I | 7394 | 7.3% |
T | 6347 | 6.2% |
D | 6331 | 6.2% |
C | 6324 | 6.2% |
O | 5300 | 5.2% |
S | 4248 | 4.2% |
W | 4231 | 4.2% |
Other values (14) | 29779 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 101832 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
A | 11407 | 11.2% |
N | 10953 | 10.8% |
M | 9518 | 9.3% |
I | 7394 | 7.3% |
T | 6347 | 6.2% |
D | 6331 | 6.2% |
C | 6324 | 6.2% |
O | 5300 | 5.2% |
S | 4248 | 4.2% |
W | 4231 | 4.2% |
Other values (14) | 29779 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 101832 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
A | 11407 | 11.2% |
N | 10953 | 10.8% |
M | 9518 | 9.3% |
I | 7394 | 7.3% |
T | 6347 | 6.2% |
D | 6331 | 6.2% |
C | 6324 | 6.2% |
O | 5300 | 5.2% |
S | 4248 | 4.2% |
W | 4231 | 4.2% |
Other values (14) | 29779 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 101832 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
A | 11407 | 11.2% |
N | 10953 | 10.8% |
M | 9518 | 9.3% |
I | 7394 | 7.3% |
T | 6347 | 6.2% |
D | 6331 | 6.2% |
C | 6324 | 6.2% |
O | 5300 | 5.2% |
S | 4248 | 4.2% |
W | 4231 | 4.2% |
Other values (14) | 29779 |
state_name
Categorical
High correlation 
Distinct | 49 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.2 MiB |
KANSAS | 1067 |
---|---|
MISSOURI | 1064 |
TEXAS | 1064 |
ARIZONA | 1064 |
KENTUCKY | 1063 |
Other values (44) | 45594 |
Length
Max length | 20 |
---|---|
Median length | 12 |
Mean length | 8.8091759 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | ALABAMA |
---|---|
2nd row | ARKANSAS |
3rd row | ARIZONA |
4th row | CALIFORNIA |
5th row | COLORADO |
Common Values
Value | Count | Frequency (%) |
KANSAS | 1067 | 2.1% |
MISSOURI | 1064 | 2.1% |
TEXAS | 1064 | 2.1% |
ARIZONA | 1064 | 2.1% |
KENTUCKY | 1063 | 2.1% |
MISSISSIPPI | 1063 | 2.1% |
SOUTH DAKOTA | 1063 | 2.1% |
OKLAHOMA | 1062 | 2.1% |
WISCONSIN | 1061 | 2.1% |
IDAHO | 1061 | 2.1% |
Other values (39) | 40284 |
Length
Histogram of lengths of the category (plot not shown)
Words
Value | Count | Frequency (%) |
new | 4214 | 6.6% |
dakota | 2120 | 3.3% |
south | 2118 | 3.3% |
north | 2112 | 3.3% |
carolina | 2110 | 3.3% |
virginia | 1901 | 3.0% |
kansas | 1067 | 1.7% |
missouri | 1064 | 1.7% |
arizona | 1064 | 1.7% |
texas | 1064 | 1.7% |
Other values (43) | 44721 |
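The percentages in the word table above are relative to the number of words, not the number of rows. Assuming names are split on single spaces and lower-cased (the report does not describe its tokenizer), the word total is the row count plus the number of space characters reported for this column. A sketch under those assumptions, with `tokenize` and `share` as illustrative helpers:

```python
N_ROWS = 50916     # observations in the dataset
N_SPACES = 12639   # "Space Separator" count reported for state_name


def tokenize(name: str) -> list[str]:
    """Lower-case a state name and split it on whitespace."""
    return name.lower().split()


# Each space separates two words, so every space adds one word to a row.
total_words = N_ROWS + N_SPACES


def share(count: int) -> float:
    """Frequency of a word as a percentage of all words, to one decimal."""
    return round(100 * count / total_words, 1)


print(tokenize("NEW MEXICO"), share(4214), share(1067))
```

With this denominator, "new" (4214) comes out at 6.6% and "kansas" (1067) at 1.7%, matching the table.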
Most occurring characters
Value | Count | Frequency (%) |
A | 58709 | |
I | 46895 | 10.5% |
N | 44564 | 9.9% |
O | 40160 | 9.0% |
S | 33852 | 7.5% |
E | 28894 | 6.4% |
R | 24060 | 5.4% |
T | 22175 | 4.9% |
M | 15843 | 3.5% |
L | 15820 | 3.5% |
Other values (16) | 117556 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 435889 | |
Space Separator | 12639 | 2.8% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
A | 58709 | |
I | 46895 | |
N | 44564 | |
O | 40160 | 9.2% |
S | 33852 | 7.8% |
E | 28894 | 6.6% |
R | 24060 | 5.5% |
T | 22175 | 5.1% |
M | 15843 | 3.6% |
L | 15820 | 3.6% |
Other values (15) | 104917 |
Space Separator
Value | Count | Frequency (%) |
(space) | 12639 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 435889 | |
Common | 12639 | 2.8% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
A | 58709 | |
I | 46895 | |
N | 44564 | |
O | 40160 | 9.2% |
S | 33852 | 7.8% |
E | 28894 | 6.6% |
R | 24060 | 5.5% |
T | 22175 | 5.1% |
M | 15843 | 3.6% |
L | 15820 | 3.6% |
Other values (15) | 104917 |
Common
Value | Count | Frequency (%) |
(space) | 12639 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 448528 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
A | 58709 | |
I | 46895 | 10.5% |
N | 44564 | 9.9% |
O | 40160 | 9.0% |
S | 33852 | 7.5% |
E | 28894 | 6.4% |
R | 24060 | 5.4% |
T | 22175 | 4.9% |
M | 15843 | 3.5% |
L | 15820 | 3.5% |
Other values (16) | 117556 |
disease
Categorical
Constant 
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 3.2 MiB |
SMALLPOX |
---|
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 8 |
Min length | 8 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | SMALLPOX |
---|---|
2nd row | SMALLPOX |
3rd row | SMALLPOX |
4th row | SMALLPOX |
5th row | SMALLPOX |
Common Values
Value | Count | Frequency (%) |
SMALLPOX | 50916 |
Length
Histogram of lengths of the category (plot not shown)
Common Values (plot not shown)
Words
Value | Count | Frequency (%) |
smallpox | 50916 |
Most occurring characters
Value | Count | Frequency (%) |
L | 101832 | |
S | 50916 | |
M | 50916 | |
A | 50916 | |
P | 50916 | |
O | 50916 | |
X | 50916 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 407328 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
L | 101832 | |
S | 50916 | |
M | 50916 | |
A | 50916 | |
P | 50916 | |
O | 50916 | |
X | 50916 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 407328 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
L | 101832 | |
S | 50916 | |
M | 50916 | |
A | 50916 | |
P | 50916 | |
O | 50916 | |
X | 50916 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 407328 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
L | 101832 | |
S | 50916 | |
M | 50916 | |
A | 50916 | |
P | 50916 | |
O | 50916 | |
X | 50916 |
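Because `disease` is constant, every character count above is the count within "SMALLPOX" multiplied by the number of rows. A short Python sketch that reproduces the table:

```python
from collections import Counter

N_ROWS = 50916                   # observations in the dataset
per_value = Counter("SMALLPOX")  # character counts within a single value

# Scale the per-value counts up to the whole column.
totals = {char: n * N_ROWS for char, n in per_value.items()}
total_chars = sum(totals.values())

for char, count in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(char, count)
```

"L" occurs twice per value, which is why its count is double that of the other letters.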
cases
Real number (ℝ)
High correlation, Zeros
Distinct | 200 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 4.5727866 |
Minimum | 0 |
---|---|
Maximum | 350 |
Zeros | 32590 |
Zeros (%) | 64.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 397.9 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 2 |
95-th percentile | 25 |
Maximum | 350 |
Range | 350 |
Interquartile range (IQR) | 2 |
Descriptive statistics
Standard deviation | 15.062277 |
---|---|
Coefficient of variation (CV) | 3.2938946 |
Kurtosis | 66.318942 |
Mean | 4.5727866 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 6.769109 |
Sum | 232828 |
Variance | 226.87218 |
Monotonicity | Not monotonic |
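Several of the descriptive statistics for `cases` can be derived from one another: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A quick consistency check with the reported values, allowing for rounding in the report:

```python
# Values copied from the report for the `cases` column.
N_OBSERVATIONS = 50916
TOTAL_CASES = 232828       # "Sum"
STD_DEV = 15.062277        # "Standard deviation"

mean = TOTAL_CASES / N_OBSERVATIONS   # reported as 4.5727866
cv = STD_DEV / mean                   # reported as 3.2938946
variance = STD_DEV ** 2               # reported as 226.87218

print(f"mean={mean:.7f} cv={cv:.7f} variance={variance:.5f}")
```

A coefficient of variation above 3, together with a median of 0, reflects how strongly the column is dominated by zeros and a few large counts.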
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%) |
0 | 32590 | 64.0% |
1 | 5075 | 10.0% |
2 | 2095 | 4.1% |
3 | 1369 | 2.7% |
4 | 1067 | 2.1% |
5 | 800 | 1.6% |
6 | 675 | 1.3% |
7 | 561 | 1.1% |
8 | 474 | 0.9% |
9 | 424 | 0.8% |
Other values (190) | 5786 | 11.4% |
Minimum 10 values
Value | Count | Frequency (%) |
0 | 32590 | 64.0% |
1 | 5075 | 10.0% |
2 | 2095 | 4.1% |
3 | 1369 | 2.7% |
4 | 1067 | 2.1% |
5 | 800 | 1.6% |
6 | 675 | 1.3% |
7 | 561 | 1.1% |
8 | 474 | 0.9% |
9 | 424 | 0.8% |
Maximum 10 values
Value | Count | Frequency (%) |
350 | 1 | |
290 | 1 | |
277 | 1 | |
271 | 1 | |
269 | 1 | |
254 | 1 | |
247 | 1 | |
245 | 1 | |
243 | 1 | |
242 | 1 |
incidence_per_capita
Real number (ℝ)
High correlation, Zeros
Distinct | 673 |
---|---|
Distinct (%) | 1.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.24910794 |
Minimum | 0 |
---|---|
Maximum | 50.36 |
Zeros | 32576 |
Zeros (%) | 64.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 397.9 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0.09 |
95-th percentile | 1.46 |
Maximum | 50.36 |
Range | 50.36 |
Interquartile range (IQR) | 0.09 |
Descriptive statistics
Standard deviation | 0.82433133 |
---|---|
Coefficient of variation (CV) | 3.3091331 |
Kurtosis | 322.39074 |
Mean | 0.24910794 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 10.060931 |
Sum | 12683.58 |
Variance | 0.67952214 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%) |
0 | 32576 | 64.0% |
0.03 | 1079 | 2.1% |
0.04 | 973 | 1.9% |
0.05 | 784 | 1.5% |
0.01 | 638 | 1.3% |
0.02 | 638 | 1.3% |
0.06 | 554 | 1.1% |
0.08 | 505 | 1.0% |
0.09 | 420 | 0.8% |
0.11 | 404 | 0.8% |
Other values (663) | 12345 | 24.2% |
Minimum 10 values
Value | Count | Frequency (%) |
0 | 32576 | 64.0% |
0.01 | 638 | 1.3% |
0.02 | 638 | 1.3% |
0.03 | 1079 | 2.1% |
0.04 | 973 | 1.9% |
0.05 | 784 | 1.5% |
0.06 | 554 | 1.1% |
0.07 | 385 | 0.8% |
0.08 | 505 | 1.0% |
0.09 | 420 | 0.8% |
Maximum 10 values
Value | Count | Frequency (%) |
50.36 | 1 | |
21.9 | 1 | |
18.61 | 1 | |
15.88 | 1 | |
15.07 | 1 | |
13.75 | 1 | |
12.9 | 1 | |
12.73 | 1 | |
12.67 | 1 | |
12.56 | 1 |
Interactions
Correlations
Variable | cases | incidence_per_capita | state | state_name | week |
---|---|---|---|---|---|
cases | 1.000 | 0.989 | 0.102 | 0.102 | -0.425 |
incidence_per_capita | 0.989 | 1.000 | 0.054 | 0.054 | -0.427 |
state | 0.102 | 0.054 | 1.000 | 1.000 | 0.032 |
state_name | 0.102 | 0.054 | 1.000 | 1.000 | 0.032 |
week | -0.425 | -0.427 | 0.032 | 0.032 | 1.000 |
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
Sample
First rows
Row | week | state | state_name | disease | cases | incidence_per_capita |
---|---|---|---|---|---|---|
0 | 192801 | AL | ALABAMA | SMALLPOX | 1 | 0.04 |
1 | 192801 | AR | ARKANSAS | SMALLPOX | 7 | 0.38 |
2 | 192801 | AZ | ARIZONA | SMALLPOX | 0 | 0.00 |
3 | 192801 | CA | CALIFORNIA | SMALLPOX | 18 | 0.34 |
4 | 192801 | CO | COLORADO | SMALLPOX | 31 | 3.06 |
5 | 192801 | CT | CONNECTICUT | SMALLPOX | 26 | 1.65 |
6 | 192801 | DE | DELAWARE | SMALLPOX | 0 | 0.00 |
7 | 192801 | FL | FLORIDA | SMALLPOX | 1 | 0.07 |
8 | 192801 | GA | GEORGIA | SMALLPOX | 0 | 0.00 |
9 | 192801 | IA | IOWA | SMALLPOX | 58 | 2.37 |
Last rows
Row | week | state | state_name | disease | cases | incidence_per_capita |
---|---|---|---|---|---|---|
50906 | 195217 | NV | NEVADA | SMALLPOX | 1 | 0.55 |
50907 | 195219 | AZ | ARIZONA | SMALLPOX | 2 | 0.24 |
50908 | 195220 | NE | NEBRASKA | SMALLPOX | 1 | 0.08 |
50909 | 195221 | KS | KANSAS | SMALLPOX | 1 | 0.05 |
50910 | 195224 | WI | WISCONSIN | SMALLPOX | 1 | 0.03 |
50911 | 195226 | NM | NEW MEXICO | SMALLPOX | 1 | 0.14 |
50912 | 195227 | NV | NEVADA | SMALLPOX | 1 | 0.55 |
50913 | 195239 | MT | MONTANA | SMALLPOX | 1 | 0.17 |
50914 | 195244 | SD | SOUTH DAKOTA | SMALLPOX | 1 | 0.15 |
50915 | 195250 | IA | IOWA | SMALLPOX | 1 | 0.04 |
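The report does not define `incidence_per_capita`. The sample rows are consistent with cases per 100,000 population, but that scale is an assumption. Under it, each non-zero row implies an approximate state population, which gives a rough plausibility check. `implied_population` is a hypothetical helper:

```python
PER = 100_000  # assumed scale: incidence expressed per 100,000 people

# (state, cases, incidence_per_capita) copied from the first sample rows.
SAMPLE = [
    ("AL", 1, 0.04),
    ("AR", 7, 0.38),
    ("CA", 18, 0.34),
    ("CO", 31, 3.06),
]


def implied_population(cases, incidence, per=PER):
    """Population implied by a case count and its incidence; None if undefined."""
    if incidence == 0:
        return None
    return round(cases / incidence * per)


implied = {state: implied_population(cases, incidence)
           for state, cases, incidence in SAMPLE}
for state, population in implied.items():
    print(state, population)
```

Because the incidence values are rounded to two decimals, the implied populations are only approximate, especially for rows with a single case.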