Overview

Dataset statistics

Number of variables: 6
Number of observations: 50916
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 10.4 MiB
Average record size in memory: 213.8 B
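
As a rough cross-check of the memory figures, the average record size multiplied by the number of observations should land close to the reported total size. A minimal sketch, using only numbers copied from the statistics above (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics; names are illustrative.
n_observations = 50916
avg_record_bytes = 213.8

# Convert the implied total from bytes to MiB (1 MiB = 2**20 bytes).
total_mib = n_observations * avg_record_bytes / 2**20
print(f"{total_mib:.1f} MiB")
```

Both inputs are rounded in the report, so the result can only be expected to agree to about one decimal place.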

Variable types

Numeric: 3
Categorical: 3

Alerts

Constant: disease has constant value "SMALLPOX"
High correlation: cases is highly overall correlated with incidence_per_capita
High correlation: incidence_per_capita is highly overall correlated with cases
High correlation: state is highly overall correlated with state_name
High correlation: state_name is highly overall correlated with state
Zeros: cases has 32590 (64.0%) zeros
Zeros: incidence_per_capita has 32576 (64.0%) zeros
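
The percentages in the Zeros alerts follow directly from the zero counts and the number of observations. A small sketch of that arithmetic, using counts copied from the report (the helper name `zero_share` is hypothetical):

```python
# Counts copied from the report; zero_share is a hypothetical helper.
N_OBSERVATIONS = 50916

def zero_share(zero_count, total=N_OBSERVATIONS):
    """Return the percentage of rows that are zero, rounded to one decimal."""
    return round(100 * zero_count / total, 1)

cases_zero_pct = zero_share(32590)
incidence_zero_pct = zero_share(32576)

# The two zero counts are not identical, so the zero rows of the two
# columns cannot coincide exactly.
zero_count_gap = 32590 - 32576

print(cases_zero_pct, incidence_zero_pct, zero_count_gap)
```

Both shares round to 64.0%, but the counts differ by 14, which may be worth checking in the underlying data.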

Reproduction

Analysis started: 2024-11-08 01:59:57.153969
Analysis finished: 2024-11-08 02:00:00.821324
Duration: 3.67 seconds
Software version: ydata-profiling v4.12.0
Configuration: config.json

Variables

week
Real number (ℝ)

Distinct: 1145
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 193809.85
Minimum: 192801
Maximum: 195250
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 397.9 KiB

Quantile statistics

Minimum: 192801
5-th percentile: 192904
Q1: 193312
Median: 193819
Q3: 194324
95-th percentile: 194726
Maximum: 195250
Range: 2449
Interquartile range (IQR): 1012

Descriptive statistics

Standard deviation: 591.48989
Coefficient of variation (CV): 0.0030519083
Kurtosis: -1.1870546
Mean: 193809.85
Median Absolute Deviation (MAD): 506
Skewness: -0.00381597
Sum: 9.8680224 × 10^9
Variance: 349860.29
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)

Common Values

Value                Count  Frequency (%)
194821               49     0.1%
194333               49     0.1%
194310               49     0.1%
194311               49     0.1%
194312               49     0.1%
194313               49     0.1%
194314               49     0.1%
194316               49     0.1%
194317               49     0.1%
194318               49     0.1%
Other values (1135)  50426  99.0%

Minimum 10 values

Value   Count  Frequency (%)
192801  44     0.1%
192802  43     0.1%
192803  43     0.1%
192804  47     0.1%
192805  47     0.1%
192806  47     0.1%
192807  47     0.1%
192808  45     0.1%
192809  48     0.1%
192810  45     0.1%

Maximum 10 values

Value   Count  Frequency (%)
195250  1      < 0.1%
195244  1      < 0.1%
195239  1      < 0.1%
195227  1      < 0.1%
195226  1      < 0.1%
195224  1      < 0.1%
195221  1      < 0.1%
195220  1      < 0.1%
195219  1      < 0.1%
195217  2      < 0.1%
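
The profiler treats week as a plain real number, but values such as 192801 and 195250 look like a four-digit year followed by a two-digit week. That reading is an assumption, not something the report states. Under it, a value can be split as in the sketch below (`split_week` is a hypothetical helper):

```python
from typing import Tuple

def split_week(code: int) -> Tuple[int, int]:
    """Split an integer into (year, week), ASSUMING a YYYYWW layout."""
    year, week = divmod(code, 100)
    if not 1 <= week <= 53:
        raise ValueError(f"week part out of range in {code}")
    return year, week

# Minimum and maximum values copied from the report.
first = split_week(192801)
last = split_week(195250)
print(first, last)
```

If the assumption holds, statistics such as the mean, range and IQR of the raw integer are not time spans and should be read with care.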

state
Categorical

High correlation 

Distinct: 49
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

Value              Count
KS                 1067
MO                 1064
TX                 1064
AZ                 1064
KY                 1063
Other values (44)  45594

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 101832
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: AL
2nd row: AR
3rd row: AZ
4th row: CA
5th row: CO

Common Values

Value              Count  Frequency (%)
KS                 1067   2.1%
MO                 1064   2.1%
TX                 1064   2.1%
AZ                 1064   2.1%
KY                 1063   2.1%
MS                 1063   2.1%
SD                 1063   2.1%
OK                 1062   2.1%
WI                 1061   2.1%
ID                 1061   2.1%
Other values (39)  40284  79.1%

Length

Histogram of lengths of the category

Words

Value              Count  Frequency (%)
ks                 1067   2.1%
tx                 1064   2.1%
az                 1064   2.1%
mo                 1064   2.1%
ky                 1063   2.1%
ms                 1063   2.1%
sd                 1063   2.1%
ok                 1062   2.1%
wi                 1061   2.1%
id                 1061   2.1%
Other values (39)  40284  79.1%

Most occurring characters

Value              Count  Frequency (%)
A                  11407  11.2%
N                  10953  10.8%
M                  9518   9.3%
I                  7394   7.3%
T                  6347   6.2%
D                  6331   6.2%
C                  6324   6.2%
O                  5300   5.2%
S                  4248   4.2%
W                  4231   4.2%
Other values (14)  29779  29.2%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  101832  100.0%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
A                  11407  11.2%
N                  10953  10.8%
M                  9518   9.3%
I                  7394   7.3%
T                  6347   6.2%
D                  6331   6.2%
C                  6324   6.2%
O                  5300   5.2%
S                  4248   4.2%
W                  4231   4.2%
Other values (14)  29779  29.2%

Most occurring scripts

Value  Count   Frequency (%)
Latin  101832  100.0%

Most frequent character per script

Latin
Value              Count  Frequency (%)
A                  11407  11.2%
N                  10953  10.8%
M                  9518   9.3%
I                  7394   7.3%
T                  6347   6.2%
D                  6331   6.2%
C                  6324   6.2%
O                  5300   5.2%
S                  4248   4.2%
W                  4231   4.2%
Other values (14)  29779  29.2%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  101832  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
A                  11407  11.2%
N                  10953  10.8%
M                  9518   9.3%
I                  7394   7.3%
T                  6347   6.2%
D                  6331   6.2%
C                  6324   6.2%
O                  5300   5.2%
S                  4248   4.2%
W                  4231   4.2%
Other values (14)  29779  29.2%

state_name
Categorical

High correlation 

Distinct: 49
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB

Value              Count
KANSAS             1067
MISSOURI           1064
TEXAS              1064
ARIZONA            1064
KENTUCKY           1063
Other values (44)  45594

Length

Max length: 20
Median length: 12
Mean length: 8.8091759
Min length: 4

Characters and Unicode

Total characters: 448528
Distinct characters: 26
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ALABAMA
2nd row: ARKANSAS
3rd row: ARIZONA
4th row: CALIFORNIA
5th row: COLORADO

Common Values

Value              Count  Frequency (%)
KANSAS             1067   2.1%
MISSOURI           1064   2.1%
TEXAS              1064   2.1%
ARIZONA            1064   2.1%
KENTUCKY           1063   2.1%
MISSISSIPPI        1063   2.1%
SOUTH DAKOTA       1063   2.1%
OKLAHOMA           1062   2.1%
WISCONSIN          1061   2.1%
IDAHO              1061   2.1%
Other values (39)  40284  79.1%

Length

Histogram of lengths of the category

Words

Value              Count  Frequency (%)
new                4214   6.6%
dakota             2120   3.3%
south              2118   3.3%
north              2112   3.3%
carolina           2110   3.3%
virginia           1901   3.0%
kansas             1067   1.7%
missouri           1064   1.7%
arizona            1064   1.7%
texas              1064   1.7%
Other values (43)  44721  70.4%

Most occurring characters

Value              Count   Frequency (%)
A                  58709   13.1%
I                  46895   10.5%
N                  44564   9.9%
O                  40160   9.0%
S                  33852   7.5%
E                  28894   6.4%
R                  24060   5.4%
T                  22175   4.9%
M                  15843   3.5%
L                  15820   3.5%
Other values (16)  117556  26.2%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  435889  97.2%
Space Separator   12639   2.8%

Most frequent character per category

Uppercase Letter
Value              Count   Frequency (%)
A                  58709   13.5%
I                  46895   10.8%
N                  44564   10.2%
O                  40160   9.2%
S                  33852   7.8%
E                  28894   6.6%
R                  24060   5.5%
T                  22175   5.1%
M                  15843   3.6%
L                  15820   3.6%
Other values (15)  104917  24.1%

Space Separator
Value    Count  Frequency (%)
(space)  12639  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   435889  97.2%
Common  12639   2.8%

Most frequent character per script

Latin
Value              Count   Frequency (%)
A                  58709   13.5%
I                  46895   10.8%
N                  44564   10.2%
O                  40160   9.2%
S                  33852   7.8%
E                  28894   6.6%
R                  24060   5.5%
T                  22175   5.1%
M                  15843   3.6%
L                  15820   3.6%
Other values (15)  104917  24.1%

Common
Value    Count  Frequency (%)
(space)  12639  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  448528  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
A                  58709   13.1%
I                  46895   10.5%
N                  44564   9.9%
O                  40160   9.0%
S                  33852   7.5%
E                  28894   6.4%
R                  24060   5.4%
T                  22175   4.9%
M                  15843   3.5%
L                  15820   3.5%
Other values (16)  117556  26.2%

disease
Categorical

Constant 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB

Value     Count
SMALLPOX  50916

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 407328
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SMALLPOX
2nd row: SMALLPOX
3rd row: SMALLPOX
4th row: SMALLPOX
5th row: SMALLPOX

Common Values

Value     Count  Frequency (%)
SMALLPOX  50916  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot: common values of disease]

Words

Value     Count  Frequency (%)
smallpox  50916  100.0%

Most occurring characters

Value  Count   Frequency (%)
L      101832  25.0%
S      50916   12.5%
M      50916   12.5%
A      50916   12.5%
P      50916   12.5%
O      50916   12.5%
X      50916   12.5%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  407328  100.0%

Most frequent character per category

Uppercase Letter
Value  Count   Frequency (%)
L      101832  25.0%
S      50916   12.5%
M      50916   12.5%
A      50916   12.5%
P      50916   12.5%
O      50916   12.5%
X      50916   12.5%

Most occurring scripts

Value  Count   Frequency (%)
Latin  407328  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
L      101832  25.0%
S      50916   12.5%
M      50916   12.5%
A      50916   12.5%
P      50916   12.5%
O      50916   12.5%
X      50916   12.5%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  407328  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
L      101832  25.0%
S      50916   12.5%
M      50916   12.5%
A      50916   12.5%
P      50916   12.5%
O      50916   12.5%
X      50916   12.5%

cases
Real number (ℝ)

High correlation  Zeros 

Distinct: 200
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.5727866
Minimum: 0
Maximum: 350
Zeros: 32590
Zeros (%): 64.0%
Negative: 0
Negative (%): 0.0%
Memory size: 397.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 2
95-th percentile: 25
Maximum: 350
Range: 350
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 15.062277
Coefficient of variation (CV): 3.2938946
Kurtosis: 66.318942
Mean: 4.5727866
Median Absolute Deviation (MAD): 0
Skewness: 6.769109
Sum: 232828
Variance: 226.87218
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value               Count  Frequency (%)
0                   32590  64.0%
1                   5075   10.0%
2                   2095   4.1%
3                   1369   2.7%
4                   1067   2.1%
5                   800    1.6%
6                   675    1.3%
7                   561    1.1%
8                   474    0.9%
9                   424    0.8%
Other values (190)  5786   11.4%

Minimum 10 values

Value  Count  Frequency (%)
0      32590  64.0%
1      5075   10.0%
2      2095   4.1%
3      1369   2.7%
4      1067   2.1%
5      800    1.6%
6      675    1.3%
7      561    1.1%
8      474    0.9%
9      424    0.8%

Maximum 10 values

Value  Count  Frequency (%)
350    1      < 0.1%
290    1      < 0.1%
277    1      < 0.1%
271    1      < 0.1%
269    1      < 0.1%
254    1      < 0.1%
247    1      < 0.1%
245    1      < 0.1%
243    1      < 0.1%
242    1      < 0.1%
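
The descriptive statistics reported for cases are tied together by a few identities: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A short sketch that recomputes them from figures copied from the report (variable names are illustrative):

```python
import math

# Figures copied from the report for the cases variable.
n_observations = 50916
total_cases = 232828
std_dev = 15.062277

mean = total_cases / n_observations   # reported: 4.5727866
variance = std_dev ** 2               # reported: 226.87218
cv = std_dev / mean                   # reported: 3.2938946

# Compare against the rounded values printed in the report.
checks = {
    "mean": math.isclose(mean, 4.5727866, rel_tol=1e-5),
    "variance": math.isclose(variance, 226.87218, rel_tol=1e-5),
    "cv": math.isclose(cv, 3.2938946, rel_tol=1e-5),
}
print(checks)
```

A CV above 3 together with a median of 0 reflects how strongly the distribution is dominated by zeros with a long right tail.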

incidence_per_capita
Real number (ℝ)

High correlation  Zeros 

Distinct: 673
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.24910794
Minimum: 0
Maximum: 50.36
Zeros: 32576
Zeros (%): 64.0%
Negative: 0
Negative (%): 0.0%
Memory size: 397.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0.09
95-th percentile: 1.46
Maximum: 50.36
Range: 50.36
Interquartile range (IQR): 0.09

Descriptive statistics

Standard deviation: 0.82433133
Coefficient of variation (CV): 3.3091331
Kurtosis: 322.39074
Mean: 0.24910794
Median Absolute Deviation (MAD): 0
Skewness: 10.060931
Sum: 12683.58
Variance: 0.67952214
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value               Count  Frequency (%)
0                   32576  64.0%
0.03                1079   2.1%
0.04                973    1.9%
0.05                784    1.5%
0.01                638    1.3%
0.02                638    1.3%
0.06                554    1.1%
0.08                505    1.0%
0.09                420    0.8%
0.11                404    0.8%
Other values (663)  12345  24.2%

Minimum 10 values

Value  Count  Frequency (%)
0      32576  64.0%
0.01   638    1.3%
0.02   638    1.3%
0.03   1079   2.1%
0.04   973    1.9%
0.05   784    1.5%
0.06   554    1.1%
0.07   385    0.8%
0.08   505    1.0%
0.09   420    0.8%

Maximum 10 values

Value  Count  Frequency (%)
50.36  1      < 0.1%
21.9   1      < 0.1%
18.61  1      < 0.1%
15.88  1      < 0.1%
15.07  1      < 0.1%
13.75  1      < 0.1%
12.9   1      < 0.1%
12.73  1      < 0.1%
12.67  1      < 0.1%
12.56  1      < 0.1%

Interactions

[Figures: nine interaction plots]

Correlations

                      cases   incidence_per_capita  state  state_name  week
cases                 1.000   0.989                 0.102  0.102       -0.425
incidence_per_capita  0.989   1.000                 0.054  0.054       -0.427
state                 0.102   0.054                 1.000  1.000       0.032
state_name            0.102   0.054                 1.000  1.000       0.032
week                  -0.425  -0.427                0.032  0.032       1.000
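
The High correlation alerts can be related back to this matrix by listing the variable pairs whose absolute coefficient exceeds a cut-off. The sketch below uses the coefficients from the matrix above; the 0.5 threshold and the `high_pairs` helper are assumptions for illustration, not values taken from the report.

```python
from itertools import combinations

VARIABLES = ["cases", "incidence_per_capita", "state", "state_name", "week"]

# Coefficients copied from the correlation matrix in the report.
MATRIX = [
    [1.000, 0.989, 0.102, 0.102, -0.425],
    [0.989, 1.000, 0.054, 0.054, -0.427],
    [0.102, 0.054, 1.000, 1.000, 0.032],
    [0.102, 0.054, 1.000, 1.000, 0.032],
    [-0.425, -0.427, 0.032, 0.032, 1.000],
]

def high_pairs(threshold=0.5):
    """List each pair once if |coefficient| >= threshold (assumed cut-off)."""
    pairs = []
    for i, j in combinations(range(len(VARIABLES)), 2):
        if abs(MATRIX[i][j]) >= threshold:
            pairs.append((VARIABLES[i], VARIABLES[j], MATRIX[i][j]))
    return pairs

flagged = high_pairs()
for left, right, value in flagged:
    print(f"{left} ~ {right}: {value:.3f}")
```

With this cut-off the flagged pairs are cases with incidence_per_capita and state with state_name, which matches the pairs named in the alerts; the week coefficients of about -0.43 fall below it.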

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

       week    state  state_name    disease   cases  incidence_per_capita
0      192801  AL     ALABAMA       SMALLPOX  1      0.04
1      192801  AR     ARKANSAS      SMALLPOX  7      0.38
2      192801  AZ     ARIZONA       SMALLPOX  0      0.00
3      192801  CA     CALIFORNIA    SMALLPOX  18     0.34
4      192801  CO     COLORADO      SMALLPOX  31     3.06
5      192801  CT     CONNECTICUT   SMALLPOX  26     1.65
6      192801  DE     DELAWARE      SMALLPOX  0      0.00
7      192801  FL     FLORIDA       SMALLPOX  1      0.07
8      192801  GA     GEORGIA       SMALLPOX  0      0.00
9      192801  IA     IOWA          SMALLPOX  58     2.37

Last rows

       week    state  state_name    disease   cases  incidence_per_capita
50906  195217  NV     NEVADA        SMALLPOX  1      0.55
50907  195219  AZ     ARIZONA       SMALLPOX  2      0.24
50908  195220  NE     NEBRASKA      SMALLPOX  1      0.08
50909  195221  KS     KANSAS        SMALLPOX  1      0.05
50910  195224  WI     WISCONSIN     SMALLPOX  1      0.03
50911  195226  NM     NEW MEXICO    SMALLPOX  1      0.14
50912  195227  NV     NEVADA        SMALLPOX  1      0.55
50913  195239  MT     MONTANA       SMALLPOX  1      0.17
50914  195244  SD     SOUTH DAKOTA  SMALLPOX  1      0.15
50915  195250  IA     IOWA          SMALLPOX  1      0.04