Dunning-Kruger Illustrated, ASMR Explained & Bad Non-Official Czech Data

Another of these statisticians' spats that we all love

OpenVAET

Jul 27, 2024

Context

A few days ago, Steve Kirsch returned & initiated a new mess1, with analytics on the Czech Republic, derived from a bad university-produced record-level dataset2, compiled by Univerzita Palackého v Olomouci. The result is, as usual, representing the unvaccinated dying more than the vaccinated, based on awfully confounded data.

Dr Ah Kahn Syed (to whom the credit of a good part of the interesting findings developed here is due) & myself analyzed this data & decided to postpone our current 8 pages of data consistency analytics & subsequent demonstrations, while waiting for the publication of the official CZ dataset - announced soon to come by UZIS, their statistical office. Shortly explained (we will briefly expand on that below), the current dataset is riddled with flaws & unknown confounders, and shouldn’t be used to draw conclusions - but merely to demonstrate how poor the observational data used is, in general.

In the following developments and controversies, Henjin (ex Mongol_fi), 8-bits data mixer masquerading as “experienced conspiracy theorist”, produced yet another of these Chat-GPT analytic flows3 that he masters like no one else, in his urge to demonstrate that the jabs are working as intended.

The character appeared on the Twitter-scene end 2023, and his only endorsed conspiracy to date is “all Jews are bad”. Don’t ask him to justify this view, or that makes you a “baby truther” only worthy of his despise.

In the course of our own analytics, I realized with surprise, painfully re-reading his non-commented and non-indented code, that what he called "ASMR"4 (Age Standardized Mortality Rate) was using a flawed rating system, not a true age-standardized rate as it should be. ASMR involves taking a population with a known mortality rate in each 5-year age band, which gives a crude mortality rate for the whole population. You then recalculate the overall mortality rate, assuming those same age bands maintain their mortality rate, but you use the population distribution from the "standard" population to recalculate the overall mortality of your population. It is a specific method to control for different age distributions among populations or over time, which, by definition, involves the use of a "standardized" population. Yet, at this time, the page explained that the following was an ASMR without any reference population being loaded, which was rather surprising.

Without noting that the “way he figured out” was a bastardized version of my latest analytics on New-Zealand5, I kindly pointed out this ASMR terminology issue to him on July 196. Running the code on his archived page resulted in the following plot. Said plot was taken at face value by various analysts, and generated further virtual ink which could have been spared.

While one could have expected a public acknowledgement of his error without further delay, that’s… not what happened, and Henjin went for editing page & code7, of course without erratum. But at least we gained more colors.

Judging useless to reply to this error pointed out, he reappeared yesterday in another thread, for his 40th~ attempt to have me doubting Jikkyleaks. At which stage, annoyed, I stressed again the issue8.

Followed a long digression where neither papers explaining ASMR9 or ChatGPT definitions (according to his stated preference10) succeeded in having him admitting his mistake. His argument is “I use a non-standard standard population. It's still ASMR and the same formula”11.

Hence the current break-down.

Henjin’s thing detailed

So…

keeping only his “Unvaccinated” and “All Vaccinated” categories,
shortening his code for sobriety,
commenting it for readability,
and truncating to the relevant period,
without altering a line of its behavior,

… this is what Henjin is currently calculating (R 12):

Using a file he extrapolated13 from the Kirsch-provided dataset, he picks what he pre-calculated (poorly) to be “alive” and “dead” populations for each dose, at each date14.

From a second file provided by the CZ institute15, he picks his reference population - being the 2021 CZ data.

He then employs various hacks & smoothing to make the gum stick and produce the end chart he calls “ASMR” - the daily deaths rates in each age groups, weighted against the people of age, still alive, according to his “random birth dates estimates”, being barbarically weighted against the CZ 2021 population deemed “standard”.

Calculating ASMR from bad data

To calculate (basic) ASMR in order to produce what Henjin attempted, one requires 2 source files:

1. A similar file to Henjin’s one regarding the population available, for each age & dose received. As stated on point 1 of the previous section, I disagree with the method used to generate this file originally, but given it’s just about demonstrating the difference in calculation on a crappy metric, let’s stick with it for now to simply compare the end result.
2. A (real) standardized population, which is a fiction existing nowhere but in statistics16. It doesn’t matter much which commonly acknowledged standard you use, being the WHO one (default, which replaced the US standard17 in 200018) or the EU one19 (common - as long as you precise it - currently on its 2013 revision20 of the Waterhouse et al., 1976 standard). Here, we will use the WHO one, which contains percentages by 5-year age groups, summarized in the following .CSV file 21.

From here, this is what the chart should have looked like, to be titled ASMR without misleading the readers (Perl 22, R 23).

From which data does such dataset originate

At this stage, it is useful to understand how such dataset is created, at state level. Most - if not every modern country - will have a well maintained & robust Civil Registry database - containing their population, kept by the Interior Ministry, or is delegated to the country’s statistical office.

Most - if not every modern country - will have initiated a distinct database from this Civil Registry to store vaccinations data. Some (rare) will have had, when COVID started, an existing infrastructure compliant with 1 to N doses of vaccines administered to each citizen. Most will have improvised their tools, and evolved them as the information evolved (for example, planning originally for 2 doses by citizen, then adjusting later for 4 boosters, then adjusting for up to 10 boosters, etc.)

At each of these decisions and adjustments, opportunities for poor data management, structural errors, and failures in later reconciliations are created.

From these statements which I don’t think anyone understanding data will contest, can be derived two main scenarios :

1. In an optimal scenario, the country would have a perfect Civil Registry data. This data would contain every useful 1 to 1 data concerning the citizen (date of birth, date of death if any, social security number, current height, current weight, current educational level, etc.).
When people would apply to vaccination, they would simply provide their social security number. Everyone would have a social security number, know it, and prior to administer any vaccine, the doctor, pharmacist or nurse would verify that this social security number exists in the system, cross-confirm the identity of the recipient, and would instantly log the dose in the system after it had been administered - someone else being ready to intervene if the patient had an incident following the injection.
2 years later, the data on first doses administered would perfectly fit the state registries on vaccination administered at individual level.
In the end, knowing how many of the citizens are vaccinated would simply be a matter of using the social security number to match civil registry records & doses records, producing perfect record level data.
→ Needless to say this country exists only in the mind of a few statisticians.
2. In a less optimal scenario, reality and human errors interfere. Some people provide erroneous social security numbers, or the country simply requires a first name, last name, and date of birth to allow someone to get vaccinated. The agents in charge are capturing the data which people can provide them prior to capture the doses administered, when they have time to proceed with the later. Errors occur when providing social security numbers, entering them, these numbers aren't checked by the system for existence prior to validation, etc. These problems will be naturally inflated in countries with non-standard Latin alphabets, making name matching from one database to another quite hazardous (“Ničolá” in the civil registrar data can be “Nicola” in the “doses data” and for a computer, these aren’t the same thing at all).
To edit metrics, the state statisticians are adjusting their database in a hurry, and creating duplicate data by adding data to the vaccination database which doesn’t belong here - for example, the date of deaths of subjects. NZ, covered in various previous articles, offered an excellent example of such “data duplication” & conflicts.
→ This will be most of the countries presented as “good data”. Of course there is far worse, such “country which vaccinated in stadium with improvised workers, without taking precisely any valid info”. If you’re having doubts CZ was in this case, read the “I’m having problems” section on this page24.

How such dataset is created

Having in mind that most countries will derive from “Scenario 2” above, let’s pick an imaginary country, of 10 inhabitants - 5 vaccinated, and 5 unvaccinated, of whom 2 died in the recent period - for ease of illustration.

This country will maintain the following 2 databases, represented as follows.

In this simple example, 2 entries out of 5, in the vaccination records, have issues:

Eva Černá is named Eva Cerna in the vaccination database, and has a social security number error
Jana Svobodová hasn’t provided a security number at all.

Upon matching these two databases, the operator will therefore have, based on the social security number, 3 out of 5 of his subjects matched.

He will then, looking for the subject on the basis of the name & date of birth, easily find Jana Svobodová. At this stage, he will be left with only one record, non-matched, who died.

He can resolve this conflict one of three ways:

Passing through an advanced reconciliation process - looking for the most likely unvaccinated match in his civil registry using Levenshtein distance, and other tools to analyse potential DOB errors, social security numbers errors, etc.
→ This might sound simple in a database of 10 people but becomes harder in a database of 100.000 people - not to mention several millions, and practically speaking, almost none will do that unless he has a real interest in getting to the truth of the matter.
Deciding that subjects who aren’t found in the Vaccination Database are considered unvaccinated, while keeping the Civil Registry as “reference”. The results will be a 10 subjects database, with 2 unvaccinated deaths - instead of the 1 vaccinated - 1 unvaccinated, which happened in reality.
Deciding to merge these two databases - not to ignore a vaccinated death. The result will be an 11 subjects output, with 2 unvaccinated deaths, and 1 vaccinated death, resulting in more deaths than occurred in reality.

So, why is ASMR a terrible metric ?

First, ASMR, like the Mongol-thing first described, are both integrating infant deaths by default - which will constitute the bulk of almost fully unvaccinated deaths under 12 - weighting on the groups balance.

The CZ census projection for the covered period - here 2022 & 2023 January 1st populations by ages25 - which we will pick on Eurostats26 - is available. The last physical census occurred in 2011, with CZ moving to an online census in 202127.

Mongol had produced a 2021 “record level data to census comparison”28, which we declined, simplifying it to simply check against EuroStat figures only (R 29).

From higher percentages in the record level data, he concluded that it may include non-residents - but that give or take 1.5% of the population, we were pretty much fitting.

The problem is that performing the same comparison produces drastically different results at cut-off 2022-12-31, when compared to the January 1st 2023 census data a day later - with all age groups being under Census figures - aside for the 75+ (R 30).

It means, obviously, that too many people have been counted as “dead” in the record level dataset for the “active population” - that the record level data is incomplete - or that the census data is terrible - or all of these options together - make your pick.

Lastly, ASMR deprives us of more relevant & granular analytics by age groups, which would make rather obvious that the dataset, as such, is unusable for any analysis of effectiveness.

Miscategorization & data integrity issues - Death Rates in Age groups 10 to 24, in 2021

The Czech Republic keeps on its statistical platform a file containing all the COVID deaths registered within the state during the pandemic, with date of death, age, and sex31. This allows us to represent that during the course of the pandemic, 16 COVID deaths occurred in the 10 to 24 age group (R 32).

The WHO33 provides statistics on the causes of deaths34 by age groups in the Czech Republic35. Allowing us to represent that the leading causes of deaths in this age group, in CZ, are aneurysm, suicides & car accidents (R 36).

Therefore, anyone who isn’t Debunk the Funk37 and has some basic understanding of statistics should understand that we can expect our death rates, in the 10-24 age groups, to be relatively balanced between vaccinated & unvaccinated, unless the claim is out there that COVID vaccines protect you from car accidents & other life hazards. Surprise, this isn’t the case - confirming the dataset is terrible for effectiveness analysis - as anticipated (Perl 38, R 39).

No doubt that Uncle John Returns40 or another dedicated gas-lighter will try “HVE” - if they haven’t already.

In the meantime, the unpleasant reality which these parties never want to discuss too much persists: you don’t cheat when your product works as advertised, and we already established that they cheated on every metric.

As always, and while any persisting error would be my sole fault, thanks to Shez & Arkmedic's blog for their insights & patients corrections.

💬 Join the conversation

Want to like, comment, or share this article?
Head over to our Substack page to engage with the community.

View on Substack

Likes, comments, and shares are synchronized here every 5 minutes.

https://kirschsubstack.com/p/breaking-record-level-data-from-czech

https://github.com/PalackyUniversity/uzis-data-analysis/blob/main/data/Vesely_106_202403141131.tar.xz

https://web.archive.org/web/20240718214004/https://sars2.net/czech.html#ASMR_by_month_and_vaccine_type

https://cdn.who.int/media/docs/default-source/gho-documents/global-health-estimates/gpe_discussion_paper_series_paper31_2001_age_standardization_rates.pdf

https://www.openvaet.info/p/new-zealand-demographic-data-and

https://x.com/canceledmouse/status/1814157174681034990

http://web.archive.org/web/20240725115146/https://sars2.net/czech.html#Plot_for_ASMR_by_dose_and_date

https://x.com/canceledmouse/status/1816143760797098182

https://x.com/canceledmouse/status/1816148296840405256

https://x.com/canceledmouse/status/1816244644981924101

https://x.com/henjin256/status/1816360408212476200 - https://archive.ph/HArWU

https://github.com/OpenVaet/mongol_asmr/blob/main/mongoljunk/mongol_rates_simple.R

http://web.archive.org/web/20240725115146/https://sars2.net/czech.html#Plot_for_ASMR_by_dose_and_date - See the section “Bucket Analysis”

Was present in the article before : “ To calculate the date of birth, he picks a random birth date corresponding to the birth year, for each subject… which is a great way to ensure his code will be impossible to reproduce, and that the population & resulting summaries will change on each iteration of his script.” - but the critic was invalid due to a seed usage - as established by Mongol in the comments.

I had originally wrongly attributed the file origin to Eurostat while, as established by Mongol, it came from CZ’s statistical institute - https://vdb.czso.cz/vdbvo2/faces/en/index.jsf?page=vystup-objekt&z=T&f=TABULKA&skupId=4449&katalog=33517&pvo=SLD21022-VSE&pvo=SLD21022-VSE&str=v335.

https://seer.cancer.gov/stdpopulations/world.who.html

https://seer.cancer.gov/stdpopulations/stdpop.19ages.html

https://seer.cancer.gov/stdpopulations/single_age.html

https://ec.europa.eu/eurostat/databrowser/bookmark/a720a4ea-3831-4276-8da7-974fb22e2f88?lang=en

https://search.r-project.org/CRAN/refmans/PHEindicatormethods/html/esp2013.html

https://github.com/OpenVaet/mongol_asmr/blob/main/data/who_standard_population_age_group_5.csv

https://github.com/OpenVaet/mongol_asmr/blob/main/calculate_asmr_v2.pl

https://github.com/OpenVaet/mongol_asmr/blob/main/mongoljunk/mongol_rates_fixed.R

https://www.expats.cz/czech-news/article/vaccination-in-the-czech-republic-all-you-need-to-know

https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/demo_pjan/A.NR.Y_LT1+Y1+Y2+Y3+Y4+Y5+Y6+Y7+Y8+Y9+Y10+Y11+Y12+Y13+Y14+Y15+Y16+Y17+Y18+Y19+Y20+Y21+Y22+Y23+Y24+Y25+Y26+Y27+Y28+Y29+Y30+Y31+Y32+Y33+Y34+Y35+Y36+Y37+Y38+Y39+Y40+Y41+Y42+Y43+Y44+Y45+Y46+Y47+Y48+Y49+Y50+Y51+Y52+Y53+Y54+Y55+Y56+Y57+Y58+Y59+Y60+Y61+Y62+Y63+Y64+Y65+Y66+Y67+Y68+Y69+Y70+Y71+Y72+Y73+Y74+Y75+Y76+Y77+Y78+Y79+Y80+Y81+Y82+Y83+Y84+Y85+Y86+Y87+Y88+Y89+Y90+Y91+Y92+Y93+Y94+Y95+Y96+Y97+Y98+Y99+Y_OPEN+UNK.T.CZ/?format=TSV&startPeriod=2021&endPeriod=2023

https://ec.europa.eu/eurostat/databrowser/bookmark/8841ab13-2919-44ca-bee9-b2c5a72d9edf?lang=en