Shielding of Electrosmog: Some Observations (part 8)
Today I consider effect size as well as statistical significance in the assessment of orgonite as a method of protecting against cell phone RF EMR.
Well then, part 7 proved to be a trifle exciting, even if rigorous statistical testing using Kruskal-Wallis clobbered the notion that we were seeing a bona fide (albeit modest) reduction in the mean RF EMR once orgonite buttons were attached to my Motorola G23 Android. The problem with statistical testing of real-world ‘noisy’ data is that if we don’t have a big enough sample size then no mean difference is going to be declared statistically significant. On the other hand, if we collect thousands of observations then pretty much any mean difference is going to be declared statistically significant, even if it carries no real-world meaning. This opens up a can of worms that we ought to cogitate on without delay; though I am going to suggest that normal people skip this entire article if they wish to remain sane...
Muchos Ouchos
Relying solely on p-values for statistical testing of mean differences is inadequate for several interconnected reasons. A p-value only indicates the compatibility of the observed data with the null hypothesis, not the probability that the null hypothesis is true, which is a common misinterpretation. The p-value is not a measure of effect size or the practical importance of a result, which means a statistically significant difference (p<0.05) can be trivial in real-world terms, whilst a non-significant result (p>0.05) can actually represent a scientifically meaningful effect, especially if the study lacks sufficient power. Ouch!
The arbitrary p=0.05 threshold for statistical significance is a convention without inherent scientific justification and leads to simplistic thinking patterns that discard valuable information. This practice is particularly problematic with very large sample sizes where even minuscule differences can easily yield very small p-values, making the result statistically significant but not meaningful. Double ouch!
Furthermore, the widespread practice of conducting multiple statistical tests on the same dataset and reporting only those with p<=0.05 (known as ‘p-hacking’ or ‘data dredging’) leads to a high rate of false positives and undermines the reproducibility of research. The American Statistical Association (ASA) has explicitly stated that scientific conclusions should not be based solely on whether a p-value crosses an arbitrary threshold, and that proper inference requires complete transparency about all analyses conducted. A more robust approach involves reporting the actual p-value attained alongside confidence intervals, whilst considering data quality and the practical significance of the findings. Triple ouch!
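To see the large-sample trap in action, here is a minimal sketch in Python (simulated readings and round numbers of my own choosing, nothing to do with the kitchen data) showing a trivial mean difference sailing past the p<0.05 bar purely on the strength of sample size:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two simulated sets of readings whose true means differ by a trivial
# 0.5 mW/m^2 against a spread (standard deviation) of 20 mW/m^2.
n = 200_000
group_a = rng.normal(loc=100.0, scale=20.0, size=n)
group_b = rng.normal(loc=100.5, scale=20.0, size=n)

# Welch's t-test: with n this large the p-value is essentially zero,
# so the difference is declared 'statistically significant'...
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# ...yet the difference itself is about a fortieth of the wobble in the
# readings, i.e. of no practical consequence whatsoever.
mean_diff = group_b.mean() - group_a.mean()

print(f"p-value        : {p_value:.2e}")
print(f"mean difference: {mean_diff:.3f} mW/m^2 (spread ~20 mW/m^2)")
```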
If all this nerd speak floats your boat then this paper might be of interest, along with this paper and this paper. Meanwhile, we’ve somehow got to get back to Earth in a meaningful manner when dealing with RF EMR signal means as measured in my kitchen. H’mmm…
Effect Size
Something else we can do to save us from a sea of chaos is to calculate what is known as the effect size.
Effect size is a quantitative measure of the magnitude of a relationship between variables; that is to say it attempts to quantify the difference between sub-groups in a population. It provides numerical information about the ‘strength’ or practical significance of a finding, indicating how meaningful the observed effect is in real-world terms. Unlike statistical significance (p-values), which only tells us whether a genuine effect likely exists, effect size tells us how large or important the effect is. A statistically significant result (p<0.05) with a very small effect size might not be practically meaningful.
Effect size can be expressed in different ways depending on the context. Standardised effect sizes are particularly useful because they allow for comparison across studies with different units of measurement or sample sizes. For instance, Cohen's d expresses the difference between two means in terms of standard deviation units, with values of 0.2, 0.5, and 0.8 generally considered ‘small’, ‘medium’, and ‘large’ effects, respectively. Effect size is crucial for understanding the practical importance of research findings, as a statistically significant result does not guarantee a large or meaningful effect.
So, then, Cohen’s d is our saviour from a sea of chaos, eh? We may ask what exactly the heck this is when it’s at home. Well, it’s a pretty darn basic concept, being the difference in the means between two sub-groups divided by their pooled standard deviation. If we think of the mean difference as ‘muscle’ and the standard deviation as ‘wobble’, then Cohen’s d is a way of expressing muscle per wobble. It isn’t confined to the range 0 to 1, mind: a magnitude greater than 1 simply means the mean difference exceeds the pooled wobble, and the sign tells us which way the difference runs.
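For the record, the usual textbook recipe, written with group means, standard deviations and sample sizes (x̄, s and n, with subscripts 1 and 2 for the two groups), is:

$$
d \;=\; \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p \;=\; \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
$$

The pooled standard deviation s_p is simply a sample-size-weighted blend of the two groups’ variances, square-rooted back into the original mW/m² units.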
Yes indeedy, I am going to compute Cohen’s d right now using my trusty handheld vintage calculator so we may see what those orgonite buttons were capable of in terms of effect size.
Back To The Results
When it comes to estimating effect size we’ve got three key groups to consider:
Transmitting phone with no orgonite buttons;
Transmitting phone with 1 orgonite button attached;
Transmitting phone with 2 orgonite buttons attached.
So far so good, but what about that group 1 wacko outlier of 3760.9 mW/m² observed at 13:56:13? Well, we can simply delete it or, better still, replace it with the mean of the values observed at 13:56:12 and 13:56:14, this being 76.624 mW/m² - quite a difference from the 3760.9 reading!
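For anyone doing this repair in code rather than by hand, a minimal sketch along these lines might do (the function name and the idea of the readings sitting in a plain Python list are my own assumptions, not part of the original logging setup):

```python
def smooth_outlier(readings, spike_index):
    """Return a copy of readings with the spike replaced by the
    average of the readings immediately before and after it."""
    repaired = list(readings)
    neighbour_mean = (readings[spike_index - 1] + readings[spike_index + 1]) / 2
    repaired[spike_index] = neighbour_mean
    return repaired

# Made-up per-second values purely for illustration:
demo = [70.0, 80.0, 3760.9, 72.0, 75.0]
print(smooth_outlier(demo, 2))  # the spike becomes (80.0 + 72.0) / 2 = 76.0
```

Averaging the two neighbouring seconds keeps the record length intact, which is kinder to the group sizes than simply deleting the rogue reading.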
What we need now is a table of group means, pooled standard deviations and the resulting Cohen’s d estimates. Here’s one I baked earlier:
Just in case folk haven’t cottoned on to what I’m doing, I start out with the three individual group means and standard deviations. I then derive the mean differences between group 1 and group 2 (-2.31 mW/m²), group 1 and group 3 (-4.86 mW/m²), and group 2 and group 3 (-2.55 mW/m²). The corresponding pooled standard deviations for each group pair appear to the right of the mean differences, and over on the far right are the three resulting estimates of Cohen’s d.
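For anyone who would rather let Python do the arm work than a vintage calculator, here is a minimal sketch of the same table-building exercise, together with the pairwise Kruskal-Wallis comparison quoted below. The group arrays are dummy stand-ins rather than the logged kitchen readings, and the sign convention follows the table (a negative value means a drop relative to the earlier phase):

```python
import numpy as np
from scipy import stats

def cohens_d(first, second):
    """Cohen's d with the table's sign convention: (mean of second group -
    mean of first group) / pooled standard deviation, so a negative value
    represents a drop relative to the first group."""
    first, second = np.asarray(first, float), np.asarray(second, float)
    n1, n2 = len(first), len(second)
    pooled_sd = np.sqrt(((n1 - 1) * first.var(ddof=1) +
                         (n2 - 1) * second.var(ddof=1)) / (n1 + n2 - 2))
    return (second.mean() - first.mean()) / pooled_sd

# Dummy stand-ins for the real per-second readings (mW/m^2); substitute the
# logged values for groups 1 (no button), 2 (1 button) and 3 (2 buttons).
rng = np.random.default_rng(7)
no_button   = rng.normal(80.0, 15.0, 300)
one_button  = rng.normal(78.0, 15.0, 300)
two_buttons = rng.normal(75.0, 15.0, 300)

for label, first, second in [("group 1 vs group 2", no_button, one_button),
                             ("group 1 vs group 3", no_button, two_buttons),
                             ("group 2 vs group 3", one_button, two_buttons)]:
    d = cohens_d(first, second)
    h_stat, p = stats.kruskal(first, second)   # pairwise Kruskal-Wallis
    print(f"{label}: mean diff = {second.mean() - first.mean():6.2f} mW/m^2, "
          f"d = {d:5.2f}, Kruskal-Wallis p = {p:.3f}")
```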
Interesting they are too, with the gold medal going to an effect size estimated at -0.35 for the drop in RF EMR signal between the no-button and 2-button phases. In real-world terms a coefficient of this magnitude is considered to be small-to-middling. In p-value terms I can report a statistically significant Kruskal-Wallis outcome of p=0.040, which is tantalising. Herewith the raw stats output from Kruskal-Wallis for consideration:
Beige man strikes again! This time he’s telling us that there is indeed a statistically significant difference between the no-button and 2-button phases (just!) but the effect size is very modest. Sounds to me like I need to throw a larger lump of orgonite at my phone and extend the observation period. Until then…
Kettle On!
So pleased to read this data; I've wondered about effectiveness. I'm also guessing the proportions of crystal and metals in the resin mix may have an impact too, but you'd need to source these from different suppliers who vouch for the type and proportions of materials used?