A variety of analyses have been undertaken to establish the quality of the data and characterize the output of individual devices and the network as a whole. The first stage is a careful search for any data that are problematic because of equipment failure or other mishap. Such data are removed. With all bad data removed, each individual REG or RNG can be characterized to provide empirical estimates for statistical parameters. These are used to convert the database into a normalized, completely reliable data resource to facilitate rigorous analysis. The intent is to lay the basis for an assessment of the multi-year database with sophisticated statistical and mathematical techniques. We then can use a range of statistical tools to look for small, but reliable changes from expected random distributions that may be correlated with natural or human-generated variables.
A major effort was made to identify the “formal” events that could be accepted according to rigorous criteria. This resulted in a set of 170 usable events over the first 6 years of the project. A total of 13 events that were originally in the formal series were excluded because they were partially redundant or overlapped others, or were not unambiguously defined in the original narrative predictions.
Real Devices vs Theory
Ideally, the trials recorded from the REGs distribute like binomial [200, 0.5] (mean 100, variance 50). But although they are all high-quality random sources, these real-life devices do not achieve perfect theoretical performance. A logical XOR of the raw bit-stream with a fixed pattern of bits having exactly 0.5 probability of ones compensates for mean biases in the REGs.
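The XOR step can be sketched as follows. This is a minimal illustration, not the actual GCP hardware logic: the alternating mask used here is an assumption standing in for whatever fixed, balanced template a real device applies.

```python
import numpy as np

def xor_mask(bits, mask=None):
    """XOR a raw bitstream against a fixed, balanced mask (exactly half
    ones), which cancels any first-order mean bias in the source.
    The alternating 0,1 mask is illustrative only."""
    bits = np.asarray(bits, dtype=np.uint8)
    if mask is None:
        mask = (np.arange(bits.size) % 2).astype(np.uint8)  # 0,1,0,1,...
    return bits ^ mask

# A deliberately biased source (p(1) = 0.6) still yields 200-bit trial
# sums with mean near 100 after XOR, because the mask flips half the bits.
rng = np.random.default_rng(0)
raw = (rng.random(200_000) < 0.6).astype(np.uint8)
clean = xor_mask(raw)
trials = clean.reshape(-1, 200).sum(axis=1)
print(trials.mean())   # close to 100 despite the source bias
```

Note that XOR with a balanced mask removes first-order (mean) bias but not variance bias, which is why the normalization step described next is still needed.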
Normalized and Standardized Data
After XOR’ing, the mean is guaranteed over the long run to fit theoretical expectation. The trial variances remain biased, however. The biases are small (about 1 part in 10,000) and generally stable on long timescales. We treat them as real, albeit tiny, biases that must be corrected by normalization for rigorous analysis. They are corrected by converting the trial sums for each individual egg to standard normal variables (z-scores), based on the empirical standard deviations.
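A minimal sketch of this standardization, assuming per-egg trial sums and using the theoretical mean (100) with the egg's measured standard deviation in place of the theoretical sqrt(50):

```python
import numpy as np

def standardize(trialsums, emp_sd):
    """Convert 200-bit trial sums for one egg to z-scores using the
    theoretical mean (100) and that egg's empirical standard deviation.
    Using the measured SD corrects the ~1e-4 variance bias."""
    return (np.asarray(trialsums, float) - 100.0) / emp_sd

# Simulated trial sums stand in for one egg's data.
rng = np.random.default_rng(1)
sums = rng.binomial(200, 0.5, size=5000)
emp_sd = sums.std(ddof=1)            # empirical SD for this device
z = standardize(sums, emp_sd)
print(z.mean(), z.var(ddof=1))       # mean near 0; variance exactly 1
```

By construction the standardized variance is exactly 1 for the calibration sample, so downstream statistics can treat every egg as a unit-variance source.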
The normalized and standardized data resource allows us to do a rigorous re-analysis of the event-based experiment. This was the primary analysis approach for the first few years of the project, and it generated sufficient evidence of anomalous correlations to justify deeper analysis and more general correlation strategies. In this approach, “global events” are identified, and a prediction that specifies a time period and an analysis recipe is registered. The analytical results are combined into a cumulative, or aggregate, assessment of the hypothesis of correlated departures from expectation.
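One standard way to combine independent per-event results into an aggregate, used here as an illustrative assumption about the combining recipe, is the Stouffer method:

```python
import numpy as np

def stouffer_z(zscores):
    """Combine independent event z-scores into a single aggregate:
    Z = sum(z_i) / sqrt(N). Under the null hypothesis the result is
    again standard normal, so it tests the cumulative hypothesis."""
    z = np.asarray(zscores, float)
    return z.sum() / np.sqrt(z.size)

# Illustrative only: three hypothetical event z-scores.
print(stouffer_z([1.2, -0.3, 0.8]))   # 1.7 / sqrt(3), about 0.98
```

The attraction of this form is that each registered event contributes one z-score, and the running aggregate can be updated as new events accumulate.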
The background of careful preparation for rigorous analysis can be envisioned as a conversion of the GCP database to a “data resource” which can be examined with power and flexibility. As we proceed, new materials will be added to this page. The following excursions are examples of what can now be done with some facility. Some provide deeper understanding of previous work, others give new perspectives and insights. We have developed a number of questions that are capable of informing us deeply about the nature and quality of the evidence. As we proceed, we expect to have many cases that, in Peter’s term, “will require a lot of mulling,” but can learn much from the ability to visualize the data in different ways.
Analysis of Periodic Variation
Fourier analysis gives us a general answer to the question whether there is any indication of periodic structure in the data. We wish to know, for example, if there is any diurnal variation suggesting differences corresponding to time of day, or if there are any longer term effects associated with the day or the week, etc.
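A sketch of such a periodicity check, assuming a vector of evenly spaced network scores (here per-minute, with an artificial diurnal signal injected so the peak is visible):

```python
import numpy as np

rng = np.random.default_rng(2)
minutes_per_day, days = 1440, 64
n = minutes_per_day * days
scores = rng.standard_normal(n)          # null data: no structure
# Inject a small diurnal sinusoid purely for demonstration.
scores += 0.1 * np.sin(2 * np.pi * np.arange(n) / minutes_per_day)

# Power spectrum of the mean-removed series.
power = np.abs(np.fft.rfft(scores - scores.mean())) ** 2
freqs = np.fft.rfftfreq(n, d=1.0)        # cycles per minute
cycles_per_day = freqs * minutes_per_day
peak = cycles_per_day[np.argmax(power[1:]) + 1]   # skip the DC bin
print(peak)                              # ~1.0 cycle/day for the injected signal
```

Applied to the real data, the question is whether any spectral peak (diurnal, weekly, or otherwise) rises convincingly above the noise floor.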
Sliding the Event Time
Here we look at the aggregate event z-score when the event examination periods are shifted uniformly in time. The question is what happens to the evidence for anomalous deviation associated with an event as a result of sliding the event periods over the database in half-hour steps and recalculating the aggregate Z at each step.
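The sliding procedure can be sketched as below. The helper name and the Stouffer-style aggregate are assumptions for illustration, not the project's registered recipe:

```python
import numpy as np

def aggregate_z_at_offset(z_seconds, event_windows, offset):
    """Aggregate Z over all event windows shifted by `offset` seconds.
    z_seconds: per-second network z-scores; event_windows: (start, end)
    index pairs. Hypothetical helper for the sliding-window test."""
    parts = [z_seconds[s + offset : e + offset] for s, e in event_windows]
    z = np.concatenate(parts)
    return z.sum() / np.sqrt(z.size)

rng = np.random.default_rng(3)
z_seconds = rng.standard_normal(86_400)          # one day of null data
events = [(10_000, 13_600), (40_000, 41_800)]    # hypothetical windows
step = 1_800                                     # half-hour steps
curve = [aggregate_z_at_offset(z_seconds, events, k * step)
         for k in range(-4, 5)]
print(np.round(curve, 2))
```

If the event-related deviation is real, the curve should peak near zero offset and fall away as the windows slide off the events.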
Assessing the Effect of Blocking
The earliest analysis method was a hand calculation using 15-minute blocking of the data. In this method the composite Z for each egg is computed for each time block in the event. The sum of the resulting Z² values is a chi-square with degrees of freedom equal to the number of blocks times the number of eggs. The early procedure was replaced for most events by a “standard analysis” using the raw data with no blocking. But an obvious question was what effect the various blocking levels might have on the outcome. One form of the question is, “What is the optimum blocking level?” Here we begin to look at this question in a rigorous and comprehensive way.
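A minimal single-egg sketch of the blocked statistic (the per-egg loop and the exact composite recipe are assumptions for illustration):

```python
import numpy as np

def blocked_chisquare(z, block_len):
    """Within each block, combine per-second z-scores into a composite
    Z = sum(z) / sqrt(block_len), then sum the squared composites.
    Under the null the result is chi-square with one degree of freedom
    per block (times eggs, when computed per egg and summed)."""
    nblocks = z.size // block_len
    blocks = z[: nblocks * block_len].reshape(nblocks, block_len)
    comp = blocks.sum(axis=1) / np.sqrt(block_len)
    return (comp ** 2).sum(), nblocks    # chi-square value, degrees of freedom

rng = np.random.default_rng(4)
z = rng.standard_normal(3600)            # one hour of per-second z-scores
chi2, df = blocked_chisquare(z, 900)     # 15-minute (900-second) blocks
print(chi2, df)                          # chi-square on 4 df, expectation 4
```

Varying `block_len` in this scheme is exactly the "blocking level" question: each choice trades temporal resolution against the stability of the composite Z per block.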
New Year Celebrations
One of the most interesting recurring events that we have examined is the New Year transition. We have made a prediction each year since 1998/1999 that the period around midnight on New Year’s Eve will show structure in the network variance — the squared Stouffer Z across eggs. Beginning in 1999/2000 we also examined the device variance — the sum of squared z-scores per egg each second. These analyses accommodate the moving locus of the New Year celebrations by doing a signal average across time zones. The data resource allows a much more facile exploration of the question whether the New Year Variance Analysis shows structure.
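The time-zone signal average amounts to epoch-averaging the variance measure around each local midnight. A sketch, with hypothetical midnight indices and window length standing in for the real epoch definition:

```python
import numpy as np

def signal_average(z_year, midnights, half_window=300):
    """Epoch-average the squared network score around each local midnight.
    midnights: sample indices of midnight in each time zone; averaging
    aligns the moving New Year locus into one composite epoch."""
    w = half_window
    epochs = np.stack([z_year[m - w : m + w] ** 2 for m in midnights])
    return epochs.mean(axis=0)           # mean network variance vs. offset

rng = np.random.default_rng(5)
z_year = rng.standard_normal(200_000)            # null stand-in data
midnights = np.arange(10_000, 190_000, 7_200)    # hypothetical zone midnights
avg = signal_average(z_year, midnights)
print(avg.shape)                         # one 600-sample composite epoch
```

Under the null hypothesis the composite epoch is flat at a mean of 1; the prediction is structure (a deviation in variance) concentrated near the midnight alignment point.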
Long Trends and Correlations
The rigorously normalized and standardized data resource can be used for a wide variety of completely general analyses that are not constrained to the event-based protocols. For example, we ask whether there is any significant large scale structure with questions addressing long trends and correlations in the full, six-year database.
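One simple instrument for such questions is the cumulative deviation of the squared z-scores from their expectation, a sketch of which follows (the choice of statistic here is illustrative, not the project's registered measure):

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.standard_normal(500_000)         # stand-in for multi-year data
# Each z**2 has expectation 1; the cumulative sum of (z**2 - 1) should
# meander near zero if there is no large-scale structure.
cumdev = np.cumsum(z ** 2 - 1.0)
# Terminal deviation expressed in standard-error units (Var(z**2) = 2).
print(round(cumdev[-1] / np.sqrt(2 * z.size), 2))
```

A persistent slope or a terminal value of many standard errors in such a trace would indicate exactly the kind of long trend the full six-year database can now be interrogated for.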