Preface
This book represents an extended and thoroughly revised version of a collection of lecture notes on permutation testing of multidimensional hypotheses (Pesarin, 1999b). The key idea, which allows us to deal with a wide range of complex problems under easy-to-check conditions and which also provides generally good solutions, is based on the method of nonparametric combination of dependent tests. This method assumes that a testing problem is properly broken down into a set of simpler sub-problems, each provided with a proper permutation solution, and that these sub-problems can be jointly analysed in order to maintain underlying and possibly unknown dependence relations.
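As a rough illustration of this idea, the following Python sketch combines two dependent one-sided partial tests in a two-sample problem. It is a minimal sketch under stated assumptions, not the procedure as developed in the book: the partial statistics (differences of group means), the use of Fisher's combining function, the function names and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random
from bisect import bisect_left


def partial_stats(rows, labels):
    """One-sided statistic per variable: mean(group 1) - mean(group 0)."""
    stats = []
    for k in range(len(rows[0])):
        g1 = [r[k] for r, g in zip(rows, labels) if g == 1]
        g0 = [r[k] for r, g in zip(rows, labels) if g == 0]
        stats.append(sum(g1) / len(g1) - sum(g0) / len(g0))
    return stats


def upper_tail_p(values):
    """p-value of each entry relative to the whole list (large = significant)."""
    ordered = sorted(values)
    n = len(values)
    return [(n - bisect_left(ordered, v)) / n for v in values]


def npc_fisher(rows, labels, n_perm=999, seed=0):
    """Partial and combined p-values via Fisher's combining function."""
    rng = random.Random(seed)
    labels = list(labels)
    # Entry 0 of `table` is the observed data; the rest are random relabellings.
    # Whole units are relabelled, so all variables move together and the
    # unknown dependence among partial tests is carried into the
    # permutation distribution without being modelled.
    table = [partial_stats(rows, labels)]
    for _ in range(n_perm):
        rng.shuffle(labels)
        table.append(partial_stats(rows, labels))
    n_vars = len(rows[0])
    # Partial p-values for the observed data and for every relabelling.
    p_cols = [upper_tail_p([t[k] for t in table]) for k in range(n_vars)]
    # Fisher's combining function applied row by row.
    combined = [
        -2.0 * sum(math.log(p_cols[k][b]) for k in range(n_vars))
        for b in range(len(table))
    ]
    partial_p = [p_cols[k][0] for k in range(n_vars)]
    global_p = sum(c >= combined[0] for c in combined) / len(combined)
    return partial_p, global_p


# Hypothetical data: two correlated variables, group 1 shifted upwards.
data_rng = random.Random(1)
rows, labels = [], []
for g in (0, 1):
    for _ in range(10):
        z = data_rng.gauss(0, 1)  # shared component inducing dependence
        rows.append((z + data_rng.gauss(0, 0.5) + 1.5 * g,
                     z + data_rng.gauss(0, 0.5) + 1.5 * g))
        labels.append(g)

partial_p, global_p = npc_fisher(rows, labels)
print(partial_p, global_p)
```

The essential design choice is that the same relabellings are used for every partial test, so the combined statistic is referred to a permutation distribution that already reflects the dependence among the partial tests.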
Research connected with the nonparametric combination method started about 25 years ago when a biologist asked for a solution to a rather unusual problem: of 40 seeds of a given plant, a random sample of 20 were sown in a normal soil and the other 20 in a fertilized soil. The expected effect of the treatment was that the fertilizer would stochastically increase one or more of the following three variables: i) probability of germination, ii) plant weight, and iii) the first two moments of the total surface of the leaves. It should be noted that the first variable is binary and the other two are real-valued. Moreover, it is difficult to assume a proper distributional model, such as normality, for the quantitative variables and a model for underlying dependence relations.
The problem was naturally broken down into at least three sub-problems:
one related to the binary variable on germination, the second to weight,
and the third to total leaf surface. All component-wise sub-alternatives were
restricted to positive increments, i.e. to three one-sided alternatives, with
one categorical and, subject to germination, two real-valued variables.
If separately considered, the first sub-problem was easy to deal with, as it
can be solved, for instance, by Fisher's exact probability test.
Regarding the other two, due to the fact that, in the alternative, treatment
is assumed to influence the probability of germination or, in other terms,
data observability, analysis conditional on germinated seeds was found to be
rather difficult. Moreover, the third sub-problem, being related to two concurrent
aspects which must be jointly taken into account, was initially solved by only
examining the second moment because, to the best of our knowledge, no general
solution for concurrent multi-aspect problems was found in the literature.
In addition, the overall combination, due to the unknown dependence relations
between three partial tests, was extremely difficult, especially because of the
lack of theory and of appropriate methods. Thus, the overall solution was found to
be too difficult to deal with, and the problem therefore had to be left unsolved
until proper methods and related coherent theories, one on the nonparametric combination
of dependent tests, one on multi-aspect testing, and one on testing problems in which
part of the data may be missing not completely at random, had been developed.
The complete solution to this challenging and complex problem is discussed
in Section 9.6 for the first time. However, although the problem was at the very origin
of the research which led to the development of nonparametric combination methodology and
to several of its applications, I would also like to remark that almost all methods
discussed herein originated from formal analysis of real problems for the most part related
to biostatistics. Chapters 7 to 12 discuss several of these problems, covering a wide range
of practical situations. Only very few methods sprang from the necessity of improving
already existing solutions, among them a multivariate
extension of McNemar's test, a multivariate extension of Fisher's exact probability test,
a solution to the multivariate Behrens-Fisher problem, and a multivariate goodness-of-fit
test for ordered categorical variables under order restrictions.
Most of the material in this book was developed as lecture notes for my undergraduate
(laurea-level) classes in nonparametric statistics at the Faculty of Statistical Sciences
and for some doctoral courses in Statistics, both at the University of Padova.
The same material was also used for one course in nonparametric methods for the Eighth
International Summer School on Probability and Mathematical Statistics, organised by the
Bulgarian Academy of Sciences and held in Varna on the Black Sea in June 1994,
and for a Summer School in Statistical Methodology organized by the Italian Statistical
Society (SIS) in September 1995 and 1996. Moreover, several related topics were the
subject-matter of a series of seminars and conferences presented to many university
departments, international research institutions and international meetings.
A substantial proportion of the material is original, and has been prepared specifically for
this book; part is taken from published articles. Some parts, especially those relating to
simulations on the unconditional power behaviour of many tests, derive from dissertations
by a group of my students at the Faculty of Statistical Sciences. The introductory part,
concerning basic theory and univariate problems, presented in Chapters 2 to 5, is a brief
review of the existing literature, with little by way of personal contribution, in which the main
guidelines are based on conditionality, similarity and exchangeability principles.
With respect to the original collection of lecture notes, this new version contains
several improvements, and several new problems and related solutions are also discussed.
Some of these appear here for the first time, in the sense that they have not been
previously published in journals or presented at international meetings. This is partially
due to their novelty and partially to the fact that the referees of many journals are
relatively more cautious with papers which are substantially innovative and thus the
publishing process may become considerably longer than usual.
About the Contents
Chapter 1 contains an introduction to general aspects and principles concerning the
permutation approach. The main emphasis is on principles of conditionality, similarity and
exchangeability, relationships between conditional and unconditional inferences, why and
when conditioning may be necessary, why permutations result from both conditioning with
respect to the data set and exchangeability of data in the null hypothesis, etc.
Chapter 2,
through a heuristic discussion of a simple example on a problem with paired data, introduces
the concept of permutation testing, including discussions on conditional Monte Carlo methods
for estimating the distribution of a test statistic and on confidence intervals for the
so-called "treatment effect".
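To give a flavour of a conditional Monte Carlo estimate in the paired-data setting, here is a minimal Python sketch. It assumes that, in the null hypothesis, the within-pair differences are symmetric about zero, so that their signs are exchangeable. The test statistic (the sum of differences), the function name and the data are hypothetical and are not taken from the book's example.

```python
import random


def paired_sign_flip_test(differences, n_iter=9999, seed=0):
    """Estimate a one-sided permutation p-value for paired data.

    Each iteration randomly flips the sign of every within-pair
    difference and recomputes the statistic (the sum of differences).
    """
    rng = random.Random(seed)
    observed = sum(differences)
    hits = 0
    for _ in range(n_iter):
        t = sum(d if rng.random() < 0.5 else -d for d in differences)
        if t >= observed:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return (hits + 1) / (n_iter + 1)


# Hypothetical within-pair differences (treated minus control).
diffs = [1.2, 0.8, -0.3, 2.1, 0.5, 1.7, -0.1, 0.9]
p_value = paired_sign_flip_test(diffs)
print(p_value)
```

With only eight pairs the 2^8 = 256 sign assignments could be enumerated exactly; random sampling is shown because complete enumeration quickly becomes impractical as the number of pairs grows.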
Chapter 3 formally presents the theory of permutation tests for one-sample problems,
and includes a formal proof of conditional and unconditional unbiasedness of the permutation
test for symmetry, a formal definition and derivation of conditional and unconditional
power functions, and a brief discussion on optimal permutation tests.
Chapter 4 contains a review of the most common multi-sample problems with heuristic solutions.
Chapter 5 is devoted to the formal theory of permutation testing for multi-sample problems,
and includes sections devoted to the main asymptotic properties of permutation tests and to
permutation Central Limit Theorems.
Chapter 6 concerns nonparametric combination methodology.
The presentation includes a discussion on assumptions, properties, sufficient conditions
for a complete theory of nonparametric combination of dependent tests, and practical
suggestions for making a reasonable selection of the combining function to be used
when dealing with practical problems. Also included are discussions on four examples:
the first illustrates that the methods really do take care, nonparametrically, of underlying
dependence relations among partial tests; the second concerns a comparison of quadratic
parametric and nonparametric testing solutions; the third applies the nonparametric
combination to problems of multi-aspect testing; and the fourth deals with exact solutions
of multivariate problems under order restrictions. Some asymptotic results are also discussed.
Chapter 7 examines several application problems solved through nonparametric combination
methodology. These include problems on: multivariate paired observations;
MANOVA with continuous and/or categorical variables; goodness-of-fit for ordered categorical
variables; isotonic inference for categorical variables; and multivariate homoscedastic
repeated measurements. There are also remarks on restricted alternatives.
Chapter 8 is entirely devoted to permutation analysis of factorial designs, and includes
exact solutions for main effects and interactions in balanced and unbalanced replicated
factorial designs, fractional designs, and unreplicated designs.
Chapter 9 concerns testing multivariate problems in which some of the data may be missing
either completely at random or not, provided that the exchangeability property is satisfied
in the null hypothesis.
Chapter 10 is devoted to a discussion of the multivariate permutation Behrens-Fisher problem.
Here, together with an approximate solution to the general problem when underlying distributions
are assumed to be symmetric, an exact solution to a restricted version of the problem,
when the exchangeability property is satisfied in the null hypothesis, is discussed.
Chapter 11 contains a discussion on problems with repeated measurements, including cases in
which the number of observations within each unit is larger than the number of units,
multivariate cases, missing values, and balanced and unbalanced factorial designs.
Chapter 12 discusses further application problems in the area of biostatistics.
Throughout the book, consequences of the main arguments, informal definitions of certain
important elements, and important concepts and relevant aspects of analysis are presented
as remarks. Such remarks are numbered by subsection and, when it is necessary to refer to one of them,
it is cited by number and subsection; for example, Remark 3, 3.4.2 refers to the
third remark of subsection 2 of Section 4 in Chapter 3.
Examples, theorems, lemmas, propositions, figures, tables, etc. are numbered by chapter and
are quoted accordingly. As a rule, formulae are not numbered.
Generally, theorems from the literature are reported without proof, whereas the most
important properties of permutation tests, regarding their conditional and unconditional
exactness, unbiasedness, consistency, power function, etc., are explicitly established and proved.
Simple proofs of more specific properties are sometimes proposed to the reader as exercises.
Several exercises and problems are proposed at the end of many Sections or Chapters.
A list of references may be found at the end of the volume.
The computer-intensive calculations required by nonparametric combination methods are possible with the aid of suitable software.
Some of this software is available (free of charge) on the internet at http://www.stat.unipd.it/~pesarin and
http://www.wiley.com. At both these sites you will find a demonstration copy of NPC Test 2.0
(Non Parametrics Combinations of Dependent Permutation Tests) with which, although subject to some
limitations, it is possible to perform some of the methods discussed in the book, for example,
two-sample analysis with continuous and binary variables, repeated measures, use of several
combining functions and stratified analysis. Also available are a SAS macro called NPC,
an S-PLUS code and the data sets of all examples discussed in the book.
The full version of NPC Test 2.0 is available from Methodologica (the company producing NPC Test).
This version of the software can also be used for: multi-aspect testing; analyses with missing
values, including non ignorable situations; multi-sample problems with continuous, categorical,
ordered categorical, and mixed variables; several combining functions;
repeated measures, etc. (See Appendix 12.8).
In order to obtain estimates of permutation distributions at any desired accuracy,
a special add-on procedure which enables practically unlimited conditional Monte Carlo iterations will be
available in future releases. For further information on NPC Test 2.0 and all available
updates send an email to info@methodologica.it or visit www.methodologica.it.
We also mention that Cytel Software Corporation (Cambridge, Massachusetts) is planning to
integrate NPC Test into future versions of StatXact, the software which provides exact permutational inference for nonparametric and categorical data.
For available information on StatXact plus NPC Test visit www.cytel.com.
Background and Readership
In writing this book, we have assumed that readers are well acquainted with the very basic
concepts of statistical theory, especially with regard to: elementary probability theory;
estimation methods based on sampling from finite populations; testing of hypotheses;
elementary nonparametric methods; the concept and use of sufficiency; the elementary theory,
meaning and interpretation of conditional inference; Monte Carlo simulation techniques, etc.
In order to justify the methods and to give reasonable interpretations of results, most of
the problems and related methods are introduced by means of examples and heuristic arguments,
especially when intuition is sufficient to avoid ambiguities. However, although a number of
tests are explained and rationally justified also in terms of conditioning with respect to a
set of sufficient statistics in the null hypothesis, only a limited number of them are formally
constructed by direct derivation from the conditionality principle. Most complex problems,
which are generally introduced by means of heuristic reasoning, are also analysed using formal
arguments, especially if the proposed solutions are at least partially innovative.
This is in order to provide a rational background for the most important aspects of analyses,
methods and solutions, and to improve precision, so that possible misunderstandings can
hopefully be reduced to the minimum. However, we do use formal arguments and mathematics
essentially without exceeding undergraduate level.
Most theoretical arguments are discussed in Chapters 3, 5, 6 and partly also in Chapters 8, 9, and 10.
For better comprehension of the main applications, at least Sections 6.1 to 6.6 should be
read before Chapters 7 to 12, which are mostly devoted to discussions of application problems.
Readers with some background in permutation testing may start reading from Chapter 6, although
Chapters 2 to 5, together with some important concepts which are generally omitted from most
textbooks, contain a preparatory exposition of the main arguments and concepts as they are used
later. However, at a first reading, Chapters 3 and 5, which mostly concern theoretical aspects
for univariate problems with some mathematics, may be omitted without compromising understanding
of the main ideas, concepts, methods and application problems. In any case, as it contains the
basic principles and the inferential role of permutation analysis, all readers should begin by reading Chapter 1.
The book is intended for at least three kinds of readers: a) students of intermediate and advanced
courses in applied statistics, who are presumed to be mostly interested in applications;
b) students of advanced courses in statistics, who are interested in nonparametric theory and
methods; c) professional statisticians, researchers and practitioners of many areas of application
(not restricted to biostatistics), who are facing complex testing problems which may be broken
down into a finite set of simpler sub-problems.
Although the book does not cover all the mathematical aspects related to conditional testing
and nonparametric combination, to some extent several parts may also be useful for researchers
in mathematical statistics and statistical methodology, especially related to nonparametrics.
Relationship with Other Books
There are a relatively limited number of books which are fully or partially devoted to
permutation tests. Best known are those of Edgington (1995), Good (2000), Lunneborg (1999),
Manly (1997), Maritz (1995), and Sprent (1998). Moreover, most of the books on nonparametric
methods based on ranks contain one chapter or a few sections discussing the permutation approach.
General theory is summarized in Lehmann (1986).
In general, the quoted literature focuses much more on univariate problems than on multivariate
ones. Thus, as the present book is mostly devoted to multivariate permutation methods, it may be
seen as complementary to existing literature. In this respect, the methods and related applications
discussed from Chapter 6 onwards are to a great extent new and original. However, as the first
five Chapters are devoted to univariate problems, this volume may also be seen as self-contained.
From a different point of view, it may be seen as complementary to books devoted to multiple
comparisons and multiple testing methods, such as those of Miller (1981), Hochberg and
Tamhane (1987), Westfall and Young (1993), and Hsu (1996). As a matter of fact, the method of
nonparametric combination of several dependent tests implies that a complex problem is broken
down into a finite set of simpler sub-problems, each admitting a suitable permutation solution,
followed by their combination. When partial inferences are also of interest, partial tests should
be adjusted for multiplicity, in accordance with multiple testing procedures.
In addition, Sections 4.4, 7.4 and 7.5, dedicated to permutation analysis of ordered categorical
variables, discuss problems and related solutions for multivariate stochastic dominance which
are complementary to the existing literature on these subjects (see, for instance, Cressie and Read, 1988, and Agresti, 1990).
Acknowledgments
In presenting this new version, 18 years after his death, I would first like to acknowledge
my master, colleague and friend Professor Odoardo Cucconi, whose intellectual stimuli introduced
me to the fascinating world of nonparametrics and to several of the open testing problems which
gave rise to part of the research related to this book. Moreover, I would like to remember his
teaching, especially his frequent recommendations not to hesitate, when dealing with a difficult
problem, to abandon already tested methods of analysis, if they are found to be unproductive or
unsuitable, and to proceed if necessary by revising the very basic ideas and principles, and
eventually to try other approaches, even partially or fully innovative. This aspect of his
teaching, although not easy to put into practice, was followed in dealing with many of the problems
discussed herein.
I would also like to express my thanks to the many colleagues who read parts of the written
material or attended my seminars and gave me constructive criticism, useful suggestions, new
methodological problems to solve, application problems and related data sets to analyse. They
include: Michel Broniatowski, Giorgio Celant, Daniele De Martini, Augusto Di Castelnuovo, Antony
Dusoir, Hammou El Barmi, Giovanni Fava, Phillip Good, Tim Hesterberg, Chihiro Hirotsu, Oliviero Lessi,
Tom Loughin, Cliff Lunneborg, Hossein Mansouri, Fortunato Munaò, Patrick Onghena, Andrea Pallini,
Italo Pegoraro, Magnus Peterson, Vanamamalay Seshadri, Jordan Stoyanov, Takakazu Sugiyama,
Peter Westfall, and others.
I must not forget to thank my students: Rosa Arboretti, Federica Barzi, Piera Belluardo,
Rosita Bertacche, Elisa Bosco, Manuela Cazzaro, Giovanni Cossini, Francesco Dalla Valle,
Fabio Di Nuzzo, Livio Finos, Patrizia Furlan, Lorenzo Gaeta, Michele Gaffo, Anna Giraldo,
Francesca Grum, Alessandro Lago, Dario Mazzaro, Francesca Parpinel, Caterina Pasqualetto,
Mauro Zucchetto, and many others, for helping to deal with several application problems,
making power simulations and providing computing routines. Moreover, I wish to thank
Patrizia Piacentini for helping in the typing adjustment of parts of the book, and Gabriel
Walton and Malcolm Peebles for revising the English text.
I especially wish to express my thanks to Luigi Salmaso for his continuous collaboration and
help in: suggesting new aspects to be analysed, finding new problems, providing proof of some
theorems, writing two important Chapters (the eighth and twelfth), revising most of the book,
finding references, providing most computing routines and power simulations, consulting for the
software package NPC Test, and designing the SAS macro "NPC" and the S-PLUS code. (Visit www.stat.unipd.it/~salmaso).
I wish to thank Robert Calver, Sharon Clutton, Sarah Corney and Sian Jones of John Wiley & Sons
in Chichester for their valuable publishing suggestions.
In addition, I would like to acknowledge the University of Padova and the Italian Ministry
for University and Scientific and Technological Research (MURST) for providing the financial
support for the necessary research and developing part of the software.
The responsibility for any mistakes and for the ideas expressed herein is mine alone.
Fortunato Pesarin
Padova, March 2001