Preface


This book represents an extended and thoroughly revised version of a collection of lecture notes on permutation testing of multidimensional hypotheses (Pesarin, 1999b). The key idea, which allows us to deal with a wide range of complex problems in easy-to-check conditions and which also provides generally good solutions, is based on the method of nonparametric combination of dependent tests. This method assumes that a testing problem is properly broken down into a set of simpler sub-problems, each provided with a proper permutation solution, and that these sub-problems can be jointly analysed in order to maintain underlying and possibly unknown dependence relations.
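The idea of combining dependent partial tests can be illustrated with a minimal sketch. It is not the book's own software: the function names `mean_diffs` and `npc_fisher`, the choice of a difference in means as partial statistic, the use of Fisher's combining function, and the toy data are all assumptions made for illustration. The essential point is that whole units are relabelled at each permutation, so the unknown dependence among the variables is carried into the permutation distribution without being modelled.

```python
import math
import random
from bisect import bisect_left


def mean_diffs(rows, labels, k):
    """Partial statistics: mean(group 1) - mean(group 0) for each of k variables."""
    sums = [[0.0] * k, [0.0] * k]
    counts = [0, 0]
    for row, g in zip(rows, labels):
        counts[g] += 1
        for j in range(k):
            sums[g][j] += row[j]
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(k)]


def npc_fisher(rows, labels, n_perm=999, seed=0):
    """Two-sample, one-sided combination of k dependent partial permutation tests.

    Returns (observed partial p-values, combined p-value).
    Illustrative sketch only; large statistics are taken as evidence against H0.
    """
    rng = random.Random(seed)
    k = len(rows[0])
    # Index 0 holds the observed statistics, the rest hold permuted ones.
    stats = [mean_diffs(rows, labels, k)]
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # relabel whole units: dependence among variables is kept
        stats.append(mean_diffs(rows, perm, k))
    total = n_perm + 1
    # Partial p-value of every permutation, estimated from the same set of permutations.
    partial = [[0.0] * k for _ in range(total)]
    for j in range(k):
        col = sorted(s[j] for s in stats)
        for b in range(total):
            n_ge = total - bisect_left(col, stats[b][j])  # values >= stats[b][j]
            partial[b][j] = n_ge / total
    # Fisher's combining function, applied to each permutation's partial p-values.
    combined = [-2.0 * sum(math.log(p) for p in row) for row in partial]
    global_p = sum(c >= combined[0] for c in combined) / total
    return partial[0], global_p


if __name__ == "__main__":
    # Made-up data: 6 control units and 6 treated units, two variables each.
    data = [(1.0, 0.2), (1.3, 0.1), (0.9, 0.4), (1.1, 0.3), (1.2, 0.2), (1.0, 0.5),
            (1.6, 0.6), (1.8, 0.4), (1.5, 0.7), (1.9, 0.9), (1.4, 0.5), (1.7, 0.8)]
    groups = [0] * 6 + [1] * 6
    print(npc_fisher(data, groups))
```

Because the partial p-values are always at least 1/(n_perm + 1), the logarithm in the combining function is always defined. Other combining functions could be substituted in the line that computes `combined`.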

Research connected with the nonparametric combination method started about 25 years ago when a biologist asked for a solution to a rather unusual problem: of 40 seeds of a given plant, a random sample of 20 were sown in normal soil and the other 20 in fertilized soil. The expected effect of the treatment was that the fertilizer would stochastically increase one or more of the following three variables: i) probability of germination, ii) plant weight, and iii) the first two moments of the total surface of the leaves. It should be noted that the first variable is binary and the other two are real valued. Moreover, it is difficult to assume a proper distributional model, such as normality, for the quantitative variables, or a model for the underlying dependence relations.

The problem was naturally broken down into at least three sub-problems: one related to the binary variable on germination, the second to weight, and the third to total leaf surface. All component-wise sub-alternatives were restricted to positive increments, i.e. to three one-sided alternatives, with one categorical and, subject to germination, two real valued variables. If separately considered, the first sub-problem was easy to deal with, as it can be solved, for instance, by Fisher's exact probability test. Regarding the other two, due to the fact that, in the alternative, treatment is assumed to influence the probability of germination or, in other terms, data observability, analysis conditional on germinated seeds was found to be rather difficult. Moreover, the third sub-problem, being related to two concurrent aspects which must be jointly taken into account, was initially solved by only examining the second moment because, to the best of our knowledge, no general solution for concurrent multi-aspect problems was found in the literature. In addition, the overall combination, due to the unknown dependence relations between three partial tests, was extremely difficult, especially because of the lack of theory and of appropriate methods. Thus, the overall solution was found to be too difficult to deal with, and the problem therefore had to be left unsolved until proper methods and related coherent theories, one on the nonparametric combination of dependent tests, one on multi-aspect testing, and one on testing problems in which part of the data may be missing not completely at random, had been developed.

The complete solution to this challenging and complex problem is discussed in Section 9.6 for the first time. However, although the problem was at the very origin of the research which led to the development of nonparametric combination methodology and to several of its applications, I also would like to remark that almost all methods discussed herein originated from formal analysis of real problems for the most part related to biostatistics. Chapters 7 to 12 discuss several of these problems, covering a wide range of practical situations. Only very few methods sprang from the necessity of improving already existing solutions, among them a multivariate extension of McNemar's test, a multivariate extension of Fisher's exact probability test, a solution to the multivariate Behrens-Fisher problem, and a multivariate goodness-of-fit test for ordered categorical variables under order restrictions.

Most of the material in this book was developed as lecture notes for teaching my undergraduate classes in nonparametric statistics at laurea level at the Faculty of Statistical Sciences and for some doctoral courses in Statistics, both at the University of Padova. The same material was also used for one course in nonparametric methods for the Eighth International Summer School on Probability and Mathematical Statistics, organized by the Bulgarian Academy of Sciences and held in Varna on the Black Sea in June 1994, and for a Summer School in Statistical Methodology organized by the Italian Statistical Society (SIS) in September 1995 and 1996. Moreover, several related topics were the subject-matter of a series of seminars and conferences presented to many university departments, international research institutions and international meetings.

A substantial proportion of the material is original, and has been prepared specifically for this book; part is taken from published articles. Some parts, especially relating to simulations on the unconditional power behaviour of many tests, derive from dissertations by a group of my students at the Faculty of Statistical Sciences. The introductory part, concerning basic theory and univariate problems, presented in Chapters 2 to 5, is a brief review of the existing literature, with little by way of personal contribution, in which the main guidelines are based on conditionality, similarity and exchangeability principles.

With respect to the original collection of lecture notes, this new version contains several improvements, and several new problems and related solutions are also discussed. Some of these appear here for the first time, in the sense that they have not been previously published in journals or presented at international meetings. This is partially due to their novelty and partially to the fact that the referees of many journals are relatively more cautious with papers which are substantially innovative and thus the publishing process may become considerably longer than usual.

About the Contents
Chapter 1 contains an introduction to general aspects and principles concerning the permutation approach. The main emphasis is on principles of conditionality, similarity and exchangeability, relationships between conditional and unconditional inferences, why and when conditioning may be necessary, why permutations result from both conditioning with respect to the data set and exchangeability of data in the null hypothesis, etc.
Chapter 2, through a heuristic discussion of a simple example on a problem with paired data, introduces the concept of permutation testing, including discussions on conditional Monte Carlo methods for estimating the distribution of a test statistic and on confidence intervals for the so-called "treatment effect".
Chapter 3 formally presents the theory of permutation tests for one-sample problems, and includes a formal proof of conditional and unconditional unbiasedness of the permutation test for testing symmetry, a formal definition and derivation of conditional and unconditional power functions, and a brief discussion on optimal permutation tests.
Chapter 4 contains a review of the most common multi-sample problems with heuristic solutions.
Chapter 5 is devoted to the formal theory of permutation testing for multi-sample problems, and includes sections devoted to the main asymptotic properties of permutation tests and to permutation Central Limit Theorems.
Chapter 6 concerns nonparametric combination methodology. The presentation includes a discussion on assumptions, properties, sufficient conditions for a complete theory of nonparametric combination of dependent tests, and practical suggestions for making a reasonable selection of the combining function to be used when dealing with practical problems. Also included are discussions on four examples: the first illustrates that the methods do really take care nonparametrically of underlying dependence relations on partial tests; the second concerns a comparison of quadratic parametric and nonparametric testing solutions; the third applies the nonparametric combination to problems of multi-aspect testing; and the fourth deals with exact solutions of multivariate problems under order restrictions. Some asymptotic results are also discussed.
Chapter 7 examines several application problems solved through nonparametric combination methodology. These include problems on: multivariate paired observations; MANOVA with continuous and/or categorical variables; goodness-of-fit for ordered categorical variables; isotonic inference for categorical variables; and multivariate homoscedastic repeated measurements. There are also remarks on restricted alternatives.
Chapter 8 is entirely devoted to permutation analysis of factorial designs, and includes exact solutions for main effects and interactions in balanced and unbalanced replicated factorial designs, fractional designs, and unreplicated designs.
Chapter 9 concerns testing multivariate problems in which some of the data may be missing either completely at random or not, provided that the exchangeability property is satisfied in the null hypothesis.
Chapter 10 is devoted to a discussion of the multivariate permutation Behrens-Fisher problem. Here, together with an approximate solution to the general problem when underlying distributions are assumed to be symmetric, an exact solution to a restricted version of the problem, when the exchangeability property is satisfied in the null hypothesis, is discussed.
Chapter 11 contains a discussion on problems with repeated measurements, including cases in which the number of observations within each unit is larger than the number of units, multivariate cases, missing values, and balanced and unbalanced factorial designs.
Chapter 12 discusses further application problems in the area of biostatistics.
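As a rough illustration of the conditional Monte Carlo idea mentioned for Chapter 2, the following sketch estimates a one-sided permutation p-value for paired data. It assumes that, under the null hypothesis of no treatment effect, the within-pair differences are symmetric around zero, so that their signs can be exchanged at random. The function name `paired_permutation_pvalue`, the choice of the sum of differences as test statistic, and the sample values are hypothetical and are not taken from the book.

```python
import random


def paired_permutation_pvalue(differences, n_iter=9999, seed=0):
    """Conditional Monte Carlo estimate of a one-sided p-value for paired data.

    The statistic is the sum of within-pair differences; each iteration
    flips the sign of every difference independently with probability 1/2.
    """
    rng = random.Random(seed)
    observed = sum(differences)
    n_ge = 1  # the observed data count as one point of the permutation space
    for _ in range(n_iter):
        t = sum(d if rng.random() < 0.5 else -d for d in differences)
        if t >= observed:
            n_ge += 1
    return n_ge / (n_iter + 1)


if __name__ == "__main__":
    # Made-up differences (treated minus control) for eight pairs.
    sample = [0.9, 1.4, -0.2, 0.7, 1.1, 0.3, -0.1, 0.8]
    print(paired_permutation_pvalue(sample))
```

Increasing `n_iter` reduces the Monte Carlo error of the estimate; with very few pairs, all 2^n sign assignments could be enumerated instead.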
Throughout the book, consequences of the main arguments, informal definitions of certain important elements, and important concepts and relevant aspects of analysis are presented as remarks. Such remarks are numbered by subsection and, when it is necessary to make reference to one of them, it is cited by number and subsection; for example, Remark 3, 3.4.2 refers to the third remark of subsection 2 of Section 4 in Chapter 3.
Examples, theorems, lemmas, propositions, figures, tables, etc. are numbered by chapter and are quoted accordingly. As a rule, formulae are not numbered.

Generally, Theorems from the literature are reported without proof, whereas the most important properties of permutation tests, regarding their conditional and unconditional exactness, unbiasedness, consistency, power function, etc., are explicitly established and proved. Simple proofs of more specific properties are sometimes proposed to the reader as exercises. Several exercises and problems are proposed at the end of many Sections or Chapters. A list of references may be found at the end of the volume.

The computer-intensive calculations required by nonparametric combination methods are possible with the aid of the proper software. Some of this software is available (free of charge) on the internet at http://www.stat.unipd.it/~pesarin and http://www.wiley.com. At both these sites you will find a demonstration copy of NPC Test 2.0 (Non Parametrics Combinations of Dependent Permutation Tests) with which, although subject to some limitations, it is possible to perform some of the methods discussed in the book, for example, two-sample analysis with continuous and binary variables, repeated measures, use of several combination functions and stratified analysis. Also available are an SAS macro called NPC, an S-PLUS code and the data sets of all examples discussed in the book.
The full version of NPC Test 2.0 is available from Methodologica (the company producing NPC Test). This version of the software can also be used for: multi-aspect testing; analyses with missing values, including non-ignorable situations; multi-sample problems with continuous, categorical, ordered categorical, and mixed variables; several combining functions; repeated measures, etc. (see Appendix 12.8). In order to obtain estimates of permutation distributions at any desired accuracy, a special add-on procedure which enables practically unlimited conditional Monte Carlo iterations will be available in future releases. For further information on NPC Test 2.0 and all available updates, send an email to info@methodologica.it or visit www.methodologica.it.

We also mention that Cytel Software Corporation (Cambridge, Massachusetts) is planning to integrate NPC Test into future versions of StatXact, the software which provides exact permutational inference for nonparametric and categorical data. For available information on StatXact plus NPC Test, visit www.cytel.com.

Background and Readership
In writing this book, we have assumed that readers are well acquainted with the very basic concepts of statistical theory, especially with regard to: elementary probability theory; estimation methods based on sampling from finite populations; testing of hypotheses; elementary nonparametric methods; the concept and use of sufficiency; the elementary theory, meaning and interpretation of conditional inference; Monte Carlo simulation techniques, etc.

In order to justify the methods and to give reasonable interpretations of results, most of the problems and related methods are introduced by means of examples and heuristic arguments, especially when intuition is sufficient to avoid ambiguities. However, although a number of tests are explained and rationally justified also in terms of conditioning with respect to a set of sufficient statistics in the null hypothesis, only a limited number of them are formally constructed by direct derivation from the conditionality principle. Most complex problems, which are generally introduced by means of heuristic reasoning, are also analysed using formal arguments, especially if the proposed solutions are at least partially innovative. This is in order to provide a rational background for the most important aspects of analyses, methods and solutions, and to improve precision, so that possible misunderstandings can hopefully be reduced to the minimum. However, we do use formal arguments and mathematics essentially without exceeding undergraduate level.

Most theoretical arguments are discussed in Chapters 3, 5, 6 and partly also in Chapters 8, 9, and 10.

For better comprehension of the main applications, at least Sections 6.1 to 6.6 should be read before Chapters 7 to 12, which are mostly devoted to discussions of application problems.
Readers with some background in permutation testing may start reading from Chapter 6, although Chapters 2 to 5, together with some important concepts which are generally omitted from most textbooks, contain a preparatory exposition of the main arguments and concepts as they are used later. However, at a first reading, Chapters 3 and 5, which mostly concern theoretical aspects for univariate problems with some mathematics, may be omitted without compromising understanding of the main ideas, concepts, methods and application problems. In any case, as it contains the basic principles and the inferential role of permutation analysis, all readers should begin by reading Chapter 1.
The book is intended for at least three kinds of readers: a) students of intermediate and advanced courses in applied statistics, who are presumed to be mostly interested in applications; b) students of advanced courses in statistics, who are interested in nonparametric theory and methods; c) professional statisticians, researchers and practitioners of many areas of application (not restricted to biostatistics), who are facing complex testing problems which may be broken down into a finite set of simpler sub-problems.
Although the book does not cover all the mathematical aspects related to conditional testing and nonparametric combination, to some extent several parts may also be useful for researchers in mathematical statistics and statistical methodology, especially related to nonparametrics.

Relationship with Other Books
There is a relatively limited number of books which are fully or partially devoted to permutation tests. Best known are those of Edgington (1995), Good (2000), Lunneborg (1999), Manly (1997), Maritz (1995), and Sprent (1998). Moreover, most of the books on nonparametric methods based on ranks contain one chapter or a few sections discussing the permutation approach. General theory is summarized in Lehmann (1986). In general, the quoted literature focuses much more on univariate problems than on multivariate ones. Thus, as the present book is mostly devoted to multivariate permutation methods, it may be seen as complementary to existing literature. In this respect, the methods and related applications discussed from Chapter 6 onwards are to a great extent new and original. However, as the first five Chapters are devoted to univariate problems, this volume may also be seen as self-contained.
From a different point of view, it may be seen as complementary to books devoted to multiple comparisons and multiple testing methods, such as those of Miller (1981), Hochberg and Tamhane (1987), Westfall and Young (1993), and Hsu (1996). As a matter of fact, the method of nonparametric combination of several dependent tests implies that a complex problem is broken down into a finite set of simpler sub-problems, each admitting a suitable permutation solution, followed by their combination. When partial inferences are also of interest, partial tests should be adjusted for multiplicity, in accordance with multiple testing procedures.
In addition, Sections 4.4, 7.4 and 7.5, dedicated to permutation analysis of ordered categorical variables, discuss problems and related solutions for multivariate stochastic dominance which are complementary to the existing literature on these subjects (see, for instance, Cressie and Read, 1988, and Agresti, 1990).

Acknowledgments
In presenting this new version, 18 years after his death, I would first like to acknowledge my master, colleague and friend Professor Odoardo Cucconi, whose intellectual stimuli introduced me to the fascinating world of nonparametrics and to several of the open testing problems which gave rise to part of the research related to this book. Moreover, I would like to remember his teaching, especially his frequent recommendations not to hesitate, when dealing with a difficult problem, to abandon already tested methods of analysis, if they are found to be unproductive or unsuitable, and to proceed if necessary by revising the very basic ideas and principles, and eventually to try other approaches, even partially or fully innovative. This aspect of his teaching, although not easy to put into practice, was followed in dealing with many of the problems discussed herein.
I would also like to express my thanks to the many colleagues who read parts of the written material or attended my seminars and gave me constructive criticism, useful suggestions, new methodological problems to solve, application problems and related data sets to analyse. They include: Michel Broniatowski, Giorgio Celant, Daniele De Martini, Augusto Di Castelnuovo, Antony Dusoir, Hammou El Barmi, Giovanni Fava, Phillip Good, Tim Hesterberg, Chihiro Hirotsu, Oliviero Lessi, Tom Loughin, Cliff Lunneborg, Hossein Mansouri, Fortunato Munaò, Patrick Onghena, Andrea Pallini, Italo Pegoraro, Magnus Peterson, Vanamamalay Seshadri, Jordan Stoyanov, Takakazu Sugiyama, Peter Westfall, and others.

I must not forget to thank my students: Rosa Arboretti, Federica Barzi, Piera Belluardo, Rosita Bertacche, Elisa Bosco, Manuela Cazzaro, Giovanni Cossini, Francesco Dalla Valle, Fabio Di Nuzzo, Livio Finos, Patrizia Furlan, Lorenzo Gaeta, Michele Gaffo, Anna Giraldo, Francesca Grum, Alessandro Lago, Dario Mazzaro, Francesca Parpinel, Caterina Pasqualetto, Mauro Zucchetto, and many others, for helping to deal with several application problems, making power simulations and providing computing routines. Moreover, I wish to thank Patrizia Piacentini for helping in the typing adjustment of parts of the book, and Gabriel Walton and Malcolm Peebles for revising the English text.

I especially wish to express my thanks to Luigi Salmaso for his continuous collaboration and help in: suggesting new aspects to be analysed, finding new problems, providing proofs of some Theorems, writing two important Chapters (the eighth and twelfth), revising most of the book, finding references, providing most computing routines and power simulations, consulting for the software package NPC Test, and designing the SAS macro "NPC" and the S-Plus code (visit www.stat.unipd.it/~salmaso).
I wish to thank Robert Calver, Sharon Clutton, Sarah Corney and Sian Jones of John Wiley & Sons in Chichester for their valuable publishing suggestions.

In addition, I would like to acknowledge the University of Padova and the Italian Ministry for University and Scientific and Technological Research (MURST) for providing the financial support for the necessary research and developing part of the software.

The responsibility for any mistakes and for the ideas expressed herein is mine alone.

Fortunato Pesarin
Padova, March 2001




Department of Statistical Sciences, University of Padua pesarin@stat.unipd.it
www.stat.unipd.it/~pesarin
