The paper uses empirical process techniques to study the asymptotics of the least-squares estimator (LSE) for fitting a nonlinear regression function. By combining and extending ideas of Wu and Van de Geer, it establishes new consistency and central limit theorems that hold under only second moment assumptions on the errors. An application to a delicate example of Wu's illustrates the use of the new theorems, leading to a normal approximation to the LSE with unusual logarithmic rescalings.

In the definition of a two-person zero-sum game given by Von Neumann and Morgenstern it is assumed that both players know the rules of the game (e.g., the game tree, the information sets, and the distributions of the ensuing payoffs for given strategy choices). We use the term pseudo-game to denote the case where at least one player does not have complete information. In this paper we restrict our attention to those pseudo-games in which player I, say, is aware only of his own set of pure strategy choices (assumed to contain m elements) and not of player II's strategy choices; the payoff distributions are assumed to have uniformly bounded second moments. Player II is assumed to have complete information. More precisely, we shall study pseudo-games G that have the format given below: Let A = {a_1, ⋯, a_m} denote the pure strategy choices of player I. Denote by A* the set of probability distributions p over A (player I's mixed strategy choices). We sometimes write p in the form (p(1), ⋯, p(m)), ∑_j p(j) = 1 and p(j) ≥ 0, with the interpretation that when player I uses p he will play a_j with probability p(j). Any element of A* that assigns mass 1 to some a ε A will be simply denoted by a. Let B denote the set (not necessarily finite) of pure strategies for player II. Let ℬ be a fixed σ-field of subsets of B and denote by B* the set of all probability distributions q over B (player II's mixed strategies). We assume that ℬ contains all single point sets of B, so that B* contains all finite probability distributions over B. We postulate that we are given for each pair (a, b) in the product space A × B a distribution P_{(a,b)} on the real line which represents the distribution of the loss incurred by player I (or gain by player II) if a ε A is the strategy choice of I and b ε B is the strategy choice of II. Contrary to the usual practice, the payoff for given pure strategy choices is thus allowed to be random. We do this in order that our main results may be proved in greater generality.
An example of a pseudo-game with random payoffs is given in Section 2. The distributions P_{(a,b)} are assumed to have uniformly bounded second moments. For each a ε A and each fixed Borel set C, P_{(a,b)}(C) is assumed to be ℬ-measurable as a function of b. For each pair (a, b) ε A × B, let X_{(a,b)} be a random variable having P_{(a,b)} as its distribution. Suppose that players I and II are using strategies p and q, respectively. They can determine the payoff of the pseudo-game by first selecting an a ε A and a b ε B according to the distributions p and q, respectively, and then treating an observed value of X_{(a,b)} as the payoff. For every pair of strategies (p, q) that players I and II may use we define the expected value of the payoff R(p, q) by means of the equation: \begin{equation*}\tag{1.1} R(p, q) = \sum^m_{j=1} p(j) \int \lbrack\int x dP_{(a_j,b)}(x)\rbrack dq(b).\end{equation*} We are assuming that player I is only aware of the set A, while player II has complete information. However, by assuming instead that player I is also aware of the set B as well as the distributions P_{(a,b)}, (a, b) ε A × B, we can associate with every such pseudo-game G a game with complete information G'. Such concepts as "value" and "minimax strategy" do not carry over to pseudo-games. However, by the minimax theorem, since A is assumed finite, every such game G' will have a value v_G and player I will have a minimax strategy p': \begin{equation*}\tag{1.2} v_G = \sup_{q\varepsilon B^\ast} R(p', q) = \inf_{p\varepsilon A^\ast} \sup_{q\varepsilon B^\ast} R(p, q) = \sup_{q\varepsilon B^\ast} \inf_{p\varepsilon A^\ast} R(p, q).\end{equation*} Suppose now that players I and II are playing a sequence of identical pseudo-games of the type we have been describing; i.e., they play one game, observe their losses, and play the same game again (with possibly different strategy choices), continuing in this manner ad infinitum. We shall refer to the individual games that make up the sequence as the subgames of the sequence.
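When player II's strategy set B is finite, the double integral in (1.1) collapses to a finite sum. The following sketch (ours, not from the paper) evaluates R(p, q) in that case; the matrix `mu` stands for the inner integrals ∫ x dP_{(a_j, b)}(x), and the particular numbers are invented for illustration.

```python
# Sketch: evaluating (1.1) for finite B, where q is a probability vector
# and mu[j][b] plays the role of the mean loss under P_{(a_j, b)}.

def expected_payoff(p, q, mu):
    """R(p, q) = sum_j p(j) * sum_b q(b) * mu[j][b]."""
    return sum(p[j] * sum(q[b] * mu[j][b] for b in range(len(q)))
               for j in range(len(p)))

# Player I has m = 2 pure strategies, player II has 2; mu holds the mean
# losses for I under each pure-strategy pair (numbers made up).
mu = [[1.0, 0.0],
      [0.0, 1.0]]
p = [0.5, 0.5]
q = [0.25, 0.75]
print(expected_payoff(p, q, mu))  # 0.5
```

Because only the means of the payoff distributions enter (1.1), the randomness of the payoffs affects the analysis only through the second-moment bounds, not through R itself.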
When playing such a sequence of pseudo-games, a strategy for player I is a rule P that tells him, for every j, as a function of his past plays (mixed strategy choices) and losses, what mixed strategy to play during the jth subgame; a strategy for player II is a rule Q that tells him, for every j, as a function of his own and his opponent's past plays and losses, what mixed strategy q to play during the jth subgame of the sequence. We are thus allowing player II to know what plays player I has made, but we are not granting I the same favor. Among the rules P available to player I we define a special class of rules, to be called rules constant on intervals. If x is any real number let [x] denote the largest integer that is less than or equal to x. For every α > 0, let Π(α) = (I_1(α), I_2(α), ⋯, I_n(α), ⋯) denote the partition of the set I of positive integers defined by the equations: \begin{equation*}\tag{1.3} I_n(\alpha) = \{(\sum^{n-1}_{k=1} \lbrack k^\alpha\rbrack) + 1, (\sum^{n-1}_{k=1}\lbrack k^\alpha\rbrack) + 2, \cdots, \sum^n_{k=1} \lbrack k^\alpha\rbrack\}; n = 1, 2, 3,\cdots.\end{equation*} For example, I_1(2) = {1}, I_2(2) = {2, 3, 4, 5}, I_3(2) = {6, 7, ⋯, 14}; etc. We shall refer to I_n(α) as the nth interval of the partition Π(α). Note that the cardinality of I_n(α) is [n^α]. Let us suppose that player I is using some rule P that assigns, with probability 1, the same mixed strategy to the ith subgame as it does to the jth subgame whenever i and j belong to the same interval I_n(α), n = 1, 2, 3, ⋯. In this case we say that P is constant on intervals. Thus if we say that player I is to play a certain strategy p during the nth interval of a partition Π(α), we mean that he is to play p during every subgame whose index belongs to I_n(α). The particular strategy that player I uses in the nth interval (a random variable depending on plays and losses occurring prior to the nth interval) will be denoted by p_n.
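The intervals of (1.3) are easy to compute directly. The sketch below (function name ours, not the paper's) reproduces the worked example for α = 2 and confirms that the nth interval has cardinality [n^α].

```python
# Sketch: the intervals I_n(alpha) of the partition Pi(alpha) in (1.3).
import math

def interval(n, alpha):
    """I_n(alpha): the n-th block of the partition of {1, 2, 3, ...}."""
    start = sum(math.floor(k ** alpha) for k in range(1, n)) + 1
    end = sum(math.floor(k ** alpha) for k in range(1, n + 1))
    return list(range(start, end + 1))

print(interval(1, 2))  # [1]
print(interval(2, 2))  # [2, 3, 4, 5]
print(interval(3, 2))  # [6, 7, 8, 9, 10, 11, 12, 13, 14]
```

Since the blocks lengthen like n^α, a rule constant on intervals changes its mixed strategy only finitely often up to time N, roughly N^{1/(α+1)} times.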
For j = 1, 2, 3, ⋯, N, ⋯ let X_j represent the loss incurred by player I during the jth subgame. Note that the sequence {X_j} is a discrete stochastic process whose index set is the set I of positive integers and whose law of evolution is determined by the distributions P_{(a,b)} and by the rules P and Q that the players use. The first objective of this paper is to prove: THEOREM. Suppose players I and II are playing a sequence of identical pseudo-games G satisfying (i) and (ii): (i) Player I has m ≥ 2 pure strategy choices. (ii) The distributions P_{(a,b)} have uniformly bounded second moments and for each a ε A and every Borel set C, P_{(a,b)}(C) is a ℬ-measurable function of b. Then there exists a class of rules {P} for player I such that for all rules Q that player II may use we have $\lim\sup_{N\rightarrow\infty} N^{-1} \sum^N_{j=1} X_j \leqq v_G$ with probability 1. We will show, that is, that the player with incomplete information can do as well asymptotically as he could if he had complete information. The members of {P} will all be constant on intervals. Our second objective will be to seek a strong convergence rate for $N^{-1} \sum^N_{j=1} X_j$. In the course of achieving this goal we will show that a good partition is obtained by setting α equal to (m + 2)/m.

In an important paper, Strassen discussed the existence of measures on product spaces with given marginals in the context of Polish spaces. We generalize his results to arbitrary Hausdorff spaces and indicate some applications such as measures with given support and stochastic inequalities on partially ordered Hausdorff spaces. In the second part of our paper, we state two results on the general moment problem, thus generalizing earlier theorems due to Kemperman.

Let $S$ be the number of successes in $n$ independent Bernoulli trials, where $p_j$ is the probability of success on the $j$th trial. Let $\mathbf{p} = (p_1, p_2, \cdots, p_n)$, and for any integer $c, 0 \leqq c \leqq n$, let $H(c \mid \mathbf{p}) = P\{S \leqq c\}$. Let $\mathbf{p}^{(1)}$ be one possible choice of $\mathbf{p}$ for which $E(S) = \lambda$. For any $n \times n$ doubly stochastic matrix $\Pi$, let $\mathbf{p}^{(2)} = \mathbf{p}^{(1)}\Pi$. Then in the present paper it is shown that $H(c \mid \mathbf{p}^{(1)}) \leqq H(c \mid \mathbf{p}^{(2)})$ for $0 \leqq c \leqq \lbrack\lambda - 2\rbrack$, and $H(c \mid \mathbf{p}^{(1)}) \geqq H(c \mid \mathbf{p}^{(2)})$ for $\lbrack\lambda + 2\rbrack \leqq c \leqq n$. These results provide a refinement of inequalities for $H(c \mid \mathbf{p})$ obtained by Hoeffding [3]. Their derivation is achieved by applying consequences of the partial ordering of majorization.
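The two inequalities can be checked numerically on a small example (ours, not from the paper): `cdf` computes $H(c \mid \mathbf{p})$ by the standard dynamic program for the number of successes, and $\Pi$ is taken to be the doubly stochastic matrix with all entries $1/n$, so $\mathbf{p}^{(2)}$ replaces every $p_j$ by their mean while leaving $\lambda = E(S)$ unchanged.

```python
# Numerical check of the two tail inequalities on an invented example.

def cdf(c, p):
    """H(c | p): P{number of successes <= c} via dynamic programming."""
    dist = [1.0]                          # dist[s] = P{S = s} so far
    for pj in p:
        nxt = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            nxt[s] += mass * (1 - pj)     # trial fails
            nxt[s + 1] += mass * pj       # trial succeeds
        dist = nxt
    return sum(dist[: c + 1])

p1 = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]       # lambda = E(S) = 3
p2 = [sum(p1) / len(p1)] * len(p1)        # p1 smoothed by Pi = (1/n)J

for c in (0, 1):                          # c <= [lambda - 2] = 1
    assert cdf(c, p1) <= cdf(c, p2)
for c in (5, 6):                          # c >= [lambda + 2] = 5
    assert cdf(c, p1) >= cdf(c, p2)
print("both inequalities hold on this example")
```

The uniform-averaging $\Pi$ is the extreme case: repeated doubly stochastic mixing drives $\mathbf{p}$ toward the constant vector, which is minimal in the majorization order.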

The present investigation extends [7] to a class of multiple regression problems and is devoted to the construction of an estimate of the regression parameter vector based on suitable rank statistics. Asymptotic linearity of these rank statistics in the multiple regression setup is established, and the asymptotic multinormality of the derived estimates is deduced. For every basic distribution there exists a choice of the score-generating function such that the asymptotic distribution of the estimates is the same as that of the maximum-likelihood estimates.

Consider a GI/G/1 queue in which W_n is the waiting time of the nth customer, W(t) is the virtual waiting time at time t, and Q(t) is the number of customers in the system at time t. We let the extreme values of these processes be W_n^* = max_{0≤k≤n} W_k, W^*(t) = sup_{0≤s≤t} W(s), and Q^*(t) = sup_{0≤s≤t} Q(s). The asymptotic behavior of the queue is determined by the traffic intensity ρ, the ratio of arrival rate to service rate. When ρ < 1 and the service time has an exponential tail, limit theorems are obtained for W_n^* and W^*(t); they grow like log n or log t. When ρ ≥ 1, limit theorems are obtained for W_n^*, W^*(t), and Q^*(t); they grow like n^{1/2} or t^{1/2} if ρ = 1 and like n or t when ρ > 1. For the case ρ < 1, it is necessary to obtain the tail behavior of the maximum of a random walk with negative drift before it first enters the set (-∞, 0].
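The waiting-time sequence of a GI/G/1 queue obeys the standard Lindley recursion W_{n+1} = max(0, W_n + S_n - T_{n+1}), where S_n is the nth service time and T_{n+1} the following interarrival time. The sketch below (ours, not from the paper) simulates the M/M/1 special case and tracks the running maximum W_n^*; function and parameter names are our own.

```python
# Sketch: waiting times of an M/M/1 queue via the Lindley recursion,
# tracking the running maximum W_n^* = max_{k <= n} W_k.
import random

def waiting_time_max(n, arrival_rate, service_rate, seed=0):
    rng = random.Random(seed)
    w, w_max = 0.0, 0.0                   # W_0 = 0
    for _ in range(n):
        s = rng.expovariate(service_rate)  # service time S_n
        t = rng.expovariate(arrival_rate)  # next interarrival time T_{n+1}
        w = max(0.0, w + s - t)            # Lindley recursion
        w_max = max(w_max, w)
    return w_max

# rho = 0.8 < 1: per the abstract's regime, W_n^* grows only like log n,
# so the running maximum stays modest even after many customers.
print(waiting_time_max(10_000, arrival_rate=0.8, service_rate=1.0))
```

For ρ < 1 the summands S_n - T_{n+1} have negative mean, which is exactly why the tail of the maximum of a negative-drift random walk enters the analysis.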

This paper deals with the problem of estimating the number of trials of a multinomial distribution, from an incomplete observation of the cell totals, under constraints on the cell probabilities. More specifically, let (n_1, ⋯, n_k) be distributed according to the multinomial law M(N; p_1, ⋯, p_k), where N is the number of trials and the p_i's are the cell probabilities, ∑_{i=1}^k p_i being equal to 1. Suppose that only a proper subset of (n_1, ⋯, n_k) is observable, that N, p_1, ⋯, p_k are unknown, and that N is to be estimated. Without loss of generality, (n_1, ⋯, n_l), l ≤ k, may be taken to be the observable random vector. For fixed N, (n_1, ⋯, n_l, N - n) has the multinomial distribution M(N; p_1, ⋯, p_l, q), where n denotes ∑_{i=1}^l n_i and q denotes 1 - ∑_{i=1}^l p_i. If the parameter space is such that N can take any nonnegative integral value and each p_i can take any value between 0 and 1, then, clearly, the only inference one can make about N is that N ≥ n. In specific situations, it might, however, be possible to postulate constraints of the type \begin{equation*}\tag{1.1} p_i = f_i(\theta),\quad i = 1, \cdots, l\end{equation*} where θ = (θ_1, ⋯, θ_r) is a vector of r independent parameters and the f_i are known functions. This may lead to estimability of N. The problem of estimating N in such a situation is studied here. The present investigation is motivated by the following problem. Experiments in particle physics often involve visual scanning of film containing photographs of particles (occurring, for instance, inside a bubble chamber). The scanning is done with a view to counting the number N of particles of a predetermined type (these particles will be referred to as events). But owing to poor visibility caused by such characteristics as low momentum, the distribution and configuration of nearby track patterns, etc., some events are likely to be missed during the scanning process. The question, then, is: How does one get an estimate of N? The usual procedure of estimating N is as follows.
Film containing the N (unknown) events is scanned separately by w scanners (ordered in some specific way) using the same instructions. For each event E let a w-vector Z(E) be defined such that the jth component Z_j of Z(E) is 1 if E is detected by the jth scanner and is 0 otherwise. Let J be the set of 2^w w-vectors of 1's and 0's and let I_0 be the vector of 0's. For each I ∈ J, let x_I be the number of events E with Z(E) = I. For I ∈ J - {I_0}, the x_I's are observed. A probability model is assumed for the results of the scanning process. That is, it is assumed that there is a probability p_I that Z(E) assumes the value I and that these p_I's are constrained by equations of the type (1.1). (These constraints vary according to the assumptions made about the scanners and events, thus giving rise to different models. An example of p_I(θ) would be E(ν^{∑_j I_j}(1 - ν)^{w - ∑_j I_j}), where I_j is the jth component of I and the expectation is taken with respect to a two-parameter beta density for ν. This is the result of assuming that all scanners are equally efficient in detecting events, that the probability ν that an event is seen by any scanner is a random variable, and that the results of the different scans are locally independent. For a discussion of various models, see Sanathanan (1969), Chapter III.) N is then estimated using the observed x_I's and the constraints on the p_I's, provided certain conditions (e.g., the minimum number of scans required) are met. The following formulation of the problem of estimating N, however, leads to a more systematic study, including a development of the relevant asymptotic distribution theory for the estimators. The Z(E)'s may be regarded as realizations of N independent identically distributed random variables whose common distribution is discrete with probabilities p_I at I. (In particle counting problems, it is usually true that the particles of interest are sparsely distributed throughout the film on account of their Poisson distribution with low intensity.
Thus in spite of the factors affecting their visibility outlined earlier, the events can be assumed to be independent.) The joint distribution of the x_I's is, then, multinomial M(N; p_I, I ∈ J). The problem of estimating N is now in the form stated at the beginning of this section. Since the estimate depends on the constraints provided for the p_I's, it is important to test the "fit" of the model selected. The conditional distribution of the x_I's (I ≠ I_0) given x is multinomial M(x; p_I/p, I ≠ I_0), where x is defined as ∑_{I≠I_0} x_I and p as ∑_{I≠I_0} p_I. The corresponding χ² goodness of fit test may therefore be used to test the adequacy of a model in question. Various estimators of N are considered in this paper and among them is, of course, the maximum likelihood estimator of N. Asymptotic theory for maximum likelihood estimation of the parameters of a multinomial distribution has been developed before for the case where N is known but not for the case where N is unknown. Asymptotic theory related to the latter case is developed in Section 4. The result on the asymptotic joint distribution of the relevant maximum likelihood estimators is stated in Theorem 2. A second method of estimation considered is that of maximizing the likelihood based on the conditional probability of observing (n_1, ⋯, n_l), given n. This method is called the conditional maximum likelihood (C.M.L.) method. The C.M.L. estimator of N is shown (Theorem 2) to be asymptotically equivalent to the maximum likelihood estimator. Section 5 contains an extension of these results to the situation involving several multinomial distributions. This situation arises in the particle scanning context when the detected events are classified into groups based on some factor, like momentum, which is related to the visibility of an event, and a separate scanning record is available for each group.
A third method of estimation considered is that of equating certain linear combinations of the cell totals (presumably chosen on the basis of some criterion) to their respective expected values. Asymptotic theory for this method is given in Section 6. This discussion is motivated by a particular case, applicable to some models in the particle scanning problem, in which the criterion is based on the method of moments for the unobservable random variable given by the number of scanners detecting an event. (Discussion of the particular case can be found in Sanathanan (1969), Chapter III.) In the next section we give some definitions and a preliminary lemma.
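A toy instance (ours, not from the paper) of constraints of type (1.1): take w = 2 independent scanners, each detecting any event with the same fixed probability ν, so p_11 = ν², p_10 = p_01 = ν(1 - ν), p_00 = (1 - ν)² and θ = (ν). The sketch maximizes the multinomial likelihood over integer N, with the maximizing ν for fixed N profiled out in closed form; all names and counts below are invented for illustration.

```python
# Toy maximum likelihood estimation of N under the two-scanner,
# equal-efficiency model: p_11 = nu^2, p_10 = p_01 = nu(1-nu).
from math import lgamma, log

def log_lik(N, x11, x10, x01):
    """Profile log-likelihood of N (additive constants dropped)."""
    n = x11 + x10 + x01                # events seen at least once
    s = 2 * x11 + x10 + x01            # total detections over both scans
    nu = s / (2 * N)                   # maximizing nu for this fixed N
    return (lgamma(N + 1) - lgamma(N - n + 1)
            + s * log(nu) + (2 * N - s) * log(1 - nu))

def estimate_N(x11, x10, x01, n_max=10_000):
    n = x11 + x10 + x01
    return max(range(n, n_max), key=lambda N: log_lik(N, x11, x10, x01))

# x11 events seen by both scanners, x10 / x01 by exactly one (made up).
print(estimate_N(x11=50, x10=25, x01=25))
```

With two scans this is essentially the classical capture-recapture setting, and the grid maximizer lands close to the familiar ratio (x11 + x10)(x11 + x01)/x11 for symmetric data.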

A U-statistic J is proposed for testing the hypothesis H_0 that a new item has stochastically the same life length as a used item of any age (i.e., the life distribution F is exponential), against the alternative hypothesis H_1 that a new item has stochastically greater life length (F̄(x)F̄(y) ≥ F̄(x + y) for all x ≥ 0, y ≥ 0, where F̄ = 1 - F). J is unbiased; in fact, under a partial ordering of H_1 distributions, J is ordered stochastically in the same way. Consistency against H_1 alternatives is shown, and asymptotic relative efficiencies are computed. Small sample null tail probabilities are derived, and critical values are tabulated to permit application of the test.
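The two hypotheses can be probed numerically (a sketch of ours illustrating the defining inequality, not the statistic J itself): the exponential survival function satisfies F̄(x)F̄(y) = F̄(x + y) exactly, while a Weibull with shape k ≥ 1, F̄(x) = exp(-x^k), satisfies the H_1 inequality since (x + y)^k ≥ x^k + y^k for k ≥ 1.

```python
# Checking the boundary case (exponential) and an H_1 member (Weibull,
# shape k = 2) of the new-better-than-used inequality on a grid.
from math import exp

def fbar_exp(x, rate=1.0):
    """Survival function of the exponential: exact memorylessness."""
    return exp(-rate * x)

def fbar_weibull(x, k=2.0):
    """Survival function of the Weibull with shape k."""
    return exp(-x ** k)

grid = [0.1 * i for i in range(11)]
for x in grid:
    for y in grid:
        assert abs(fbar_exp(x) * fbar_exp(y) - fbar_exp(x + y)) < 1e-12
        assert fbar_weibull(x) * fbar_weibull(y) >= fbar_weibull(x + y)
print("exponential: equality; Weibull(k=2): H_1 inequality holds")
```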