Human Spermatogenic Failure Purges Deleterious Mutation Load from the Autosomes and Both Sex Chromosomes, including the Gene DMRT1


Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a disease with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. After assaying genomewide SNPs and CNVs in 323 Caucasian men with idiopathic spermatogenic impairment and more than 1,100 controls, we estimate that each rare autosomal deletion detected in our study multiplicatively changes a man's odds of disease by 10% (OR 1.10 [1.04–1.16], p<2×10⁻³), each rare X-linked CNV by 29% (OR 1.29 [1.11–1.50], p<1×10⁻³), and each rare Y-linked duplication by 88% (OR 1.88 [1.13–3.13], p<0.03). By contrasting the properties of our case-specific CNVs with those of CNV callsets from cases of autism, schizophrenia, bipolar disorder, and intellectual disability, we propose that the CNV burden in spermatogenic impairment is distinct from the burden of large, dominant mutations described for neurodevelopmental disorders. We identified two patients with deletions of DMRT1, a gene on chromosome 9p24.3 orthologous to the putative sex determination locus of the avian ZW chromosome system. In an independent sample of Han Chinese men, we identified 3 more DMRT1 deletions in 979 cases of idiopathic azoospermia and none in 1,734 controls, and found none in an additional 4,519 controls from public databases. The combined results indicate that DMRT1 loss-of-function mutations are a risk factor and potential genetic cause of human spermatogenic failure (frequency of 0.38% in 1306 cases and 0% in 7,754 controls, p = 6.2×10⁻⁵). Our study identifies other recurrent CNVs as potential causes of idiopathic azoospermia and generates hypotheses for directing future studies on the genetic basis of male infertility and IVF outcomes.

Published in the journal: PLoS Genet 9(3): e1003349. doi:10.1371/journal.pgen.1003349
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1003349

Summary

Introduction

Male infertility is a multifaceted disorder affecting nearly 5% of men of reproductive age. In spite of its prevalence and a considerable research effort over the past several decades, the underlying cause of male infertility is uncharacterized in up to half of all cases [1]. Some degree of spermatogenic impairment is present in most male infertility patients and, in its most severe forms, manifests as azoospermia, the lack of detectable spermatozoa in semen, or severe oligozoospermia (oligozoospermia is defined by the World Health Organization as fewer than 15 million sperm/mL of semen). Spermatogenesis is a complex multistep process that requires germ cells to (a) maintain a stable progenitor population through frequent mitotic divisions, (b) reduce ploidy from diploid to haploid through meiotic divisions, and (c) assume highly specialized sperm morphology and function through spermiogenesis. These steps involve the expression of thousands of genes and carefully orchestrated interactions between germ cells and somatic cells within the seminiferous tubules [2]. It is likely that a large proportion of idiopathic cases of spermatogenic failure are of uncharacterized genetic origin, but measuring the heritability of infertility phenotypes has been challenging.

Known genetic causes of non-obstructive azoospermia (NOA) include deletions in the azoospermia factor (AZF) regions of the Y chromosome [3], Klinefelter's syndrome [4], and other cytogenetically visible chromosome aneuploidies and translocations [5]. Beyond these well-established causes, which are observed in 25–30% of cases, the genetic architecture of spermatogenic impairment is currently unknown. One might expect a priori that rare or de novo large-effect mutations will be the central players in genetic infertility, and indeed other primary infertility phenotypes like disorders of gonadal development, isolated gonadotropin-releasing hormone deficiency, and globozoospermia, a disorder of sperm morphology and function, appear to be caused by essentially Mendelian mutations operating in a monogenic or oligogenic fashion [6], [7], [8]. Similarly, recurrent mutations of the AZF region on the Y chromosome are either completely penetrant (AZFa, AZFb/c) or highly penetrant (AZFc) risk factors for azoospermia. Our working model at the start of this study was that additional “AZF-like” loci existed in the genome, either on the Y chromosome or elsewhere, and that, much like recent progress in the analysis of developmental disorders of childhood, a large number of causal point mutations and submicroscopic deletions could be revealed in idiopathic cases by the appropriate use of genomic technology.

In this paper, we employ oligonucleotide SNP arrays as a discovery technology to conduct a whole-genome screen for two rare genetic features in men with spermatogenic failure. First, we extract and analyze the probe intensity data to find rare copy number variants (CNVs). A growing number of CNVs have been associated with a host of complex disease states [9], including neurological disorders [10], [11], [12], [13], several autoimmune diseases [14], [15], type 2 diabetes [16], cardiovascular disease [17], and cancer [18], [19], [20], [21]. Now, a role for CNVs in male infertility is beginning to emerge [22], [23], [24], [25].

As a second approach to identify rare genetic variants, we use a population genetics modeling framework to identify large homozygous-by-descent (HBD) chromosome segments that may harbor recessive disease alleles. When applied to consanguineous families, so-called “HBD-mapping” has been an unequivocal success in identifying the location of causal variants for simple recessive monogenic diseases [26]. HBD analysis can also be used to screen for the location of rare variants in case-control studies of common diseases in unrelated individuals, either within a single-locus association testing framework or by testing for an autozygosity burden, frequently referred to as “inbreeding depression”: an enrichment, aggregated across the genome, in the size or predicted functional impact of HBD regions. This approach has produced results for a growing list of common diseases, including schizophrenia [27], Alzheimer's disease [28], and breast and prostate cancer [29].

In this study, we screened three cohorts of men with idiopathic spermatogenic failure in an attempt to identify rare, potentially causal mutations, and to better understand the genetic architecture of the disease (Table 1). We found a genomewide enrichment of large, rare CNVs in men with spermatogenic failure compared to normozoospermic or unphenotyped men (controls). We also identified a number of cases with unusual patterns of homozygosity, possibly the result of recent consanguineous matings. Our results show that spermatogenic output is a phenotype of the entire genome, not just the Y chromosome; they place spermatogenic failure firmly among the diseases that feature a genomewide burden of rare deleterious mutations and provide a powerful organizing principle for understanding male infertility.

**Tab. 1. Case and control cohorts used in the study.**

Results

First, we attempted to find evidence for undiscovered dominant causes of spermatogenic failure by studying the genomewide distribution of CNVs in our primary cohort from Utah: 35 men with idiopathic non-obstructive azoospermia, 48 men with severe oligozoospermia, and 62 controls with normal semen analysis. All cases had previously tested negative when screened for canonical Y chromosome deletions. Samples were assayed with an Illumina 370K oligonucleotide array that provides both SNP and CNV content. There was no detectable difference in the average number of CNVs called per sample among the three groups (mean = 20, azoospermic; 19.5, oligozoospermic; 20, normozoospermic); however, the majority of variants (61% on average) in any one sample were common polymorphisms.

Rare CNV burden is a feature of spermatogenic failure

When restricting our analysis to CNVs with a call frequency of less than 5%, a subset likely to be enriched for pathogenic events, we observed pronounced differences among groups (Table S1). Azoospermic and oligozoospermic men have nearly twice the amount of deleted sequence genomewide when compared to controls (p = 1.7×10⁻⁴, Wilcoxon rank sum test), and a nonsignificant 12% increase in the number of deletions per genome. When examining the even more restricted set of rare CNVs larger than 100 kb (Dataset S1), these associations are more pronounced: the rate of deletions in cases was twice that of controls (1.12 vs. 0.55, p = 9.7×10⁻⁴), and the amount of deleted sequence was 2.6 times greater in cases (p = 8.8×10⁻⁴).
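The frequency and size filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the `CNV` record, the rule that two calls are "the same variant" only when type and coordinates match exactly (real pipelines usually use reciprocal overlap), and all function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    sample: str
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive
    kind: str   # "del" or "dup"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def variant_key(c: CNV) -> tuple:
    # Hypothetical identity rule: same chromosome, coordinates and type = same variant.
    return (c.chrom, c.start, c.end, c.kind)


def rare_cnvs(calls, n_samples, max_freq=0.05, min_len=0):
    """Keep calls carried by < max_freq of samples and at least min_len bp long."""
    carriers = Counter()
    for key in {(c.sample, variant_key(c)) for c in calls}:  # count each sample once
        carriers[key[1]] += 1
    return [c for c in calls
            if carriers[variant_key(c)] / n_samples < max_freq and c.length >= min_len]


def deletion_burden(calls, samples):
    """Per-sample (number of deletions, total deleted bp); samples without calls get (0, 0)."""
    burden = {s: [0, 0] for s in samples}
    for c in calls:
        if c.kind == "del":
            burden[c.sample][0] += 1
            burden[c.sample][1] += c.length
    return {s: tuple(v) for s, v in burden.items()}
```

Calling `rare_cnvs(calls, n, max_freq=0.05, min_len=100_000)` mirrors the "rare CNVs larger than 100 kb" subset; the per-sample deleted bp from `deletion_burden` are the kind of values a rank-sum test would then compare between cases and controls.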

In order to replicate these initial findings, we assayed two additional cohorts: one group of 61 Caucasian men with severe spermatogenic impairment and 100 ethnicity-matched, unphenotyped controls, both collected at Washington University in St. Louis (WUSTL), and a larger case cohort of 179 Caucasian men with idiopathic azoospermia, primarily from medical practices in Porto, Portugal, matched to an unphenotyped control set of 974 Caucasian men collected by the UK National Blood Service (NBS, [30]). Although these cohorts were typed on different array platforms (Text S1), we observed replication of our initial association (Table S2 and Table S3): a 20% increase in the rate in the WUSTL cohort (p<0.05) and a 31% increase in the Porto cohort (p<5×10⁻³). We excluded several artifactual explanations for this burden effect, including specific batch phenomena and population structure (Text S1, Figures S1, S2, S3, S4, S5). To better characterize these genomewide signals, we searched for clustering of pathogenic mutations on specific chromosomes.

We focused first on the Y chromosome, as it is the location of most known mutations modulating human spermatogenesis (Figure 1, Figure S6). Y-linked microdeletions of the AZFa, AZFb, and AZFc regions are well-established causes of spermatogenic impairment, and thus we excluded from this study cases with AZF microdeletions visible by STS PCR. In the array data, we found no significant difference in the frequency of rare Y deletions between case and control groups; however, rare duplications were more abundant in Porto cases than in the NBS controls (a 3-fold enrichment in the Porto cohort, p = 1.9×10⁻³). We could classify the majority (>90%) of our samples to major Y haplogroups using SNP genotypes (Text S1), and, as expected, most of these samples fall into the two most common European haplogroups: I (22%) and R (70%). The observed duplication burden was not an artifact of differences in major Y haplogroup frequency between cases and controls, as the association was essentially unchanged when only samples with haplogroup R1 were considered (p = 3.3×10⁻³). Due to low probe coverage, only one Y-linked duplication was called in the Utah cohorts (in a control individual) and two in the WUSTL cohort (both in cases), so this burden of Y duplications could not be replicated.

**Fig. 1. Rare variant burden in cases of spermatogenic impairment.**

Next we turned to the X chromosome, which is highly enriched for genes transcribed in spermatogonia [31]. In the Utah cohorts there were 71 gains and losses with a frequency of less than 5% on the X chromosome, cumulatively producing three times as much aneuploid sequence in azoospermic and oligozoospermic men compared with normozoospermic men (89 kb/person azoospermic, 45 kb/person oligozoospermic, 27 kb/person normozoospermic; all cases versus controls p<0.03). This burden was strongly replicated in the Porto samples, which displayed a 1.6-fold enrichment of rare CNVs on the X (p = 5×10⁻⁴), and in the WUSTL samples (31% of cases with a rare X-linked CNV versus 16% of controls, p = 0.02 by permutation).
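A label-permutation test of the kind cited for the WUSTL comparison can be sketched as below. This is an assumption-laden illustration: the paper does not state its test statistic or sidedness, so the one-sided difference in carrier proportions, the function name, and the counts (19 of 61 cases and 16 of 100 controls, chosen only to approximate the reported percentages) are all hypothetical, and the resulting p-value need not match the reported 0.02.

```python
import random


def carrier_permutation_test(case_flags, control_flags, n_perm=10_000, seed=1):
    """One-sided permutation test for a higher carrier proportion in cases.

    case_flags / control_flags: 0/1 per individual (1 = carries at least one rare
    X-linked CNV). Case/control labels are shuffled n_perm times; the p-value is
    (1 + number of shuffles with a difference >= observed) / (1 + n_perm).
    """
    n_case = len(case_flags)
    pooled = list(case_flags) + list(control_flags)

    def diff(xs):
        cases, controls = xs[:n_case], xs[n_case:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = diff(pooled)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign labels at random, keeping group sizes fixed
        if diff(pooled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Illustrative counts approximating the reported WUSTL percentages
cases = [1] * 19 + [0] * 42      # about 31% of 61 cases
controls = [1] * 16 + [0] * 84   # 16% of 100 controls
diff, p = carrier_permutation_test(cases, controls)
```

Adding 1 to numerator and denominator keeps the estimated p-value strictly positive, which is the usual convention for Monte Carlo permutation tests.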

The genome-wide signal of CNV burden was not driven solely by sex chromosome events: considering only autosomal mutations in Utah samples, there was an enrichment of aneuploid sequence in large deletions in azoospermic men (268 kb/person) and oligozoospermic men (308 kb/person) compared to control men (189 kb/person, p = 9.8×10⁻³), and an enrichment in the rate of deletions in all cases when considering just events >100 kb (1.9-fold enrichment, p = 6×10⁻³). In the Porto cohort, there was modest evidence for a higher rate of rare deletions of all sizes in azoospermic men (1.27-fold enrichment, not significant) as well as an increase in the total amount of deleted sequence (345 kb/case vs. 258 kb/control, p<0.003).

In order to summarize our findings cleanly across all cohorts, we fit logistic regression models for each cohort, regressing case status onto CNV count for different classes of CNV. We also fit a mixed-effects logistic regression model to the total dataset for each CNV class, treating cohort as a random factor (Figure 1). In each regression model we controlled for population structure by including eigenvectors from a genomewide principal components analysis (Methods). On the basis of the combined analysis, we estimate that each rare autosomal deletion multiplicatively changes the odds of spermatogenic impairment by 10% (OR 1.10 [1.04–1.16], p<2×10⁻³), each rare X-linked CNV (gain or loss) by 29% (OR 1.29 [1.11–1.50], p<1×10⁻³), and each rare Y-linked duplication by 88% (OR 1.88 [1.13–3.13], p<0.03).
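The meaning of a per-CNV odds ratio can be made concrete with a minimal per-cohort model, fit by Newton-Raphson. This sketch is deliberately simplified and hypothetical: it regresses case status on a single CNV count with an intercept only, omitting the principal-component covariates and the random cohort effect used in the study. The fitted slope `b1` gives the per-CNV odds ratio `exp(b1)`.

```python
import math


def fit_logistic(x, y, n_iter=50, tol=1e-10):
    """Fit logit P(case) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def odds_multiplier(odds_ratio_per_cnv, k):
    """Under a per-CNV logistic model, carrying k CNVs multiplies the odds by OR**k."""
    return odds_ratio_per_cnv ** k
```

For example, under the reported autosomal-deletion estimate, `odds_multiplier(1.10, 3)` is about 1.33: a man carrying three rare autosomal deletions would have roughly one third higher odds of impairment than a non-carrier, all else being equal. With a 0/1 predictor, `exp(b1)` reduces to the familiar 2×2 cross-product odds ratio, which is a convenient check of the fit.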

Locus-specific analyses

Deletions of the AZF regions of the Y chromosome are often mediated by non-allelic homologous recombination (NAHR) between segmental duplications and are the most common known cause of spermatogenic failure. Because of their prognostic power and high rate of recurrence in the population, screening for AZF deletions is a standard part of the clinical workup for azoospermia. It would be of high clinical value if additional azoospermia susceptibility loci with significant recurrence rates could be identified.

We screened all cohorts for large (>100 kb) rearrangements flanked by homologous segmental duplications capable of generating recurrent events by NAHR [32]. There was no significant enrichment of gains or losses in cases across these hotspot regions when considered in aggregate. Owing to small sample sizes, no single-locus association, at these hotspot loci or elsewhere, met the strict criteria of genomewide significance in both the discovery and replication cohorts. Many of our single-cohort associations from one platform lack adequate probe coverage on other platforms for robust replication (Text S1). However, several loci were significant on joint analysis of all cohorts.

The best candidate for a novel locus generating NAHR-mediated infertility risk mutations is a 100 kb segment on chromosome Xp11.23 flanked by two nearly identical (>99.5% homology) 16 kb segmental duplications containing the sperm acrosome gene SPACA5 (Figure 2a, Figure S7). We identified 9 deletions of this locus spread across all patient cohorts (3 in Porto, 1 in Utah, 5 in WUSTL) compared to 8 in the pooled 1124 controls (2.8% frequency versus 0.7%, odds ratio = 3.96, p = 0.005, Fisher's exact test). We genotyped the deletion by +/− PCR in an additional cohort of 403 men with idiopathic NOA from Weill Cornell, and observed an additional 3 deletions (Figure S8, Text S1). In a prior case-control study of intellectual disability, investigators using qPCR estimated the allele frequency of this deletion to be 0.47% (10/2121) in a large Caucasian male control cohort [33]. Combining these data, we estimate the allele frequency of the deletion to be 1.6% in Caucasian cases, compared to 0.55% in Caucasian controls (OR 3.0, 95% CI 1.31–6.62, p = 0.007). The deleted region contains the X-linked cancer-testis (CT-X) antigen gene SSX6; the CT-X antigen family is a highly duplicated gene family on the X chromosome that comprises 10% of all X-linked genes and is expressed specifically in testis. After controlling for differences in coverage across the array platforms used in this study, we find a significant enrichment of rare deletions of CT-X genes in all cases (p = 0.02); this finding did not extend to duplications or to CT antigen genes on the autosomes (Table 2).
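The single-locus comparison above is a 2×2 Fisher's exact test. A minimal standard-library sketch of the two-sided version (summing all tables with the observed margins that are no more probable than the observed one) is shown below, applied to the Xp11.23 counts: 9 carriers among 323 cases versus 8 among 1124 controls. The function names are illustrative. The simple cross-product odds ratio is about 4.0, slightly different from the reported 3.96, which may come from a different estimator (R's `fisher.test`, for instance, reports a conditional maximum-likelihood estimate).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Conditions on all margins and sums the hypergeometric probabilities of every
    table that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):  # P(top-left cell = x) given the margins
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    tol = 1 + 1e-7  # guard against floating-point ties
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * tol))


def cross_product_odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


# Xp11.23 deletion: carriers / non-carriers in cases (first row) and pooled controls
p = fisher_exact_two_sided(9, 323 - 9, 8, 1124 - 8)
sample_or = cross_product_odds_ratio(9, 323 - 9, 8, 1124 - 8)
```

Because the test conditions on the margins, only the top-left cell varies, which is why a single loop over its feasible range enumerates every candidate table.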

**Fig. 2. Discovery of recurrent deletions in azoospermia.**

**Tab. 2. X-linked cancer-testis antigens deleted in case and control samples.**

When analyzing all cohorts jointly, our strongest association (genomewide corrected p-value <0.002) is to both gains and losses involving a 200 kb tandem repeat on Yq11.22, DYZ19 (Figure S6, Figure S9), a human-specific array of 125 bp repeats first discovered as a novel band of heterochromatin in the Y chromosome sequencing project [34]. Tandem repeat arrays are often highly unstable sequence elements that can mutate by both replication-based and recombination-based (e.g., NAHR) mechanisms. In our data there were 9 gains and 11 losses at DYZ19 in 323 cases (combined frequency 6.1%), compared to 3 gains and 12 losses in 1136 controls (combined frequency 1.3%). While this finding may ultimately require painstaking technical work to validate conclusively, we have several reasons to believe the association is real. First, we have previously shown that it is possible to identify real copy number changes at VNTR loci using short oligonucleotide arrays [35]; second, copy number changes at this locus were identified by multiple platforms in the current study; third, the association is nominally significant in both the Utah and Porto cohorts; fourth, the locus is within the AZFb/c region. The direction of copy number change does appear to track with haplogroup: while 12/13 duplications occur on the R1 background, 14/15 deletions for which haplogroup could be determined occur on I or J backgrounds. Haplogroup assignments for the carriers of these CNVs were confirmed by standard short tandem repeat analysis (Text S1). The strong association between haplogroup and direction of copy number change is noteworthy; it may indicate that DYZ19 CNVs are merely correlated with other functional changes on these chromosomes, or that the structure of these chromosomes predisposes them to recurrent gains (R1) or losses (I/J).

The gene DMRT1 is widely believed to be the sex-determination factor in birds, analogous to SRY in therian mammals, and may play the same or a similar role in all species with a ZW sex chromosome system [36]. DMRT1 encodes a transcription factor that can activate or repress target genes in Sertoli cells and premeiotic germ cells through sequence-specific DNA binding [37]. In humans, DMRT1 is located on 9p24.3 in a small cluster with the related genes DMRT2 and DMRT3. Large terminal deletions of 9p are a known cause of syndromic XY sex reversal, and although the role of the DMRT genes in the phenotype of the 9p deletion syndrome has not yet been defined, mouse experiments have shown that homozygous deletion of DMRT1 causes severe testicular hypoplasia [38], [39], [40].

We found two, perhaps identical, 132 kb deletions spanning DMRT1 in men with azoospermia in the Utah cohort, and a 1.8 Mb terminal duplication of 9p spanning these genes was seen in a single normozoospermic control from Utah (Figure 2b). All three of these rearrangements were validated by TaqMan assay (Figure S10, Text S1). Both deletion carriers were recruited into the study in Salt Lake City, UT, between 2002 and 2004. They self-reported their ancestry as Caucasian, and in both cases principal components analysis of their genetic data confirmed this self-report (Figure S2). Comparison of their whole-genome SNP genotypes gave no evidence that the two deletion carriers were closely related. Testis biopsies were performed on both men; these indicated apparent Sertoli-cell-only syndrome in the first and spermatocytic arrest in the second. Both men exhibited apparently normal male habitus and virilization, with no phenotypic similarities to 9p deletion syndrome.

We obtained Affymetrix 6.0 array data from a previously published genomewide association study of idiopathic NOA in Han Chinese men [41], comprising 979 cases and 1734 controls (Text S1). After processing these samples with our CNV calling pipeline, we observed an additional 3 deletions of DMRT1 exonic sequence in cases (0.3%) and none in controls (Figure 2b, Figure S11). From these combined array data we estimate a frequency of DMRT1 exonic deletion of 0.38% (5/1306) in cases and 0% (0/2858) in controls (OR = Infinity, [2.0–Inf], p = 0.003). We also obtained the two largest control SNP array datasets in the Database of Genomic Variants (DGV), representing CNV calls from 4519 samples typed with platforms of equal or higher probe density than the ones used here [42], [43]. None of these samples contained a CNV of any sort affecting DMRT1. Finally, we screened an additional set of 233 idiopathic NOA cases from Weill Cornell and 135 controls with the TaqMan validation assay and identified an additional 3 deletions (2 in cases, 1 in controls; Text S1, Figure S12). As this qPCR assay interrogates intronic sequence, the functional consequences of these 3 deletions are unclear. Our array data have revealed some of the smallest coding deletions of DMRT1 reported to date in humans, and should help to clarify the critical regions of 9p involved in testicular development and function.

Notably, using a bespoke reanalysis of the intensity data, we did not see evidence for CNVs involving the gene PRDM9, a recently characterized zinc-finger histone methyltransferase that appears to control the location of recombination hotspots in a diversity of mammalian species. Heterozygosity of PRDM9 zinc finger copy number has been shown to cause sterility in male hybrids of Mus m. domesticus and Mus m. musculus due to meiotic arrest [44].

Functional impact

The identification of functional or physical annotations enriched in case-associated CNVs can be a powerful step in constructing models to classify pathogenic variants. We searched for significant case-specific aggregation of CNVs in several classes of functional sequence, including 195 genes previously shown to result in spermatogenic defects when mutated in the mouse [45], all protein and non-protein coding genes, and 525 testis genes that are differentially expressed during human spermatogenesis (Text S1). Deletion of X- or Y-linked exonic sequence conferred the strongest risk (OR = 1.87 [1.30–2.68], p<1×10⁻³). Very similar risk was associated with deletion of exonic sequence from testis genes differentially expressed during spermatogenesis, despite the fact that only 15% of these genes are located on the sex chromosomes (OR = 1.85 [1.01–3.39], p<0.05). Deletion of any exonic sequence was also associated with disease (OR = 1.25 [1.07–1.46], p<5×10⁻³). Deletion of miRNAs was not associated, nor was deletion of the 195 mouse spermatogenic genes [45], which were very rarely deleted in either cases or controls.
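Classifying deletions by whether they remove exonic sequence, as in the annotation classes above, reduces to interval overlap queries. The sketch below is a hypothetical illustration, not the study's annotation pipeline: exon coordinates are assumed to arrive as `(chrom, start, end)` tuples with closed intervals, and the three labels are illustrative.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or abutting closed intervals into a sorted list of (start, end)."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


class ExonIndex:
    """Per-chromosome lookup of exon intervals; input rows are (chrom, start, end)."""

    def __init__(self, exons):
        by_chrom = defaultdict(list)
        for chrom, start, end in exons:
            by_chrom[chrom].append((start, end))
        self._intervals = {c: merge_intervals(v) for c, v in by_chrom.items()}
        self._starts = {c: [s for s, _ in v] for c, v in self._intervals.items()}

    def overlaps(self, chrom, start, end):
        """True if the closed interval [start, end] touches any exon on chrom."""
        intervals = self._intervals.get(chrom)
        if not intervals:
            return False
        # After merging, intervals are disjoint and sorted, so ends increase too: the last
        # exon starting at or before `end` is the only candidate that can reach `start`.
        i = bisect_right(self._starts[chrom], end) - 1
        return i >= 0 and intervals[i][1] >= start


def exonic_deletion_labels(deletions, index, sex_chroms=("X", "Y")):
    """Label each (chrom, start, end) deletion by exonic status and chromosome class."""
    labels = []
    for chrom, start, end in deletions:
        if not index.overlaps(chrom, start, end):
            labels.append("non-exonic")
        elif chrom in sex_chroms:
            labels.append("sex-chromosome exonic")
        else:
            labels.append("autosomal exonic")
    return labels
```

Merging first lets each query use a single binary search instead of scanning every exon, which matters when thousands of CNV calls are checked against a genome-wide annotation.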

We hypothesized that at least some of the functional impact of CNV burden on fertility was a result of disruption of haploinsufficient (HI) genes, as has been demonstrated for neuropsychiatric and developmental disease [46]. For each singleton deletion in our collections, we used a recently described modeling framework to calculate the probability that the deletion is pathogenic due to dominant disruption of a haploinsufficient gene [47]. To our surprise, HI scores from deletions in infertility cases were substantially lower than those from cases of autism and developmental disorders, and in fact indistinguishable from those in controls (mean HI score −1.16 in controls, −1.02 in all spermatogenic impairment cases, p = 0.49 by Wilcoxon rank sum test; Figure 3). Likewise, there was no enrichment of large rearrangements within 45 known genomic disorder regions in cases [46]. In contrast to previously described diseases that feature CNV burden, spermatogenic impairment may be more likely to result from large-effect recessive mutations, or perhaps from the additive effect of deleterious mutations across many loci. We sought support for recessive mutation load in our cases by assessing the impact of inbreeding, or elevated rates of homozygosity, on disease risk, applying a population genetic approach to the SNP genotype data from our samples [48].
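The HI-score comparison above, like the deleted-sequence comparisons earlier, uses a Wilcoxon rank-sum test. Below is a minimal standard-library sketch using the normal approximation with average ranks for ties and a tie-corrected variance. No continuity correction is applied, so p-values may differ slightly from library implementations that apply one. Both samples must be non-empty; the function name and any inputs are illustrative.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test using a normal approximation.

    Tied values receive average ranks and the variance is tie-corrected; no continuity
    correction is applied. Returns (U statistic for x, p-value).
    """
    values = sorted(list(x) + list(y))
    n1, n2, n = len(x), len(y), len(values)
    avg_rank, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        avg_rank[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u1 = sum(avg_rank[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence either way
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
```

A rank-based test suits HI scores and deleted-sequence totals because both are strongly skewed, so a handful of extreme values cannot dominate the comparison.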

**Fig. 3. Disruption of predicted haploinsufficient genes is infrequent in spermatogenic failure.**

HBD analyses

The major genetic side effect of consanguineous mating is a genome-wide increase in the probability that, at any given locus, the paternal and maternal alleles are identical by descent, rendering the offspring homozygous-by-descent at that locus. This probability is often summarized as the inbreeding coefficient, F, and can be estimated from analysis of pedigree structure or by direct observation of genomewide SNP genotypes.
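The two routes to F mentioned above can be sketched side by side. The pedigree estimate uses Wright's path-counting formula. The genomic estimate is the simple fraction of the autosomal genome covered by HBD segments. Both function names are hypothetical, and the autosomal length is a parameter to be supplied from the reference assembly in use. As a check, the offspring of first cousins have F = 1/16, which is the 6.25% threshold used in the association tests later in this section.

```python
def wright_f(paths):
    """Pedigree inbreeding coefficient by Wright's path-counting method.

    paths: (n1, n2, f_ancestor) for each common ancestor, where n1 and n2 count the
    generations from the individual's father and mother up to that ancestor, and
    f_ancestor is the ancestor's own inbreeding coefficient.
    F = sum over paths of (1/2) ** (n1 + n2 + 1) * (1 + f_ancestor).
    """
    return sum(0.5 ** (n1 + n2 + 1) * (1 + fa) for n1, n2, fa in paths)


def genomic_f(hbd_lengths_bp, autosomal_length_bp):
    """SNP-based estimate: fraction of the autosomal genome covered by HBD segments.

    Segments are assumed non-overlapping; the autosomal length must be supplied
    from the reference assembly in use.
    """
    return sum(hbd_lengths_bp) / autosomal_length_bp


# Offspring of first cousins: two shared grandparents, each 2 generations above both parents
first_cousin_f = wright_f([(2, 2, 0.0), (2, 2, 0.0)])  # 0.0625, i.e. 1/16
```

The pedigree value is an expectation; the genomic value measures the realized HBD fraction in one individual, which can deviate from the pedigree expectation because recombination is random.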

Due to differences in demographic history and culture, the extent of background homozygosity in the genome is expected to vary among diverse populations throughout the globe. The haplotype modeling algorithms implemented in the software package BEAGLE estimate the background patterns of linkage disequilibrium and homozygosity across a set of samples, allowing population-specific information to be used to assess the evidence that any given section of a genome is homozygous-by-descent (HBD). During the course of our study, we concluded that standard PCA-based approaches to stratification are insufficient to correct for population structure during the analysis of inbreeding, even when using population genetic methods like BEAGLE (Text S1, Figure S13). The problem comes not from spurious identification of HBD, but from spurious association of HBD with disease status when cases and controls are sampled from groups with different levels of background relatedness. For instance, in a recent survey of 17 Caucasian cohorts, estimates of the average inbreeding coefficient, F, varied from 0.09% to 0.61%, with UK-based cohorts showing the lowest F and the one Portuguese cohort showing the highest [27]. PCA-based methods detect and correct for differences in allele frequencies among groups, but in our view they do not capture differences in inbreeding in a form that can be readily incorporated into a case-control testing framework. In the following section, we therefore use data from 622 healthy adults from Spain, who we believe form a more appropriate control group for the Porto case cohort (Methods, Text S1, Figure S13).

Analyzing each cohort separately, BEAGLE identified 5343 chromosome segments likely to represent HBD regions (HBDRs) across all samples. We excluded low-level admixture as a spurious source of HBD (Figure S3). Only three of these segments were identified as apparent artifacts induced by large heterozygous deletions (287 kb, 817 kb, and 877 kb in size), and these were removed before subsequent analyses. As expected, the distribution of HBD across all samples was L-shaped, with the majority of HBDRs shorter than 1 Mb and a few intermediate and very large events observed (Figure 4b). The largest HBDR identified spanned all of chromosome 2 in an azoospermic individual, indicative of uniparental isodisomy of the entire chromosome. Clinical reports of UPD2 are extremely rare: there are 7 previous reports of UPD2, all ascertained through association with an autosomal recessive disorder [49]. In each of these cases, a recessive disorder that led to clinical presentation was identified. There is currently no evidence for imprinted genes on chromosome 2 from either mouse or human data. We performed whole exome sequencing on this individual and, using a simple scoring scheme based on functional annotation and population genetic data, identified a homozygous missense mutation of the INHBB gene as the most unusual damaging homozygous lesion in his genome (Figure 5, Text S1). The biology of the INHBB gene product strongly implicates this mutation as a causal factor, but without additional functional or epidemiological evidence such a conclusion is speculative (Figure 6).
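Two of the HBD summaries described above, the size distribution of segments and the detection of a chromosome that is almost entirely HBD (as expected for whole-chromosome uniparental isodisomy), can be sketched as follows. The segment tuple layout, the 90% coverage threshold, the size bins, and any chromosome lengths passed in are hypothetical choices, not values from the study.

```python
from collections import defaultdict


def hbd_bp_by_chrom(segments):
    """Total HBD bp per (sample, chrom); segments are (sample, chrom, start, end), non-overlapping."""
    totals = defaultdict(int)
    for sample, chrom, start, end in segments:
        totals[(sample, chrom)] += end - start + 1
    return totals


def flag_whole_chromosome_hbd(segments, chrom_lengths, min_fraction=0.9):
    """(sample, chrom, fraction) for chromosomes at least min_fraction HBD (possible isodisomy)."""
    flags = []
    for (sample, chrom), bp in sorted(hbd_bp_by_chrom(segments).items()):
        fraction = bp / chrom_lengths[chrom]
        if fraction >= min_fraction:
            flags.append((sample, chrom, fraction))
    return flags


def size_class_counts(segments, edges=(1_000_000, 5_000_000)):
    """Counts of segments <1 Mb, 1-5 Mb and >=5 Mb (illustrative bins for an L-shaped distribution)."""
    counts = [0] * (len(edges) + 1)
    for _, _, start, end in segments:
        length = end - start + 1
        counts[sum(length >= e for e in edges)] += 1
    return counts
```

Coverage is aggregated per chromosome rather than per segment so that an isodisomic chromosome still stands out even if the HBD caller splits it into several adjacent segments.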

**Fig. 4. Patterns of homozygosity in men with low sperm count.**

**Fig. 5. Analysis of exome sequencing data identifies a candidate azoospermia mutation in the case of UPD2.**

**Fig. 6. Homozygous missense mutation of *INHBB* identified in the case of UPD2.**

Setting aside this case of UPD2, we found only modest evidence for an enrichment of homozygosity in men with spermatogenic impairment (Figure 4a, Table 3). Our hypothesis was that, if a large percentage of cases of azoospermia were attributable to large-effect autosomal recessive Mendelian mutations, we would see a corresponding increase in the proportion of cases with large values of F. The average inbreeding coefficient was numerically higher in each case cohort than in its matched control cohort (Table 3). We used a mixed-model logistic regression framework to test for association between autozygosity and disease while controlling for population structure, fitting models that treated autozygosity both as a categorical variable (e.g., inbreeding coefficient >6.25%, yes or no) and as a continuous variable (F; Methods). While the estimated effect of inbreeding on disease risk was positive in every model that we tested, the corresponding odds ratios did not differ significantly from 1 in any version (Table 3). Fewer than 10 HBD regions were shared by 2 or more cases, supporting the model that spermatogenic efficiency has a polygenic basis. We also tested for case-specific aggregation of HBD segments using the same association framework as that used for CNVs, and did not identify any significant patterns. Based on published analyses of small-effect recessive risk mutations in other complex diseases, we believe our current sample size is underpowered to detect an association with very old inbreeding (e.g., due to shared ancestors 15 generations ago). Large cohorts, consisting of over 10,000 cases, may be needed to accurately estimate the relationship between low-level variation in inbreeding (F values smaller than 0.1) and azoospermia risk, as well as to map specific risk alleles [27], [50].

**Tab. 3. Summary of inbreeding coefficient estimates across cohorts, and association testing.**

Discussion

We report here the largest whole-genome study to date investigating the role of rare variants in infertility, examining data from 323 cases of male infertility and 1,136 controls. These data demonstrate that rare CNVs are a major risk factor for spermatogenic impairment, and while confirming the central role of the Y chromosome in modulating spermatogenic output, our risk estimates for autosomal and X-linked CNVs indicate that this phenotype is influenced by rare variation across the entire genome. The controls from two of the cohorts were unphenotyped; given the estimated prevalence of azoospermia (1%), a small fraction of these controls are likely to be affected, which would bias our odds ratios toward 1, so we may have underestimated the risk associated with these large rearrangements.
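The direction of the bias from unphenotyped controls can be checked with a short calculation. The unphenotyped control group is treated as a mixture in which a fraction of the men are actually affected. The carrier frequencies used below (3% in true cases, 1% in true controls) are purely illustrative assumptions, not estimates from the study. With 1% contamination, the observed odds ratio drops from about 3.06 to about 3.00. The attenuation is real but modest for these illustrative inputs.

```python
def odds(p):
    return p / (1 - p)


def observed_odds_ratio(p_case, p_control, contamination):
    """Odds ratio seen when a fraction `contamination` of the 'controls' are affected.

    p_case, p_control: carrier frequencies in truly affected and truly unaffected men.
    The unphenotyped control group is modeled as a mixture of the two.
    """
    p_mixed = (1 - contamination) * p_control + contamination * p_case
    return odds(p_case) / odds(p_mixed)


true_or = observed_odds_ratio(0.03, 0.01, contamination=0.0)   # about 3.06
seen_or = observed_odds_ratio(0.03, 0.01, contamination=0.01)  # about 3.00
```

Because contamination pulls the control carrier frequency toward the case frequency, it can only shrink an elevated odds ratio toward 1, never inflate it.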

We observed 5 deletions of DMRT1 coding sequence in cases and none in over 7,000 controls. These deletions ranged in size from 54 kb to over 2 Mb (Table 4). DMRT1 is situated in a region of chromosome 9p that has been identified as a source of syndromic and non-syndromic forms of XY gonadal dysgenesis (GD). The deletions of this region that are associated with syndromic forms of GD are usually 4–10 Mb in size, while isolated GD has been reported for deletions smaller than 1 Mb [40], [51], [52]. Despite frequent involvement of DMRT1 in these putative causal mutations, there is variability in both the phenotypic outcome affiliated with each deletion and the extent of DMRT1 coding sequence contained therein. At least two cases of GD have been linked to deletions near but not overlapping DMRT1 –⁠ one 700 kb mutation 30 kb distal to DMRT1 in a case of complete XY GD that was inherited from an apparently normal mother, and a second 260 kb de novo deletion about 250 kb distal to DMRT1 [39], [40]. Both of these deletions overlapped the genes KANK1 and DOCK8. On the other hand, two smaller deletions, one a 25 kb deletion of DMRT1 exons 1 and 2, and one a 35 kb deletion of exons 3 and 4, have been observed in patients with complete GD and bilateral ovotesticular disorder of sexual development, respectively [51], [52]. Based on the clinical records of patients in our current study, there is no chance that our DMRT1 deletion carriers could represent misdiagnosis of a condition as severe as complete XY GD, which presents with the appearance of female genitalia. Indeed, two of our DMRT1 deletion carriers were subject to testicular biopsies. Our observations here suggest that hemizygous deletion of DMRT1 is a lesion that shows variable expressivity that may depend on the sequence of the undeleted DMRT1 allele, variation in other sequences on chromosome 9p, and the state of other factors in the pathways regulating testicular development and function. 
Strictly speaking, statements that hemizygous deletions of DMRT1 are “sufficient” to cause GD or spermatogenic failure need to be qualified until we better understand the effects of genetic background. For instance, most studies of DMRT1 deletions have not sequenced the undeleted DMRT1 allele, so it remains unclear whether the mode of action is dominant or recessive.
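The strength of the DMRT1 signal can be checked with a one-sided Fisher's exact test on the pooled counts given in the abstract (5 carriers among 1,306 cases, none among 7,754 controls). This is a minimal sketch that assumes the reported p-value comes from a Fisher's exact test, which this section does not state; the helper `fisher_one_sided` is hypothetical. Under the null hypothesis the carriers are split between cases and controls at random, so the p-value is a hypergeometric tail probability.

```python
from math import comb

def fisher_one_sided(case_carriers, n_cases, ctrl_carriers, n_controls):
    """One-sided Fisher's exact p-value for enrichment of carriers in cases.

    Under the null, the total number of carriers is distributed between
    cases and controls at random (hypergeometric), so the p-value is the
    probability of seeing at least `case_carriers` carriers among the cases.
    """
    total_carriers = case_carriers + ctrl_carriers
    denom = comb(n_cases + n_controls, total_carriers)
    p = 0.0
    for k in range(case_carriers, min(total_carriers, n_cases) + 1):
        p += comb(n_cases, k) * comb(n_controls, total_carriers - k) / denom
    return p

# Pooled DMRT1 counts from the abstract: 5/1,306 cases vs 0/7,754 controls.
p = fisher_one_sided(5, 1306, 0, 7754)
print(f"p = {p:.2e}")
```

Running it prints `p = 6.18e-05`, consistent with the p = 6.2×10⁻⁵ reported in the abstract. Every other possible split of the 5 carriers is more probable than the observed one, so the two-sided Fisher p-value equals the one-sided value here.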

**Tab. 4. *DMRT1* deletions detected by array in the current study.**

Deletions of the Y chromosome have long been recognized as a cause of azoospermia, and we show here that Y-linked duplications are also significant risk factors for spermatogenic failure. The precise definition of the duplication-sensitive sequences awaits further investigation. Historically, Y duplications have been much less studied than Y deletions, because presence/absence (+/−) STS PCR is the standard assay for assessing Y chromosome copy number variation in both clinical and research settings. Quantitative PCR methods for measuring Y chromosome gene dosage have been described, but have been applied almost exclusively to studying the phenotypic effects of duplicating genes in the AZFc region [53]. The results of these investigations conflict: studies of Europeans report no association between AZFc partial duplication and spermatogenic impairment [54], whereas reproducible associations have been reported in East Asian cohorts [55], [56]. Notably, we identified Y chromosome duplications greater than 2.5 Mb in size, all spanning the AZFc locus (Figure S6), in 8/179 cases (those typed on Affymetrix 6.0), compared with 13/972 controls (OR 3.45 [1.21–9.12], p<0.01). Rearrangements of this size on the autosomes confer very high risk for other diseases; for example, by one recent estimate, CNVs larger than 3 Mb have an OR of 47.7 for intellectual disability and/or developmental delay [46]. Our results suggest that Y chromosome structure may be more dosage-sensitive than previously appreciated, and we speculate that some genes and non-coding sequences of the Y chromosome may be under stabilizing selection for copy number [57].
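The odds ratio quoted for large AZFc-spanning duplications can be recomputed from the 2×2 table of carriers and non-carriers. The sketch below assumes a Woolf (log-based) 95% confidence interval, chosen for illustration only; the method behind the published interval is not stated here, and the helper name `odds_ratio_woolf` is hypothetical.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Y duplications >2.5 Mb spanning AZFc: 8/179 cases vs 13/972 controls.
cases_with, cases_total = 8, 179
ctrls_with, ctrls_total = 13, 972
or_, lo, hi = odds_ratio_woolf(cases_with, cases_total - cases_with,
                               ctrls_with, ctrls_total - ctrls_with)
print(f"OR {or_:.2f} [{lo:.2f}-{hi:.2f}]")
```

The point estimate matches the reported OR of 3.45. The Woolf interval (roughly 1.4 to 8.5) is narrower than the reported [1.21–9.12], which would be expected if the published interval came from an exact method.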

Three recent studies have used array-based approaches to characterize CNVs in infertile men. In the first, our finding of an X-linked CNV burden in men with spermatogenic failure was replicated [58]. In the second, Tuttelmann et al. evaluated 89 severe oligozoospermic, 37 azoospermic, and 100 normozoospermic control men using Agilent 244K and 400K arrays and identified a number of CNVs potentially involved in male infertility [24]. In the third, Stouffs et al. assayed nine azoospermic men and twenty control samples using the 244K array and followed up CNVs of interest by qPCR in up to 130 additional controls [25]. Using a criterion of at least 51% reciprocal overlap, we identified a number of CNVs in the current study that overlap case-specific CNVs from the Tuttelmann and Stouffs studies. Most of these CNVs appear to be relatively common polymorphisms that are not case-specific in our larger dataset; however, several noteworthy CNVs overlap between studies and are absent or very rare in controls. For example, Tuttelmann et al. identified a private duplication on Xq22.2 in an oligozoospermic man [24], and we identified an overlapping duplication in an oligozoospermic man from the present study (chrX:103065826–103205985, NCBI36). These duplications alter the copy number of a small number of testis-specific or testis-expressed variants of histone H2B (H2BFWT, H2BFXP, H2BFM). No CNVs in this region were identified in more than 1,600 controls. Tuttelmann et al. also identified an azoospermic man with a deletion and another with a duplication on 8q24.3, encompassing the genes PLEC1 and MIR661 [24]. We identified an oligozoospermic man with a duplication of the same region, affecting the same functional elements (chr8:145064091–145118650, NCBI36). CNVs at this locus are very rare, with a frequency of about 0.005% in our controls and 0.0025% in controls used in a recent study of developmental delay [46].
New variants are frequently discovered whenever a technology such as array CGH is applied to a new sample set, so the observation that a variant is patient-specific is not in itself remarkable, especially in studies with very small sample sizes.
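To see why a case-specific variant is unremarkable in a small study, consider a variant observed only in cases. If case/control labels carry no information, the chance that all k carriers fall among the cases is a hypergeometric probability. The sketch below computes it for the two designs described above (9 cases vs 20 controls for Stouffs et al., before follow-up, and 126 patients vs 100 controls for Tuttelmann et al.). The function names are hypothetical, and the calculation ignores follow-up genotyping.

```python
from math import comb

def p_all_in_cases(k, n_cases, n_controls):
    """Chance that all k carriers of a variant fall among the cases when
    case/control labels are assigned at random (hypergeometric)."""
    return comb(n_cases, k) / comb(n_cases + n_controls, k)

def min_case_only_carriers(n_cases, n_controls, alpha=0.05):
    """Smallest number of case-only carriers whose chance probability is
    below alpha, or None if the design cannot reach alpha."""
    k = 1
    while p_all_in_cases(k, n_cases, n_controls) >= alpha:
        k += 1
        if k > n_cases:
            return None
    return k

for label, n_cases, n_controls in [("9 vs 20", 9, 20), ("126 vs 100", 126, 100)]:
    print(label,
          round(p_all_in_cases(1, n_cases, n_controls), 3),   # singleton in a case
          min_case_only_carriers(n_cases, n_controls))         # carriers needed
```

For the 9-versus-20 design, a single case-only carrier arises by chance with probability 9/29 (about 0.31), and at least three case-only carriers are needed before that probability falls below 0.05; the 126-versus-100 design needs six.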

Our observation of low haploinsufficiency (HI) scores for deletions in cases raises a number of considerations for future studies of the genetics of spermatogenic impairment. We interpret these low HI scores as evidence against a widespread role for dominant, highly penetrant deletions in spermatogenic failure. It is possible that our case recruitment, which pre-screened for a normal karyotype, removed all events with large HI scores; however, our identification of two such deletions, affecting WT1 and MAPK1, indicates otherwise (Figure 3). A second concern is that the data used to train the haploinsufficiency prediction algorithm are based in part on features of deletions known to cause dominant pediatric disease, so an analogous approach trained on fertility phenotypes might lead to different conclusions. There are few examples of dominant loss-of-function mutations causing isolated infertility in humans, and only 5 of the >200 mouse infertility mutants described in a previous review showed a phenotype in heterozygous form [45], so fitting a model of dominant infertility mutations may be challenging in the short term. Nonetheless, developing disease-specific pathogenicity scores for infertility phenotypes should be a priority.

Despite the differences noted above between the genetic signatures of spermatogenic impairment and severe developmental disease, their epidemiology is connected. Recent results estimate a 9.9% rate of birth defects in children conceived by intracytoplasmic sperm injection (ICSI), the technology typically used to assist cases of severe male factor infertility; this corresponds to an OR of 1.77 compared with unassisted reproduction [59]. Among several possible explanations for this finding, our data raise the possibility that mutations that compromise gonadal function may act pleiotropically to disrupt development in other tissues. A better understanding of the genetic basis of male infertility is urgently needed to improve risk assessment for couples considering assisted reproduction.

Clinical genomics is a paradigm in need of robust applications, and our finding of a large CNV burden in cases suggests that some infertility mutations may have the high penetrance required for clinical utility. Indeed, some mutation screens are already used clinically in the management of male infertility. Although azoospermia can be easily diagnosed with a standard laboratory test, many men with azoospermia have sperm production within the testis and are candidates for testicular sperm retrieval. We have previously shown that the specific AZF deletion (AZFa, AZFb, or AZFb/c, as opposed to AZFc) has a dramatic effect on the prognosis of sperm retrieval [60]. In the present study, we have identified deletion of DMRT1 coding sequence as a genetic event that appears highly predictive of spermatogenic failure. In-depth characterization of carriers is now needed to understand how this mutation affects the prognosis of sperm retrieval. Similar whole-genome tests may provide critical prognostic information to help estimate the chance of successful treatment for couples with non-obstructive azoospermia, avoiding expensive and needlessly invasive interventions, while potentially guiding the development of new therapies.

Methods

Ethics statement

All DNA samples used in this study were derived from peripheral blood lymphocytes collected from individuals who gave IRB-approved informed consent. The following IRBs were involved: INSA Ethics Committee and Hospital Authority (Portugal), University of Utah IRB, and Washington University in St. Louis IRB (#201107177). All genomic DNA samples analyzed in this study (i) belong to DNA banks established over many years; (ii) are coded; and (iii) come from individuals who signed a declaration of informed consent before donating their genomic DNA, authorizing molecular studies to be performed on this material.

Patient cohorts

All cases were deemed idiopathic following a standard clinical workup, which included screening for Y chromosome deletions. Controls from the Utah cohort were men with normal semen analyses; the remaining controls were not phenotyped for semen quality. Full details of the source and diagnosis of samples in this study are available in Supplemental Methods. With SNP arrays, CNV analysis is more sensitive to experimental noise than SNP genotyping, so we used different sample QC metrics for the CNV and SNP stages of the project. As a result, sample sizes are slightly larger for the HBD analyses than for the CNV analyses.

Population structure

The individuals studied here were sourced from diverse geographic locations (Table 1, Text S1). All primary samples (i.e., the 323 cases and 1133 controls subjected to whole-genome genetic analysis) were of self-reported Caucasian ancestry, but additional steps were needed to control for population structure in all aspects of the analysis. First, the genetic ancestry of each sample was assessed by principal components analysis (PCA), and ancestry outliers were removed (Figure S2, Figure S3). Second, eigenvectors generated by this PCA were used as covariates in both the CNV association and inbreeding coefficient association analyses. For analyses of the Y chromosome, we conditioned on Y haplogroup to provide the most stringent correction for population structure possible with the available data. Lastly, we conducted alternative association analyses of the Porto case cohort using a smaller but more geographically proximal Spanish control cohort (Figure S5).
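The PCA step above can be sketched with NumPy. This is an illustration under stated assumptions, not the study's pipeline: each SNP is centered and scaled by its binomial standard deviation, and a sample is flagged as an ancestry outlier if it lies more than 6 SDs from the mean on any retained PC. The threshold, the number of PCs, and the function names `genotype_pcs` and `ancestry_outliers` are hypothetical choices.

```python
import numpy as np

def genotype_pcs(genotypes, n_pcs=10):
    """Top principal components of a samples x SNPs genotype matrix coded 0/1/2."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0                 # allele frequency per SNP
    keep = (freq > 0) & (freq < 1)              # drop monomorphic SNPs
    g, freq = g[:, keep], freq[keep]
    # Center each SNP and scale it by its binomial standard deviation.
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    k = min(n_pcs, s.size)
    return u[:, :k] * s[:k]                     # sample scores on each PC

def ancestry_outliers(pcs, n_sd=6.0):
    """True for samples lying more than n_sd SDs from the mean on any PC."""
    z = (pcs - pcs.mean(axis=0)) / pcs.std(axis=0)
    return (np.abs(z) > n_sd).any(axis=1)
```

The scores returned by `genotype_pcs` for the retained samples are the kind of eigenvectors that can then enter the association models as covariates. In practice the outlier step is often iterated, recomputing PCs after each removal; the sketch performs a single pass.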

Identification of CNVs, regions of homozygosity-by-descent

Three array platforms were used for CNV discovery: Illumina 370K (Utah), Illumina OmniExpress (Washington University), and Affymetrix 6.0 (Porto, Cornell, Nanjing). Full details of sample processing and array experiments are available in Supplemental Methods. Three CNV calling algorithms were used to generate CNV maps for each individual typed with Illumina technology: GADA, a sparse Bayesian learning approach [61]; PennCNV, a hidden Markov model (HMM)-based method originally designed for the Illumina platform [62]; and QuantiSNP 2.0, another HMM-based method for Illumina data [63]. CNVs called by at least 2 of the 3 algorithms were retained for analysis. CNV calling for Affymetrix 6.0 data was performed with Birdsuite [64]. Because calling CNVs on the sex chromosomes is more complex, for all array datasets we implemented a bespoke normalization and calling procedure that used only the GADA algorithm to call CNVs on the X and Y chromosomes. For full details of CNV calling, see Supplemental Methods.
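The "2 of 3 algorithms" rule can be sketched as a consensus filter over per-algorithm call sets. This section does not say how calls from different algorithms were matched, so the sketch assumes two calls agree when they have the same type and at least 50% reciprocal overlap (the criterion used later for CNV regions). The `CNVCall` class and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class CNVCall:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    kind: str    # "del" or "dup"

def reciprocal_overlap(a: CNVCall, b: CNVCall) -> float:
    """Overlap length divided by the longer call (0 if different chrom/type)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    ov = min(a.end, b.end) - max(a.start, b.start) + 1
    if ov <= 0:
        return 0.0
    return min(ov / (a.end - a.start + 1), ov / (b.end - b.start + 1))

def consensus_calls(callsets: List[List[CNVCall]], min_support=2, min_ro=0.5):
    """Keep calls supported by at least `min_support` algorithms (counting the
    algorithm that made the call), reporting each consensus event once."""
    kept: List[CNVCall] = []
    for i, calls in enumerate(callsets):
        for call in calls:
            support = 1 + sum(
                any(reciprocal_overlap(call, other) >= min_ro for other in others)
                for j, others in enumerate(callsets) if j != i
            )
            already = any(reciprocal_overlap(call, k) >= min_ro for k in kept)
            if support >= min_support and not already:
                kept.append(call)
    return kept
```

Passing the GADA, PennCNV and QuantiSNP call sets for one sample returns one representative call per supported event, taken from the first algorithm that made it.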

Regions of homozygosity-by-descent (HBD) were identified using BEAGLE 3.0 [48]. SNPs with no-call rates >5% were removed before HBD analysis. Because BEAGLE fits its model of background linkage disequilibrium to the data, cases and controls from each cohort were analyzed together, and each cohort was analyzed separately, to assess cohort-specific biases in calling HBD. Before downstream analysis, we identified and removed a small number of reported HBD regions that corresponded to rare, large hemizygous deletions (a hemizygous deletion leaves a single allele at each SNP, which genotyping reports as homozygous).

Inbreeding coefficients for each individual were calculated from their HBD data using the formula:
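A commonly used HBD-based estimator, assumed here for illustration and not necessarily the exact normalization used in this study, is the fraction of the surveyed genome that lies in HBD segments, F = ΣL_HBD / L_genome, where L_genome is the length of the genome covered by the SNPs analyzed. The helper below is hypothetical; overlapping segments on a chromosome are merged first so that no base is counted twice.

```python
def inbreeding_coefficient(hbd_segments, genome_length):
    """Fraction of the surveyed genome covered by HBD segments.

    hbd_segments: list of (chrom, start, end) tuples, end exclusive.
    genome_length: total length of the genome surveyed by the SNPs.
    """
    by_chrom = {}
    for chrom, start, end in hbd_segments:
        by_chrom.setdefault(chrom, []).append((start, end))
    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:            # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:                           # gap: close the current block
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_length
```

For example, HBD segments covering 2 kb of a 10 kb toy genome give F = 0.2.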

CNV and HBD association analyses

Because array content differs between platforms, CNV frequencies were determined on a per-platform basis. All CNV calls made on a given platform, in both cases and controls, were merged into CNV regions using a threshold of 50% reciprocal overlap to define two events as the same [35]. We defined the frequency of a CNV as the proportion of all samples (cases and controls) carrying it.
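The merging step can be sketched as a greedy clustering run once per platform. The grouping procedure of [35] may differ; this sketch simply assigns each call to the first region of the same chromosome and type whose seed call it overlaps reciprocally by at least 50%, and otherwise starts a new region. The function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """a, b: (start, end) intervals on the same chromosome, end exclusive."""
    ov = min(a[1], b[1]) - max(a[0], b[0])
    if ov <= 0:
        return 0.0
    return min(ov / (a[1] - a[0]), ov / (b[1] - b[0]))

def cnv_regions(calls, n_samples, min_ro=0.5):
    """Group CNV calls into regions and report each region's frequency.

    calls: list of (sample_id, chrom, start, end, kind) from one platform.
    Returns (chrom, seed_interval, kind, frequency) per region, where
    frequency is the fraction of all samples (cases and controls)
    carrying at least one call in the region.
    """
    regions = []
    for sample, chrom, start, end, kind in sorted(calls, key=lambda c: (c[1], c[2])):
        for region in regions:
            if (region["chrom"] == chrom and region["kind"] == kind
                    and reciprocal_overlap(region["seed"], (start, end)) >= min_ro):
                region["samples"].add(sample)
                break
        else:  # no matching region: this call seeds a new one
            regions.append({"chrom": chrom, "kind": kind,
                            "seed": (start, end), "samples": {sample}})
    return [(r["chrom"], r["seed"], r["kind"], len(r["samples"]) / n_samples)
            for r in regions]
```

Comparing against a fixed seed call keeps regions from drifting as calls are added; a variant of the sketch could instead compare against the growing region span.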

We constructed several statistical tests to measure differences between cases and controls. We used Mann-Whitney U tests to test for differences in the total amount of aneuploid (copy-number-altered) sequence per genome. We used standard logistic regression to test for CNV load on chromosome compartments (e.g. the autosomes and the X chromosome) and on a small number of functional features (genes, miRNAs, etc.). To control for population structure, these models included the first 10 principal components from PCA of the SNP genotype data from all cohorts (Figure S2). For genomewide, locus-by-locus association testing at all genes and in 500 kb non-overlapping genomic windows, we used a permutation strategy, implemented in the software package PLINK, that calculates nominal and genomewide p-values by permuting case-control labels [65]. To present consistent summaries of CNV burden for the entire study (all cohorts combined), we used mixed-effects logistic regression treating cohort as a random factor, and compared these estimates with effect sizes estimated for each cohort separately by standard logistic regression (Figure 1). Because each case-control cohort was typed on a different platform, the mixed-effects framework also controls for SNP platform; a similar use of mixed-effects modeling was recently described in a meta-analysis of schizophrenia SNP data [27].
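The label-permutation idea can be illustrated with a small pure-Python sketch. It is not PLINK's implementation: the per-locus statistic is an uncorrected 1-df chi-square chosen for illustration, and the genomewide p-value compares each observed statistic with the maximum statistic across loci in each permuted dataset (a max(T) correction). All function names are hypothetical.

```python
import random

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def locus_stats(carrier_sets, labels):
    """One statistic per locus. carrier_sets[i] holds the indices of samples
    with a CNV at locus i; labels[s] is True for cases, False for controls."""
    n_cases = sum(labels)
    n_ctrls = len(labels) - n_cases
    stats = []
    for carriers in carrier_sets:
        a = sum(1 for s in carriers if labels[s])  # case carriers
        c = len(carriers) - a                      # control carriers
        stats.append(chi2_2x2(a, n_cases - a, c, n_ctrls - c))
    return stats

def permutation_pvalues(carrier_sets, labels, n_perm=1000, seed=1):
    """Pointwise and genomewide (max-statistic) empirical p-values,
    obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = locus_stats(carrier_sets, labels)
    point_hits = [0] * len(observed)
    max_hits = [0] * len(observed)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        stats = locus_stats(carrier_sets, perm)
        top = max(stats)
        for i, obs in enumerate(observed):
            point_hits[i] += int(stats[i] >= obs)
            max_hits[i] += int(top >= obs)
    pointwise = [(h + 1) / (n_perm + 1) for h in point_hits]
    genomewide = [(h + 1) / (n_perm + 1) for h in max_hits]
    return pointwise, genomewide
```

Because each permuted dataset contributes only its single largest statistic to the genomewide comparison, the genomewide p-values control the family-wise error rate across all tested genes or windows and are never smaller than the pointwise ones.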

Analogous tests were conducted on HBD segments from the original discovery cohort and the combined primary and replication datasets.

Validation assay

We performed validation and replication analyses of DMRT1 deletions with an assay based on TaqMan PCR. Copy number was assessed using the predesigned assay #Hs06833797_cn within the DMRT1 gene, against an RNase P reference assay (#4403326; both assays from Applied Biosystems, Carlsbad, CA, USA), according to the manufacturer's recommendations.
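TaqMan copy number data are commonly analyzed with the comparative Ct (ΔΔCt) method, which assumes near-100% amplification efficiency for both the target and reference assays. The sketch below is an assumption about how such data can be analyzed; this section does not state the analysis software or calibrator samples used, and the function names and Ct values are hypothetical.

```python
def copy_number_ddct(ct_target, ct_reference, calibrator_dct, calibrator_cn=2):
    """Comparative-Ct (ddCt) copy number estimate, assuming ~100% PCR efficiency.

    ct_target:      Ct of the DMRT1 assay for the test sample
    ct_reference:   Ct of the RNase P reference assay for the same sample
    calibrator_dct: mean (target - reference) Ct difference in calibrator
                    samples assumed to carry `calibrator_cn` copies
    """
    ddct = (ct_target - ct_reference) - calibrator_dct
    # Each extra cycle needed to reach threshold halves the estimated dosage.
    return calibrator_cn * 2 ** (-ddct)

def call_copy_number(ct_target, ct_reference, calibrator_dct, calibrator_cn=2):
    """Round the continuous estimate to the nearest integer copy number."""
    return round(copy_number_ddct(ct_target, ct_reference, calibrator_dct, calibrator_cn))

# Hypothetical Ct values: the sample needs one more cycle than the calibrator.
print(copy_number_ddct(26.0, 24.0, calibrator_dct=1.0))
```

A sample whose ΔCt exceeds the calibrator's by one cycle is estimated at one copy, the expected signature of a hemizygous DMRT1 deletion.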

Supporting Information

References

1. Krausz C (2011) Male infertility: pathogenesis and clinical diagnosis. Best practice & research Clinical endocrinology & metabolism 25: 271–285.

2. Schultz N, Hamra FK, Garbers DL (2003) A multitude of genes expressed solely in meiotic or postmeiotic spermatogenic cells offers a myriad of contraceptive targets. Proc Natl Acad Sci U S A 100: 12201–12206.

3. Tiepolo L, Zuffardi O (1976) Localization of factors controlling spermatogenesis in the nonfluorescent portion of the human Y chromosome long arm. Hum Genet 34: 119–124.

4. Lanfranco F, Kamischke A, Zitzmann M, Nieschlag E (2004) Klinefelter's syndrome. Lancet 364: 273–283.

5. Yatsenko AN, Yatsenko SA, Weedin JW, Lawrence AE, Patel A, et al. (2010) Comprehensive 5-year study of cytogenetic aberrations in 668 infertile men. The Journal of urology 183: 1636–1642.

6. Koscinski I, Elinati E, Fossard C, Redin C, Muller J, et al. (2011) DPY19L2 deletion as a major cause of globozoospermia. American journal of human genetics 88: 344–350.

7. Sykiotis GP, Pitteloud N, Seminara SB, Kaiser UB, Crowley WF Jr (2010) Deciphering genetic disease in the genomic era: the model of GnRH deficiency. Science translational medicine 2: 32rv32.

8. Lee PA, Houk CP, Ahmed SF, Hughes IA (2006) Consensus statement on management of intersex disorders. International Consensus Conference on Intersex. Pediatrics 118: e488–500.

9. Stankiewicz P, Lupski JR (2010) Structural variation in the human genome and its role in disease. Annu Rev Med 61: 437–455.

10. Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, et al. (2007) Strong association of de novo copy number mutations with autism. Science 316: 445–449.

11. Tam GW, Redon R, Carter NP, Grant SG (2009) The role of DNA copy number variation in schizophrenia. Biol Psychiatry 66: 1005–1012.

12. Wilson GM, Flibotte S, Chopra V, Melnyk BL, Honer WG, et al. (2006) DNA copy-number analysis in bipolar disorder and schizophrenia reveals aberrations in genes involved in glutamate signaling. Hum Mol Genet 15: 743–749.

13. Mefford HC, Muhle H, Ostertag P, von Spiczak S, Buysse K, et al. (2010) Genome-wide copy number variation in epilepsy: novel susceptibility loci in idiopathic generalized and focal epilepsies. PLoS Genet 6: e1000962. doi:10.1371/journal.pgen.1000962

14. Ptacek T, Li X, Kelley JM, Edberg JC (2008) Copy number variants in genetic susceptibility and severity of systemic lupus erythematosus. Cytogenet Genome Res 123: 142–147.

15. Schaschl H, Aitman TJ, Vyse TJ (2009) Copy number variation in the human genome and its implication in autoimmunity. Clin Exp Immunol 156: 12–16.

16. Jeon JP, Shim SM, Nam HY, Ryu GM, Hong EJ, et al. (2010) Copy number variation at leptin receptor gene locus associated with metabolic traits and the risk of type 2 diabetes mellitus. BMC Genomics 11: 426.

17. Pollex RL, Hegele RA (2007) Copy number variation in the human genome and its implications for cardiovascular disease. Circulation 115: 3130–3138.

18. Tchatchou S, Burwinkel B (2008) Chromosome copy number variation and breast cancer risk. Cytogenet Genome Res 123: 183–187.

19. Frank B, Bermejo JL, Hemminki K, Sutter C, Wappenschmidt B, et al. (2007) Copy number variant in the candidate tumor suppressor gene MTUS1 and familial breast cancer risk. Carcinogenesis 28: 1442–1445.

20. Braude I, Vukovic B, Prasad M, Marrano P, Turley S, et al. (2006) Large scale copy number variation (CNV) at 14q12 is associated with the presence of genomic abnormalities in neoplasia. BMC Genomics 7: 138.

21. LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, et al. (2005) Allele-specific amplification in cancer revealed by SNP array analysis. PLoS Comput Biol 1: e65. doi:10.1371/journal.pcbi.0010065

22. Hansen S, Eichler EE, Fullerton SM, Carrell D (2010) SPANX gene variation in fertile and infertile males. Syst Biol Reprod Med 55: 18–26.

23. Jorgez CJ, Weedin JW, Sahin A, Tannour-Louet M, Han S, et al. (2011) Aberrations in pseudoautosomal regions (PARs) found in infertile men with Y-chromosome microdeletions. J Clin Endocrinol Metab 96: E674–679.

24. Tuttelmann F, Simoni M, Kliesch S, Ledig S, Dworniczak B, et al. (2011) Copy number variants in patients with severe oligozoospermia and sertoli-cell-only syndrome. PLoS ONE 6: e19426. doi:10.1371/journal.pone.0019426

25. Stouffs K, Vandermaelen D, Massart A, Menten B, Vergult S, et al. (2012) Array comparative genomic hybridization in male infertility. Human reproduction 27: 921–929.

26. Ku CS, Naidoo N, Teo SM, Pawitan Y (2011) Regions of homozygosity and their impact on complex diseases and traits. Human genetics 129: 1–15.

27. Keller MC, Simonson MA, Ripke S, Neale BM, Gejman PV, et al. (2012) Runs of homozygosity implicate autozygosity as a schizophrenia risk factor. PLoS Genet 8: e1002656. doi:10.1371/journal.pgen.1002656

28. Nalls MA, Guerreiro RJ, Simon-Sanchez J, Bras JT, Traynor BJ, et al. (2009) Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer's disease. Neurogenetics 10: 183–190.

29. Enciso-Mora V, Hosking FJ, Houlston RS (2010) Risk of breast and prostate cancer is not associated with increased homozygosity in outbred populations. European journal of human genetics: EJHG 18: 909–914.

30. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661–678.

31. Wang PJ, McCarrey JR, Yang F, Page DC (2001) An abundance of X-linked genes expressed in spermatogonia. Nature genetics 27: 422–426.

32. Stankiewicz P, Lupski JR (2002) Genome architecture, rearrangements and genomic disorders. Trends Genet 18: 74–82.

33. Lugtenberg D, Zangrande-Vieira L, Kirchhoff M, Whibley AC, Oudakker AR, et al. (2010) Recurrent deletion of ZNF630 at Xp11.23 is not associated with mental retardation. American journal of medical genetics Part A 152A: 638–645.

34. Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, et al. (2003) The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423: 825–837.

35. Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature 464: 704–712.

36. Smith CA, Roeszler KN, Ohnesorg T, Cummins DM, Farlie PG, et al. (2009) The avian Z-linked gene DMRT1 is required for male sex determination in the chicken. Nature 461: 267–271.

37. Murphy MW, Sarver AL, Rice D, Hatzi K, Ye K, et al. (2010) Genome-wide analysis of DNA binding and transcriptional regulation by the mammalian Doublesex homolog DMRT1 in the juvenile testis. Proceedings of the National Academy of Sciences of the United States of America 107: 13360–13365.

38. Raymond CS, Parker ED, Kettlewell JR, Brown LG, Page DC, et al. (1999) A region of human chromosome 9p required for testis development contains two genes related to known sexual regulators. Hum Mol Genet 8: 989–996.

39. Tannour-Louet M, Han S, Corbett ST, Louet JF, Yatsenko S, et al. (2010) Identification of de novo copy number variants associated with human disorders of sexual development. PLoS ONE 5: e15392. doi:10.1371/journal.pone.0015392

40. Barbaro M, Balsamo A, Anderlid BM, Myhre AG, Gennari M, et al. (2009) Characterization of deletions at 9p affecting the candidate regions for sex reversal and deletion 9p syndrome by MLPA. Eur J Hum Genet 17: 1439–1447.

41. Hu Z, Xia Y, Guo X, Dai J, Li H, et al. (2012) A genome-wide association study in Chinese men identifies three risk loci for non-obstructive azoospermia. Nature genetics 44: 183–186.

42. Itsara A, Cooper GM, Baker C, Girirajan S, Li J, et al. (2009) Population analysis of large copy number variants and hotspots of human genetic disease. American journal of human genetics 84: 148–161.

43. Shaikh TH, Gai X, Perin JC, Glessner JT, Xie H, et al. (2009) High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications. Genome research 19: 1682–1690.

44. Mihola O, Trachtulec Z, Vlcek C, Schimenti JC, Forejt J (2009) A mouse speciation gene encodes a meiotic histone H3 methyltransferase. Science 323: 373–375.

45. Matzuk MM, Lamb DJ (2008) The biology of infertility: research advances and clinical challenges. Nat Med 14: 1197–1213.

46. Cooper GM, Coe BP, Girirajan S, Rosenfeld JA, Vu TH, et al. (2011) A copy number variation morbidity map of developmental delay. Nature genetics 43: 838–846.

47. Huang N, Lee I, Marcotte EM, Hurles ME (2010) Characterising and predicting haploinsufficiency in the human genome. PLoS Genet 6: e1001154. doi:10.1371/journal.pgen.1001154

48. Browning SR, Browning BL (2010) High-resolution detection of identity by descent in unrelated individuals. American journal of human genetics 86: 526–539.

49. Kantarci S, Ragge NK, Thomas NS, Robinson DO, Noonan KM, et al. (2008) Donnai-Barrow syndrome (DBS/FOAR) in a child with a homozygous LRP2 mutation due to complete chromosome 2 paternal isodisomy. American journal of medical genetics Part A 146A: 1842–1847.

50. Keller MC, Visscher PM, Goddard ME (2011) Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics 189: 237–249.

51. Ledig S, Hiort O, Wunsch L, Wieacker P (2012) Partial deletion of DMRT1 causes 46,XY ovotesticular disorder of sexual development. European journal of endocrinology/European Federation of Endocrine Societies 167: 119–124.

52. Ledig S, Hiort O, Scherer G, Hoffmann M, Wolff G, et al. (2010) Array-CGH analysis in patients with syndromic and non-syndromic XY gonadal dysgenesis: evaluation of array CGH as diagnostic tool and search for new candidate loci. Human reproduction 25: 2637–2646.

53. Machev N, Saut N, Longepied G, Terriou P, Navarro A, et al. (2004) Sequence family variant loss from the AZFc interval of the human Y chromosome, but not gene copy loss, is strongly associated with male infertility. Journal of medical genetics 41: 814–825.

54. Giachini C, Laface I, Guarducci E, Balercia G, Forti G, et al. (2008) Partial AZFc deletions and duplications: clinical correlates in the Italian population. Human genetics 124: 399–410.

55. Lin YW, Hsu LC, Kuo PL, Huang WJ, Chiang HS, et al. (2007) Partial duplication at AZFc on the Y chromosome is a risk factor for impaired spermatogenesis in Han Chinese in Taiwan. Human mutation 28: 486–494.

56. Lu C, Zhang F, Yang H, Xu M, Du G, et al. (2011) Additional genomic duplications in AZFc underlie the b2/b3 deletion-associated risk of spermatogenic impairment in Han Chinese population. Human molecular genetics 20: 4411–4421.

57. Repping S, van Daalen SK, Brown LG, Korver CM, Lange J, et al. (2006) High mutation rates have driven extensive structural polymorphism among human Y chromosomes. Nature genetics 38: 463–467.

58. Krausz C, Giachini C, Lo Giacco D, Daguin F, Chianese C, et al. (2012) High resolution X chromosome-specific array-CGH detects new CNVs in infertile males. PLoS ONE 7: e44887. doi:10.1371/journal.pone.0044887

59. Davies MJ, Moore VM, Willson KJ, Van Essen P, Priest K, et al. (2012) Reproductive technologies and the risk of birth defects. The New England journal of medicine 366: 1803–1813.

60. Hopps CV, Mielnik A, Goldstein M, Palermo GD, Rosenwaks Z, et al. (2003) Detection of sperm in men with Y chromosome microdeletions of the AZFa, AZFb and AZFc regions. Human reproduction 18: 1660–1665.

61. Pique-Regi R, Ortega A, Asgharzadeh S (2009) Joint estimation of copy number variation and reference intensities on multiple DNA arrays using GADA. Bioinformatics 25: 1223–1230.

62. Wang K, Li M, Hadley D, Liu R, Glessner J, et al. (2007) PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome research 17: 1665–1674.

63. Colella S, Yau C, Taylor JM, Mirza G, Butler H, et al. (2007) QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic acids research 35: 2013–2025.

64. Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, et al. (2008) Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature genetics 40: 1253–1260.

65. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. American journal of human genetics 81: 559–575.

66. Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, et al. (2009) DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. American journal of human genetics 84: 524–533.

67. Sanders SJ, Ercan-Sencicek AG, Hus V, Luo R, Murtha MT, et al. (2011) Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70: 863–885.

68. Malhotra D, McCarthy S, Michaelson JJ, Vacic V, Burdick KE, et al. (2011) High frequencies of de novo CNVs in bipolar disorder and schizophrenia. Neuron 72: 951–963.

69. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics 38: 904–909.

70. Price AL, Tandon A, Patterson N, Barnes KC, Rafaels N, et al. (2009) Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet 5: e1000519. doi:10.1371/journal.pgen.1000519