Loci Associated with N-Glycosylation of Human Immunoglobulin G Show Pleiotropy with Autoimmune Diseases and Haematological Cancers


Glycosylation of immunoglobulin G (IgG) influences IgG effector function by modulating binding to Fc receptors. To identify genetic loci associated with IgG glycosylation, we quantitated N-linked IgG glycans using two approaches. After isolating IgG from human plasma, we performed 77 quantitative measurements of N-glycosylation using ultra-performance liquid chromatography (UPLC) in 2,247 individuals from four European discovery populations. In parallel, we measured IgG N-glycans using MALDI-TOF mass spectrometry (MS) in a replication cohort of 1,848 Europeans. Meta-analysis of genome-wide association study (GWAS) results identified 9 genome-wide significant loci (P<2.27×10⁻⁹) in the discovery analysis and two of the same loci (B4GALT1 and MGAT3) in the replication cohort. Four loci contained genes encoding glycosyltransferases (ST6GAL1, B4GALT1, FUT8, and MGAT3), while the remaining 5 contained genes that have not previously been implicated in protein glycosylation (IKZF1, IL6ST-ANKRD55, ABCF2-SMARCD3, SUV420H1, and SMARCB1-DERL3). However, most of these loci have been strongly associated with autoimmune and inflammatory conditions (e.g., systemic lupus erythematosus, rheumatoid arthritis, ulcerative colitis, Crohn's disease, type 1 diabetes, multiple sclerosis, Graves' disease, celiac disease, nodular sclerosis) and/or haematological cancers (acute lymphoblastic leukaemia, Hodgkin lymphoma, and multiple myeloma). Follow-up functional experiments in haplodeficient Ikzf1 knock-out mice showed the same general pattern of changes in IgG glycosylation as identified in the meta-analysis. As IKZF1 was associated with multiple IgG N-glycan traits, we explored the biomarker potential of the affected N-glycans in 101 cases with systemic lupus erythematosus (SLE) and 183 matched controls and demonstrated substantial discriminative power in a ROC-curve analysis (area under the curve = 0.842). Our study shows that it is possible to identify new loci that control glycosylation of a single plasma protein using GWAS.
The results may also provide an explanation for the reported pleiotropy and antagonistic effects of loci involved in autoimmune diseases and haematological cancers.

Published in the journal: PLoS Genet 9(1): e1003225. doi:10.1371/journal.pgen.1003225
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1003225


Introduction

Glycosylation is a ubiquitous post-translational protein modification that modulates the structure and function of the polypeptide components of glycoproteins [1], [2]. N-glycan structures are essential for multicellular life [3]. Mutations in genes involved in the modification of glycan antennae are common and can lead to severe or fatal diseases [4]. Variation in protein glycosylation also has physiological significance, with immunoglobulin G (IgG) being a well-documented example. Each IgG heavy chain carries a single covalently attached biantennary N-glycan at the highly conserved asparagine 297 (Asn297) in the CH2 domain of the Fc region. These oligosaccharides are structurally important for the stability of the antibody and for its effector functions [5]. In addition, some 15–20% of normal IgG molecules carry complex biantennary oligosaccharides in the variable regions of the light or heavy chains [6], [7]. Thirty-six different glycans (Figure 1) can be attached to the conserved Asn297 of the IgG heavy chain [8], [9], so hundreds of different IgG isomers can be generated from this single glycosylation site.
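As a rough check on the "hundreds" figure, the count can be worked out under simple assumptions that are ours rather than the source's: each of the two heavy chains independently carries exactly one of the 36 Asn297 glycans, Fab glycans are ignored, and the two chains are treated as interchangeable (an unordered pair). The helper name `fc_glycoform_pairs` is hypothetical.

```python
from math import comb

# Number of glycans that can occupy Asn297 (stated in the text).
N_ASN297_GLYCANS = 36


def fc_glycoform_pairs(n_glycans: int, ordered: bool = False) -> int:
    """Count Fc glycan combinations on one IgG molecule (hypothetical helper).

    Assumes each of the two heavy chains independently carries exactly one
    of `n_glycans` structures at Asn297 and ignores Fab glycans.
    """
    if n_glycans < 1:
        raise ValueError("n_glycans must be positive")
    if ordered:
        # Chains treated as distinguishable: every (chain 1, chain 2) pair.
        return n_glycans * n_glycans
    # Chains treated as interchangeable: multisets of size 2, i.e. C(n + 1, 2).
    return comb(n_glycans + 1, 2)


print(fc_glycoform_pairs(N_ASN297_GLYCANS))                # 666
print(fc_glycoform_pairs(N_ASN297_GLYCANS, ordered=True))  # 1296
```

Under these assumptions there are C(37, 2) = 666 unordered Fc glycan combinations (1,296 if the chains are treated as distinguishable), which is consistent with the order of magnitude given in the text.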

**Fig. 1. Structures of glycans separated by HILIC-UPLC analysis of the IgG glycome.**

Glycosylation of IgG has important regulatory functions. The association of IgG lacking galactose residues with rheumatoid arthritis was reported nearly 30 years ago [10]. The addition of sialic acid dramatically changes the physiological role of IgG, converting it from a pro-inflammatory to an anti-inflammatory agent [11], [12]. Addition of fucose to the glycan core interferes with the binding of IgG to FcγRIIIa and greatly diminishes its capacity for antibody-dependent cell-mediated cytotoxicity (ADCC) [13], [14]. Structural analysis of the IgG-Fc/FcγRIIIa complex has demonstrated that specific glycans on FcγRIIIa are also essential for this effect of core fucose [15], and removal of core fucose from IgG glycans has been shown to increase the clinical efficacy of monoclonal antibodies by enhancing ADCC-mediated killing [16]–[18].

New high-throughput technologies, such as high- and ultra-performance liquid chromatography (HPLC/UPLC), MALDI-TOF mass spectrometry (MS) and capillary electrophoresis (CE), allow us to quantitate N-linked glycans from individual human plasma proteins. Recently, we performed the first population-based study to demonstrate physiological variation in IgG glycosylation in three European founder populations [19]. Using UPLC, we showed exceptionally high individual variability in the glycosylation of a single protein (human IgG) and substantial heritability of the observed measurements [19]. In parallel, we quantitated IgG N-glycans in another European population (Leiden Longevity Study, LLS) by mass spectrometry. In this study, we combined these high-throughput glycomics measurements with high-throughput genomics to perform the first genome-wide association (GWA) study of the human IgG N-glycome.

Results

Genome-wide association study and meta-analysis

We separated a single protein (IgG) from human plasma and quantitated its N-linked glycans using two state-of-the-art technologies (UPLC and MALDI-TOF MS). Because their comparative advantages in GWA studies were difficult to predict before the analyses, we used both, one in each available cohort. We performed 77 quantitative measurements of IgG N-glycosylation using UPLC in 2,247 individuals from four European discovery populations (CROATIA-Vis, CROATIA-Korcula, ORCADES, NSPHS). In parallel, we measured IgG N-glycans using MALDI-TOF MS in 1,848 individuals from another European population (Leiden Longevity Study, LLS). Descriptions of these population cohorts are found in Table S11. To identify genetic loci involved in IgG glycosylation, we performed a GWA study in both cohorts. Associations at 9 loci reached genome-wide significance (P<2.27×10⁻⁹) in the discovery meta-analysis and at two loci in the replication cohort. The two loci identified in the replication cohort were associated with the analogous glycan traits in the discovery cohorts, as detailed in the subsection “Replication of our findings”. Both UPLC and MS quantitation of N-glycans proved amenable to GWA studies. Because the UPLC study yielded considerably more significant findings than the MS study, most of this Results section focuses on the findings from the discovery cohorts, which were studied using UPLC.

Among the nine loci that passed the genome-wide significance threshold, four contained genes encoding glycosyltransferases (ST6GAL1, B4GALT1, FUT8 and MGAT3), while the remaining five loci contained genes that have not previously been implicated in protein glycosylation (IKZF1, IL6ST-ANKRD55, ABCF2-SMARCD3, SUV420H1-CHKA and SMARCB1-DERL3). As a rule, the implicated genes were associated with several N-glycan traits. The explanation and notation of the 77 N-glycan measures are presented in Table S1; they comprise 23 directly measured quantitative IgG glycosylation traits (shown in Figure 1) and 54 derived traits. Descriptive statistics of these measures in the discovery cohorts are presented in Table S2. GWA analysis was performed in each population separately, and the results were combined in an inverse-variance weighted meta-analysis. Summary data for each gene region showing genome-wide significant association (p<2.27×10⁻⁹) or strongly suggestive association (2.27×10⁻⁹<p<5×10⁻⁸) are presented in Table 1. Summary data for all single-nucleotide polymorphisms (SNPs) and traits with suggestive associations (p<1×10⁻⁵) are presented in Table S3, with population-specific and pooled genomic control (GC) factors reported in Table S4.
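The inverse-variance weighted meta-analysis and the genomic control factors mentioned above can be sketched with the standard fixed-effect formulas: pooled effect Σwβ/Σw with w = 1/SE², pooled SE = 1/√Σw, and λ as the median observed 1-df χ² divided by its expected median (about 0.455). This is a minimal illustration, not the software the authors used; the function names and inputs are hypothetical, and inflating each SE by √λ is a common convention assumed here rather than something the text specifies.

```python
import math
from statistics import NormalDist, median


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis.

    betas, ses: per-cohort effect estimates and their standard errors.
    Returns the pooled effect, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, p


def genomic_control_lambda(z_scores):
    """lambda_GC: median observed 1-df chi-square over its expected median (~0.455)."""
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    return median(z * z for z in z_scores) / expected_median


def gc_corrected_se(se, lam):
    """Inflate an SE so its chi-square is divided by lambda (assumed convention; never deflates)."""
    return se * math.sqrt(max(lam, 1.0))


# Hypothetical per-cohort estimates for one SNP-trait pair (not study data).
betas = [0.42, 0.40, 0.45, 0.20]
ses = [0.05, 0.06, 0.05, 0.15]
print(ivw_meta(betas, ses))
```

Cohorts with small standard errors dominate the pooled estimate, which is why a small cohort with an outlying effect (the last entry above) shifts the result only slightly.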

Tab. 1. A complete list of genetic markers that showed genome-wide significant (P<2.27E-9) or strongly suggestive (P≤5E-08) association with glycosylation of Immunoglobulin G analysed by UPLC in the discovery meta-analysis.

The most statistically significant association was observed in a region on chromosome 3 containing the gene ST6GAL1 (Table 1, Figure S1A). ST6GAL1 codes for the enzyme β-galactoside α-2,6-sialyltransferase 1, which adds sialic acid to various glycoproteins including IgG glycans (Figure 2), and is therefore a highly biologically plausible candidate. In this region of about 70 kilobases (kb) we identified 37 genome-wide significant SNPs associated with 14 different IgG glycosylation traits, generally reflecting sialylation of different glycan structures (Table 1). The strongest association was observed for the percentage of monosialylation of fucosylated digalactosylated structures in total IgG glycans (IGP29; see Figure 1 and Table S1 for notation), for which SNP rs11710456 explained 17%, 16%, 18% and 3% of the trait variation in CROATIA-Vis, CROATIA-Korcula, ORCADES and NSPHS, respectively (meta-analysis p = 6.12×10⁻⁷⁵). NSPHS had a very small sample size in this analysis (N = 179), so its estimate of the variance explained (3%) may not be accurate. Although the allele frequency is similar across all populations and the NSPHS estimate overlaps with those of the other populations in the forest plot (Figure S1A), its 95% CI is much wider. It is also possible that population-specific genetic and/or environmental differences in NSPHS affect the amount of variance explained by this SNP. After conditioning on the top SNP (rs11710456) in this region, the SNP rs7652995 still reached genome-wide significance (p = 4.15×10⁻¹³); after additionally adjusting for rs7652995, the association peak was completely removed. This suggests that at least two independent genetic factors underlie this association. Conditional analysis of all other significant and suggestive regions resulted in the complete removal of the association peak.
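The percentages of trait variation explained by a single SNP can be related to per-allele effect sizes through the standard additive-model approximation r² ≈ 2p(1−p)β²/Var(y), which assumes Hardy-Weinberg equilibrium; for a single-SNP regression, r² also follows exactly from the t statistic as t²/(t² + N − 2). The sketch below is illustrative only: the function names and inputs are hypothetical, and it does not reproduce how the authors computed the values above.

```python
def variance_explained(beta, allele_freq, trait_var=1.0):
    """Share of trait variance explained by one SNP under an additive model.

    beta: per-allele effect in trait units; allele_freq: effect-allele frequency.
    Assumes Hardy-Weinberg equilibrium, so genotype variance is 2p(1 - p).
    """
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta ** 2 / trait_var


def r2_from_t(t_stat, n):
    """Exact R^2 of a single-predictor linear regression from its t statistic."""
    return t_stat ** 2 / (t_stat ** 2 + n - 2)


# Hypothetical inputs: a standardised trait and a common SNP.
print(round(variance_explained(beta=0.6, allele_freq=0.45), 3))
print(round(r2_from_t(t_stat=6.0, n=179), 3))
```

Because the genotype variance 2p(1−p) is similar when allele frequencies are similar, differences in variance explained between cohorts mostly reflect differences in the estimated effect β or in trait variance, both of which are noisier in a small cohort.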

**Fig. 2. A summary of changes to IgG N-glycan structures that were associated with 16 loci identified through GWA study.**

We also identified 28 SNPs showing genome-wide significant associations with 11 IgG glycosylation traits (2.70×10⁻¹¹<p<4.73×10⁻⁸) at a locus on chromosome 9 spanning over 60 kb (Figure S1B). This region includes B4GALT1, which codes for the galactosyltransferase responsible for the addition of galactose to IgG glycans (Figure 2). The glycan traits showing genome-wide association included the percentage of FA2G2S1 in the total fraction (IGP17), the percentage of FA2G2 in the total and neutral fraction (IGP13, IGP53), the percentage of sialylation of fucosylated structures without bisecting GlcNAc (IGP24, IGP26), the percentage of digalactosylated structures in the total neutral fraction (IGP57) and, in the opposite direction, the percentage of bisecting GlcNAc in fucosylated sialylated structures (IGP36–IGP40).

A large (541 kb) region on chromosome 14 harbouring the FUT8 gene contained 167 SNPs showing significant associations with 12 IgG glycosylation traits reflecting fucosylation of IgG glycans (Figure S1C). FUT8 codes for fucosyltransferase 8, an enzyme responsible for the addition of fucose to IgG glycans (Figure 2). The strongest association (1.08×10⁻²²<p<2.60×10⁻¹⁷) was observed with the percentage of A2 glycans in total and neutral fractions (IGP2, IGP42) and for derived traits related to the proportion of fucosylation (IGP58, IGP59 and IGP61; all in the opposite direction). In summary, SNPs at the FUT8 locus influence the proportion of fucosylated glycans, and, in the opposite direction, the percentages of A2, A2G1 and A2G2 glycans which are not fucosylated.

On chromosome 22, two loci were associated with IgG glycosylation. The first region, containing SYNGR1-TAB1-MGAT3-CACNA1I genes, spans over 233 kb. This region harboured 60 SNPs showing genome-wide significant association with 17 IgG glycosylation traits (Figure S1D). Association was strongest between SNP rs909674 and the incidence of bisecting GlcNAc in all fucosylated disialylated structures (IGP40, p = 9.66×10⁻²⁵) and the related ratio IGP39 (p = 8.87×10⁻²⁴). In summary, this locus contained variants influencing levels of fucosylated species and the ratio between fucosylated (especially disialylated) structures with and without bisecting GlcNAc (Figure 2). Since MGAT3 codes for the enzyme N-acetylglucosaminyltransferase III (beta-1,4-mannosyl-glycoprotein-4-beta-N-acetylglucosaminyltransferase), which is responsible for the addition of bisecting GlcNAc to IgG glycans, this gene is the most biologically plausible candidate.

Bioinformatic analysis of known and predicted protein-protein interactions using STRING 9.0 (http://string-db.org/) showed high confidence scores for interactions within the FUT8-B4GALT1-MGAT3 and ST6GAL1-B4GALT1-MGAT3 gene clusters: 0.90 for FUT8-B4GALT1, 0.95 for FUT8-MGAT3, 0.90 for ST6GAL1-B4GALT1 and 0.73 for ST6GAL1-MGAT3. The glycosyltransferase genes at the four GWAS loci (ST6GAL1, B4GALT1, FUT8 and MGAT3) are responsible for adding sialic acid, galactose, fucose and bisecting GlcNAc to IgG glycans, respectively, thus providing proof of principle that a single-protein glycosylation GWAS approach can identify biologically important glycan pathways and their networks. Interestingly, ST6GAL1 has previously been associated with type 2 diabetes [20], MGAT3 with Crohn's disease [21], primary biliary cirrhosis [22] and cardiac arrest [23], and FUT8 with multiple sclerosis, blood glutamate levels [24] and conduct disorder [25] (Table 2). We have recently shown differences in the plasma N-glycan profile between patients with attention-deficit hyperactivity disorder (ADHD), patients with autism spectrum disorders and healthy controls, and identified loci influencing the plasma N-glycome with pleiotropic effects on ADHD [26], [27].

Tab. 2. An analysis of pleiotropy between loci associated with IgG glycans and previously reported disease/trait susceptibility loci, with linkage disequilibrium computed between the most significantly associated SNPs.

Novel candidate genes involved in N-glycosylation

In addition to four loci containing genes for enzymes known to be involved in IgG glycosylation, our study also found five unexpected associations showing genome-wide significance. In the second region on chromosome 22 we observed genome-wide significant associations of 10 SNPs with 20 IgG glycosylation traits. The region spans 49 kb and contains the genes SMARCB1-DERL3 (Figure S1E). The strongest associations (8.63×10⁻¹⁷<p<3.00×10⁻¹³) were observed between SNP rs2186369 and the percentage of FA2[6]BG1 in total and neutral fractions (IGP9, IGP49) and levels of fucosylated structures with bisecting GlcNAc (IGP66, IGP68, IGP70, IGP71 in the same direction and IGP72 in the opposite direction). Thus, the SMARCB1-DERL3 locus appears to specifically influence levels of fucosylated monogalactosylated structures with bisecting GlcNAc (Figure 2). DERL3 is a promising functional candidate, because it encodes a functional component of endoplasmic reticulum (ER)-associated degradation for misfolded luminal glycoproteins [28]. However, SMARCB1 is also known to be important in antiviral activity, inhibition of tumour formation, neurodevelopment, cell proliferation and differentiation [29]. The region has also been implicated in the regulation of γ-glutamyl-transferase (GGT) [30] (Table 2).

A locus on chromosome 7 spanning 26 kb contained 11 SNPs showing genome-wide significant associations with 13 IgG glycosylation traits (Figure S1F). The strongest association (p = 1.87×10⁻¹³) was observed between SNP rs6421315, located in IKZF1, and the percentage of fucosylation of agalactosylated structures without bisecting GlcNAc (IGP63). Thus, SNPs at this locus influence the percentage of non-fucosylated agalactosylated glycans, the fucosylation ratio in agalactosylated glycans (in opposite directions for glycan species with and without bisecting GlcNAc), and the ratio of fucosylated structures with and without bisecting GlcNAc (Figure 2). The IKZF1 gene encodes the DNA-binding protein Ikaros, which acts as a transcriptional regulator and is associated with chromatin remodelling. It is considered an important regulator of lymphocyte differentiation and has been shown to influence effector pathways through control of class switch recombination [31], making it a promising functional candidate [32]. There is overwhelming evidence that IKZF1 variants are associated with childhood acute lymphoblastic leukaemia [33], [34] and with several diseases that have an autoimmune component: systemic lupus erythematosus (SLE) [35]–[37], type 1 diabetes [38], [39], Crohn's disease [40] and systemic sclerosis [41]. IKZF1 variants have also been associated with malaria [42] and erythrocyte mean corpuscular volume [43] (Table 2).
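Derived traits such as IGP63 are ratios of directly measured peaks. Their exact definitions are given in Table S1 and are not reproduced here; purely as an illustration, the sketch assumes that the fucosylation of agalactosylated structures without bisecting GlcNAc can be written as FA2 / (FA2 + A2) × 100. The peak percentages and the helper name `fucosylation_share` are hypothetical.

```python
def fucosylation_share(peaks, fucosylated, non_fucosylated):
    """Percentage of fucosylated structures within a chosen subset of peaks.

    peaks: mapping from glycan name to its percentage of total IgG glycans.
    """
    fuc = sum(peaks[name] for name in fucosylated)
    total = fuc + sum(peaks[name] for name in non_fucosylated)
    if total <= 0:
        raise ValueError("selected peaks must have a positive sum")
    return 100.0 * fuc / total


# Hypothetical peak percentages; FA2B (bisected) is deliberately left out
# because the illustrative trait covers structures without bisecting GlcNAc.
peaks = {"A2": 0.5, "FA2": 19.5, "FA2B": 5.0}
print(fucosylation_share(peaks, fucosylated=["FA2"], non_fucosylated=["A2"]))  # 97.5
```

Because such a ratio is bounded near 100% when non-fucosylated species are rare, a small absolute change in a minor peak such as A2 can move the derived trait noticeably, which is one reason derived traits can show associations that the underlying peaks show less clearly.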

SNPs at several other loci also showed genome-wide significant association with a number of different IgG glycosylation traits (Figure S1G–S1P). Chromosome 5 SNP rs17348299, located in IL6ST-ANKRD55, was significantly associated (6.88×10⁻¹¹<p<2.39×10⁻⁹) with six IgG glycosylation traits, including FA2 and FA2G2 in total and neutral fractions (IGP3, IGP13, IGP43, IGP53) and the percentage of agalactosylated and digalactosylated structures in total neutral IgG glycans (IGP55, IGP57) (Figure 2). The protein encoded by IL6ST is a signal transducer shared by many cytokines, including interleukin 6 (IL6), ciliary neurotrophic factor (CNTF), leukaemia inhibitory factor (LIF) and oncostatin M (OSM). Variants in IL6ST have been associated with rheumatoid arthritis and multiple myeloma, but also with components of the metabolic syndrome [44]–[46].

The chromosome 7 SNP rs2072209 located in LAMB1 was strongly suggestively associated with the percentage of fucosylation of digalactosylated (with bisecting GlcNAc) structures (IGP69; p = 1.16×10⁻⁸) (Figure 2). LAMB1 (laminin beta 1) is a member of a family of extracellular matrix glycoproteins that are the major non-collagenous constituent of basement membranes. It is thought to mediate the attachment, migration and organization of cells into tissues during embryonic development by interacting with other extracellular matrix components. It has been associated with ulcerative colitis in several large-scale studies in European and Japanese populations, suggesting that changes in the integrity of the intestinal epithelial barrier may contribute to the pathogenesis of the disease [47]–[51] (Table 2).

Another particularly interesting finding was the suggestive association between rs404256 in the BACH2 gene on chromosome 6 and IGP7, defined as the proportional contribution of FA2[6]G1 to all IgG glycans (p = 7.49×10⁻⁹). BACH2 is a B-cell-specific transcription factor that can act as a suppressor or promoter; among many known functions, it has been shown to “orchestrate” transcriptional activation of B cells, modify the cytotoxic effects of anticancer drugs and regulate IL-2 expression in umbilical cord blood CD4⁺ T cells [52]. BACH2 has previously been associated with a spectrum of diseases with an autoimmune component: type 1 diabetes [53]–[56], Graves' disease [57], celiac disease [58], Crohn's disease [21] and multiple sclerosis [59] (Table 2).

The chromosome 11 SNP rs4930561, located in the SUV420H1-CHKA region, was associated with the percentage of FA1 in neutral (IGP41; p = 8.88×10⁻¹⁰) and total (IGP1; p = 1.30×10⁻⁸) fractions of IgG glycans. SUV420H1 codes for a histone-lysine N-methyltransferase that specifically trimethylates lysine 20 of histone H4 and could therefore affect the activity of many different genes; it is thought to be involved in proviral silencing in somatic and germ line cells through epigenetic mechanisms [60]. CHKA has a key role in phospholipid biosynthesis and may contribute to tumour cell growth. We recently reported a number of strong associations between lipidomics and glycomics traits in human plasma [61]. Thus, an enzyme involved in phospholipid synthesis is also a plausible candidate, because the lipid environment is known to affect glycosyltransferase activity [61].

Three further loci were identified as strongly suggestive through GWAS and deserve attention for their possible pleiotropic effects. SNP rs9296009 in PRRT1 (proline-rich transmembrane protein 1) was associated with IGP23 (p = 3.79×10⁻⁸), and variants in PRRT1 have previously shown associations with nodular sclerosis and Hodgkin lymphoma [62]. Moreover, rs1049110 in HLA-DQA2-HLA-DQB2 was associated with IGP2 and IGP42 (p = 1.64×10⁻⁸ and 4.44×10⁻⁸, respectively). This SNP is in nearly complete linkage disequilibrium with two other SNPs in this region that have previously been associated with SLE and hepatitis B [63] (Table 2). Another SNP in this region has been linked with narcolepsy [64]. Finally, rs7224668 in SLC38A10, a putative sodium-dependent amino acid/proton antiporter, was associated with IGP31 (p = 3.33×10⁻⁸). Although the function of this gene is not understood, it has been associated with autism and longevity [65], [66].
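Statements such as "nearly complete linkage disequilibrium" usually refer to the r² measure between two biallelic SNPs, r² = D² / [pA(1−pA)pB(1−pB)] with D = pAB − pA·pB. The sketch below computes it from haplotype and allele frequencies. The text does not state which LD measure, reference panel or software underlies Table 2, so treat this as an illustration of the standard definition only; the function name and frequencies are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 linkage disequilibrium between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2;
    p_a, p_b: frequencies of A and B.
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b  # the LD coefficient D
    return d * d / denom


# Hypothetical frequencies: A and B almost always occur on the same haplotype.
print(round(ld_r2(p_ab=0.29, p_a=0.30, p_b=0.30), 3))
```

An r² close to 1 means one SNP is an almost perfect proxy for the other, so an association reported for one SNP can be expected at the other, which is the basis for the pleiotropy comparisons above.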

The remaining three signals implicated the ABCF2-SMARCD3 region (rs1122979 was associated with IGP2, IGP5, IGP42 and IGP45, with p-values ranging from 2.10×10⁻¹⁰ to 1.89×10⁻⁹), RECK (rs4878639 was suggestively associated with IGP17; p = 3.51×10⁻⁸) and PEX5 (rs12828421 was suggestively associated with IGP41; p = 4.48×10⁻⁸). The function of ABCF2 (ATP-binding cassette, sub-family F, member 2) is not well understood. SMARCD3 stimulates nuclear receptor-mediated transcription; it belongs to the neural progenitor-specific chromatin remodelling complex (npBAF complex) and the neuron-specific chromatin remodelling complex (nBAF complex). RECK is known to be a strong suppressor of tumour invasion and metastasis, regulating metalloproteinases that are involved in cancer progression. PEX5 binds to the C-terminal PTS1-type tripeptide peroxisomal targeting signal and plays an essential role in peroxisomal protein import (www.genecards.org).

Results from an independent cohort using MS quantitation method

The parallel effort in the outbred Leiden Longevity Study (LLS) was based on a different N-glycan quantitation method (MS). While UPLC groups glycans according to structural similarities, MS groups them by mass. Furthermore, the MS analysis focused on Fc glycans while UPLC measures both Fc and Fab glycans, so traits measured by the two methods could not be compared directly. Glycosylation patterns of IgG1 and IgG2 were investigated by analysis of tryptic glycopeptides, with six glycoforms measured per IgG subclass. The intensities of all glycoforms were expressed relative to the monogalactosylated, core-fucosylated biantennary species, yielding five relative intensities per IgG subclass (Tables S5 and S6). The analysis identified two genome-wide significant loci, MGAT3 (p = 1.6×10⁻¹⁰ for G1FN, analogous to UPLC IGP9; p = 3.12×10⁻⁸ for G0FN, analogous to UPLC IGP5) and B4GALT1 (p = 5.4×10⁻⁸ for G2F, analogous to UPLC IGP13), confirming GWAS signals from the discovery meta-analysis.
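The normalisation described above, in which each glycoform intensity is divided by that of the monogalactosylated, core-fucosylated species so that six glycoforms yield five ratios, can be sketched as follows. The glycoform labels (G0F, G1F, G2F and their bisected counterparts G0FN, G1FN, G2FN), the intensities and the function name are illustrative assumptions; the exact LLS trait set is given in Tables S5 and S6.

```python
def relative_intensities(intensities, reference="G1F"):
    """Divide each glycoform intensity by a reference glycoform's intensity.

    With six glycoforms per IgG subclass this returns five ratios,
    because the reference itself is dropped.
    """
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return {name: value / ref for name, value in intensities.items() if name != reference}


# Hypothetical IgG1 glycopeptide intensities (labels and values assumed).
igg1 = {"G0F": 800.0, "G1F": 1000.0, "G2F": 400.0,
        "G0FN": 60.0, "G1FN": 120.0, "G2FN": 40.0}
ratios = relative_intensities(igg1)
print(len(ratios), ratios["G2F"])  # 5 0.4
```

Expressing every glycoform relative to one internal reference cancels sample-level differences in total signal, at the cost of making each ratio depend on the reference species as well.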

Replication of our findings

We then sought independent replication in the LLS cohort of the other 14 genome-wide significant and strongly suggestive signals identified in the discovery analysis, recognising that the quantitated N-glycan traits do not exactly match between the two cohorts. SNPs were chosen for replication based on the initial meta-analysis of genotype data, before imputation. All five traits measured in the LLS cohort were tested for association with all the selected SNPs (Table S6). We were able to reproduce the associations with ST6GAL1 (p = 8.1×10⁻⁷ for G2F, the substrate for sialyltransferase) and SMARCB1-DERL3 (p = 1.6×10⁻⁷ for G1N, analogous to UPLC IGP9). Weaker but nominally significant associations were observed at IKZF1 (p = 2.3×10⁻³ for G1N), SLC38A10 (p = 4.8×10⁻³ for G2N), IL6ST-ANKRD55 (p = 1.3×10⁻² for G0N) and ABCF2-SMARCD3 (p = 2.7×10⁻² for G2N). That we did not replicate the associations at the other 8 loci was not unexpected, because those 8 loci were associated with UPLC-measured N-glycan traits that have no counterpart among the traits measured by MS (see Table S5 for a comparison of MS and UPLC traits).

Functional experiment: Ikzf1 haplodeficiency results in altered N-glycosylation of IgG

IKZF1 is considered an important regulator of the differentiation of T cells into CD4+ and CD8+ T cells [67]. Since the glycan traits associated with IKZF1 were related to the presence and absence of core fucose and bisecting GlcNAc, we analysed in silico the promoter region of MGAT3 (which codes for the enzyme that adds bisecting GlcNAc to IgG glycans) and identified two binding sites for IKZF1 that were conserved between humans and mice, whereas no recognition sites for IKZF1 were found in the promoter region of FUT8 (which codes for the enzyme that adds core fucose to IgG glycans). Because these binding sites in the MGAT3 promoter were conserved between humans and mice, we used Ikzf1 knock-out mice [68] as a model to study the effects of IKZF1 deficiency on IgG glycosylation. IgG was isolated from the plasma of 5 heterozygous knock-out mice and 5 wild-type controls. A summary of the IgG glycosylation analysis is presented in Table 3, and complete results are presented in Table S7. The empirical version of Hotelling's test demonstrated a globally significant difference (p = 0.03) between the distributions of the IgG glycome in wild-type and heterozygous Ikzf1 knock-out mice. Tests for differences in individual glycome measurements did not reach significance after conservative Bonferroni correction (p < 0.05/77 ≈ 0.00065), but 12 of 77 (about 16%) IgG N-glycan measures showed a nominally significant difference (p<0.05) between the two groups (Table 3).
The significant global test indicates that a difference between the two groups does exist, most likely driven by (at least some of) the measurements that showed nominal significance. The observed alterations in glycome composition were all consistent with a role of IKZF1 in the down-regulation of fucosylation and the up-regulation of the addition of bisecting GlcNAc to IgG glycans.
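The empirical Hotelling's test used in this experiment is not reproduced here. As a simpler stand-in that follows the same logic of a single global test across many glycan measures, the sketch below runs an exact permutation test of a global statistic (the sum over traits of squared standardised mean differences) for two groups of five animals. With 5 versus 5 there are only C(10, 5) = 252 possible group labellings, and because this statistic is unchanged when the two groups are swapped, the smallest attainable p-value is 2/252 ≈ 0.008. The data, function names and choice of statistic are hypothetical; the Bonferroni threshold is the 0.05/77 used in the text.

```python
from itertools import combinations
from statistics import mean, stdev

BONFERRONI_ALPHA = 0.05 / 77  # per-measure threshold, about 0.00065


def global_stat(group1, group2):
    """Sum over traits of squared standardised mean differences.

    Each row is one animal; each column is one glycan measure. The SD is taken
    over both groups pooled, so it is the same for every relabelling.
    """
    total = 0.0
    for j in range(len(group1[0])):
        a = [row[j] for row in group1]
        b = [row[j] for row in group2]
        sd = stdev(a + b)
        if sd > 0:  # constant traits carry no information
            total += ((mean(a) - mean(b)) / sd) ** 2
    return total


def exact_permutation_p(group1, group2):
    """Exact permutation p-value: share of relabellings with a statistic >= observed."""
    pooled = group1 + group2
    observed = global_stat(group1, group2)
    hits = n_labellings = 0
    for idx in combinations(range(len(pooled)), len(group1)):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_labellings += 1
        if global_stat(g1, g2) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return hits / n_labellings


# Hypothetical toy data: 5 wild-type and 5 knock-out mice, two glycan measures.
wild_type = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]
knock_out = [[11.0, 12.0], [12.0, 13.0], [13.0, 14.0], [14.0, 15.0], [15.0, 16.0]]
print(exact_permutation_p(wild_type, knock_out))  # 2/252, about 0.0079
```

With perfectly separated toy groups the test returns exactly that floor of 2/252, which illustrates how little resolution a 5 versus 5 design offers for any single test, and why a global test is more informative here than 77 Bonferroni-corrected comparisons.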

Tab. 3. Twelve groups of IgG N-glycans (of 77 measured) that showed nominally significant difference (p<0.05) in observed values between 5 mice that were heterozygous *Ikzf1* knock-outs (Neo) and 5 wild-type controls (wt).

Investigating the biomarker potential of IgG N-glycans in Systemic Lupus Erythematosus (SLE)

Given that IKZF1 has been convincingly associated with SLE in previous studies [35]–[37], and that functional studies in heterozygous knock-out mice in our study showed clear differences in the profiles of several IgG N-glycan traits, we explored an intriguing hypothesis: that the same IgG N-glycan traits that were affected in Ikzf1 knock-out mice would differ between human SLE cases and controls. If this were true, the pleiotropic effects of IKZF1 on SLE and on IgG N-glycans in human plasma, revealed by independent GWA studies, would point to a novel class of SLE biomarkers, IgG N-glycans, whose usefulness could possibly extend to the prediction of other autoimmune disorders, cancer and neuropsychiatric disorders through the same mechanism.

To test this hypothesis, we measured IgG N-glycans in 101 SLE cases and 183 matched controls (typically two controls per case), recruited in Trinidad (see Materials and Methods for further details). Table 4 shows the results: for 10 of the 12 N-glycan traits chosen on the basis of the experiments in mice (Table 3), there was a statistically significant difference (p<0.05) between SLE cases and controls, which was generally not the case for other groups of N-glycans (data not shown). The entire dataset for all glycans can be found in Table S8. Moreover, some of the differences were highly significant, e.g. p<10⁻¹⁴ for IGP48, p<10⁻¹³ for IGP8 and p<10⁻⁶ for IGP64. Furthermore, the direction of effect observed in mice was strikingly preserved in humans (Table 4). The most significant differences between SLE cases and controls across all 77 IgG N-glycan measurements (Table 4) overlapped well with the 12 N-glycan groups that were changed in the functional experiments in Ikzf1 knock-out mice.

Tab. 4. Groups of IgG N-glycans from Table 3 that showed statistically significant difference in observed values (corrected by sex, age, and African admixture) between 101 Afro-Caribbean cases with SLE and 183 controls.

To strengthen our findings and control for possible bias, we repeated the analysis first excluding all cases on corticosteroid treatment at the time of interview (77/101) and then excluding all cases not on corticosteroid treatment at the time of interview (24/101). Although the power of the analysis decreased owing to the reduced number of cases, the results did not change and remained highly statistically significant. We also considered that the observed glycan changes might not be specific to SLE, but could be caused by corticosteroid treatment or be secondary to any inflammatory process. For this reason, in SLE cases only, we investigated whether corticosteroid treatment and/or C-reactive protein (CRP) measurements were associated with IgG N-glycan traits. The analysis for CRP was repeated with CRP treated as a binary variable (cut-off value 10 mg/L). In all these analyses the initial results held: the association between IgG N-glycans and SLE remained striking, whereas no comparable association was seen with corticosteroid treatment or CRP (Table S9). Finally, we also repeated the analysis adjusting for percent African admixture, as SLE in the Afro-Caribbean population has been reported to be associated with African admixture [69]. This adjustment had only a minor and non-systematic effect on the previous results, and the reported observations remained.

We then validated the biomarker potential of IGP48, the IgG N-glycan trait most significantly associated with SLE status, for prediction of SLE in the 101 cases and 183 matched controls, using the PredictABEL package for R (see Materials and Methods) [70]. As shown in Figure 3, age, sex and African admixture had no predictive power for this disease, but adding IGP48 substantially increased the sensitivity and specificity of prediction, with the area under the receiver operating characteristic (ROC) curve (AUC) increasing from 0.515 (95% confidence interval (CI): 0.441–0.590) to 0.842 (0.791–0.893). Adding other IgG N-glycans might improve prediction further. To cross-validate this result, we split the dataset of SLE cases and controls into a “training set” (2/3; 67 cases and 122 controls) and a “test set” (1/3; 34 cases and 61 controls), and calculated the AUC for the test set. The whole process was repeated 1000 times to compute the mean AUC (and 95% CI) in the test sets. The mean AUC was virtually unchanged compared with the AUC obtained from the complete dataset without training, suggesting that the predictive power of IGP48 for SLE is robust.
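The AUC and the repeated training/test evaluation described above can be illustrated with a minimal single-marker sketch. It computes the AUC as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, and repeats stratified random splits with 34 test cases and 61 test controls, as in the text. This is not PredictABEL and does not fit the covariate model (age, sex, admixture) used in the study: the only training step here is choosing the marker's direction on the training set. The simulated data and function names are hypothetical.

```python
import random


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability that a case outscores a control.

    labels: 1 for cases, 0 for controls; ties count one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(cases) * len(controls))


def repeated_split_auc(scores, labels, n_test_cases, n_test_controls,
                       n_repeats=1000, seed=1):
    """Mean test-set AUC over repeated stratified random splits.

    The marker's direction is chosen on the training part only,
    so the test part never influences the prediction rule.
    """
    rng = random.Random(seed)
    case_idx = [i for i, y in enumerate(labels) if y == 1]
    control_idx = [i for i, y in enumerate(labels) if y == 0]
    test_aucs = []
    for _ in range(n_repeats):
        test = (set(rng.sample(case_idx, n_test_cases))
                | set(rng.sample(control_idx, n_test_controls)))
        train = [i for i in range(len(labels)) if i not in test]
        train_auc = auc([scores[i] for i in train], [labels[i] for i in train])
        sign = 1.0 if train_auc >= 0.5 else -1.0
        test_sorted = sorted(test)
        test_aucs.append(auc([sign * scores[i] for i in test_sorted],
                             [labels[i] for i in test_sorted]))
    return sum(test_aucs) / len(test_aucs)


# Hypothetical simulated marker: 101 cases shifted upwards relative to 183 controls.
sim = random.Random(0)
labels = [1] * 101 + [0] * 183
scores = [sim.gauss(1.0, 1.0) if y == 1 else sim.gauss(0.0, 1.0) for y in labels]
print(round(auc(scores, labels), 3))
print(round(repeated_split_auc(scores, labels, 34, 61, n_repeats=100), 3))
```

For a single marker whose direction is stable, the mean test-set AUC stays close to the full-data AUC, which mirrors the reasoning in the text; a richer model with covariates would need a genuine fitting step (for example logistic regression) on each training set.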

**Fig. 3. Validation of biomarker potential of IGP48 IgG N-glycan percentage in prediction of Systemic Lupus Erythematosus (SLE) in 101 Afro-Caribbean cases and 183 matched controls.**

Discussion

This study clearly demonstrates that recent developments in high-throughput glycomics and genomics now allow identification of genetic loci that control N-glycosylation of a single plasma protein using a GWAS approach. This progress should enable many similar follow-up studies of the genetic regulation of N-glycosylation of other important plasma proteins, bringing unprecedented insights into the role of protein glycosylation in systems biology. As a prelude to this discovery, we recently reported the results of the first GWA study of the overall human plasma N-glycome using the HPLC method. Although that study had a comparable sample size (N∼2000), it identified genome-wide associations only with two glycosyltransferases and one transcription factor (HNF1a) [71]. We believe that the power of that initial study was reduced because N-glycans in human plasma originate from different glycoproteins, on which they have different functions and undergo protein-specific or tissue-specific glycosylation. In this study, the largest percentage of variance explained by a single association was 16–18%, whereas in the plasma N-glycome study it was 1–6%. Furthermore, concentrations of individual glycoproteins in plasma vary in many physiological processes, introducing substantial “noise” into the quantitation of the whole-plasma N-glycome.

In this study we avoided both problems by isolating a single protein from plasma (IgG), which is produced by a single cell type (B lymphocytes), thus effectively excluding differential regulation of gene expression in different tissues, as well as the “noise” introduced by variation in plasma IgG concentration and by N-glycans on other plasma proteins. The only remaining “noise” in our system was the incomplete separation of some glycan structures (which co-eluted from the UPLC column) and the presence of Fab glycans on a subset of IgG molecules, but for the majority of glycan structures this “noise” was well below 10% [19]. We expected that the specificity of our phenotype and the precision of measurement provided by the novel UPLC and MS methods would substantially increase the power of the study to detect genome-wide associations. Before the analysis we could not predict which quantitation method (UPLC or MS) would work better in a GWA study design, so we used both, each in a separate cohort of comparable sample size (N∼2000).

The UPLC method yielded many more, and much stronger, genome-wide association signals than our previous study of the total plasma N-glycome in virtually the same set of examinees [27], [71]. Sixteen loci were associated with glycan traits at p<5×10⁻⁸, and nine reached the strict genome-wide threshold of 2.27×10⁻⁹. The parallel study in the LLS cohort using MS quantitation independently identified two of those 16 loci as showing genome-wide association with N-glycan traits. MS quantitation also allowed us to replicate 6 further loci from the discovery analysis, using comparable N-glycan traits measured by the two methods. However, in this follow-up analysis we were unable to replicate associations for the remaining 8 loci. This was not unexpected: the traits associated with those loci correspond to different fucosylated glycans, and since fucosylation was not quantified by MS, no association between the MS-measured glycans and those regions was expected.

Among the nine loci that reached genome-wide statistical significance, four contained genes encoding glycosyltransferases known to glycosylate IgG (ST6GAL1, B4GALT1, FUT8 and MGAT3). The enzyme beta-1,4-galactosyltransferase 1 is responsible for the addition of galactose to IgG glycans. Interestingly, variants in the B4GALT1 gene were not associated with the main measures of IgG galactosylation, but rather with differences in sialylation and in the percentage of bisecting GlcNAc. These associations are still biologically plausible, because galactosylation is a prerequisite for sialylation, and the enzymes that add galactose and bisecting GlcNAc compete for the same substrate [72]. A potential candidate regulator of B4GALT1 is IL6ST, which encodes the interleukin 6 signal transducer, because it showed stronger associations with the main measures of IgG galactosylation than B4GALT1 itself. The molecular mechanisms behind this association remain elusive, but early work on IL6 (then called PHGF) suggested that it may be relevant for glycosylation pathways in B lymphocytes [73].

Core fucosylation of IgG has been intensively studied because of its role in antibody-dependent cell-mediated cytotoxicity (ADCC). This killing mechanism is considered one of the major mechanisms of action of antibody-based therapeutics against tumours. Core fucose is critically important in this process, because IgGs lacking core fucose on the Fc glycan have been found to have ADCC activity enhanced by up to 100-fold [74]. Alpha-(1,6)-fucosyltransferase (fucosyltransferase 8), encoded by the FUT8 gene, catalyses the transfer of fucose from GDP-fucose to N-linked complex-type glycopeptides. We found that SNPs located near this gene influenced overall levels of fucosylation. The directly measured IgG glycome traits most strongly associated with SNPs in the FUT8 region were A2 and, less strongly, A2G1 and A2G2. These associations are biologically plausible, as these glycans serve as substrates for fucosyltransferase 8. Interestingly, SNPs located near the IKZF1 gene influenced fucosylation of a specific subset of glycans, especially those without bisecting GlcNAc, and were also related to the ratio of fucosylated structures with and without bisecting GlcNAc. This suggests that Ikaros, encoded by IKZF1, may act as an indirect regulator of fucosylation in B lymphocytes by promoting the addition of bisecting GlcNAc, which then inhibits fucosylation. The analysis of IgG glycosylation in Ikzf1 haplodeficient mice confirmed the postulated role of Ikaros in the regulation of IgG glycosylation (Table 3). The effect of Ikzf1 haplodeficiency on IgG glycans manifested mainly as a decrease in bisecting GlcNAc on different glycan structures. An increase in fucosylation was observed only in a subset of structures, but since fucosylation was already very high in wild-type mice (up to 99.8%), a further increase could not have been demonstrated.

Nearly all genome-wide significant loci in our study have previously been clearly associated with autoimmune diseases and haematological cancers, and some of them also with chronic inflammation and/or neuropsychiatric disorders. Although the literature on these associations is extensive, we highlighted only associations identified by genome-wide association studies in datasets independent of our study. We gave prominence to associations arising from GWA studies because they are typically replicable: GWA studies have sufficient power to detect true associations, and they require stringent statistical testing and replication to avoid false-positive results. These associations are reviewed and summarized in Table 2. The table indicates abundant pleiotropy between loci that control N-glycosylation (in this case, of IgG) and loci implicated in many human diseases. Autoimmune diseases (including SLE, RA, UC and over 80 others) are generally thought to be triggered by aggressive responses of the adaptive immune system to self antigens, resulting in tissue damage and pathological sequelae [38]. Among other mechanisms, IgG autoantibodies are responsible for chronic inflammation and destruction of healthy tissues by cross-linking Fc receptors on innate immune effector cells [75]. The class and glycosylation of IgG are important for the pathogenicity of autoantibodies in autoimmune diseases (reviewed in [76]). Removal of IgG glycans leads to loss of proinflammatory activity, suggesting that in vivo modulation of antibody glycosylation might be a strategy to interfere with autoimmune processes [75]. Indeed, removal of IgG glycans by in vivo injection of EndoS interfered with autoantibody-mediated proinflammatory processes in a variety of autoimmune models [75].

Results from our study suggest that IgG N-glycome composition is regulated through a complex interplay between loci affecting an overlapping spectrum of glycome measurements, and through interaction between genes directly involved in glycosylation and genes that presumably have a “higher-level” regulatory function. SNPs at several different loci in this GWA study showed genome-wide significant associations with the same or similar IgG glycosylation traits. For example, SNPs at loci on chromosomes 9 (B4GALT1 region) and 3 (ST6GAL1 region) both influenced the percentage of sialylation of galactosylated fucosylated structures (without bisecting GlcNAc) in the same direction. SNPs at these loci also influenced the ratio of fucosylated monosialylated structures (with and without bisecting GlcNAc) in opposite directions. SNPs at the locus on chromosome 9 (B4GALT1) and at two loci on chromosome 22 (MGAT3 and SMARCB1-DERL3 regions) simultaneously influenced the ratio of fucosylated disialylated structures with and without bisecting GlcNAc. SNPs at loci on chromosomes 7 (IKZF1 region) and 14 (FUT8 region) influenced an overlapping range of traits: the percentage of A2 and A2G1 glycans and, in the opposite direction, the percentage of fucosylation of agalactosylated structures.

Finally, this study demonstrated that “hypothesis-free” GWA studies, when targeted at a well-defined biological phenotype of unknown relevance to human health and disease (such as the N-glycans of a single plasma protein), can implicate genomic loci that were not thought to influence protein glycosylation. Moreover, the unexpected pleiotropy linking the implicated loci to diseases changed this study from “hypothesis-free” to “hypothesis-driven” [77], and led us to explore the biomarker potential of a very specific IgG N-glycan trait for prediction of a specific disease (SLE), with considerable success. To our knowledge, this is one of the first convincing demonstrations that GWA studies can lead to biomarker discovery for human disease. This study offers many additional opportunities to validate further N-glycan biomarkers for other diseases implicated through pleiotropy.

Conclusions

A new understanding of the genetic regulation of IgG N-glycan synthesis is emerging from this study. Enzymes directly responsible for the addition of galactose, fucose and bisecting GlcNAc may not have primary responsibility for the final IgG N-glycan structures. For all three processes, genes that are not directly involved in glycosylation showed the most significant associations: IL6ST-ANKRD55 for galactosylation, IKZF1 for fucosylation, and SMARCB1-DERL3 for the addition of bisecting GlcNAc. The suggested higher-level regulation is also apparent from the differences in Fab and Fc glycosylation observed in human IgG [78], [79] and in different myeloma cell lines [80], and is further supported by the recent observation that various external factors have specific effects on the glycosylation of IgG produced by cultured B lymphocytes [81].

Moreover, this study showed that it is possible to identify loci that control glycosylation of a single plasma protein using a GWAS approach, and to develop a novel class of disease biomarkers. This should lead to major advances in understanding the role of protein glycosylation. This study identified 16 genetic loci that are likely to be part of a much larger genetic network regulating the complex process of IgG N-glycosylation, as well as several further loci that show suggestive association with glycan traits and merit further study. Genetic variants in several of these genes were previously associated with a number of inflammatory, neoplastic and neuropsychiatric diseases across ethnically diverse populations, all of which could benefit from earlier and more accurate diagnosis based on molecular biomarkers. Individual SNPs have relatively small effects, but when several polymorphisms are combined in a complex pathway such as N-glycosylation, the final product of the pathway (in this case, the IgG N-glycan) can differ significantly, with consequences for IgG function and possibly also for disease susceptibility. Our results may also explain the reported pleiotropy and antagonistic genetic effects of loci involved in autoimmune diseases and haematological cancers [39], [77].

Materials and Methods

Ethics statement

All research in this study that involved human participants was approved by the appropriate ethics committees: the Ethics Committee of the University of Split Medical School for all Croatian examinees from the islands of Vis and Korcula; the Local Research Ethics Committees in Orkney and Aberdeen for the Orkney Complex Disease Study (ORCADES); the University of Uppsala (Dnr 2005:325) for all examinees from Northern Sweden; the Leiden University Medical Centre Ethical Committee for all participants in the Leiden Longevity Study (LLS); and the Ethics Committee of the London School of Hygiene and Tropical Medicine for all SLE cases and controls from Trinidad. All ethics approvals were given in compliance with the Declaration of Helsinki (World Medical Association, 2000). All human subjects included in this study signed appropriate informed consent.

Study participants—discovery and replication cohorts

All population studies recruited adult individuals within a community irrespective of any specific phenotype. Fasting blood samples were collected, biochemical and physiological measurements were taken, and questionnaire data on medical history and on lifestyle and environmental exposures were collected following similar protocols. Basic descriptive characteristics of the cohorts are given in Table S11.

The CROATIA-Vis study includes 1008 Croatians, aged 18–93 years, who were recruited from the villages of Vis and Komiža on the Dalmatian island of Vis during 2003 and 2004 within a larger genetic epidemiology program [82]. The CROATIA-Korcula study includes 969 Croatians between the ages of 18 and 98 [83]. The field work was performed in 2007 and 2008 in the eastern part of the island, targeting healthy volunteers from the town of Korčula and the villages of Lumbarda, Žrnovo and Račišće.

The Orkney Complex Disease Study (ORCADES) was performed in the Scottish archipelago of Orkney and collected data between 2005 and 2011 [84]. Data for 889 participants aged 18 to 100 years from a subgroup of ten islands were used for this analysis.

The Northern Swedish Population Health Study (NSPHS) is a family-based population study including a comprehensive health investigation and collection of data on family structure, lifestyle, diet, medical history and samples for laboratory analyses from people living in the north of Sweden [84]. Complete data were available for 179 participants aged 14 to 91 years.

DNA samples were genotyped according to the manufacturer's instructions on Illumina Infinium SNP bead microarrays (HumanHap300v1 for CROATIA-Vis, HumanHap300v2 for ORCADES and NSPHS and HumanCNV370v1 for CROATIA-Korcula). Genotypes were determined using Illumina BeadStudio software. Genotyping was successfully completed on 991 individuals from CROATIA-Vis, 953 from CROATIA-Korcula, 889 from ORCADES and 700 from NSPHS, providing a platform for genome-wide association study of multiple quantitative traits in these founder populations.

The Leiden Longevity Study (LLS) has been described in detail previously [85]. It is a family-based study consisting of 1671 offspring of 421 nonagenarian sibling pairs of Dutch descent, and their 744 partners. A total of 1848 individuals with available genotype and IgG measurement data were included in the current analysis. Within the Leiden Longevity Study, 1345 individuals were genotyped using the Illumina 660W array (Rotterdam, Netherlands) and 503 individuals using the Illumina OmniExpress array (Estonian Biocentre, Genotyping Core Facility, Estonia).

Isolation of IgG and glycan analysis

In the discovery population cohorts (CROATIA-Vis, CROATIA-Korcula, ORCADES and NSPHS), IgG was isolated using protein G plates and its glycans were analysed by UPLC in 2247 individuals, as reported previously [19]. Briefly, IgG glycans were labelled with the fluorescent dye 2-AB and separated by hydrophilic interaction ultra-performance liquid chromatography (UPLC). Glycans were separated into 24 chromatographic peaks and quantified as the relative contribution of each peak to the total IgG glycome. Most peaks contained a single glycan structure, while some contained several. The relative intensities of each glycan structure within each UPLC peak were determined by mass spectrometry, as reported previously [19]. On the basis of these 24 directly measured “glycan traits”, an additional 54 “derived traits” were calculated, including the percentage of galactosylation, fucosylation, sialylation and others, as described in Table S1. When UPLC peaks containing multiple structures were used to calculate derived traits, only glycans with a major contribution to fluorescence intensity were used.
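The normalisation described above (each peak expressed as a share of the total IgG glycome) and the general shape of a derived trait can be sketched as follows. The peak indices passed to `derived_trait` are hypothetical placeholders; the actual definitions of the 54 derived traits are given in Table S1 and are not reproduced here, and modelling a derived trait as the percentage of one peak set within another is an assumption about its general form.

```python
import numpy as np


def relative_abundance(peak_areas):
    """Express each chromatographic peak as a percentage of the total integrated area.

    peak_areas: samples x peaks array of integrated UPLC peak areas (24 peaks in the study).
    """
    areas = np.asarray(peak_areas, dtype=float)
    return 100.0 * areas / areas.sum(axis=1, keepdims=True)


def derived_trait(rel, numerator_peaks, denominator_peaks):
    """Hypothetical derived trait: % contribution of one peak set within another peak set.

    Peak indices are 0-based column positions in `rel`; which peaks belong to each set
    must be taken from the published trait definitions.
    """
    num = rel[:, numerator_peaks].sum(axis=1)
    den = rel[:, denominator_peaks].sum(axis=1)
    return 100.0 * num / den


# Toy example with three peaks per sample (real data would have 24)
rel = relative_abundance([[10.0, 30.0, 60.0], [5.0, 5.0, 10.0]])
share = derived_trait(rel, [2], [1, 2])
```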

In the replication population cohort (Leiden Longevity Study), IgG was isolated from plasma samples of 1848 participants. Glycosylation patterns of IgG1 and IgG2 were investigated by analysis of tryptic glycopeptides using MALDI-TOF MS, which determined six glycoforms per IgG subclass. Because the intensities of all glycoforms were expressed relative to that of the monogalactosylated, core-fucosylated biantennary species (glycoform B), five relative intensities were obtained per IgG subclass [86].
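Expressing the MS glycoform intensities relative to glycoform B, as described above, reduces six measured glycoforms to five ratios per IgG subclass. A minimal sketch follows; apart from glycoform B, the glycoform labels in the example are hypothetical placeholders.

```python
def relative_to_reference(intensities, reference="B"):
    """Divide every glycoform intensity by that of the reference glycoform (one IgG subclass).

    intensities: mapping of glycoform label -> MS signal intensity.
    Returns the ratios for all non-reference glycoforms.
    """
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference glycoform must have a positive intensity")
    return {name: value / ref for name, value in intensities.items() if name != reference}


# Six glycoforms in, five relative intensities out; labels other than "B" are placeholders
igg1 = {"A": 2.0, "B": 4.0, "C": 8.0, "D": 1.0, "E": 4.0, "F": 6.0}
ratios = relative_to_reference(igg1)
```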

Genotype and phenotype quality control

Genotype quality control was performed using the same procedures for all four discovery populations (CROATIA-Vis, CROATIA-Korcula, ORCADES and NSPHS). Individuals with a call rate below 97% were removed, as were SNPs with a call rate below 98% (95% for CROATIA-Vis), a minor allele frequency below 0.02, or a Hardy-Weinberg equilibrium p-value below 1×10⁻¹⁰. In total, 924 individuals from CROATIA-Vis, 898 from CROATIA-Korcula, 889 from ORCADES and 656 from NSPHS passed all quality control thresholds.
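The genotype filters listed above can be sketched as follows. This is an illustration only: it assumes genotypes coded as 0/1/2 allele counts with `NaN` for missing calls, removes samples before evaluating the SNP filters (the order is not stated in the text), and uses a 1-df chi-square Hardy-Weinberg test, whereas production pipelines may use an exact test. For CROATIA-Vis, `snp_min_call` would be set to 0.95.

```python
import math

import numpy as np


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = np.array([n * p * p, 2 * n * p * q, n * q * q])
    observed = np.array([n_aa, n_ab, n_bb], dtype=float)
    if np.any(expected == 0):
        return 1.0  # monomorphic SNP: test undefined, treated as non-significant here
    chi2 = float(np.sum((observed - expected) ** 2 / expected))
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def genotype_qc(G, sample_min_call=0.97, snp_min_call=0.98, min_maf=0.02, min_hwe_p=1e-10):
    """G: samples x SNPs array of allele counts (0, 1, 2), np.nan = missing call.

    Returns boolean masks of retained samples and retained SNPs.
    """
    G = np.asarray(G, dtype=float)
    called = ~np.isnan(G)
    keep_samples = called.mean(axis=1) >= sample_min_call
    G, called = G[keep_samples], called[keep_samples]
    keep_snps = np.zeros(G.shape[1], dtype=bool)
    for j in range(G.shape[1]):
        g = G[called[:, j], j]
        if g.size == 0 or called[:, j].mean() < snp_min_call:
            continue
        freq = g.mean() / 2.0
        maf = min(freq, 1.0 - freq)
        counts = [int((g == k).sum()) for k in (2, 1, 0)]
        keep_snps[j] = maf >= min_maf and hwe_chi2_pvalue(*counts) >= min_hwe_p
    return keep_samples, keep_snps
```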

Extreme outliers (values lying more than 3 times the interquartile range below the 25th percentile or above the 75th percentile) were removed for each glycan measure, to account for errors in quantification and to remove individuals not representative of normal variation within the population. After phenotype quality control, the number of individuals with complete phenotype and covariate information for the meta-analysis was 2247, consisting of 906 men and 1341 women (802 from CROATIA-Vis, 851 from CROATIA-Korcula, 415 from ORCADES and 179 from NSPHS).
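The outlier rule above corresponds to Tukey-style fences at 3 × IQR and can be written as a short function. Setting outliers to `NaN` rather than dropping whole rows is an assumption made here so that an individual's other glycan traits are retained; percentiles use NumPy's default linear interpolation, which may differ slightly from the quantile definition used in the original analysis.

```python
import numpy as np


def remove_extreme_outliers(values, k=3.0):
    """Set to NaN any value more than k * IQR below the 25th or above the 75th percentile."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])  # existing missing values are ignored
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.where((x < lower) | (x > upper), np.nan, x)


cleaned = remove_extreme_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```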

In the Leiden Longevity Study, GenomeStudio was used for genotype calling. Samples were required to have a call rate >95%, and SNPs were excluded if they had a Hardy-Weinberg equilibrium p-value <10⁻⁴, a call rate <95% or a minor allele frequency <1%. The number of overlapping SNPs that passed quality control in both genotyped subsamples was 296,619.

To combine the data from the different array sets and to increase genome coverage to up to 2.5 million SNPs, we imputed autosomal SNPs reported in the CEU sample of the International HapMap Project (release #22, http://hapmap.ncbi.nlm.nih.gov). Based on the SNPs that were genotyped on all arrays and passed quality control, the imputation programs MACH (http://www.sph.umich.edu/csg/abecasis/MACH/) or IMPUTE2 (http://mathgen.stats.ox.ac.uk/impute/impute_v2.html) were used to obtain approximately 2.5 million SNPs for further analysis.

For replication of genome-wide significant hits identified in the discovery meta-analysis, all SNPs listed in were used and looked up in LLS. The only exception was rs11621121, which had low imputation accuracy and did not pass quality control criteria. For this SNP, a set of 11 proxy SNPs from HapMap r. 22 (all with R²>0.85) was studied. All studied SNPs had imputation quality of 0.3 or greater.

Genome-wide association analysis

In the discovery populations, genome-wide association analysis was first performed separately for each population, and the results for all traits were then combined using inverse-variance weighted meta-analysis. Each trait was adjusted for sex, age and the first 3 principal components obtained from the population-specific identity-by-state (IBS) distance matrix. The residuals were transformed to a normal distribution using quantile normalisation. Sex-specific analyses were adjusted for age and principal components only. The residuals, expressed as z-scores, were used for association analysis. The “mmscore” function of ProbABEL [87] was used for the association test under an additive model. This score test for family-based association takes relationship structure into account and allows unbiased estimation of SNP allelic effects when examinees are related. The relationship matrix used in this analysis was generated by the “ibs” function of GenABEL (with the weight = “freq” option), which uses genomic data to estimate the realized pairwise kinship coefficient. All genomic inflation factors (λ) for the population-specific analyses were below 1.05 (Table S4), showing that this method efficiently accounts for family structure.
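Two steps in this paragraph can be sketched briefly: covariate adjustment followed by rank-based transformation of the residuals to normal z-scores, and the genomic inflation factor λ (the median observed 1-df chi-square statistic divided by its expected median of about 0.455). The ordinary least-squares adjustment, the rank offset of 0.5 in the inverse-normal transform and the function names are assumptions; the mmscore association test itself is not reproduced.

```python
from statistics import NormalDist

import numpy as np


def residualize(y, covariates):
    """Residuals of y after OLS on an intercept plus covariates (e.g. sex, age, PC1-PC3)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def rank_inverse_normal(values, offset=0.5):
    """Rank-based inverse normal transform Phi^-1((rank - offset) / n); ties get average ranks."""
    v = np.asarray(values, dtype=float)
    n = v.size
    ranks = np.empty(n)
    ranks[np.argsort(v, kind="mergesort")] = np.arange(1, n + 1)
    # Replace the ranks of tied values by their mean rank
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    ranks = np.bincount(inverse, weights=ranks)[inverse] / counts[inverse]
    nd = NormalDist()
    return np.array([nd.inv_cdf((r - offset) / n) for r in ranks])


def genomic_inflation(z_scores):
    """Genomic control lambda: median observed 1-df chi-square over its expected median."""
    expected_median = NormalDist().inv_cdf(0.75) ** 2  # about 0.455
    chi2 = np.asarray(z_scores, dtype=float) ** 2
    return float(np.median(chi2) / expected_median)
```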

Inverse-variance weighted meta-analysis was performed using the MetABEL package (http://www.genabel.org) for R. SNPs with poor imputation quality (R²<0.3) were excluded prior to meta-analysis. Principal component analysis was performed in R to determine the number of independent traits used in these analyses (Table S10). Twenty-one principal components explained 99% of the variance, so an association was considered statistically significant at the genome-wide level if the p-value for an individual SNP was less than 2.27×10⁻⁹ (5×10⁻⁸/22 traits) [88]. SNPs with p-values between 5×10⁻⁸ and 2.27×10⁻⁹ were considered strongly suggestive. Regions of association were visualized using the web-based software LocusZoom [89] to display the linkage disequilibrium (LD) of each region based on hg18/1000 Genomes June 2010 CEU data. The effect of the most significant SNP in each gene region, expressed as the percentage of variance explained, was calculated for each glycan trait (adjusted for sex, age and the first 3 principal components) in each cohort individually using the “polygenic” function of the GenABEL package for R. Conditional analysis was undertaken for all significant and suggestive regions: GWAS was performed as described above with additional adjustment for the dosage of the top SNP in the region, for the chromosome containing the association only. Subsequent meta-analysis was performed as described above, and the results were visualised using LocusZoom to ensure that the association peak had been removed.
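A minimal sketch of fixed-effect inverse-variance weighted meta-analysis and of the multiple-testing threshold follows; it is not MetABEL's implementation. Counting principal components of the trait correlation matrix (rather than the covariance matrix) is an assumption, and the function names are hypothetical. The threshold quoted in the text divides 5×10⁻⁸ by 22 traits.

```python
import math

import numpy as np


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of per-cohort SNP effects.

    Returns the pooled effect, its standard error and a two-sided normal p-value.
    """
    b = np.asarray(betas, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2  # weight = 1 / SE^2
    beta = float(np.sum(w * b) / np.sum(w))
    se = 1.0 / math.sqrt(float(np.sum(w)))
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p


def n_components_for(traits, var_fraction=0.99):
    """Number of principal components of the trait correlation matrix explaining var_fraction."""
    corr = np.corrcoef(np.asarray(traits, dtype=float), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, var_fraction) + 1)


def genome_wide_threshold(n_effective_traits, alpha=5e-8):
    """Bonferroni-style correction of the conventional GWAS threshold for the number of traits."""
    return alpha / n_effective_traits
```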

In LLS, all IgG measurements were log-transformed. A score statistic testing for an additive effect of a diallelic locus on a quantitative phenotype was used. To account for relatedness in the offspring data, we used the kinship coefficient matrix when computing the variance of the score statistic. Imputation was handled by accounting for the loss of information due to genotype uncertainty [90]. For the association analysis of the GWAS data, we applied the score test for quantitative traits, correcting for sex and age, using the C++ executable QTassoc (http://www.lumc.nl/uh, under GWAS Software). Further details are given in the Supporting Information.
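The kinship-adjusted score test described above can be sketched as follows, under simplifying assumptions that are ours rather than QTassoc's: covariates are removed by ordinary least squares, the phenotypic covariance is modelled as σ²[h²·2Φ + (1 − h²)I], with Φ the kinship coefficient matrix and h² an externally supplied heritability, and imputed genotypes enter as expected allele dosages without the information-loss correction of [90]. For unrelated individuals (Φ = I/2) the sketch reduces to the ordinary score test.

```python
import math

import numpy as np


def kinship_score_test(y, dosage, covariates, kinship, h2):
    """Score test for an additive SNP effect on a quantitative trait in related individuals.

    y          : phenotype, e.g. a log-transformed IgG glycan measure (length n)
    dosage     : expected allele count (0-2) from imputation (length n)
    covariates : n x c matrix, e.g. sex and age
    kinship    : n x n kinship coefficient matrix (0.5 on the diagonal if not inbred)
    h2         : assumed polygenic heritability used to model phenotypic correlation
    Returns the 1-df chi-square statistic and its p-value.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(dosage, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), covariates])
    # Null model: remove covariate effects from phenotype and genotype by OLS projection
    proj = X @ np.linalg.pinv(X)
    y_res = y - proj @ y
    g_res = g - proj @ g
    sigma2 = float(y_res @ y_res) / (n - X.shape[1])
    # Phenotypic correlation under a polygenic model: h2 * 2*Phi + (1 - h2) * I
    R = h2 * 2.0 * np.asarray(kinship, dtype=float) + (1.0 - h2) * np.eye(n)
    U = float(g_res @ y_res)                    # score for the SNP effect
    var_U = sigma2 * float(g_res @ R @ g_res)   # its variance, inflated by relatedness
    chi2 = U * U / var_U
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # 1-df chi-square upper tail
```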

Experiments in Ikzf1 knockout mice

The Ikzf1^+/− mice harbouring the Neo-PAX5-IRES-GFP knock-in allele were obtained from Meinrad Busslinger (IMP, Vienna) and backcrossed to C57BL/6 mice. Both wild-type and Ikzf1Neo^+/− animals, aged about 8 months, were subjected to retro-orbital puncture to collect blood in the presence of EDTA. Samples were centrifuged for 10 minutes at room temperature and plasma was harvested. IgG was isolated and subjected to glycan analyses.

Statistical significance of the difference in IgG glycome distributions between wild-type and Ikzf1^+/− mice was assessed using an empirical (permutation) version of Hotelling's T² test. In brief, the empirical distribution of the Hotelling's T² statistic was obtained by randomly permuting the group labels of the animals without replacement 10,000 times. The original T² value was then compared with this empirical distribution, and the proportion of permuted T² values greater than or equal to the original T² was taken as the empirical p-value.
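The permutation procedure above can be sketched in Python. The pooled-covariance form of the two-sample Hotelling's T² statistic is standard; using a pseudo-inverse is our assumption, added because with few animals and many glycan traits the pooled covariance matrix may be singular. As in the text, the p-value is the proportion of permuted T² values greater than or equal to the observed one.

```python
import numpy as np


def hotelling_t2(a, b):
    """Two-sample Hotelling's T^2 with pooled covariance; rows = animals, columns = glycan traits."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    pooled = ((n1 - 1) * np.atleast_2d(np.cov(a, rowvar=False))
              + (n2 - 1) * np.atleast_2d(np.cov(b, rowvar=False))) / (n1 + n2 - 2)
    # Pseudo-inverse in case the pooled covariance is singular (few animals, many traits)
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.pinv(pooled) @ diff)


def permutation_hotelling(a, b, n_perm=10_000, seed=0):
    """Empirical p-value: share of random label permutations with T^2 >= the observed T^2."""
    rng = np.random.default_rng(seed)
    data = np.vstack([a, b])
    n1 = len(a)
    observed = hotelling_t2(a, b)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(data))  # reassign group labels without replacement
        if hotelling_t2(data[idx[:n1]], data[idx[n1:]]) >= observed:
            hits += 1
    return observed, hits / n_perm
```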

Dataset with SLE cases and matched controls

A total of 101 SLE cases and 183 controls from Trinidad were studied. The inclusion criteria for cases and controls in Trinidad were designed to restrict the sample to individuals without Indian or Chinese ancestry. Cases and controls were eligible if they were resident in northern Trinidad (excluding the southern part of the island, where Indians are in the majority) and had Christian (rather than Hindu, Muslim or Chinese) first names. Cases were identified by contacting all physicians specializing in rheumatology, nephrology and dermatology at the two main public hospitals in northern Trinidad and asking for a list of all SLE patients from their outpatient clinics. At the main dermatology clinic, a register of cases since 1992 was available. Furthermore, a systematic search was performed of (a) outpatient records at the two hospitals, (b) hospital laboratory test results positive for autoantibodies (anti-nuclear or anti-double-stranded DNA antibody titre >1:256) and (c) histological reports of skin biopsy examinations consistent with SLE. Lastly, SLE cases were also identified through the Lupus Society of Trinidad and Tobago (90% of these patients had also been identified through one of the two main public hospitals). For each case, the field team sampled randomly chosen households in the same neighbourhood to obtain (where possible) two controls matched with the case for sex and 20-year age group. Cases and controls were interviewed at home or in the project office using a custom-made questionnaire.

The case definition of SLE was based on American Rheumatism Association (ARA) criteria [91], applied to medical records (available for more than 90% of cases) and to the medical history given by the patient. Informed consent for blood sampling and for the use of the sample for genetic studies, including estimation of admixture, was obtained from each participant. Initial case ascertainment identified 264 possible cases of SLE. Of these, 72 (27%) were excluded either on the basis of their names or because their medical history did not meet ARA criteria for the diagnosis of SLE. Of the remaining 192 individuals, 54 had incomplete addresses or were not resident in northern Trinidad, four were too ill to be interviewed, eight were aged less than 18 years and two refused to participate. For 80% (99/124) of cases, two matched controls were obtained; the response rate among those invited to participate as controls was 70%. The total sample consisted of 124 cases and 219 controls aged over 20 years who completed the questionnaire. Blood samples were obtained from 122 cases and 219 controls, and DNA was successfully extracted from 93% (317/341) of these. IgG glycans were successfully measured in 303 individuals. Age at sampling was not available for 17 individuals, and 2 individuals were excluded because of sample ID mismatches, leaving 284 individuals (101 cases and 183 controls) for analysis.

To test the predictive power of the selected glycan trait, we fitted logistic regression models with and without the glycan and used the predRisk function of the PredictABEL package for R to evaluate predictive ability.

Supporting Information

References

1. OpdenakkerG, RuddPM, PontingCP, DwekRA (1993) Concepts and principles of glycobiology. FASEB Journal 7 : 1330–1337.

2. SkropetaD (2009) The effect of individual N-glycans on enzyme activity. Bioorg. Med.Chem 17 : 2645–2653.

3. MarekKW, VijayIK, MarthJD (1999) A recessive deletion in the GlcNAc-1-phosphotransferase gene results in peri-implantation embryonic lethality. Glycobiology 9 : 1263–1271.

4. JaekenJ, MatthijsG (2007) Congenital disorders of glycosylation: a rapidly expanding disease family. Annu Rev Genomics Hum Genet 8 : 261–278.

5. KobataA (2008) The N-linked sugar chains of human immunoglobulin G: their unique pattern, and their functional roles. Biochim Biophys Acta 1780 : 472–478.

6. JefferisR (2005) Glycosylation of recombinant antibody therapeutics. Biotechnol Prog 21 : 11–16.

7. ZhuD, OttensmeierCH, DuMQ, McCarthyH, StevensonFK (2003) Incidence of potential glycosylation sites in immunoglobulin variable regions distinguishes between subsets of Burkitt's lymphoma and mucosa-associated lymphoid tissue lymphoma. Br J Haematol 120 : 217–222.

8. SuttonBJ, PhillipsDC (1983) The three-dimensional structure of the carbohydrate within the Fc fragment of immunoglobulin G. Biochem Soc Trans 11 : 130–132.

9. HaradaH, KameiM, TokumotoY, YuiS, KoyamaF, et al. (1987) Systematic fractionation of oligosaccharides of human immunoglobulin G by serial affinity chromatography on immobilized lectin columns. Anal Biochem 164 : 374–381.

10. ParekhRB, DwekRA, SuttonBJ, FernandesDL, LeungA, et al. (1985) Association of rheumatoid arthritis and primary osteoarthritis with changes in the glycosylation pattern of total serum IgG. Nature 316 : 452–457.

11. KanekoY, NimmerjahnF, RavetchJV (2006) Anti-inflammatory activity of immunoglobulin G resulting from Fc sialylation. Science 313 : 670–673.

12. AnthonyRM, RavetchJV (2010) A novel role for the IgG Fc glycan: the anti-inflammatory activity of sialylated IgG Fcs. J Clin Immunol 30 Suppl 1: S9–14.

13. NimmerjahnF, RavetchJV (2008) Fcgamma receptors as regulators of immune responses. Nat Rev Immunol 8 : 34–47.

14. FerraraC, StuartF, SondermannP, BrunkerP, UmanaP (2006) The carbohydrate at FcgammaRIIIa Asn-162. An element required for high affinity binding to non-fucosylated IgG glycoforms. J Biol Chem 281 : 5032–5036.

15. MizushimaT, YagiH, TakemotoE, Shibata-KoyamaM, IsodaY, et al. (2011) Structural basis for improved efficacy of therapeutic antibodies on defucosylation of their Fc glycans. Genes to Cells 16 : 1071–1080.

16. ShinkawaT, NakamuraK, YamaneN, Shoji-HosakaE, KandaY, et al. (2003) The absence of fucose but not the presence of galactose or bisecting N-acetylglucosamine of human IgG1 complex-type oligosaccharides shows the critical role of enhancing antibody-dependent cellular cytotoxicity. J Biol Chem 278 : 3466–3473.

17. IidaS, MisakaH, InoueM, ShibataM, NakanoR, et al. (2006) Nonfucosylated therapeutic IgG1 antibody can evade the inhibitory effect of serum immunoglobulin G on antibody-dependent cellular cytotoxicity through its high binding to FcgammaRIIIa. Clin Cancer Res 12 : 2879–2887.

18. PreithnerS, ElmS, LippoldS, LocherM, WolfA, et al. (2006) High concentrations of therapeutic IgG1 antibodies are needed to compensate for inhibition of antibody-dependent cellular cytotoxicity by excess endogenous immunoglobulin G. Mol Immunol 43 : 1183–1193.

19. PucicM, KnezevicA, VidicJ, AdamczykB, NovokmetM, et al. (2011) High throughput isolation and glycosylation analysis of IgG-variability and heritability of the IgG glycome in three isolated human populations. Mol Cell Proteomics 10: M111 010090.

20. KoonerJS, SaleheenD, SimX, SehmiJ, ZhangW, et al. (2011) Genome-wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci. Nat Genet 43 : 984–989.

21. FrankeA, McGovernDP, BarrettJC, WangK, Radford-SmithGL, et al. (2010) Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat Genet 42 : 1118–1125.

22. Mells GF, Floyd JA, Morley KI, Cordell HJ, Franklin CS, et al. (2011) Genome-wide association study identifies 12 new susceptibility loci for primary biliary cirrhosis. Nat Genet 43 : 329–332.

23. Aouizerat BE, Vittinghoff E, Musone SL, Pawlikowska L, Kwok PY, et al. (2011) GWAS for discovery and replication of genetic loci associated with sudden cardiac arrest in patients with coronary artery disease. BMC Cardiovasc Disord 11 : 29.

24. Baranzini SE, Srinivasan R, Khankhanian P, Okuda DT, Nelson SJ, et al. (2010) Genetic variation influences glutamate concentrations in brains of patients with multiple sclerosis. Brain 133 : 2603–2611.

25. Dick DM, Aliev F, Krueger RF, Edwards A, Agrawal A, et al. (2011) Genome-wide association study of conduct disorder symptomatology. Mol Psychiatry 16 : 800–808.

26. Pivac N, Knezevic A, Gornik O, Pucic M, Igl W, et al. (2011) Human plasma glycome in attention-deficit hyperactivity disorder and autism spectrum disorders. Mol Cell Proteomics 10: M110.004200.

27. Huffman JE, Knezevic A, Vitart V, Kattla J, Adamczyk B, et al. (2011) Polymorphisms in B3GAT1, SLC9A9 and MGAT5 are associated with variation within the human plasma N-glycome of 3533 European adults. Hum Mol Genet 20 : 5000–5011.

28. Oda Y, Okada T, Yoshida H, Kaufman RJ, Nagata K, et al. (2006) Derlin-2 and Derlin-3 are regulated by the mammalian unfolded protein response and are required for ER-associated degradation. J Cell Biol 172 : 383–393.

29. Pottier N, Cheok MH, Yang W, Assem M, Tracey L, et al. (2007) Expression of SMARCB1 modulates steroid sensitivity in human lymphoblastoid cells: identification of a promoter SNP that alters PARP1 binding and SMARCB1 expression. Hum Mol Genet 16 : 2261–2271.

30. Chambers JC, Zhang W, Sehmi J, Li X, Wass MN, et al. (2011) Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nat Genet 43 : 1131–1138.

31. Sellars M, Reina-San-Martin B, Kastner P, Chan S (2009) Ikaros controls isotype selection during immunoglobulin class switch recombination. J Exp Med 206 : 1073–1087.

32. Klug CA, Morrison SJ, Masek M, Hahm K, Smale ST, et al. (1998) Hematopoietic stem cells and lymphoid progenitors express different Ikaros isoforms, and Ikaros is localized to heterochromatin in immature lymphocytes. Proc Natl Acad Sci U S A 95 : 657–662.

33. Trevino LR, Yang W, French D, Hunger SP, Carroll WL, et al. (2009) Germline genomic variants associated with childhood acute lymphoblastic leukemia. Nat Genet 41 : 1001–1005.

34. Papaemmanuil E, Hosking FJ, Vijayakrishnan J, Price A, Olver B, et al. (2009) Loci on 7p12.2, 10q21.2 and 14q11.2 are associated with risk of childhood acute lymphoblastic leukemia. Nat Genet 41 : 1006–1010.

35. Cunninghame Graham DS, Morris DL, Bhangale TR, Criswell LA, Syvanen AC, et al. (2011) Association of NCF2, IKZF1, IRF8, IFIH1, and TYK2 with Systemic Lupus Erythematosus. PLoS Genet 7: e1002341 doi:10.1371/journal.pgen.1002341.

36. Han JW, Zheng HF, Cui Y, Sun LD, Ye DQ, et al. (2009) Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. Nat Genet 41 : 1234–1237.

37. Gateva V, Sandling JK, Hom G, Taylor KE, Chung SA, et al. (2009) A large-scale replication study identifies TNIP1, PRDM1, JAZF1, UHRF1BP1 and IL10 as risk loci for systemic lupus erythematosus. Nat Genet 41 : 1228–1233.

38. Davidson A, Diamond B (2001) Autoimmune diseases. N Engl J Med 345 : 340–350.

39. Swafford AD, Howson JM, Davison LJ, Wallace C, Smyth DJ, et al. (2011) An allele of IKZF1 (Ikaros) conferring susceptibility to childhood acute lymphoblastic leukemia protects against type 1 diabetes. Diabetes 60 : 1041–1044.

40. Barrett JC, Hansoul S, Nicolae DL, Cho JH, Duerr RH, et al. (2008) Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat Genet 40 : 955–962.

41. Gorlova O, Martin JE, Rueda B, Koeleman BP, Ying J, et al. (2011) Identification of novel genetic markers associated with clinical phenotypes of systemic sclerosis through a genome-wide association strategy. PLoS Genet 7: e1002178 doi:10.1371/journal.pgen.1002178.

42. Jallow M, Teo YY, Small KS, Rockett KA, Deloukas P, et al. (2009) Genome-wide and fine-resolution association analysis of malaria in West Africa. Nat Genet 41 : 657–665.

43. Ganesh SK, Zakai NA, van Rooij FJ, Soranzo N, Smith AV, et al. (2009) Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat Genet 41 : 1191–1198.

44. Stahl EA, Raychaudhuri S, Remmers EF, Xie G, Eyre S, et al. (2010) Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat Genet 42 : 508–514.

45. Gottardo L, De Cosmo S, Zhang YY, Powers C, Prudente S, et al. (2008) A polymorphism at the IL6ST (gp130) locus is associated with traits of the metabolic syndrome. Obesity (Silver Spring) 16 : 205–210.

46. Birmann BM, Tamimi RM, Giovannucci E, Rosner B, Hunter DJ, et al. (2009) Insulin-like growth factor-1- and interleukin-6-related gene variation and risk of multiple myeloma. Cancer Epidemiol Biomarkers Prev 18 : 282–288.

47. Barrett JC, Lee JC, Lees CW, Prescott NJ, Anderson CA, et al. (2009) Genome-wide association study of ulcerative colitis identifies three new susceptibility loci, including the HNF4A region. Nat Genet 41 : 1330–1334.

48. Silverberg MS, Cho JH, Rioux JD, McGovern DP, Wu J, et al. (2009) Ulcerative colitis-risk loci on chromosomes 1p36 and 12q15 found by genome-wide association study. Nat Genet 41 : 216–220.

49. McGovern DP, Gardet A, Torkvist L, Goyette P, Essers J, et al. (2010) Genome-wide association identifies multiple ulcerative colitis susceptibility loci. Nat Genet 42 : 332–337.

50. Anderson CA, Boucher G, Lees CW, Franke A, D'Amato M, et al. (2011) Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat Genet 43 : 246–252.

51. Asano K, Matsushita T, Umeno J, Hosono N, Takahashi A, et al. (2009) A genome-wide association study identifies three new susceptibility loci for ulcerative colitis in the Japanese population. Nat Genet 41 : 1325–1329.

52. Kamio T, Toki T, Kanezaki R, Sasaki S, Tandai S, et al. (2003) B-cell-specific transcription factor BACH2 modifies the cytotoxic effects of anticancer drugs. Blood 102 : 3317–3322.

53. Grant SF, Qu HQ, Bradfield JP, Marchand L, Kim CE, et al. (2009) Follow-up analysis of genome-wide association data identifies novel loci for type 1 diabetes. Diabetes 58 : 290–295.

54. Cooper JD, Smyth DJ, Smiles AM, Plagnol V, Walker NM, et al. (2008) Meta-analysis of genome-wide association study data identifies additional type 1 diabetes risk loci. Nat Genet 40 : 1399–1401.

55. Barrett JC, Clayton DG, Concannon P, Akolkar B, Cooper JD, et al. (2009) Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat Genet 41 : 703–707.

56. Plagnol V, Howson JM, Smyth DJ, Walker N, Hafler JP, et al. (2011) Genome-wide association analysis of autoantibody positivity in type 1 diabetes cases. PLoS Genet 7: e1002216 doi:10.1371/journal.pgen.1002216.

57. Chu X, Pan CM, Zhao SX, Liang J, Gao GQ, et al. (2011) A genome-wide association study identifies two new risk loci for Graves' disease. Nat Genet 43 : 897–901.

58. Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, et al. (2010) Multiple common variants for celiac disease influencing immune gene expression. Nat Genet 42 : 295–302.

59. Sawcer S, Hellenthal G, Pirinen M, Spencer CC, Patsopoulos NA, et al. (2011) Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476 : 214–219.

60. Matsui T, Leung D, Miyashita H, Maksakova IA, Miyachi H, et al. (2010) Proviral silencing in embryonic stem cells requires the histone methyltransferase ESET. Nature 464 : 927–931.

61. Igl W, Polasek O, Gornik O, Knezevic A, Pucic M, et al. (2011) Glycomics meets lipidomics: associations of N-glycans with classical lipids, glycerophospholipids, and sphingolipids in three European populations. Mol Biosyst 7 : 1852–1862.

62. Cozen W, Li D, Best T, Van Den Berg DJ, Gourraud PA, et al. (2012) A genome-wide meta-analysis of nodular sclerosing Hodgkin lymphoma identifies risk loci at 6p21.32. Blood 119 : 469–475.

63. Chung SA, Taylor KE, Graham RR, Nititham J, Lee AT, et al. (2011) Differential genetic associations for systemic lupus erythematosus based on anti-dsDNA autoantibody production. PLoS Genet 7: e1001323 doi:10.1371/journal.pgen.1001323.

64. Hor H, Kutalik Z, Dauvilliers Y, Valsesia A, Lammers GJ, et al. (2010) Genome-wide association study identifies new HLA class II haplotypes strongly protective against narcolepsy. Nat Genet 42 : 786–789.

65. Celestino-Soper PB, Shaw CA, Sanders SJ, Li J, Murtha MT, et al. (2011) Use of array CGH to detect exonic copy number variants throughout the genome in autism families detects a novel deletion in TMLHE. Hum Mol Genet 20 : 4360–4370.

66. Yashin AI, Wu D, Arbeev KG, Ukraintseva SV (2010) Joint influence of small-effect genetic variants on human longevity. Aging (Albany NY) 2 : 612–620.

67. Prasad RB, Hosking FJ, Vijayakrishnan J, Papaemmanuil E, Koehler R, et al. (2010) Verification of the susceptibility loci on 7p12.2, 10q21.2, and 14q11.2 in precursor B-cell acute lymphoblastic leukemia of childhood. Blood 115 : 1765–1767.

68. Souabni A, Cobaleda C, Schebesta M, Busslinger M (2002) Pax5 promotes B lymphopoiesis and blocks T cell development by repressing Notch1. Immunity 17 : 781–793.

69. Molokhia M, McKeigue P (2006) Systemic lupus erythematosus: genes versus environment in high risk populations. Lupus 15 : 827–832.

70. Kundu S, Aulchenko YS, van Duijn CM, Janssens AC (2011) PredictABEL: an R package for the assessment of risk prediction models. Eur J Epidemiol 26 : 261–264.

71. Lauc G, Essafi A, Huffman JE, Hayward C, Knežević A, et al. (2010) Genomics meets glycomics: the first GWAS study of human N-glycome identifies HNF1alpha as a master regulator of plasma protein fucosylation. PLoS Genet 6: e1001256 doi:10.1371/journal.pgen.1001256.

72. Fukuta K, Abe R, Yokomatsu T, Omae F, Asanagi M, et al. (2000) Control of bisecting GlcNAc addition to N-linked sugar chains. J Biol Chem 275 : 23456–23461.

73. Van Damme J, Opdenakker G, Simpson RJ, Rubira MR, Cayphas S, et al. (1987) Identification of the human 26-kD protein, interferon beta 2 (IFN-beta 2), as a B cell hybridoma/plasmacytoma growth factor induced by interleukin 1 and tumor necrosis factor. J Exp Med 165 : 914–919.

74. Shields RL, Lai J, Keck R, O'Connell LY, Hong K, et al. (2002) Lack of fucose on human IgG1 N-linked oligosaccharide improves binding to human Fcgamma RIII and antibody-dependent cellular toxicity. J Biol Chem 277 : 26733–26740.

75. Albert H, Collin M, Dudziak D, Ravetch JV, Nimmerjahn F (2008) In vivo enzymatic modulation of IgG glycosylation inhibits autoimmune disease in an IgG subclass-dependent manner. Proc Natl Acad Sci U S A 105 : 15005–15009.

76. Baudino L, Azeredo da Silveira S, Nakata M, Izui S (2006) Molecular and cellular basis for pathogenicity of autoantibodies: lessons from murine monoclonal autoantibodies. Springer Semin Immunopathol 28 : 175–184.

77. Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, et al. (2011) Abundant pleiotropy in human complex diseases and traits. Am J Hum Genet 89 : 607–618.

78. Youings A, Chang SC, Dwek RA, Scragg IG (1996) Site-specific glycosylation of human immunoglobulin G is altered in four rheumatoid arthritis patients. Biochem J 314(Pt 2): 621–630.

79. Wormald MR, Rudd PM, Harvey DJ, Chang SC, Scragg IG, et al. (1997) Variations in oligosaccharide-protein interactions in immunoglobulin G determine the site-specific glycosylation profiles and modulate the dynamic motion of the Fc oligosaccharides. Biochemistry 36 : 1370–1380.

80. Mimura Y, Ashton PR, Takahashi N, Harvey DJ, Jefferis R (2007) Contrasting glycosylation profiles between Fab and Fc of a human IgG protein studied by electrospray ionization mass spectrometry. J Immunol Methods 326 : 116–126.

81. Wang J, Balog CI, Stavenhagen K, Koeleman CA, Scherer HU, et al. (2011) Fc-glycosylation of IgG1 is modulated by B-cell stimuli. Mol Cell Proteomics 10: M110.004655.

82. Vitart V, Rudan I, Hayward C, Gray NK, Floyd J, et al. (2008) SLC2A9 is a newly identified urate transporter influencing serum urate concentration, urate excretion and gout. Nat Genet 40 : 437–442.

83. Zemunik T, Boban M, Lauc G, Jankovic S, Rotim K, et al. (2009) Genome-wide association study of biochemical traits in Korcula Island, Croatia. Croat Med J 50 : 23–33.

84. McQuillan R, Leutenegger AL, Abdel-Rahman R, Franklin CS, Pericic M, et al. (2008) Runs of homozygosity in European populations. Am J Hum Genet 83 : 359–372.

85. Schoenmaker M, de Craen AJ, de Meijer PH, Beekman M, Blauw GJ, et al. (2006) Evidence of genetic enrichment for exceptional survival using a family approach: the Leiden Longevity Study. Eur J Hum Genet 14 : 79–84.

86. Ruhaak LR, Uh HW, Beekman M, Koeleman CA, Hokke CH, et al. (2010) Decreased levels of bisecting GlcNAc glycoforms of IgG are associated with human longevity. PLoS ONE 5: e12566 doi:10.1371/journal.pone.0012566.

87. Aulchenko YS, Ripke S, Isaacs A, van Duijn CM (2007) GenABEL: an R library for genome-wide association analysis. Bioinformatics 23 : 1294–1296.

88. Pe'er I, Yelensky R, Altshuler D, Daly MJ (2008) Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol 32 : 381–385.

89. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, et al. (2010) LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26 : 2336–2337.

90. Uh HW, Deelen J, Beekman M, Helmer Q, Rivadeneira F, et al. (2011) How to deal with the early GWAS data when imputing and combining different arrays is necessary. Eur J Hum Genet 20 : 572–576.

91. Tan EM, Cohen AS, Fries JF, Masi AT, McShane DJ, et al. (1982) The 1982 revised criteria for the classification of systemic lupus erythematosus. Arthritis Rheum 25 : 1271–1277.