Research Article

Self-declared ethnicity and genomic ancestry in prostate cancer patients from Brazil


Some studies of polymorphisms in prostate cancer (PCa) analyze individuals in a uniform manner, regardless of genetic ancestry. However, PCa aggressiveness differs between subjects of African descent and those of European extraction. Thus, genetic ancestry analysis may be used to detect population stratification in case-control association studies. We genotyped 11 ancestry informative markers to estimate the contributions of African, European, and Amerindian ancestries in a case-control sample of 213 individuals from Bahia State, Northeast Brazil, including 104 PCa patients. We compared these data with self-reported ancestry and with the stratification of cases by PCa aggressiveness according to Gleason score. A larger African genetic contribution (44%) was detected among cases, and a greater European contribution (61%) among controls. Self-declaration data revealed that 74% of PCa patients considered themselves non-white (black and brown), and 41.3% of controls viewed themselves as white. Our data showed a higher degree of European ancestry among fast-growing cancer cases than among those of intermediate and slow development. This differs from many previous studies, in which the prevalence of African ancestry has been reported for all grades. Differences were observed between degrees of PCa aggressiveness in terms of genetic ancestry. In particular, the greater European contribution among patients with high-grade PCa indicates that a population’s genetic structure can influence case-control studies. This investigation contributes to our understanding of the genetic basis of tumor aggressiveness among groups of different genetic ancestries, especially admixed populations, and has significant implications for the assessment of inter-population heterogeneity in drug treatment effects.


The Brazilian population is one of the most admixed in the world, owing to five centuries of interethnic unions between Amerindian, European, and African groups (Barbosa 1994; Alves-Silva et al., 2000; Andrade 2003; Andrade and Rocha 2005; Lins et al., 2010). However, stratification based on skin color does not reflect actual ancestral genetic contribution (Parra et al., 2003; Pimenta et al., 2006; Suarez-Kurtz et al., 2007; Santos et al., 2009; Beuten et al., 2011; Lins et al., 2011; Pena et al., 2009, 2011).

Several studies have demonstrated relationships between particular ethnic groups and certain diseases, such as cancer. For example, using a meta-analysis, Zhao et al. (2015) found that the HNF1B rs4430796 (A>G) polymorphism decreases prostate cancer (PCa) risk among Caucasians, Americans, and Asians.

According to the Instituto Nacional de Câncer, approximately 61,200 new PCa cases will be diagnosed in Brazil in 2016. This corresponds to an estimated incidence of 61.82 new cases per 100,000 men (Ministério da Saúde/Instituto Nacional de Câncer José Alencar Gomes da Silva, 2015). Previous studies have shown that polymorphisms implicated in PCa are more common among individuals of African descent than among those with European ancestry (Bonilla et al., 2011; Okobia et al., 2011; Ricks-Santi et al., 2012). Ricks-Santi et al. (2012) examined 77 single nucleotide polymorphisms (SNPs) in genes encoding ribonuclease L (RNASEL), vitamin D receptor (VDR), and cytochrome P450 3A5 (CYP3A5). Of these variants, eight were significantly associated with PCa risk in African Americans. However, the relationship between PCa and genomic ancestry in the Brazilian population is poorly understood, and no accurate information exists concerning differences between ethnic groups in this respect.

One strategy applied to eliminate the confounding effect that population stratification introduces to case-control studies involves the use of ancestry informative markers (AIMs). Previous studies using AIMs have revealed the susceptibility of PCa association analyses to such confounding influence, and the need to take into account substructure caused by differences in relative ancestral genetic contributions between cases and controls (Erdei et al., 2011; Ricks-Santi et al., 2012; Petrovics et al., 2015). Prior investigations have explored the relationship between “skin color” and genomic ancestry estimated by AIMs (Parra et al., 2003; Reddy et al., 2003; Pimenta et al., 2006; Suarez-Kurtz et al., 2007; Whitman et al., 2010; Lins et al., 2011; Ricks-Santi et al., 2012). In the present study, we evaluated genomic ancestry in a sample of PCa patients and controls from the southern region of Bahia State, Northeast Brazil, and its correlation with self-declared “race”.


Ethical aspects

This study was approved by the Research Ethics Committee of Universidade Estadual de Santa Cruz (CEP/UESC No. 11/2011), and informed consent was obtained from each volunteer. The cases comprised 104 men diagnosed with PCa, recruited at the ONCOSUL Clinic in Itabuna and the Oncology Clinic of Ilhéus - CLIONI. The control sample comprised 109 men enrolled following public campaigns. To be included in the control group, individuals had to have no family history of cancer up to the 2nd degree of kinship, and a prostate-specific antigen (PSA) level < 3 ng/mL. For data analysis, skin colors were grouped into white (European descendants) and non-white (other descents), according to self-reported categories. Notably, the official Brazilian census categories consist of white, brown, black, yellow, and indigenous; however, none of the participants declared themselves as “yellow” or “indigenous”.

Collection of biological samples and genotyping

Blood samples were collected over an 18-month period (2008-2010). Peripheral blood (5 mL) was obtained from a single venipuncture and separated into two vacuum tubes: one dry and the other containing anticoagulant. Blood from the former was used for total PSA measurement, whereas the latter was taken to Laboratório de Farmacologia e Epidemiologia Molecular (LAFEM) at the UESC for DNA isolation using a FlexiGene DNA Kit (QIAGEN, Hilden, Germany). DNA was quantified with a GeneQuant spectrophotometer (GE Healthcare, Little Chalfont, UK).

To estimate genomic ancestry, 11 AIMs were selected based on allele frequency differences (δ) greater than 40% between the parental groups (Hoggart et al., 2003; Luizon et al., 2008). Because these AIMs exhibit large differences in allele frequencies between parental populations, they can be used to characterize the genetic composition of admixed populations (Parra et al., 2003; Beuten et al., 2011; Pena et al., 2011). Genotyping was performed as described by Shriver et al. (2003) and Luizon et al. (2008). Conventional polymerase chain reaction (PCR) was used to identify insertions/deletions and Alu insertions (APOA1 - rs3138522, SB19.3*1 - rs3138524, MID-52 - rs16344, PV92 - rs3138523, MID-575 - rs140864, and AT3 - rs3138521), while PCR-restriction fragment length polymorphism was employed to detect the LPL - rs285, RB2300 - rs2252544, DRD2 - rs1079598, and CKMM - rs4884 SNPs. Both analyses were performed for all samples. Amplified products were separated by electrophoresis on 2 or 3% agarose gels and stained with GelRed, according to the protocol established in the LAFEM. For MID AIMs, fragments were separated on a 12% denaturing polyacrylamide gel and stained with silver nitrate.
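The selection criterion above can be illustrated with a minimal sketch, assuming δ is the absolute difference in allele frequency between two parental populations. The reference frequencies are copied from Table 2 for three markers; the function names and the use of "greater than or equal to" for the 40% cut-off are illustrative assumptions, not the authors' software.

```python
from itertools import combinations

# Parental allele frequencies for three markers, copied from Table 2
# (AFR = African, EUR = European, AMR = Amerindian).
PARENTAL_FREQS = {
    "rs3138521": {"AFR": 0.854, "EUR": 0.262, "AMR": 0.082},
    "rs285": {"AFR": 0.970, "EUR": 0.470, "AMR": 0.449},
    "rs2252544": {"AFR": 0.930, "EUR": 0.310, "AMR": 0.186},
}


def pairwise_delta(freqs):
    """Return delta = |p_i - p_j| for every pair of parental populations."""
    return {
        f"{a}/{b}": round(abs(freqs[a] - freqs[b]), 3)
        for a, b in combinations(freqs, 2)
    }


def is_informative(freqs, threshold=0.40):
    """Assumed rule: a marker qualifies if any population pair reaches the threshold."""
    return max(pairwise_delta(freqs).values()) >= threshold


deltas = {marker: pairwise_delta(f) for marker, f in PARENTAL_FREQS.items()}
informative = [m for m, f in PARENTAL_FREQS.items() if is_informative(f)]

for marker, values in deltas.items():
    print(marker, values)
```

For rs3138521, this gives δ values of 0.592 (AFR/EUR), 0.772 (AFR/AMR), and 0.180 (EUR/AMR), matching the corresponding row of Table 2.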

Gleason score

Gleason scores were determined from prostate biopsies and reflect tumor growth rate and metastatic tendency. Gleason patterns describe PCa architecture and range from 1 (well differentiated, least aggressive) to 5 (poorly differentiated, most aggressive). The two most common Gleason patterns observed in each specimen were added to give a total score ranging from 2 (1 + 1) to 10 (5 + 5). In this study, we categorized the samples as follows: 1) low aggressiveness, Gleason score 2-4 (well differentiated); 2) intermediate aggressiveness, Gleason score 5-6 (moderately differentiated); and 3) high aggressiveness, Gleason score 7-10 (undifferentiated).
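The scoring and grouping rules described above can be written as a short sketch. The function names are hypothetical and serve only to make the cut-offs explicit; the thresholds themselves (2-4, 5-6, 7-10) are those stated in the text.

```python
def gleason_score(primary, secondary):
    """Sum the two most common Gleason patterns, each graded from 1 to 5."""
    for pattern in (primary, secondary):
        if pattern not in range(1, 6):
            raise ValueError(f"Gleason pattern must be between 1 and 5, got {pattern}")
    return primary + secondary


def aggressiveness(score):
    """Map a total Gleason score (2-10) to the three groups used in this study."""
    if not 2 <= score <= 10:
        raise ValueError(f"Gleason score must be between 2 and 10, got {score}")
    if score <= 4:
        return "low"
    if score <= 6:
        return "intermediate"
    return "high"


# Example: patterns 3 and 4 give a score of 7, which falls in the high group.
example_score = gleason_score(3, 4)
example_group = aggressiveness(example_score)
print(example_score, example_group)
```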

Data analysis

Prism 6.02 (GraphPad Software, La Jolla, CA, USA) was used to calculate the mean demographic characteristics of the study population. Allele frequencies for each locus were estimated by direct counting using FSTAT version 2.8 (Goudet, 1995). The GENEPOP program (Raymond and Rousset, 1995) was used to test conformance to Hardy-Weinberg equilibrium (HWE) and to assess population differentiation. Estimates of ethnic proportions were calculated by the gene identity method using ADMIX95.
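As an illustration of two of the steps named above, the following sketch estimates an allele frequency by direct counting and checks HWE with a one-degree-of-freedom chi-square test. This is a simplified stand-in and not the procedure implemented in GENEPOP; the genotype counts are hypothetical and the function names are assumptions made for the example.

```python
import math


def allele_frequency(n_11, n_12, n_22):
    """Frequency of allele *1 by direct count: two copies per *1/*1, one per *1/*2."""
    n = n_11 + n_12 + n_22
    return (2 * n_11 + n_12) / (2 * n)


def hwe_chi_square(n_11, n_12, n_22):
    """Chi-square goodness-of-fit test of genotype counts against HWE (1 df)."""
    n = n_11 + n_12 + n_22
    p = allele_frequency(n_11, n_12, n_22)
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("HWE test is undefined for a monomorphic locus")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_11, n_12, n_22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for one biallelic marker (not study data).
freq = allele_frequency(30, 50, 20)
chi2, p_value = hwe_chi_square(30, 50, 20)
print(f"p(*1) = {freq:.3f}, chi2 = {chi2:.4f}, P = {p_value:.3f}")
```

A small P value would indicate departure from HWE at that locus, which is one of the signals of population substructure considered in this type of analysis.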


The general characteristics of the study sample are presented in Table 1.

Table 1. General characteristics of the study sample.

Variable                  Controls     Cases
Mean age (years)          56 ± 9.9     74.4 ± 7.9
Gleason score             -            5.9 ± 1.4
PSA (ng/mL, mean ± SD)    <3ᵃ          32 ± 150
White, n (%)              45 (41.3)    27 (26)
Non-white, n (%)          64 (58.7)    77 (74)

PSA = prostate-specific antigen; SD = standard deviation. ᵃLevels lower than 3 ng/mL.

Allele frequencies of the 11 AIMs among the present samples differed from those described in the literature for African, European, and Amerindian ancestral populations (Table 2). Most *1 allele (insertion or absence of a restriction site) frequencies in our study sample were intermediate compared to those described in the ancestral populations, suggesting that admixture has occurred in the case and control groups. Furthermore, two markers (rs140864 and rs4884) were found at frequencies similar to those associated with European descent, and the PV92 (rs3138523) AIM was present at a rate close to that observed among Africans. In addition, MID-52 (rs16344) was the only site that demonstrated a frequency close to that observed in the Amerindian population (Table 2).

Table 2. Ancestry informative marker allele frequencies in ancestral populations and case and control groups, and differential frequencies (δ) between ethnic groups worldwide (African¹, European¹, and Amerindian¹,²).

                         Allele frequency                           δ
AIM        Type/allele   Cases      Controls   AFR    EUR    AMR    AFR/EUR  AFR/AMR  EUR/AMR
                         (N = 104)  (N = 109)
rs3138521  Indel/ins     0.490      0.375      0.854  0.262  0.082  0.592    0.772    0.180
rs285      SNP/T         0.630      0.581      0.970  0.470  0.449  0.500    0.521    0.021
rs2252544  SNP/G         0.675      0.631      0.930  0.310  0.186  0.620    0.744    0.124
rs3138524  Alu/ins       0.617      0.682      0.418  0.927  0.649  0.509    0.231    0.278
rs16383    Indel/del     0.569      0.449      0.261  0.815  0.081  0.554    0.180    0.734
rs3138522  Alu/ins       0.853      0.896      0.407  0.937  0.974  0.530    0.567    0.037
rs1079598  SNP/C         1.000      1.000      0.063  0.144  0.665  0.081    0.602    0.521
rs4884     SNP/T         0.368      0.300      0.164  0.313  0.904  0.149    0.740    0.591
rs16344    Indel/del     0.760      0.808      0.363  0.077  0.763  0.286    0.400    0.686
rs3138523  Alu/ins       0.203      0.264      0.212  0.160  0.776  0.052    0.564    0.616
rs140864   Indel/ins     0.098      0.047      0.124  0.004  0.584  0.120    0.460    0.580

Ancestry estimate (%)
European                 46 ± 3     61 ± 2
African                  44 ± 2     33 ± 1
Amerindian               10 ± 2     6 ± 1
R²                       99.8       99.92      P = 0.084

AIM = ancestry informative marker; indel = insertion/deletion; SNP = single nucleotide polymorphism; ins = insertion; del = deletion; AFR = African; EUR = European; AMR = Amerindian; R² = adjustment coefficient for the admixture model. ¹Shriver et al. (2003); ²Luizon et al. (2008).

The 11 selected markers were found to be informative and able to differentiate between the ancestral populations. High δ values were recorded between the Amerindian and African ancestral groups, indicating that these two populations are the most distantly related to each other.

Estimates of ancestry according to AIMs were as follows for the case and control groups, respectively: European, 46 and 61%; African, 44 and 33%; and Amerindian, 10 and 6% (Table 2). Differences were observed when comparing AIM-based admixture estimates and ethnicity evaluated by self-reported skin color (Table 3). The former indicated a greater European contribution of 46 and 61%, whereas the latter suggested a higher non-white contribution of 74 and 58.7% among cases and controls, respectively.
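To make the idea of an admixture estimate concrete, the sketch below models each observed allele frequency as a weighted average of the three parental frequencies and searches for the weights that minimize the squared error. This is a simple least-squares grid search, not the gene identity method implemented in ADMIX95, so it is not expected to reproduce the estimates reported here. Parental frequencies for five markers are copied from Table 2; the function name, the grid resolution, and the synthetic test mixture are assumptions made for the example.

```python
# Parental allele frequencies (AFR, EUR, AMR) for five markers from Table 2.
PARENTAL = {
    "rs3138521": (0.854, 0.262, 0.082),
    "rs285": (0.970, 0.470, 0.449),
    "rs2252544": (0.930, 0.310, 0.186),
    "rs3138524": (0.418, 0.927, 0.649),
    "rs16383": (0.261, 0.815, 0.081),
}


def estimate_admixture(observed, parental=PARENTAL, steps=100):
    """Grid search for (AFR, EUR, AMR) proportions that sum to 1 and minimize
    the sum of squared differences between observed and expected frequencies."""
    best, best_sse = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            sse = 0.0
            for marker, p_obs in observed.items():
                p_exp = sum(w * p for w, p in zip(weights, parental[marker]))
                sse += (p_obs - p_exp) ** 2
            if sse < best_sse:
                best, best_sse = weights, sse
    return best


# Synthetic check: build frequencies from a known mixture and recover it.
TRUE_MIX = (0.50, 0.30, 0.20)  # hypothetical AFR, EUR, AMR proportions
synthetic = {
    marker: sum(w * p for w, p in zip(TRUE_MIX, freqs))
    for marker, freqs in PARENTAL.items()
}
estimate = estimate_admixture(synthetic)
print(dict(zip(("AFR", "EUR", "AMR"), estimate)))
```

The same function could be applied to the case or control column of Table 2, with the caveat that sampling error and the small number of markers make any such estimate approximate.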

Table 3. Genetic ancestry versus self-declared skin color (analysis of variance).

                   Cases (%)               Controls (%)
Contribution       AIMs   “Color”   P      AIMs   “Color”   P      EUR (%)  AFR (%)  AMR (%)
EUR / White        46     26               61     41.3             64.9     24.6     10.5
AFR / Non-white    54     74        0.96   39     58.7      0.98   47.5     46.3     6.2

Stratification of PCa patients by genetic ancestry and Gleason score, a measure of tumor aggressiveness, revealed that in the low and intermediate groups, similar African (46.2 and 46.1%, respectively) and European ancestral contributions (45.9 and 44.5%, respectively) were present. However, in the highly aggressive category, a higher estimate of European (46%) than African (40.7%) genetic ancestry was observed. All categories were associated with a low Amerindian contribution (Table 4).

Table 4. Distribution of genetic ancestry among prostate cancer patients in different Gleason score categories.

Ancestry      Low (%)   Intermediate (%)   High (%)
European      45.9      44.5               46
African       46.2      46.1               40.7
Amerindian    7.8       9.5                13.3

Low score = 2-4; intermediate score = 5-6; high score = 7-10.


In this study, we investigated the relationship between genomic ancestry, self-declared ethnicity, and PCa aggressiveness. The Brazilian population is genetically heterogeneous regarding ancestry, which allowed us to investigate the presence of population substructure and its influence on PCa development.

For this reason, it is crucial to assess the contribution of key groups (African, European, and Amerindian) to case-control samples of admixed populations. The significant values of certain variables in our analysis (HWE, pairwise associations between unlinked loci, and gene differentiation; data not shown) indicated the presence of population substructure. Variations in allele frequencies between subsamples regarding ancestral populations may be explained by the history of Brazil’s formation and recent admixture events. Bahia was the birthplace of Brazilian colonization and has a highly diverse population (Barbosa 1994; Andrade 2003; Andrade and Rocha 2005). Moreover, as ethnicity has been associated with PCa risk (Bonilla et al., 2011; Okobia et al., 2011), sample collection is not a random process.

The predominantly European contributions observed in the case and control groups were similar to those recorded in previous studies that examined AIMs in samples of the Brazilian population. For example, Pena et al. (2011) evaluated genetic ancestry in North, Northeast, Southeast, and South Brazil, finding it to be relatively uniform, and concluding that individuals reporting themselves to be “black” or “brown” in fact have a high European genetic contribution. Furthermore, Xiang et al. (2014) conducted a meta-analysis in which a significant association between the 17q12 rs4430796 polymorphism and PCa risk was identified in both Caucasian and Asian groups, but not in African-Americans.

Further analyses sampling the population of Salvador using autosomal markers and mitochondrial DNA estimated the largest ancestral contribution to be African, followed by European and Amerindian (Abe-Sandes et al., 2010; Felix et al., 2010). Thus, the current results from an all-male sample indicate that Bahia’s southern region presents an ethnic admixture moderately different from that found in the capital. This is consistent with the hypothesis suggested by Azevêdo et al. (1982), which describes a "whitening" phenomenon from the coast to the state’s interior.

Comparison of the AIM estimates of admixture with self-reported skin color showed that participants classified as white had 64.9, 24.6, and 10.5% European, African, and Amerindian contributions, respectively, whereas among non-whites, these values were 47.5, 46.3, and 6.2%, respectively. These findings are similar to those of Paschoalin et al. (2003), who analyzed the influence of ethnicity on PCa prevalence in northeastern Brazil, demonstrating that patients anthropologically classified as white exhibited 67.5% European, 20.8% African, and 11.7% Amerindian contributions, while in the mixed group, these values were 54.8, 36.3, and 8.9%, and among black participants were 45.3, 45.9, and 8.8%, respectively. Despite the small number of individuals analyzed in our study, the ancestry estimates obtained were close to those reported in the 2010 census by the Brazilian Institute of Geography and Statistics (IBGE 2011). Furthermore, as in the study of Brum et al. (2013), in which only 12 markers were used to differentiate ancestral populations, we were able to distinguish between the contributions of ancestral groups using a small number of AIMs.

The dissimilarity between estimated genomic ancestry and self-declared skin color might be explained by the subjectivity with which the latter is classified, such that among “mulatto” individuals, African and European influences may be found, leading to an underestimation principally of European ancestry. This is supported by the fact that the correlation between skin color and genomic ancestry does not apply in admixed populations, such as that found in Brazil (Pena et al., 2009).

We used the Gleason score as a measure of PCa aggressiveness within the patient group. This method is an effective prognostic tool and predictor of disease behavior (Berndt et al., 2015). Comparing genomic ancestry with PCa grade according to Gleason score showed that in cases of low and intermediate aggressiveness, the African genetic contribution was slightly higher than the European proportion. In the highly aggressive group, the largest ancestral contribution was European, followed by African and Amerindian. Thus, patients with low and intermediate Gleason scores had a greater degree of African ancestry, contrary to previous reports of faster PCa progression among black American men (Powell et al., 2010). Furthermore, our analysis showed that patients with rapid PCa growth had a higher European genetic contribution, although several studies have demonstrated that black men are at greater risk of PCa than white men, especially in North America (Bouchardy et al., 1991), and present with a more advanced stage at diagnosis (Paschoalin et al., 2003; Coleman et al., 2008; Rebbeck et al., 2013). Our results can be attributed to the high rate of admixture in the Brazilian population resulting from five centuries of interethnic unions between parental groups. In addition, the genetic etiology of PCa varies between ethnicities, and the frequency of alleles conferring risk in different populations is so variable that the effects of such variants may not be detected in some groups (Rebbeck et al., 2013). Further studies are required to evaluate the association between ethnicity and PCa aggressiveness.

The differences between the genomic contributions among high-grade PCa patients in the present work and those found in other studies cannot be fully explained by population structure. However, this approach holds great promise for further research. In addition, our data show that Brazilian patients should be observed and analyzed differently from those of other populations.

The Brazilian population is genetically heterogeneous due to interethnic differences that have influenced genetic patterns, such as population subdivision and admixture. In this study, we identified an increased contribution of European genetic ancestry among patients with highly aggressive PCa, indicating that ethnic differences are present in admixed populations, and showing that individuals from separate populations should be treated differently in case-control studies. Thus, our results underline the importance of assessing genetic ancestry, since it provides a more accurate evaluation of a population’s genetic structure, which has significant implications for potential inter-group heterogeneity in drug treatment effects.

Study limitations

Although the present study was performed in cancer referral centers in southern Bahia, the sample size was smaller than anticipated due to the great difficulty encountered in recruiting PCa patients, with only a small number of men agreeing to participate. Furthermore, the characterization of an ideal group of matched controls for this pathology was challenging. Consequently, the average age in the control group was lower than among the cases. This difficulty derived from the recruitment of control subjects at health fairs, which attract a younger and more diverse audience. Moreover, the recruitment of healthy men over the age of 60 years was problematic because of their social and cultural background. The inclusion of individuals in the control group based on only the absence of a family history of cancer and PSA levels < 3 ng/mL may have resulted in men with the disease being incorporated as controls, thus introducing bias into our analysis. In addition, only a small number of AIMs were used in our study. However, in an attempt to address this problem, we chose markers with differential frequencies greater than or equal to 40%, increasing their power to distinguish between ancestral populations.