# Using artificial neural networks to select upright cowpea (*Vigna unguiculata*) genotypes with high productivity and phenotypic stability

### Abstract

Cowpea (*Vigna unguiculata*) is grown in three Brazilian regions (the Midwest, North, and Northeast) and is consumed by people on low incomes. It is important to investigate the genotype × environment (GE) interaction to provide accurate recommendations for farmers. The aims of this study were to identify cowpea genotypes with high adaptability and phenotypic stability for growing in the Brazilian Cerrado, and to compare the use of artificial neural networks (ANNs) with the Eberhart and Russell (1966) method. Six trials with upright cowpea genotypes were conducted in 2005 and 2006 in the States of Mato Grosso do Sul and Mato Grosso. The data were subjected to adaptability and stability analysis by the Eberhart and Russell (1966) method and by ANNs. The genotypes MNC99-537F-4 and EVX91-2E-2 had grain yields above the overall mean of the environments and exhibited high stability according to both methods. Genotype IT93K-93-10 was the most suitable for unfavorable environments. The two methods agreed closely in classifying the genotypes by adaptability and stability. Therefore, this approach can be effective for quantifying the GE interaction in upright cowpea breeding programs.

### INTRODUCTION

Cowpea [*Vigna unguiculata* (L.) Walp.] is one of the most important and strategic food sources in tropical and subtropical regions of the world (Torres et al., 2015a). Brazil is the world's third-largest producer of this crop, which is grown in the Midwest, North, and Northeast of the country and is consumed by people on low incomes (Oliveira et al., 2013). However, Almeida et al. (2012) reported that supply deficits often occur in these regions because the average Brazilian yield is extremely low (300 kg/ha). One way of increasing yield is to identify high-yielding genotypes that are suited to Brazilian soil and climatic conditions (Santos et al., 2014a).

Crop production depends on genetic and environmental factors and on the interactions between them; when significant, these interactions cause genotypes to behave differently under different environmental conditions (Cruz et al., 2012). Therefore, when the magnitude of the genotype × environment (GE) interaction is quantified, the aim is to identify stable genotypes with wide adaptation that can be grown in a range of environments, as well as genotypes adapted to unfavorable environments, which suit small farmers using low-tech equipment, and genotypes responsive to improved environments, which suit farmers using high-tech equipment.

Previous studies have attempted to select cowpea genotypes with both a wide adaptability and a high phenotypic stability in different Brazilian regions (Santos et al., 2014a,b). Several statistical methods have been used, including additive main effect and multiplicative interaction (Santos et al., 2015), a Bayesian approach (Teodoro et al., 2015a,b; Barroso et al., 2016), restricted maximum likelihood/best linear unbiased prediction (Torres et al., 2015b, 2016), and the Eberhart and Russell (1966) method, which is based on linear regression (Almeida et al., 2012; Barros et al., 2013; Nunes et al., 2014). These studies have assisted in the introduction and improvement of cowpea cultivars in several tropical regions, such as the Brazilian Cerrado (Teodoro et al., 2015a,b).

The Eberhart and Russell (1966) method is widely used in genetic assessments of stability and adaptability because it is easy to apply and its results are easy to interpret. However, when the number of environments assessed in a breeding program is low (usually fewer than six), the method becomes unreliable: each genotype's regression is fitted to only a few points, leaving few degrees of freedom for testing, so the tests have low power and may fail to reject the null hypothesis. To address this problem, Nascimento et al. (2013) used artificial neural networks (ANNs) in combination with the Eberhart and Russell (1966) method to classify alfalfa genotypes. Following this approach, we simulated genotypes belonging to the phenotypic adaptability and stability classes defined by Eberhart and Russell (1966), which were subsequently used to train and validate the ANNs.

ANNs are computational models inspired by biological neural networks; they can quickly process large amounts of data and recognize patterns by learning from examples (Haykin, 2009). After training the ANNs, we evaluated the genotypes for phenotypic stability and adaptability. This assessment was based not only on the genotypes studied but also on a large collection of genotypes simulated according to predefined classes (Nascimento et al., 2013). The aims of this study were to identify cowpea genotypes with high phenotypic adaptability and stability for growing in the Brazilian Cerrado and to compare the use of ANNs with the Eberhart and Russell (1966) method.

### MATERIAL AND METHODS

Six trials were conducted in 2005 and 2006 in the municipalities of Aquidauana, Chapadão do Sul, and Dourados in the State of Mato Grosso do Sul and in the municipality of Primavera do Leste in the State of Mato Grosso (Table 1). Each trial had a randomized block design with 20 treatments (genotypes) and four replicates. The experimental unit consisted of four 5.0-m rows spaced 0.5 m apart, with 0.25 m between plants within each row. In each experimental unit, grain yield was evaluated in the two central rows, corrected to 13% moisture, and extrapolated to kg/ha.

### Table 1. Environment (E), agricultural year (AY), site, latitude, longitude, altitude, Köppen’s classification, and sowing date of the cowpea (*Vigna unguiculata*) trials in the States of Mato Grosso do Sul and Mato Grosso, Brazil.

E | AY | Site | Latitude | Longitude | Altitude | Köppen’s classification | Sowing date |
---|---|---|---|---|---|---|---|
1 | 2005 | Aquidauana | 22°01'S | 54°05'W | 430 m | Aw | March 21, 2005 |
2 | 2005 | Chapadão do Sul | 18°05'S | 52°04'W | 790 m | Aw | March 14, 2005 |
3 | 2005 | Dourados | 20°03'S | 55°05'W | 147 m | Cwa | April 7, 2005 |
4 | 2006 | Aquidauana | 22°01'S | 54°05'W | 430 m | Aw | March 2, 2006 |
5 | 2006 | Dourados | 20°03'S | 55°05'W | 147 m | Cwa | February 27, 2006 |
6 | 2006 | Primavera do Leste | 15°33'S | 54°17'W | 636 m | Aw | March 15, 2006 |

The data were subjected to individual analyses of variance (ANOVAs) for each environment, with the genotype effect fixed and the other effects random (Cruz et al., 2012), according to the following model:

$$Y_{ij} = \mu + B_j + G_i + \varepsilon_{ij} \qquad (1)$$

where $Y_{ij}$ is the value of the $i$th genotype in the $j$th block ($i = 1, \ldots, g$ and $j = 1, \ldots, b$, with $g$ and $b$ being the numbers of genotypes and blocks, respectively); $\mu$ is the overall mean; $B_j$ is the effect of the $j$th block; $G_i$ is the effect of the $i$th genotype; and $\varepsilon_{ij}$ is the random error. A joint analysis of the trials was then performed, with the genotype effect fixed and the other effects random, according to the following model:

$$Y_{ijk} = \mu + B_{j(k)} + G_i + E_k + GE_{ik} + \varepsilon_{ijk} \qquad (2)$$

where $Y_{ijk}$ is the value of the $i$th genotype in the $j$th block in the $k$th environment ($k = 1, \ldots, e$, with $e$ being the number of environments); $\mu$ is the overall mean; $B_{j(k)}$ is the effect of the $j$th block within the $k$th environment; $G_i$ is the effect of the $i$th genotype; $E_k$ is the effect of the $k$th environment; $GE_{ik}$ is the effect of the GE interaction; and $\varepsilon_{ijk}$ is the random error. Subsequently, the data were subjected to adaptability and stability analysis by the Eberhart and Russell (1966) method and by ANNs (Nascimento et al., 2013).
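To make Equation 1 concrete, the individual analysis of variance for one environment can be sketched in Python as below. This is a minimal sketch that assumes a balanced trial (every genotype observed once in every block, no missing plots) and uses hypothetical plot values; it is not the Genes software used in the study, and the function name `rcbd_anova` is our own.

```python
def rcbd_anova(y):
    """Individual ANOVA for Y_ij = mu + B_j + G_i + e_ij (Equation 1).

    y: g x b list of plot values, rows = genotypes, columns = blocks.
    Assumes a balanced trial with no missing plots.
    """
    g, b = len(y), len(y[0])
    n = g * b
    grand_total = sum(map(sum, y))
    correction = grand_total ** 2 / n  # correction term for the mean
    ss_total = sum(v * v for row in y for v in row) - correction
    ss_genotype = sum(sum(row) ** 2 for row in y) / b - correction
    ss_block = sum(sum(row[j] for row in y) ** 2 for j in range(b)) / g - correction
    ss_error = ss_total - ss_genotype - ss_block
    ms_block = ss_block / (b - 1)
    ms_genotype = ss_genotype / (g - 1)
    ms_error = ss_error / ((g - 1) * (b - 1))
    mean = grand_total / n
    return {
        "MS_block": ms_block,
        "MS_genotype": ms_genotype,
        "MS_error": ms_error,
        "F_block": ms_block / ms_error,
        "F_genotype": ms_genotype / ms_error,
        "mean": mean,
        "CV": 100 * ms_error ** 0.5 / mean,  # coefficient of variation (%)
    }


# Hypothetical plot yields: 3 genotypes x 4 blocks.
plots = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [9.0, 10.0, 11.0, 10.0],
]
anova = rcbd_anova(plots)
```

In the individual analysis the genotype mean square is compared with the error mean square whether blocks are treated as fixed or random; the fixed/random distinction matters more in the joint analysis of Equation 2.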

The method proposed by Eberhart and Russell (1966) is based on linear regression analysis, which measures the response of each genotype to environmental variation. For an experiment with $g$ genotypes, $e$ environments, and $r$ replicates, we define the following statistical model:

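$$Y_{ij} = \beta_{0i} + \beta_{1i} I_j + \Psi_{ij} \qquad (3)$$

where $Y_{ij}$ is the mean of genotype $i$ in environment $j$; $\beta_{0i}$ is the linear coefficient (intercept) of the $i$th genotype; $\beta_{1i}$ is the regression coefficient that measures the response of the $i$th genotype to variation in environment $j$; and $I_j$ is the environmental index, defined as:

$$I_j = \frac{1}{g}\sum_{i=1}^{g} Y_{ij} - \frac{1}{ge}\sum_{i=1}^{g}\sum_{j=1}^{e} Y_{ij} \qquad (4)$$

$\Psi_{ij}$ is the random error, which can be decomposed as:

$$\Psi_{ij} = \delta_{ij} + \bar{\varepsilon}_{ij} \qquad (5)$$

where $\delta_{ij}$ is the deviation from regression and $\bar{\varepsilon}_{ij}$ is the mean experimental error. The regression coefficient and the variance of the deviations from regression are estimated as:

$$\hat{\beta}_{1i} = \frac{\sum_{j=1}^{e} Y_{ij} I_j}{\sum_{j=1}^{e} I_j^{2}} \qquad (6)$$

$$\hat{\sigma}^{2}_{di} = \frac{MSD_i - MSR}{r} \qquad (7)$$

where $MSD_i$ is the mean square of the deviations from regression of genotype $i$ and $MSR$ is the residual mean square. The hypotheses of interest were $H_0: \beta_{1i} = 1$ versus $H_1: \beta_{1i} \neq 1$, and $H_0: \sigma^{2}_{di} = 0$ versus $H_1: \sigma^{2}_{di} > 0$, which were tested by a *t*-test and an *F*-test, respectively.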
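The estimation steps of the Eberhart and Russell (1966) method can be sketched in Python as follows. This is a minimal sketch, not the Genes implementation: it assumes a balanced table of genotype-by-environment means, takes the residual mean square of the joint analysis as given, and rescales the deviation sum of squares by $r$ so that it is on the same (plot) basis as $MSR$. The standard error used in the *t*-statistic (pooled residual mean square) is one common choice and is labeled as an assumption; the function name `eberhart_russell` and the example data are hypothetical.

```python
from math import sqrt


def eberhart_russell(means, msr, r):
    """Sketch of Eberhart and Russell (1966) estimates for each genotype.

    means: g x e genotype-by-environment means (rows = genotypes).
    msr:   residual mean square of the joint ANOVA (plot basis), assumed given.
    r:     number of replicates (blocks) per environment.
    """
    g, e = len(means), len(means[0])
    grand_mean = sum(map(sum, means)) / (g * e)
    # Environmental index I_j (Equation 4): environment mean minus grand mean.
    index = [sum(row[j] for row in means) / g - grand_mean for j in range(e)]
    sum_i2 = sum(v * v for v in index)
    results = []
    for row in means:
        total = sum(row)
        cross = sum(y * v for y, v in zip(row, index))
        beta0 = total / e          # intercept equals the genotype mean because sum(I_j) = 0
        beta1 = cross / sum_i2     # Equation 6
        # Sum of squares of deviations from the fitted line (means basis).
        ssd = sum(y * y for y in row) - total ** 2 / e - cross ** 2 / sum_i2
        msd = r * ssd / (e - 2)    # rescaled to the plot basis of MSR
        s2d = (msd - msr) / r      # Equation 7; negative estimates are usually read as zero
        # Assumption: the standard error of beta1 uses the pooled residual mean square.
        t_beta1 = (beta1 - 1) / sqrt(msr / (r * sum_i2))
        f_dev = msd / msr          # F-statistic for H0: sigma2_di = 0
        results.append({"beta0": beta0, "beta1": beta1, "s2d": s2d,
                        "t_beta1": t_beta1, "F_dev": f_dev})
    return results


# Hypothetical data: three genotypes lying exactly on lines with slopes
# 0.8, 1.0 and 1.2 (average slope 1) over environments whose offsets sum to zero.
slopes = [0.8, 1.0, 1.2]
intercepts = [600.0, 650.0, 700.0]
x = [-200.0, -50.0, 50.0, 200.0]
means = [[a + b * xj for xj in x] for a, b in zip(intercepts, slopes)]
estimates = eberhart_russell(means, msr=5000.0, r=4)
```

Because the hypothetical responses have no deviations from regression, the recovered slopes equal 0.8, 1.0, and 1.2, and each $\hat{\sigma}^{2}_{di}$ equals $-MSR/r$, which would be interpreted as zero (high predictability).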

To evaluate the adaptability and stability of genotypes by ANNs, two datasets are required: a training set and a testing set. To obtain these sets according to the predefined classes, 1500 genotypes were simulated according to the Eberhart and Russell (1966) model (Equation 3) and evaluated in seven environments. The parameter values used to obtain the genotypes of classes 1, 2, and 3 (Table 2), each consisting of 500 genotypes, were as follows: Class 1: $\beta_{1i} \sim U[0.90; 1.10]$, and $\beta_{1i}$ is considered equal to 1 if $\beta_{1i} \in [0.90; 1.10]$; Class 2: $\beta_{1i} \sim U[1.11; 2.00]$, and $\beta_{1i}$ is considered greater than 1 if $\beta_{1i} \in [1.11; 2.00]$; Class 3: $\beta_{1i} \sim U[0.00; 0.89]$, and $\beta_{1i}$ is considered lower than 1 if $\beta_{1i} \in [0.00; 0.89]$. Here, $U[a; b]$ denotes the continuous uniform probability distribution with parameters $a$ and $b$. To obtain the three remaining classes (4, 5, and 6), the simulated values were transformed to the logarithmic scale in order to linearize the set of values, i.e., for classes 4, 5, and 6 we had

### Table 2. Genotype classes according to the Eberhart and Russell (1966) method and their respective parametric values according to Nascimento et al. (2013).

Class | Practical classification | Parametric value |
---|---|---|
1 | General adaptability and low predictability | $\beta_{1i} = 1$; $\sigma^{2}_{di} > 0$ |
2 | Specific adaptability to favorable environments and low predictability | $\beta_{1i} > 1$; $\sigma^{2}_{di} > 0$ |
3 | Specific adaptability to unfavorable environments and low predictability | $\beta_{1i} < 1$; $\sigma^{2}_{di} > 0$ |
4 | General adaptability and high predictability | $\beta_{1i} = 1$; $\sigma^{2}_{di} = 0$ |
5 | Specific adaptability to favorable environments and high predictability | $\beta_{1i} > 1$; $\sigma^{2}_{di} = 0$ |
6 | Specific adaptability to unfavorable environments and high predictability | $\beta_{1i} < 1$; $\sigma^{2}_{di} = 0$ |
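The class-wise simulation of genotypes for classes 1 to 3 can be sketched in Python as follows. This is a minimal sketch, assuming Gaussian errors around the regression line of Equation 3; the intercept (`beta0`), error standard deviation (`noise_sd`), environmental indices, and random seed are hypothetical values chosen for illustration, and the logarithmic transformation used for classes 4 to 6 is not reproduced. The function names `simulate_class` and `simulate_training_set` are our own.

```python
import random

# Slope ranges for classes 1-3 as stated in the text.
CLASS_RANGES = {1: (0.90, 1.10), 2: (1.11, 2.00), 3: (0.00, 0.89)}


def simulate_class(label, n, index, rng, beta0=650.0, noise_sd=50.0):
    """Simulate n genotypes of one class under Y_ij = beta0 + beta1 * I_j + error.

    beta0 and noise_sd are hypothetical values chosen only for illustration.
    """
    low, high = CLASS_RANGES[label]
    genotypes = []
    for _ in range(n):
        beta1 = rng.uniform(low, high)  # beta1 ~ U[low; high]
        row = [beta0 + beta1 * i_j + rng.gauss(0.0, noise_sd) for i_j in index]
        genotypes.append({"class": label, "beta1": beta1, "means": row})
    return genotypes


def simulate_training_set(index, n_per_class=500, seed=2013):
    """Pool classes 1-3 (500 genotypes each, 1500 in total) and shuffle them."""
    rng = random.Random(seed)
    data = []
    for label in CLASS_RANGES:
        data.extend(simulate_class(label, n_per_class, index, rng))
    rng.shuffle(data)
    return data


# Hypothetical environmental indices for seven environments (they sum to zero).
index = [-300.0, -200.0, -100.0, 0.0, 100.0, 200.0, 300.0]
simulated = simulate_training_set(index)
```

Each simulated genotype keeps its class label, which is the target the ANNs learn to predict; the simulated set can then be split into training and testing subsets.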

The ANNs used in this study, which have a hidden layer and are trained by back-propagation, are described by Nascimento et al. (2013). After the ANNs were trained and tested, with a maximum error of 2% for the testing set, the cowpea dataset was submitted to the ANNs for classification by adaptability and stability; for comparison, the same classification was also performed by the Eberhart and Russell (1966) method. The ANNs were implemented in R (R Development Core Team, 2011), and the Genes software (Cruz, 2013) was used for the Eberhart and Russell (1966) method.
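The study's networks were implemented in R following Nascimento et al. (2013). The sketch below is not that implementation but a swapped-in NumPy illustration of the same idea: a network with one hidden layer, trained by back-propagation to assign simulated genotypes to slope classes 1 to 3. The network size, learning rate, number of epochs, noise level, environmental indices, and the function names `make_data` and `train_mlp` are all hypothetical choices for illustration.

```python
import numpy as np


def make_data(n_per_class, index, rng, noise_sd=30.0):
    """Hypothetical simulated genotypes for classes 1-3 (labels 0, 1, 2)."""
    ranges = [(0.90, 1.10), (1.11, 2.00), (0.00, 0.89)]  # classes 1, 2, 3
    features, labels = [], []
    for label, (low, high) in enumerate(ranges):
        beta1 = rng.uniform(low, high, size=n_per_class)
        noise = rng.normal(0.0, noise_sd, size=(n_per_class, index.size))
        y = 650.0 + beta1[:, None] * index[None, :] + noise
        features.append(y - y.mean(axis=1, keepdims=True))  # drop each genotype's mean
        labels.append(np.full(n_per_class, label))
    x = np.vstack(features) / np.abs(index).max()  # rescale inputs to roughly [-2, 2]
    return x, np.concatenate(labels)


def train_mlp(x, labels, n_hidden=10, lr=0.1, epochs=3000, seed=0):
    """One tanh hidden layer + softmax output, trained by full-batch back-propagation."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    w1 = rng.normal(0.0, 0.5, size=(x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, size=(n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    targets = np.eye(n_classes)[labels]
    losses = []
    for _ in range(epochs):
        hidden = np.tanh(x @ w1 + b1)  # forward pass
        logits = hidden @ w2 + b2
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)  # softmax
        losses.append(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))
        d_logits = (probs - targets) / len(labels)  # cross-entropy gradient
        d_hidden = (d_logits @ w2.T) * (1.0 - hidden ** 2)  # back-propagate through tanh
        w2 -= lr * hidden.T @ d_logits
        b2 -= lr * d_logits.sum(axis=0)
        w1 -= lr * x.T @ d_hidden
        b1 -= lr * d_hidden.sum(axis=0)

    def predict(x_new):
        return np.argmax(np.tanh(x_new @ w1 + b1) @ w2 + b2, axis=1)

    return predict, losses


rng = np.random.default_rng(1)
index = np.array([-300.0, -200.0, -100.0, 0.0, 100.0, 200.0, 300.0])  # hypothetical I_j
x_train, y_train = make_data(500, index, rng)
x_test, y_test = make_data(100, index, rng)
predict, losses = train_mlp(x_train, y_train)
test_error = np.mean(predict(x_test) != y_test)
```

Centering each genotype's responses removes the intercept, so the inputs carry only the pattern of response to the environmental index. The narrow slope range of class 1 makes it the hardest class to separate, so this toy network should not be expected to reach the 2% testing error reported above.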

### RESULTS AND DISCUSSION

The individual ANOVAs revealed a significant block effect in all of the environments (Table 3), indicating that the randomized block design was effective in controlling this source of heterogeneity. There were significant differences among the genotypes in all of the trials. The coefficients of variation obtained in the individual ANOVAs ranged from 22.32 to 34.08%, similar to those reported in other studies on cowpea (Rocha et al., 2007; Almeida et al., 2012; Santos et al., 2014a,b; Torres et al., 2015a,b).

### Table 3. Summary of individual analyses of variance (mean squares, means, and coefficients of variation) for grain yield (kg/ha) of 20 upright cowpea (*Vigna unguiculata*) genotypes in six environments (E) in the States of Mato Grosso do Sul and Mato Grosso, Brazil.

SV | d.f. | E1^{+} | E2 | E3 | E4 | E5 | E6 |
---|---|---|---|---|---|---|---|
Block | 3 | 584,978.33* | 160,801.38* | 171,117.54* | 7,255.28* | 133,215.19* | 401,399.92* |
Genotype | 19 | 181,162.89* | 141,462.97* | 603,747.18* | 44,836.59* | 39,498.11* | 70,157.38* |
Error | 57 | 66,525.70 | 49,454.98 | 45,592.55 | 5,559.47 | 5,127.79 | 17,996.46 |
Mean (kg/ha) | - | 1,155.25 | 910.62 | 924.79 | 218.74 | 210.53 | 554.89 |
CV (%) | - | 22.32 | 24.42 | 23.08 | 34.08 | 34.01 | 24.17 |

*Significant at the 5% probability level according to an F-test; SV, source of variation; d.f., degrees of freedom; CV, coefficient of variation; ^{+}environments described in Table 1.
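The coefficients of variation in Table 3 follow from the standard definition CV = 100·√(error mean square)/mean. The short check below recomputes them from the tabulated error mean squares and means; it is only an arithmetic illustration, and the variable names are our own. The recomputed values agree with the reported ones to within about 0.01 percentage points, which is consistent with rounding.

```python
from math import sqrt

# Error mean squares and environment means from Table 3 (E1 to E6).
ms_error = [66525.70, 49454.98, 45592.55, 5559.47, 5127.79, 17996.46]
means = [1155.25, 910.62, 924.79, 218.74, 210.53, 554.89]
reported_cv = [22.32, 24.42, 23.08, 34.08, 34.01, 24.17]


def cv_percent(mse, mean):
    """Coefficient of variation (%) = 100 * sqrt(error mean square) / mean."""
    return 100 * sqrt(mse) / mean


computed_cv = [cv_percent(m, x) for m, x in zip(ms_error, means)]
```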

A summary of the joint ANOVA results is presented in Table 4. The genotype effect was not significant (P > 0.05), suggesting an absence of genetic variability among the genotypes. However, Cruz et al. (2012) reported that when the genotype effect is significant in individual ANOVAs but not in a joint ANOVA, the genetic variability present is consumed by the magnitude of the GE interaction effect.

### Table 4. Summary of a joint analysis of variance for grain yield (kg/ha) of 20 upright cowpea (*Vigna unguiculata*) genotypes in six environments (E) in the States of Mato Grosso do Sul and Mato Grosso, Brazil.

Source of variation | Degrees of freedom | Mean square |
---|---|---|
Blocks/Environment | 18 | 4,376,303.00 |
Genotype (G) | 19 | 6,232,784.45^{ns} |
Environment (E) | 5 | 62,874,783.73* |
GE^{+} | 66 | 14,303,653.23* |
Error^{+} | 221 | 10,844,647.31 |
Mean (kg/ha) | - | 662.47 |
Coefficient of variation (%) | - | 33.43 |

*Significant at the 1% probability level according to an F-test; ^{ns}, not significant; ^{+}values adjusted according to the Cochran (1954) method.
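The explanation attributed to Cruz et al. (2012), that genetic variability is masked by the GE interaction, follows from the F-ratios of the joint analysis. Under the usual expected mean squares for a mixed model with genotypes fixed and environments random, the genotype mean square is tested against the GE mean square rather than against the pooled error. The minimal check below uses the Table 4 mean squares; the variable names are our own.

```python
# Mean squares from Table 4 (joint analysis of the six environments).
ms_genotype = 6232784.45
ms_ge = 14303653.23     # Cochran-adjusted, as reported
ms_error = 10844647.31  # Cochran-adjusted, as reported

# Genotypes fixed, environments random: E(MS_G) contains the GE variance,
# so the genotype effect is tested against the GE mean square.
f_genotype = ms_genotype / ms_ge
# The GE interaction is tested against the pooled error mean square.
f_ge = ms_ge / ms_error
```

Because the GE mean square exceeds the genotype mean square, the genotype F-ratio is below 1 (about 0.44), so it cannot be significant regardless of the degrees of freedom, even though genotypes differed significantly within each environment.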

Table 5 shows the mean grain yield and the phenotypic adaptability and stability of the genotypes according to the Eberhart and Russell (1966) method and ANNs. Genotypes MNC99-537F-4 and EVX91-2E-2 had grain yields above the overall mean of the environments (662.47 kg/ha), were classified as adapted to favorable environments, and were highly stable according to both methods of analysis. These genotypes are therefore the most suitable for favorable environments and can be grown by farmers who use high-tech equipment and practices, because they respond to environmental improvements such as fertilization and irrigation. Low-tech farmers should grow genotype IT93K-93-10, which, although its grain yield was below the overall mean, was adapted to unfavorable environments and highly predictable according to both methods. Our results suggest that this genotype should maintain its production level under different environmental conditions.

### Table 5. Mean grain yield and classification of 20 upright cowpea (*Vigna unguiculata*) genotypes based on phenotypic adaptability and stability by the Eberhart and Russell (1966) method and artificial neural networks in six environments in the States of Mato Grosso do Sul and Mato Grosso, Brazil.

Genotype | Mean (kg/ha) | Adaptability (Eberhart and Russell, 1966) | Stability (Eberhart and Russell, 1966) | Adaptability (ANNs) | Stability (ANNs) |
---|---|---|---|---|---|
MNC99-537F-1 | 725.58 | Overall | Low | Overall | High |
MNC99-537F-4 | 891.92 | Favorable | High | Favorable | High |
MNC99-541-F5 | 716.75 | Overall | High | Overall | High |
MNC99-541-F8 | 651.01 | Favorable | High | Overall | High |
IT93K-93-10 | 514.18 | Unfavorable | High | Unfavorable | High |
Pretinho | 433.20 | Overall | High | Overall | High |
Fradinho-2 | 638.64 | Overall | High | Overall | High |
MNC99-519D-1-1-5 | 671.86 | Overall | Low | Overall | High |
MNC00-544D-10-1-2-2 | 602.69 | Overall | High | Overall | High |
MNC00-544D-14-1-2-2 | 722.08 | Overall | High | Overall | High |
MNC00-553D-8-1-2-2 | 641.91 | Overall | Low | Overall | High |
MNC00-553D-8-1-2-3 | 650.44 | Overall | High | Overall | High |
MNC00-561G-6 | 690.61 | Favorable | High | Overall | High |
EVX63-10E | 682.57 | Overall | High | Overall | High |
MNC99542F-5 | 882.23 | Overall | High | Overall | High |
EVX91-2E-2 | 722.23 | Favorable | High | Favorable | High |
MNC99-557F-2 | 494.64 | Overall | Low | Overall | High |
BRS Guariba | 667.20 | Overall | High | Overall | High |
Patativa | 753.34 | Overall | High | Overall | High |
Vita-7 | 496.39 | Unfavorable | Low | Unfavorable | High |

Agreement between the two methods: adaptability, 90%; stability, 75%.

There was 90% agreement between the Eberhart and Russell (1966) method and ANNs in the classification of phenotypic adaptability (Table 5) and 75% agreement in the classification of phenotypic stability. All five stability disagreements involve genotypes classified as having low stability by the Eberhart and Russell (1966) method and high stability by ANNs. The lower agreement for stability is probably because ANN stability is based on the Finlay and Wilkinson (1963) method, which differs from the Eberhart and Russell (1966) method in how it considers stability, invariance, and non-predictability. Strong agreement between the traditional Eberhart and Russell (1966) method and ANNs has also been reported in studies that evaluated the GE interaction in alfalfa (Nascimento et al., 2013), semi-prostrate cowpea (Teodoro et al., 2015a), and common bean (Correa et al., 2016). This approach is therefore effective for quantifying the adaptability and stability of genotypes in upright cowpea breeding programs. The main advantage of ANNs over the Eberhart and Russell (1966) method is their non-linear structure (Haykin, 2009): because they learn from examples, they can capture complex features of a dataset without requiring detailed information about the process being modeled (Nascimento et al., 2013).
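The agreement percentages are simple proportions of genotypes that receive the same label from both methods. The sketch below recomputes them from the Table 5 classifications; the dictionary layout and the function name `percent_agreement` are our own.

```python
# Table 5: (E&R adaptability, E&R stability, ANN adaptability, ANN stability).
TABLE5 = {
    "MNC99-537F-1": ("Overall", "Low", "Overall", "High"),
    "MNC99-537F-4": ("Favorable", "High", "Favorable", "High"),
    "MNC99-541-F5": ("Overall", "High", "Overall", "High"),
    "MNC99-541-F8": ("Favorable", "High", "Overall", "High"),
    "IT93K-93-10": ("Unfavorable", "High", "Unfavorable", "High"),
    "Pretinho": ("Overall", "High", "Overall", "High"),
    "Fradinho-2": ("Overall", "High", "Overall", "High"),
    "MNC99-519D-1-1-5": ("Overall", "Low", "Overall", "High"),
    "MNC00-544D-10-1-2-2": ("Overall", "High", "Overall", "High"),
    "MNC00-544D-14-1-2-2": ("Overall", "High", "Overall", "High"),
    "MNC00-553D-8-1-2-2": ("Overall", "Low", "Overall", "High"),
    "MNC00-553D-8-1-2-3": ("Overall", "High", "Overall", "High"),
    "MNC00-561G-6": ("Favorable", "High", "Overall", "High"),
    "EVX63-10E": ("Overall", "High", "Overall", "High"),
    "MNC99542F-5": ("Overall", "High", "Overall", "High"),
    "EVX91-2E-2": ("Favorable", "High", "Favorable", "High"),
    "MNC99-557F-2": ("Overall", "Low", "Overall", "High"),
    "BRS Guariba": ("Overall", "High", "Overall", "High"),
    "Patativa": ("Overall", "High", "Overall", "High"),
    "Vita-7": ("Unfavorable", "Low", "Unfavorable", "High"),
}


def percent_agreement(table, col_a, col_b):
    """Percentage of genotypes that receive the same label in two columns."""
    matches = sum(row[col_a] == row[col_b] for row in table.values())
    return 100 * matches / len(table)


adaptability_agreement = percent_agreement(TABLE5, 0, 2)  # 18 of 20 genotypes
stability_agreement = percent_agreement(TABLE5, 1, 3)     # 15 of 20 genotypes
```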