The Mekong Delta of Vietnam has the greatest potential for shrimp aquaculture, an activity that plays a vital role in rural development and helps approximately one million fish farmers to achieve a sustainable livelihood. However, shrimp farms in the Mekong Delta are being affected by various diseases which seriously constrain sustainable shrimp farming.

Machine learning is an advanced computer technique that can provide strong support for fisheries, and one of its applications is the prediction of disease outbreak.

Fisheries in Vietnam contribute to the development of sustainable livelihoods and to the general economy, especially in the Mekong Delta.

Shrimp farming is the most significant fisheries activity in the country. A number of these diseases, such as acute hepatopancreatic necrosis disease (AHPND), diseases caused by white spot syndrome virus (WSSV), and the disease caused by Enterocytozoon hepatopenaei (EHP disease), have disastrous effects on shrimp farms.

To reduce the risk of diseases that threaten shrimp farming, we first attempted to assess the status of disease infection in shrimp farms through visualization of the distribution of three serious diseases, namely, WSSV, EHP, and AHPND, on a map of farms located on the east coast of the Mekong Delta.

We then extracted geographical information from this map, which was examined as a feature related to disease outbreak.

Then, various factors, including the clinical signs of infected shrimp, environmental factors, and geographical features influencing disease, were investigated. Machine-learning classification algorithms were applied to these factors to predict the occurrence of each disease.

In shrimp ponds affected by EHP, shrimp growth is normal for the first month after stocking; however, growth slows thereafter, with the direct consequence of a decrease in shrimp farmers’ incomes. The clinical signs of EHP disease are indistinct; as such, it is difficult to recognize the infection.

White spot disease, caused by WSSV, the sole member of the virus family Nimaviridae, is a viral disease that causes high mortality in shrimp within a short time.

The disease has been identified as the most serious disease affecting shrimp in coastal farms. WSSV has been the focus of much research, which has identified the relationship between WSSV and salinity and determined the viability of WSSV in pond sediment.

Materials and Methods


The data for this study have been collected since 2013 at shrimp farms in four provinces located on the east coast of the Mekong Delta. The two main shrimp species cultured in these farms are the tiger prawn Penaeus monodon and the whiteleg shrimp Litopenaeus vannamei.

The collected dataset consisted of two main parts: clinical signs and environmental factors. The clinical signs included:

  1. gut status, differentiated as discontinuous gut, yellow liquid in gut, little food in gut, or empty gut;
  2. hepatopancreas status, defined as hepatopancreatic paleness and atrophy;
  3. slow growth;
  4. soft shell;
  5. white feces;
  6. white spots;
  7. vermiform structure; and
  8. gregarine infection.

The environmental factors consisted of temperature, salinity, pH, NO2, and NH4.
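One way to hold such a record in code is sketched below. The class name, field names, and example values are hypothetical and not taken from the study; treating "no gut abnormality" as `None` and one-hot encoding the gut status are also assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class GutStatus(Enum):
    """The four gut conditions listed among the clinical signs."""
    DISCONTINUOUS = "discontinuous gut"
    YELLOW_LIQUID = "yellow liquid in gut"
    LITTLE_FOOD = "little food in gut"
    EMPTY = "empty gut"


@dataclass
class FarmRecord:
    # Clinical signs (field names are illustrative).
    gut_status: Optional[GutStatus]  # None = no gut abnormality recorded (assumption)
    hepatopancreas_pale_atrophied: bool
    slow_growth: bool
    soft_shell: bool
    white_feces: bool
    white_spots: bool
    vermiform_structure: bool
    gregarine_infection: bool
    # Environmental factors.
    temperature: float
    salinity: float
    ph: float
    no2: float
    nh4: float

    def to_features(self) -> List[float]:
        """Flatten the record into a numeric vector: 4 gut flags, 7 signs, 5 factors."""
        gut = [1.0 if self.gut_status is status else 0.0 for status in GutStatus]
        signs = [float(flag) for flag in (
            self.hepatopancreas_pale_atrophied, self.slow_growth, self.soft_shell,
            self.white_feces, self.white_spots, self.vermiform_structure,
            self.gregarine_infection,
        )]
        env = [self.temperature, self.salinity, self.ph, self.no2, self.nh4]
        return gut + signs + env


# Made-up example values, for illustration only.
example = FarmRecord(
    gut_status=GutStatus.EMPTY, hepatopancreas_pale_atrophied=True,
    slow_growth=True, soft_shell=False, white_feces=True, white_spots=False,
    vermiform_structure=False, gregarine_infection=False,
    temperature=30.0, salinity=15.0, ph=7.8, no2=0.1, nh4=0.2,
)
features = example.to_features()
print(len(features))
```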

Machine learning

To predict the occurrence of disease, the dataset was divided into training and testing datasets. The training dataset was used to generate the prediction model, and the testing dataset was used to determine the model’s accuracy.

Here, the dataset consisted of the three dependent variables WSSV, EHP, and AHPND, and multiple disease labels were assigned to each farm.
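The split described above can be sketched with scikit-learn as follows. The data here are synthetic, and the sample size, the 80/20 ratio, and the random seed are assumptions chosen for illustration; the article does not report them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)
n_farms = 100  # hypothetical sample size, not the study's

# Eight binary clinical-sign columns and five environmental columns (synthetic values).
clinical = rng.integers(0, 2, size=(n_farms, 8))
environment = rng.normal(size=(n_farms, 5))
X = np.hstack([clinical, environment])

# One binary label per disease, so a single farm can carry several labels at once.
diseases = ["WSSV", "EHP", "AHPND"]
Y = rng.integers(0, 2, size=(n_farms, len(diseases)))

# Hold out 20% of the farms for testing (assumed ratio).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```

Keeping the three disease columns together in `Y` means every farm lands in the same split for all three predictions.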

Logistic regression

The logistic regression model is often applied to probabilistic prediction. In this study, the scikit-learn package for Python was used to implement the prediction.
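A minimal sketch of this step, assuming one independent logistic regression model is fitted per disease, is shown below. The data are synthetic and the settings (`max_iter`, the split ratio) are illustrative; the article does not describe the authors' exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # synthetic features
# Synthetic labels that depend on the features, so there is something to learn.
Y = (X @ rng.normal(size=(5, 3)) + rng.normal(scale=0.5, size=(200, 3)) > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

results = {}
for column, disease in enumerate(["WSSV", "EHP", "AHPND"]):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, Y_train[:, column])
    # predict_proba returns the estimated probability of each class;
    # column 1 is the probability that the disease is present.
    probability = model.predict_proba(X_test)[:, 1]
    results[disease] = {
        "train_accuracy": accuracy_score(Y_train[:, column], model.predict(X_train)),
        "test_accuracy": accuracy_score(Y_test[:, column], model.predict(X_test)),
        "probabilities": probability,
    }

for disease, result in results.items():
    print(disease, round(result["train_accuracy"], 3), round(result["test_accuracy"], 3))
```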

Neural network

A neural network consists of many nodes (neurons) arranged in layers. There are three main types of layer: the input, hidden, and output layers. The network learns through the connections between nodes and the weights assigned to those connections.

Random forest

Random forest builds a prediction model by randomly sampling the data and the features to grow multiple decision trees. The result is obtained by majority voting among the trees; therefore, a random forest is generally more robust and powerful than a single decision tree.
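The layered structure can be illustrated with scikit-learn's `MLPClassifier`, which accepts a multi-label target directly. This is a sketch on synthetic data; the single hidden layer of 16 nodes, the scaling step, and the other settings are assumptions, since the article does not give the network architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))  # 5 input features (synthetic)
Y = (X @ rng.normal(size=(5, 3)) > 0).astype(int)  # 3 synthetic disease labels

# Input layer: 5 nodes, hidden layer: 16 nodes (assumed), output layer: 3 nodes.
network = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
network.fit(X[:160], Y[:160])

predictions = network.predict(X[160:])
per_disease_accuracy = (predictions == Y[160:]).mean(axis=0)

# The learned connection weights between consecutive layers.
weights = network.named_steps["mlpclassifier"].coefs_
print([w.shape for w in weights], per_disease_accuracy)
```

The two weight matrices, of shapes (5, 16) and (16, 3), are the connections between the input and hidden layers and between the hidden and output layers.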

Gradient boosting

Gradient boosting aims to combine many weak learners into a single strong learner, and it has been used in many applications. Here, we used the gradient boosting and random forest implementations in the scikit-learn package for Python.
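Both ensemble methods are available in scikit-learn and can be compared side by side, as in the sketch below. The data are synthetic, only one disease label is modelled, and the hyperparameters (100 trees each) are assumptions rather than the study's settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))  # synthetic features
# One synthetic disease label with some noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    # Many trees grown on bootstrap samples, combined by voting.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Shallow trees added one after another, each correcting the previous ones.
    "gradient_boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train_accuracy": model.score(X_train, y_train),
        "test_accuracy": model.score(X_test, y_test),
    }

for name, score in scores.items():
    print(name, score)
```

Reporting the training and testing accuracy together makes any gap between the two visible at a glance.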

Results and Discussion

To establish the distributions of the farms infected by each disease, we separately mapped the locations of farms with AHPND, EHP, and WSSV in the four provinces on the east coast of the Mekong Delta.

Specifically, the density of farms infected by AHPND was high in Ca Mau and Tra Vinh Provinces, while EHP had less effect on farms in Bac Lieu Province, and WSSV was sparsely distributed throughout the entire study area.

Subsequently, the four algorithms were used for prediction. The logistic regression model attained training accuracies of 88.96% for WSSV, 86.89% for EHP, and 97.93% for AHPND; however, its accuracy on the testing dataset was lower: 72.97% (WSSV), 72.97% (EHP), and 91.89% (AHPND).

The neural network model functioned better than the logistic regression model in our dataset. For the training dataset, the accuracies of the neural network model were 97.24% for WSSV, 95.86% for EHP, and 96.55% for AHPND.

Notably, the model was stable in predicting the testing dataset: 83.78%, 75.67%, and 91.89% for WSSV, EHP, and AHPND, respectively.

The random forest and gradient boosting methods produced overfitted models for our dataset. Because these models learned the details of the training data, they performed well on it. However, they could not capture the main trends of the dataset, which resulted in worse performance on the testing data.

For the training dataset, accuracy was 100% for all disease predictions; nonetheless, these models yielded low accuracies in the testing set in comparison with those in the training set.

Specifically, the random forest model predicted with accuracies of 83.78% for WSSV, 78.37% for EHP, and 83.78% for AHPND, and the gradient boosting method obtained accuracies of 78.37% for WSSV, 78.37% for EHP, and 81.08% for AHPND.

The large difference in accuracy between the training dataset and the testing dataset showed that these two algorithms were not suitable for our analysis.
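The size of that difference can be worked out directly from the accuracies reported above for the neural network, random forest, and gradient boosting models. The short calculation below uses only those figures; the variable names are illustrative.

```python
# Accuracies (%) as reported in the text: training, then testing.
training = {
    "neural network": {"WSSV": 97.24, "EHP": 95.86, "AHPND": 96.55},
    "random forest": {"WSSV": 100.0, "EHP": 100.0, "AHPND": 100.0},
    "gradient boosting": {"WSSV": 100.0, "EHP": 100.0, "AHPND": 100.0},
}
testing = {
    "neural network": {"WSSV": 83.78, "EHP": 75.67, "AHPND": 91.89},
    "random forest": {"WSSV": 83.78, "EHP": 78.37, "AHPND": 83.78},
    "gradient boosting": {"WSSV": 78.37, "EHP": 78.37, "AHPND": 81.08},
}

mean_gap = {}
mean_test = {}
for model in training:
    # Gap = training accuracy minus testing accuracy, averaged over the diseases.
    gaps = [training[model][d] - testing[model][d] for d in training[model]]
    mean_gap[model] = sum(gaps) / len(gaps)
    mean_test[model] = sum(testing[model].values()) / len(testing[model])

for model in training:
    print(f"{model}: mean gap {mean_gap[model]:.2f}, mean test accuracy {mean_test[model]:.2f}")
```

Averaged over the three diseases, the gap is widest for gradient boosting and narrowest for the neural network, which also has the highest mean testing accuracy of the three.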


Accurate predictions were achieved by the neural network method for both the training dataset and the testing dataset, and this method outperformed the logistic regression, random forest, and gradient boosting methods.

This study contributes to disease management by helping shrimp farmers to understand how GIS-based technology can be used to visualize disease outbreaks and to determine strategies for reducing the risk of disease.

The combination of GIS and machine learning provided comprehensive prediction and an intuitive map that visualized the distribution of disease.

Knowledge of disease status at the local levels also allows assessment of the effectiveness of disease management activities.

Heavily infected areas may be related to weak farm management, with the latter contributing to cross-contamination between farms or an infected seed source, whereas areas with low levels of infection suggest good disease management on farms.

Based on such information, shrimp farmers can easily determine suitable locations for new farms or prepare appropriate solutions to avoid infection.

The use of GIS in this study helped to clarify the outbreak and spread of disease, which were analyzed based on the locations of farms, hatcheries, and river tributaries. Analysis of the closest distance between farms and the river revealed that some farms shared the same river water source.

Furthermore, to increase the comprehensiveness of prediction, we examined environmental factors related to conditions suitable for strong activation of pathogens.
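A nearest-river calculation of this kind can be sketched in plain Python. The coordinates, farm and river names, and the choice of the haversine great-circle distance to sampled river points are all assumptions for illustration; the article does not describe the GIS procedure in this detail.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical river tributaries, each sampled as a few (lat, lon) points.
rivers = {
    "river_A": [(9.30, 105.70), (9.35, 105.75)],
    "river_B": [(9.60, 106.20), (9.65, 106.25)],
}
# Hypothetical farm locations.
farms = {
    "farm_1": (9.31, 105.71),
    "farm_2": (9.34, 105.76),
    "farm_3": (9.61, 106.22),
}


def nearest_river(position):
    """Return the name of the closest river and the distance to it in kilometres."""
    lat, lon = position
    best_name, best_distance = None, math.inf
    for name, points in rivers.items():
        for point_lat, point_lon in points:
            distance = haversine_km(lat, lon, point_lat, point_lon)
            if distance < best_distance:
                best_name, best_distance = name, distance
    return best_name, best_distance


# Group farms by their closest river to see which ones share a water source.
shared = defaultdict(list)
for farm, position in farms.items():
    river, _ = nearest_river(position)
    shared[river].append(farm)

print(dict(shared))
```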

Temperature and salinity strongly affect disease, which tends to break out in hot weather and under conditions of high salinity, but other factors, such as pH, NH4, and NO2 levels, also influence infection rates.

These environmental factors are particularly noticeable in the Mekong Delta, where hot and dry weather creates conditions that favor a higher risk of disease. Among the environmental factors, salinity contributed the most to disease prediction, followed by temperature, pH, NO2, and NH4.
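The article does not state how each factor's contribution was measured. One common approach, shown here purely as an assumption, is to rank the impurity-based feature importances of a fitted random forest. The data are synthetic and deliberately constructed so that the salinity column dominates; the output says nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

factors = ["temperature", "salinity", "pH", "NO2", "NH4"]

rng = np.random.default_rng(3)
X = rng.normal(size=(300, len(factors)))  # synthetic, standardized values
# Synthetic label driven mostly by the salinity column (index 1), by construction.
y = (3.0 * X[:, 1] + X[:, 0] + rng.normal(scale=0.3, size=300) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ sums to 1; larger values mean a larger contribution.
ranking = sorted(
    zip(factors, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```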

Although the environment affects the estimate of the area of disease spread, this process is mainly based on evidence of whether infected farms are present.

To improve EHP prediction accuracy, more data are required, such as the density of shrimp in ponds and details of feeding and care regimes.

Although the Mekong Delta has many shrimp farms and disease is highly prevalent, as evidenced by huge economic losses, disease data are difficult to collect. Also, because disease outbreaks constitute a sensitive situation, shrimp farmers usually do not share information on the status of their infected farm.

Furthermore, farmers usually seek treatments on their own. Additionally, disease research requires sufficient data collected over long periods, especially for extensive farms.

However, if data could be collected from all shrimp farms in the region, including both healthy and infected farms, the visualization of disease distribution would be clearer, and prediction would be more accurate.

Full mapping of all farms would provide a foundation for future research, such as detection of affected populations, the effects of industrial pollutants on disease, analysis of the most suitable areas for farm development, and assessments of annual changes in shrimp farm distribution.

Infection prediction will become more accurate as the dataset is updated with additional data and, accordingly, the estimated area of disease spread will be visualized more reliably.

Additionally, given suitable data, this research can be applied to protect shrimp farms in regions other than the east coast of the Mekong Delta.

This is a summarized version developed by the editorial team of Aquaculture Magazine based on the review article titled “USE OF GIS AND MACHINE LEARNING TO PREDICT DISEASE IN SHRIMP FARMED ON THE EAST COAST OF THE MEKONG DELTA, VIETNAM” developed by: NGUYEN MINH KHIEM – Hokkaido University, Can Tho University, YUKI TAKAHASHI – Hokkaido University, HIROKI YASUMA – Hokkaido University, DANG THI HOANG OANH – Can Tho University, TRAN NGOC HAI – Can Tho University, VU NGOC UT – Can Tho University, NOBUO KIMURA – Hokkaido University.
The original article was published in January 2022 through Springer under a Creative Commons open access license.
The full version can be accessed freely online through this link:

