1. Research background
Over the past 60 years, global plastic production has grown from 1.5 million tons to 335 million tons. Because plastic waste is long-lived and its recycling rate is low, a large proportion of it (about 79%) ends up in the marine environment, contaminating coastlines, the water column, and marine sediments worldwide. The accumulation of plastic debris in the ocean began to attract growing public attention after microplastics (< 5 mm) were found to pose a new threat to components of marine ecosystems. Because microplastics are an emerging environmental pollutant, reviews and meta-analyses of their characteristics can significantly influence regulatory policy and legislation. However, a lack of spatial resolution is a major drawback of traditional meta-analysis. Machine learning-assisted meta-analysis can overcome this limitation and better explain, in spatial terms, the trends and patterns of microplastic distribution in the global ocean, which is of considerable scientific value for marine microplastics research.
2. Research results
The Apriori algorithm is one of the most popular methods for association rule mining. Through an iterative process, it finds frequent itemsets and generates association rules between items (for example, microplastic polymers) that meet user-specified thresholds for support, confidence, and lift. The resulting rules indicate that fragment-shaped microplastics have a higher probability of being detected during spectroscopic identification, owing to their color and larger surface area. Polyethylene (PE) has a lower detection probability in spectroscopic identification because low-density polyethylene (LDPE) occurs mainly as films and, being smooth and soft, loses its color to bleaching more easily. This also explains why PE, although the most abundant polymer in the marine environment, does not form association rules as strong as those of polypropylene (PP). Another notable finding concerns polyamide (PA), which appears in only one of the rules even though it is the third most abundant polymer globally; polystyrene (PS) appears in more rules than PA. One interpretation is that microfibers are detected less often during chemical characterization because of their smaller surface area, so the frequencies reported in the literature may underestimate their actual abundance.
Fig. 1 Association rule graph for the 8 features
Fig. 2 MDS plot of the data points (grouped by spatial distribution)
Fig. 3 MDS plot of the data points (grouped by ocean)
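The association rule mining step can be illustrated with a minimal pure-Python sketch of Apriori. The transactions below are hypothetical toy records (one set of reported features per study), not data from the paper, and the helper names `support`, `apriori`, and `rules` as well as the threshold values are assumptions made for illustration. This summary does not state which software the authors used, so this is not their implementation.

```python
from itertools import combinations

# Hypothetical toy records: each set lists the features reported by one study.
# Illustrative only; these are NOT data from the paper.
transactions = [
    {"PP", "fragment", "white"},
    {"PP", "fragment", "blue"},
    {"PE", "film", "white"},
    {"PP", "fragment", "white"},
    {"PS", "fragment", "white"},
    {"PA", "fiber", "blue"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for c in current:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def rules(frequent, min_confidence, min_lift):
    """Derive rules (antecedent -> consequent) from the frequent itemsets."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = s / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence and lift >= min_lift:
                    out.append((antecedent, consequent, s, confidence, lift))
    return out


frequent = apriori(transactions, min_support=0.3)
found = rules(frequent, min_confidence=0.8, min_lift=1.2)
for a, c, s, conf, lift in found:
    print(sorted(a), "->", sorted(c), round(s, 2), round(conf, 2), round(lift, 2))
```

A lift above 1 means the antecedent and consequent co-occur more often than they would if they were independent, which is why a very abundant item such as PE can still fail to produce strong rules if it is not reliably reported together with any particular shape or color.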
Random forest (RF) is an ensemble machine learning algorithm that can perform regression and classification on different types of input data (categorical, factor, or continuous). The RF model was built with the software's default parameters and used to produce the multidimensional scaling (MDS) plots (Figs. 2-3), in which points that lie closer together represent more similar polymer compositions. In the MDS plot grouped by spatial distribution (Fig. 2), most studies of the water column and bottom sediments cluster tightly in the third quadrant, indicating that polymer composition is highly similar in these compartments. In the MDS plot grouped by ocean (Fig. 3), the cluster in the third quadrant consists mainly of studies from the Arctic, North Atlantic, and North Pacific Oceans. The similarity in polymer composition among these oceans may result from ocean current circulation.
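One common way to obtain an MDS plot from a random forest is to compute a proximity matrix (the share of trees in which two samples fall into the same leaf) and apply classical MDS to 1 minus the proximity. The sketch below assumes that approach; this summary does not confirm that the authors did exactly this. It uses scikit-learn and NumPy, whose defaults differ from those of other RF software, and the composition values and group labels are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical polymer-composition fractions (PE, PP, PS) for six studies.
# Illustrative values only; these are NOT data from the paper.
X = np.array([
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.72, 0.19, 0.09],
    [0.20, 0.60, 0.20],
    [0.22, 0.57, 0.21],
    [0.18, 0.62, 0.20],
])
# Hypothetical grouping label, e.g. 0 = water column, 1 = shoreline.
y = np.array([0, 0, 0, 1, 1, 1])

# Library defaults are kept, apart from a fixed seed for repeatability.
forest = RandomForestClassifier(random_state=0).fit(X, y)

# Proximity: share of trees in which two studies land in the same leaf.
leaves = forest.apply(X)  # shape (n_samples, n_trees)
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Classical MDS on the dissimilarity 1 - proximity.
D2 = (1.0 - proximity) ** 2
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ D2 @ J                 # double-centered Gram matrix
vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
top = np.argsort(vals)[::-1][:2]      # indices of the two largest eigenvalues
coords = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# One row per study: group label and its two MDS coordinates.
for label, (x1, x2) in zip(y, coords):
    print(label, round(float(x1), 3), round(float(x2), 3))
```

Studies that the forest routes to the same leaves in most trees get a proximity near 1 and therefore sit close together in the two-dimensional embedding, which is the property the plots rely on when tight clusters are read as similar polymer composition.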
3. Research significance
This study demonstrates that machine learning-assisted meta-analysis is an effective data analysis technique for advancing large-scale analysis of marine microplastic data, and it can help inform policies to mitigate marine microplastic pollution.
Source: Kannankai, M.P., Babu, A.J., Radhakrishnan, A., Alex, R.K., Borah, A., Devipriya, S.P., 2022. Machine learning aided meta-analysis of microplastic polymer composition in global marine environment. Journal of Hazardous Materials 440. https://doi.org/10.1016/j.jhazmat.2022.129801