SOIL An interactive open-access journal of the European Geosciences Union
Journal metrics

  • IF: indexed
  • CiteScore: 7.57
  • SNIP: 2.708
  • SJR: 2.150
  • IPP: 7.02
  • Scimago H index: 17
Discussion papers
https://doi.org/10.5194/soil-2018-44
© Author(s) 2019. This work is distributed under
the Creative Commons Attribution 4.0 License.

Original research article 29 Jan 2019


Review status
This discussion paper is a preprint. It is a manuscript under review for the journal SOIL.

Word embeddings for application in geosciences: development, evaluation and examples of soil-related concepts

José Padarian and Ignacio Fuentes
  • Sydney Institute of Agriculture & School of Life and Environmental Sciences, The University of Sydney, New South Wales, Australia

Abstract. A large amount of descriptive information is available in most disciplines of geosciences. This information is usually considered subjective and ill-favoured compared with its numerical counterpart. Considering the advances in natural language processing and machine learning, it is possible to utilise descriptive information and encode it as dense vectors. These word embeddings lie in a multi-dimensional space where angles and distances have a linguistic interpretation. We used 280 764 full-text scientific articles related to geosciences to train a domain-specific language model capable of generating such embeddings. To evaluate the quality of the numerical representations, we performed three intrinsic evaluations, namely: the capacity to generate analogies, term relatedness compared with the opinion of a human subject, and categorisation of different groups of words. Since this is the first attempt to evaluate word embeddings for tasks in the geosciences domain, we created a test suite specific to geosciences. We compared our results with general domain embeddings commonly used in other disciplines. As expected, our domain-specific embeddings (GeoVec) outperformed general domain embeddings in all tasks, with an overall performance improvement of 107.9 %. The resulting embeddings and test suite will be made available for other researchers to use and expand.
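
As a rough illustration of how angles between embeddings can carry a linguistic interpretation, the sketch below scores term relatedness with cosine similarity and solves a simple analogy by vector arithmetic. It is a minimal sketch only: the three-dimensional vectors, the vocabulary, and the helper names `cosine_similarity` and `solve_analogy` are invented for this example and are not taken from GeoVec or from the authors' test suite.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
# They are NOT GeoVec embeddings and come from no trained model.
TOY_EMBEDDINGS = {
    "sand": [0.9, 0.1, 0.0],
    "coarse": [0.8, 0.0, 0.3],
    "clay": [0.1, 0.9, 0.0],
    "fine": [0.0, 0.8, 0.3],
    "water": [0.3, 0.3, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' with the word closest to b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # The three query words are excluded from the candidate answers.
    candidates = [word for word in embeddings if word not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


# Relatedness: compare how similar two word pairs are.
related = cosine_similarity(TOY_EMBEDDINGS["sand"], TOY_EMBEDDINGS["coarse"])
unrelated = cosine_similarity(TOY_EMBEDDINGS["sand"], TOY_EMBEDDINGS["clay"])

# Analogy: sand is to coarse as clay is to ?
answer = solve_analogy("sand", "coarse", "clay", TOY_EMBEDDINGS)
```

With real embeddings the same two operations would run over a vocabulary of many thousands of words in a much higher-dimensional space; the toy values here are chosen only so that the arithmetic is easy to follow by hand.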

Interactive discussion
Status: final response (author comments only)
AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment
Viewed  
Total article views: 275 (including HTML, PDF, and XML)
  • HTML: 206
  • PDF: 65
  • XML: 4
  • Total: 275
  • BibTeX: 3
  • EndNote: 2
Views and downloads (calculated since 29 Jan 2019)
Viewed (geographical distribution)  
Total article views: 151 (including HTML, PDF, and XML) Thereof 148 with geography defined and 3 with unknown origin.
Latest update: 25 Apr 2019
Short summary
A large amount of descriptive information is available in geosciences. Considering the advances in natural language processing, it is possible to "rescue" this information and transform it into a numerical form (embeddings). We used 280 764 full-text scientific articles to train a language model capable of generating such embeddings. Our domain-specific embeddings (GeoVec) outperformed general domain embeddings in tasks such as analogies, relatedness and categorisation, and can be used in novel applications.