To assist research on the labour market, ESCO has defined a taxonomy for occupations. Occupations are specified and organized in a hierarchical structure based on the International Standard Classification of Occupations (ISCO). labourR is a new package that performs occupation coding for multilingual free-form text (e.g. a job title) using the ESCO hierarchical classification model.
The initial motivation was to retrieve the work experience history from a Curriculum Vitae for the purpose of statistical analysis of data from the Europass online CV editor. In the approach followed, the first step is to compute the term frequency–inverse document frequency (tf-idf) statistic for each term found in the ESCO occupations corpus. Then, the input query receives a score for each ESCO occupation based on the query terms that match the ESCO vocabulary. Given an ISCO level, the classification is performed by a plurality vote, at the corresponding hierarchical level, among the ESCO occupations with the highest scores.
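The three steps described above (term weighting, scoring, plurality vote) can be illustrated with a minimal base R sketch. This is not labourR's implementation: the toy `occupations` table, the group codes `G1` to `G3`, and the helpers `tokenize()`, `score_query()` and `classify()` are all hypothetical, and the score is simplified to the sum of inverse document frequencies of the matched terms.

```r
# Toy stand-in for the ESCO corpus. Labels and group codes are hypothetical,
# not real ESCO or ISCO data.
occupations <- data.frame(
  label = c("data scientist", "data analyst", "software architect", "cashier"),
  isco_group = c("G1", "G1", "G2", "G3"),
  stringsAsFactors = FALSE
)

# Lower-case the text and split it into alphabetic tokens.
tokenize <- function(x) {
  lapply(strsplit(tolower(x), "[^a-z]+"), function(t) t[nzchar(t)])
}

# Step 1: inverse document frequency of every term in the corpus.
doc_terms <- lapply(tokenize(occupations$label), unique)
n_docs <- length(doc_terms)
doc_freq <- table(unlist(doc_terms))
idf <- setNames(log(n_docs / as.vector(doc_freq)), names(doc_freq))

# Step 2: score each occupation by the idf of the terms it shares with the query.
score_query <- function(query) {
  q_terms <- unique(tokenize(query)[[1]])
  vapply(doc_terms,
         function(terms) sum(idf[intersect(terms, q_terms)]),
         numeric(1))
}

# Step 3: plurality vote over the groups of the top-scoring occupations.
classify <- function(query, num_leaves = 3) {
  scores <- score_query(query)
  top <- order(scores, decreasing = TRUE)[seq_len(min(num_leaves, length(scores)))]
  top <- top[scores[top] > 0]            # ignore occupations with no matched term
  if (length(top) == 0) return(NA_character_)
  votes <- table(occupations$isco_group[top])
  names(votes)[which.max(votes)]         # ties go to the first group alphabetically
}

classify("Senior Data Scientist")
classify("Cashier at McDonald's")
```

In this sketch a query that shares no term with the corpus returns `NA`, and `num_leaves` bounds how many top-scoring occupations take part in the vote.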
The labourR package:
Includes the ESCO corpus and the respective ESCO to ISCO mappings.
Allows a user to enter multilingual free-form text and receive its classification in the ESCO-ISCO hierarchy.
Computations are fully vectorized and memory efficient.
Includes facilities to assist research in text mining of labour market data.
You can install the released version of labourR from CRAN with:

install.packages("labourR")
library(labourR)
corpus <- data.frame(
  id = 1:3,
  text = c("Data Scientist", "Junior Architect Engineer", "Cashier at McDonald's")
)
For a given ISCO hierarchical level, the top suggested ISCO group is returned. num_leaves specifies the number of ESCO occupations used by the classifier to perform a plurality vote.