Multi-Label Classification in Python

Scikit-multilearn is a BSD-licensed library for multi-label classification that is built on top of the well-known scikit-learn ecosystem.
pip install scikit-multilearn
Release: 0.2.0 | Supported Python versions: 2.7 / 3.x | GitHub | PyPI | Documentation
Stable Release: 0.1.0 | Supported Python versions: 2.7 / 3.x | Documentation
Lots of classifiers

Scikit-multilearn provides many native Python multi-label classifiers.
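Many multi-label classifiers are problem transformations: they turn the multi-label task into one or more single-label tasks that an ordinary scikit-learn classifier can solve. The sketch below shows the two simplest target transformations in plain Python: binary relevance (one binary problem per label) and label powerset (one class per distinct label combination). The function names are hypothetical and this is not scikit-multilearn's code. The library exposes these strategies as classes such as `BinaryRelevance` and `LabelPowerset` in `skmultilearn.problem_transform`.

```python
# Plain-Python sketch of two problem-transformation schemes (hypothetical helpers).
# Labels are given as a list of 0/1 rows, one row per sample.


def binary_relevance_targets(Y):
    """Split an n x L label matrix into L binary target vectors, one per label."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]


def label_powerset_targets(Y):
    """Map each distinct label combination to a single multi-class target."""
    classes = {}  # label combination (tuple) -> class id
    targets = []
    for row in Y:
        key = tuple(row)
        if key not in classes:
            classes[key] = len(classes)  # ids assigned in order of first appearance
        targets.append(classes[key])
    return targets, classes


Y = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]

print(binary_relevance_targets(Y))   # [[1, 0, 1, 1], [0, 1, 0, 1], [1, 0, 1, 0]]
print(label_powerset_targets(Y)[0])  # [0, 1, 0, 2]
```

Binary relevance ignores correlations between labels. Label powerset keeps them, but it can only predict label combinations that were seen during training.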

Label Relations

Use expert knowledge or infer label relationships from your data to improve your model.
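One way to infer label relationships from data is to build a graph in which each node is a label and each edge weight counts how often two labels are assigned together. The label space can then be divided along that graph's structure. Below is a minimal plain-Python sketch, assuming labels are given as a list of 0/1 rows. It uses connected components as a deliberately simple partitioning rule. Scikit-multilearn instead pairs its graph builders (for example `LabelCooccurrenceGraphBuilder` in `skmultilearn.cluster`) with community-detection clusterers. The function names here are hypothetical.

```python
from itertools import combinations


def label_cooccurrence_edges(Y):
    """Count how often each pair of labels is assigned to the same sample.

    Returns a dict mapping (i, j), with i < j, to a co-occurrence count.
    """
    edges = {}
    for row in Y:
        active = [j for j, value in enumerate(row) if value]
        for pair in combinations(active, 2):
            edges[pair] = edges.get(pair, 0) + 1
    return edges


def connected_label_groups(n_labels, edges):
    """Partition labels into connected components of the co-occurrence graph.

    A simple stand-in for community detection, used here only for illustration.
    """
    neighbours = {j: set() for j in range(n_labels)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in range(n_labels):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]

edges = label_cooccurrence_edges(Y)
print(edges)                             # {(0, 1): 2, (2, 3): 1}
print(connected_label_groups(4, edges))  # [[0, 1], [2, 3]]
```

Each resulting group can then get its own classifier, so labels that never appear together are modelled separately.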

Multi-label Embeddings

Embed the label space to improve the discriminative ability of your classifier.
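A label-space embedding maps each binary label vector to a point in a lower-dimensional space. A model then learns to predict that point from the features, and its predictions are decoded back to label sets. The minimal NumPy sketch below uses principal label space transformation (PLST), a simple linear embedder chosen here for brevity. It is not one of the LNEMLC or CLEMS embedders listed in the 0.2.0 notes, and the function names are hypothetical.

```python
import numpy as np


def fit_plst(Y, k):
    """Keep the top-k principal directions of the centred label matrix."""
    mean = Y.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, vt[:k]


def encode(Y, mean, components):
    """Project label vectors into the k-dimensional embedding."""
    return (Y - mean) @ components.T


def decode(Z, mean, components, threshold=0.5):
    """Map embedded points back to binary label vectors by thresholding."""
    return ((Z @ components + mean) >= threshold).astype(int)


Y = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
], dtype=float)

mean, components = fit_plst(Y, k=2)
Z = encode(Y, mean, components)  # a regressor would be trained to predict Z from X
print(Z.shape)  # (5, 2)
print(decode(Z, mean, components))
```

In this toy data, labels 0/1 and 2/3 always appear together, so two dimensions are enough to reconstruct every label vector exactly.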

Multi-label Deep Learning

Extend your Keras or PyTorch neural networks to solve multi-label classification problems.
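Independently of any wrapper, the usual way to make a neural network multi-label is to give its last layer one sigmoid unit per label and train it with binary cross-entropy. Unlike softmax, this does not force labels to compete with each other. In Keras this corresponds to a sigmoid output layer with the `binary_crossentropy` loss, and in PyTorch to `torch.nn.BCEWithLogitsLoss`. The plain-Python sketch below shows only the arithmetic. It is numerically naive, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over labels, each label scored independently."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)


def predict_labels(logits, threshold=0.5):
    """Predict every label whose probability reaches the threshold."""
    return [int(sigmoid(z) >= threshold) for z in logits]


logits = [2.0, -1.0, 0.5]  # raw outputs of the final layer, one per label
targets = [1, 0, 1]

print(predict_labels(logits))  # [1, 0, 1]
print(round(multilabel_bce(logits, targets), 4))
```

Several labels can be switched on at once, which a softmax output with an argmax decision cannot express.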

Efficient classification

Scikit-multilearn is faster and takes much less memory than the standard stack of MULAN, MEKA & WEKA.

Free as in BSD

The licensing model follows scikit-learn's BSD licence to allow maximum interoperability. Some libraries, if used for label space division, may incur GPL requirements.

Data management

Download standard multi-label data sets, and load or save your own, much like scikit-learn's dataset utilities.
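A practical part of data management is keeping features, labels and their names together in one file. The sketch below, using only the Python standard library, stores them as a bz2-compressed pickle and reads them back. The helper names and file layout are hypothetical and do not reproduce the file format of scikit-multilearn's `skmultilearn.dataset` module.

```python
import bz2
import os
import pickle
import tempfile


def save_dump(path, X, Y, feature_names, label_names):
    """Write features, labels and their names to one bz2-compressed pickle file."""
    payload = {"X": X, "Y": Y, "feature_names": feature_names, "label_names": label_names}
    with bz2.BZ2File(path, "wb") as fh:
        pickle.dump(payload, fh)


def load_dump(path):
    """Read a file written by save_dump. Only unpickle files from trusted sources."""
    with bz2.BZ2File(path, "rb") as fh:
        payload = pickle.load(fh)
    return payload["X"], payload["Y"], payload["feature_names"], payload["label_names"]


X = [[0.2, 1.5], [0.9, 0.1]]
Y = [[1, 0, 1], [0, 1, 0]]
feature_names = ["f1", "f2"]
label_names = ["sports", "politics", "science"]

path = os.path.join(tempfile.mkdtemp(), "toy_dataset.bz2")
save_dump(path, X, Y, feature_names, label_names)
print(load_dump(path) == (X, Y, feature_names, label_names))  # True
```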

Multi-label stratification

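Split multi-label data into train and test sets with a stratification algorithm that keeps label distributions balanced across the splits, and check the result with stratification quality measures.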
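Random splits can leave rare labels with no positive examples in a test fold. The sketch below is a minimal plain-Python version of the iterative stratification idea described by Sechidis, Tsoumakas and Vlahavas. Labels with the fewest remaining examples are distributed first. Each example goes to the fold that still wants the most positives of that label, and then the fold that wants the most samples overall. Ties are broken deterministically here rather than randomly, so this is a simplification with hypothetical names, not scikit-multilearn's implementation. The library itself provides `iterative_train_test_split` in `skmultilearn.model_selection`.

```python
def iterative_stratification(Y, ratios):
    """Assign each row of a 0/1 label matrix to a fold, rarest label first.

    Y: list of 0/1 rows (samples x labels); ratios: fold fractions summing to 1.
    Returns one list of row indices per fold.
    """
    n_samples, n_labels, n_folds = len(Y), len(Y[0]), len(ratios)
    # How many more samples, and positives of each label, every fold still wants.
    wanted_total = [r * n_samples for r in ratios]
    wanted = [[r * sum(row[j] for row in Y) for j in range(n_labels)] for r in ratios]
    remaining = set(range(n_samples))
    folds = [[] for _ in range(n_folds)]

    def assign(i, fold):
        folds[fold].append(i)
        remaining.discard(i)
        wanted_total[fold] -= 1
        for j in range(n_labels):
            if Y[i][j]:
                wanted[fold][j] -= 1

    while remaining:
        counts = [sum(Y[i][j] for i in remaining) for j in range(n_labels)]
        candidates = [j for j in range(n_labels) if counts[j] > 0]
        if not candidates:
            # Rows without any label: fill the folds that still want the most samples.
            for i in sorted(remaining):
                assign(i, max(range(n_folds), key=lambda f: wanted_total[f]))
            break
        # The rarest label is the hardest to spread evenly, so handle it first.
        label = min(candidates, key=lambda j: counts[j])
        rows = sorted(r for r in remaining if Y[r][label])
        for i in rows:
            fold = max(range(n_folds), key=lambda f: (wanted[f][label], wanted_total[f]))
            assign(i, fold)
    return folds


Y = [
    [1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0],
    [0, 1, 1], [0, 1, 1], [0, 0, 1], [0, 0, 1],
]

folds = iterative_stratification(Y, [0.5, 0.5])
print(folds)  # [[0, 2, 4, 6], [1, 3, 5, 7]]
for fold in folds:
    print([sum(Y[i][j] for i in fold) for j in range(3)])  # [2, 2, 2] for each fold
```

Each label ends up with the same number of positives in both halves, which a random split does not guarantee.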

MEKA wrapper

Missing a particular classifier that exists in the Java MEKA and WEKA stack? Now you can use it like a native scikit-learn classifier!

Well maintained

Scikit-multilearn has over 82% test coverage and undergoes continuous integration on Windows 10, OS X and Ubuntu.

Scikit-compatible

Scikit-multilearn is compatible with the SciPy and scikit-learn stack. Use our classifiers with scikit-learn, and use scikit-learn classifiers with our code.

Widely used

With over 160 stars and 60 forks, scikit-multilearn is the second most popular multi-label library on GitHub.

We're on Stack Overflow

Need help? Ask a question on Stack Overflow and our community will answer.

Learn more

Scikit-multilearn offers extensive user documentation. Read the user docs, learn from recipes constructed on real data or browse the API reference to find a concrete class or function.


Join the team!

Scikit-multilearn is developed


News

0.2.0 (released 2018-12-10)

A new feature release:

  • first Python implementation of the multi-label twin SVM (MLTSVM)
  • a general multi-label embedding framework with several embedders supported (LNEMLC, CLEMS)
  • balanced k-means clusterer from HOMER implemented
  • wrapper for using Keras models in scikit-multilearn

0.1.0 [stable] (released 2018-09-04)

Fixes many bugs and generally improves stability, cross-platform functionality and unit test coverage. This release has been tested with a large set of unit tests that also pass on Windows. New features:

  • multi-label stratification algorithm and stratification quality measures
  • a robust reorganization of label space division, including a working stochastic block model approach and a new underlying layer of graph builders, which divide the label space with graph models based not only on label co-occurrence but on any kind of network relationship between labels you can come up with
  • MEKA wrapper now works fully cross-platform, including Windows 10
  • multi-label data set downloading and load/save functionality, similar to scikit-learn's datasets module
  • kNN models support sparse input
  • MLARAM models support sparse input
  • BSD-compatible label space partitioning via NetworkX
  • dependence on GPL libraries made optional
  • working predict_proba added for label space partitioning methods
  • MLARAM moved from neurofuzzy to adapt
  • test coverage increased to 94%
  • Classifier Chains allow specifying the chain order
  • lots of documentation updates
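Specifying the chain order matters because, in a classifier chain, each link is trained on the original features plus the values of every label earlier in the chain. The plain-Python sketch below builds the per-link training sets for a given order. It is not scikit-multilearn's `ClassifierChain` code, and the helper name is hypothetical. At prediction time, each link would receive the predictions of the earlier links instead of the true labels.

```python
def chain_training_sets(X, Y, order):
    """Build (label, inputs, targets) for each link of a classifier chain.

    The link at position k sees the original features plus the true values
    of the labels that come before it in `order`.
    """
    links = []
    for position, label in enumerate(order):
        previous = order[:position]
        inputs = [list(x) + [y[p] for p in previous] for x, y in zip(X, Y)]
        targets = [y[label] for y in Y]
        links.append((label, inputs, targets))
    return links


X = [[0.5, 1.0], [0.1, 0.3]]
Y = [[1, 0, 1], [0, 1, 1]]

for label, inputs, targets in chain_training_sets(X, Y, order=[2, 0, 1]):
    print(label, inputs, targets)
# 2 [[0.5, 1.0], [0.1, 0.3]] [1, 1]
# 0 [[0.5, 1.0, 1], [0.1, 0.3, 1]] [1, 0]
# 1 [[0.5, 1.0, 1, 1], [0.1, 0.3, 1, 0]] [0, 1]
```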