JuliaStats

Statistics and Machine Learning made easy in Julia.

  • Easy-to-use tools for statistics and machine learning
  • Extensible and reusable models and algorithms
  • Efficient and scalable implementations
  • Community-driven and open source

Packages

We bring together a number of great packages:

StatsBase

Basic functionality for statistics

  • Descriptive statistics and moments
  • Sampling with and without replacement
  • Counting and ranking
  • Autocorrelation and cross-correlation
  • Weighted statistics
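
A minimal sketch of the features listed above, assuming a recent StatsBase release is installed (function names in older releases may differ). The data vector and the weights are made up for illustration.

```julia
using StatsBase   # assumption: a recent StatsBase release
using Random

Random.seed!(1)                      # fix the seed so the draw is repeatable
x = [2.0, 4.0, 4.0, 6.0, 9.0]        # illustrative data

# Sampling without replacement: three elements at distinct positions of x.
s = sample(x, 3; replace=false)

# Counting: map each distinct value to its number of occurrences.
counts = countmap(x)                 # counts[4.0] == 2

# Ranking: ordinal ranks (1 = smallest), ties broken by position.
r = ordinalrank(x)                   # [1, 2, 3, 4, 5]

# Weighted statistics: the last observation counts four times as much.
w = weights([1, 1, 1, 1, 4])
m = mean(x, w)                       # (2 + 4 + 4 + 6 + 4*9) / 8 = 6.5
```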

DataArrays

Arrays that allow missing data

  • Data arrays with missing values
  • Optimized representation of arrays made up of repeated values
  • Computational routines that work with missing values

DataFrames

Essential tools for tabular data

  • DataFrames to represent tabular datasets
  • Database-style joins and indexing
  • Split-apply-combine operations, pivoting
  • Formula and model frames

Distributions

Probability distributions

  • A large collection of univariate and multivariate distributions
  • Descriptive statistics, pdf/pmf, and mgf
  • Efficient sampling
  • Maximum likelihood estimation
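
A minimal sketch of the features listed above, assuming a recent Distributions release is installed. The choice of a normal distribution, its parameters, and the sample size are illustrative only.

```julia
using Distributions   # assumption: a recent Distributions release
using Random

Random.seed!(42)                 # fix the seed so the sample is repeatable

d = Normal(1.0, 2.0)             # mean 1, standard deviation 2

# Descriptive statistics of the distribution itself.
mu, sigma2 = mean(d), var(d)     # 1.0 and 4.0

# Density at the mean and moment generating function at t = 0.
p = pdf(d, 1.0)                  # 1 / (2 * sqrt(2π)) ≈ 0.1995
g = mgf(d, 0.0)                  # always 1.0 at t = 0

# Sampling, then maximum likelihood estimation from the sample.
x = rand(d, 10_000)
d_hat = fit_mle(Normal, x)       # a Normal with parameters estimated from x
```

The fitted parameters can be read back with `params(d_hat)`; with a sample this large they should land close to the values used to generate it.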

MultivariateStats

Multivariate statistical analysis

  • Linear regression (LSQ and Ridge)
  • Dimensionality reduction (PCA, CCA, ICA, ...)
  • Multidimensional scaling
  • Linear discriminant analysis

HypothesisTests

Hypothesis tests

  • Parametric tests: t-tests
  • Nonparametric tests: binomial tests, sign tests, exact tests, U tests, rank tests, etc.

MLBase

Swiss Army knife for machine learning

  • Data preprocessing
  • Score-based classification
  • Performance evaluation
  • Model selection, cross validation

Distances

Various distances between vectors

  • A large variety of metrics
  • Efficient column-wise and pairwise computation
  • Support for weighted distances

KernelDensity

Kernel density estimation

  • Kernel density estimation for univariate and bivariate data
  • User customization of interpolation points, kernel, and bandwidth

Clustering

Algorithms for data clustering

  • K-means
  • K-medoids
  • Affinity propagation
  • Evaluation of clustering performance

GLM

Generalized linear models

  • Friendly API for fitting GLMs to data
  • Works with data frames and formulas
  • A variety of link types
  • Optimized implementation

NMF

Nonnegative matrix factorization

  • A variety of NMF algorithms with optimized implementations, including Lee & Seung's, projected ALS, and projected gradient
  • NNDSVD initialization

RegERMs

Regularized empirical risk minimization

  • Foundational framework for regression analysis
  • Supports ridge regression, logistic regression, and many more (e.g. user-provided loss functions)
  • Solvers: L-BFGS and SGD
  • Highly configurable and extensible

MCMC

Markov Chain Monte Carlo

  • A generic engine for Bayesian inference
  • A variety of samplers, using the latest techniques
  • User-friendly syntax for model specification
  • Uses automatic differentiation
  • Ability to suspend and resume

TimeSeries

Time series analysis

  • Tools to represent, manipulate, and apply computation to time series data
  • Based on data frames

Community

We have an active and friendly community.

Mailing list (on Google Groups): julia-stats

GitHub page: https://github.com/JuliaStats

We discuss our blueprints on Roadmap.jl.