AutoML: when you want to be completely out of the think loop.
Predictive analytics - machine learning is when a lot of data is used to predict some other data by finding which parts affect which other parts. The goal is to reduce RMSE (root mean squared error), and that is a pretty simple metric. There used to be a time when a data scientist had to munge and wrangle data, generate and select features, select models and hyperparameters and maybe even understand the data. Not anymore. Now the system just runs itself and out pops the best possible answer with less interpretability than there ever used to be.
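To make the "reduce RMSE" goal concrete, here is a minimal sketch of the loop an AutoML system automates: fit several candidate models, score each on held-out data, keep the one with the lowest error. The data, the two candidate models and every function name below are made up for illustration; a real system searches a far larger space.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

# Made-up data: y is roughly 2x + 1.
train_x, train_y = [0, 1, 2, 3, 4, 5], [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]
valid_x, valid_y = [6, 7, 8], [13.2, 14.9, 17.1]

# The "search space": a dictionary of hypothetical model-fitting functions.
candidates = {"mean": fit_mean, "line": fit_line}
scores = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    scores[name] = rmse(valid_y, [model(x) for x in valid_x])

# Keep whichever candidate has the lowest validation RMSE.
best = min(scores, key=scores.get)
print(best, scores)
```

On this toy data the straight line wins easily. Nothing in the loop explains why, which is the interpretability trade-off mentioned above.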
Python is an interpreted, interactive, object-oriented, extensible programming language. It provides an extraordinary combination of clarity and versatility, and is free and comprehensively ported.
Useful for users of languages
R is a free software environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R is a complete language bundled with a working application, and is the "de facto" standard for data analysis and data mining. Better suited for advanced users who want all the power in their hands.
Primary Goal "driverless"
github is a better wikipedia https://github.com/hibayesian/awesome-automl-papers
DataRobot's automated machine learning platform makes it fast and easy to build and deploy accurate predictive models. Learn how you can become an AI-driven enterprise today.
The healthcare industry has massive amounts of data available in health records, clinical trials, and billings and claims processing systems; and yet, the industry still struggles to unlock value in this data to drive better patient outcomes and comply with healthcare regulations.
Automated machine learning is helping transform the billions of data points collected in electronic health records, clinical trials, and billings and claims processing into predictions that drive down costs, improve operations, and ultimately, save lives.
Cloud AutoML helps you easily train high quality custom machine learning models with limited machine learning expertise needed.
Not completely automated.
short history https://blog.datarobot.com/automated-machine-learning-short-history
H2O’s core code is written in Java. Inside H2O, a Distributed Key/Value store is used to access and reference data, models, objects, etc., across all nodes and machines. The algorithms are implemented on top of H2O’s distributed Map/Reduce framework and utilize the Java Fork/Join framework for multi-threading. The data is read in parallel and is distributed across the cluster and stored in memory in a columnar format in a compressed way. H2O’s data parser has built-in intelligence to guess the schema of the incoming dataset and supports data ingest from multiple sources in various formats.
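The schema guessing and columnar storage described above can be illustrated with a toy sketch in plain Python. This is not H2O's parser or storage format; the function names and the sample data are hypothetical, and the only idea shown is "try the narrowest type that fits every value, then store one typed list per column".

```python
import csv
import io

def guess_type(values):
    """Return the narrowest type (int, float, str) that fits every non-empty value."""
    for caster in (int, float):
        try:
            for v in values:
                if v != "":
                    caster(v)
            return caster
        except ValueError:
            continue
    return str

def parse_columnar(text):
    """Parse CSV text into {column name: list of typed values}, one list per column."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = {name: [row[i] for row in body] for i, name in enumerate(header)}
    typed = {}
    for name, raw in columns.items():
        caster = guess_type(raw)
        # Empty cells become None (missing) instead of failing the cast.
        typed[name] = [caster(v) if v != "" else None for v in raw]
    return typed

# Made-up sample with an integer, a float (one value missing) and a string column.
sample = "id,price,city\n1,9.5,Oslo\n2,,Lima\n3,12,Rome\n"
frame = parse_columnar(sample)
print(frame)
```

Storing each column as its own list keeps values of one type together, which is what makes per-column compression and column-wise computation practical.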
H2O’s REST API allows access to all the capabilities of H2O from an external program or script via JSON over HTTP. The Rest API is used by H2O’s web interface (Flow UI), R binding (H2O-R), and Python binding (H2O-Python).
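As a rough sketch of what "JSON over HTTP" means in practice, the snippet below builds a URL and decodes a JSON response using only the Python standard library. The host, the port 54321 and the `/3/Cloud` cluster-status endpoint are assumptions to be checked against the REST API reference for your H2O version; the helper names are hypothetical.

```python
import json
import urllib.request

def cluster_status_url(host="localhost", port=54321):
    """Build the URL of the cluster-status endpoint (assumed: GET /3/Cloud)."""
    return f"http://{host}:{port}/3/Cloud"

def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body into Python objects."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)

# Usage, only meaningful with an H2O instance already running:
# status = fetch_json(cluster_status_url())
# print(status)
```

The R and Python bindings wrap calls of this kind, so most users never need to write them by hand.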
The speed, quality, ease-of-use, and model-deployment for the various cutting edge Supervised and Unsupervised algorithms like Deep Learning, Tree Ensembles, and GLRM make H2O a highly sought after API for big data data science.
Requirements
At a minimum, we recommend the following for compatibility with H2O:
Operating Systems:
- Windows 7 or later
- OS X 10.9 or later
- Ubuntu 12.04
- RHEL/CentOS 6 or later
Languages: Scala, R, and Python are not required to use H2O unless you want to use H2O in those environments, but Java is always required. Supported versions include:
- Java 7 or later. Note: Java 9 is not yet released and is not currently supported. To build H2O or run H2O tests, the 64-bit JDK is required. To run the H2O binary using either the command line, R, or Python packages, only 64-bit JRE is required. Both of these are available on the Java download page.
- Scala 2.10 or later
- R version 3 or later
- Python 2.7.x or 3.5.x
Browser: An internet browser is required to use H2O’s web UI, Flow.
The ML Studio is an interactive platform for data visualization, statistical modeling and machine learning applications. Based on Shiny and shinydashboard interface, with Plotly interactive data visualization, DT HTML tables and H2O machine learning and deep learning algorithms. The ML Studio provides a set of tools for the data science pipeline workflow. More details available on the package vignette.
Currently available features of the ML Studio package:
Data Management -
- Ability to load data from installed R package, R environment and/or csv file
- Modify variables attributes
- Data summary with dplyr functions
Interactive data visualization tool with the Plotly package, which includes:
- Scatter, line, histogram, correlation, etc.
- Time series plots – seasonality, correlation etc.
Machine learning and deep learning algorithms with the H2O package, currently only classification models available (Deep Learning, Random Forest, GBM, GLM)
Under construction features:
Machine learning -
- In depth model summary
- Ability to compare, select and save models
- Regression models
- The caret functions and models
- H2O grid search and autoML
- Deep learning applications with Keras
Time series and forecasting -
- Tools for time series analysis
- Forecasting models with the forecast package
Data visualization – extending the current functionality
The package is available for installation with the devtools package (if the devtools package is not installed, use install.packages("devtools") to install it).
Install the MLstudio
mlr provides the infrastructure for machine learning experiments in R so that you can focus on your experiments! The framework provides supervised methods like classification, regression and survival analysis along with their corresponding evaluation and optimization methods, as well as unsupervised methods like clustering. It is written in a way that you can extend it yourself or deviate from the implemented convenience methods and construct your own complex experiments. The package is nicely connected to the OpenML R package, which aims at supporting collaborative machine learning online and allows you to easily share datasets as well as machine learning tasks, algorithms and experiments.
- Clear S3 interface to R classification, regression, clustering and survival analysis methods
- Possibility to fit, predict, evaluate and resample models
- Easy extension mechanism through S3 inheritance
- Abstract description of learners and tasks by properties
- Parameter system for learners to encode data types and constraints
- Many convenience methods and generic building blocks for your machine learning experiments
- Resampling methods like bootstrapping, cross-validation and subsampling
- Extensive visualizations for e.g. ROC curves, predictions and partial predictions
- Benchmarking of learners for multiple data sets
- Easy hyperparameter tuning using different optimization strategies, including potent configurators like iterated F-racing (irace) or sequential model-based optimization
- Variable selection with filters and wrappers
- Nested resampling of models with tuning and feature selection
- Cost-sensitive learning, threshold tuning and imbalance correction
- Wrapper mechanism to extend learner functionality in complex and custom ways
- Combine different processing steps to a complex data mining chain that can be jointly optimized
- OpenML connector for the Open Machine Learning server
- Extension points to integrate your own stuff
- Parallelization is built-in
- Unit-testing
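Two of the features listed above, cross-validation and hyperparameter tuning, fit in a few lines. The sketch below is plain Python rather than mlr, so none of the names correspond to mlr functions; the data is made up, and a one-feature k-nearest-neighbours regressor stands in for a real learner. It scores each value of k with 5-fold cross-validated RMSE and keeps the best.

```python
import math

def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def k_fold_indices(n, folds):
    """Yield (train_idx, test_idx) pairs; fold i takes every folds-th index starting at i."""
    for i in range(folds):
        test_idx = list(range(i, n, folds))
        held_out = set(test_idx)
        train_idx = [j for j in range(n) if j not in held_out]
        yield train_idx, test_idx

def cv_rmse(data, k, folds=5):
    """Cross-validated RMSE of k-NN regression, pooled over all held-out points."""
    squared_errors = []
    for train_idx, test_idx in k_fold_indices(len(data), folds):
        train = [data[j] for j in train_idx]
        for j in test_idx:
            x, y = data[j]
            squared_errors.append((knn_predict(train, x, k) - y) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up data: a noiseless sine curve sampled at 40 points.
data = [(i / 4, math.sin(i / 4)) for i in range(40)]

# Grid search: evaluate every candidate k and keep the lowest cross-validated RMSE.
grid = [1, 3, 5, 9, 15]
results = {k: cv_rmse(data, k) for k in grid}
best_k = min(results, key=results.get)
print(best_k, results)
```

Nested resampling, also in the list, would wrap this whole search inside an outer cross-validation loop so that the reported error is not biased by the tuning itself.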
Consider caret ensemble. https://moderntoolmaking.blogspot.com/2013/03/new-package-for-ensembling-r-models.html
Weka is a collection of machine learning algorithms for data mining tasks; with its own GUI.
(The application is named after a flightless bird of New Zealand that is very inquisitive.)
The algorithms can either be applied directly to a dataset or called from your own Java code.
Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
"Predict, intelligently manage, interpret behaviors, automate, Prevision.io brings artificial intelligence into your business at an affordable cost and unparalleled performance."
ENERGY Ensure a perfectly reliable electricity distribution nationwide, predict demand and production or control energy consumption.
E-COMMERCE Manage relationships with millions of customers, portfolio of hundreds of products, and predict the actions of its users.
DISTRIBUTION Analyze trends and events to predict, every day, how many customers will come to your stores.
POWER A platform running 24/7 capable of delivering results on demand through a fully automated system.
EASE Sending the data is your only job. The platform does the rest for you.
SECURITY The data flow is completely secure, the data is destroyed once the models are generated.
BEST MODELS Our data scientists work every day to offer you the best platform, capable of creating models of exceptional quality, regardless of the use case or the field of application.
MODEL MANAGEMENT Manage the end-to-end process: create new projects, observe the learning phase, use the models as you wish, forecast in total autonomy
RapidMiner (Community edition) is a data mining software. It was formerly known as "YALE".
You can use RapidMiner as a stand-alone application for data analysis, or integrate it as a data-mining engine into your own products.
- Data integration, analytical ETL, data analysis and reporting into a single suite
- Powerful yet intuitive GUI (Graphical User Interface) for the design of analytical processes
- Repository for process, data and metadata management
- The only solution with metadata transformation: Forget trial and error and inspect results already at design time
- The only solution that supports on-the-fly error detection and quick fixes
- Complete and flexible: Hundreds of methods for data integration, data transformation, modeling and visualization
RapidMiner has a free version that anyone can use. The free license is limited to 10,000 rows of data and use of only 1 processor; otherwise it's full-featured. Users can get an additional 10,000 rows by referring a friend.
The paid license of RapidMiner starts at $2500/year and allows unlimited data and processors. There is an Academic License program for those in academia (https://rapidminer.com/educational-program/).
One Feature of many
IBM Watson Analytics offers smart data discovery from the cloud
Watson Analytics offers cognitive, predictive, & visual analytics in an easy-to-use service you can use on your own to find answers in your data. Even extend your data for a complete view of your business. For example, included with paid editions, users can tap directly into a sample of the Twitter Firehose to bring social insights into business decisions. Wherever you are in your business, with Watson Analytics, you can get better data, understand your business, tell a story and think ahead.
"Watson Analytics gave me decision-making insight to my data in minutes. I tested the analytics against another program to see if the results would differ. Upon investigation, Watson Analytics gave me more detailed analysis with easy to communicate results in a few minutes as compared to days of setting up tests to run." -Mark C. Lack - Manager, Strategy Analytics & BI, Mueller Inc.
Features and Benefits at a Glance
- Data preparation, refinement, management and analysis are automated and available from the cloud so you can easily work with your data and trust the results.
- Automated intelligence enables you to draw conclusions based on what's happened in your business and why.
- Visualizations that best show what's important can create clear and compelling infographics to support your decisions and effectively communicate with others.
- Statistical analysis, correlations and predictions help you see what's likely to happen and what you can do about it.
SAS is the leader in business analytics software and services, and the largest independent vendor in the business intelligence market.
SAS is the first company to call when you need to solve complex business problems, achieve key objectives and more effectively manage your information assets. As the leader in business analytics software and services, we provide a technology platform and market-leading analytic applications to help you not only navigate today's challenges but capitalize on tomorrow's opportunities.
The IBM SPSS software platform offers advanced statistical analysis, a vast library of machine-learning algorithms, text analysis, open-source extensibility, integration with big data and seamless deployment into applications. Its ease of use, flexibility and scalability make IBM SPSS accessible to users with all skill levels and outfits projects of all sizes and complexity to help you and your organization find new opportunities, improve efficiency and minimize risk.
Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology. Amazon Machine Learning provides visualization tools and wizards that guide you through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology. Once your models are ready, Amazon Machine Learning makes it easy to obtain predictions for your application using simple APIs, without having to implement custom prediction generation code, or manage any infrastructure.
Amazon Machine Learning is based on the same proven, highly scalable, ML technology used for years by Amazon’s internal data scientist community. The service uses powerful algorithms to create ML models by finding patterns in your existing data. Then, Amazon Machine Learning uses these models to process new data and generate predictions for your application.
Amazon Machine Learning is highly scalable and can generate billions of predictions daily, and serve those predictions in real-time and at high throughput. With Amazon Machine Learning, there is no upfront hardware or software investment, and you pay as you go, so you can start small and scale as your application grows.
BigML's goal is to create a machine learning service extremely easy to use and seamless to integrate.
Algorithmia makes applications smarter, by building a community around algorithm development, where state of the art algorithms are always live and accessible to anyone.
Comments on 'AutoML: when you want to be completely out of the think loop.'
don't forget https://www.automaticstatistician.com/index/
and ez kaggles: https://mljar.com/
really only https://github.com/EpistasisLab/tpot https://github.com/automl/auto-sklearn and maybe https://github.com/HDI-Project/ATM https://github.com/rmcantin/bayesopt
just in case http://www.ml4aad.org/automl/