
Data Science tools
My favorite Data Science tools, excluding Python data science tools:
- Programming languages
- ETL Tools
- Business Intelligence
- Statistics - Spreadsheets
- Statistics - Command Line
- Mathematics - Econometrics
Programming languages
Python is an interpreted, interactive, object-oriented, extensible programming language. It provides an extraordinary combination of clarity and versatility, and is free and comprehensively ported.
R is a free software environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R is a complete language bundled with its own working environment, and it is widely regarded as the "de facto" standard for data analysis and data mining. It is better suited to advanced users who want all the power in their hands.
Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The library, largely written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, signal processing, and string processing.
In addition, the Julia developer community is contributing a number of external packages through Julia’s built-in package manager at a rapid pace. IJulia, a collaboration between the IPython and Julia communities, provides a powerful browser-based graphical notebook interface to Julia.
Its asynchronous I/O is built on libuv.
ETL Tools
Zapier enables you to automate tasks between other online services (services like Salesforce, Basecamp, Gmail, and 249 more).
Imagine capturing Wufoo form leads automatically into Salesforce or displaying new PayPal sales in your Campfire team chat room. Zapier lets you automate all these simple tasks and get back to real work.
Hevo Data is a no-code, bi-directional data pipeline platform purpose-built for modern ETL, ELT, and Reverse ETL needs. It helps data teams streamline and automate org-wide data flows; Hevo claims this saves ~10 hours of engineering time per week and makes reporting, analytics, and decision making 10x faster.
The platform supports 100+ ready-to-use integrations across databases, SaaS applications, cloud storage, SDKs, and streaming services. According to Hevo, over 500 data-driven companies across 35+ countries use it for their data integration needs, and a fully managed pipeline can be up and running in just a few minutes.
Key Features:
- Pre-built Integrations with 100+ Data Sources
- Fully Automated Data Flows with No-code Interface
- Supports ETL, ELT, and Reverse-ETL use cases
- Data Modeling and Workflows
- Real-time Alerts & Notifications
- Replay/Re-run Failed Events
- Auto-schema Management
- Supports Historical and Incremental Data Load (CDC)
- Secure Connection with SSH Tunnel and SSL/TLS Encryption
- Supports Advanced Transformations
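To make the ETL pattern concrete, here is a minimal, tool-agnostic sketch in Python using only the standard library. It extracts rows from CSV text, transforms them (drops incomplete rows and converts types), and loads the result into an in-memory SQLite table. It does not use Hevo's, Talend's, or any other product's API, and the `orders` table, its columns, and the sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from an upstream system.
RAW_CSV = """order_id,amount,country
1,19.99,FR
2,,DE
3,5.50,FR
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows with no amount and convert fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # incomplete record, not loaded
        cleaned.append((int(row["order_id"]), float(row["amount"]), row["country"]))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT country, SUM(amount) FROM orders GROUP BY country").fetchall())
```

An ELT tool would reverse the last two steps: load the raw rows first, then run the transformation inside the destination database, typically as SQL.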
Airflow is a platform to programmatically author, schedule and monitor data pipelines.
Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress and troubleshoot issues when needed.
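The core idea of running tasks "while following the specified dependencies" can be illustrated with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). This is a sketch of the concept only: it is not Airflow's API and not how the Airflow scheduler is implemented, and the task names are hypothetical. Each round of `get_ready()` yields the tasks whose dependencies are all finished; a real scheduler could hand those to parallel workers, while this sketch simply records them in order.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "quality_check": {"clean"},
    "report": {"aggregate"},
}


def run_in_dependency_order(graph):
    """Return task names in an order that respects every dependency."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    order = []
    while sorter.is_active():
        # Tasks whose dependencies are all done; sorted for a deterministic order.
        ready = sorted(sorter.get_ready())
        for task in ready:
            order.append(task)  # a real scheduler would execute the task here
            sorter.done(task)
    return order


print(run_in_dependency_order(pipeline))
# ['extract', 'clean', 'aggregate', 'quality_check', 'report']

try:
    run_in_dependency_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected: not a DAG")
```

The cycle check is why the "acyclic" in DAG matters: with a loop in the dependencies, no task in the loop could ever become ready.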
Principles
- Dynamic: Airflow pipelines are configuration as code (Python), allowing for dynamic pipeline generation. This allows for writing code that instantiates pipelines dynamically.
- Extensible: Easily define your own operators and executors, and extend the library so that it fits the level of abstraction that suits your environment.
- Elegant: Airflow pipelines are lean and explicit. Parameterizing your scripts is built into the core of Airflow using the powerful Jinja templating engine.
- Scalable: Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers, so it can scale out by adding more workers.
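Because pipelines are defined as ordinary Python code, tasks can be generated in a loop instead of being written out by hand, and each task's command can be parameterized. The sketch below shows both ideas in plain Python: it builds one export task per (hypothetical) table plus a final task that depends on all of them. Airflow itself uses Jinja for templating; here the standard-library `string.Template` is swapped in so the example runs without dependencies. The `export`/`publish` commands and the task structure are hypothetical and are not Airflow APIs; the `$ds` placeholder is merely named after Airflow's `ds` run-date template variable.

```python
from string import Template

# Hypothetical list of tables; adding a name here adds a task to the pipeline.
TABLES = ["customers", "orders", "payments"]

# Parameterized command template; $table and $ds are filled in per task.
COMMAND = Template("export --table $table --date $ds")


def build_pipeline(tables, run_date):
    """Generate one export task per table plus a final task depending on all of them."""
    tasks = {}
    for table in tables:
        tasks[f"export_{table}"] = {
            "command": COMMAND.substitute(table=table, ds=run_date),
            "depends_on": [],
        }
    tasks["publish"] = {
        "command": f"publish --date {run_date}",
        "depends_on": [f"export_{t}" for t in tables],
    }
    return tasks


pipeline = build_pipeline(TABLES, "2024-01-31")
for name, task in pipeline.items():
    print(name, "->", task["command"], task["depends_on"])
```

Adding a fourth table to `TABLES` adds a fourth export task and a fourth dependency of `publish` without touching any other line, which is the practical payoff of "configuration as code".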
Talend leverages the open source model to make data integration available to all types of organizations, regardless of their size, level of expertise or budgetary constraints. Talend’s solutions connect to all source and target systems and they can be downloaded at no cost. Talend also offers data quality solutions, fully complementary to its data integration solutions.
Business Intelligence
Power BI for Office 365 is a self-service business intelligence (BI) solution delivered through Excel and Office 365 that provides information workers with data analysis and visualization capabilities to identify deeper business insights about their data. With Power BI for Office 365, you can connect to data in the cloud or extend your existing on-premises data sources and systems to quickly build and deploy self-service BI solutions hosted in Microsoft’s trusted enterprise cloud.
With Power BI for Office 365, you can do more with your data:
- Analyze and present insights with Excel in compelling visual formats from data either on premises or in the cloud.
- Share reports and datasets online with data that is always kept up to date.
- Access and stay connected to data and reports from your mobile devices wherever you are.
Tableau can help anyone see and understand their data. Connect to almost any database, drag and drop to create visualizations, and share with a click.
Whether you’re driving decisions across your organization or embedding insights into your software, app, or website – choose the analytics software that works the way people think.
With Tableau Public, you can also create and share interactive charts and graphs, stunning maps, live dashboards and fun applications in minutes, then publish them anywhere on the web for free.
Open-source BI for your whole team
JasperReports is a variously licensed reporting tool and library, written in Java, which can generate reports in HTML, PDF, Excel, CSV, and other formats. JasperReports Server Community Edition is open source, licensed under the GNU Affero General Public License version 3 (AGPL v3).
"JasperReports is the world's most powerful and widely used embeddable Java reporting library for report designers and developers. JasperReports Professional includes iReport, the most popular graphical design tool for JasperReports."
STATISTICA is a tried-and-true analytics platform with more than two decades of history delivering business results and, according to its vendor, a global user base of more than 600,000 users.
Statistics - Spreadsheets
The IBM SPSS software platform offers advanced statistical analysis, a vast library of machine-learning algorithms, text analysis, open-source extensibility, integration with big data and seamless deployment into applications. Its ease of use, flexibility and scalability make IBM SPSS accessible to users of all skill levels and suitable for projects of all sizes and complexity, helping you and your organization find new opportunities, improve efficiency and minimize risk.
JASP is a free and open-source graphical program for statistical analysis supported by the University of Amsterdam. It is designed to be easy to use, and familiar to users of SPSS. It offers standard analysis procedures in both their classical and Bayesian form. JASP generally produces APA style results tables and plots to ease publication. It promotes open science by integration with the Open Science Framework and reproducibility by integrating the analysis settings into the results. The development of JASP is financially supported by several universities and research funds. Main Features:
- Offers both Frequentist (classical) analyses and Bayesian analyses
- Dynamic update of all results
- Spreadsheet layout and an intuitive drag-and-drop interface
- Progressive disclosure for increased understanding
- Annotated output for communicating your results
- Integrated with The Open Science Framework (OSF)
- Support for APA format (copy graphs and tables directly into Word)
- Supports many file formats: .sav, .txt, .csv, .ods, and .jasp
Analysis methods currently available:
- A/B Test (Beta)
- ANOVA
- ANCOVA
- AUDIT (module)
- Bain (module)
- Binomial Test
- Confirmatory Factor Analysis (CFA)
- Contingency Tables (incl. Chi-Squared Test)
- Correlation: Pearson, Spearman, Kendall
- Equivalence T-Tests: Independent, Paired, One-Sample
- Exploratory Factor Analysis (EFA)
- Linear Regression
- Logistic Regression
- Log-Linear Regression
- Machine Learning
- MANOVA
- Mediation Analysis
- Meta-Analysis
- Mixed Models
- Multinomial
- Principal Component Analysis (PCA)
- Repeated Measures ANOVA
- Reliability Analysis
- Structural Equation Modeling (SEM)
- Summary Stats
- T-Tests: Independent, Paired, One-Sample
- Visual Modeling: Linear, Mixed, Generalized Linear
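Several of the analyses listed above reduce to short formulas. As a rough illustration (not JASP's implementation, and without the p-values, Bayesian variants, or APA tables JASP reports), the sketch below computes a Pearson correlation coefficient and Welch's t statistic for two independent samples using Python's standard `statistics` module. The sample data are made up for the example.

```python
import math
from statistics import mean, stdev, variance


def pearson_r(x, y):
    """Pearson correlation: sample covariance divided by the product of sample standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


def welch_t(a, b):
    """Welch's t statistic for two independent samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Made-up example data.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 4.4, 4.6, 4.0]

print(round(pearson_r(hours, score), 3))
print(round(welch_t(group_a, group_b), 3))
```

Welch's variant is used here because it does not assume equal variances; Student's t-test would instead pool the two sample variances.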
jamovi is a new free and open "3rd generation" statistical spreadsheet. Designed from the ground up to be easy to use, jamovi is a compelling alternative to costly statistical products such as SPSS and SAS.
PSPP is a free software application for analysis of sampled data. It has a graphical user interface and a conventional command-line interface. It is written in C, uses the GNU Scientific Library for its mathematical routines, and uses plotutils for generating graphs. It is intended as a free replacement for the proprietary program SPSS.
Statistics - Command Line
RStudio™ is an integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R.
RStudio brings together everything you need to be productive with R in a single, customizable environment.
RStudio is available for all major platforms including Windows, Mac OS X, and Linux. It can even run alongside R on a server, enabling multiple users to access the RStudio IDE using a web browser.
Like R, RStudio is available under an open source license that guarantees the freedom to share and change the software, and to make sure it remains free software for all its users.
Mathematics - Econometrics
MATLAB is a numerical computing environment and programming language. Maintained by The MathWorks, MATLAB allows easy matrix manipulation, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs in other languages. Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, providing access to computer algebra capabilities. An additional package, Simulink, adds graphical multidomain simulation and Model-Based Design for dynamic and embedded systems.
Octave is a computer program for performing numerical computations that is mostly compatible with MATLAB. It is part of the GNU Project. It is free software under the terms of the GNU General Public License.
GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. The Octave interpreter can be run in GUI mode, as a console, or invoked as part of a shell script. Octave is normally used through its interactive interface, but it can also be used to write non-interactive programs.
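In Octave (and MATLAB), a linear system A x = b is typically solved with the backslash operator, `x = A \ b`. To show what such a solver does in the simplest case, here is a minimal Gaussian-elimination sketch with partial pivoting, written in Python. It is a teaching example under the assumption of a small, dense, square system, not Octave's implementation, which relies on LAPACK for dense matrices. The singularity threshold `1e-12` is an arbitrary illustrative choice.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    A is a list of n rows of n numbers, b a list of n numbers.
    The inputs are copied, not modified.
    """
    n = len(A)
    # Build the augmented matrix [A | b] as floats.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


# Plays the role of A \ b in Octave for this small system.
A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
print(solve(A, b))
```

For this system the exact solution is x = 2, y = 3, z = -1; the printed floats should match up to rounding error.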
Scilab is a scientific software package for numerical computations, providing a powerful open computing environment for engineering and scientific applications. Scilab is open-source software and includes hundreds of mathematical functions, with the possibility of interactively adding programs from various languages (C, C++, Fortran…). It has sophisticated data structures (including lists, polynomials, rational functions, linear systems...), an interpreter and a high-level programming language.
Is a cross-platform software package for econometric analysis, written in the C programming language that can import SPSS data files.