xenmaster's data-science tools

Data science is more about learning concepts than learning software. These concepts include statistics, linear algebra, and A/B testing. That said, the following tools are the most commonly used in practice.
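
As a small illustration of the statistics behind A/B testing, here is a minimal sketch of a two-proportion z-test using only Python's standard library. The function name and the visitor counts are made up for the example, and it assumes samples large enough for the normal approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 200 of 2000 visitors converted on A, 260 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be due to chance alone.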


  • Basic Computer Skills

    Learning to work in the terminal is a good skill to have for anyone working in the computer science field. And no programmer's experience is complete without learning the version control power of Git!

  • PowerShell

    Free Mac Windows Linux Website

    PowerShell (including Windows PowerShell and PowerShell Core) is a task automation and configuration management framework from Microsoft, consisting of a command-line shell and an associated scripting language built on .NET.

     

  • Terminal

    Free Mac Website

    The built-in terminal window on macOS. Not bloated at all; it does what it should in a 'lean but mean' way.

     

  • GNOME Terminal

    Free Linux BSD Website

    GNOME Terminal is a terminal emulator for the GNOME desktop environment written by Havoc Pennington and others. Terminal emulators allow users to execute commands using a real UNIX shell while remaining on their graphical desktop.

     

  • Git

    Free Mac Windows Linux Android iPhone ... BSD Haiku Website

    Git is a free & open source, distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

     

  • Programming

    Python and R are the most commonly used programming languages. I've included additional IDEs (Integrated Development Environments) as well, two for Python (one with a desktop GUI, the other for the terminal) and one for R.

  • Python

    Free Mac Windows Linux S60 BSD ... AROS Haiku AmigaOS OpenSolaris MorphOS Website

    Python is an interpreted, interactive, object-oriented, extensible programming language. It provides an extraordinary combination of clarity and versatility, and is free and comprehensively ported.

     

  • Jupyter

    Free Mac Windows Linux Web Cloudron Website

    Open source, interactive data science and scientific computing across over 40 programming languages.

     

  • IPython

    Free Mac Windows Linux Website

    IPython is an interactive shell for the Python programming language that offers enhanced introspection, additional shell syntax, syntax highlighting, tab completion and rich history. It is part of the wider SciPy ecosystem of scientific Python tools.

     

  • R (programming language)

    Free Mac Windows Linux BSD Website

    R is a free software environment for statistical computing and graphics.
    It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

    R is a complete language that ships with its own working environment, and it is widely regarded as a "de facto" standard for data analysis and data mining. It is better suited for advanced users who want all the power in their hands.

     

  • RStudio

    Free Mac Windows Linux Xfce Website

    RStudio™ is a new integrated development environment (IDE) for R. RStudio combines an intuitive user interface with powerful coding tools to help you get the most out of R.

    RStudio brings together everything you need to be productive with R in a single, customizable environment. Its intuitive interface and powerful coding tools help you get work done faster.

    RStudio is available for all major platforms including Windows, Mac OS X, and Linux. It can even run alongside R on a server, enabling multiple users to access the RStudio IDE using a web browser.

    Like R, RStudio is available under an open source license that guarantees the freedom to share and change the software, and to make sure it remains free software for all its users.

     

  • Data Visualization and Manipulation

    Matplotlib is a basic data visualization tool. SciPy is a great choice for manipulating data and TensorFlow is a fantastic platform if you are interested in machine learning (especially running with Keras.io and scikit-learn).

  • SciPy & NumPy

    Free Mac Windows Linux Website

    SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It is also the name of a very popular conference on scientific programming with Python. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation.

     

  • TensorFlow

    Free Mac Linux Website

    TensorFlow is an open source software library for machine learning in various kinds of perceptual and language understanding tasks. It was originally developed by Google and later released under the Apache 2.0 open source license on Nov 9, 2015.

     

  • Databases

    Below are the most commonly used databases for raw data processing power. I've included a relational database and a NoSQL database for handling document-driven data, which is particularly useful in the big-data space!

  • PostgreSQL

    Free Mac Windows Linux BSD Website

    PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness. It runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It is fully ACID compliant, has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.
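
    The relational features mentioned above (foreign keys, joins, and transactions) can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a database server, it is not PostgreSQL itself, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to; this is a SQLite detail.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total NUMERIC NOT NULL)"
)

# "with conn" wraps the statements in a transaction: commit on success,
# roll back if an exception is raised.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 19.5)")
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 5.5)")

# An order pointing at a customer that does not exist violates the foreign key.
try:
    with conn:
        conn.execute("INSERT INTO orders (customer_id, total) VALUES (99, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Join the two tables and aggregate per customer.
rows = conn.execute(
    "SELECT c.name, COUNT(o.id), SUM(o.total)"
    " FROM customers c JOIN orders o ON o.customer_id = c.id"
    " GROUP BY c.name"
).fetchall()
print(rows, rejected)
```

    Against a real PostgreSQL server the SQL would look much the same, but the connection would go through a PostgreSQL driver rather than sqlite3.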

     

  • pgAdmin

    Free Mac Windows Linux BSD Website

    pgAdmin is the most popular and feature-rich open source administration and development platform for PostgreSQL, the most advanced open source database in the world. The application may be used on Linux, FreeBSD, Solaris, Mac OS X and Windows platforms to manage PostgreSQL 7.3 and above running on any platform, as well as commercial and derived versions of PostgreSQL such as Postgres Plus Advanced Server and Greenplum database.

    pgAdmin is designed to answer the needs of all users, from writing simple SQL queries to developing complex databases. The graphical interface supports all PostgreSQL features and makes administration easy. The application also includes a syntax highlighting SQL editor, a server-side code editor, an SQL/batch/shell job scheduling agent, support for the Slony-I replication engine and much more. Server connection may be made using TCP/IP or Unix Domain Sockets (on *nix platforms), and may be SSL encrypted for security. No additional drivers are required to communicate with the database server.

    pgAdmin is developed by a community of PostgreSQL experts around the world and is available in more than a dozen languages. It is Free Software released under the PostgreSQL License.

     

  • MongoDB

    Free Mac Windows Linux Web BSD Website

    MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database. The database is document-oriented so it manages collections of JSON-like documents. Many applications can thus model data in a more natural way, as data can be nested in complex hierarchies and still be query-able and indexable.
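
    To give a feel for the document model described above, here is a minimal sketch that uses plain Python dictionaries and the json module. It does not talk to MongoDB or use any MongoDB driver; the sample data and the helpers get_path and find are hypothetical and only mimic the idea of querying nested fields by a dotted path.

```python
import json

# Hypothetical collection: each document is a JSON-like dict with nested fields.
users = [
    {"name": "Ada", "address": {"city": "London", "zip": "N1"}, "tags": ["admin", "dev"]},
    {"name": "Linus", "address": {"city": "Helsinki", "zip": "00100"}, "tags": ["dev"]},
]


def get_path(document, dotted_path):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    value = document
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def find(collection, dotted_path, expected):
    """Return the documents whose nested field equals the expected value."""
    return [doc for doc in collection if get_path(doc, dotted_path) == expected]


matches = find(users, "address.city", "London")
print(json.dumps(matches, indent=2))
```

    The nested address stays inside each user document instead of being split into a separate table, which is the "more natural" modelling the description refers to.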

     

  • MongoDB Compass

    Free Mac Windows Linux Website

    The GUI for MongoDB. Visually explore your data. Run ad hoc queries in seconds. Interact with your data with full CRUD. View and optimize your query performance. Compass empowers you to make smarter decisions about indexing, document validation, etc.

     

  • Business Data

    I've seen the following used frequently for data visualization on the business side. Pick one or more!

  • D3.js

    Free Web Self-Hosted Website

    D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS.

     

  • Matplotlib

    Free Mac Windows Linux Web Website

    Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells (à la MATLAB® or Mathematica®), web application servers, and six graphical user interface toolkits.

     

  • Power BI for Office 365

    Freemium Windows Web Android iPhone Android Tablet ... iPad Microsoft Office 365 Microsoft Office Excel Website

    Power BI for Office 365 is a self-service business intelligence (BI) solution delivered through Excel and Office 365 that provides information workers with data analysis and visualization capabilities to identify deeper business insights about their data. With Power BI for Office 365, you can connect to data in the cloud or extend your existing on-premises data sources and systems to quickly build and deploy self-service BI solutions hosted in Microsoft’s trusted enterprise cloud.

    With Power BI for Office 365, you can do more with your data:

    -- Analyze and present insights with Excel in compelling visual formats from data either on premises or in the cloud.
    -- Share reports and datasets online with data that is always kept up to date.
    -- Access and stay connected to data and reports from your mobile devices wherever you are.

     

  • Tableau

    Freemium Mac Windows Web Self-Hosted Website

    Tableau helps the world’s largest organizations unleash the power of their most valuable assets: their data and their people.

     

  • Microsoft Office Excel

    Commercial Mac Windows Android iPhone Windows S ... Android Tablet Windows Phone iPad Website

    Microsoft Excel, part of the Microsoft Office Suite, is Microsoft's spreadsheet application. With the Microsoft Office Fluent user interface, rich data visualization, pivot table views, and professional-looking charts are easier to create and use.

    An online version, Excel Online, is also available as part of Office Online.

     

  • Distributed Cloud Computing

    Some people prefer using the cloud to do their dirty data-processing work. Pick one and go for it!

  • Apache Hadoop

    Free Mac Windows Linux Website

    Apache Hadoop is an open source software framework, licensed under the Apache v2 license, that supports data-intensive distributed applications. It enables applications to work with thousands of computationally independent computers and petabytes of data.
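
    Hadoop's best-known programming model is MapReduce. The idea can be sketched in a single Python process as below. This is not the Hadoop API, nothing here is actually distributed, and the function names and input text are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of text, as one worker might."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input, split into chunks that independent machines could process.
chunks = ["big data big clusters", "data moves to clusters"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

    Because each chunk is mapped independently, the map step is the part a cluster can spread across many machines.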

     

  • Microsoft Azure

    Commercial Web Android Android Tablet Website

    Microsoft Azure and SQL Azure enable you to build, host and scale applications in Microsoft datacenters.

     

  • AWS Machine Learning

    Commercial Web Amazon Web Services Website

    Amazon Machine Learning allows developers to use machine learning. It provides visualization tools and wizards that guide you in the process of creating machine learning (ML) models. It makes it easy to obtain predictions using simple APIs.

     


