Installing GPU version of TensorFlow™ for use in R on Windows

The other night I got a TensorFlow™ (TF) and Keras-based text classifier in R to successfully run on my gaming PC that has Windows 10 and an NVIDIA GeForce GTX 980 graphics card, so I figured I'd write up a full walkthrough, since I had to make minor detours and the official instructions assume -- in my opinion -- a certain level of knowledge that might make the process inaccessible to some folks.

Why would you want to install and use the GPU version of TF? "TensorFlow programs typically run significantly faster on a GPU than on a CPU." Graphics processing units (GPUs) are typically used to render 3D graphics for video games. As a result of the race for real-time rendering of more and more realistic-looking scenes, they have gotten really good at performing vector/matrix operations and linear algebra. While CPUs are still better for general purpose computing and there is some overhead in transferring data to/from the GPU's memory, GPUs are a more powerful resource for performing those particular calculations.

Notes: For installing on Ubuntu, you can follow RStudio's instructions. If you're interested in a Python-only (sans R) installation on Linux, follow NVIDIA's instructions.

Prerequisites

  • An NVIDIA GPU with CUDA Compute Capability 3.0 or higher. Check your GPU's compute capability here. For more details, refer to Requirements to run TensorFlow with GPU support.
  • A recent version of R -- latest version is 3.4.0 at the time of writing.
    • For example, I like using Microsoft R Open (MRO) on my gaming PC with a multi-core CPU because MRO includes and links to the multi-threaded Intel Math Kernel Library (MKL), which parallelizes vector/matrix operations.
    • I also recommend installing and using the RStudio IDE.
    • You will need devtools: install.packages("devtools", repos = c(CRAN = "https://cran.rstudio.com"))
  • Python 3.5 (required for TF at the time of writing) via Anaconda (recommended):
    1. Install Anaconda3 (in my case it was Anaconda3 4.4.0), which will install Python 3.6 (at the time of writing), but we'll take care of that in step 3.
    2. Add Anaconda3 and Anaconda3/Scripts to your PATH environment variable so that python.exe and pip.exe can be found, in case you did not check that option during the installation process. (See these instructions for how to do that.)
    3. Install Python 3.5 by opening up the Anaconda Prompt (look for it in the Anaconda folder in the Start menu) and running conda install python=3.5
    4. Verify by running python --version

Setting Up

CUDA & cuDNN

  1. Make sure you've got the latest NVIDIA drivers installed.
  2. Install CUDA Toolkit 8.0 (or later).
  3. Download and extract CUDA Deep Neural Network library (cuDNN) v5.1 (specifically), which requires signing up for a free NVIDIA Developer account.
  4. Add the path to the bin directory (where the DLL is) to the PATH system environment variable. (See these instructions for how to do that.) For example, mine is C:\cudnn-8.0\bin

TF & Keras in R

Once you've got R, Python 3.5, CUDA, and cuDNN installed and configured:

  1. (Optional) You may need to install the development version of the processx package: devtools::install_github("r-lib/processx"). Everything installed fine for me originally, but devtools::update_packages() later gave me an error about processx missing, so I'm including this step just in case.
  2. Install the reticulate package for interfacing with Python from R: devtools::install_github("rstudio/reticulate")
  3. Install the tensorflow package: devtools::install_github("rstudio/tensorflow")
  4. Install the GPU version of TF (see this page for more details):
    library(tensorflow)
    install_tensorflow(gpu = TRUE)
  5. Verify by running:
    use_condaenv("r-tensorflow")   # the conda environment created by install_tensorflow()
    sess <- tf$Session()
    hello <- tf$constant('Hello, TensorFlow!')
    sess$run(hello)
  6. Install the keras package: devtools::install_github("rstudio/keras")

You should be able to run RStudio's examples now.
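If you want to see the GPU earn its keep right away, here's a minimal Keras sketch along the lines of the standard MNIST example from the keras package (the layer sizes and epoch count are arbitrary):

    library(keras)

    # the MNIST digits that ship with keras
    mnist <- dataset_mnist()
    x_train <- mnist$train$x
    y_train <- mnist$train$y

    # flatten the 28x28 images, scale pixels to [0, 1], one-hot encode the labels
    x_train <- array_reshape(x_train, c(nrow(x_train), 784)) / 255
    y_train <- to_categorical(y_train, 10)

    # a tiny fully-connected network -- just enough to confirm the GPU gets used
    model <- keras_model_sequential() %>%
      layer_dense(units = 128, activation = "relu", input_shape = c(784)) %>%
      layer_dense(units = 10, activation = "softmax")

    model %>% compile(
      loss = "categorical_crossentropy",
      optimizer = optimizer_rmsprop(),
      metrics = "accuracy"
    )

    # keep an eye on nvidia-smi while this runs -- epochs should fly by on the GPU
    model %>% fit(x_train, y_train, epochs = 2, batch_size = 128)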

Hope this helps! :D

Yo, NieR: Automata is super awesome

This weekend I got super into a new videogame called NieR: Automata (available on PS4 and PC). I saw a bunch of folks tweeting nothing but praise about it, so I decided to check out the demo on PSN. I was so blown away by it that I actually got into my car, drove to the nearest GameStop, and picked up a copy. I cannot remember the last time a game demo did that to me, if ever. This game is ⚡️E⚡️X⚡️T⚡️R⚡️E⚡️M⚡️E⚡️L⚡️Y⚡️ 💥 ⚡️G⚡️O⚡️O⚡️D⚡️, and I highly recommend it if you're into games like DmC: Devil May Cry and other PlatinumGames titles.

It borrows so many ideas from so many games and genres, but the outcome doesn't feel like a Frankenstein's monster. It all feels cohesive.

The little touches in this game are really endearing. Like when 2B gets off a ladder and does a flip onto a platform, or when she occasionally slides down the side of a ladder. The animations feel at once completely superfluous and absolutely necessary.

NieR: Automata is a game that I'm glad to not be reviewing, because I would be staring at an empty document, thinking, "They should have sent a poet."[1]

Probabilistic programming languages for statistical inference

This post was inspired by a question about JAGS vs BUGS vs Stan.

Explaining the differences would be too much for Twitter, so I'm just gonna give a quick explanation here.

BUGS (Bayesian inference Using Gibbs Sampling)

I was taught to do Bayesian stats using WinBUGS, which is now a very outdated (but stable) piece of software for Windows. There's also OpenBUGS, an open source version that runs on Macs and Linux PCs. One benefit: academic papers and textbooks from the '80s, '90s, and early 2000s that use Bayesian stats might include models written in BUGS. For example, Bayesian Data Analysis (1st and 2nd editions) and Data Analysis Using Regression and Multilevel/Hierarchical Models use BUGS.

JAGS (Just Another Gibbs Sampler)

JAGS, like OpenBUGS, is available across multiple platforms. The language it uses is basically BUGS, but with a few minor differences that require you to tweak BUGS models before they'll run in JAGS.

I used JAGS during my time at University of Pittsburgh's neuropsych research program because we used Macs, I liked that JAGS was written from scratch, and I preferred the R interface to JAGS over the R interfaces to WinBUGS/OpenBUGS.

Stan

Stan is a newcomer and it's pretty awesome. It has a bunch of interfaces to modern data analysis tools. Its syntax was designed from scratch by people who had written BUGS programs, thought the language could be better, and were inspired by R's vectorized functions. Stan is strict about data types (integer vs real number) and about parameters vs transformed parameters, which might make it harder to get into than BUGS, which gives you a lot of leeway (kind of like R does). But I personally like the constraints and precision, because that strictness is what lets Stan compile your models into C++ and run hella fast. I also really like Stan's Shiny app for exploring the posterior samples, which also supports MCMC output from JAGS and others.
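To give you a feel for that strictness, here's a tiny toy model fit through the rstan package (the model and the simulated data are made up purely for illustration):

    library(rstan)

    # every variable gets an explicit type and lives in a named block --
    # this is the strictness (and the C++ compilation) mentioned above
    model_code <- "
    data {
      int<lower=1> N;        // number of observations (must be an integer)
      vector[N] y;           // the observations themselves (reals)
    }
    parameters {
      real mu;
      real<lower=0> sigma;   // constrained to be positive
    }
    model {
      y ~ normal(mu, sigma);
    }
    "

    y <- rnorm(100, mean = 5, sd = 2)
    fit <- stan(model_code = model_code, data = list(N = length(y), y = y))
    print(fit)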

The latest (3rd) edition of Bayesian Data Analysis has examples in Stan and Statistical Rethinking uses R and Stan, so if you're using modern textbooks to learn Bayesian statistics, you're more likely to find examples in Stan.

There are two pretty cool R interfaces to Stan that make it easier to specify your models. The first is rethinking, which accompanies the Statistical Rethinking book I linked to earlier, and then there's brms, which uses a formula syntax similar to lme4's.
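For a taste of brms, here's roughly what a varying-intercept, varying-slope model looks like (the data frame and column names below are made up for illustration):

    library(brms)

    # lme4-style formula; brms translates it to Stan code and compiles it for you
    fit <- brm(
      reaction ~ days + (1 + days | subject),
      data = sleep_data,       # hypothetical data frame
      family = gaussian()
    )
    summary(fit)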

Stan has an active discussion board and active development. If you run into issues with a particular model or distribution, or you're trying to do something Stan doesn't support, you can reach out there -- you'll get help, and maybe they'll even add support for whatever it was you were trying to do.

Putting the R in romantic

I've used R for a lot of tasks unrelated to statistics or data analysis. For example, it's usually a lot easier for me to write an intelligent batch file/folder renamer or copier as an R script than as a bash script.

Earlier today I made a collection of photos that I wanted to put on a digital picture frame to mail to my partner. I also made a set of message images that I wanted to show up randomly among them. What I needed to do was shuffle the full set of 260+ images in such a way that no two of the message images would show up consecutively.

To make referencing the images easier, let's call the overall set of $n$ images $Y = \{y_1, \ldots, y_n\}$, and let $X \subset Y$ be the subset of images (the messages) that we don't want to appear consecutively after the shuffling. Let $Y' = (y_{(1)}, \ldots, y_{(n)})$ be the shuffled sequence of images.

This was really easy to accomplish in R. I started with k <- 0; set.seed(k) and shuffled all the images (using sample.int()). Then I checked whether that very specific requirement was met.

If we end up with a pair of consecutive images from $X$, we increment $k$ by 1 and repeat the procedure until $\{y_{(i-1)}, y_{(i)}\} \not\subset X ~\forall~i = 2, \ldots, n$.
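Here's a rough sketch of that approach (the "photos" folder, the "message" filename pattern, and the output paths are just placeholders for illustration):

    # 'images' is the full set Y (file names); 'messages' is the subset X
    # that should never appear back-to-back
    images <- list.files("photos", full.names = TRUE)
    messages <- images[grepl("message", images)]

    k <- 0
    repeat {
      set.seed(k)
      shuffled <- images[sample.int(length(images))]

      # TRUE at position i if images i and i + 1 are both from the subset
      in_subset <- shuffled %in% messages
      consecutive <- in_subset[-length(in_subset)] & in_subset[-1]

      if (!any(consecutive)) break
      k <- k + 1
    }

    # copy with a zero-padded numeric prefix so the frame plays them in order
    file.copy(shuffled, file.path("frame", sprintf("%03d_%s", seq_along(shuffled), basename(shuffled))))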

I think what makes R really nice to use for tasks like this is vectorized functions and binary operators like which(), %in%, order(), duplicated(), sample(), sub(), and grepl(), as well as data.frames that you can expand to include additional data, such as indicators of whether row $m$ is related to row $m-1$.

Next time you have to do something on the computer that is repetitive and time-consuming, I urge you to consider writing a script to do it for you -- even if, like file organization, it's not the kind of thing you'd normally think to use R for.

Cheers~