Sunday, January 12, 2020

CUDA 10.1 and TensorFlow 2.1.0 - Works fine

For many months I had a perfectly working tensorflow-gpu setup under Anaconda. Then I had to reinstall Anaconda for some reason, and that changed my Python version to 3.7. I made many attempts to install TensorFlow-GPU, but all of them failed: every time I tried to train a model, my kernel died or restarted. I was using a Keras network and reinstalled Keras many times as well. In the following few paragraphs I will describe what worked and what didn't work for me. I hope it helps you too.

How it finally worked - I have an Nvidia GTX 700M card, and the latest driver I found online was 425.something, so I installed it. Along with it, I installed VS 2017 and VS 2019 along with their builds. When I tried to install CUDA 10.0 and CUDA 10.1, the install failed each time. I even tried unchecking the driver install in the CUDA installer window, still with no success. Then I downloaded the latest CUDA 10.2 and unchecked the driver update, and that failed too.

CUDA 10.2 requires graphics driver version 441 or higher. The CUDA 10.2 installer ships with a compatible driver, so I did an express CUDA 10.2 install. My graphics driver was updated successfully and CUDA installed. I tried this with tf 2.1.0 and tf 2.0.0: both detected the GPU but failed during training. Finally, I decided to install CUDA 10.1 again while leaving CUDA 10.2 as it was. This time it was a success: CUDA 10.1 installed, and with tf 2.1.0 I was able to use the GPU for training.
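Since the driver version turned out to matter so much, here is a minimal sketch of how you could compare your installed driver against a toolkit's minimum before running the installer. The function name and the MIN_DRIVER table are my own illustrations, the only minimum filled in is the 441 figure mentioned above, and "425.00" is a placeholder because I only remember my old driver as 425.something. On most systems you should be able to read the installed version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, but check NVIDIA's release notes for the exact minimums for your platform.

```python
def driver_meets_minimum(installed, minimum):
    """Compare dotted driver versions numerically, e.g. '425.00' vs '441'."""
    def parse(version):
        return tuple(int(part) for part in version.strip().split("."))

    a, b = parse(installed), parse(minimum)
    # Pad the shorter tuple with zeros so '441' compares like '441.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Hypothetical lookup table: only the value stated in this post is filled in.
# Add other toolkits from NVIDIA's release notes as needed.
MIN_DRIVER = {"10.2": "441"}

if __name__ == "__main__":
    # "425.00" is a placeholder for the old 425.x driver described above.
    ok = driver_meets_minimum("425.00", MIN_DRIVER["10.2"])
    print("Driver new enough for CUDA 10.2:", ok)
```

Comparing the parts as integers avoids the usual trap of comparing version strings alphabetically.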

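In a nutshell - TF 2.1.0 and CUDA 10.1 work fine together. This is consistent with the prebuilt TF 2.1.0 packages being built against CUDA 10.1. Also, I had Python 3.5, and I am sure things will be fine with Python 3.7 too.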
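In my case TF detecting the GPU was not enough, because training could still fail afterwards. Below is a rough sketch of a check that does both: it lists the GPUs TF can see and then fits a tiny Keras model for one epoch. The function name `check_gpu_training` and the report keys are made up for this example, and the import is wrapped in try/except only so the script still reports something when TensorFlow itself is broken. Note that `tf.config.list_physical_devices` is available from TF 2.1 onward.

```python
# Diagnostic sketch: does TensorFlow see the GPU, and can it actually train?
try:
    import tensorflow as tf
except ImportError:  # TensorFlow missing or failing to load in this environment
    tf = None


def check_gpu_training():
    """Return a dict describing whether TensorFlow can see and use a GPU."""
    report = {"tf_version": None, "built_with_cuda": None,
              "gpus": [], "trained": False}
    if tf is None:
        return report

    report["tf_version"] = tf.__version__
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]

    # Detection alone proves little, so fit a tiny model for one epoch.
    # TensorFlow places the ops on the GPU automatically when one is visible.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="sgd", loss="mse")
    x = tf.random.normal((32, 4))
    y = tf.random.normal((32, 1))
    model.fit(x, y, epochs=1, verbose=0)
    report["trained"] = True
    return report


if __name__ == "__main__":
    print(check_gpu_training())
```

I would run this from a plain terminal rather than a notebook. If the process dies during the fit step, the terminal is more likely to show the underlying CUDA error than a silently restarting kernel.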



What didn't work -

CUDA 10.0 didn't work for me at all. I reinstalled Anaconda, TF, etc., and even built TF from source using Bazel, but nothing seemed to work. I didn't get to the root cause, but I was happy that TF finally worked with CUDA 10.1.

Comment below if you need any further details, or share your own story.
PS - This is my first technical post. In the future I promise to improve the quality and add links to other sites for further reading.