My Neural Machine Translation Project – Step 0 – Hardware Selection
This is the second installment of a series of blog posts, where I want to describe my attempts to set up my own neural machine translation engine. In the previous episode, I introduced myself and the project.
For neural network applications, the choice of hardware is just as crucial as the software and algorithms, because the training of neural networks consists basically of a large number of matrix multiplications that are best done in parallel. This is why the proliferation of dedicated graphics processing units (GPUs), spurred by the invention of LCD monitors and the popularity of virtual reality games, made the current advances in artificial neural networks possible. The idea of artificial neural networks is not new. The concept has been around since the 1940s, except nobody could really accomplish any practical tasks with neural networks until sufficiently powerful GPUs came along. GPUs are constructed specifically for large scale parallel computing and matrix multiplications, while even the fastest CPUs are wired for serial computing, not parallel computing.
But I digress. This post deals with setting up the right hardware for a neural machine translation network. According to the authors of various open source NMT toolkits, a powerful GPU with at least 6 GB of dedicated, on-board GPU memory is recommended. So it was really a no-brainer when my local nerd store, Fry’s, advertised an Asus gaming PC with an Nvidia GeForce GTX 1070 graphics card with 8 GB on-board memory. The 1070 is one level down from Nvidia’s current flagship GPU, the GTX 1080 Ti, but at roughly half the price of the 1080 Ti, it is definitely the most bang for the buck at this time. Aside from Nvidia, AMD also makes good GPUs with its Radeon line, but the PC package I bought was on sale as an open-box display item, so the price couldn’t be beat. Of course, the Asus BIOS came with its own headaches, so please keep reading if you are interested in the gory technical details of the lengthy setup process that ensued.
The PC came with Windows 10 preinstalled, which is essentially useless for serious computations. All open source neural net toolkits run standard on Linux. Although I have no intention of using the “gaming PC” for gaming and thus have no immediate use for Windows, I decided to keep Windows 10 on the machine, also because it came preinstalled without an installation medium. So I repartitioned the hard drive, which is surprisingly simple in Windows 10, and installed Ubuntu in a dual boot configuration. Instructions on how to do that can be found in abundance on the web. The process was quite straightforward with a USB stick and all seemed well, until I noticed that the Nvidia card wasn’t using the proprietary Nvidia driver for Ubuntu, but another driver. This defeats the purpose of having this high-end graphics card, because the standard driver is not using all the high-end features of the card. Here is where the headaches began.
I began by installing the latest Nvidia driver for Ubuntu with the following commands (drop the sudo if you are logged in as root):
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-384
384 is the latest version at the time of writing. However, upon reboot to activate the driver, things started to go awry. Ubuntu would not let me log in and always returned to the login screen, no matter what I tried.
The culprit with my particular setup seemed to be the so-called Secure Boot settings in the Asus UEFI BIOS, which seem as useless as sliced bread to me. These secure boot settings are supposed to prevent non-Windows operating systems from using certain firmware that is not trusted by the system manufacturer, even if that firmware is by a manufacturer of one of the components — in other words, in my case Asus doesn’t seem to trust Nvidia. After rebooting, I entered the UEFI BIOS by pressing F2, and accessed the Secure Boot settings. I was unable to disable it, “Enabled” was simply greyed out, so instead I chose the option “Other OS” instead of “Windows,” as you can see in the screenshots below. This fixed one problem.
However, upon another reboot, I got a screen filled with error messages about a PCIe bus error, pcieport, etc. that wouldn’t stop, along with
kern.log files that filled up my entire TB harddisk and froze the system once the disk was full. Here, a simple additional option in the grub boot menu solved the problem:
- I went to the command line with
Ctrl + Alt + F1.
- I emptied the
kern.logfiles that had eaten up my entire harddisk with the following commands:
sudo truncate -s0 syslog
sudo truncate -s0 kern.log
- I backed up the grub configuration and then edited it as follows:
sudo cp /etc/default/grub /etc/default/grub.bak
sudo -H gedit /etc/default/grub
In gedit, I replaced the following line
GRUB_CMDLINE_LINUX_DEFAULT=”quiet splash pci=nomsi”
MSI is short for Message Signaled Interrupts, which are supposed to be more stable against system freezes than other interrupt signals. However, it is known that the MSI support of specific combinations of hardware is inherently unstable and tends to freeze the system instead of preventing such freezes. My current setup of Asus motherboard with Intel chipset and Nvidia GeForce GPU card on Ubuntu 16.04 seems to be an example for this.
I saved the edited grub configuration file and exited gedit. Then I updated grub and restarted the system:
I have run a few GPU intensive computations (not neural net related), and everything seems well. No overflowing system log files, no strange PCIe-related errors, and no log-in or boot issues. Windows 10 also seems to work fine.
Next up: The choice of toolkit — OpenNMT or Google’s Tensorflow? Decisions, decisions…