My kingdom for a venv

I’ve never enjoyed using Python. I think my feelings on it can be summed up by this video. But for whatever reason, Python is unavoidable if you want to do anything with AI/machine learning. And so as someone wanting to get into AI, I have no choice but to use it.

But I don’t have to learn to write it, of course, because all the tools you need for AI are already written and available. ChatGPT is easy to use on the web. But what if you wanted a version of ChatGPT that was snarkier, or wrote better jokes, or was otherwise tuned specifically for your needs and wants? In that case, you can always fine-tune a language model and use it yourself.

But that’s where Python rears its ugly head. I wanted to fine-tune a language model, so I installed LLaMA, downloaded a simple model from Hugging Face, and got to work.

To fine-tune a model for your own needs, you need data, and you need to annotate that data. No time to explain how annotations work, but there are programs that make it easy. One of them is Label Studio, which I thought I could use. The instructions say to just download Python, make a venv (virtual environment), and have pip (Python’s package installer) install Label Studio. Sounds easy, right? Just 3 lines of code.

The trouble started almost immediately because despite Label Studio telling me it was available for Windows, the install instructions were actually written for Linux. I realized this and corrected it, but the trouble didn’t stop. Once I created the venv, I tried to install Label Studio, but one of the dependencies failed to install so the whole process failed.
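For anyone hitting the same Linux-versus-Windows mismatch, here is a minimal sketch of how the steps differ. It only builds the commands and does not run them. The helper names (`venv_python`, `install_steps`) and the folder name `env` are my own for illustration, and I’m assuming the interpreter is called `python` on Windows and `python3` on Linux, which depends on how Python was installed. The one difference I’m sure of is that a venv keeps its interpreter in `Scripts` on Windows and in `bin` on Linux.

```python
from __future__ import annotations

from pathlib import PurePosixPath, PureWindowsPath


def venv_python(env_dir: str, windows: bool) -> str:
    """Return the path of the interpreter inside a venv created at env_dir."""
    if windows:
        # Windows venvs put the interpreter under Scripts\
        return str(PureWindowsPath(env_dir) / "Scripts" / "python.exe")
    # Linux and macOS venvs put it under bin/
    return str(PurePosixPath(env_dir) / "bin" / "python")


def install_steps(env_dir: str, package: str, windows: bool) -> list[list[str]]:
    """Build (but do not run) the commands to create a venv and install a package."""
    # Assumption: the system interpreter is on PATH under this name.
    launcher = "python" if windows else "python3"
    return [
        [launcher, "-m", "venv", env_dir],
        # Calling the venv's own interpreter avoids the activate step entirely.
        [venv_python(env_dir, windows), "-m", "pip", "install", package],
    ]


for windows in (False, True):
    print("Windows" if windows else "Linux")
    for step in install_steps("env", "label-studio", windows):
        print("   ", " ".join(step))
```

Running pip through the venv’s own interpreter is a design choice here: it sidesteps the activation script, which is the part that is spelled differently on each platform.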

Uh… what? Why is this program, which is also sold as a paid enterprise product by the way, failing to install itself due to a dependency issue? I found the missing dependency and tried installing it directly into the venv, hoping that would fix the issue. But no, it still errored out. What was I missing?

So it turns out that when I installed that dependency directly, pip installed the latest version of it. But Label Studio is looking for a specific older version, so it still tries to install the older version when installing itself. I tried to install the specific older version, and that failed too. Apparently I can install the new version with no issues, but not the old version.
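A rough sketch of that version mismatch, using only the standard library. The package name `example-dep` and both version numbers are made up for illustration, since the real ones aren’t the point. `check_pin` is a hypothetical helper that only handles exact pins (`name==version`), which is a simplification of what pip actually resolves.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(name: str) -> str | None:
    """Return the version of a package in the current environment, or None."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pin(name: str, pinned: str, installed: str | None = None) -> str:
    """Describe how an exact pin (name==pinned) compares to what is installed.

    Pass `installed` to simulate an environment; otherwise the real one is checked.
    """
    found = installed if installed is not None else installed_version(name)
    if found is None:
        return f"{name}: not installed, pip will try to install {pinned}"
    if found == pinned:
        return f"{name}: {found} satisfies the pin"
    return f"{name}: {found} installed, but {pinned} is required; pip will try to replace it"


# Hypothetical scenario: the latest version went in, but an older one is pinned.
print(check_pin("example-dep", "1.0.0", installed="2.0.0"))
```

So installing the newest version by hand doesn’t help: the pin still forces pip to go after the old one, and if the old one can’t be installed, the whole install fails.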

Reading the error message closely, I saw that to install the old version I needed to have another Python module installed and also add that other module to the system path. Now we’re getting into part of why I hate venv. The thing about Python is that if you install it outside of a contained environment, it infects your computer and doesn’t get out. Ask an amateur pythonist how to remove an old version of Python, and see the blank look on their face. Just deleting the folder doesn’t fix it.

And this old version/new version bs can mess you up something fierce, because some other Python module will start looking for what it needs and find the old version instead of the new version. Or it will be sent to where the old version used to be and, finding nothing there, error out. Venv is supposed to fix all this so you only install things into designated containers where they can’t escape.
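If you want to see which Python you are actually running and where a module would be picked up from, a small sketch like this can help. The helper names `in_venv` and `module_location` are mine; the check itself relies on `sys.prefix` differing from `sys.base_prefix` inside a venv, which is how the standard library’s own documentation describes detecting one.

```python
from __future__ import annotations

import importlib.util
import sys


def in_venv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def module_location(name: str) -> str | None:
    """Return where a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("inside a venv:", in_venv())
print("json would load from:", module_location("json"))
```

If the reported location points at some other Python install than the one you expected, that is the old version/new version mix-up in action.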

But I can’t do that, because to install something into this venv, I have to install another package and add it to the path of my entire Windows system. So the venv isn’t even doing what it’s supposed to do!

So I gave up. I hate having to use Python like this. Normal programs just come to you as an executable or a zip, and you use them. Python always needs to install itself everywhere and then usually fails even then. So I won’t use Label Studio and will look for another tool instead.

If anyone knows of a good annotation tool for LLM data, hit me up.
