The words of Jesus on the Cross

It’s a little late, but since it’s still Easter season I was thinking about languages and in particular the language of Jesus. The gospels of course record different versions of what exactly Jesus said when he died on the cross. But Matthew and Mark record a version that sounds like it could be historical.

Matthew and Mark both record Jesus’s final sentence as “Eli, Eli, lema Sabachthani,” which means “My God, My God, why have you forsaken me?”, although they differ slightly in their spelling. Now, Jesus’s native language was Aramaic, but I’ve always been intrigued by how similar this phrase sounds to the Arabic I learned in school.

To start with, “Eli” is almost identical to how you would say “My God” in modern Arabic: El = God, and the noun ending -i makes it possessive for “my.” Sabach isn’t a verb I ever learned in Arabic, but if it does mean “to forsake,” then Sabachthani is also very close to how you would conjugate it in Arabic for this sentence.

But the part I’ve always been most interested in is “lema.” Now the Arabic word for “why” is “lematha,” but it’s made up of two pieces: “le” means “for” and “matha” means “what.” So “lematha” = “for what” = “why.” But there’s another word for “what” in Arabic: you can say “ma” instead of “matha” in many cases. So can you also say “lema” = “for what” = “why” in Arabic? I don’t know for sure, but it sounds like a likely etymology for the Aramaic word too.

The Bible gives us very few direct quotes in Aramaic, the native language of Jesus and many people in his day. It’s good to hear what we can from them in their own tongue.

Happy Easter

My kingdom for a venv

I’ve never enjoyed using Python. I think my feelings on it can be summed up by this video. But for whatever reason, Python is unavoidable if you want to do anything with AI/machine learning. And so as someone wanting to get into AI, I have no choice but to use it.

But I don’t have to learn to code in it, of course, because all the tools you need for AI are already written and available. ChatGPT is of course easy to use on the web. But what if you wanted to have a version of ChatGPT that was snarkier, or wrote better jokes, or was in whatever way tuned specifically for your needs and wants? In that case, you can always make a fine-tuned language model and use it yourself.

But that’s where Python rears its ugly head. I wanted to fine-tune a language model. So I installed LLaMA, downloaded a simple model from Hugging Face, and got to work.

To fine-tune a model for your own needs, you need to have data and you need to annotate that data. No time to explain how annotations work, but there are programs that make it easy. There is a program called Label Studio that I thought I could use. The instructions say to just download Python, make a venv (virtual environment) and have pip (Python’s package installer) install Label Studio. Sounds easy, right? Just 3 lines of code.
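Those three steps can be sketched roughly as below. This is only an illustration: the folder name `ls-env` and the helper names are made up, and `label-studio` is assumed to be the package name pip expects. The one detail that genuinely depends on the operating system is where a venv keeps its own interpreter, so the sketch takes the platform as a parameter and only prints the commands rather than running them.

```python
import sys
from pathlib import Path


def venv_python(env_dir, platform=sys.platform):
    """Return where a venv keeps its own interpreter on the given platform."""
    env = Path(env_dir)
    if platform.startswith("win"):
        # Windows venvs put executables under Scripts\
        return env / "Scripts" / "python.exe"
    # Linux and macOS venvs put them under bin/
    return env / "bin" / "python"


def install_commands(env_dir, package, platform=sys.platform):
    """Build (but do not run) the commands: create the venv, then pip install into it."""
    py = str(venv_python(env_dir, platform))
    return [
        [sys.executable, "-m", "venv", env_dir],   # step 1: make the venv
        [py, "-m", "pip", "install", package],     # step 2: install using the venv's own pip
    ]


if __name__ == "__main__":
    # "ls-env" and "label-studio" are assumptions for illustration only.
    for cmd in install_commands("ls-env", "label-studio"):
        print(" ".join(cmd))
```

Calling pip through the venv's own interpreter (`python -m pip`) avoids having to activate the environment first, which is the step where Linux and Windows instructions differ most.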

The trouble started almost immediately because despite Label Studio telling me it was available for Windows, the install instructions were actually written for Linux. I realized this and corrected it, but the trouble didn’t stop. Once I created the venv, I tried to install Label Studio, but one of the dependencies failed to install so the whole process failed.

Uh… what? Why is this program, which is available as a paid enterprise product by the way, failing to install itself due to a dependency issue? I find the missing dependency and try installing it directly to the venv, hoping that fixes the issue. But no, it still errors out. What am I missing?

So it turns out that when I directly install that dependency, it installs the latest version of it. But Label Studio is looking for a specific older version, so it still tries to install the older version when installing itself. I tried to install the specific older version, and that fails too. Apparently I can install the new version with no issues, but not the old version.
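To make the version mismatch concrete, here is a toy sketch of an exact-version pin. The package name `somedep` and both version numbers are invented for illustration, and real pip follows fuller version rules than this simple comparison.

```python
def parse_version(text):
    """Turn a plain dotted version like '1.4.2' into a comparable tuple (1, 4, 2)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pin(installed, requirement):
    """Check whether an installed version matches an exact 'name==version' pin."""
    _name, _sep, pinned = requirement.partition("==")
    return parse_version(installed) == parse_version(pinned)


if __name__ == "__main__":
    # Hypothetical: the newest release is already installed, but the tool pins an older one.
    print(satisfies_pin("2.1.0", "somedep==1.4.2"))  # newest release does not satisfy the pin
    print(satisfies_pin("1.4.2", "somedep==1.4.2"))  # only the exact old release does
```

Because the pin is an exact match, having the newer release already installed does not help: the installer still has to fetch and build the old one.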

Reading the message closely, it says that to install the old version I need to have another Python module installed and also add that other module to the system path. Now we’re getting into part of why I hate venv. The thing about Python is that if you install it outside of a contained environment, it infects your computer and doesn’t get out. Ask an amateur pythonist how to remove an old version of Python, and see the blank look on their face. Just deleting the folder doesn’t fix it.

And this old version/new version bs can mess you up something fierce, because some other python module will start looking for what it needs, and find the old version instead of the new version. Or it will be sent to where the old version used to be, but finding nothing there it will error out. Venv is supposed to fix all this so you only install things into designated containers where they can’t escape.
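One way to see which Python is actually answering is to ask the interpreter itself. The following is a minimal sketch using only the standard library; the function names are illustrative.

```python
import sys


def in_virtual_env():
    """True when running inside a venv: its prefix differs from the base install's."""
    return sys.prefix != sys.base_prefix


def describe_interpreter():
    """Collect the facts that usually explain 'wrong version' surprises."""
    return {
        "executable": sys.executable,         # full path of the python that is running
        "version": sys.version.split()[0],    # e.g. '3.11.4'
        "in_venv": in_virtual_env(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this with each `python` on the machine shows which install a given command really resolves to, and whether the venv is the one in use.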

But I can’t do that, because to install something into this venv, I have to install another package and add it to the path of my entire Windows system. So the venv isn’t even doing what it’s supposed to do!

So I gave up. I hate having to use Python like this. Normal programs just come to you as an executable or a zip and you use them. Python always needs to install itself everywhere, and then usually fails even then. So I won’t use Label Studio and will look for another tool instead.

If anyone knows of a good annotation tool for LLM data, hit me up.

I’m addicted to rageahol

I don’t like writing this, but I’ll try to do so.

I’ve found that I’m too rageaholic recently. I don’t know if this is weird, but before I actually talk to people I sometimes plan out conversations in my head. What I want to say, how I want to say it, that kind of thing. All too often, conversations in my head turn into me being angry at people, attacking them, making cutting remarks, that sort of thing.

And this is happening in the real world too. I passed a woman as I biked to work recently. It was on a shared walk/bike path in the city and so I felt I had the right to be there. I’ve often noticed that walkers get really scared or heated at bikers, but I always give them a wide berth. I don’t want to hit them any more than they want to get hit.

Anyway, I passed this woman with a very wide berth, yet she still yelled out as I passed. Then, as I locked up my bike to go in to work, she came up to me complaining about how I passed her. I had already realized she was going to do this (I could tell when she yelled at me as I passed), so the conversation was heated from the beginning. I brusquely told her that I passed her well to the left, that I pass lots of walkers every day, and that she needs to share the road with bikers just as we share it with her. I didn’t even give her a chance to respond. I just said I didn’t like that she yelled at me when I didn’t do anything wrong, and walked away.

But the problem is: what did other onlookers think of me?

To be clear, I really think I was in the right to pass her. It’s a shared space, you can tell by all the bikers on it and the fact that there are bike lock-ups all along the sides of it. One of which I used to lock my bike as she ran after me to complain. I’ve had assholes in cars yell at me when I bike on the road, and I think walkers who think bikers can’t ride on shared spaces are no better. I gave her a lot of space, I didn’t hit her and I wasn’t even near enough to hit her if I tried.

Could I have said something before I passed? On designated bike paths, there’s an “on your left” system to let people know you’re passing. But that’s for places where you pass someone every 5 or 10 minutes. I pass a hundred people in the few minutes it takes to get to my building; if I said something to every single one of them, I’d be hoarse by the end of the week. And besides, I don’t say “on your left” when I walk past slow walkers, I just give them enough space and go right by. I don’t say it to cars that I pass in my car either. If I’m just commuting on a bike, I feel that it should be understood that I’ll pass slow walkers wordlessly just as if I were walking past them.

So that’s me being all defensive about my actions, but still, what did people think about me is the problem. To be honest, it might not have been good. I was very heated at her, which made me act rude. I cut her off and said my piece, then left. That wasn’t the right way to do things.

What was the right way? As I said, a lot of asshole drivers don’t want bikes on the road, and a lot of asshole walkers don’t want bikes on shared walk/ride paths. I don’t want to just give in to those people and say “yes, you’re right, bikers should never exist anywhere near you.” But I needed to find a better way to stand my ground without looking like an asshole. How? How to respond to someone yelling at me without seeming like an asshole myself?

What if I had just said my piece more calmly? “Hey, I passed you by a wide margin, please don’t yell at me just for using the path.” Would that have been better? She might still have yelled at me, but then she’d be the asshole. Would calmly pointing out “this space is for bikes as well as walkers” have been better? Would calmness as a whole have been better, or would I just have seemed snooty and stuck up?

Should I have just not responded at all as she came up to me?

I don’t think I could have improved my interaction with her specifically. Like I said, I’ve dealt with way too many drivers and walkers who are furious that the city allows bikers to exist at all, such that any legal use of a bike will bring a torrent of yelling and profanity. I can’t change their mind, they’re just assholes. But to everyone surrounding her, this could have been an interaction between an asshole lady and me, or it could have been an interaction between two assholes. And I worry it was the latter.

Maybe calmness as a whole would have been better. I need to try that next time. I’ve gamed this conversation out in my head, running through it because I don’t like how I acted and don’t like how I probably came across to other people. It’s not an important conversation, I’m sure no one on that street will even remember me by tomorrow. But it’s a microcosm of a lot of my problems, and if I’m going to fix them I need to become the type of person who would have handled that conversation better.

Interesting notes about ChatGPT

I know I’m about 2 months late to the party, but I just looked into ChatGPT and I was interested in what I found. These will be some random assessments since I don’t have the energy for a full post. Obviously they keep updating the model so some of this may no longer be true, but here is what I found.

  • The model says its data only goes to 2021 and is coy about exactly when its data ends, but I was able to pin it down to mid-2021, some time right before the Olympics. The model is unaware of anything about the 2021 Olympics, but believes Naftali Bennett is the current Prime Minister of Israel. Bennett became PM on June 13th 2021, and the Olympics began July 23rd. Since the Olympics are always such a media frenzy, I find it hard to believe that ChatGPT would not have been trained on at least a few articles about the 2021 Olympics if its training dataset included dates after July 23rd, so I estimate that its dataset cuts off between June 13th and July 23rd, 2021.
  • As a topic becomes more in depth, the quality of the answers decreases. It has some amazingly esoteric “surface level” knowledge (ask it what the capital city was for some ancient, long dead nation/civilization), but has a harder time with the kind of deep knowledge that makes one an expert in a field. I asked it to explain how Isoelectric Point relates to pH, and while it gave all the right words it came up with an answer that is the opposite of reality (see image below). For reference, if the pH is higher than the pI, the protein will have a negative charge due to deprotonated amino acids. This answer would be like someone giving you a very deep description of an electron but ending up saying it always has a positive charge. Sounds good! But wrong answer.
  • The math update has fixed some of the fun, dumb responses you used to be able to get, and it can now give some strong answers for your physics homework like calculating the speed and trajectory of a baseball. But it still has some weird hangups and I don’t know why. Complex word problems usually seem ok but simple math is much trickier. I’ve heard tell that the language processor will (somehow) be hooked up to Wolfram Alpha to get solutions to math problems, but it seems like that’s not the case yet.
  • As an aside, when ChatGPT gives me a wrong answer, I find myself doubtful and second guessing myself. It gives blatantly wrong answers with the exact same cadence that it tells you all the correct things, so I find myself wondering if I’m the dumb one and my college degrees are all a lie. I guess it proves the maxim that if you just say things confidently people will believe you.
  • I wondered if ChatGPT would work as an ad supported model. This may be dumb, but hear me out. Say Khan Academy wants to advertise itself, and they already know students are looking up homework answers on ChatGPT (just like they used Google before), so this is the perfect opportunity. A submodel could be trained using Khan Academy-approved language, such as testimonials from happy parents and children about how great Khan Academy is. Then when ChatGPT’s language model starts using words associated with Khan Academy topics (calculus, biology, physics etc) it can insert a canned tagline for Khan Academy and follow it up with words chosen based on the Khan Academy-approved text. So in my pI question above, it could add a snippet somewhere which would go “isoelectric point and pH [tagline starts here] are also taught as part of the Khan Academy course on Chemistry. Parents and students love Khan Academy because blah blah blah [end advertisement] oh by the way deprotonated amino acids are negatively charged.” Inserting ads into your searches is basically Google’s whole business model, and I’d certainly prefer this over a paid version of ChatGPT.
  • The non-deterministic nature of the answers makes it sometimes hard to gauge the overall “quality” of the model. I’ve had days where every answer seemed right and days where everything was a shambles. I’ve seen people complain that certain tricks don’t work while others post snapshots showing that they do. I think the output is at least partly determined by previous parts of the conversation but it also just seems semi-random (would love to know if it IS semi-random!). Either way, it makes it hard to judge without doing some statistics that I don’t feel like doing.
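On the semi-random question in the last bullet: language models of this kind generally produce text by sampling each next word from a probability distribution, often with a "temperature" setting that controls how adventurous the pick is. The sketch below is a toy illustration of that general idea, not ChatGPT's actual code; the words and scores are invented.

```python
import math
import random


def sample_next_word(scores, temperature=1.0, rng=random):
    """Pick one word at random, weighted by softmax(score / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    words = list(scores)
    scaled = [scores[w] / temperature for w in words]
    top = max(scaled)
    # Subtracting the max keeps exp() from overflowing; it does not change the odds.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(words, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical scores a model might assign to the next word.
    scores = {"negative": 2.0, "positive": 1.0, "neutral": 0.5}
    for temperature in (0.01, 1.0, 2.0):
        picks = [sample_next_word(scores, temperature) for _ in range(10)]
        print(temperature, picks)
```

At a very low temperature the top-scoring word wins essentially every time, while higher temperatures let lower-scoring words through, which is one plausible reason the same question can get a right answer one day and a wrong one the next.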

Anyway those are my impressions of ChatGPT so far. Fun timewaster, MUCH less toxic than spending all day on Twitter.

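I’m not doing too well

I’m not doing too well. My work isn’t going well right now. It goes badly when I do it, but I don’t know why. I’m supposed to purify these proteins, but I only purify the wrong proteins. My ferritin post happened because I don’t know how to purify the real protein: every day I try to purify the real protein, and all I can find is ferritin. So that’s my problem.

My computer can only write simplified characters, but most of my Chinese-speaking friends are Taiwanese, and they use traditional characters. So I hope they aren’t offended that I’m using simplified characters.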

Writing in Arabic is very hard

Yesterday I wrote in Chinese about some things, and today I want to write a little in Arabic. But writing in Arabic is very hard. When I write in Chinese, I type English letters and the computer gives me the Chinese words. But I don’t do that in Arabic. In fact, Arabic doesn’t have any English letters the way Chinese does. Because of that I can’t look at my keys and use them. I have to use an Arabic keyboard or know where all the letters are without reading them.

So how am I writing this? With an iPhone. My iPhone has the Arabic keyboard, so I can see it. But Android doesn’t have this keyboard. So I have to use an iPhone if I want to do this.

I write in Arabic very slowly. Also, my skill in Arabic isn’t great and I make a lot of spelling mistakes. But I want to try this, so I’ll keep at it in the future.

That was very short. What else do I want to say? I’m playing a video game about things in the age of industrialization. It’s “alternate history,” and in it I play as Egypt after a war with the Ottomans. I united all the Egyptians, Bedouins and Mashriqis (is “Mashriqi” a real word?), brought them together in one Arab country, and fought the European countries. There was a lot of war in the age of industrialization. My father says he doesn’t like my games because they’re all about war, but he is always watching movies about or set in the World War in the 1940s or around that time. So really he just doesn’t like video games, and that’s OK, but it isn’t about war or no war.

So this was fun, and I know a lot of it isn’t correct Arabic, but I want to do this again!

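I’m thinking of writing a post in Chinese

I think 报 means “report,” so since I don’t know how to say “post” in Chinese, I’m saying 报.

Right now I have to write a report for work, but I told myself I would write a post on my blog every day. So this is my post. I want to find new friends at work, but that’s especially hard. All of us workers go home after work and don’t do anything fun. So I don’t spend time with my coworkers, and so I’m not friends with my coworkers.

I also don’t know what I want to say in Chinese. I don’t know how to write about science in Chinese. My work uses proteins, I can say that. We study viruses, I can say that. But how do I say other things?

This post isn’t very long, but I didn’t write last night, so I’m writing today. And I should be doing work, so not too long isn’t too bad.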

How do you read in a language you only half understand?

Whenever I learn a new language, there always comes a time when I start to get good enough at it to recognize and understand certain words, but not good enough to know every word I come across.  I can read half a sentence but not the whole sentence, understand half a paragraph but not the whole paragraph.  This is a difficult time for a learner because you’re just on the cusp of truly using the language to read, but you don’t feel good enough to actually use it because you only understand half of what you read.  How do you get better?

The answer (so I’ve been taught) is you still try to read. Even if you don’t understand everything, even if you only understand half of it, you try to read what you can so you can get familiar with the language and start learning by using. Most words we know were probably never defined for us specifically. Did anyone ever define the word “anyone” for you? Instead, as learners we pick them up by context clues and other hints, and start using them the way we read or heard them. This can occasionally lead to hilarity, like how I once heard someone describe a child as homely instead of comely, but it can also lead to learning as you start to use and understand each new word you read.

So if I’m reading something and I come upon words I don’t understand, I was taught not to look each one of them up, but instead to just keep reading and try to figure them out as I go. I may read a sentence that says “he went to the 餐厅, and after he’d finished his meal he…”. Although I don’t know what 餐厅 means directly, it seems that “he” ate there, so it must be some sort of eating place. Now whenever I see that word again I check whether it seems to have something to do with eating, and if it does then I can learn by usage that 餐厅 means “a place where you eat.” Through this process I can slowly pick up the language through usage rather than trying to stop and look up every word.

But here’s the secret: this trick also works with scientific writing. Scientific writing is filled to the brim with jargon and odd definitions. What is an SDS-PAGE? What is an HPLC? And not only are the words difficult, the concepts are difficult too. Why did they use centrifugation to separate out the nucleus? Why does electron microscopy not let you visualize the less-rigid parts of a protein? When you start out as a scientist, you are often told to read scientific papers, and scientific papers can feel like you’re reading a foreign language!

But the same rules apply as when reading a foreign language: you don’t always have to know every word when you’re starting out, or even every concept. It’s more important to develop scientific language fluency so that you can get the big idea out of a paper and understand it when speaking with others. For example, say they used HPLC to separate a protein of interest from all the other proteins in a cell. OK, so HPLC is a purification technique, and I don’t need to know how it works if all I’m interested in is that protein of interest. I can move on to what the paper says about the protein, secure in the knowledge that it is indeed pure. If later on HPLC becomes more important, then I can do a quick search or deep dive to understand more of it, but it isn’t always necessary to know every single word or technique in a paper.

Reading scientific papers is a skill, one I’ve had to devote a lot of time to getting better at, but once you develop knowledge of the jargon and techniques it gets a lot easier, and importantly you develop the skills necessary to learn any new jargon or techniques that you come across. And that is the real skill: not the knowledge of specific things but the ability to learn new things. That is what truly makes a scientist.

Random thought: push Zhuge Liang for Summerslam

As I stated earlier, one of my favorite pieces of Chinese-language media is the Three Kingdoms TV show. The more I rewatch it, the more I remember one of its stand-out features: they REALLY want you to think Zhuge Liang is cool.

Anyone anywhere who is at all a cool guy is consistently shown up by Zhuge Liang, who is not only the wisest and most capable general but is able to predict entire battles before they even happen.  Several characters outright state that Zhuge Liang is the Coolest Guy and Best Strategist, and those who think they’re better are always shown to be wrong before the episode is finished.  

Now in stories, this isn’t a bad thing. It lets the audience know that Zhuge Liang is a Cool Guy, and when well-executed it makes the audience like him BECAUSE he’s a Cool Guy. But by necessity it can lead storylines down weird paths.

In wrestling there’s something called “pushing,” which is basically where you take a character and give them a lot of victories so the audience starts to like them. Although audiences can root for underdogs, most underdog stories end in the heroes’ victory (just see every sports movie), so giving a character a bunch of wins lets the audience know that they are Cool and Competent and will definitely be important in the future, even if they’re still an underdog. Conversely, characters who always lose are clearly not as special and good, unless the story focuses on their defeats and how they grow from those defeats (and start getting victories). When this is done well, the audience roots for exactly who you told them to root for and everyone is happy. When this is done poorly, a character can feel “overpushed”: the audience gets sick of seeing them win all the time and wants to see someone else in the limelight instead.

When I watch Three Kingdoms, it feels like Zhuge Liang is being pushed for Summerslam. He’s the smartest, he’s the best, and he needs to get a bunch of wins in a hurry to make up for lost time since he’s only just been introduced. Not only does Liu Bei go to great lengths to recruit Zhuge Liang (indicating he’s super special and important), several characters outright state that as a strategist, Zhuge Liang is far superior to any of the cool and competent characters we’ve met up until now. He is routinely shown outsmarting Lu Su and Zhou Yu (his “rivals” from the Southlands) and masterminds the defeat of Cao Cao (his “rival” to the North). Zhou Yu in particular is shown to be a petty, insecure jerk constantly trying to one-up Zhuge Liang and then getting outsmarted like a mean principal in a kid’s show.

To be blunt, I’m a bit tired of Zhuge Liang already, which makes me worried since I know he’s going to stay super duper important for a long time yet. I mean, I visited his shrine in Chengdu (the WuHouCi if you’re ever in the area). I’m not sure why exactly I’m already tired of him. Maybe it’s because he’s just too smart and it gets boring, or maybe I just perennially root for underdogs. But while it’s still fun to watch the show and see what Zhuge Liang will get up to next, I’m a tiny bit more interested in the stuff I’m not seeing, but which I know happened in history. I’d love to see more of Cao Pi (Cao Cao’s son who everyone agrees isn’t half as smart as his dad but eventually inherits everything anyway). But everyone agrees Cao Pi is a moron so he’s not cool enough to get much focus yet.

Who is the “protagonist” of a narrative spanning over a century?

A few days ago I posted about my favorite Chinese-language media, and included in that list the Three Kingdoms TV show that can be watched on YouTube. The TV show is heavily based on the “Romance of the 3 Kingdoms” novel written hundreds of years ago, and I remember reading an abridged version of the novel when I was in University.

One of the most interesting conversations I had was with a Chinese friend of mine who had read the book in middle school. I basically told him “I really like this book, and it’s got some cool characters like Cao Cao, he seems to be the main character.” My friend said “really? I remember Zhuge Liang being the main character.” At that point I hadn’t even met Zhuge Liang in the book so was confused. In the sections I read, Cao Cao was in many ways the driving force behind the narrative: he tried to assassinate Dong Zhuo, he helped raise a rebel army, and many of the plot threads were from his perspective as he warred across the Central Plain.

And yet my friend’s memory was correct, as soon as Zhuge Liang enters the narrative, HE is the clear protagonist of the story. He is very clearly shown as the smartest, wisest, most dedicated general, and anyone who is in any way cool will at some point get shown up by Zhuge Liang to prove that Zhuge Liang is even cooler. Perhaps his only drawback is that he is too smart, I remember a conversation sometime before the Battle of Red Cliffs where someone admonishes him to remember that not everyone understands what he’s saying or doing because they aren’t as smart as him.

But of course the narrative lasts a very long time, many of these characters grow old and die before it is finished. So in a long-running character-spanning narrative, how do you even define who the main protagonist is? I guess in a way you don’t, Three Kingdoms is more an ensemble cast of characters who rise and fall throughout the narrative, and that’s part of what makes it so great.