April 18, 2025

Kanjis project episode 0: the idea

OK, but why?

OK, so I have this project about kanjis. It started because I wanted to get back into neural networks and image recognition, but MNIST is a bit boring. And because I love to suffer, I am learning Japanese (or at least trying to).

If you are ever thinking about learning Japanese, you should know beforehand that even though it is a beautiful language with some nice aspects (like the fact that it's not too hard to pronounce, at least for a French speaker), its writing system is… chaotic. It uses 3 sets of characters: katakanas and hiraganas, which are syllabaries (katakanas are mainly used for imported words like コンピュータ [konpyuuta] = computer, while hiraganas are used for things like verb endings or particles), and of course kanjis. The fun part is that kanjis were imported from the Chinese writing system, so in Japanese the same kanji can have several meanings (although usually sharing a common idea), and also several pronunciations, depending on whether it is used alone or combined with others to form a new word. And of course, there are thousands of them, otherwise it wouldn't be fun (Japanese students learn 2,136 kanjis in school). So when you encounter a new kanji (which happens. A lot.), it can be difficult to find its meaning or its pronunciation.

Here comes my problem: how do I find information about a new kanji? Of course, I am not the first person to ask this question, and many tools already exist (you can search by stroke count, by radicals…), but I wanted to make my own, because why not? And it's better than MNIST.

So what do we do?

What I want is to create a small app (I'm not sure about its final form yet) where I can draw a kanji and have a neural network recognize it from the image. In fact, I want something like the drawing tool offered by Jisho.org:

Jisho.org: draw kanjis
Jisho.org: almost found 雨

To be honest, I don't think they are using neural networks. Most of the tools I found seem to rely on stroke order and direction. One could argue that using neural networks is therefore not such a good idea, and that's probably true: there are a lot of kanjis, and for each of them I need a large number of handwritten examples (we'll see in episode 1 that this is going to be a problem). But I am not looking for the best results; I just want to learn and have fun, so let's see how far we can go!