I recently developed an application named "Sushida Killer," which automatically plays "Sushida," the most famous typing game here in Japan. (For more details,
play Sushida here.)
This application consists of three parts:
1. Reading problems
2. Recognizing characters in the movie
3. Simulating key typing
The first part is reading problems.
At first sight, you will notice that a dish of sushi, which corresponds to one problem, appears every 5 seconds. So it's not a bad idea to automatically take a screenshot of the Flash video every 5 seconds. How do you take a screenshot automatically? I used the following procedure:
1. Simulate the key press "Alt + PrintScreen".
2. Access the clipboard with the Win32 API.
3. Get the bitmap information from the clipboard.
4. Save it to a BMP file.
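The four steps above can be sketched roughly as follows. This is not the original Sushida Killer code, only a minimal sketch under a few assumptions: the names makeBmpFileHeader and captureActiveWindowToBmp are made up for illustration, the 200 ms wait is a guess, the clipboard is read in CF_DIB format, and the screen is true color (so there is no implicit palette). The Win32 part only compiles on Windows; the header helper is portable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds the 14-byte file header that precedes a DIB inside a .bmp file.
// dibSize: total bytes of the DIB (info header, optional masks/palette, pixels).
// bytesBeforePixels: bytes of the DIB that come before the pixel data.
std::vector<std::uint8_t> makeBmpFileHeader(std::uint32_t dibSize,
                                            std::uint32_t bytesBeforePixels) {
    const std::uint32_t fileSize = 14 + dibSize;
    const std::uint32_t pixelOffset = 14 + bytesBeforePixels;
    std::vector<std::uint8_t> h(14, 0);  // the reserved fields stay zero
    h[0] = 'B';
    h[1] = 'M';
    for (int i = 0; i < 4; ++i) {
        h[2 + i] = static_cast<std::uint8_t>(fileSize >> (8 * i));      // file size, little-endian
        h[10 + i] = static_cast<std::uint8_t>(pixelOffset >> (8 * i));  // offset of the pixel data
    }
    return h;
}

#ifdef _WIN32
#include <windows.h>
#include <fstream>

// Hypothetical helper: copies the active window to the clipboard, then saves it.
bool captureActiveWindowToBmp(const char* path) {
    // Step 1: simulate Alt + PrintScreen.
    keybd_event(VK_MENU, 0, 0, 0);
    keybd_event(VK_SNAPSHOT, 0, 0, 0);
    keybd_event(VK_SNAPSHOT, 0, KEYEVENTF_KEYUP, 0);
    keybd_event(VK_MENU, 0, KEYEVENTF_KEYUP, 0);
    Sleep(200);  // assumed delay so the clipboard has time to fill

    // Step 2: open the clipboard.
    if (!OpenClipboard(NULL)) return false;
    bool ok = false;
    // Step 3: get the bitmap as a device-independent bitmap (DIB).
    HANDLE hDib = GetClipboardData(CF_DIB);
    if (hDib != NULL) {
        const BITMAPINFOHEADER* bih =
            static_cast<const BITMAPINFOHEADER*>(GlobalLock(hDib));
        if (bih != NULL) {
            // GlobalSize may be slightly larger than the DIB; extra bytes are harmless here.
            const std::uint32_t dibSize = static_cast<std::uint32_t>(GlobalSize(hDib));
            std::uint32_t tableBytes = bih->biClrUsed * 4;
            if (bih->biSize == sizeof(BITMAPINFOHEADER) &&
                bih->biCompression == BI_BITFIELDS) {
                tableBytes += 12;  // three DWORD color masks follow the header
            }
            const std::vector<std::uint8_t> header =
                makeBmpFileHeader(dibSize, bih->biSize + tableBytes);
            // Step 4: a .bmp file is the file header followed by the DIB as is.
            std::ofstream out(path, std::ios::binary);
            out.write(reinterpret_cast<const char*>(header.data()),
                      static_cast<std::streamsize>(header.size()));
            out.write(reinterpret_cast<const char*>(bih),
                      static_cast<std::streamsize>(dibSize));
            ok = out.good();
            GlobalUnlock(hDib);
        }
    }
    CloseClipboard();
    return ok;
}
#endif
```

Calling something like captureActiveWindowToBmp("shot.bmp") once per dish would then give one image per problem, provided the browser window is the active one.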
The second part is the key part of Sushida Killer.
To read only the characters on the screen, I first wrote a small C++ library of my own for image processing.
With it, I cropped the rectangle in which the problem text appears, then converted the three-dimensional RGB vector of each pixel into 0/1 data using a threshold.
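As an illustration of the crop-and-threshold step, here is a minimal sketch. The types Rgb, Image and Rect, the function name binarize, and the choice of "average brightness below the threshold means 1" are all assumptions made for this example; the original library may well use a different rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

struct Image {
    int width;
    int height;
    std::vector<Rgb> pixels;  // row-major, width * height entries
};

struct Rect { int x, y, width, height; };

// Crops `rect` out of `img` and turns every pixel into 0 or 1.
// A pixel becomes 1 when its average brightness is below `threshold`
// (assumed here to mean "dark text on a light background").
// Returns an empty vector if the rectangle does not fit inside the image.
std::vector<std::uint8_t> binarize(const Image& img, const Rect& rect, int threshold) {
    std::vector<std::uint8_t> bits;
    if (rect.x < 0 || rect.y < 0 || rect.width <= 0 || rect.height <= 0 ||
        rect.x + rect.width > img.width || rect.y + rect.height > img.height) {
        return bits;
    }
    bits.reserve(static_cast<std::size_t>(rect.width) * rect.height);
    for (int y = rect.y; y < rect.y + rect.height; ++y) {
        for (int x = rect.x; x < rect.x + rect.width; ++x) {
            const Rgb& p = img.pixels[static_cast<std::size_t>(y) * img.width + x];
            const int brightness = (p.r + p.g + p.b) / 3;
            bits.push_back(brightness < threshold ? 1 : 0);
        }
    }
    return bits;
}
```

The result is a flat vector of rect.width * rect.height zeros and ones, read row by row.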
At this point, you get data that looks something like this:
I had to collect many samples like the one above to train the classifier, which was done semi-automatically.
After training with a large enough set of learning data, the next step is classifying unknown data into a string. Fortunately, I didn't have to use a statistical learning machine such as an SVM or a Bayesian classifier, since the characters were rendered by a Flash app, not written by a human, so they were neat and tidy enough to be understood by a computer.
The key point is speed. How do you classify the data in less than 5 seconds?
There are about 1000 samples, each containing a relevant rectangle of 10,000 pixels.
It seems tough to classify the image data into the corresponding string within 5 seconds, considering that you also have to take a screenshot of the Flash movie, save it to a local file, and then classify it into one of about 1000 groups.
So I mapped each sample's information to a hash value between 0 and 10^10, inclusive.
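The post does not say which hash function was used, so the following is only a stand-in: a simple polynomial rolling hash over the 0/1 pixel data, reduced modulo 10^10 + 1 so that the result falls between 0 and 10^10 inclusive. The name hashBits and the multiplier 131 are arbitrary choices for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 10^10 + 1, so that hash values range over 0 .. 10^10 inclusive.
constexpr std::uint64_t kHashModulus = 10000000001ULL;
// Arbitrary small multiplier (an assumption, not taken from the original program).
constexpr std::uint64_t kHashBase = 131;

// Maps a sequence of 0/1 pixels to a single number in [0, 10^10].
std::uint64_t hashBits(const std::vector<std::uint8_t>& bits) {
    std::uint64_t h = 0;
    for (std::uint8_t bit : bits) {
        // h is at most 10^10, so h * 131 + 2 stays far below 2^64 (no overflow).
        // Adding 1 makes a 0 pixel contribute too, so leading zeros are not lost.
        h = (h * kHashBase + bit + 1) % kHashModulus;
    }
    return h;
}
```

With this, comparing two 10,000-pixel rectangles becomes a comparison of two integers, which is what makes the lookup fast.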
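I figured a collision would practically never happen, because (number of samples / size of the hash space) was 10^-7. Strictly speaking, assuming the hash spreads values evenly, the chance that any two of the 1000 samples collide is closer to 1000^2 / (2 × 10^10) = 5 × 10^-5 by the birthday bound, which is still negligible. And I was right!! The whole process finishes in less than 5 seconds. Actually, it doesn't take more than 1 second.
I precomputed each sample's hash value and dumped them all to a file. So when playing Sushida, the program just loads that data into memory, which takes almost no time.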
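A minimal sketch of the precompute-then-load idea might look like this. The file format (one "hash answer" pair per line), the names saveTable, loadTable and classify, and the assumption that each answer string contains no whitespace are all invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Maps a precomputed hash value to the string that should be typed.
using HashTable = std::unordered_map<std::uint64_t, std::string>;

// Offline step: write one "hash answer" pair per line.
void saveTable(const HashTable& table, std::ostream& out) {
    for (const auto& entry : table) {
        out << entry.first << ' ' << entry.second << '\n';
    }
}

// Start-up step: read the pairs back into memory.
// Assumes answers contain no whitespace.
HashTable loadTable(std::istream& in) {
    HashTable table;
    std::uint64_t hash = 0;
    std::string answer;
    while (in >> hash >> answer) {
        table[hash] = answer;
    }
    return table;
}

// Per-problem step: one average constant-time lookup.
// Returns an empty string when the hash is unknown.
std::string classify(const HashTable& table, std::uint64_t hash) {
    const auto it = table.find(hash);
    return it == table.end() ? std::string() : it->second;
}
```

In a real run the streams would be a std::ofstream and a std::ifstream on the data file; taking generic streams here just keeps the sketch easy to try out.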
The third part is simulating key presses.
It's not so difficult to simulate key typing if you know about keybd_event() in the Win32 API.
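For reference, typing a string with keybd_event() can be sketched as below. The helper names virtualKeyFor and typeText, the supported character set (lowercase letters, digits and the hyphen), and the per-key delay are assumptions for this example. The part that actually sends keys only compiles on Windows.

```cpp
#include <cassert>
#include <string>

// Returns the Win32 virtual-key code for a character, or 0 if unsupported.
// For letters and digits the virtual-key code equals the ASCII code of the
// uppercase letter or of the digit.
int virtualKeyFor(char c) {
    if (c >= 'a' && c <= 'z') return c - 'a' + 'A';
    if (c >= '0' && c <= '9') return c;
    if (c == '-') return 0xBD;  // VK_OEM_MINUS
    return 0;
}

#ifdef _WIN32
#include <windows.h>

// Sends each character as a key-down / key-up pair to the window that has focus.
// delayMs is a tuning parameter; the right value has to be found by experiment.
void typeText(const std::string& text, DWORD delayMs) {
    for (char c : text) {
        const int vk = virtualKeyFor(c);
        if (vk == 0) continue;  // skip characters this sketch cannot type
        keybd_event(static_cast<BYTE>(vk), 0, 0, 0);
        keybd_event(static_cast<BYTE>(vk), 0, KEYEVENTF_KEYUP, 0);
        Sleep(delayMs);
    }
}
#endif
```

Since keybd_event() delivers keys to whichever window currently has keyboard focus, the game has to be focused before something like typeText("sushi", 10) is called.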
But getting the window handle of Sushida was a little challenging. The Flash app runs inside a browser, so it's impossible to get its window directly as an application window.
I had to implement a program that gives you the handle of the window under the mouse cursor. Thanks to it, I was able to get the handle of Sushida, which was a child of a child of a child of a child of my web browser's window!!! So I traversed the window handles to reach the Flash app.
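One way to express this traversal is a depth-first search over the window tree. The sketch below is an assumption about how it could be done, not the original code: findDescendant, directChildren and windowUnderCursor are made-up names, and the search is written generically so the logic can be tried without Windows. The Win32 bindings at the bottom only compile on Windows.

```cpp
#include <cassert>
#include <vector>

// Depth-first search from `root` for the first window accepted by `isTarget`.
// `childrenOf(h)` must return the direct children of window h.
// On success, stores the match in `found` and returns true.
template <typename Handle, typename ChildrenFn, typename Pred>
bool findDescendant(Handle root, ChildrenFn childrenOf, Pred isTarget, Handle& found) {
    if (isTarget(root)) {
        found = root;
        return true;
    }
    for (Handle child : childrenOf(root)) {
        if (findDescendant(child, childrenOf, isTarget, found)) return true;
    }
    return false;
}

#ifdef _WIN32
#include <windows.h>

// Direct children of a window, in Z order.
std::vector<HWND> directChildren(HWND parent) {
    std::vector<HWND> out;
    for (HWND c = GetWindow(parent, GW_CHILD); c != NULL; c = GetWindow(c, GW_HWNDNEXT)) {
        out.push_back(c);
    }
    return out;
}

// Handle of the window currently under the mouse cursor (NULL on failure).
HWND windowUnderCursor() {
    POINT pt;
    if (!GetCursorPos(&pt)) return NULL;
    return WindowFromPoint(pt);
}
#endif
```

On Windows, one could hover over the game once, note what windowUnderCursor() returns (for example its class name), and afterwards locate the same window from the browser's top-level handle with findDescendant and directChildren.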
Most of my effort went into collecting the data. It was tedious work, even though I did it semi-automatically.
Selecting parameters was not so interesting either, but it was a very important job.
What size should the rectangle be?
How long does the program have to sleep while waiting for the screenshot and file writing to finish?
How long does the program have to sleep while waiting for the key simulation to finish?
Finally, I thank the great developer who created Sushida for giving me this good challenge.
(Well, he/she never actually gave me that kind of challenge; I just feel as if I was given it.)
And I hereby promise that I will never use Sushida Killer to get ranked on the official site, i.e., this program is for local use only.
Last but not least, here's a demo of Sushida Killer!