CatchTheDuck.com

Technology

Tools

I use:

  • The open source machine learning library TensorFlow
  • The TensorFlow implementation of the Swivel algorithm for generating word embeddings
  • A virtual machine, so that computing power can be increased if needed to run the program, depending on the size of the vocabulary

Pattern

Rather than saying what I do, I prefer to explain what I do not do. I skip the time-consuming part, namely training the model. I prefer to operate directly on the matrix to get a better result. In effect, I catch the duck by the neck; that is what I call «fast-learning». Besides, there is no guarantee that increasing the number of epochs will drastically improve the result. Sometimes it makes it worse.
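The text above does not say which matrix is used or which operation is applied to it. Purely as an illustration of a training-free computation on a word co-occurrence matrix (the kind of matrix Swivel starts from), here is a minimal sketch that scores word pairs with positive pointwise mutual information (PPMI). The function names, the window size, and the toy sentence are assumptions made for this example. It is not the «fast-learning» method itself.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count (word, context) pairs that occur within `window` positions of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts


def ppmi(counts):
    """Positive pointwise mutual information for every observed pair, no training loop."""
    total = sum(counts.values())
    row = Counter()  # marginal count of each word
    col = Counter()  # marginal count of each context
    for (w, c), n in counts.items():
        row[w] += n
        col[c] += n
    scores = {}
    for (w, c), n in counts.items():
        pmi = math.log((n * total) / (row[w] * col[c]))
        scores[(w, c)] = max(0.0, pmi)  # clip negative associations to zero
    return scores


# Toy corpus, invented for illustration only.
tokens = "the duck swims in the pond and the duck eats bread".split()
counts = cooccurrence_counts(tokens, window=2)
scores = ppmi(counts)
```

Everything here is a single pass over counts, so there are no epochs to tune. Whether such scores are good enough depends on the corpus.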

Data

I dumped the data from Wikipedia. After cleaning, the French version contains 5 million words and the English version 25 million. I need a machine with 4 vCPUs and 15 GB of RAM to run the program. It takes less than one minute for French and 15 minutes for English. Admittedly, the context is heterogeneous, so many results are strange. Some are surprising. In French, the last result for «midwife» is «lighting». In English, the first result for «woman» is «isomers». If you are not convinced, you can try the Deep Search: the deepest result for «deep» is, strangely, «IA». You can also run «deep» with the Edge Search.
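To make "first result" and "last result" concrete, here is a minimal sketch of how the neighbours of a query word can be ranked once every word has a vector. It assumes cosine similarity as the ranking measure, which the text above does not confirm. The function names and the toy vectors are invented for the example and do not come from the Wikipedia data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_u * norm_v)


def rank_neighbors(query, embeddings):
    """Return every other word, sorted from most to least similar to `query`."""
    q = embeddings[query]
    scored = [
        (cosine(q, vec), word)
        for word, vec in embeddings.items()
        if word != query
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored]


# Toy vectors, invented for illustration only.
toy_embeddings = {
    "duck": [0.9, 0.1, 0.0],
    "goose": [0.8, 0.2, 0.1],
    "pond": [0.5, 0.5, 0.0],
    "matrix": [0.0, 0.1, 0.9],
}
ranking = rank_neighbors("duck", toy_embeddings)
first_result, last_result = ranking[0], ranking[-1]
```

With a heterogeneous corpus, the word at either end of such a ranking can look unrelated to the query, which fits the strange results described above.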

CSS

I use the CSS provided by Material Design Lite

FAQ

Answers
  • The program is written in Python, which means you have to run it from the command line or use an interface, such as this website, to get results
  • The source code is not open yet; it is more of a DIY project than a real implementation
  • If you find this technology interesting, don’t hesitate to contact me. Thank you in advance for your feedback. A practical exercise could be to try to find the link between your search and some of the results


Contact
