Recognition of number sets

Translated with Google Translate.

One of the features added in version 1.3 of LearningML is number set recognition. What does this consist of? To be fair, recognizing numerical patterns is the only thing ML algorithms can do. In fact, when we work with images or texts, they are converted (encoded) into sets of numbers before being fed to the algorithm as inputs.

In LearningML, each example image is converted into a set of numbers (called a tensor). The image is divided into a grid of 227 × 227 squares (pixels), and the color of each square is encoded as a combination of red (R), green (G) and blue (B). Hence the total number of values required to encode an image is 227 × 227 × 3 = 154587.
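The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `tensor_size` and `flatten_image` are hypothetical and are not part of LearningML, and the flattening order (row by row, R then G then B) is an assumption.

```python
# Dimensions taken from the text: a 227 x 227 grid with 3 color channels.
WIDTH, HEIGHT, CHANNELS = 227, 227, 3


def tensor_size(width: int, height: int, channels: int) -> int:
    """Return how many numbers are needed to encode one image."""
    return width * height * channels


def flatten_image(pixels):
    """Flatten rows of (R, G, B) tuples into a single list of numbers."""
    return [value for row in pixels for pixel in row for value in pixel]


total = tensor_size(WIDTH, HEIGHT, CHANNELS)
print(total)  # 154587

# A tiny 2 x 2 image, just to show the flattening on something readable.
tiny = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
flat = flatten_image(tiny)
print(len(flat))  # 12
```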

Something similar happens with texts: a dictionary is built from all the words in all the example texts, words that contribute little to the semantics of a sentence (stopwords) are removed from that dictionary, and each text is then encoded by the presence or absence of the dictionary terms in it.
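A minimal sketch of this kind of text encoding might look as follows. The stopword list, the whitespace tokenizer and all function names are assumptions made for illustration; the actual preprocessing in LearningML may differ.

```python
# Hypothetical, deliberately tiny stopword list (an assumption, not LearningML's).
STOPWORDS = {"the", "a", "is", "of", "and"}


def tokenize(text):
    """Lowercase the text, split on whitespace and drop stopwords."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


def build_dictionary(texts):
    """Collect every remaining word of every example text, in sorted order."""
    return sorted({word for text in texts for word in tokenize(text)})


def encode(text, dictionary):
    """Mark with 1 each dictionary term present in the text, 0 otherwise."""
    present = set(tokenize(text))
    return [1 if term in present else 0 for term in dictionary]


examples = ["the cat is happy", "the dog is sad"]
dictionary = build_dictionary(examples)
print(dictionary)  # ['cat', 'dog', 'happy', 'sad']

vector = encode("a happy dog", dictionary)
print(vector)  # [0, 1, 1, 0]
```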

Therefore, what Machine Learning algorithms require are numerical sets. However, until version 1.3, it was not possible to enter these number sets directly in LearningML. Being able to do so is very useful because, on many occasions, the data we have is simply organized in tabular form, as in a spreadsheet, where each row of numbers represents an instance.

One of the most characteristic and well-known numerical sets in the world of statistics and Machine Learning is the iris dataset. It contains 150 specimens of Iris flowers classified into three species: Iris setosa, Iris virginica and Iris versicolor. Each specimen is characterized by 4 typical flower traits: the length of the sepal, the width of the sepal, the length of the petal and the width of the petal.

We reproduce here 6 of the 150 specimens in the set:

Specimen  Numerical representation of the specimen  Class
1         [5.1, 3.5, 1.4, 0.2]                      Iris Setosa
2         [4.9, 3.0, 1.4, 0.2]                      Iris Setosa
3         [5.5, 4.2, 1.4, 0.2]                      Iris Setosa
4         [4.9, 3.1, 1.5, 0.2]                      Iris Setosa
5         [6.2, 3.4, 5.4, 2.3]                      Iris Virginica
6         [5.9, 3.0, 5.1, 1.8]                      Iris Virginica

You can download the complete set in CSV format or in the LearningML JSON format (ready to be loaded into the tool) from the following links:

Iris dataset (CSV)
Iris dataset (JSON)
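As a rough illustration of how such tabular data could be read outside the tool, the sketch below groups CSV rows by class. It assumes a layout of four numeric columns followed by the class name, with no header row; the layout of the actual downloadable file may differ, and `load_instances` is a hypothetical helper. The three sample rows are illustrative rows from the classic iris dataset.

```python
import csv
import io

# Assumed layout: four numeric attributes, then the class name, no header row.
SAMPLE_CSV = """4.7,3.2,1.3,0.2,Iris Setosa
7.0,3.2,4.7,1.4,Iris Versicolor
6.3,3.3,6.0,2.5,Iris Virginica
"""


def load_instances(csv_file, n_attributes=4):
    """Read rows from a file-like object and group the numbers by class name."""
    by_class = {}
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        values = [float(cell) for cell in row[:n_attributes]]
        label = row[n_attributes].strip()
        by_class.setdefault(label, []).append(values)
    return by_class


dataset = load_instances(io.StringIO(SAMPLE_CSV))
for label, rows in dataset.items():
    print(label, rows)
```

To read a real file, the `io.StringIO(...)` argument could be replaced by an open file handle, for example `open(path, newline="")`, where `path` is wherever the CSV was saved.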

Now we are going to use LearningML to build an ML model from this sample data. The procedure is the same as always:

  1. Enter the sample data (train).
  2. Build the model from the sample data (learn).
  3. Evaluate the model (test).

The only difference is that in this case we click on the “Recognize numbers” button on the home screen.

In the case of sets of numbers, before starting to enter data, we must indicate the number of attributes of the data set that we want to process. This value is 2 by default, but we can change it using the “Number of columns” text box. In our example, this number is 4.

Now we create the three classes of the problem (Iris Setosa, Iris Versicolor and Iris Virginica) and add the numerical data corresponding to each class. The numbers of each specimen are entered separated by commas (“,”), as seen in the following image.
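To make the input format concrete, here is a small sketch of how a comma-separated line could be turned into a list of numbers and checked against the configured number of columns. The function `parse_instance` is hypothetical and only mirrors the behavior described above; it is not LearningML's own code.

```python
def parse_instance(text, n_columns=2):
    """Split a comma-separated line into floats and check the column count."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != n_columns:
        raise ValueError(f"expected {n_columns} numbers, got {len(parts)}")
    return [float(part) for part in parts]


specimen = parse_instance("5.1, 3.5, 1.4, 0.2", n_columns=4)
print(specimen)  # [5.1, 3.5, 1.4, 0.2]

# A line with the wrong number of values is rejected.
try:
    parse_instance("5.1, 3.5, 1.4", n_columns=4)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```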

Once we have the sample data, we proceed to build the ML model by clicking on “Learn to recognize numbers.”

Once the learning process is finished, we run some tests to see whether the model performs to our satisfaction. Again, the numbers are entered separated by commas.
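The three phases (train, learn, test) can be sketched in plain Python. Note that this uses a 1-nearest-neighbor classifier purely as a simple stand-in; it is not necessarily the algorithm LearningML uses internally. The names `learn` and `classify` are hypothetical, and the training rows are a handful of illustrative rows from the classic iris dataset.

```python
import math

# 1. Train: a few example specimens per class (illustrative rows only).
training = {
    "Iris Setosa": [[4.7, 3.2, 1.3, 0.2], [5.0, 3.6, 1.4, 0.2]],
    "Iris Versicolor": [[7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5]],
    "Iris Virginica": [[6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9]],
}


# 2. Learn: for nearest neighbor, "learning" just stores the labeled examples.
def learn(examples):
    return [(values, label) for label, rows in examples.items() for values in rows]


# 3. Test: assign the label of the closest stored example (Euclidean distance).
def classify(model, values):
    nearest_values, nearest_label = min(
        model, key=lambda item: math.dist(item[0], values)
    )
    return nearest_label


model = learn(training)
print(classify(model, [5.0, 3.4, 1.5, 0.2]))  # Iris Setosa
print(classify(model, [6.5, 3.0, 5.8, 2.2]))  # Iris Virginica
```

With so few examples the result is only indicative; entering more specimens per class generally gives a more reliable model.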

Finally, as with the other types of data (texts and images), we can build a Scratch program that uses the trained model to recognize new numerical sets encoding new specimens (of iris flowers, in this model).

The creation of number recognition models opens up new possibilities for designing programming activities with artificial intelligence, and will help in working on the concepts of data science, big data and even IoT (Internet of Things): using sensors, we can collect data on different phenomena, classify them, create a model and use it to predict or classify new data collected with those same sensors. In this way, we connect Artificial Intelligence activities even more closely with educational robotics. EchidnaSTEM and micro:bit are two educational robotics boards with which you can design an activity of this type. Do you dare to propose one?