# G9 Intro to Computer Technology 2019-2020

# Building Models of the World

We discussed how data scientists gather data and then use math and computers to extract information from it, summarizing that information in clear graphs.

One of the key steps in doing so is to determine the features (or attributes) we want to collect data about. This is always a limited, finite set of attributes.

For instance, let's say we want to buy a new car and decide to compare the different models out there. What car should we buy? What features should we pay attention to?

Say Bob has always wanted to have the fastest Mercedes out there. What features is he considering? Answer: seemingly only speed, since he has already fixed the brand to be Mercedes. He would then list Mercedes cars, writing down just their max speed.

 | Max speed (km/h) |
 |:------------:|
 |220|
 |225|
 |215|
 |...|
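
Bob's one-feature model can be sketched in a couple of lines of Python. This is only an illustration: it uses the three speeds listed in the table (the elided rows are left out), and the variable names are made up for this sketch.

```python
# Max speeds (km/h) copied from the three rows listed in the table above.
speeds_kmh = [220, 225, 215]

# With a single feature, picking the "best" car just means picking the largest value.
fastest = max(speeds_kmh)
print(fastest)
```

With only one feature to compare, the whole decision reduces to a single `max` call.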
 
Alice, by contrast, is looking for the **_best option out there within a given budget_**. Clearly, she's interested in looking into **more features** than just speed. A possible list of attributes could be
_brand, model, build year, number of doors, horsepower, city consumption (L/100 km), highway consumption (L/100 km) and price_. There are more attributes one could consider, e.g., engine type, number of airbags, [safety classification](https://www.euroncap.com/en/ratings-rewards/latest-safety-ratings/#?selectedMake=0&selectedMakeName=Select%20a%20make&selectedModel=0&selectedStar=&includeFullSafetyPackage=true&includeStandardSafetyPackage=true&selectedModelName=All&selectedProtocols=34803,30636&selectedClasses=1202,1199,1201,1196,1205,1203,1198,1179,1197,1204,1180,34736&allClasses=true&allProtocols=false&allDriverAssistanceTechnologies=false&selectedDriverAssistanceTechnologies=&thirdRowFitment=false), but Alice chose only those 8 features.
 
Both Alice and Bob **_can only consider a limited set of features_**. Bob is extreme and looks at just one (or two, if you count the brand, which he fixes to Mercedes-Benz); Alice instead chose to look at 8 different features.

No matter how many resources and how much effort we put into such research, we would only ever be able to consider a finite number of features.

In addition, the more features we collect data about, the more complex our analysis becomes.

In any case, both Alice and Bob are, in fact, _building a **Model** of a car_: each car is simply a list of feature values!

   * For Alice: (brand, model, build year, number of doors, horsepower, city consumption (L/100 km), highway consumption (L/100 km), price)
   * For Bob: (brand=Mercedes-Benz, speed)

Such a list of features is what determines the Model of reality we are considering.
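
As a rough sketch of what "a car is a list of feature values" could look like in Python, Alice's model can be written as a small class with exactly her 8 features. Everything concrete here is an assumption made for illustration: the field names, the three cars, the budget, and the choice of "lowest city consumption" as the meaning of "best".

```python
from dataclasses import dataclass


@dataclass
class Car:
    """One car in Alice's model: exactly her 8 chosen features."""
    brand: str
    model: str
    build_year: int
    doors: int
    horsepower: int
    city_l_per_100km: float
    highway_l_per_100km: float
    price: float


# Hypothetical entries, invented for illustration only (not real cars or prices).
cars = [
    Car("BrandA", "Model X1", 2018, 4, 150, 8.5, 6.0, 24000),
    Car("BrandB", "Model Y2", 2019, 2, 300, 12.0, 8.5, 52000),
    Car("BrandC", "Model Z3", 2017, 4, 120, 7.0, 5.2, 18000),
]

budget = 25000  # hypothetical budget

# Keep only the cars Alice can afford.
affordable = [car for car in cars if car.price <= budget]

# One possible notion of "best": the lowest city consumption among affordable cars.
best = min(affordable, key=lambda car: car.city_l_per_100km)
print(best.brand, best.model)
```

Anything not listed as a field of `Car` (for example, the number of airbags) simply does not exist in this model, which is exactly the point made above.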

When scientists study the best drug recommendation for different patients, they collect data on previous patients in order to infer some pattern from it. That data will inevitably consist of a limited number of features for each patient, e.g.,

> (Age, Blood Pressure, Sex, Cholesterol, Na_to_K ratio, Drug Recommended)

Clearly, it's impossible to describe how we see our loved ones by considering only those 6 features! A person is far more complex than such a list of attributes can describe! In some sense, however, we could say that for the (data) scientist each patient has become _just a set of numbers or features_. For the problem the scientist wants to address, that limited set of features might well be a good enough _description of reality_.

This is the way science works. When a physicist studies elementary particles, such as the quarks, that give rise to the Universe and everything in it, each particle is reduced to a set of features:

> (momentum, mass, electric charge, parity, spin, isospin, charm, strangeness, bottom, top, and color) 

In Computer Science, among other things, we develop programs that solve specific problems, such as extracting information from data, predicting outcomes (e.g., weather forecasts), or driving cars. In doing so, we always have to first build a _model_ adequate for the given problem.

# Homework: List attributes for the topic Immigration (either specific, e.g., to Canada, or migration flows in general)



Together we came up with the following list of attributes we could use to describe immigration flows:

> Country of Origin, Destination Country, Legal Requirements, (Number of people: this would actually be a number listed under each year), Travel Method, Cause of migration, (ethnicity, religion), 1980, 1981, 1982, etc., 2018, 2019

This gives rise to a **_table_** with the following columns:

In [140]:
Immigration_Attributes

 | | Country of Origin | Destination Country | Legal Requirements | Travel Method | Cause of migration | 1980 | 1981 | 1982 | 2018 | 2019 |
 |:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|


An example of an immigration flow entry could be

In [141]:
Immigration_flow_example

 | | Country of Origin | Destination Country | Legal Requirements | Travel Method | Cause of migration | 1980 | 1981 | 1982 | 2018 | 2019 |
 |:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
 | 0 | China | Canada | | Plane | General | 50000 | 38000 | 51000 | 65000 | 65300 |
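
The year columns hold the "number of people" values noted in the attribute list. As a minimal sketch, assuming pandas is available, such a table could be reshaped so that each row is one (flow, year) pair, with the year and the number of people as ordinary attributes. The names `flows`, `year_columns`, and `long_flows` are made up for this sketch, and it rebuilds the example row itself so that it runs on its own.

```python
import pandas as pd

# The example flow from above, rebuilt here so the sketch is self-contained.
flows = pd.DataFrame(
    [["China", "Canada", "Plane", "General", 50000, 38000, 51000, 65000, 65300]],
    columns=["Country of Origin", "Destination Country", "Travel Method",
             "Cause of migration", "1980", "1981", "1982", "2018", "2019"],
)

year_columns = ["1980", "1981", "1982", "2018", "2019"]

# Turn each year column into its own row: one row per (flow, year) pair.
long_flows = flows.melt(
    id_vars=["Country of Origin", "Destination Country"],
    value_vars=year_columns,
    var_name="Year",
    value_name="Number of people",
)
print(long_flows)
```

Both layouts describe the same data; the reshaped one makes "Year" and "Number of people" explicit features of the model instead of hiding them in the column names.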


[Here are some examples of immigration flows on a world map](https://nbviewer.jupyter.org/url/evermeet.cx/~user055/Dragon/Lessons/CompSci/G9/G9-TEJ0-20190917_143000.ipynb)

In [144]:
world_map 

# Appendix: Code

In [133]:
import folium
import numpy as np
import pandas as pd

In [143]:
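# Start from an empty world map.
world_map = folium.Map()

# Approximate (latitude, longitude) of each country, in degrees.
canada = np.array([56.1304, -106.3468])
china = np.array([35.8617, 104.1954])
india = np.array([20.5937, 78.9629])
pakistan = np.array([30.3753, 69.3451])
colombia = np.array([4.5709, -74.2973])
east = np.array([0, 100])  # only used by the commented-out debug lines below
origins = {'China': china, 'India': india, 'Pakistan': pakistan, 'Colombia': colombia}
#origins={'China':china}
for oname, oxy in origins.items():
    # Draw a straight line from the origin country to Canada.
    aline = folium.PolyLine(locations=[oxy, canada], tooltip=oname + '->Canada')
    world_map.add_child(aline)
    # Direction of the line in degrees, used to rotate the arrowhead marker below.
    dr = canada - oxy
    angle = np.arctan2(dr[0], dr[1]) * 180. / np.pi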
    #world_map.add_child(folium.PolyLine(locations=[canada,-canada+dr]))
    #world_map.add_child(folium.PolyLine(locations=[canada-east,canada+east]))

    folium.RegularPolygonMarker(location=canada,rotation=int(-angle),radius=4,
                            number_of_sides=3
                           ).add_to(world_map)
    #print(oname,dr,angle)
#folium.RegularPolygonMarker(location=(0,0),rotation=int(0),radius=2,
#                            number_of_sides=3
#                           ).add_to(world_map)



In [137]:

Immigration_Attributes = pd.DataFrame(columns=['Country of Origin', 
                           'Destination Country', 'Legal Requirements', 
                           'Travel Method', 'Cause of migration', 
                           '1980', '1981', '1982','2018','2019'
                          ]
                 ) #,index=['col1'])


In [139]:
df2=pd.DataFrame([['China','Canada','None','Plane','General',50000,38000,51000,65000,65300]],
                       columns=Immigration_Attributes.columns.values) 

# DataFrame.append was removed in pandas 2.0; pd.concat builds the same table.
Immigration_flow_example = pd.concat([Immigration_Attributes, df2])