
Question 1: Python Skills [20 marks]
Given the Python list data=[2, 4, 6, 8, 10, 12, 14], write the answer that will result from
evaluating each of the following expressions:
a) len(data)
b) sum(data)
c) data[2]
d) data[-2]
e) data[2:6]
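You can check answers a) to e) by evaluating the expressions in the Python interpreter. The snippet below prints each one, and the comments note the indexing rules each part tests.

```python
data = [2, 4, 6, 8, 10, 12, 14]

# a) number of elements in the list
print(len(data))    # 7
# b) total of all elements
print(sum(data))    # 56
# c) indexing is zero-based, so index 2 is the third element
print(data[2])      # 6
# d) negative indices count from the end: -1 is the last element
print(data[-2])     # 12
# e) a slice includes the start index but excludes the stop index
print(data[2:6])    # [6, 8, 10, 12]
```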
f) If someone wanted to use Python to analyse their customer data and find the largest
5 clusters of customers with similar purchasing behaviour, which Python library would
you recommend that they use (pandas, matplotlib, sklearn, or csv)?
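Of the options listed, sklearn (scikit-learn) is the one that provides clustering algorithms. pandas handles tables, matplotlib draws charts, and csv reads files. The sketch below shows one way to do this with k-means and five clusters. The customer columns and numbers are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical customer data: orders per year and average spend per order,
# generated around five made-up "typical customer" centres.
rng = np.random.default_rng(0)
centres = [(2, 20), (10, 50), (25, 30), (5, 200), (40, 120)]
customers = pd.DataFrame(
    np.vstack([rng.normal(c, (1.0, 5.0), size=(40, 2)) for c in centres]),
    columns=["orders_per_year", "avg_order_value"],
)

# Ask k-means for 5 clusters; n_init and random_state make the result repeatable.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
customers["cluster"] = kmeans.fit_predict(customers[["orders_per_year", "avg_order_value"]])

# Number of customers in each cluster, largest first.
print(customers["cluster"].value_counts())
```

In practice you would usually standardise the features first, for example with `sklearn.preprocessing.StandardScaler`. Otherwise k-means tends to be dominated by whichever column has the largest numeric range.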
g) Write a Python function called ‘clean’ that takes a string as an input, removes all
commas from that string and then converts it to an integer and returns it. For
example, clean(“4,567,000”) should return the integer 4567000.
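One minimal way to write `clean` is to use `str.replace` to drop the commas and `int` to do the conversion:

```python
def clean(text: str) -> int:
    """Remove every comma from text and return the result as an int."""
    return int(text.replace(",", ""))


print(clean("4,567,000"))  # 4567000
```

Note that `int` raises `ValueError` if anything other than digits (and an optional sign) remains after the commas are removed.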
Question 2: Pandas Proficiency [20 marks]
Below are the first few rows of a dataset of house prices; there are thousands of rows of data
in the full dataset.
id | bathr24rooms | bed $$ rooms | finished 44 sqmeter | lastsolddate | lastsoldprice | neighborhood | totalrooms | year && built
1 | 2 | 2 | 1043 | 2/17/2016 | $ 1,300,000.00 | South of Market | 4 | 2007
2 | 1 | 1 | 903 | 2/17/2016 | $ 750,000.00 | South of Market | 3 | 2004
3 | 4 | 3 | 1425 | 2/17/2016 | $ 1,495,000.00 | Potrero Hill | N/A | 2003
4 | 3 | 3 | 2231 | 2/17/2016 | $ 2,700,000.00 | Potrero Hill | 10 | 1927
5 | 3 | 3 | 1300 | 2/17/2016 | $ 1,530,000.00 | Bernal Heights | 4 | 1900
6 | 1 | 2 | 1250 | 2/17/2016 | $ 460,000.00 | Crocker Amazon | 5 | 1924
7 | 1 | 3 | ###1032 | 2/17/2016 | $ 532,000.00 | Oceanview | No | 1939
8 | 1 | 2 | 1200 | 2/17/2016 | $ 1,050,000.00 | Mission Terrace | 5 | 1924
9 | 3.5 | 4 | 2700 | 2/17/2016 | $ 3,500,000.00 | Noe Valley | 9 | 1912
10 | 2 | 3 | 2016 | 2/17/2016 | $ 1,500,000.00 | Hayes Valley | 7 | 1890
11 | 1 | 3 | ????1798 | 2/17/2016 | $ 848,000.00 | Portola | Yes | 1953
12 | 1 | 1 | 761 | 2/17/2016 | $ 1,000,000.00 | South of Market | 4 | 2008
13 | 1 | 1 | 780 | 8/12/2015 | $ 863,000.00 | Eureka Valley | 4 | 1981
14 | 5 | 5 | 5786 | 2/16/2016 | $ 4,888,000.00 | Lake | 12 | 1926
15 | 2 | 2 | 1688 | 2/16/2016 | $ 1,000,000.00 | Inner Sunset | 6 | 1927
16 | 3 | 4 | 1619 | 2/16/2016 | $ 210,000.00 | Sunnyside | 7 | 1966
17 | 1 | 0 | ***398 | 2/12/2016 | $ 525,000.00 | Van Ness – Civic Center | 4 | 2008
18 | 4.5 | 4 | 2615 | 2/12/2016 | $ 2,300,000.00 | Mission | 9 | 1906
19 | 2 | 2 | 1252 | 2/12/2016 | $ 1,450,000.00 | Nob Hill | 4 | 2002
20 | 2 | 3 | 1444 | 2/12/2016 | $ 2,500,000.00 | Nob Hill | 6 | 2009
21 | 2 | 3 | 1441 | 8/6/2013 | $ 630,000.00 | Oceanview | 5 | 1955
a) Identify any issues in the above data description and data values, and explain how you
would correct each of them. [5 marks]
Now assume that this table has already been read into a Pandas DataFrame called houses,
and that all the data issues you have identified have been fixed, so that all the columns are
numeric, except for neighborhood, which is a string column, and lastsolddate, which is a
Python date column.
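For part a), the sketch below shows one way to fix the issues that are visible in the sample rows: garbled column names, stray symbols in the size column, prices stored as "$ 1,300,000.00" text, non-numeric room counts (N/A, Yes, No), and dates stored as text. The cleaned column names (`bathrooms`, `bedrooms`, `finishedsqmeter`, `yearbuilt`) are hypothetical choices, not names given in the question.

```python
import pandas as pd

# A few raw rows copied from the table above (rows 1, 3, 7 and 17),
# kept as strings as if read from a CSV file.
raw = pd.DataFrame(
    {
        "bathr24rooms": ["2", "4", "1", "1"],
        "bed $$ rooms": ["2", "3", "3", "0"],
        "finished 44 sqmeter": ["1043", "1425", "###1032", "***398"],
        "lastsolddate": ["2/17/2016", "2/17/2016", "2/17/2016", "2/12/2016"],
        "lastsoldprice": ["$ 1,300,000.00", "$ 1,495,000.00", "$ 532,000.00", "$ 525,000.00"],
        "neighborhood": ["South of Market", "Potrero Hill", "Oceanview", "Van Ness – Civic Center"],
        "totalrooms": ["4", "N/A", "No", "4"],
        "year && built": ["2007", "2003", "1939", "2008"],
    }
)

# Give the garbled columns clean (hypothetical) names.
houses = raw.rename(
    columns={
        "bathr24rooms": "bathrooms",
        "bed $$ rooms": "bedrooms",
        "finished 44 sqmeter": "finishedsqmeter",
        "year && built": "yearbuilt",
    }
)

# Strip stray symbols such as ###, ???? and *** before converting sizes to numbers.
houses["finishedsqmeter"] = pd.to_numeric(
    houses["finishedsqmeter"].str.replace(r"[^0-9.]", "", regex=True)
)
# Remove the dollar sign, spaces and thousands separators from prices.
houses["lastsoldprice"] = pd.to_numeric(
    houses["lastsoldprice"].str.replace(r"[$,\s]", "", regex=True)
)
# Non-numeric room counts (N/A, Yes, No) become NaN, to be imputed or dropped later.
houses["totalrooms"] = pd.to_numeric(houses["totalrooms"], errors="coerce")
for col in ["bathrooms", "bedrooms", "yearbuilt"]:
    houses[col] = pd.to_numeric(houses[col])
# Parse US-style month/day/year dates.
houses["lastsolddate"] = pd.to_datetime(houses["lastsolddate"], format="%m/%d/%Y")

print(houses.dtypes)
```

`pd.to_datetime` produces pandas `datetime64` values. If you need plain Python `date` objects, use `houses["lastsolddate"].dt.date`. The remaining NaN room counts still need a decision: you could fill them (for example with the median) or drop those rows.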
b) Write a Pandas expression that returns the average house price.
c) Write a Pandas expression that returns the median house price.
d) Write a Pandas expression that will return the average price of houses in the “Nob
Hill” neighbourhood.
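Parts b) to d) each reduce to a single Series method, plus a boolean filter for d). The sketch below uses a tiny stand-in for `houses` built from rows 18 to 21 of the table, with prices already converted to numbers.

```python
import pandas as pd

# Hypothetical cleaned sample: rows 18-21 of the table above.
houses = pd.DataFrame(
    {
        "neighborhood": ["Mission", "Nob Hill", "Nob Hill", "Oceanview"],
        "lastsoldprice": [2_300_000.0, 1_450_000.0, 2_500_000.0, 630_000.0],
    }
)

# b) average (mean) house price
avg_price = houses["lastsoldprice"].mean()
# c) median house price
median_price = houses["lastsoldprice"].median()
# d) average price over the "Nob Hill" rows only
nob_hill_avg = houses.loc[houses["neighborhood"] == "Nob Hill", "lastsoldprice"].mean()

print(avg_price, median_price, nob_hill_avg)  # 1720000.0 1875000.0 1975000.0
```

For d), `houses.groupby("neighborhood")["lastsoldprice"].mean()["Nob Hill"]` gives the same value and also computes the averages for every other neighbourhood at once.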
e) We want to investigate the relationship between the size of houses and their prices.
Draw a scatter graph that plots the house price (“lastsoldprice”) as the Y axis,
against the house size (“finished 44 sqmeter”) as the X axis. Show the scale of
both axes clearly, and draw just the first SIX houses into your graph.
f) Write some Python code that will draw this scatter graph for ALL the houses.
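For f), one option is matplotlib's `Axes.scatter`. The function below plots every row it is given, so passing the full `houses` DataFrame draws all the houses. The demo passes just the first six rows, which matches part e). The column name `finishedsqmeter` is a hypothetical cleaned name for "finished 44 sqmeter".

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

SIZE_COL = "finishedsqmeter"  # hypothetical cleaned name for "finished 44 sqmeter"


def plot_price_vs_size(houses: pd.DataFrame):
    """Scatter lastsoldprice (Y axis) against house size (X axis) for every row."""
    fig, ax = plt.subplots()
    ax.scatter(houses[SIZE_COL], houses["lastsoldprice"])
    ax.set_xlabel("Finished size (sqmeter)")
    ax.set_ylabel("Last sold price ($)")
    ax.set_title("House price vs size")
    return fig, ax


# First six houses from the table, after cleaning.
houses = pd.DataFrame(
    {
        SIZE_COL: [1043, 903, 1425, 2231, 1300, 1250],
        "lastsoldprice": [1_300_000, 750_000, 1_495_000, 2_700_000, 1_530_000, 460_000],
    }
)
fig, ax = plot_price_vs_size(houses)
# In an interactive session, plt.show() displays the chart.
```

pandas has a shorter alternative: `houses.plot.scatter(x=SIZE_COL, y="lastsoldprice")`. It draws the same chart through matplotlib.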
g) Add a new column called “price/room” to the houses table that is the ‘price per
room’. For each house, this is the price of the house (“lastsoldprice”) divided by the
number of rooms in the house (“totalrooms”).
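Dividing one column by another works element by element, so the new column takes a single assignment. The sketch below uses rows 1, 2 and 4 of the table as a stand-in for `houses`.

```python
import pandas as pd

# Hypothetical cleaned sample: rows 1, 2 and 4 of the table above.
houses = pd.DataFrame(
    {"lastsoldprice": [1_300_000.0, 750_000.0, 2_700_000.0], "totalrooms": [4, 3, 10]}
)

# Each house's price divided by its own room count.
houses["price/room"] = houses["lastsoldprice"] / houses["totalrooms"]
print(houses)
```

A `totalrooms` value of 0 gives `inf`, and a missing value gives `NaN`. This is another reason to clean that column first.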
Question 3: Machine Learning Concepts [20 marks]
There are many different types of machine learning algorithms.
a) Define the difference between regression and classification machine learning.
b) Give two examples of machine learning algorithms that are suitable for
regression problems.
c) Give two examples of machine learning algorithms that are suitable for
classification problems.
d) Give a business example where a regression learning approach would be
appropriate.
e) Give a business example where a classification learning approach would be
appropriate.
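The distinction in a) shows up directly in what a model returns. A regression model predicts a continuous number, while a classifier predicts one label from a fixed set. The toy sketch below uses sklearn's `LinearRegression` as a regression example and `DecisionTreeClassifier` as a classification example. The data values are illustrative only.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# A single numeric input feature (illustrative values).
X = [[1], [2], [3], [4]]

# Regression: the target is a continuous quantity, such as a price.
prices = [200.0, 400.0, 600.0, 800.0]
reg = LinearRegression().fit(X, prices)
print(reg.predict([[5]]))  # a number, about 1000

# Classification: the target is one of a fixed set of labels, such as churn yes/no.
churn = ["no", "no", "yes", "yes"]
clf = DecisionTreeClassifier(random_state=0).fit(X, churn)
print(clf.predict([[5]]))  # a label: ['yes']
```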
Question 4: Machine Learning Process [20 marks]
This question will use the same house-price dataset as Question 2, cleaned and loaded into
a Pandas DataFrame called houses.
Your manager asks you to use machine learning (and Python) to build a model that will
predict the expected sale price for a house with a given number of bathrooms, total size,
neighbourhood, and construction year, etc. That is, use all the available data to build a
model that predicts house sale prices.
a) Explain the typical process you would follow to use Python to build a model to
predict the house prices. Give a number and a title for each of the steps that you
would take, and briefly explain each step.
b) Sketch out some example Python code that you would use to implement the above
steps using the LinearRegression learning algorithm (from the sklearn.linear_model
library). Use Python comments to break your code into the steps you discussed
above, with the number and title of each step.
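The sketch below shows one common seven-step breakdown (load, clean, select and encode features, split, train, evaluate, predict) using `LinearRegression` as the question specifies. The step titles are one reasonable choice, not a required scheme. The other helpers (`pd.get_dummies`, `train_test_split`, `r2_score`, `mean_absolute_error`) are my own choices. The data is synthetic, with hypothetical cleaned column names, so that the sketch runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Step 1: Load the data.
# Synthetic stand-in for the real `houses` DataFrame (hypothetical column names).
rng = np.random.default_rng(0)
n = 200
houses = pd.DataFrame(
    {
        "bathrooms": rng.integers(1, 5, n),
        "finishedsqmeter": rng.integers(50, 300, n),
        "neighborhood": rng.choice(["Mission", "Nob Hill", "Oceanview"], n),
        "yearbuilt": rng.integers(1900, 2015, n),
    }
)
premium = houses["neighborhood"].map({"Mission": 150_000, "Nob Hill": 300_000, "Oceanview": 0})
houses["lastsoldprice"] = (
    5_000 * houses["finishedsqmeter"] + 50_000 * houses["bathrooms"] + premium
    + rng.normal(0, 20_000, n)
)

# Step 2: Clean the data (here, simply drop any rows with missing values).
houses = houses.dropna()

# Step 3: Select features and target; one-hot encode the text column.
FEATURES = ["bathrooms", "finishedsqmeter", "neighborhood", "yearbuilt"]


def make_features(df: pd.DataFrame) -> pd.DataFrame:
    return pd.get_dummies(df[FEATURES], columns=["neighborhood"], dtype=float)


X = make_features(houses)
y = houses["lastsoldprice"]

# Step 4: Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Train the model.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 6: Evaluate on the held-out test set.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R^2 = {r2:.3f}, MAE = ${mae:,.0f}")

# Step 7: Predict the price of a new house.
new_house = pd.DataFrame(
    [{"bathrooms": 2, "finishedsqmeter": 100, "neighborhood": "Nob Hill", "yearbuilt": 2000}]
)
# Align the new row's dummy columns with the columns used in training.
new_X = make_features(new_house).reindex(columns=X.columns, fill_value=0.0)
predicted_price = model.predict(new_X)[0]
print(f"Predicted price: ${predicted_price:,.0f}")
```

The `reindex` in step 7 matters. A single new row only produces a dummy column for its own neighbourhood, so the missing neighbourhood columns have to be added back as zeros before predicting.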
Question 5: Evaluation of Models [20 marks]
Socks4You is an online company that sells socks by subscription ($10/month, or
$120/year). Each month, they send one pair of high-quality designer socks to each
customer. They want to analyse their customer base, and their ‘churn rates’ (customers
who decide to stop subscribing to their service).
The following confusion matrix shows the results of applying a Decision Tree machine
learning algorithm to 1000 historical examples of customer churn from the previous
year. The columns show whether the customer actually left (‘Churn=Yes’) or stayed
(‘Churn=No’). The rows show the prediction output from the learned Decision Tree
model.
                    | Churn=Yes | Churn=No
Model predicted Yes | 500       | 200
Model predicted No  | 100       | 200
Calculate the values of the following evaluation metrics for this model (since you do not
have a calculator, you may write them as fractions) [2 marks each]:
a) number of true positives?
b) number of false positives?
c) accuracy?
d) precision?
e) recall?
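The metrics follow from reading the four cells of the matrix, with rows as predictions and columns as actual outcomes. Python's `fractions.Fraction` keeps the answers as exact fractions, as the question allows.

```python
from fractions import Fraction

# Cells of the confusion matrix above: rows = model prediction, columns = actual outcome.
tp = 500  # predicted Yes, actually churned
fp = 200  # predicted Yes, actually stayed
fn = 100  # predicted No, actually churned
tn = 200  # predicted No, actually stayed

accuracy = Fraction(tp + tn, tp + fp + fn + tn)  # 700/1000
precision = Fraction(tp, tp + fp)                # 500/700
recall = Fraction(tp, tp + fn)                   # 500/600

print(tp, fp, accuracy, precision, recall)  # 500 200 7/10 5/7 5/6
```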
Socks4You is considering using this model as the basis of a new marketing campaign to
better retain their existing customers. They will send special offers to all the customers for whom
the model predicts Yes (that is, the customers who are in danger of ‘churning’ away from
Socks4You). The cost of these discounts will average $10 per customer, but they are
expected to halve the churn rate, which will save on average half of the annual
subscription of each customer who is persuaded not to churn. The following cost-benefit matrix summarises the annual costs and expected benefits of this campaign for
each group of customers.
                    | Churn=Yes            | Churn=No
Model predicted Yes | ¾ * $120 – $10 = $80 | $120 – $10 = $110
Model predicted No  | $120 / 2 = $60       | $120
f) Calculate the expected income after this marketing campaign. Show your
working. [4 marks]
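For f), the expected income is the sum over the four cells of (number of customers) × (income per customer). The counts come from the confusion matrix and the per-customer values from the campaign cost-benefit matrix. A minimal sketch of that working:

```python
# Customer counts from the confusion matrix (rows = prediction, columns = actual).
counts = {
    ("pred_yes", "churn_yes"): 500,
    ("pred_yes", "churn_no"): 200,
    ("pred_no", "churn_yes"): 100,
    ("pred_no", "churn_no"): 200,
}

# Annual income per customer WITH the campaign, from the cost-benefit matrix.
with_campaign = {
    ("pred_yes", "churn_yes"): 0.75 * 120 - 10,  # $80
    ("pred_yes", "churn_no"): 120 - 10,          # $110
    ("pred_no", "churn_yes"): 120 / 2,           # $60
    ("pred_no", "churn_no"): 120,                # $120
}

# 500*80 + 200*110 + 100*60 + 200*120
income_with = sum(counts[cell] * with_campaign[cell] for cell in counts)
print(income_with)  # 92000.0
```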
The cost-benefit matrix for the next year without the marketing campaign is:
                    | Churn=Yes      | Churn=No
Model predicted Yes | $120 / 2 = $60 | $120
Model predicted No  | $120 / 2 = $60 | $120
g) Calculate the expected annual income WITHOUT the marketing campaign. [4
marks]
h) Would you recommend that Socks4You goes ahead with the marketing
campaign? Explain your reasoning. [2 marks]
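For g), the prediction no longer matters without the campaign. The 600 customers who actually churn (500 + 100) are worth $60 each, and the 400 who stay (200 + 200) are worth $120 each. The sketch below computes both totals so they can be compared for h). On these figures, the campaign raises expected annual income by $8,000, which supports going ahead, provided the cost and churn-reduction estimates hold.

```python
# Actual outcomes from the confusion matrix.
churned = 500 + 100  # customers who really left
stayed = 200 + 200   # customers who really stayed

# g) Without the campaign: churners pay $120 / 2, stayers pay $120.
income_without = churned * (120 / 2) + stayed * 120  # 36000 + 48000

# f) With the campaign, using the campaign cost-benefit matrix.
income_with = 500 * (0.75 * 120 - 10) + 200 * (120 - 10) + 100 * (120 / 2) + 200 * 120

print(income_without, income_with, income_with - income_without)  # 84000.0 92000.0 8000.0
```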

Sample Solution

CIN, or cervical intraepithelial neoplasia, is a precancerous condition of abnormal cell growth on the cervix. Intraepithelial means that the atypical cells are developing on the surface, or epithelial tissue, of the cervix. Neoplasia refers to the growth of new cells. Signs and symptoms may be obvious but can also resemble several other conditions that women may experience. These signs can include abnormal vaginal bleeding, bleeding after sexual intercourse, pelvic pain, discharge, and pain during sexual intercourse (Stöppler). It is recommended that women start getting Pap smears at the age of 21. This is most critical for women who are HIV positive or have a weakened immune system (Weber, 2017). These screenings should continue from ages 21 to 29 with cytology alone every three years. From ages 30 to 65, women should continue cytology screening every three years and add HPV testing. After 65, no screening is necessary as long as past screenings were normal and no high risk is present (Boardman, 2018).

Over the years, professionals have found it hard to agree on how to report results. Stages of abnormal results have included atypia, mild, moderate, and severe dysplasia, and carcinoma in situ. The creation of the Bethesda System has given all health care professionals a single reporting system. In 1988 the National Cancer Institute held a conference to create this system, and it was re-evaluated in 2001. There are four primary classifications that make this common system easier to use: “ASC-US: This abbreviation stands for atypical squamous cells of undetermined significance. LSIL: This abbreviation stands for low-grade squamous intraepithelial lesion. Under the old system of classification, this category was known as CIN grade I. HSIL: This abbreviation stands for high-grade squamous intraepithelial lesion. Under the old system of classification, this category was referred to as CIN grade II, CIN grade III, or CIS. ASC-H: This means atypical cells are present and HSIL can’t be excluded.” (Stöppler)

CIN cases are most often the result of infection with oncogenic types of HPV, or human papillomavirus. There are 12 known types of high-risk HPV, which are the types most commonly associated with cervical cancer. Cervical cancer results from a genital infection with HPV, a known human carcinogen. Because most HPV infections are transient, passing in and out of a patient's life, they cause only temporary changes in cervical cells (National Cancer Institute, 2014). About 90% of HPV infections clear on their own within months to years without sequelae (Boardman, 2018). This makes it difficult to catch the HPV infection and, in turn, cervical cancer. Screening too often can be problematic for several reasons. One is that treating abnormalities assumed to be caused by HPV that would have cleared anyway causes the patient needless stress. Additionally, repeatedly putting stress on the cervix over a period of time can weaken the tissue and, in the long run, affect the woman’s fertility. Interestingly, it can take up to 20 years for a chronic infection with a high-risk HPV type to become cancerous (National Cancer Institute, 2014). Low-risk HPV infections rarely, if ever, cause cervical cancer (Boardman, 2018). However, if lesions are found and not treated, they are more than likely to turn into cervical cancer (National Cancer Institute, 2014).
