INFV320 Final Project

July 6th, 2020

6/24/2020 INFV320FinalProject
https://d2l.arizona.edu/d2l/le/content/877109/viewContent/8514790/View 1/4
Learning Objective
This assignment is the designated final exam. You need to use what you have learned to approach the problem.
This assignment is adapted from a capstone project composed by Professor Odile Wolf.
Problem Overview
You will be creating a Python program saved in a file named “extractFrame.py”. The Python program will read
text from a file named “wireShark.txt” and display the frames with essential frame data. For each frame, you
need to extract and present its frame number, the source and destination addresses, as well as the frame type.
The frame number always appears after “Frame” at the beginning of each frame in the document. Each source or destination address consists of 12 hexadecimal digits, written as six pairs separated by colons. For example:
00:14:ee:08:dd:b1
01:00:5e:7f:ff:fa
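A regular expression can check that a string has this shape: six pairs of hexadecimal digits separated by colons. The sketch below uses Python's standard `re` module. The helper names `is_mac_address` and `hex_digits` are hypothetical and are not part of the assignment.

```python
import re

# Six pairs of hex digits separated by colons, e.g. 00:14:ee:08:dd:b1 (hypothetical helper).
MAC_PATTERN = re.compile(r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}")


def is_mac_address(text):
    """Return True if text is exactly one colon-separated MAC address."""
    return MAC_PATTERN.fullmatch(text) is not None


def hex_digits(mac):
    """Return the 12 hexadecimal digits of a MAC address with the colons removed."""
    return mac.replace(":", "")


print(is_mac_address("00:14:ee:08:dd:b1"))   # True
print(is_mac_address("192.168.1.180"))       # False
print(len(hex_digits("01:00:5e:7f:ff:fa")))  # 12
```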
The frame type is a hexadecimal value that indicates the upper-level protocol carried in the data field.
A common value is 0x0800 (also written 0x800), which indicates IPv4.
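The type is a hexadecimal number, so 0x800 and 0x0800 denote the same value; the leading zero is only padding. The expected output later in this handout keeps the four-digit form exactly as it appears in the file. Below is a minimal sketch of the numeric conversion. The helper name `ether_type_value` and the constant `IPV4_ETHER_TYPE` are hypothetical.

```python
def ether_type_value(text):
    """Convert an EtherType string such as '0x0800' to an integer."""
    # int() with base 16 accepts an optional '0x' prefix.
    return int(text, 16)


IPV4_ETHER_TYPE = 0x0800  # 2048 in decimal

print(ether_type_value("0x0800") == ether_type_value("0x800"))  # True
print(ether_type_value("0x0800") == IPV4_ETHER_TYPE)            # True
print(ether_type_value("0x0800"))                               # 2048
```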
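For example, the first frame in the given “wireShark.txt” file is presented below. The fields you need to extract from each frame are the frame number on the “Frame” line, the source and destination addresses in parentheses on the “Ethernet II” line, and the hexadecimal value on the “Type:” line.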
Frame 1: 372 bytes on wire (2976 bits), 372 bytes captured (2976 bits) on interface 0
Ethernet II, Src: WesternD_08:dd:b1 (00:14:ee:08:dd:b1), Dst: IPv4mcast_7f:ff:fa (01:00:5e:7f:ff:fa)
Destination: IPv4mcast_7f:ff:fa (01:00:5e:7f:ff:fa)
Address: IPv4mcast_7f:ff:fa (01:00:5e:7f:ff:fa)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
Source: WesternD_08:dd:b1 (00:14:ee:08:dd:b1)
Address: WesternD_08:dd:b1 (00:14:ee:08:dd:b1)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.1.180, Dst: 239.255.255.250
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
Total Length: 358
Identification: 0xfe2a (65066)
Flags: 0x4000, Don’t fragment
Time to live: 4
Protocol: UDP (17)
Header checksum: 0xc505 [validation disabled]
[Header checksum status: Unverified]
Source: 192.168.1.180
Destination: 239.255.255.250
User Datagram Protocol, Src Port: 35064, Dst Port: 1900
Simple Service Discovery Protocol
No. Time Source Destination Protocol Length Info
2 0.307821 192.168.1.180 239.255.255.250 SSDP 422 NOTIFY * HTTP/1.1
After analyzing the frames in the text file, your program should display all frames with the essential data. Below
is the result that should be displayed by running your program based on the given text file:
Frame 1, Src:00:14:ee:08:dd:b1, Des:01:00:5e:7f:ff:fa, Type:0x0800
Frame 2, Src:00:14:ee:08:dd:b1, Des:01:00:5e:7f:ff:fa, Type:0x0800
Frame 3, Src:cc:2f:71:3e:ca:a1, Des:14:91:82:36:7a:8d, Type:0x0800
Frame 4, Src:cc:2f:71:3e:ca:a1, Des:14:91:82:36:7a:8d, Type:0x0800
Frame 5, Src:cc:2f:71:3e:ca:a1, Des:14:91:82:36:7a:8d, Type:0x0800
Frame 6, Src:14:91:82:36:7a:8d, Des:cc:2f:71:3e:ca:a1, Type:0x0800
Frame 7, Src:14:91:82:36:7a:8d, Des:cc:2f:71:3e:ca:a1, Type:0x0800
Note that if the input file “wireShark.txt” contains more frames, your Python program should extract all of them and present them in the same format. (That is, you cannot hard-code the output based on the given file! When I test your code on my side, I might use a different input file that contains more frames than the one provided to you.)
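One possible shape for extractFrame.py is sketched below. It is not the instructor's reference solution, and it rests on three assumptions about wireShark.txt:

- each frame starts with a line beginning `Frame N:`;
- the `Ethernet II, Src: ... Dst: ...` summary sits on a single line;
- the first `Type: ... (0x....)` line after the frame header holds the frame type.

The helper names (`extract_frames`, `format_frame`, `main`) are hypothetical. The sketch also uses regular expressions instead of the find/split/slice approach suggested in the hints; either approach works.

```python
import re

# Assumed layout of the Wireshark text export (see the sample frame above).
FRAME_RE = re.compile(r"^\s*Frame (\d+)\s*:")
MAC = r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}"
ETH_RE = re.compile(
    r"^\s*Ethernet II, Src:.*?\(\s*(" + MAC + r")\s*\), Dst:.*?\(\s*(" + MAC + r")\s*\)"
)
# Requires a hex value in parentheses, so lines like "Type: 8 (Echo (ping) request)" are skipped.
TYPE_RE = re.compile(r"^\s*Type:.*\(\s*(0x[0-9a-fA-F]+)\s*\)")


def extract_frames(lines):
    """Return a list of (number, src, dst, type) tuples, one per frame, in file order."""
    frames = []
    current = None
    for line in lines:
        m = FRAME_RE.match(line)
        if m:
            # A new frame begins; later matches belong to it.
            current = {"num": m.group(1), "src": None, "dst": None, "type": None}
            frames.append(current)
            continue
        if current is None:
            continue  # text before the first frame
        m = ETH_RE.match(line)
        if m and current["src"] is None:
            current["src"], current["dst"] = m.group(1), m.group(2)
            continue
        m = TYPE_RE.match(line)
        if m and current["type"] is None:
            current["type"] = m.group(1)  # keep only the first (Ethernet) type
    return [(f["num"], f["src"], f["dst"], f["type"]) for f in frames]


def format_frame(num, src, dst, ftype):
    """Format one frame in the handout's expected output style."""
    return f"Frame {num}, Src:{src}, Des:{dst}, Type:{ftype}"


def main(path="wireShark.txt"):
    try:
        with open(path) as fh:
            frames = extract_frames(fh)
    except FileNotFoundError:
        print(f"Cannot find {path}")
        return
    for frame in frames:
        print(format_frame(*frame))


if __name__ == "__main__":
    main()
```

Because nothing is tied to a fixed frame count, the same loop handles an input file with more frames than the sample.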
Hints and Notes
To approach the problem, you may need to use certain methods and functions to analyze and manipulate the
lines in the given text file. For example, you may need to find whether a substring, such as “Frame” or “Type:”, appears
on a line. You may also need to split a line into multiple parts, or slice a string to get a substring of it.
Below are some Python code examples that give you clues about which methods/functions you could use
and how to use them.
#How to check if a substring is on a line? Hint: use the find method
seedStr = "Internet Protocol Version 4"
a_str = "Internet Protocol Version 4, Src: 192.168.1.180, Dst: 239.255.255.250"
x = a_str.find(seedStr)
print(x)
#should print 0
x_str = "(I am Li Xu)"
x = x_str.find(seedStr)
print(x)
#should print -1
#How to extract a substring from a line? Hint: slice the string!
x = x_str[1:len(x_str)-1]
print(x)
#should print I am Li Xu
#How to split a line into several parts? Hint: use the split method!
a_str = "Internet Protocol Version 4, Src: 192.168.1.180, Dst: 239.255.255.250"
a_parts = a_str.split(", ")
print(a_parts)
#should print ['Internet Protocol Version 4', 'Src: 192.168.1.180', 'Dst: 239.255.255.250']
x = a_parts[0]
words = x.split()  #default split token is whitespace
print(words)
#should print ['Internet', 'Protocol', 'Version', '4']
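The same three hints (find, slice, split) are enough to pull the required fields out of the sample frame's lines. The sketch below applies them to the frame, Ethernet II, and Type lines shown earlier. The helper `next_parenthesized` is a hypothetical name chosen for illustration.

```python
def next_parenthesized(text, start=0):
    """Return the text inside the next '(...)' at or after start, plus the index just past ')'."""
    open_i = text.find("(", start)
    close_i = text.find(")", open_i + 1)
    if open_i == -1 or close_i == -1:
        raise ValueError("no parenthesized value found")
    # Slice out what lies between the parentheses and trim stray spaces.
    return text[open_i + 1:close_i].strip(), close_i + 1


frame_line = "Frame 1: 372 bytes on wire (2976 bits), 372 bytes captured (2976 bits) on interface 0"
eth_line = "Ethernet II, Src: WesternD_08:dd:b1 (00:14:ee:08:dd:b1), Dst: IPv4mcast_7f:ff:fa (01:00:5e:7f:ff:fa)"
type_line = "Type: IPv4 (0x0800)"

# split() on whitespace, then drop the trailing colon: "1:" -> "1"
frame_num = frame_line.split()[1].rstrip(":")
# First parentheses hold the source address, the next ones the destination.
src, pos = next_parenthesized(eth_line)
dst, _ = next_parenthesized(eth_line, pos)
frame_type, _ = next_parenthesized(type_line)

print("Frame " + frame_num + ", Src:" + src + ", Des:" + dst + ", Type:" + frame_type)
# should print Frame 1, Src:00:14:ee:08:dd:b1, Des:01:00:5e:7f:ff:fa, Type:0x0800
```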
Turn-in
You need to turn in your Python file “extractFrame.py” to the Assignments folder for the final project.

Micro fabrication is generally the fabrication

July 6th, 2020

Summary
• Microfabrication is the fabrication of miniature structures at the micrometer scale and smaller. It is a general term covering various engineering processes, such as semiconductor processing.
• The microfabrication process consists of repeated steps: depositing a film, patterning the film into the desired features, and etching (removing) the unwanted parts of the film.
• Dry etching is divided into three classes: sputter etching, reactive ion etching, and vapor phase etching.
• Microforming is another process for fabricating microelectromechanical systems (MEMS) parts or structures with at least two dimensions in the submillimeter range. It involves micro cutting, micro stamping, and micro extrusion, which have led to the development of industrial- and experimental-grade manufacturing tools.
• Microfabricated devices are usually formed on or in a thicker supporting substrate; for electronic applications, this is typically a semiconducting substrate such as a silicon wafer.
• Microfabrication is thus widely used to produce the modern devices that make the human environment more comfortable.


Differences between Permanent and Temporary Staff Essay

July 6th, 2020

Differences between Permanent and Temporary Staff

INTRODUCTION
Research question
What are the differences between temporary and permanent staff in terms of job satisfaction and how their health is catered for? Which type of employee is best to hire?
Objectives
This proposal aims to undertake extensive research, using both theory and fieldwork, into which type of staff is best to employ (permanent or temporary) and which type is more satisfied with their job and health welfare at the company.
Background information
Permanent staff are employees hired by a company for an indefinite period of time, while temporary employees are hired for a limited period, depending on the organization employing them (Collins & Mirza, 1999). Permanent employees are entitled to benefits such as medical cover, pensions, and holidays. Temporary employees, who may work full-time or part-time depending on their situation, receive such benefits only in some instances. Most permanent employees are paid a monthly salary, whereas temporary employees may be paid wages hourly, daily, or at the end of a specified period.
In addition, permanent employees are protected from unexpected termination by clearly defined policies, including advance notice of layoffs and formal disciplinary procedures. Temporary employees, on the other hand, are not protected and can be dismissed whenever the employer is dissatisfied with them. Permanent employees may also join unions, through which they enjoy financial and social benefits during their employment. Permanent employees are often regarded as the most committed because they seek career development within the industry, while temporary employees become more committed when there is a chance of contract renewal, which acts as a motivating factor (Sobaih et al., 2012). Permanent employees are most common in the public sector, while temporary employees dominate the private sector. Temporary employees are also entitled to overtime pay, whereas permanent employees rarely receive it because they are paid a monthly salary. In recent years, most German logistics companies have favored temporary employees.
Health provision differs as well. In many companies, permanent employees enjoy medical cover, so they are assured of their safety at the workplace. This cover is bought from insurance companies, and an employee who falls sick is taken to hospital at the company's expense. Few companies offer medical cover to temporary staff, so these employees are not satisfied with their health provision. When it comes to medical satisfaction, permanent employees are therefore the most satisfied, also because they are protected from job termination, unlike their temporary counterparts, who can be dismissed at any time.
Scope
i. Data sources
In identifying the topic, I selected research papers and research guidelines, ranging from online articles on the topic to journals and related books. All of these studies consider the differences between temporary and permanent employees and how the health and job satisfaction of these two types of staff are addressed. I will analyze the differences between the two types of staff to identify those that are vital to the research.
ii. Time to be invested in research
The research is estimated to take approximately two weeks, because it requires visiting a company to conduct interviews, issue questionnaires, and observe the environments in which these two types of staff work. This time will also be used to compare the data acquired so as to arrive at accurate findings.
iii. When field works will take place
The fieldwork on the differences between temporary and permanent employees is scheduled to start as soon as my proposal is accepted and I have acquired all the resources required for the research.
iv. Number of participants
I have targeted more than fifty participants, ranging from interviewees to research assistants. Participants must be free from bias to avoid introducing false information.
v. Number of interviews to be conducted
I will interview equal samples of permanent and temporary staff to gain a clear understanding of their differences. I will also interview employers to learn which type of employee they prefer.
Theoretical framework
Job satisfaction can be defined as how contented one is with one's job. It can depend on how management treats employees and also on individual perception (Gruenberg, 1979). There are two types of job satisfaction: affective job satisfaction, which mainly involves an individual's emotional feelings about the job, and cognitive job satisfaction, which concerns the level of contentment with chosen aspects of the job, such as pay and benefits (Spector, 1997). The level of job satisfaction varies depending on whether one is a permanent or a temporary employee. Employees' health also affects how satisfied they are with their jobs, since working is difficult without good health; any study of job satisfaction must therefore also consider employee health. Permanent employees tend to have higher levels of job satisfaction than temporary employees, as they enjoy job security.
This framework makes it possible to identify differences in job satisfaction between permanent and temporary staff. Before answering my research question of whether permanent or temporary employees enjoy greater job satisfaction, I will first conduct interviews and issue questionnaires to gather the best responses. Second, I will analyze all the data gathered in the field and compare it with the theoretical perspective to arrive at the best answer to my research question. Because job satisfaction is personal, methods such as questionnaires and interviews suit this research, as each employee will provide personal data.
Limitations
In the process of conducting the research, I expect to face some challenges. These include:
I. Uncooperative employees who may not be willing to offer the correct information.
II. Vagaries of nature, such as poor weather, may hinder movement during the research period.
III. Harsh supervisors who may not be willing to stop the operations for the interviews to be conducted.
IV. Insufficient funds to facilitate the process of research.
It will therefore be important to conduct a reconnaissance visit to familiarize myself with the field before the main fieldwork day. This will help me minimize unexpected bottlenecks on the day of field research.
Delimitations
The research will involve employers only to a limited extent, because they may withhold information about their organizations; it will deal mainly with the employees. The research will also use only closed questionnaires instead of open-ended ones, so that respondents give only the required data.

References
Collins, V. R., & Mirza, M. (1999). Working in tourism: The UK, Europe & beyond, for seasonal and permanent staff. Oxford: Vacation Work.
Gruenberg, M. M. (1979). Understanding job satisfaction. New York: Wiley.
Sobaih, A. E. E., Ritchie, C., Coleman, P., & Jones, E. (2012). Part-time restaurant employee perceptions of management practices: An empirical investigation. Emerald Group Publishing Limited.
Spector, P. E. (1997). Job satisfaction: Application, assessment, cause, and consequences. Thousand Oaks, CA: Sage Publications.

Homework, Your contract responsibilities include preparing a proposal for a data communications network

June 12th, 2020

Background
You have obtained a contract position at a large organisation. Your contract responsibilities include preparing a proposal for a data communications network to serve the organisation’s requirements.

Network Scope
The proposed network needs to serve the organisation’s Central Office and two of its remote offices. The Central Office, located in Brisbane, contains five departments to be served by this network. Each remote office contains four departments to be served. The North Office is located in North Lakes, about 30 km from the Central Office. The South Office is located in Helensvale, about 65 km from the Central Office. This network is mainly intended for administrative purposes.

Objectives of the Network
The network is to be designed to achieve several specific operational objectives:
1. Secure Service: The main objective of this network is to provide secure administrative computing service to the Central Office and the two remote offices. It is to be designed to be functionally and physically isolated from access by people not employed by the organisation, so as to minimize the risk of unauthorized use.
2. Integration and Update: Presently there are many LANs in the organisation, but much of the equipment is out of date, many of the LANs are incompatible with each other, and they are not connected in a system-wide network. Your proposal should describe a WAN that integrates and updates these LANs to support productive collaboration across the system.
3. Versatile Information Processing: The network should enable users to retrieve, process, and store ASCII and non-ASCII text, still graphics, audio, and video from any connected computer.
4. Collaboration: The network should combine the power and capabilities of diverse equipment across the three offices to provide a collaborative medium that helps users combine their skills regardless of their physical location. A well-designed network for this organisation will enable people to share information and ideas easily so they can work more efficiently and productively.
5. Scalability: The design should be scalable so that more remote offices can be added as funding becomes available without having to redo the installed network.

Intended Users
The primary users of the network at the Central Office are three administrators, three secretaries, ten members of the Technical Department, eight members of the Human Resource Department, six members of the Finance/Accounting Department, and three members of the Computer Services Department. At each remote office the primary users are four administrators, four secretaries, four members of the Computer Services Department, sixteen members of the Human Resource Department, and two members of the Finance/Accounting Department. Clients and the public are secondary users of the network in that they may receive information produced on the network, but they will not directly use the network.

Data Types
The types of data served by the network will be reports, bulletins, accounting information, personnel profiles, and web pages. The majority of the data will be text (ASCII and non-ASCII), but there will be some still graphics and possibly a small amount of voice and video (primarily for PC-based teleconferencing).

Data Sources
Data will be created and used at all end stations on the network. The data will be produced by software applications in Windows 10, primarily Dreamweaver and Office 2016 Professional (Word, Excel, Access, PowerPoint, and Outlook). Other data sources to be supported on at least a limited basis will be Windows 10 Accessories (Paint, Notepad, etc.), NetMeeting, Media Player, and Photoshop. Note that the network will not be accessible from outside.

Network Needs Analysis
Numbers of Users and Priority Levels
At the Central Office, the users will be administrators, secretaries, and members of four departments.
At the remote offices, the users will be administrators, secretaries, and members of three departments. The maximum estimated number of users on the network at any given time is 100, including 33 regular users in the Central Office, 30 regular users in the North Office, 30 regular users in the South Office, and seven otherwise unanticipated users.
Three priority levels are to be supported: management (top priority), user (medium priority), and background (low priority). Note that these designations do not correspond to administrative levels in the organisation; rather, they are network service levels. Network management processes will receive top-priority service; most network processes will receive medium-priority service; a few processes (e.g., e-mail transfers, backup, etc.) will be given low-priority service. It should be noted that network management will usually consume a small amount of the available bandwidth; this means that management and user processes will usually enjoy identical support. Background processes will also usually receive more than adequate service, but they will be delayed as needed to maintain support for management and user services.

Transmission Speed Requirements
The network is to be transparent to the users. Thus, remotely executed applications, file transfers, and so forth should ideally appear to operate as quickly as processes executed within an end station. Interviews with users to ascertain their needs and expectations indicate that an average throughput of 20 Mbps per user within each LAN and 10 Mbps per user between LANs will more than support the needed performance in most cases (teleconferencing being the possible exception).

Load Variation Estimates
Interviews with users and observation of LAN use at the three locations yielded data on hourly average and peak loads from January to March 2020. The data indicate that the highest average traffic volume will occur from 8:00 a.m. to 6:00 p.m., Monday through Friday. The peak network traffic volume is expected at two times during the day: 8:00 a.m. to 12:00 noon and 3:00 p.m. to 5:00 p.m. At night and on weekends the network traffic is minimal except for the daily backups of the PCs to the LAN servers in the remote offices and several batch data transfers anticipated from the remote offices to the Central Office. The data indicate the following network design parameters:
• The average required throughput on any LAN during work hours (7:00 a.m. to 6:00 p.m.) will be only about 0.2 Mbps.
• The average required throughput on the WAN during work hours (7:00 a.m. to 6:00 p.m.) will be only 0.04 Mbps.
• The peak expected traffic load on any LAN will be about 10.4 Mbps.
• The peak expected traffic load on the WAN will be about 6.4 Mbps.
Note: to avoid user complaints, the network should be designed for the peak traffic loads, not the average throughput.

Storage Requirements
Storage requirements need to be large enough to store all data. Interviews and observations of users’ present and anticipated storage requirements indicate that each user will need an average of 100 MB of server space (in addition to secondary storage on local PCs); the maximum estimated server-side storage requirement per user is about 1 GB. Additionally, the network operating system will occupy about 500 MB on each LAN server. Taking price-performance issues into account, each PC will have a minimum storage capacity of 10 GB, and each LAN server will have a minimum storage capacity of 20 GB. A main data server in the Central Office will have a 36 GB capacity.

Reliability Requirements
In keeping with user expectations and industry standards, both the LANs and the WAN are expected to operate at 99.9% uptime with an undiscovered error rate of 0.001%.

Security Requirements
Firewalls are to be used to restrict unauthorized users. Part of the security will be user accounts and passwords that give limited access.
There should be different access capabilities for network managers and users.

Design Assumptions
• Wireless internet access should be made available in all departments.
• In response to the organisation’s enquiries regarding their IPv4 needs, their ISP has advised that they can use this block of public IPv4 addresses: 186.132.192.192/27
• Each department at each location should maintain an individual network, while all the departments should be able to communicate with each other and with any necessary internal server(s), and access the Internet. Network management protocols must be deployed to make identification and management of network devices easy and convenient.

Assignment Requirements and Deliverables
You are to prepare a formal proposal report for the above organisation, which should include the following details (Activities 1-5) in the main body of the report. The proposal should cover the details asked in Activities 1-5 for the above case study and justify suitable technologies to enhance the organisation’s communications and networking within and outside the organisation. Be sure to include all necessary information and network diagrams.
After you have gathered all the appropriate information, it is time to do some research. You must now use your knowledge and research skills to propose an appropriate technical solution for their limited budget and time requirements. When developing a plan, it is often easier to start at the end user and then work back toward the network and any shared resources, and then finally, any external connections to the Internet or other networks.

Activity 1
Develop and demonstrate a complete network layout plan for the organisation. Because the organisation has limited funds available for this project, it is important that, where possible, only equipment designed for the small business and home markets be used.
Activity 2
After the network layout has been selected, it is time to look at the workflow and decide on any shared components and network technology to support this workflow. This can include such things as shared printers, scanners, and storage as well as any firewalls, routers, switches, access points and integrated service routers. When planning a network infrastructure, always plan into the future. For larger companies, because it is usually a substantial investment, the infrastructure should have a lifetime of about 10 years. For smaller companies and home users, the investment is significantly less and change occurs more frequently.
Using the Internet and locally available resources, research available high-speed colour copiers/printers/scanners, access points, switches and routers that are suitable for the organisation’s offices. In your report include a write-up for at least two devices of each equipment type and provide supporting details to justify your selection. The write-up should include a description of the device, including the manufacturer, the model, the seller, the cost, and a brief summary of the manufacturer specifications.

Activity 3
It is necessary to plan the Internet connectivity, including which services are provided by the ISP and which services must be provided in-house. Larger companies usually provide services in-house, while small businesses and individuals normally rely on an ISP to provide these services. Most ISPs offer a variety of services and service levels. In your report include the specifics of internal services (networking and security) that must be offered by the organisation, and the devices and protocols that provide these services. Provide supporting details to justify your answer.

Activity 4
After the equipment has been selected and the required services planned, the physical and logical installation is planned out. Physical installation includes the location of equipment and devices, along with how and when these devices are to be installed. In the business environment, it is important to minimize disruption of the normal work processes. Therefore, most installations, changes, and upgrades are done during hours when there is minimal business activity. Physical installation should also consider such things as adequate power outlets and ventilation, as well as the location of any necessary data drops. Planning for physical installation is out of the scope of this assignment.
Equally as important as planning the physical layout of the network and equipment is planning the logical layout. This includes such things as subnetting, addressing, naming, data flow, and security measures. Servers and some network devices are assigned static IP addresses to allow them to be easily identified on the network and to also provide a mechanism for controlling access to these devices. Most other devices can be assigned addresses using DHCP. Note that both a server and its backup server must be accessed via the same public IP address.
Devise subnetting and addressing schemes for the organisation. The schemes must provide servers and some other network devices (where necessary) with a static address and allow all other hosts to be configured via DHCP. Identify and list the details of all internal and external subnets and the ranges of usable IP addresses (and default gateways) for DHCP within each internal subnet, and also assign an appropriate IP address (and default gateway) to network devices that should use a static IP address.
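The subnetting arithmetic behind Table 1 can be checked with Python's standard `ipaddress` module. The sketch below computes the Table 1 columns for the ISP-assigned block 186.132.192.192/27. Purely as an illustration, it also does the same for one way of splitting that block into /29 subnets; this split is not a recommended addressing plan for the organisation. The helper `subnet_row` is hypothetical.

```python
import ipaddress


def subnet_row(net):
    """Return the Table 1 columns for an IPv4 network (assumes a prefix of /30 or shorter)."""
    first_usable = net.network_address + 1   # first address after the network address
    last_usable = net.broadcast_address - 1  # last address before the broadcast address
    return (
        str(net.network_address),
        str(net.netmask),
        str(first_usable),
        str(last_usable),
        str(net.broadcast_address),
    )


block = ipaddress.IPv4Network("186.132.192.192/27")
print(block.num_addresses)  # 32
print(subnet_row(block))
# ('186.132.192.192', '255.255.255.224', '186.132.192.193', '186.132.192.222', '186.132.192.223')

# Illustration only: one possible split of the /27 into four /29 subnets (6 usable hosts each).
for sub in block.subnets(new_prefix=29):
    print(sub, subnet_row(sub))
```

The same helper can be applied to each internal subnet once a scheme is chosen, which gives a quick cross-check of the workings requested for the appendix.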
Use the following tables to report the required setups for subnetting and addressing.

Table 1: Subnetting Table (to be used for all the organisation’s internal and external subnets)
Subnet | Network Address | Subnet Mask | First Usable IP Address | Last Usable IP Address | Broadcast Address
…

Table 2: Addressing Table (to be used for all devices/interfaces with static IP addresses)
Device | Interface* | IP Address | Subnet Mask | Default Gateway^
…
*Can be NIC, Serial, FastEthernet, etc.
^Use N/A for routers.
Note: Show all your workings for subnetting in an appendix.

Activity 5
The organisation is concerned that their files and resources may be vulnerable through the wired or wireless networks. Explain how the use of public and private IP addresses together may address this concern, and provide a security plan that allows only the organisation’s employees to connect to the network and gain access to company information and resources.

Prepare and Present the Proposal
All gathered information and the proposed technical solution must be assembled into a format that makes sense to the organisation that has asked you to provide a solution. The formal report usually contains many different sections, including:
• Title page
• Executive Summary
• Tables of Contents, Figures and Tables
• Introduction
• Project proposal, comprising needs statement, goals and objectives, methodology and timetable, technical solutions and evaluation, budget summary, and future plans
• Recommendations and Conclusions
• Bibliography and List of References
• Appended information
The report is often presented to various groups for approval. When presenting the report, present it in a confident, professional, and enthusiastic manner. The report must be technically accurate and free from spelling and grammatical errors. The report must be written in Microsoft Word. Multiple files will not be accepted.

Report Structure
The report must be formatted using the following guidelines:
• Paragraph text: Use 11-point Calibri with single line spacing
• Headings: Use Arial in an appropriate type size
• Margins: 2.0 cm on all margins
• Header: Report title
• Footer: Page numbering. Up to and including the Table of Contents, use roman numerals (i, ii, iii, iv); restart numbering using conventional numerals (1, 2, 3, 4) from the first page after the Table of Contents.
• The Title Page must not contain headers or footers.
• The report is to be structured as a formal business report.
• Refer to the following reference for details on report structures: Summers, J., Smith, B. (2014), Communications Skills Handbook, 4th Ed., Wiley, Australia

Referencing
The report is to include appropriate references, and these references should follow the Harvard method of referencing. Note that ALL references should be from journal articles, conference papers, technical papers, recognized experts in the field or vendors’ and service providers’ websites.

CS 170: Introduction to Computer Science I Homework Assignment

June 12th, 2020

CS 170: Introduction to Computer Science I
Homework Assignment #4
Submission instructions
If you submit with a teammate, please make sure that both of you submit separately in Canvas. No email submissions
are accepted. No late submissions are accepted.
General instructions and hints
In those problems asking you to write a method, always call the method several times to test that it works properly
with a variety of different values for the parameters. Test it on more examples than the ones shown in this handout.
Even if the problem asks you for just one method, you can always write additional helper methods to simplify and
organize your code. Make sure you write comments to explain your code.
Comment requirements: Always comment the top of each method (what the method does, the meaning of the
input parameters, the meaning of the output value). Write comments within the methods to explain the
strategy you are using to solve the problem, and to clarify blocks of code that may be difficult to understand.
Problem 1: Valid password (7 points)
Write a method named isValidPassword that takes a string as input parameter, and returns true if that string represents
a valid password, or false otherwise. A password is considered valid if (and only if) its length is between 6 and 8
characters (inclusive), and all of the requirements below are satisfied. You must use regular expressions to check if
the following conditions are met:
• The password starts either with an upper case letter or with one of the following special characters: ! @ #
• The password’s first character is followed by at least 5, and at most 7, word characters (i.e., letters, digits,
or underscore)
• The password’s last character is not equal to any of the following special characters: * . %
• The password does not contain any whitespace characters
Examples:
isValidPassword(“Tr7s6d_”) returns true
isValidPassword(“@abc2-bc”) returns false
isValidPassword(“ALphaa%”) returns false
Rubric:
programs that do not compile get zero points
+7 correct implementation (up to 3 points for a partially correct solution. Zero points if the solution is far
from correct)
-2 incorrect method signature (method name, number of parameters, and types of parameters)
-2 if there are no test cases
-2 if there are no comments, insufficient comments, or bad usage of comments
Problem 2: Valid email address (5 points)
Write a method named isValidEmail that takes a string as input parameter, and returns true if that string represents a
valid email address, or false otherwise. An email address is considered valid if it follows this format “user123@domain.ext”, where:
• user123 represents a sequence of word characters (i.e., letters, digits, or underscore) whose length is between
1 and 10 (inclusive), but the first character must be a letter
• domain represents a sequence of alphanumeric characters (i.e., letters or digits) whose length is between 1
and 12 (inclusive), but the first character must be a letter
• Exactly one character “@” separates the user name from the domain name
• ext is a sequence of lower case letters only whose length is between 1 and 3 (inclusive)
Examples:
isValidEmail (“user_123@gmail.com”) returns true
isValidEmail (“user123alpha@gmail.com”) returns false
Rubric:
programs that do not compile get zero points
+5 correct implementation (up to 3 points for a partially correct solution. Zero points if the solution is far
from correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 3: Valid file name (5 points)
Write a method isValidFilename(String filename, String sys) which returns true if filename represents a valid file name
according to the rules of the operating system in the second parameter sys, or false otherwise. Use regular expressions
to match the following rules for different systems. You can assume that filename will never be passed as an empty
string and will always include a file extension (i.e., file type).
If the operating system sys is “Windows”, then:
• The file name cannot contain any leading or trailing whitespace characters
• The file name cannot contain any of these special characters: / ? < > : * | . ”
• The file name cannot end with com1, com2, …, or com9 (i.e., com followed by exactly one digit between 1
and 9, inclusive)
• The file name is separated from the file extension by exactly one period character
• The file extension can only contain lower case alphabet letters
• The file extension length is between 2 and 6 characters (inclusive)
If the operating system sys is “Mac” or “Linux”, then:
• The file name cannot contain a period “.” character or a colon “:” character
• The file name is separated from the file extension by exactly one period character
• The file extension can only contain alphabet letters (upper case or lower case)
• The file extension length is between 2 and 6 characters (inclusive)
Examples:
isValidFilename(“homework5.java”, “Linux”) returns true
isValidFilename(“hamlet.shakespeare”, “Mac”) returns false
isValidFilename(“rom_com3.txt”, “Windows”) returns false
Rubric:
programs that do not compile get zero points
+5 correct implementation (up to 3 points for a partially correct solution. Zero points if the solution is far
from correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 4: Extract movie title (4 points)
Write a method extractTitle that takes a string as input parameter representing IMDB movie information (written in
XML style), and extracts all of the text between the tags and . Your method can assume that there will
be at most one movie title inside the input string.
Example:
extractTitle(“Split (2017)6375308“)
returns “Split (2017)”
Rubric:
programs that do not compile get zero points
+4 correct implementation (up to 3 points for a partially correct solution. Zero points if the solution is far
from correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 5: Convert text to American-English or Canadian-English (7 points)
Write a method named convertText(String text, String code, String dictionary) that converts the content of text to either
pure American-English or pure Canadian-English, based on the code value passed in the second parameter, which
you can assume will either be “US” or “CA”. For example, if the passed code is “US”, then your method should
find all words spelled in Canadian-English in text and replace them with their American-English equivalent, and
vice versa.
To know which words to match and replace, your method will use the third parameter dictionary. The content of the
String dictionary should follow the exact format shown below, in which each line starts with a word spelled in
American-English, followed by a tab character “\t”, followed by the equivalent Canadian-English spelling of the
same word, and ending with a new line character “\n”:
“AmericanWord1\tCanadianWord1\nAmericanWord2\tCanadianWord2\n”
For example, below are all valid values for the parameter dictionary:
“color\tcolour\n”
“color\tcolour\nfavorite\tfavourite\n”
“color\tcolour\nfavor\tfavour\nhonor\thonour\nhumor\thumour\nlabor\tlabour\ncenter\tcentre\nmeter\tmetre\n”
You can utilize Java’s split method (available for any String variable), to separate the different lines (or words) inside
dictionary. Notice that your method should recognize both uppercase and lowercase characters in matching words
that need to be replaced, but the new words replacing the matches can be taken as-is from the given dictionary. Your
method should return a String containing the new modified text.
Examples:
convertText(“I do not have a favorite COLOUR nor a favorite ice-cream flavour”, “US”, “color\tcolour\nflavor\tflavour\n”)
returns: “I do not have a favorite color nor a favorite ice-cream flavor”
convertText(“I do not have a favorite colour nor a favorite ice-cream flavour”, “CA”,
“color\tcolour\nflavor\tflavour\nfavorite\tfavourite\n”)
returns: “I do not have a favourite colour nor a favourite ice-cream flavour”
Rubric:
programs that do not compile get zero points
+7 correct implementation (up to 5 points for a partially correct solution. Zero points if the solution is far
from correct)
-2 for word matching being case-sensitive
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 6: Robust division (5 points)
Write the method sumOfIntegerDiv(int[] a, int n) which takes an array of integers a and an integer n as input and returns
an integer. The return value is calculated by summing up the values that occur when you divide each element by
the preceding element, until you stop at the n-th element in the array.
Your method should be resilient against possible exceptions, such as dividing by zero or attempting to access an
invalid array index. Instead of terminating when these exceptions occur, your method will instead skip the array
index that generated the exception, print a friendly message to the user informing them why this index will be
skipped, then resume computation normally (if possible).
Your method should be able to catch at least two types of exceptions:
• If an ArithmeticException occurred, your method should print the following message (before resuming
computation normally): Cannot divide by zero. Skipping index: index_value
• If an ArrayIndexOutOfBoundsException occurred, your method should print the following message (before
returning the result): Cannot access array at index: index_value
• If any other type of exception occurred, then your method should print:
Something went wrong! Skipping index: index_value
Notice that you should replace “index_value” above by the value of the actual array index.
Examples:
sumOfIntegerDiv({2, 4, 6, 0, 8, 16}, 4) returns 3 (4/2)+(6/4)+(0/6)
sumOfIntegerDiv({2, 4, 6, 0, 8, 16}, 5) returns 5 (4/2)+(6/4)+(0/6)+(16/8)
The second call skips (8/0) and prints a friendly error message to the user:
“Cannot divide by zero. Skipping index: 4”
Rubric:
programs that do not compile get zero points
+5 correct implementation (up to 3 points for a partially correct solution. Zero points if the solution is far
from correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 7: Swear filter using regular expressions (7 points)
Remember your swear filter method from Homework 3? Your job now is to implement a swear word filter using
regular expressions. Write a method named swearFilter(String text, String[] swear) that takes two parameters: a String
containing some text, and an array of Strings containing a list of “swear words”. Your method will return a String
containing the text contained in the first String, where each “swear word” is replaced by its first character, followed
by a number of stars equal to its number of characters minus two, followed by its last character. For example, if the
swear words are “duck”, “ship”, and “whole”, and the text contains the following story:
A duck was sailing on a ship shipping whole wheat bread. Duck that SHIP!!!
Your method would return:
A d**k was sailing on a s**p s**pping w***e wheat bread. D**k that S**P!!!
Notice that your method should recognize both uppercase and lowercase characters in a swear word. You must use
regular expressions to solve this problem while utilizing Java’s replaceAll method in class String.
Rubric:
programs that do not compile get zero points
+7 correct implementation (up to 5 points for a partially correct solution. Zero points if the solution is far
from correct)
-2 for swear-word matching being case-sensitive
-2 for not maintaining original upper/lower-case
-2 for not matching strings that contain swear words as substrings
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Bonus points: Early submission
If you submit the entire homework no later than 48 hours before the deadline, and the total score on the rest of this
homework assignment is at least 20 points, you will receive 2 bonus points. The bonus points will be added to the
total score of this homework assignment.
Good luck and have fun!

Write a function comp10001huxxy_valid_play() which takes five arguments: play

June 5th, 2020

Write a function comp10001huxxy_valid_play() which takes five arguments:
play, a 3-tuple representing the play that is being attempted; see below for details;
play_history, a list of 3-tuples representing all plays that have taken place in the game so far (in chronological order); each 3-tuple is based on the same structure as for play;
active_player, an integer between 0 and 3 inclusive which represents the player number of the player whose turn it is to play;
hand, a list of the cards (each in the form of a 2-character string, as for Q1) held by the player attempting the play;
table, a list of lists of cards representing the table (in the same format as for Q2).
Your function should return a Boolean indicating whether the play is valid given the current game state (i.e., the combination of the plays made to date, the content of the player's hand, and the groups on the table). You only need to validate the state of the table (using comp10001huxxy_valid_table from Q2, for which a reference implementation is provided) if the play ends the player's turn and they have played to the table. Note that play_history, hand, and table all represent the respective states prior to the proposed play being made (e.g., play_history will not contain play).
The composition of the 3-tuple used to represent each play is (player_turn, play_type, play_details), where player_turn is an integer (between 0 and 3 inclusive) indicating which player is attempting to play, and play_type and play_details are structured as follows, based on the play type:
pick up a card from stock (and thereby end the turn): play_type = 0, play_details = None;
play a card from the hand to the table: play_type = 1, play_details = (card, to_group), where card is the card from the hand that is to be played, and to_group is the (zero-offset) index of the group in table to play to; if the card is to start a new group, to_group should be set to one more than the index of the last group on the table (i.e., if there are three groups, the last group will be index 2, so 3 would indicate that the card is to be used to start a new group);
play a card from one group on the table to another: play_type = 2, play_details = (card, from_group, to_group), where card is the card to be played from the group, from_group is the (zero-offset) index of the group in table to play card from, and to_group is the index of the group in table to play card to (and, as above, a value of one more than the index of the last group indicates that a new group is to be formed);
end the turn, after playing from the hand or play between groups on the table: play_type = 3, play_details = None.
Note that picking up a card (play_type = 0) implicitly ends the turn, whereas if plays are made from the hand/between groups on the table, an explicit “end of turn” play (play_type = 3) must be used to confirm that the player is ending their turn.
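To make the play tuples concrete, the sketch below applies a single play to copies of the hand and table. The name apply_play is hypothetical, and it covers only the structural checks (the card and the group indices referenced by a type 1 or type 2 play must exist); it does not check whose turn it is, opening points, or table validity, all of which comp10001huxxy_valid_play must still handle.

```python
import copy


def apply_play(play, hand, table):
    """Hypothetical helper: apply one play to copies of hand and table.

    Returns (new_hand, new_table), or None if the card or a group index
    referenced by a type 1 or type 2 play does not exist. Types 0 and 3
    move no cards, so unchanged copies are returned."""
    _, play_type, details = play
    new_hand = list(hand)
    new_table = copy.deepcopy(table)  # never mutate the caller's table
    if play_type == 1:                # hand -> table
        card, to_group = details
        if card not in new_hand:
            return None               # can't play a card you don't hold
        new_hand.remove(card)
    elif play_type == 2:              # group -> group
        card, from_group, to_group = details
        if not 0 <= from_group < len(new_table) or card not in new_table[from_group]:
            return None               # source group or card doesn't exist
        new_table[from_group].remove(card)
    else:                             # pick up (0) or end turn (3)
        return new_hand, new_table
    if not 0 <= to_group <= len(new_table):
        return None                   # neither an existing group nor "new group"
    if to_group == len(new_table):
        new_table.append([card])      # one past the last index starts a new group
    else:
        new_table[to_group].append(card)
    return new_hand, new_table
```

For example, `apply_play((0, 1, ('KC', 1)), hand, [])` returns None, mirroring the "invalid group no." case in the examples that follow.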
Example function calls are as follows:
>>> comp10001huxxy_valid_play((0, 0, None), [], 0, ['3S', 'KC', '8C', '3S', '8S', 'KH', '4H', '2C', '6S', '5H', '8C', 'KD'], [])
True
>>> comp10001huxxy_valid_play((0, 1, ('KC', 0)), [], 0, ['3S', 'KC', '8C', '3S', '8S', 'KH', '4H', '2C', '6S', '5H', '8C', 'KD'], [])
True
>>> comp10001huxxy_valid_play((0, 1, ('KC', 1)), [], 0, ['3S', 'KC', '8C', '3S', '8S', 'KH', '4H', '2C', '6S', '5H', '8C', 'KD'], [])
False # invalid group no.
>>> comp10001huxxy_valid_play((0, 1, ('AC', 0)), [], 0, ['3S', 'KC', '8C', '3S', '8S', 'KH', '4H', '2C', '6S', '5H', '8C', 'KD'], [])
False # can't play card you don't hold
>>> comp10001huxxy_valid_play((0, 1, ('KH', 0)), [(0, 1, ('KC', 0))], 0, ['3S', '8C', '3S', '8S', 'KH', '4H', '2C', '6S', '5H', '8C', 'KD'], [['KC']])
True
>>> comp10001huxxy_valid_play((0, 1, ('KD', 0)), [(0, 1, ('KC', 0)), (0, 1, ('KH', 0))], 0, ['3S', '8C', '3S', '8S', '4H', '2C', '6S', '5H', '8C', 'KD'], [['KC', 'KH']])
True
>>> comp10001huxxy_valid_play((0, 2, ('KS', 1, 0)), [(0, 1, ('KC', 0)), (0, 1, ('KH', 0))], 0, ['3S', '8C', '3S', '8S', '4H', '2C', '6S', '5H', '8C', 'KD'], [['KC', 'KH']])
False # group/card don't exist
>>> comp10001huxxy_valid_play((0, 3, None), [(0, 1, ('KC', 0)), (0, 1, ('KH', 0)), (0, 1, ('KD', 0))], 0, ['3S', '8C', '3S', '8S', '4H', '2C', '6S', '5H', '8C'], [['KC', 'KH', 'KD']])
True
>>> comp10001huxxy_valid_play((0, 3, None), [], 0, ['3S', 'KC', '8C', '3S', '8S', 'KH', '4H', '2C', '6S', '5H', '8C', 'KD'], [])
False # attempt to end turn without any plays to table
>>> comp10001huxxy_valid_play((0, 3, None), [(0, 1, ('KC', 0)), (0, 1, ('KH', 0))], 0, ['3S', '8C', '3S', '8S', '4H', '2C', '6S', '5H', '8C', 'KD'], [['KC', 'KH']])
False # table state not valid
>>> comp10001huxxy_valid_play((0, 3, None), [(0, 1, ('AC', 0)), (0, 1, ('AH', 0)), (0, 1, ('AD', 0))], 0, ['3S', '8C', '3S', '8S', '4H', '2C', '6S', '5H', '8C'], [['AC', 'AH', 'AD']])
False # insufficient points for opening turn
>>> comp10001huxxy_valid_play((0, 1, ('KS', 0)), [(0, 1, ('KC', 0)), (0, 1, ('KH', 0)), (0, 1, ('KD', 0)), (0, 3, None), (1, 0, None), (2, 0, None), (3, 0, None), (0, 0, None), (1, 0, None), (2, 0, None), (3, 0, None)], 0, ['3S', '8C', '3S', '8S', '4H', '2C', '6S', '5H', '8C', 'KS'], [['KC', 'KH', 'KD']])
True
>>> comp10001huxxy_valid_play((0, 3, None), [(0, 1, ('KC', 0)), (0, 1, ('KH', 0)), (0, 1, ('KD', 0)), (0, 3, None), (1, 0, None), (2, 0, None), (3, 0, None), (0, 0, None), (1, 0, None), (2, 0, None), (3, 0, None), (0, 1, ('KS', 0))], 0, ['3S', '8C', '3S', '8S', '4H', '2C', '6S', '5H', '8C'], [['KC', 'KH', 'KD', 'KS']])
True
>>> comp10001huxxy_valid_play((1, 0, None), [], 0, ['3S', 'KC', '8C', '3S', '8S', 'KH', '4H', '2C', '6S', '5H', '8C', 'KD'], [])
False # wrong player

CptS 215 Introduction to Algorithmic Problem Solving

June 5th, 2020


Gina Sprint

PA5 Decision Trees (100 pts)
Due:

Learner Objectives
At the conclusion of this programming assignment, participants should be able to:

Understand decision trees
Building from a training set of examples
Classifying new examples
Implement the ID3 tree building algorithm
Understand k-fold cross validation
Visualize trees
Prerequisites
Before starting this programming assignment, participants should be able to:

Write object-oriented code in Python
Implement a tree ADT and common tree algorithms
Write Markdown and code cells in Jupyter Notebook
Create plots with matplotlib
Acknowledgments
Content used in this assignment is based upon information in the following sources:

sci-kit learn machine learning library
Data Science from Scratch by Joel Grus
Overview and Requirements
For this programming assignment, we are going to investigate the accuracy of our ID3 decision tree implementation compared to the decision tree implemented in the sci-kit learn machine learning library. We are going to train the models to classify whether passengers on the RMS Titanic survived the shipwreck or not.

For this assignment, we are going to implement the following:

Write code to read a dataset in from a file
Dataset file name will be specified as a command line argument
Use a pandas Dataframe to read in and store the data
Re-write the ID3 decision tree code from the class notes to make use of an object oriented implementation of a tree
Instead of the True, False, tuple tree representation used by Joel Grus, implement an object oriented tree
Write code to implement K-fold cross validation
Value of K will be specified as a command line argument
Write code to compute classifier evaluation metrics
Accuracy
Precision
Recall
F1 score
Adapt the sci-kit learn example code provided in this document to compare the ID3 decision tree implementation to sci-kit learn’s implementation of a decision tree
Write code to plot the evaluation metric results of code executions with different values of K
The following sections describe each of these steps in more detail.
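As a rough illustration of replacing the True/False/tuple tree with objects, the sketch below uses two hypothetical classes, Leaf and DecisionNode, that share a classify method. DecisionNode.default is a fallback label for attribute values that never appeared during training. The tree in the usage lines is hand-written for illustration only; it is not learned from titanic.txt, and building it with ID3 is still up to you.

```python
class Leaf:
    """Leaf node: stores the predicted class label."""

    def __init__(self, label):
        self.label = label

    def classify(self, example):
        return self.label


class DecisionNode:
    """Internal node: splits on one attribute.

    children maps each attribute value to a subtree; default is returned
    for values with no matching child (values unseen during training)."""

    def __init__(self, attribute, children, default):
        self.attribute = attribute
        self.children = children
        self.default = default

    def classify(self, example):
        # example can be a dict or a pandas Series: both support .get()
        subtree = self.children.get(example.get(self.attribute))
        if subtree is None:
            return self.default
        return subtree.classify(example)


# Hand-written illustrative tree (not trained): split on sex, then on class.
tree = DecisionNode("sex",
                    {1: Leaf(0),
                     0: DecisionNode("class", {3: Leaf(0)}, default=1)},
                    default=0)
prediction = tree.classify({"class": 1, "age": 1, "sex": 0})
```

Keeping classify on every node type lets the recursion work without checking node types, which is the main practical gain over the tuple representation.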

Note: for this assignment, create one Jupyter Notebook that tells the story of your data science endeavor with results. All Python code (e.g. decision tree classes and functions, classification evaluation metrics, the sci-kit learn DecisionTreeClassifier comparison) is to be implemented in .py files, and the results of the classification are to be included in the Jupyter Notebook.

Program Details
Background: K-fold Cross Validation
In our example in class with the job candidate dataset, we trained our decision tree model on 14 examples. We then tested our model on 4 new examples that the model had not yet seen. We did not use these new examples to update the model. This approach to training and testing is called the holdout method. In the holdout method, the dataset is divided into two sets, the training and the testing set. The training set is used to build the model and the testing set is used to evaluate the model (e.g. the model’s accuracy). One of the shortcomings of this approach is the evaluation of the model depends heavily on which examples are selected for training versus testing.

(image from https://upload.wikimedia.org/wikipedia/commons/thumb/0/09/Supervised_machine_learning_in_a_nutshell.svg/2000px-Supervised_machine_learning_in_a_nutshell.svg.png)

K-fold cross validation is a model evaluation approach that addresses this shortcoming of the holdout method. For K-fold cross validation, the examples are divided into $k$ subsets $S = \{s_1, \ldots, s_i, \ldots, s_k\}$ and the holdout method is repeated $k$ times. In each iteration $i$, subset $s_i$ is held out of the training set: the subsets $S \setminus \{s_i\}$ are used for training and $s_i$ is used for testing. The average performance over all $k$ train/test trials is computed and evaluated.

(image from https://upload.wikimedia.org/wikipedia/commons/1/1c/K-fold_cross_validation_EN.jpg)

Note: K = N, where N is the number of examples, is called leave-one-out cross validation: each example is held out for testing exactly once.
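A minimal sketch of the index bookkeeping behind K-fold cross validation, assuming the examples are addressed by row position 0..n-1 (for example, DataFrame rows selected with iloc). The function name k_fold_indices and the seeded shuffle are illustrative choices, not requirements of the assignment. It rejects k < 2 because with a single fold the training set would be empty.

```python
import random


def k_fold_indices(n, k, seed=None):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation
    over n examples numbered 0..n-1."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # so folds do not follow file order
    # Fold sizes differ by at most one: the first n % k folds get an extra example.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                # held-out subset s_i
        train = indices[:start] + indices[start + size:]  # all other subsets
        yield train, test
        start += size


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(10, 3, seed=0), start=1):
        print(fold, len(train), len(test))
```

Calling `k_fold_indices(n, n)` yields n folds with one test example each, which is leave-one-out cross validation.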

Background: Evaluation Metrics
While there are several classification evaluation metrics, we are only going to cover four of the most common classification evaluation metrics. For each machine learning model, compute the following metrics:

Classification accuracy: the proportion of correct classifications made out of all classifications made:
$$\text{Accuracy} = \frac{\#\text{correct}}{\#\text{correct} + \#\text{incorrect}}$$

Note: Accuracy is not useful when classes are imbalanced (e.g., a model that always predicts the positive class achieves 99% accuracy when 99% of the dataset is the positive class)
Precision (positive predictive value): the proportion of correctly classified positives out of all classified positives:
$$\text{Precision} = \frac{\#\text{true positives}}{\#\text{true positives} + \#\text{false positives}}$$

A true positive is a positive example that is correctly classified as a positive example during testing
A false positive is a negative example that is incorrectly classified as a positive example during testing
Recall (true positive rate): the proportion of correctly classified positives out of all positives (regardless of classification):
$$\text{Recall} = \frac{\#\text{true positives}}{\#\text{true positives} + \#\text{false negatives}}$$

A false negative is a positive example that is incorrectly classified as a negative example during testing
F1 score: the harmonic mean of precision and recall:
$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

Summarizes a classifier in a single number (however, it is best practice to still investigate precision and recall, as well as other evaluation metrics)
Note: There is a trade-off between precision and recall. For a balanced class dataset, a model that predicts mostly positive examples will have a high recall and a low precision.
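The four formulas above can be computed directly from lists of actual and predicted labels. This sketch assumes binary labels with 1 as the positive class (matching survived = 1 in the Titanic data), and it returns 0.0 whenever a denominator is zero, which is one common convention rather than the only one. The function name evaluation_metrics is hypothetical.

```python
def evaluation_metrics(actual, predicted, positive=1):
    """Return (accuracy, precision, recall, f1) for binary labels."""
    tp = fp = fn = correct = 0
    for a, p in zip(actual, predicted):
        if a == p:
            correct += 1
        if p == positive and a == positive:
            tp += 1          # positive correctly classified as positive
        elif p == positive:
            fp += 1          # negative incorrectly classified as positive
        elif a == positive:
            fn += 1          # positive incorrectly classified as negative
    total = len(actual)
    accuracy = correct / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

For actual = [1, 1, 1, 0, 0] and predicted = [1, 0, 1, 1, 0] there are 2 true positives, 1 false positive and 1 false negative, so accuracy is 0.6 and precision, recall and F1 are all 2/3.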

Dataset
Download titanic.txt. This dataset is from this site and is a text file containing 2202 examples.

Each row in this file is a comma-separated list of values representing attributes of passengers aboard the Titanic:

class: 0 = crew, 1 = first class, 2 = second class, 3 = third class
age: 1 = adult, 0 = child
sex: 1 = male, 0 = female
survived: 1 = yes, 0 = no
Note: the first line in the file is the header describing the order of the attributes. You can read this in as the header when using pandas.read_csv(). Each line after the header represents a single passenger’s attributes.
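One way to meet the command-line and DataFrame requirements is sketched below. The argument order (dataset file, then K), the script name pa5.py and the helper names are assumptions; the column names class, age, sex and survived come from the attribute list above and assume the header line uses exactly those names (skipinitialspace=True tolerates spaces after the commas).

```python
import pandas as pd

ATTRIBUTES = ["class", "age", "sex"]  # assumed header names for the attributes
LABEL = "survived"                    # assumed header name for the class label


def parse_args(argv):
    """Return (dataset filename, K) from a command line such as
    ['pa5.py', 'titanic.txt', '10']."""
    if len(argv) != 3:
        raise ValueError("usage: python pa5.py <dataset file> <K>")
    return argv[1], int(argv[2])


def load_dataset(path_or_buffer):
    """Read the comma-separated dataset; read_csv uses the first line as the header."""
    return pd.read_csv(path_or_buffer, skipinitialspace=True)


def split_features(df):
    """Separate the attribute columns from the label column."""
    return df[ATTRIBUTES], df[LABEL]
```

In the script itself, `filename, k = parse_args(sys.argv)` (after `import sys`) would then feed load_dataset and split_features.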

sci-kit learn Example Code
Below is example code that trains and tests a decision tree using sci-kit learn's DecisionTreeClassifier. This code works with the Iris dataset, a common dataset used in the library's example code. For this dataset, there are 3 class labels representing different types of iris flowers: Setosa, Versicolour, and Virginica. There are four attributes for each example: sepal length, sepal width, petal length, and petal width. There are 150 examples in the dataset.

from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier()

# iris.data is 2D numpy ndarray of 150 examples.
# Each example is a list of 4 attribute values.
print(iris.data.shape, type(iris.data))

# iris.target is 1D numpy ndarray of 150 labels.
print(iris.target.shape, type(iris.target))

# fit is for training
clf = clf.fit(iris.data, iris.target)
# predict is for classifying
predicted_integer = clf.predict(iris.data[:1, :])
# get the string label for the predicted class
predicted_class = iris.target_names[predicted_integer]
print(predicted_class)
(150, 4) <class 'numpy.ndarray'>
(150,) <class 'numpy.ndarray'>
['setosa']
Since we will be evaluating our ID3 decision tree implementation in K-fold cross validation, we will need to get similar results for the DecisionTreeClassifier. We need mean accuracy, precision, recall, and F1 score for the classifier in K fold cross validation.

K fold cross validation is computed with the cross_val_score() function, which by default scores a DecisionTreeClassifier on each fold by accuracy and returns one score per fold; the mean of the returned array is the mean accuracy (shown in the example below). The remaining metrics can be specified with the scoring parameter of cross_val_score() (set scoring to one of the strings scoring='precision', scoring='recall', or scoring='f1'). Since the Iris dataset is a multiclass problem (3 classes), the example below does not include precision, recall, or F1 score.

from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

iris = load_iris()
clf = tree.DecisionTreeClassifier()

# model, examples, labels, K
accuracies = cross_val_score(clf, iris.data, iris.target, cv=5)
# returns one accuracy score per fold (use accuracies.mean() for the mean accuracy)
print(accuracies) # 5 scores, one for each iteration of K fold cross validation
[ 0.96666667 0.96666667 0.9 0.93333333 1. ]
Plotting
For values of K = 1, 2, …, 20, run the ID3 code and the DecisionTreeClassifier code, storing each model's performance in terms of the previously specified evaluation metrics (note that sci-kit learn's cross_val_score() requires K to be at least 2). Plot each of these evaluation metrics (Y-axis) separately as a function of K (X-axis).

Write up your observations of the plots.

What do you notice?
How does K impact each of the evaluation metrics?
How does the different tree implementation affect the performance?
Bonus (10 pts)
(5 pts) Programmatically visualize the ID3 decision tree and the sci-kit learn decision tree. You are free to use the example code from class (using networkx and pydot) or you can use another Python library. Include the plots as .png or .pdf files in your final zip file.
Note: You may want to check out: sklearn.tree.export_graphviz (uses Graphviz)
(5 pts) Compare the ID3 decision tree and the sci-kit learn decision tree to at least two other classification models in sci-kit learn. I recommend checking out the following algorithms:
K-nearest neighbors (KNN)
Logistic regression
Naive Bayes
Include plots for the results for these classifiers against the ID3 decision tree in your final zip file.

Submitting Assignments
Use the Blackboard tool https://learn.wsu.edu to submit your assignment. You will submit your code to the corresponding programming assignment under the “Content” tab. You must upload your solutions as _pa4.zip by the due date and time.
Your .zip file should contain your .ipynb file, .py files (tree class code), and your .txt files used to test your program.
Grading Guidelines
This assignment is worth 100 points + 10 points bonus. Your assignment will be evaluated based on a successful compilation and adherence to the program requirements. We will grade according to the following criteria:

5 pts for reading information in from command line
5 pts for using a pandas Dataframe to store the dataset
15 pts for correct object oriented tree implementation and ID3 adaptation
15 pts for correct K fold cross validation implementation
20 pts (5 pts/each) for correct evaluation metric
10 pts for utilizing the sci-kit learn DecisionTreeClassifier implementation
15 pts for plotting
10 pts for write up of observations
5 pts for adherence to proper programming style and comments established for the class

CS 170: Introduction to Computer Science I – Summer 2020 Homework Assignment #3

June 5th, 2020

CS 170: Introduction to Computer Science I – Summer 2020
Homework Assignment #3
Submission instructions
Submit your assignment as a zip file and save each program as a .java file. Name each program based on its problem
number. If you submit with a teammate, select team size 2. No email submissions are accepted. No late submissions
are accepted.
General instructions and hints
In those problems asking you to write a method, always call the method several times to test that it works properly
with a variety of different values for the parameters. Test it on more examples than the ones shown in this handout.
Even if the problem asks you for just one method, you can always write additional helper methods to simplify and
organize your code. Make sure you write comments to explain your code.
Comment requirements: Always comment the top of each method (what the method does, the meaning of the
input parameters, the meaning of the output value). Write comments within the methods to explain the
strategy you are using to solve the problem, and to clarify blocks of code that may be difficult to understand.
Problem 1: Mid semester survey (2 points)
Go to the following website and complete the survey. The survey should be done individually: each team member
will fill in a separate survey. To earn credit for this survey, all members of the team should submit it. If one of the
teammates does not submit, the whole team will lose the points.
The answers to the survey will not be graded: you will receive credit just for completing it.
Problem 1 rubric:
2 points if the individual completes the survey or, for a team, if both teammates complete it.
No points if any member did not complete the survey.
Problem 2: Average length (4 points)
Write a method named avgLength which takes an array of Strings as an input parameter and returns a double. The
method should calculate and return the average length of all the strings in the array.
Examples:
avgLength(new String[]{“Hello”, “Q”}) returns 3.0
avgLength(new String[]{}) returns 0.0
avgLength(new String[]{“Hello”, “Goodbye”}) returns 6.0
Rubric:
programs that do not compile get zero points
+4 correct implementation (2 points for a partially correct solution. Zero points if the solution is far from
correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 3: Sum of diffs (4 points)
Write a method named sumOfDiffs which takes an array of integers as input and returns an integer. The return value
is calculated by summing up the values that occur when you subtract each element from the preceding element.
Examples:
sumOfDiffs(new int[]{3, 4, 5}) returns -2 (3-4 + 4-5)
sumOfDiffs(new int[]{4, 1, 19, 6}) returns -2 (4-1 + 1-19 + 19-6)
sumOfDiffs(new int[]{}) returns 0
sumOfDiffs(new int[]{3, 0, -1}) returns 4 (3-0 + 0-(-1))
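A minimal sketch that follows the definition literally, adding (previous element minus current element) for each adjacent pair. The class name `SumOfDiffsDemo` is hypothetical, and returning 0 for a single-element array is an assumption, since the examples do not cover that case.

```java
// Hypothetical wrapper class; the assignment only specifies the method itself.
class SumOfDiffsDemo {

    /**
     * Sums (previous element - current element) over every adjacent pair.
     *
     * @param values the input array (may be empty)
     * @return the sum of the differences, or 0 if there are fewer than two elements
     */
    static int sumOfDiffs(int[] values) {
        int sum = 0;
        // Start at index 1 so every element visited has a predecessor.
        for (int i = 1; i < values.length; i++) {
            sum += values[i - 1] - values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfDiffs(new int[]{3, 4, 5}));     // -2
        System.out.println(sumOfDiffs(new int[]{4, 1, 19, 6})); // -2
        System.out.println(sumOfDiffs(new int[]{}));            // 0
        System.out.println(sumOfDiffs(new int[]{3, 0, -1}));    // 4
    }
}
```

A useful check when writing test cases: the sum telescopes (every middle element is added once and subtracted once), so for a non-empty array the result equals the first element minus the last, e.g. 4 - 6 = -2 for the second example.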
Rubric:
programs that do not compile get zero points
+4 correct implementation (2 points for a partially correct solution. Zero points if the solution is far from
correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 4: Without duplicates (4 points)
Write a method named withoutDuplicates that takes an array of integers and returns a copy of the array without any
duplicate elements.
Examples:
withoutDuplicates(new int[]{1, 2, 3}) returns {1, 2, 3}
withoutDuplicates(new int[]{1, 2, 1, 1, 3, 2, 3}) returns {1, 2, 3}
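A sketch using only arrays. It assumes each value should be kept in the order of its first appearance, which is consistent with both examples but not stated explicitly; the class name `WithoutDuplicatesDemo` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical wrapper class; the assignment only specifies the method itself.
class WithoutDuplicatesDemo {

    /**
     * Returns a copy of the array with duplicate values removed.
     *
     * @param values the input array (not modified)
     * @return a new array holding each distinct value once, in order of first appearance
     */
    static int[] withoutDuplicates(int[] values) {
        // Worst case (no duplicates) needs the same length as the input.
        int[] buffer = new int[values.length];
        int count = 0;
        for (int value : values) {
            // Check whether this value has already been kept.
            boolean seen = false;
            for (int j = 0; j < count; j++) {
                if (buffer[j] == value) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                buffer[count++] = value;
            }
        }
        // Trim the unused tail of the buffer.
        return Arrays.copyOf(buffer, count);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withoutDuplicates(new int[]{1, 2, 3})));             // [1, 2, 3]
        System.out.println(Arrays.toString(withoutDuplicates(new int[]{1, 2, 1, 1, 3, 2, 3}))); // [1, 2, 3]
    }
}
```

The nested scan is quadratic in the array length, which is fine for exercises of this size; a `LinkedHashSet<Integer>` would be the usual alternative for large inputs.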
Rubric:
programs that do not compile get zero points
+4 correct implementation (2 points for a partially correct solution. Zero points if the solution is far from
correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 5: Reverse copy (4 points)
Write a method named reverseCopy that takes an array of integers and returns a copy of the array with its elements
in reverse order.
Example:
reverseCopy(new int[]{1, 2, 3}) returns {3, 2, 1}
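A sketch under the same assumptions as the earlier problems (hypothetical class name `ReverseCopyDemo`, `static` method). The key point is that a new array is allocated, so the input is left unchanged, which the second line of `main` demonstrates.

```java
import java.util.Arrays;

// Hypothetical wrapper class; the assignment only specifies the method itself.
class ReverseCopyDemo {

    /**
     * Returns a new array with the elements of the input in reverse order.
     *
     * @param values the input array (not modified)
     * @return a reversed copy of values
     */
    static int[] reverseCopy(int[] values) {
        int[] copy = new int[values.length];
        // Element i of the input lands at the mirrored position in the copy.
        for (int i = 0; i < values.length; i++) {
            copy[values.length - 1 - i] = values[i];
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] original = {1, 2, 3};
        System.out.println(Arrays.toString(reverseCopy(original))); // [3, 2, 1]
        System.out.println(Arrays.toString(original));              // [1, 2, 3] (unchanged)
    }
}
```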
Rubric:
programs that do not compile get zero points
+4 correct implementation (2 points for a partially correct solution. Zero points if the solution is far from
correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 6: Reverse in place (4 points)
Write a method named reverse that takes an array of integers and reverses the order of its elements. The original
array is modified and the method doesn’t return anything.
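This problem gives no examples, so the sketch below uses its own input {1, 2, 3, 4} for illustration. It swaps elements from both ends toward the middle; the class name `ReverseDemo` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical wrapper class; the assignment only specifies the method itself.
class ReverseDemo {

    /**
     * Reverses the order of the elements of an array in place.
     *
     * @param values the array to reverse; it is modified directly
     */
    static void reverse(int[] values) {
        // Swap pairs from both ends, moving inward until the indices meet.
        for (int left = 0, right = values.length - 1; left < right; left++, right--) {
            int temp = values[left];
            values[left] = values[right];
            values[right] = temp;
        }
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4};
        reverse(numbers);
        System.out.println(Arrays.toString(numbers)); // [4, 3, 2, 1]
    }
}
```

Stopping when `left < right` fails means the middle element of an odd-length array is never swapped with itself, and an empty array is handled without a special case.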
Rubric:
programs that do not compile get zero points
+4 correct implementation (2 points for a partially correct solution. Zero points if the solution is far from
correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 7: Tally vowels (4 points)
Write a method named tally that takes a String and returns an array of 5 integers containing the frequencies of the 5
vowels (a, e, i, o, u) in the input string. Uppercase and lowercase vowels are counted in the same way.
Example:
tally("HEY! Apples and bananas!") will return: {5, 2, 0, 0, 0}
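One way to map each vowel to its slot in the result is to look it up in the string "aeiou": its index there is its index in the count array. The sketch assumes the class name `TallyDemo` (hypothetical) and lowercases one character at a time so that uppercase vowels are counted together with lowercase ones.

```java
import java.util.Arrays;

// Hypothetical wrapper class; the assignment only specifies the method itself.
class TallyDemo {

    /**
     * Counts how often each vowel appears in a string, ignoring case.
     *
     * @param text the string to scan
     * @return an array {a, e, i, o, u} of vowel frequencies
     */
    static int[] tally(String text) {
        // The position of a vowel in this string is its index in the result.
        String vowels = "aeiou";
        int[] counts = new int[5];
        for (int i = 0; i < text.length(); i++) {
            // Lowercase each character so 'A' and 'a' are counted together.
            char c = Character.toLowerCase(text.charAt(i));
            int index = vowels.indexOf(c);
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tally("HEY! Apples and bananas!"))); // [5, 2, 0, 0, 0]
    }
}
```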
Rubric:
programs that do not compile get zero points
+4 correct implementation (2 points for a partially correct solution. Zero points if the solution is far from
correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 8: Student averages (4 points)
Write a method named studentAverages which takes a 2D array of integers as input. Each column in the 2D array
is an assignment and each row holds the grades of one student (see below for an example). Your method should
return an array of doubles containing the average grade of each student. You may assume each assignment is
scored out of 100 points.
                  Quiz 1   Quiz 2   Quiz 3
Maggie Simpson      50      100        0
Lisa Simpson       100      100       80
studentAverages(new int[][]{{50,100,0}, {100,100,80}}) returns {50.0, 93.33333}
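A sketch that averages each row. The class name `StudentAveragesDemo` is hypothetical, and returning 0.0 for a student with no grades is an assumption the assignment does not address.

```java
import java.util.Arrays;

// Hypothetical wrapper class; the assignment only specifies the method itself.
class StudentAveragesDemo {

    /**
     * Computes each student's average grade.
     *
     * @param grades grades[s][a] is student s's score on assignment a (out of 100)
     * @return an array whose entry s is the mean of row s (0.0 for an empty row)
     */
    static double[] studentAverages(int[][] grades) {
        double[] averages = new double[grades.length];
        for (int s = 0; s < grades.length; s++) {
            int[] row = grades[s];
            // Guard against a student with no recorded grades.
            if (row.length == 0) {
                averages[s] = 0.0;
                continue;
            }
            int total = 0;
            for (int score : row) {
                total += score;
            }
            // Cast so the division keeps its fractional part.
            averages[s] = (double) total / row.length;
        }
        return averages;
    }

    public static void main(String[] args) {
        int[][] grades = {{50, 100, 0}, {100, 100, 80}};
        System.out.println(Arrays.toString(studentAverages(grades)));
    }
}
```

Because Lisa's average is 280 / 3, the second value prints with many repeating 3s; test cases should compare doubles with a small tolerance rather than exact equality.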
Rubric:
programs that do not compile get zero points
+4 correct implementation (2 points for a partially correct solution. Zero points if the solution is far from
correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 9: Credit card checksum (5 points)
Most e-commerce websites these days take credit cards. Users must enter their credit card number and the merchant
verifies that the number is valid. Then Visa, Mastercard, or AmEx process the payment to the merchant and pass
the bill along to the user. However, users often mistype their credit card number by one or two digits. These
common errors are why credit cards are designed with a secret. Using just the credit card number, we can detect
(most) mistakes and errors caused by user mistakes/mistypes. The credit card number contains an error control
code called a “checksum”. Specifically, the credit card number is formatted to comply with a Luhn-10 checking
algorithm.
The Luhn-10 algorithm is a weighted algorithm. Each digit in the credit card number is multiplied by a weight, and
the weighted digits are then summed to form the checksum. The checksum is divided by 10. If the remainder is 0, the
credit card number is valid. If the remainder is NOT 0, the user made an error and can be prompted to re-enter their
credit card data. The weighting for the Luhn-10 algorithm is as follows:
Beginning with the first (i.e., leftmost) digit of the credit card number, every other digit is multiplied by 2. If the
product is a two-digit number (e.g., 6 x 2 = 12), then its individual digits (e.g., 1 and 2) are added to the checksum.
The remaining digits of the credit card number are simply added to the checksum; that is, their weight is 1.
Some examples are given below; the algorithm also works with your own Visa or Mastercard number. Try it!
Write a method named luhnChecksum which takes an array of integers as an input parameter and returns the integer
checksum computed by the above algorithm.
Examples:
luhnChecksum(new int[]{4,5,6,3,9,2}) returns 30 (see below for full calculation)
luhnChecksum(new int[]{4,9,9,1,6,5,7}) returns 40 (see below for full calculation)
Example 1: 456392
digit:          4   5   6   3   9   2
multiplied by:  2   1   2   1   2   1
product:        8   5  12   3  18   2
checksum: 8 + 5 + (1+2) + 3 + (1+8) + 2 = 30
Conclusion: This is a valid number since 30 % 10 == 0
Example 2: 4991657
digit:          4   9   9   1   6   5   7
multiplied by:  2   1   2   1   2   1   2
product:        8   9  18   1  12   5  14
checksum: 8 + 9 + (1+8) + 1 + (1+2) + 5 + (1+4) = 40
Conclusion: This is a valid number since 40 % 10 == 0
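A sketch of luhnChecksum that follows this assignment's rule of doubling from the leftmost digit. The class name `LuhnDemo` and the extra `isValid` helper are hypothetical additions for illustration. Note that the widely used Luhn algorithm counts positions from the rightmost (check) digit instead; the two conventions agree for even-length numbers such as 16-digit card numbers, but not for odd-length inputs like Example 2, so the sketch deliberately implements the assignment's version.

```java
// Hypothetical wrapper class; the assignment only specifies luhnChecksum.
class LuhnDemo {

    /**
     * Computes the weighted checksum described in the assignment.
     *
     * @param digits the card number, one digit (0-9) per element, leftmost first
     * @return the checksum; the number is valid when checksum % 10 == 0
     */
    static int luhnChecksum(int[] digits) {
        int checksum = 0;
        for (int i = 0; i < digits.length; i++) {
            int value = digits[i];
            // Following the assignment, positions 0, 2, 4, ... from the left get weight 2.
            if (i % 2 == 0) {
                value *= 2;
                // A two-digit product (10..18) contributes the sum of its digits.
                if (value > 9) {
                    value = value / 10 + value % 10;
                }
            }
            checksum += value;
        }
        return checksum;
    }

    /** Hypothetical helper: true when the checksum is a multiple of 10. */
    static boolean isValid(int[] digits) {
        return luhnChecksum(digits) % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(luhnChecksum(new int[]{4, 5, 6, 3, 9, 2}));    // 30
        System.out.println(luhnChecksum(new int[]{4, 9, 9, 1, 6, 5, 7})); // 40
        System.out.println(isValid(new int[]{4, 5, 6, 3, 9, 2}));         // true
    }
}
```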
Rubric:
programs that do not compile get zero points
+5 correct implementation (up to 3 points for a partially correct solution. Zero points if the solution is far
from correct)
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Problem 10: Swear word filter (5 points)
Write a method named swearFilter(String text, String[] swear) that takes two parameters: a String containing some
text, and an array of Strings containing a list of “swear words”. Your method will return a String containing the text
contained in the first String, where each “swear word” is replaced by its first character, followed by a number of
stars equal to its number of characters minus two, followed by its last character. For example, if the swear words
are “duck”, “ship”, and “whole”, and the text contains the following story:
A duck was sailing on a ship shipping whole wheat bread. Duck that SHIP!!!
Your method would return:
A d**k was sailing on a s**p s**pping w***e wheat bread. D**k that S**P!!!
Notice that your method should recognize both uppercase and lowercase characters in a swear word.
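A sketch that meets the three rubric points (case-insensitive matching, preserved case, and matches inside longer words). It compares each swear word against every position of the original text with `String.regionMatches(true, ...)`, which ignores case without changing the text's length, and writes stars into a separate character copy so unmasked letters keep their original case. The class name `SwearFilterDemo` is hypothetical.

```java
// Hypothetical wrapper class; the assignment only specifies the method itself.
class SwearFilterDemo {

    /**
     * Masks every occurrence of each swear word, keeping its first and last characters.
     *
     * @param text  the text to filter
     * @param swear the words to mask; matching ignores case and also finds words inside longer words
     * @return a copy of text in which the interior characters of each match are replaced by '*'
     */
    static String swearFilter(String text, String[] swear) {
        // Edit a character copy so the original casing of unmasked letters is preserved.
        char[] result = text.toCharArray();
        for (String word : swear) {
            int len = word.length();
            // Words shorter than 3 characters have no interior characters to mask.
            if (len < 3) {
                continue;
            }
            for (int start = 0; start + len <= text.length(); start++) {
                // Case-insensitive comparison of word against the original text at this position.
                if (text.regionMatches(true, start, word, 0, len)) {
                    // Star out everything between the first and last matched characters.
                    for (int k = start + 1; k < start + len - 1; k++) {
                        result[k] = '*';
                    }
                }
            }
        }
        return new String(result);
    }

    public static void main(String[] args) {
        String story = "A duck was sailing on a ship shipping whole wheat bread. Duck that SHIP!!!";
        System.out.println(swearFilter(story, new String[]{"duck", "ship", "whole"}));
        // A d**k was sailing on a s**p s**pping w***e wheat bread. D**k that S**P!!!
    }
}
```

Matching against the unmodified `text` rather than the partly masked copy means one word's stars can never hide a later match of another word.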
Rubric:
programs that do not compile get zero points
+5 correct implementation (up to 3 points for a partially correct solution. Zero points if the solution is far
from correct)
-2 for swear-word matching being case-sensitive
-2 for not maintaining original upper/lower-case
-2 for not matching strings that contain swear words as substrings
-1 incorrect method signature (method name, number of parameters, and types of parameters)
-1 if there are no test cases
-1 if there are no comments, insufficient comments, or bad usage of comments
Bonus points: Early submission
If you submit the entire homework no later than 24 hours before the deadline, and the total score on the rest of this
homework assignment is at least 20 points, you will receive 2 bonus points. The bonus points will be added to the
total score of this homework assignment.
Good luck and have fun!

TIME AND FORECASTING

June 2nd, 2020

Title: TIME AND FORECASTING
1. What are the benefits of deseasonalizing data?
Data deseasonalization, also referred to as seasonal adjustment, is the process of stripping seasonal patterns out of time-series data, typically when the data are released to public databases (Subanar and Suhartono, 2007). To obtain a goodness-of-fit measure that isolates the impact of the independent variables, one has to estimate the model with deseasonalized values of both the independent and the dependent variables.
Deseasonalization is crucial because it gives analysts a more understandable series for interpreting the news contained in the time series of interest. Second, it facilitates the comparison of long-term and short-term movements across nations and sectors of the economy (Sugiyama, n.d., p. 54). It also supplies users with the input required for business-cycle analysis, turning-point detection, and trend-cycle decomposition. Finally, deseasonalization supports quality control on both the input and the output side, which in turn makes the series more comparable with other methods and series.
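To make the idea of seasonal adjustment concrete, the following is a minimal Java sketch of its simplest textbook form, the method of simple averages. It is an illustrative assumption, not the procedure used in the cited sources: it assumes a multiplicative model, a series with no trend, and a known number of seasons per cycle. Official statistics use far more elaborate procedures such as X-13ARIMA-SEATS. The class and method names are hypothetical.

```java
// Hypothetical illustration of the method of simple averages (multiplicative, no trend).
class DeseasonalizeDemo {

    /**
     * Removes a fixed seasonal pattern from a series.
     *
     * @param data   observations in time order (e.g. quarterly sales)
     * @param period number of seasons per cycle (e.g. 4 for quarterly data)
     * @return the deseasonalized series
     */
    static double[] deseasonalize(double[] data, int period) {
        double[] seasonSum = new double[period];
        int[] seasonCount = new int[period];
        double total = 0.0;
        // Accumulate totals for each season and for the whole series.
        for (int t = 0; t < data.length; t++) {
            seasonSum[t % period] += data[t];
            seasonCount[t % period]++;
            total += data[t];
        }
        double overallMean = total / data.length;
        double[] adjusted = new double[data.length];
        for (int t = 0; t < data.length; t++) {
            int s = t % period;
            // Seasonal index: this season's average relative to the overall mean.
            double seasonalIndex = (seasonSum[s] / seasonCount[s]) / overallMean;
            // Dividing by the index removes the seasonal effect from the observation.
            adjusted[t] = data[t] / seasonalIndex;
        }
        return adjusted;
    }

    public static void main(String[] args) {
        // Two years of quarterly sales with a strong second-quarter peak.
        double[] quarterly = {80, 120, 100, 100, 80, 120, 100, 100};
        System.out.println(java.util.Arrays.toString(deseasonalize(quarterly, 4)));
    }
}
```

In this example the seasonal indices are 0.8, 1.2, 1.0, and 1.0, so every adjusted value comes out at (approximately) 100: once the seasonal swings are removed, the series is flat, which is exactly the underlying level an analyst would want to see.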
2. Why is forecasting important for business?
Business forecasting is the process of predicting likely future developments in a business with respect to sales, profits, and expenditures (Hanke and Reitsch, 1981). Forecasting is critical in every business because it underpins planning: it is very hard to build a good business plan without first forecasting, using industry statistics to arrive at the best forecast possible.
Forecasting also contributes to a business’s success, because managers make better predictions about future performance when they understand the business’s historical data (Barker, 1993). In addition, forecasting supports better management: external factors affect every business, and the forecasting process helps a manager handle the negative effects those factors can have on the launch or expansion of a business.

References
Barker, Joel Arthur. Paradigms: The Business of Discovering the Future. New York, NY: HarperBusiness, 1993.
Hanke, John E., and Arthur G. Reitsch. Business Forecasting. Boston: Allyn and Bacon, 1981.
Subanar, and Suhartono. “The Effect of Decomposition Method as Data Preprocessing on Neural Networks Model for Forecasting Trend and Seasonal Time Series.” Institute of Research and Community Outreach, Petra Christian University, 2007.
Sugiyama, Alexandre Borges. Modeling Foreign Exchange Volatility with Intraday Data. The University of Arizona, n.d.

Cultural Appropriation

June 2nd, 2020

Title: Cultural Appropriation
To define cultural appropriation, it helps first to understand the meaning of culture, which is a diverse set of intangible social aspects of life, such as beliefs, language systems, and communication. Appropriation, on the other hand, is the unlawful taking of someone’s traditional language, cultural practices, and artifacts without their permission. In general, cultural appropriation occurs when an individual or a group of people adopts the culture of others, especially of those who are systematically oppressed. The practice is considered wrong. For example, pop culture is criticized for stereotyping because of its effects, especially when it is imitated by children to distinguish different races and cultures, and because it implies a power dynamic between the cultures involved.
Cultural appropriation is harmful because it hurts people, for reasons that are easy to understand. For example, dreadlocks, which Black people value, are often not allowed in the workplace because they are seen as unprofessional. What makes this hurtful is opening social media platforms and seeing white counterparts wearing dreadlocks as fashion models.
Cultural appropriation is also harmful because it strips things of their actual meaning: it takes things that were originally meant for one purpose, removes their meaning, and then commercializes them. Moreover, people rarely know who originated the thing that cultural appropriation demeans. For example, Black music genres, Chinese culture, Islam, and headdresses, among others, are not tied to a single race; their source can be a race, a movement, or a country.
Another reason cultural appropriation is harmful is that the originator of a cultural invention is often not in a position to commodify it and profit from it. When another person sees the item and builds on it, that person takes it over, and the originator has to accept the loss because he or she has no right to own it.
Cultural exchange, on the other hand, is the exchange of artists, athletes, and others between two nations to promote mutual understanding. Even so, while cultural exchange is worth advocating, some cultures are strict about their dress. For instance, Muslim women have a form of dress that should not be copied by anyone just for fun, because it is religious.
Cultures have also long adopted aspects of one another and passed them on. One example is the practice of circumcising boys in some African nations, which was passed from the Cushitic groups to other groups of people in Africa. It is therefore possible, and good, for communities to pass some of their good practices on to others. However, practices that carry different meanings in different cultures, such as dress, need to be respected and adopted with caution.
In conclusion, it is important for people to understand other cultures well and, even while taking part in cultural exchange, to respect those cultures.