Thursday, June 01, 2017

Big Data and Privacy

In January of 2014, President Obama commissioned a 90-day study on the effects of Big Data on government, individuals, businesses, and consumers (White House, 2014).  The report evaluated the state of Big Data and its sociotechnical implications.
Its policy recommendations included:
·         Advance the Consumer Privacy Bill of Rights
·         Pass national data breach legislation
·         Extend privacy protections to non-U.S. Persons
·         Ensure data collected on students in school is used for educational purposes
·         Expand technical expertise to stop discrimination
·         Amend the Electronic Communications Privacy Act
While these recommendations favor protecting the rights of individuals with regard to Big Data, they do not go far enough.  As has become apparent in recent days, presidential orders and agency policies are temporary measures.  Ensuring long-term protection requires legislation, or perhaps a constitutional amendment.  While the Fourth Amendment is currently interpreted as conferring a right to privacy, that interpretation may change.  The United States needs a constitutional amendment stating the rights of the population with regard to privacy.
It is reasonable to require an organization, whether a government or a company, to disclose the data it collects on individuals.  For example, if a college chooses to track information on its students, the students should be aware of what information is being retained, who is viewing it, and for what purpose.  The balance between advancing knowledge and the right to privacy should be addressed through a variation of informed consent.
The report does accurately recognize the opportunities that Big Data enables. There are many opportunities where analyzing medical information can lead to health-related improvements. However, people are typically very concerned about privacy and access to health information.  There are many instances where an individual keeps a health condition private, and it only becomes public knowledge after the individual’s death.
Until legislation is passed, the burden of protecting privacy falls on each individual.  There are many ways to protect privacy online (Consumer Reports, 2017).  Some steps include avoiding free Wi-Fi, locking devices using longer PINs, enabling automatic updates of devices, using a password manager (such as LastPass), and monitoring social media standings (such as group membership and privacy settings).  Many recommendations (such as shredding documents with sensitive data) are not new.  More advanced approaches to ensuring privacy include the use of anonymizing browsers, such as the Tor Browser (Griffith, 2017).
Privacy is viewed by many as an inalienable right.  If an individual is willing to forfeit that right in exchange for some good or service (such as better movie recommendations), that should be a conscious decision.  As Big Data delivers greater value to marketers (and other organizations), there is a greater likelihood that the consumer's rights will be compromised.  While executive orders may be an effective short-term solution, legislation should be passed and kept current to protect the rights of the populace.



References
Consumer Reports. (2017). 66 Ways to Protect Your Privacy Right Now.   Retrieved from http://www.consumerreports.org/privacy/66-ways-to-protect-your-privacy-right-now/

Griffith, E. (2017). How to Stay Anonymous Online.   Retrieved from http://www.pcmag.com/article2/0,2817,2363302,00.asp

White House. (2014). Big Data: Seizing opportunities, preserving values (report for the President). Washington DC, USA: Executive Office of the President.   Retrieved from http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf


SleepTrackCam

A video created for an assignment in CS875




A simple video was created using the Animoto web site (Animoto, 2017).  The default music utilized was royalty free.  Clipart used in the video was retrieved from a royalty-free clipart collection website (Openclipart, n.d.).  The approach was to tell the story of a fictional Internet of Things (IoT) device named SleepTrackCam.  The idea is that a device with a camera, microphone, and various sensors would monitor sleep.  To avoid privacy and security concerns, processing would occur locally (edge computing).  The video has been published to my blog at http://alandennis.blogspot.com/2017/06/sleeptrackcam.html for those who are interested.
Using Animoto raises questions about how presentations should be done.  Typically, a PowerPoint presentation has a great deal more text on each slide than Animoto allows.  Perhaps that limit is beneficial.  The use of images and simple text slides may transfer knowledge in a more succinct way.  The process requires considerable thought, comparable to expressing a complex idea in a tweet.



References
Animoto. (2017). Animoto.   Retrieved from https://animoto.com/dashboard

Openclipart. (n.d.). Clipart - High Quality, Easy to Use, Free Support.   Retrieved from https://openclipart.org/




Friday, May 19, 2017

Unlimited Time and Money Wish-List

What would you do with unlimited time and money? This post answers that question, classifying the items into five categories: education, job/research, philosophical/religion, travel, and home.

Education

·         Complete the Doctorate in Computer Science
·         Learn to play the piano
·         Learn to compose music
·         Have the time to read all assigned readings in my coursework
·         Teach
·         Make an impact on the young, so they can see the world is full of choices and diverse destinations
·         Take more math classes
·         Take more statistics classes
·         Take history classes, and pay attention this time
·         Learn Spanish

Job/Research

·         Research a life changing technology, such as a cure for a terminal illness.
·         Leave a mark on the world; be it an interesting paper, idea, or book.
·         Write another non-fiction book
·         Write a fiction book
·         Master a topic
·         Work where research is the most important element
·         Publish articles
·         Go to a conference, present, and be included in the proceedings
·         Share my thoughts
·         Make the future happen, rather than just think about what might be

Philosophical/Religion

·         Become the sort of person who wakes early, drinks coffee on the porch, and thinks a lot
·         Visit old churches, to take in the atmosphere
·         Visit a cathedral, and contemplate the lives spent building it.
·         Learn to paint, and express faith in art
·         Re-read the Bible
·         Read the major religions' works
·         Take another philosophy class, and pay attention this time
·         Learn more about Hinduism
·         Learn more about Buddhism
·         Take time to pause, enjoy the moments, and reflect on the past

Travel

·         Run a half-marathon in all fifty states.
·         Spend a long vacation in Ireland.
·         Explore Alaska, slowly, so that the true measure of the place can be felt.
·         Go to Maine
·         Climb Machu Picchu
·         Go to Yellowstone National Park
·         Go to Iceland
·         Spend time in Australia
·         London
·         Paris

Home

·         Build a log cabin (already a work in progress)
·         Have a study with bookshelves, leather chairs, and a good reading light.
·         Build an outdoor kitchen
·         Grow grapes and make wine
·         Grow berries, such as blueberries.
·         Have a pond/lake with fish and ducks
·         Have a dock on a pond
·         Build a treehouse, for young and old alike.
·         Have a deck with lots of comfortable chairs
·         Have a garden, and actually eat the stuff I grow

Conclusion


Most of the items in these lists do not require unlimited money or time.  As with many things, the issue is about focus and priority.  

Saturday, May 06, 2017

Quantitative and Qualitative Literature Reviews

Quantitative and qualitative literature reviews serve different purposes.   This post discusses the content and structure of each.  A quantitative study is not complete until the report is written (McGraw-Hill Companies Inc, 2006a).  It is generally intended for scholars and students.  Two important elements of a quantitative study are the introduction and the literature review.

Quantitative Study

The introduction is generally one or two paragraphs in length.  Its purpose is to frame the study and state the intention. The introduction should engage the reader, encouraging them to continue reading. 
After the introduction, the report contains the literature review.  The purpose of a literature review for a quantitative research project is to further frame the research area.  The goal is to put the study into perspective.  It may include the history of the variables being studied.  The literature review should go beyond a description of the literature to include analysis, synthesis, and possibly a critique.
The problem statement should be included near the beginning of the literature review.  It serves to clarify the nature of the problem, why it is important to study it, why the researchers conducted the study, and why the reader should be interested in the results.  The literature review should include empirical research results, and possibly publications that evaluate or propose a theory.  The number of articles to be included in a literature review varies based upon the topic and nature of the research.
The literature review can be organized in various ways.  One way is to start with the seminal work and move forward in a chronological order.  An alternative is to start from the general concepts related to the study and move to specifics.  The first paragraph of the literature review should outline the upcoming topics.  The literature review should be organized, utilizing headings and other helpers to guide the reader.  A literature review is typically written in third person and should follow proper style, such as APA.
The research questions and hypotheses should be placed at the end of the literature review.  They should emerge from the literature that was reviewed.  They should be stated as a simple question or sentence.  They may be referred to using a shorthand notation, such as H1 and H2 or RQ1 and RQ2. 

Qualitative Study

A qualitative study is similar to a quantitative study in that it is not complete until the report has been written (McGraw-Hill Companies Inc, 2006b).  Unlike a quantitative study, a qualitative study is written in a somewhat reflective way.  While the quantitative study relies on instruments to gather data, a qualitative study relies on the author to interpret and to some extent capture information.  The literature review in a qualitative study is a summary.





References

McGraw-Hill Companies Inc. (2006a). Chapter 17: Reading and Writing the Quantitative Research Report.   Retrieved from http://highered.mheducation.com/sites/dl/free/0073049506/240132/Chapter_17.ppt

McGraw-Hill Companies Inc. (2006b). Chapter 18: Reading and Writing the Qualitative Research Report.   Retrieved from http://highered.mheducation.com/sites/dl/free/0073049506/295001/Chapter_18.ppt


Not Just Significance but Size of Effect

Blindly trusting the output of a computer program, such as SPSS, without understanding the data may lead to misleading results.  It is possible for results to appear statistically significant when the data as a whole do not support the results.  There are also significant limitations to the chi-squared test that should be taken into consideration before embracing the results.
It is common practice to treat the results of a Pearson chi-squared test as significant if the p value is 0.05 or less, that is, if results this extreme would arise by chance no more than 5% of the time were the variables actually independent (Penn State Eberly College of Science, n.d.).  A contingency table is used to enumerate the combinations of categorical variables and values (Field, 2013).  The table shows the counts of each combination.
Without knowing more about the specifics of the management style survey referenced in this assignment, it is difficult to say exactly why the overall results are significant while being of low value.  However, it is likely that a few combinations are strongly associated while the majority are not.  To determine the specific situation, a crosstabulation would be constructed and the standardized residual of each combination compared to 1.96.  If the absolute value of a combination's standardized residual is less than 1.96, that combination does not contribute significantly to the relationship.
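The residual check described above can be sketched in a few lines of Python.  This is a minimal illustration rather than the SPSS procedure itself: the counts are hypothetical (they are not from the management style survey), the function name is made up for this example, and the standardized residual is assumed to be (observed - expected) / sqrt(expected).

```python
import math


def standardized_residuals(table):
    """Return (chi_square, residuals) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            residual = (observed - expected) / math.sqrt(expected)
            # The chi-squared statistic is the sum of the squared residuals.
            chi_square += residual ** 2
            residual_row.append(residual)
        residuals.append(residual_row)
    return chi_square, residuals


# Hypothetical counts: three groups (rows) by two responses (columns).
observed = [
    [35, 5],
    [20, 20],
    [20, 20],
]

chi_square, residuals = standardized_residuals(observed)

# Cells whose standardized residual exceeds 1.96 in absolute value.
notable = [
    (i, j)
    for i, row in enumerate(residuals)
    for j, r in enumerate(row)
    if abs(r) > 1.96
]

print(f"chi-square = {chi_square:.2f}")
print("cells beyond +/-1.96:", notable)
```

In this made-up table only the first row crosses the 1.96 threshold, so the overall result is driven by one group while the other two sit close to their expected counts.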
Sample size and other factors can impact tests for significance (Runkel, 2012).  When the results of a chi-squared test indicate significance, it is important that an appropriate test of strength also be performed (McHugh, 2013).  Calculating a value such as Lambda, which measures the degree of association between conditions, is an appropriate means of determining the strength of the relationship (AcaStat Software, 2015).
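To make the distinction between significance and strength concrete, the sketch below computes the chi-squared statistic and Goodman and Kruskal's Lambda for a small table and for the same table with every count multiplied by ten.  The counts and function names are hypothetical, and Lambda is computed here in its asymmetric form (predicting the row category from the column category), which is an assumption about which variant is wanted.

```python
def chi_square(table):
    """Pearson chi-squared statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total


def goodman_kruskal_lambda(table):
    """Lambda for predicting the row category from the column category."""
    n = sum(sum(row) for row in table)
    # Best guess without knowing the column: always pick the largest row.
    largest_row_total = max(sum(row) for row in table)
    # Best guess within each column: pick that column's largest cell.
    sum_of_column_maxima = sum(max(col) for col in zip(*table))
    return (sum_of_column_maxima - largest_row_total) / (n - largest_row_total)


# Hypothetical 2x2 counts, then the same proportions with ten times the sample.
small = [[30, 20], [20, 30]]
large = [[cell * 10 for cell in row] for row in small]

for name, table in (("small", small), ("large", large)):
    print(name, round(chi_square(table), 2), round(goodman_kruskal_lambda(table), 2))
```

The chi-squared statistic grows tenfold with the larger sample while Lambda is unchanged, which is why a measure of strength is worth reporting alongside the p value.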
When doing analysis, it is important to consider the overall picture the data is showing.  Rather than assuming the values produced by a statistical package are all-knowing, the researcher must dig deeper.  When a result is shown to be statistically significant, it is an indication that additional analysis, such as effect size, is required.




References


AcaStat Software. (2015). Chi Square Measures of Association.   Retrieved from http://www.acastat.com/statbook/chisqassoc.htm

Field, A. (2013). Discovering statistics using IBM SPSS statistics: Sage.

McHugh, M. L. (2013). The Chi-square test of independence. Biochemia Medica, 23(2), 143-149. doi:10.11613/BM.2013.018

Penn State Eberly College of Science. (n.d.). 11.2 - Chi-Square Test of Independence.   Retrieved from https://onlinecourses.science.psu.edu/stat200/node/73

Runkel, P. (2012). Large Samples: Too Much of a Good Thing?  Retrieved from http://blog.minitab.com/blog/statistics-and-quality-data-analysis/large-samples-too-much-of-a-good-thing


Parametric and Non-Parametric Tests

Selecting between parametric and non-parametric tests can be a confusing task.  There are many tests which are applicable to certain types of data and situations.  This post discusses parametric and non-parametric analysis and offers advice in the selection of the appropriate test.

Parametric Analysis

Parametric tests assume a normal distribution of data (Field, 2013).  The data must also be equally dispersed, also termed homogeneity of variance.  The data must be interval or ratio data. Lastly, the observations must be independent.  If data satisfies the requirements for parametric analysis, then a parametric test should be utilized.

Non-parametric Analysis

Non-parametric tests make fewer assumptions about data than do parametric tests (Field, 2013).  For example, non-parametric analysis is well suited to ordinal data which does not satisfy the requirements of parametric tests.  Another reason to utilize a non-parametric test is if the median better represents the central tendency of the data than does the mean (Frost, 2015).  This is an indication that the data is skewed and may contain outliers.  Non-parametric tests are also applicable when the sample size is smaller than that necessary for parametric approaches.
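A quick way to see when the median represents the data better than the mean is to compare the two on a skewed sample.  The values below are invented for illustration; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical incomes in thousands; the last value is an outlier.
incomes = [32, 35, 36, 38, 40, 41, 43, 45, 400]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean   = {mean_income:.1f}")
print(f"median = {median_income}")
```

The single outlier pulls the mean well above every typical value, while the median stays in the middle of the bulk of the data.  A gap like this is a hint that a non-parametric test may be the safer choice.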

Comparison of Tests

Table 1 contains the various types of analysis and the corresponding parametric and non-parametric tests.  It was constructed by combining materials from Dr. Miller’s lecture with other sources (Frost, 2015; Hoskin, 2014; Miller, 2017).  The table breaks down the types of test that are appropriate for a kind of analysis.  If the data is parametric in nature, then a test in the corresponding parametric column should be utilized.
Table 1: Parametric and Non-Parametric Tests
Type of Analysis | Parametric | Non-Parametric
Is a sample similar to a known population | T-Test or Z-Test | Sign Test
Comparing means of independent groups | Two-sample T-Test | Mann-Whitney U Test
Comparing two quantitative measurements from the same individual | Paired T-Test | Wilcoxon Signed-Rank
Comparing means between three or more independent groups | 1-way Analysis of Variance (ANOVA) | Kruskal-Wallis
Multiple comparisons of means | 2-way Analysis of Variance (ANOVA) | Friedman
Estimating the degree of association between two quantitative variables | Pearson Correlation Coefficient | Spearman's Rank Correlation Coefficient and Kendall's Tau Coefficient
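The rows of Table 1 can also be expressed as a small lookup, sketched here in Python.  The dictionary simply restates the table; the function name and the boolean flag for whether the parametric assumptions hold are hypothetical conveniences, and checking those assumptions is left to the analyst.

```python
# Each entry maps a type of analysis to (parametric test, non-parametric test).
TESTS = {
    "Is a sample similar to a known population": (
        "T-Test or Z-Test",
        "Sign Test",
    ),
    "Comparing means of independent groups": (
        "Two-sample T-Test",
        "Mann-Whitney U Test",
    ),
    "Comparing two quantitative measurements from the same individual": (
        "Paired T-Test",
        "Wilcoxon Signed-Rank",
    ),
    "Comparing means between three or more independent groups": (
        "1-way Analysis of Variance (ANOVA)",
        "Kruskal-Wallis",
    ),
    "Multiple comparisons of means": (
        "2-way Analysis of Variance (ANOVA)",
        "Friedman",
    ),
    "Estimating the degree of association between two quantitative variables": (
        "Pearson Correlation Coefficient",
        "Spearman's Rank Correlation Coefficient and Kendall's Tau Coefficient",
    ),
}


def choose_test(analysis, parametric_assumptions_met):
    """Return the test Table 1 suggests for the given type of analysis."""
    parametric, non_parametric = TESTS[analysis]
    return parametric if parametric_assumptions_met else non_parametric


print(choose_test("Comparing means of independent groups", False))
```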

 

Conclusion

The selection of a test is guided by understanding the type of analysis to be performed and the nature of the associated data.  Enumerating and discussing each scenario and test is beyond the scope of this assignment.  This table, along with Dr. Miller's, provides a starting point for selecting the appropriate test.



References
Field, A. (2013). Discovering statistics using IBM SPSS statistics: Sage.

Frost, J. (2015). Choosing Between a Nonparametric Test and a Parametric Test.  Retrieved from http://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test

Hoskin, T. (2014). Parametric and nonparametric: Demystifying the terms. Mayo Clinic CTSA BERD Resource.  Retrieved from http://www.mayo.edu/mayo-edu-docs/center-for-translational-science-activities-documents/berd-5-6.pdf

Miller, R. (Producer). (2017, 2/11/2017). Re-record of chat on non-parametrics. Retrieved from http://ctuadobeconnect.careeredonline.com/p29vwep0e20/?launcher=false&fcsContent=true&pbMode=normal


Maintaining the Context of Variables

Just as it is necessary for a quantitative report to clearly communicate the measures being analyzed (Huck, 2012), it is equally important that the researcher be cognizant of the connection between the numbers and what they represent.  Fixation on the numbers rather than their meaning can result in inaccurate quantitative studies (Nielsen, 2004).
There are several strategies from personal and professional experience that can help keep the focus on the meaning of the numbers.  It is always a good idea to perform a sanity check.  For example, if the data being studied is teacher salaries, a value of 10000000.00 is unlikely to be reasonable.  The same applies to the results of operations on those values.  For example, a mean salary of 5000 is an indication that the numbers might not represent what is expected.
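A sanity check of this kind is easy to automate.  The sketch below flags salary values that fall outside a plausible range; the bounds, the sample values, and the function name are assumptions chosen for illustration and would need to be set from knowledge of the actual data.

```python
def find_suspect_salaries(salaries, low=15_000, high=250_000):
    """Return (index, value) pairs for salaries outside the plausible range.

    The default bounds are illustrative assumptions, not established limits.
    """
    return [
        (index, value)
        for index, value in enumerate(salaries)
        if not low <= value <= high
    ]


# Hypothetical teacher salaries, including one far too large and one far too small.
salaries = [42_000, 51_000, 10_000_000.00, 48_500, 5_000]

suspects = find_suspect_salaries(salaries)
print(suspects)
```

Flagged values are not necessarily wrong, but each one deserves a look before it is allowed to influence a mean or any other statistic.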
Another personal strategy is to consider the units of measure.  Units of measure are relevant when operations such as multiplication are performed, along with more complex statistical operations.  During work with an Internet of Things sensor, the units of measure were important when dealing with light, pressure, and temperature.  Units of measure give guidance in how that measure was initially created.   For example, barometric pressure can be measured in pounds per square inch (PSI).  Converting from PSI to some other measure should remove the pounds and the inches from the units.
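The PSI example can be worked through in code.  The sketch below builds the conversion factor from its parts, so the pounds (force) and inches visibly drop out in favor of newtons and meters; the constant and function names are made up for this example, and kilopascals is simply one possible target unit.

```python
# Standard conversion factors.
NEWTONS_PER_POUND_FORCE = 4.4482216152605
METERS_PER_INCH = 0.0254

# 1 psi = 1 lbf / in^2, so converting to pascals (N / m^2) replaces the
# pound-force with newtons and the square inch with square meters.
PASCALS_PER_PSI = NEWTONS_PER_POUND_FORCE / METERS_PER_INCH ** 2


def psi_to_kilopascals(psi):
    """Convert a pressure reading from PSI to kilopascals."""
    return psi * PASCALS_PER_PSI / 1000.0


# Sea-level atmospheric pressure is roughly 14.696 PSI.
print(f"{psi_to_kilopascals(14.696):.2f} kPa")
```

If a converted reading for ordinary air pressure lands far from about 101 kPa, the units of the original measurement are worth a second look.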
Lastly, look at all numbers with a degree of skepticism.  There are many instances where numbers are simply incorrect.  We should always question the correctness, accuracy, and validity of the data.  Sensors report incorrect values, people hit the wrong key, disks become corrupted, data files have errors, data transfers can contain noise, and people sometimes make mistakes.  Professional experience has shown that it is always a good idea to question the validity of the values presented.  Because someone ran a statistical operation on a set of data does not mean that data was without flaw.  We should always question the accuracy of what we analyze.




References
Huck, S. W. (2012). Reading Statistics and Research (6th ed.): Pearson.

Nielsen, J. (2004). Risks of Quantitative Studies.   Retrieved from https://www.nngroup.com/articles/risks-of-quantitative-studies/