Posts

Showing posts from 2018

Association Rule Mining using Arules in R

Let's consider a dataset. The data has to be in transaction format; if it isn't, arules doesn't work. In the example below we take a CSV file and convert it to transactions.

Let's create the dataset:

CustomerId,Products
100,Savings Pre
100,Home20
101,Home20
102,Checking Zero
102,Home20
102,Gold10
103,Home20
103,Gold20
104,Checking Zero
104,Savings Pre
104,Home20

ArulesUsage.R
-------------
ds <- read.csv("ProductsSmall.csv")
colnames(ds)

library(arules)
# Split products by customer and coerce to the transactions class
trans <- as(split(ds[, "Products"], ds[, "CustomerId"]), "transactions")
summary(trans)  # lists the most frequent items too

rules <- apriori(trans, parameter = list(support = 0.14,
                                         confidence = 0.05,
                                         minlen = 2))
inspect(rules)

rules <- sort(rules, by = "lift")
rules_output <- rules[!is.redundant(rules)]  # drop redundant rules
inspect(rules_output)

library(pmml)
saveXML(pmml(rules_output), "Apriori_ProductsSmall.p
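Conceptually, the metrics apriori reports are simple ratios over the baskets. A minimal Python sketch (not using arules; the baskets and the example rule come from the CSV above) that computes support, confidence, and lift by hand:

```python
# The five customer baskets from ProductsSmall.csv above
transactions = [
    {"Savings Pre", "Home20"},                    # customer 100
    {"Home20"},                                   # customer 101
    {"Checking Zero", "Home20", "Gold10"},        # customer 102
    {"Home20", "Gold20"},                         # customer 103
    {"Checking Zero", "Savings Pre", "Home20"},   # customer 104
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent):
    """confidence = supp(A u C) / supp(A); lift = confidence / supp(C)."""
    conf = support(antecedent | consequent) / support(antecedent)
    return conf, conf / support(consequent)

conf, lift = rule_metrics({"Checking Zero"}, {"Savings Pre"})
print(conf, lift)  # 0.5 1.25
```

A lift above 1 (as here) means the consequent is more likely when the antecedent is present than it is overall, which is why the R code sorts rules by lift.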

Spacy NER Example

1. In your virtual env, install spaCy:
   pip install --upgrade spacy
2. Install Jupyter too, as the visualization is handy:
   python -m pip install jupyter
3. Download the default English model for spaCy:
   python -m spacy download en
4. Start Jupyter:
   jupyter notebook --no-browser --NotebookApp.token='' --ip='*'
5. Update the existing model to include a new NER example. "Uber" is not detected by the default model, so let's add it.

Notebook
--------
import spacy
import random
from spacy import displacy

nlp = spacy.load('en')
# Character offsets: "Uber" spans 0-4, "$1 million" spans 18-28
train_data = [("Uber blew through $1 million",
               {'entities': [(0, 4, 'ORG'), (18, 28, 'MONEY')]})]
for text, _ in train_data:
    doc = nlp(text)
    displacy.render(doc, style='ent', jupyter=True)
# If you use displacy.serve it will try to serve at port 5000
# We see Uber is not picked up... let's use the train_data to
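Getting the character offsets in train_data right is fiddly and error-prone. A small helper (my own, not part of spaCy) can derive the spaCy-style (start, end) offsets directly from the text:

```python
def entity_span(text, phrase):
    """Return (start, end) character offsets of phrase in text,
    in the half-open form spaCy's training entities expect."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    return start, start + len(phrase)

text = "Uber blew through $1 million"
print(entity_span(text, "Uber"))        # (0, 4)
print(entity_span(text, "$1 million"))  # (18, 28)
```

This is useful because spaCy silently skips (or, in newer versions, rejects) entity spans that don't align with token boundaries, so hand-counted offsets that are off by one waste a training example.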

Python Env Issues and Workarounds on Ubuntu 16 LTS

References:
https://www.digitalocean.com/community/tutorials/how-to-install-and-use-tensorflow-on-ubuntu-16-04
https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04

1. Installing Python
   Ubuntu 16 comes with both Python 2.7 and 3.5. To install pip3:
   sudo apt-get install python3-pip
   pip3 -V
   pip3 install --upgrade pip

2. Virtual Env
   a) sudo -H pip3 install virtualenv
   b) virtualenv -p /usr/bin/python3 venv35
   c) source ~/venv35/bin/activate
   d) python -m pip install --upgrade pip

   Ubuntu 18: https://linoxide.com/linux-how-to/setup-python-virtual-environment-ubuntu/

   CentOS: python -m pip install --upgrade pip

3. Anaconda Issues
   Anaconda becomes painful once you install packages that aren't bundled with it, as dependency resolution tends to fail.
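After activating an environment as above, you can confirm from inside Python which environment is live: `sys.prefix` diverges from the system prefix inside a virtualenv. A quick sanity check (generic, not tied to any particular setup above):

```python
import sys

def in_virtualenv():
    """True when running inside a virtualenv/venv: sys.prefix is
    redirected to the env, while the base prefix keeps the system path.
    Old-style virtualenv exposes real_prefix instead of base_prefix."""
    base = getattr(sys, "base_prefix", getattr(sys, "real_prefix", sys.prefix))
    return sys.prefix != base

print(sys.executable)    # the interpreter actually running
print(in_virtualenv())   # True inside an activated env, else False
```

This is handy when pip mysteriously installs into the wrong place: it tells you immediately whether the shell's `python` is the one you think it is.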

Remote Jupyter Notebook

Hi, if you are working on Jupyter and need to access the notebook remotely during local development, you may want simple access without certs and passwords. Let's see how to do it. A good reference is https://jupyter-notebook.readthedocs.io/en/stable/public_server.html.

If you don't already have one, create a config file for the notebook using the following command line:

$ jupyter notebook --generate-config

In the ~/.jupyter directory, edit the notebook config file, jupyter_notebook_config.py. By default, the notebook config file has all fields commented out. The minimum set of configuration options that you should uncomment and edit in jupyter_notebook_config.py is the following:

# Set options for certfile, ip, password, and toggle off
# browser auto-opening
c.NotebookApp.certfile = u'/absolute/path/to/your/certificate/mycert.pem'
c.NotebookApp.keyfile = u'/absolute/path/to/your/certificate/mykey.key'
# Set ip to '*'
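For the cert-free, password-free local setup the intro describes, a minimal config sketch (assuming the classic Jupyter Notebook's NotebookApp option names; only safe on a trusted private network):

```python
# jupyter_notebook_config.py -- no-auth sketch, trusted local network ONLY
c.NotebookApp.ip = '0.0.0.0'       # listen on all interfaces
c.NotebookApp.port = 8888
c.NotebookApp.open_browser = False
c.NotebookApp.token = ''           # disable token auth
c.NotebookApp.password = ''        # disable password auth
```

With this in place, `jupyter notebook` on the host is reachable from another machine at http://HOST:8888 with no login prompt, so never expose such a server beyond the local network.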

Logistic Regression using German Credit Data

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score, classification_report

The German Credit Data contains data on 20 variables and the classification of whether an applicant is considered a Good or a Bad credit risk, for 1000 loan applicants.

In [9]: credit_dat = pd.read_csv("C:\Work\Datasets\germancreditdata.csv")

In [10]: print(credit_dat.head())

   Creditability  Account Balance  Duration of Credit (month)  \
0              1                1                          18
1              1                1                           9
2              1                2                          12
3              1                1
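The excerpt cuts off before the model is fit. As a stand-in for the truncated part, here is a self-contained sketch of logistic regression trained by gradient descent on synthetic binary data (pure Python, not the German Credit dataset, which isn't bundled here):

```python
import math
import random

random.seed(0)

# Synthetic stand-in: one feature x in [0, 1], label 1 when x > 0.5
data = [(x, 1 if x > 0.5 else 0) for x in (random.random() for _ in range(200))]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit weight w and bias b by batch gradient descent on the logistic loss
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    gw = gb = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y   # d(loss)/d(z) for the logistic loss
        gw += err * x
        gb += err
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

# Training accuracy with the usual 0.5 decision threshold
accuracy = sum((sigmoid(w * x + b) > 0.5) == y for x, y in data) / len(data)
print(accuracy)
```

With scikit-learn, the same fit-and-score step on the credit DataFrame would use `LogisticRegression().fit(X_train, y_train)` followed by `accuracy_score` on the held-out split produced by `train_test_split`.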