Posts

Showing posts from January, 2019

Emailing using python

import smtplib
import sys
import email

server = smtplib.SMTP('smtpip:port')
server.ehlo()

# msg = "Hello!"
# The blank line (the "" entry below) separates the headers from the message body
msg = "\r\n".join([
    "From: from@email.com",
    "To: to@email.com",
    "Subject: Test Message",
    "",
    "Why, oh why"
])
server.sendmail("from@email.com", "to@email.com", msg)
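The hand-joined header string above can also be built with the standard library's email.message.EmailMessage, which takes care of the header/body separator. This is only a minimal sketch: build_message is a hypothetical helper name, the addresses are the placeholders from the post, and the sending step is left commented out because 'smtpip:port' is a placeholder.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text email (build_message is a hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


msg = build_message("from@email.com", "to@email.com", "Test Message", "Why, oh why")
print(msg)

# To send it (host and port are placeholders, as in the post):
# import smtplib
# with smtplib.SMTP("smtpip:port") as server:
#     server.send_message(msg)
```

Printing the message shows the headers, a blank line, and then the body, which is the same layout the "\r\n".join version produces by hand.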

Zeppelin and Anaconda

Set Anaconda As Default Python Interpreter In Zeppelin

1. Click anonymous in the top right corner.
2. Click Interpreter.
3. Scroll down to the python interpreter.
4. Click Edit.
5. Locate zeppelin.python.
6. Set the value to /home/hadoop/anaconda/bin/python
7. Now find the spark interpreter.
8. Locate zeppelin.pyspark.python.
9. Set the value to /home/hadoop/anaconda/bin/python

https://dziganto.github.io/zeppelin/spark/zeppelinhub/emr/anaconda/tensorflow/shiro/s3/theano/bootstrap%20script/EMR-From-Scratch/

This works (tested).

conda install -c calex sklearn-pandas

P.S.: Zeppelin 0.7.3 doesn't support Spark 2.3. Spark 2.3 is supported by Zeppelin 0.8, which will be released soon. Zeppelin 0.7.3 doesn't support Python 3.6 either. Now testing with Spark 2.1 and Python 3.5; so far so good.

Check: the Zeppelin home conf directory should have zeppelin-env.sh. Here you can specify the Spark home and the Zeppelin port.

Starting Apache Zeppelin from the Command Line
On all unix like platforms
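One way to check that the zeppelin.python and zeppelin.pyspark.python settings took effect is to ask the running interpreter where it lives. The sketch below is meant to be pasted into a %python paragraph (and again into a %pyspark one). interpreter_info is a hypothetical helper name, and with the settings above the printed path should be /home/hadoop/anaconda/bin/python.

```python
import sys


def interpreter_info():
    """Return the path and major.minor version of the Python running this code."""
    return sys.executable, "%d.%d" % sys.version_info[:2]


path, version = interpreter_info()
print("Python executable:", path)
print("Python version:", version)
```

The version line is also a quick way to confirm the Python 3.5 / 3.6 point from the P.S. before installing packages.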

nlp basics beautiful soup

import re
from urllib import request

import nltk
from bs4 import BeautifulSoup

url = "http://www.gutenberg.org/files/2554/2554-0.txt"
content1 = request.urlopen(url).read()

urlA = "http://www.bbc.com/news/health-42802191"
html_content = request.urlopen(urlA).read()

soup = BeautifulSoup(html_content, 'html.parser')
inner_body = soup.find_all('div', attrs={'class': 'story-body__inner'})
inner_text = [elm.text for elm in inner_body[0].find_all(['h1', 'h2', 'p', 'li'])]
text_content2 = '\n'.join(inner_text)

text_content1 = content1.decode('unicode_escape')  # Converts bytes to unicode

tokens1 = nltk.word_tokenize(text_content1)
tokens1[3:8]

tokens2 = nltk.word_tokenize(text_content2)
tokens2[:5]
len(tokens2)

tokens2_2 = re.findall(r'\w+', text_content2)
len(tokens2_2)

pattern = r'\w+'
tokens2_3 = nltk.regexp_tokenize(text_content2, pattern)
len(tokens2_3)

input_text2 = nltk
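The snippet above compares token counts from nltk.word_tokenize with counts from the \w+ pattern. A small standard-library sketch of why those counts differ: \w+ drops punctuation and splits contractions at the apostrophe, while a tokenizer that keeps punctuation returns more tokens. The sample sentence and the second pattern are made up for illustration and are not from the post.

```python
import re

# Made-up sample sentence, used only for illustration.
sample = "Dr. Smith's results weren't final, were they?"

# Same pattern as the post: runs of letters, digits and underscores only.
word_tokens = re.findall(r"\w+", sample)

# Assumed variant: also keep each punctuation mark as its own token.
word_and_punct_tokens = re.findall(r"\w+|[^\w\s]", sample)

print(word_tokens)
print(word_and_punct_tokens)
print(len(word_tokens), len(word_and_punct_tokens))
```

The same gap shows up between len(tokens2) and len(tokens2_2) on a real article, since nltk.word_tokenize also keeps punctuation marks as tokens.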

Proxy Issues

----Conda: Activating a Virtual Env----
<CondaPATH>\Scripts>activate py35

Cross-check that the .condarc file is not there, or back it up. Then, in the env, set the proxy:
(py35) <CondaPATH>>set http_proxy=http://userid:password@proxy.foo.com:8080

Note: if the password contains #, replace it with %23. @ is replaced with %40.

---To install a package using conda in Anaconda---
(py35) <CondaPATH>>conda install -c conda-forge <pkgname>
(py35) <CondaPATH>>conda install -c conda-forge tweepy

---To install a package using pip in Anaconda---
(py35) <CondaPATH>>python -m pip install <pkgName>
(py35) <CondaPATH>>python -m pip install vaderSentiment

---Common Proxy Errors---
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.anaconda.com/pkgs/free/noarch/repodata.json.bz2>
Elapsed: -
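The %23 and %40 replacements mentioned above are ordinary percent-encoding, so the proxy URL can be generated instead of edited by hand. A minimal sketch using urllib.parse.quote: proxy_url is a hypothetical helper, and the password "p#ss@word" is a made-up example.

```python
from urllib.parse import quote


def proxy_url(user, password, host, port):
    """Build an http proxy URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including / # and @.
    return "http://%s:%s@%s:%d" % (
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
    )


print(proxy_url("userid", "p#ss@word", "proxy.foo.com", 8080))
```

The printed value is what goes after set http_proxy= in the activated environment.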