TechMusings (BigData,Hadoop,Pig,Hive,DataScience,IoT,EAI,SOA,J2EE)

Showing posts from January, 2019

Emailing using Python

January 29, 2019

import smtplib

# Connect to the SMTP relay; replace smtpip:port with your server's host and port
server = smtplib.SMTP('smtpip:port')
server.ehlo()

# Headers come first; the empty string adds a blank line ("\r\n\r\n")
# that separates the headers from the message body
msg = "\r\n".join([
    "From: from@email.com",
    "To: to@email.com",
    "Subject: Test Message",
    "",
    "Why, oh why"
])
server.sendmail("from@email.com", "to@email.com", msg)
server.quit()  # close the SMTP session
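A variant of the same test mail, sketched with the standard library's email.message.EmailMessage instead of joining header strings by hand; it writes the headers and the blank separator line for you. The addresses are the placeholders from the post, and the host/port passed to send() are assumptions to replace with your relay. send() is defined but not called, so the snippet runs without an SMTP server.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text message; EmailMessage formats the headers and the blank line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, host, port=25):
    """Send via a (hypothetical) relay; the with-block calls quit() on exit."""
    with smtplib.SMTP(host, port) as server:
        server.ehlo()
        server.send_message(msg)


msg = build_message("from@email.com", "to@email.com", "Test Message", "Why, oh why")
print(msg.as_string())
```

To actually send it, call send(msg, "your.smtp.host", 25) with your own relay details.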

Zeppelin and Anaconda

January 29, 2019

Set Anaconda as the default Python interpreter in Zeppelin:
1. Click anonymous in the top-right corner.
2. Click Interpreter.
3. Scroll down to the python interpreter and click Edit.
4. Locate zeppelin.python and set its value to /home/hadoop/anaconda/bin/python.
5. Find the spark interpreter, locate zeppelin.pyspark.python, and set its value to /home/hadoop/anaconda/bin/python.

Source: https://dziganto.github.io/zeppelin/spark/zeppelinhub/emr/anaconda/tensorflow/shiro/s3/theano/bootstrap%20script/EMR-From-Scratch/

This works (tested).

conda install -c calex sklearn-pandas

P.S.: Zeppelin 0.7.3 doesn't support Spark 2.3; Spark 2.3 is supported by Zeppelin 0.8, which will be released soon. Zeppelin 0.7.3 also doesn't support Python 3.6, so I am now testing with Spark 2.1 and Python 3.5. So far so good.

Check: <zeppelin home>/conf should contain zeppelin-env.sh. This is where you specify the Spark home and the Zeppelin port.

Starting Apach...
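A quick way to confirm the interpreter change took effect is to run a small check in a Zeppelin %python (or %pyspark) paragraph. This is a sketch: it assumes the Anaconda path from the steps above, and the helper name check_interpreter is hypothetical. It also flags Python 3.6, which the post notes Zeppelin 0.7.3 does not support.

```python
import sys

# Path set for zeppelin.python / zeppelin.pyspark.python in the steps above
EXPECTED = "/home/hadoop/anaconda/bin/python"


def check_interpreter(expected=EXPECTED):
    """Return (uses_expected_path, "major.minor") for the running interpreter."""
    version = "{}.{}".format(*sys.version_info[:2])
    return sys.executable == expected, version


ok, version = check_interpreter()
print("interpreter:", sys.executable)
print("matches Anaconda path:", ok)
print("python version:", version)
if version == "3.6":
    print("warning: Zeppelin 0.7.3 does not support Python 3.6")
```

If the path does not match, re-check the interpreter settings and restart the interpreter from the Interpreter page.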

NLP basics: Beautiful Soup

January 05, 2019

import re
from urllib import request

import nltk
from bs4 import BeautifulSoup

# word_tokenize needs NLTK's Punkt tokenizer models (nltk.download('punkt'))

# Plain-text book from Project Gutenberg (the file is UTF-8 encoded)
url = "http://www.gutenberg.org/files/2554/2554-0.txt"
content1 = request.urlopen(url).read()
text_content1 = content1.decode('utf-8')  # converts bytes to a unicode str

# News article: keep only the headings, paragraphs and list items in the story body
urlA = "http://www.bbc.com/news/health-42802191"
html_content = request.urlopen(urlA).read()
soup = BeautifulSoup(html_content, 'html.parser')
inner_body = soup.find_all('div', attrs={'class': 'story-body__inner'})
inner_text = [elm.text for elm in inner_body[0].find_all(['h1', 'h2', 'p', 'li'])]
text_content2 = '\n'.join(inner_text)

# Tokenize with NLTK's word tokenizer
tokens1 = nltk.word_tokenize(text_content1)
print(tokens1[3:8])
tokens2 = nltk.word_tokenize(text_content2)
print(tokens2[:5])
print(len(tokens2))

# Regex tokenization, with re and with nltk.regexp_tokenize
tokens2_2 = re.findall(r'\w+', text_content2)
print(len(tokens2_2))
pattern = r'\w+'
tokens2_3 = nltk.regexp_tokenize(text_content2, pattern)
print(len(tokens2_3))
input_text2 = nltk...
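Why do the token counts differ? A small offline sketch, using only the standard library and a made-up sample sentence: the pattern r'\w+' keeps only runs of word characters, so punctuation disappears and contractions split at the apostrophe, while word_tokenize keeps punctuation marks as tokens. For a plain pattern like this (no capturing groups), nltk.regexp_tokenize should return the same list as re.findall, which is why tokens2_2 and tokens2_3 above have the same length. The second pattern below is an illustrative assumption, not NLTK's rule.

```python
import re


def regex_tokens(text, pattern=r"\w+"):
    """Regex tokenization: every non-overlapping match of pattern is a token."""
    return re.findall(pattern, text)


sample = "Don't panic: the test costs $5.50."

# Word characters only: punctuation is dropped
words_only = regex_tokens(sample)
# Also keep runs of punctuation as their own tokens
with_punct = regex_tokens(sample, r"\w+|[^\w\s]+")

print(words_only)  # ['Don', 't', 'panic', 'the', 'test', 'costs', '5', '50']
print(with_punct)
```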

Proxy Issues

January 02, 2019

----Conda: Activating a Virtual Env---------
<CondaPATH>\Scripts>activate py35

Check that there is no .condarc file, or back it up.

Then, inside the env, set the proxy:
(py35) <CondaPATH>>set http_proxy=http://userid:password@proxy.foo.com:8080

Note: if the password contains #, replace it with %23; replace @ with %40.

---To install a package using conda in Anaconda---------
(py35) <CondaPATH>>conda install -c conda-forge <pkgname>
(py35) <CondaPATH>>conda install -c conda-forge tweepy

---To install a package using pip in Anaconda---------
(py35) <CondaPATH>>python -m pip install <pkgName>
(py35) <CondaPATH>>python -m pip install vaderSentiment

---------------Common Proxy Errors---------------------------
CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.anaconda.com/pkgs/free/noarch/repodata.json.bz2>
Elapsed: - ...
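Rather than escaping the password by hand, the proxy URL can be built with urllib.parse.quote, which percent-encodes # as %23 and @ as %40 (the substitutions from the note above). A sketch with hypothetical credentials: proxy_url is an illustrative helper, and the host and port are the placeholders from the post. Since the failing URL in the error is https://, conda usually needs https_proxy set as well, because it picks the proxy by URL scheme.

```python
from urllib.parse import quote


def proxy_url(user, password, host, port):
    """Build http://user:password@host:port with credentials percent-encoded."""
    # safe="" so that '@', ':', '/' and '#' inside the credentials are all encoded
    return "http://{}:{}@{}:{}".format(quote(user, safe=""), quote(password, safe=""), host, port)


# Hypothetical password containing '@' and '#'
url = proxy_url("userid", "p@ss#word", "proxy.foo.com", 8080)
print(url)  # http://userid:p%40ss%23word@proxy.foo.com:8080

# Lines to paste into the activated (py35) cmd prompt
print("set http_proxy=" + url)
print("set https_proxy=" + url)
```

If you put these set lines in a .bat file instead of typing them at the prompt, double each % (for example %%40), because batch files treat a single % as the start of a variable reference.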