Python For Bioinformatics and Your First Python for Bioinformatics Program
For more in-depth Python for Bioinformatics training visit: www.howtobioinformatics.com/py...
Hi, and welcome to Python for Bioinformatics. My name is Blake Allen, and I'm going to show you how to make your first Python for Bioinformatics program in under 20 minutes.
We're going to go over calculating GC content and making your first Python program. If you're a little more advanced and already know how to use Python but would like to learn more, click the link below, where I'll show you advanced techniques for learning Python for bioinformatics.
The first thing we need is some data. If you don't have any data, you can't do any bioinformatics, but the great thing is that there's a ton of free data online, ready to go.
So go ahead and open your web browser and let's get started. I use Chrome.
Go ahead and type in the letters NCBI to get to the NCBI website. In its search bar, type BRCA1.
Click the tab that says Nucleotide. Near the top there are a few results; click the FASTA link for the Homo sapiens BRCA1 entry.
Click Send to in the top right-hand corner, choose File, and download it in FASTA format. Then copy the downloaded sequence.fasta into a new folder we'll be working in, rename it to BRCA1_BAP1.TXT, and open it to take a look.
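The transcript stops before the code itself. Based on the pieces quoted in the comments below (counters starting at 0, gene.readline() to skip the header, line.lower(), and print "number of g's " + str(g)), a minimal Python 3 sketch of such a GC content program might look like this. It is a reconstruction, not the video's exact code; the function names count_bases and gc_content are hypothetical, and it assumes the file was renamed to BRCA1_BAP1.TXT as described above.

```python
def count_bases(lines):
    """Count g, a, c and t in FASTA lines, skipping the first (header) line."""
    g = a = c = t = 0              # every counter must start at zero
    lines = iter(lines)
    next(lines, None)              # skip the ">..." header, like gene.readline()
    for line in lines:
        for char in line.lower():  # lowercase so 'G' and 'g' are both counted
            if char == "g":
                g += 1
            elif char == "a":
                a += 1
            elif char == "c":
                c += 1
            elif char == "t":
                t += 1
    return g, a, c, t


def gc_content(path):
    """Print the base counts for a FASTA file and return its GC fraction."""
    with open(path) as gene:
        g, a, c, t = count_bases(gene)
    print("number of g's " + str(g))
    print("number of a's " + str(a))
    print("number of c's " + str(c))
    print("number of t's " + str(t))
    gc = (g + c) / (a + t + g + c)  # in Python 3, "/" always returns a float
    print("gc content: " + str(gc))
    return gc
```

Run from the folder that holds the file, `gc_content("BRCA1_BAP1.TXT")` prints the four counts and the GC fraction. The sketch assumes the file contains at least one base; an empty sequence would divide by zero.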
Comments: 87
The best video I have seen on bioinformatics.
This tutorial is brilliant, please create more!
I very much enjoyed this video. I like the fact that, by the end, I'm working with real data and doing something useful. Thanks!
Really helpful! I love Python!
This was really informative and interesting!
Very informative. Thank you for providing this example.
FANTASTIC! Thank you!
Very straightforward tutorial, thanks.
Thanks a lot, it was really helpful. You haven't put any other videos on this subject since 2013, though.
It was helpful. Thank you, keep adding more!
Very cool! I need to learn Python ASAP!
Very informative!
THIS IS AMAZING.
Thank you, it works very well.
Noel Tanner, thanks for the reference sequence. I was having a hard time finding the correct nucleotide entry.
Very Helpful!
Thank you so much for this walkthrough!
Thank You!
Hello, thanks for this video! I was wondering if we could use the difflib module to do comparative genomics on two different files and create a report of the differences?
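As a rough answer to the difflib question: difflib is Python's general-purpose text-diff module, not a biological aligner, so it compares characters without substitution scores or gap penalties (dedicated tools such as Biopython's PairwiseAligner are built for that). For a quick report it can still work. A minimal sketch, assuming each file's sequence has already been read into a string; describe_differences is a hypothetical helper and the two sequences are made up:

```python
import difflib


def describe_differences(seq_a, seq_b):
    """Return a similarity ratio and the non-matching stretches between two strings."""
    # autojunk=False stops difflib from treating very common characters as "junk",
    # which it otherwise does when the second sequence has 200 or more characters.
    matcher = difflib.SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    changes = [
        (tag, i1, seq_a[i1:i2], seq_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return matcher.ratio(), changes


# Two short, made-up sequences used only for illustration
ratio, changes = describe_differences("ACGTTGCA", "ACGATGCA")
print(f"similarity: {ratio:.3f}")
for tag, position, old, new in changes:
    print(f"{tag} at position {position}: {old!r} -> {new!r}")
```

SequenceMatcher's matching is quadratic in the worst case, so on long genomic sequences a real alignment tool will be both faster and more meaningful.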
Very Nice!
Thanks, it is very good information.
Nice cool intro to bioinfo
This is outstanding! I'm hoping you can show more examples in Jupyter Notebook.
Why not write it in the Python IDLE?
good work
Thanks a lot for a nice tutorial! But have you tried TextWrangler instead of TextEdit?
It would be more Pythonic to get rid of the nested loop and just use the built-in string method count(): for line in gene: g += line.count('g'); a += line.count('a'); c += line.count('c'); t += line.count('t')
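The count() suggestion above, written out as a self-contained function. This is a sketch; the name count_bases_with_count is hypothetical, and like the video it skips one header line and lowercases each line:

```python
def count_bases_with_count(lines):
    """Count g, a, c and t with str.count() instead of a nested character loop."""
    g = a = c = t = 0
    lines = iter(lines)
    next(lines, None)          # skip the FASTA header line, as gene.readline() does
    for line in lines:
        line = line.lower()
        g += line.count("g")   # str.count does the per-character scan internally
        a += line.count("a")
        c += line.count("c")
        t += line.count("t")
    return g, a, c, t
```

Usage: `with open("BRCA1_BAP1.TXT") as gene: print(count_bases_with_count(gene))`. Each line is scanned four times, once per base, but each scan is fast.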
very nice
Which Python book would be best for reference? This is nice!
awesome
Oh man, wow, thanks!
Any more advanced Python scripts to use for the analysis of sequencing data?
Very clear and informative - thanks! Do you mind if I post/share?
Please provide the exact link for the dataset download in the description.
Hi, so I wrote the same program in PyCharm. I tried running it in the Bash shell and I get the message "not a directory". I switched directories to make sure I was in the right folder. Does anyone have suggestions?
Hi! Thanks for the video!! However, can you please explain why you set the g, a, t and c at 0 in the beginning? Thanks!
@stevanbr1
10 years ago
Because you have to initialize variables to zero before you add a number to them (g += 1 means g = g + 1). The first time the loop enters the 'if' with a 'g', g is 0, so g = 0 + 1 = 1. If you don't initialize it, Python has no value to add 1 to, and instead of a valid result you get a NameError. Hope that helps :)
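In Python, unlike C, a variable that was never assigned does not hold a leftover "garbage" value: using it raises an error. Inside a function the error is an UnboundLocalError (a subclass of NameError); at the top level of a script, as in the video, it is a plain NameError. The two hypothetical functions below show the difference:

```python
def count_g(sequence):
    g = 0                  # initialize first...
    for char in sequence:
        if char == "g":
            g += 1         # ...so the first match computes g = 0 + 1
    return g


def count_g_without_init(sequence):
    for char in sequence:
        if char == "g":
            g += 1         # g has no value yet, so Python raises an error here
    return g


print(count_g("ggat"))
try:
    count_g_without_init("ggat")
except UnboundLocalError as err:   # UnboundLocalError is a subclass of NameError
    print("error:", err)
```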
👏👏👏
So I have to create a folder first then create another folder to put the file inside of it?
What version of python did you use?
This is for Python 2.7.x, right? It doesn't work with my 3.3.x.
Awesome, my first Python program to calculate GC content... I have a question: what is GC content for? What does it tell me exactly? I didn't understand that very well. BTW, I used this sequence: Rattus norvegicus BRCA1 mRNA, complete cds. GC content: 0.460014
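GC content is the fraction of bases in a sequence that are guanine (G) or cytosine (C). With G, C, A and T standing for the four base counts the program collects, the final step computes:

```latex
\text{GC content} = \frac{G + C}{A + T + G + C}
```

So a result of 0.460014 means that about 46% of the counted bases are G or C.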
I had a little trouble finding the correct nucleotide entry. To save time, here is the reference number for the example in the video: NCBI Reference Sequence: NG_031859.1
@Stepwise9000
4 years ago
Now this doesn't work! :(
Good video, just wish it was more streamlined
Why not use count() or regular expressions?
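For the regular-expression route, a minimal sketch using the standard re module. The function name gc_fraction_regex is hypothetical; it counts only A, C, G and T (either case) in the denominator, so characters such as N or newlines are ignored:

```python
import re


def gc_fraction_regex(sequence):
    """Fraction of A/C/G/T characters that are G or C, found with regular expressions."""
    gc = len(re.findall(r"[GC]", sequence, flags=re.IGNORECASE))
    acgt = len(re.findall(r"[ACGT]", sequence, flags=re.IGNORECASE))
    return gc / acgt if acgt else 0.0   # avoid dividing by zero on empty input


print(gc_fraction_regex("GGCCAT\nggna"))
```

For a plain count, str.count() is simpler; a regular expression earns its keep when the pattern gets more complex than a single letter.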
@temaz3334
7 years ago
poor Python skills
@rafsanjanimuhammod309
7 years ago
No, poor programming skills.
Invalid syntax on the second quote of print "number of g's " + str(g)
@MrChacha1994
4 years ago
I don't know if it's because he's using a Mac, but if you are using Windows like I am, make sure to use parentheses with the print function, exactly like this: print("number of g's " + str(g))
@Paul-su7sb
3 years ago
Same here, thank you so much for the advice I am going to try it
@kareenamulchandani3356
1 year ago
I think the syntax changed in Python 3.
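The root cause is the Python version rather than the operating system: in Python 3, print became a function, so the video's Python 2 print statement is a syntax error. A small sketch of the Python 3 forms; format_count is a hypothetical helper and 42 is a made-up count:

```python
def format_count(base, count):
    """Build the same message the video prints, e.g. "number of g's 42"."""
    return "number of " + base + "'s " + str(count)


g = 42  # hypothetical count, for illustration only

# Python 2 only (a SyntaxError in Python 3):
#     print "number of g's " + str(g)

# Python 3: print is a function, so its argument goes inside parentheses
print(format_count("g", g))
print("number of g's " + str(g))   # the video's line, with parentheses added
print(f"number of g's {g}")        # f-string alternative (Python 3.6+)
```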
Just a small question: is this what bioinformaticians mostly do? Sequence genes, then use a programming language for analysis?
@MrChristian331
4 years ago
In a nutshell... yes. But in addition to analysis, they can use programming for drug discovery and therapeutics. They can also use predictive analytics to see whether something will switch a gene on or off before administering or experimenting with it, saving time and money.
My problem so far is saving the file as a plain .txt file. My MacBook won't give me the option when I open the drop-down list.
@vivanranjan261
3 years ago
Yes, mine too.
9:00 variable*
I love Python. Great tutorial!
Thanks so much, I'll definitely be coming back.
Does anyone know the answer? If you take the FASTA file without its header, can you get rid of gene.readline()? And if the counters compare against the uppercase strings A, C, T, G, can you get rid of line.lower()? Thanks for any suggestions.
@nenadsvrzikapa6893
8 years ago
+willie ekaputra Yeah, that just skips the header line, so if the line is not there you don't need to skip it; but if you remove it, it's no longer a FASTA file. Either way, this is not how an advanced bioinformatician would solve this task. I think Blake is showing that you can make the string lowercase. The sequence is usually uppercase, so if you compare against uppercase letters you don't need that conversion line.
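One way to handle both questions is to skip every line that starts with ">" instead of calling readline() once, which works whether a file has no header, one header, or several records. A minimal sketch; count_bases_any_header is a hypothetical name, and it keeps a single .upper() call because some FASTA files use lowercase letters for masked regions:

```python
def count_bases_any_header(lines):
    """Count A, C, G and T, skipping every FASTA header line that starts with '>'."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for line in lines:
        if line.startswith(">"):   # header: skip it instead of calling readline() once
            continue
        # If you know the file is entirely uppercase, .upper() can be dropped.
        line = line.strip().upper()
        for base in counts:
            counts[base] += line.count(base)
    return counts
```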
@queenofunderland
7 years ago
I have another question: can you then turn this code into a function with def ... ():, so that you can open any FASTA file saved on your PC and count its GC content?
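Yes, wrapping the logic in a function that takes a file path is the usual approach, and reading the paths from the command line with sys.argv lets one script handle any FASTA file. A sketch under those assumptions; the function names and the script name gc_content.py are hypothetical:

```python
import sys


def gc_fraction(lines):
    """GC fraction of the sequence lines in a FASTA file (header lines skipped)."""
    gc = total = 0
    for line in lines:
        if line.startswith(">"):
            continue
        seq = line.strip().upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGT")
    return gc / total if total else 0.0


def gc_content_of_file(path):
    """Open any FASTA file on disk and return its GC fraction."""
    with open(path) as fasta:
        return gc_fraction(fasta)


if __name__ == "__main__":
    # Hypothetical usage: python gc_content.py BRCA1_BAP1.TXT other.fasta
    for path in sys.argv[1:]:
        print(path, gc_content_of_file(path))
```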
Your web page is down. Can you let me download this? Your channel blocks it from being downloaded.
Thanks, but I had a problem while running it. I used Windows Bash and got: print "number of g's " + str(g) ^ SyntaxError: invalid syntax, even though I did the same thing that you did. Please help me.
@nagaswaroopkenguntenagaraj8677
8 years ago
+Suleyman Bozkurt That may be because you are using Python 3, where the syntax for print is print("number of g's " + str(g)) [notice the parentheses], whereas in Python 2 the syntax for print is as shown in the video [print "number of g's " + str(g)]. Hope it helped! :)
@d34thcom3sripping
6 years ago
Thanks, boss. That resolved my issues.
for beginners only
1.75x Speed would be really appreciated for this video :D
Sir, I am using Windows 7 with Python, and instead of Coda I am using Sublime Text 2. I have followed everything up to the Terminal step, but there is no Terminal on Windows. Can you tell me the equivalent, so that I can finish the last step? Waiting for your reply, sir. Thank you.
@wavesofgrey-vb9gw
5 years ago
The Windows command line, or now PowerShell. You will have to add Python to your PATH to run python from the command line.
I always get a syntax error on If char == "g" :, usually at the (if) and at the (g). Can someone help me understand why?
@dxamphetamin
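4 years ago
In Python, 'g' and "g" are the same one-character string (there is no separate char type), so the quote style isn't the problem. Check that if is lowercase, since Python keywords are case-sensitive, and that your editor hasn't turned the straight quotes into curly "smart quotes".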
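Both likely causes can be checked with Python's built-in compile(), which raises SyntaxError on invalid source without running it. A small sketch; compiles is a hypothetical helper, and \u201c/\u201d are the curly quote characters that word processors often substitute:

```python
def compiles(source):
    """Return True if the Python source compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# There is no separate char type: both quote styles give the same 1-character str
print('g' == "g", type('g'))

print(compiles('if char == "g": pass'))            # lowercase keyword
print(compiles('If char == "g": pass'))            # capitalized keyword
print(compiles('if char == \u201cg\u201d: pass'))  # curly "smart quotes"
```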
Dude, why does this not work at all on Windows?
@LegeFles
7 years ago
Did you install Python?
@mannyfan165
7 years ago
Yes.
@LegeFles
7 years ago
Matt, saying it doesn't work "at all" isn't really a helpful comment.
Why does this man never start with the code?
Someone should re-do these videos in Windows.
Super inefficient code. Use the count() method, which is WAY faster!
@georgegrevera7000
6 years ago
I timed both ways on a file of 117k bases. His way used 0.02 sec. Using count() used 0.005 sec. Both are fast enough for me.
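Both approaches read every character once, so their running time grows linearly with sequence length; count() is faster by a roughly constant factor because, in CPython, its loop runs in C rather than in Python bytecode. A minimal timing sketch with the standard timeit module, on a hypothetical random sequence of 117,000 bases to match the size mentioned above (actual timings will vary by machine):

```python
import random
import timeit

# Hypothetical test data: 117,000 random bases, reproducible thanks to the seed
random.seed(0)
sequence = "".join(random.choice("acgt") for _ in range(117_000))


def count_loop(seq):
    """Count 'g' with an explicit Python loop, as in the video."""
    g = 0
    for char in seq:
        if char == "g":
            g += 1
    return g


def count_builtin(seq):
    """Count 'g' with the built-in str.count()."""
    return seq.count("g")


assert count_loop(sequence) == count_builtin(sequence)

for func in (count_loop, count_builtin):
    seconds = timeit.timeit(lambda: func(sequence), number=10) / 10
    print(f"{func.__name__}: {seconds:.5f} s per run")
```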
@johnfedorov8089
5 years ago
@georgegrevera7000 The problem is scale. Had the gene sequences been longer, the gap would have grown with them. I'm coming from a computer science background, though, where efficiency is hammered into our heads because of scalability.
Fool.
Poor video-making quality.