The Big Programming Thread - Page 961
Thread Rules
1. This is not a "do my homework for me" thread. If you have specific questions, ask, but don't post an assignment or homework problem and expect an exact solution.
2. No recruiting for your cockamamie projects (you won't replace Facebook with 3 dudes you found on the internet and $20).
3. If you can't articulate why a language is bad, don't start slinging shit about it. Just remember that nothing is worse than making CSS IE6 compatible.
4. Use [code] tags to format code blocks. | ||
supereddie
Netherlands151 Posts
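Don't scrape websites. Scraping breaks as soon as the website changes the structure of the page. It also puts additional strain on the server. It might not even fall under the Terms of Use of the site. Try to find out if the website has an API, or a pre-made dataset, that you can use. You can try contacting them via Twitter or something. | ||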
Liquid`Jinro
Sweden33719 Posts
On May 24 2018 19:20 supereddie wrote: Don't scrape websites. Scraping breaks as soon as the website changes the structure of the page. It also puts additional strain on the server. It might not even fall under the Terms of Use of the site. Try to find out if the website has an API, or a pre-made dataset, that you can use. You can try contacting them via Twitter or something.
I think they do have an API, but it's such a small number of requests that I really don't think it would be very disruptive (there are like 8 categories with an average of maybe 4 subcategories, so the upper bound on the number of requests is really low). I don't actually need this data, since (like I said) it took like 5 minutes to copy-paste it by hand, but I'd still like to know HOW to do it >_< | ||
ShoCkeyy
7815 Posts
https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3 http://www.nyu.edu/projects/politicsdatalab/localdata/workshops/BeautifulSoup.pdf Hopefully these help. | ||
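For a rough idea of what the linked tutorials walk through, here is a minimal sketch of pulling card numbers and names out of a page. It uses only the standard library's html.parser rather than Beautiful Soup, so it runs without installing anything. The HTML fragment, the class names and the parse_cards helper are all made up for illustration and are not the real site's markup. It also only sees the HTML the server sends, not anything built later by JavaScript.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real page would be downloaded first,
# e.g. with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<ul class="cards">
  <li><span class="num">146</span><span class="name">Arm's Length</span></li>
  <li><span class="num">257</span><span class="name">Escalation</span></li>
</ul>
"""


class CardParser(HTMLParser):
    """Collects the text of <span class="num"> / <span class="name"> pairs."""

    def __init__(self):
        super().__init__()
        self.cards = []      # finished {"num": ..., "name": ...} dicts
        self._field = None   # which span we are currently inside, if any
        self._current = {}   # the card being assembled

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("num", "name"):
            self._field = cls

    def handle_data(self, data):
        # Text outside the spans of interest (e.g. whitespace) is ignored.
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.cards.append(self._current)
            self._current = {}


def parse_cards(html):
    """Return a list of card dicts found in the given HTML text."""
    parser = CardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


if __name__ == "__main__":
    for card in parse_cards(SAMPLE_HTML):
        print(card["num"], card["name"])
```

With Beautiful Soup the parser class shrinks to a couple of lookups, but the idea is the same: find the elements by tag and class, then read their text.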
Hanh
146 Posts
On May 24 2018 19:35 Liquid`Jinro wrote: I think they do have an API, but it's such a small number of requests that I really don't think it would be very disruptive (there are like 8 categories with an average of maybe 4 subcategories, so the upper bound on the number of requests is really low). I don't actually need this data, since (like I said) it took like 5 minutes to copy-paste it by hand, but I'd still like to know HOW to do it >_<
For websites that use JavaScript to create the DOM (like this one, actually: it uses AngularJS), Python will be difficult to use. In this case, I prefer web browser test automation tools that can simulate clicks and query the DOM. Here's a solution using Puppeteer.
Results:
[ { faction: 'Garrek’s Reavers',
    sets: [
      { name: 'OBJECTIVE CARDS',
        cards: [
          { num: '146', name: ' Arm\'s Length' },
          { num: '154', name: ' Skritch is the Greatest, Yes-yes' },
          { num: '234', name: ' Advancing Strike' },
          { num: '235', name: ' Alone in the Darkness' },
          { num: '243', name: ' Change of Tactics' },
          { num: '257', name: ' Escalation' },
          { num: '272', name: ' Master of War' },
          { num: '282', name: ' Ploymaster' },
          { num: '284', name: ' Precise Use of Force' },
          { num: '291', name: ' Superior Tactician' },
          { num: '292', name: ' Supremacy' },
          { num: '305', name: ' Victorious Duel' } ] },
      { name: 'POWER CARDS',
        cards: [
          { num: '159', name: ' Musk of Fear' },
          { num: '166', name: ' Bodyguard for a Price' },
          { num: '171', name: ' Sneaky Stab-stab' },
          { num: '311', name: ' Confusion' },
          { num: '329', name: ' Great Concussion' },
          { num: '331', name: ' Hidden Paths' },
          { num: '332', name: ' Illusory Fighter' },
          { num: '347', name: ' Quick Thinker' },
          { num: '348', name: ' Ready for Action' },
          { num: '360', name: ' Sidestep' },
          { num: '368', name: ' Time Trap' },
          { num: '369', name: ' Trap' },
          { num: '372', name: ' Twist the Knife' },
          { num: '373', name: ' A Destiny to Meet' },
          { num: '374', name: ' Acrobatic' },
          { num: '376', name: ' Awakened Weapon' },
          { num: '384', name: ' Deathly Fortune' },
          { num: '391', name: ' Great Strength' },
          { num: '395', name: ' Incredible Strength' },
          { num: '410', name: ' Shadeglass Dagger' },
          { num: '412', name: ' Shadeglass Hammer' },
          { num: '424', name: ' Tethered Spirit' } ] } ] },
  ... | ||
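Once the data is in that nested shape, turning it into flat rows is the easy part. A minimal sketch, assuming the output has been loaded into Python as a list of dicts (only a three-card excerpt is copied here, and the flatten and to_csv helpers are hypothetical names, not part of the original solution):

```python
import csv
import io

# A small excerpt of the structure above, rewritten as a Python literal.
results = [
    {
        "faction": "Garrek’s Reavers",
        "sets": [
            {"name": "OBJECTIVE CARDS",
             "cards": [{"num": "146", "name": " Arm's Length"},
                       {"num": "257", "name": " Escalation"}]},
            {"name": "POWER CARDS",
             "cards": [{"num": "311", "name": " Confusion"}]},
        ],
    },
]


def flatten(results):
    """Yield one (faction, set name, card number, card name) row per card."""
    for faction in results:
        for card_set in faction["sets"]:
            for card in card_set["cards"]:
                # The scraped names carry a leading space, so strip it here.
                yield (faction["faction"], card_set["name"],
                       int(card["num"]), card["name"].strip())


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["faction", "set", "num", "name"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(flatten(results)), end="")
```

Converting num to an integer is a choice made for this sketch so the rows sort numerically; keeping it as a string would work just as well.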
spinesheath
Germany8679 Posts
On May 24 2018 18:20 Silvanel wrote: Beginning was kinda meh. But later parts on decorator, generator and context manager were nice.
So far all I have seen (including the part where he enforces existence of a method in a derived class with a metaclass) made me appreciate that I get to use C# instead. | ||
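For anyone who hasn't seen the metaclass trick being referred to, it probably looks something like the following. This is a minimal sketch, not the code from the talk: the names RequiresBar, Base, Good and define_bad are invented here, and the required method is assumed to be called bar.

```python
class RequiresBar(type):
    """Metaclass that rejects any derived class lacking a `bar` method."""

    def __new__(mcls, name, bases, namespace):
        # `bases` is empty only for the root class itself, so the check
        # applies to subclasses and not to the base that declares the rule.
        if bases and "bar" not in namespace:
            raise TypeError(f"{name} must define bar()")
        return super().__new__(mcls, name, bases, namespace)


class Base(metaclass=RequiresBar):
    def foo(self):
        # Relies on the derived class providing bar().
        return self.bar()


class Good(Base):
    def bar(self):
        return "bar from Good"


def define_bad():
    # Defining the class is what fails; no instance is ever created.
    class Bad(Base):
        pass
    return Bad


if __name__ == "__main__":
    print(Good().foo())
    try:
        define_bad()
    except TypeError as exc:
        print("rejected:", exc)
```

Note that this sketch only looks at the class's own namespace, so a subclass of Good that merely inherits bar would be rejected too. In everyday code, abc.abstractmethod or __init_subclass__ usually covers the same need with less machinery.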
sc-darkness
856 Posts
| ||
bo1b
Australia12814 Posts
The trick is that in C++'s case in particular, most people do not know how to write genuinely quick code, so it makes less of a difference than normal. | ||
Silvanel
Poland4601 Posts
On May 25 2018 04:27 spinesheath wrote: So far all I have seen (including the part where he enforces existence of a method in a derived class with a metaclass) made me appreciate that I get to use C# instead.
Huh. I think this is just a matter of perception; people have a natural preference for their main language. When I was learning C# (Python is my main) I had to constantly remind myself not to be angry about some features of C#. It felt unnecessarily clumsy and overly verbose to me, and its features often seemed to repeat themselves. It's personal preference; I love the way Python uses decorators and context managers. | ||
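For readers who don't write Python, here is a minimal sketch of the two features mentioned. The logged decorator and the timer context manager are made-up examples for illustration, not anything from the talk.

```python
import functools
import time
from contextlib import contextmanager


def logged(func):
    """Decorator: record each call's name in `logged.calls`, then run it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logged.calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


logged.calls = []


@contextmanager
def timer(results, label):
    """Context manager: store the elapsed seconds of the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, like a finally clause.
        results[label] = time.perf_counter() - start


@logged
def add(a, b):
    return a + b


if __name__ == "__main__":
    timings = {}
    with timer(timings, "adding"):
        total = add(2, 3)
    print(total, logged.calls, timings)
```

The decorator wraps behaviour around a function without touching its body, and the context manager guarantees the cleanup step runs when the with-block exits, which is roughly what a using block does in C#.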
spinesheath
Germany8679 Posts
On May 25 2018 15:59 Silvanel wrote: Huh. I think this is just a matter of perception; people have a natural preference for their main language. When I was learning C# (Python is my main) I had to constantly remind myself not to be angry about some features of C#. It felt unnecessarily clumsy and overly verbose to me, and its features often seemed to repeat themselves. It's personal preference; I love the way Python uses decorators and context managers.
It's mostly because of the sheer amount of awful I expect to encounter should I ever touch a large Python project. The features he described so far will inevitably reach bad devs and make them create abominations I hope never to encounter. | ||
sc-darkness
856 Posts
Does Python have a lot of bad developers actually when people say it's such an easy language? | ||
Excludos
Norway7685 Posts
On May 26 2018 06:12 sc-darkness wrote: Does Python have a lot of bad developers actually when people say it's such an easy language?
Python is often the first (and only) language a lot of people learn when they're really educated in something else (like maths or physics) but need to make a program to do calculations for them. They often have a poor or no concept of good practices, optimisation, or how half the things they're doing even work. For instance: deep learning is, at the moment, almost exclusively done in Python. Not because C++ wouldn't be vastly superior for it, but simply because half the people working on it have no idea how to program when they start out, and they're just grabbing the easiest tool they can learn in the shortest amount of time.
So tl;dr: Yes.
edit: Also, omg can people stop posting guides for OpenCV in only Python ffs?! I don't want to learn how to do facial recognition in the wrong language.
On May 25 2018 15:59 Silvanel wrote: Huh. I think this is just a matter of perception; people have a natural preference for their main language. When I was learning C# (Python is my main) I had to constantly remind myself not to be angry about some features of C#. It felt unnecessarily clumsy and overly verbose to me, and its features often seemed to repeat themselves. It's personal preference; I love the way Python uses decorators and context managers.
I feel the opposite a lot of the time. "Oh, it can do this? Brilliant!" When I started learning C#, I was already well drilled in thinking the Qt framework was the best thing in programming since... uhm... something brilliant. It fixes most of the issues I have with C++ and adds a lot of features on top of it. Then I started working with C# and quickly learned that a lot of the things I love about Qt are already baked into C#: signals and slots, proper containers and types, and actual f'ing error messages and not just "Hey uhm, your program crashed because... reasons... probably", for instance.
One of the languages I spent way too long dreading to learn, which ended up pleasantly surprising me, was JavaScript (especially with Node). Sure it's a clusterfuck of legacy code, but by using a linter it becomes surprisingly bearable, and the external library importer is fantastic. Go and Python on the other hand can go clog themselves into the toilets they came from. | ||
bo1b
Australia12814 Posts
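People do deep learning in C/C++ with a Python front end, it's the same thing, just vastly easier to understand for a mathematician. | ||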
Excludos
Norway7685 Posts
On May 26 2018 07:55 bo1b wrote: People do deep learning in C/C++ with a Python front end, it's the same thing, just vastly easier to understand for a mathematician.
This is true. A lot of the backend for this stuff is written in C++ with a Python API. That doesn't help me much when all the materials you can find about the subject are showcased with Python code. It's very difficult for an outsider to learn anything about the subject whatsoever without Python.
edit: Before people start linking to ML tools not based on Python, I probably already know about them. It's a complaint about the general situation, not a cry for help. | ||
nunez
Norway4003 Posts
c++ / python is a brilliant combo. it's super easy to set up bindings to a library with f.ex. pybind11, and make it accessible to domain-experts and data-scientists.
but having python bindings to libraries is also very useful for the c++-developer. f.ex. at work i do data-driven regression tests to check that some new project computes similar enough results to some legacy project. a small library interfaces with the legacy project, and can pull data-sets (input / output) from legacy servers. with py-bindings, cross-platform scripts for data-retrieval (pulled from multiple servers) and running tests are easy-peasy (btw i wish boost-test's out-of-the-box data-testing capabilities were a bit more fleshed out, but as always with boost you can make it work very well if you put in a small bit of effort).
i find python useful just in general wrt building, testing and low-fi deployment, or experimentation in a library shell. in the context of py / c++ the prospect of getting better techniques and tools for doing static reflection in c++ is very nice.
@excludos qt should be quarantined to the code that deals with the gui, and even there it should be used as sparingly as possible. for that it's an ok option, but for anything else there are better alternatives. if someone suggested that we use qt for signals / slots at work, he / she would also be quarantined to code that deals with gui.
some never get comfortable with the complexity of c++, and seek refuge in the abstract comfort of Qt or C# or python or whatever. parading this retreat as ersatz critique seems common. | ||
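The "similar enough results" check described above can be sketched in a few lines of plain Python. Everything here is assumed for illustration: legacy_compute and new_compute are hypothetical stand-ins for whatever the bindings would expose, the inputs are made up, and the tolerances are arbitrary.

```python
import math


def legacy_compute(x):
    # Hypothetical stand-in for a result pulled from the legacy project.
    return x * 0.1


def new_compute(x):
    # Hypothetical stand-in for the new project's implementation.
    return x / 10.0


def regressions(inputs, rel_tol=1e-9, abs_tol=1e-12):
    """Return (input, legacy, new) for every case that differs too much."""
    failures = []
    for x in inputs:
        expected, actual = legacy_compute(x), new_compute(x)
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append((x, expected, actual))
    return failures


if __name__ == "__main__":
    data_set = [0.0, 1.0, 3.0, 1e6]  # made-up inputs
    print(regressions(data_set) or "all cases within tolerance")
```

Comparing with a tolerance rather than with == matters because two correct implementations rarely agree to the last bit; the two stand-ins above already differ slightly for an input of 3.0.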
nunez
Norway4003 Posts
the thing: alien covenant: | ||
nunez
Norway4003 Posts
| ||
Liquid`Jinro
Sweden33719 Posts
On May 24 2018 22:44 ShoCkeyy wrote: Here you go Jinro https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3 http://www.nyu.edu/projects/politicsdatalab/localdata/workshops/BeautifulSoup.pdf Hopefully these help.
Thanks, I'll take a look.
On May 25 2018 01:23 Hanh wrote: For websites that use JavaScript to create the DOM (like this one, actually: it uses AngularJS), Python will be difficult to use. In this case, I prefer web browser test automation tools that can simulate clicks and query the DOM. Here's a solution using Puppeteer. Results: [ { faction: 'Garrek’s Reavers', sets: [ { name: 'OBJECTIVE CARDS', cards: [ { num: '146', name: ' Arm\'s Length' }, { num: '154', name: ' Skritch is the Greatest, Yes-yes' }, { num: '234', name: ' Advancing Strike' }, { num: '235', name: ' Alone in the Darkness' }, { num: '243', name: ' Change of Tactics' }, { num: '257', name: ' Escalation' }, { num: '272', name: ' Master of War' }, { num: '282', name: ' Ploymaster' }, { num: '284', name: ' Precise Use of Force' }, { num: '291', name: ' Superior Tactician' }, { num: '292', name: ' Supremacy' }, { num: '305', name: ' Victorious Duel' } ] }, { name: 'POWER CARDS', cards: [ { num: '159', name: ' Musk of Fear' }, { num: '166', name: ' Bodyguard for a Price' }, { num: '171', name: ' Sneaky Stab-stab' }, { num: '311', name: ' Confusion' }, { num: '329', name: ' Great Concussion' }, { num: '331', name: ' Hidden Paths' }, { num: '332', name: ' Illusory Fighter' }, { num: '347', name: ' Quick Thinker' }, { num: '348', name: ' Ready for Action' }, { num: '360', name: ' Sidestep' }, { num: '368', name: ' Time Trap' }, { num: '369', name: ' Trap' }, { num: '372', name: ' Twist the Knife' }, { num: '373', name: ' A Destiny to Meet' }, { num: '374', name: ' Acrobatic' }, { num: '376', name: ' Awakened Weapon' }, { num: '384', name: ' Deathly Fortune' }, { num: '391', name: ' Great Strength' }, { num: '395', name: ' Incredible Strength' }, { num: '410', name: ' Shadeglass Dagger' }, { num: '412', name: ' Shadeglass Hammer' }, { num: '424', name: ' Tethered Spirit' } ] } ] }, ...
Oh that looks good, I'll look into this also. This is the same kind of process as using Selenium, right? | ||
Hanh
146 Posts
On May 26 2018 10:38 Liquid`Jinro wrote: Oh that looks good, I'll look into this also. This is the same kind of process as using Selenium, right?
Yes, same idea as using Selenium. | ||
Liquid`Jinro
Sweden33719 Posts
On May 26 2018 07:35 Excludos wrote: Python is often the first (and only) language a lot of people learn when they're really educated in something else (like maths or physics) but need to make a program to do calculations for them. They often have a poor or no concept of good practices, optimisation, or how half the things they're doing even work. For instance: deep learning is, at the moment, almost exclusively done in Python. Not because C++ wouldn't be vastly superior for it, but simply because half the people working on it have no idea how to program when they start out, and they're just grabbing the easiest tool they can learn in the shortest amount of time. So tl;dr: Yes.
For the people not actively developing new algorithms, what would the benefit be of working in C++ instead of Python? From what I can recall, the performance of most of the computational Python libraries is basically at C/C++ levels (since they are just calling C/C++ anyway), and can in some cases even be improved with something like Cython.
(I don't work in this field, just learning because I'm interested. I'm basically what you describe: only know Python... and a little bit of R)
EDIT: I guess basically what bo1b said already. | ||
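The "they are just calling C/C++ anyway" point can be seen on a small scale without any third-party library. A minimal sketch, assuming CPython (where the built-in sum() is implemented in C); the function names and the data size are arbitrary, and the timings will differ from machine to machine:

```python
import timeit


def python_sum(values):
    """Add the values one by one in interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Delegate the same loop to the built-in sum(), which CPython implements in C."""
    return sum(values)


if __name__ == "__main__":
    data = list(range(1_000_000))
    for fn in (python_sum, builtin_sum):
        seconds = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```

Both functions return the same result; the only difference is where the loop runs. Numerical libraries apply the same idea to whole arrays, which is why the Python layer on top costs relatively little.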
Silvanel
Poland4601 Posts
On May 26 2018 07:35 Excludos wrote: Python is often the first (and only) language a lot of people learn when they're really educated in something else (like maths or physics) but need to make a program to do calculations for them. They often have a poor or no concept of good practices, optimisation, or how half the things they're doing even work. For instance: deep learning is, at the moment, almost exclusively done in Python. Not because C++ wouldn't be vastly superior for it, but simply because half the people working on it have no idea how to program when they start out, and they're just grabbing the easiest tool they can learn in the shortest amount of time. So tl;dr: Yes. edit: Also, omg can people stop posting guides for OpenCV in only Python ffs?! I don't want to learn how to do facial recognition in the wrong language. I feel the opposite a lot of the time. "Oh, it can do this? Brilliant!" When I started learning C#, I was already well drilled in thinking the Qt framework was the best thing in programming since... uhm... something brilliant. It fixes most of the issues I have with C++ and adds a lot of features on top of it. Then I started working with C# and quickly learned that a lot of the things I love about Qt are already baked into C#: signals and slots, proper containers and types, and actual f'ing error messages and not just "Hey uhm, your program crashed because... reasons... probably", for instance. One of the languages I spent way too long dreading to learn, which ended up pleasantly surprising me, was JavaScript (especially with Node). Sure it's a clusterfuck of legacy code, but by using a linter it becomes surprisingly bearable, and the external library importer is fantastic. Go and Python on the other hand can go clog themselves into the toilets they came from.
You sure have some strong feelings about Python | ||