Ask HN: In 2022, what is the proper way to get into machine/deep learning?
539 points by newsoul on Aug 16, 2022 | 204 comments
By getting into machine or deep learning I mean building up to a stage where one can do ML/DL research, whether applied research or core ML/DL theory. Of course, the paths to the two will be quite different.

Standing in 2022, what are the best resources for a CS student/decent programmer to get into the field of ML and DL on their own? Resources can be books or public courses.

The target ability:

1. To understand the theory behind the algorithms

2. To implement an algorithm on a dataset of choice. (Data cleaning and management should also be learned)

3. Read research publications and try to implement them.




My 2c (not exhaustive for what you want to do, probably):

1) Get some statistics/probability basics. The field is full of people (you can see it in a lot of analyses on Kaggle) who "do machine learning" but make very silly mistakes (e.g. turning categorical data into a float and using it as a continuous variable when training a model).
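
A minimal sketch of that mistake and the usual fix, in pandas (the column and category names are made up for illustration):

    import pandas as pd

    df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

    # The mistake: factorize to integers and treat them as a continuous feature.
    # The model then "learns" that blue > green > red, which is meaningless.
    df["color_as_float"] = pd.factorize(df["color"])[0].astype(float)

    # The usual fix for nominal categories: one-hot encode instead.
    print(pd.get_dummies(df[["color"]], columns=["color"]))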

2) Take a look at traditional machine learning approaches. Nowadays you're swamped by DL (there are a lot of good suggestions in this thread, so I won't chime in), and it's easy to miss the fact that, sometimes, a simple decision tree or a dimensionality reduction approach (e.g. PCA or ICA) can yield incredible value in a very short time on huge datasets.

I wrote a fairly short post about this when I finished my Georgia Tech program: https://www.franzoni.eu/machine-learning-a-sound-primer/

3) It can take a lot of time to become effective in ML, effective as in: what you _manually create_ performs as well as picking an existing trained model, fine-tuning it, and using it. This can be frustrating: the low-hanging fruit is pretty powerful, and you don't need to understand much about ML algorithms to pick it.

4) Consider MOOCs or online classes. I took Georgia Tech OMSCS and can vouch for it; some classes force you to be a data scientist and read papers, you get "real world" recognition, and you can discuss with your peers, which is useful!


I second learning the statistics/probability basics.

Your first model should always be something that predicts a constant value, or maybe, in really complicated cases, something like a linear/logistic regression. Then you have a baseline to compare more advanced approaches against. But in order to use linear regression well, you need to understand how it works in the first place.
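
A minimal sketch of that baseline-first workflow with scikit-learn (the data here is synthetic, purely for illustration):

    import numpy as np
    from sklearn.dummy import DummyRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = 2.0 * X[:, 0] + rng.normal(size=500)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)  # constant predictor
    model = LinearRegression().fit(X_tr, y_tr)

    print("baseline MAE:", mean_absolute_error(y_te, baseline.predict(X_te)))
    print("linear   MAE:", mean_absolute_error(y_te, model.predict(X_te)))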

Also experiment structure, sampling design, hypothesis testing, etc. will tell you a lot about what conclusions you can and cannot draw from observational data, which is what a lot of ML is about.


While stats and probability are very good, I can't say you need more than a good 101-level course in either. Really you're just looking for good reasoning skills about how distributions and probability work.

This example: > It's full of people (you can see a lot of analyses on Kaggle) that "do machine learning" but make very silly mistakes (e.g. turn categorical data into a float and use it as a continuous variable when training a model).

Doesn't seem related to stats or probability at all to me; that's just a matter of critical thinking skills.


I prefer the "learn as you go" approach. There's nothing stopping someone from learning some basics and a library like scikit-learn, and then learning by working through examples.

But if anyone wants to become an expert in ML/DS, learning statistics and probability is fundamental. Books like A First Course in Probability, Introduction to Statistical Learning, and Elements of Statistical Learning, to name a few, are very important.

A lot of the mistakes made in practice stem from a lack of understanding of sampling techniques, of how statistical metrics can be misleading, and so on.

The first thing I learned in statistics was the difference between quantitative and qualitative information. If someone knows this before hopping into Kaggle, they know that categorical features can't be used as continuous features.


Depends on your goal. If you want to read/implement things from a COLT paper, a 101 stats/probability course won't really cut it.

Though for applied papers that's sometimes enough.

But heed Larry Wasserman's advice: "Using fancy tools like neural nets, boosting, and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a band-aid."


You need to understand that a category cannot be magically transformed into a float. Yeah, maybe not the best example on my part.

For the 101 level, I agree; I'd say you need a good understanding of basic probability and stats rather than a vague understanding of advanced topics.

If you can spot that an assertion somebody makes about a dataset is true only if the data is normally distributed, and that they never checked the actual distribution, you're probably good to go.
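
For instance, a quick sanity check with scipy (synthetic data; the 0.05 threshold is just the usual convention):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    data = rng.exponential(size=1000)    # clearly not normal

    stat, p = stats.normaltest(data)     # D'Agostino-Pearson test
    if p < 0.05:
        print("data looks non-normal; normality-based claims don't hold")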


I am a CS major but I was always bad at math. Can you recommend your favorite resources for learning probability and statistics basics?


“Introduction to Statistical Learning” is probably the good go-to resource to start with. There’s a decent open-source book on probability which I use when I need a more in-depth understanding, but I don’t remember the title right now.

One clarification: you don’t need an extreme understanding of stats, probability, or linear algebra, imho. If you already took college-level classes, you’re likely to be fine.


I was a CS major who made a 20-year career in software, who was always told I was bad at math, and who struggled with math all through school. Somewhere about ten years into my career I began to realize that it wasn't really that I was bad at math, but that the way math is taught just doesn't work for most people. And a lot of the math you were expected to learn is really only directly applicable in very specific circumstances that you might not encounter in your career - which is not to say that the mental exercise of learning it wasn't worthwhile!

So my point is, if it isn't making sense the way you are being taught, go explore other avenues. There's no way these machine learning algorithms would have made sense to me as a 20-something undergrad, but as a 40-something who can explore them via software rather than a whiteboard, they really aren't that complicated to get started.


As far as MOOCs are concerned there are many great courses. I compiled a list on https://courseskipper.com/best-machine-learning-courses-for-... and suggest that you just read descriptions and see if anything seems like a good starting point for you


Can you recommend a good basic stats/probability course? The last one I took was roughly in 1997 ;)


Richard McElreath's Statistical Rethinking is an absolute masterpiece on statistics: https://xcelab.net/rm/statistical-rethinking/


Thanks. BTW, I found there's a much cheaper ($80 -> $27) paperback version published a few days ago.


Can you share the link?


I just searched amazon for the title plus "paperback". Now that I look at it again, it says "by MAN (Author)" whereas the hard cover is "by Richard McElreath (Author)", so it's looking possibly scammy to me now, so caveat emptor...


I received the book today. I'm still not sure that it isn't a scam... The back cover is empty; otherwise, the quality is fine. I don't know if this is associated with Richard McElreath. Good luck to everyone.


MIT 6.041[1] is a very good course I can recommend. Not sure if MITx 6.431x on edx is the same, but it's the same teacher in any case.

[1] https://www.youtube.com/watch?v=j9WZyLZCBzs&list=PLUl4u3cNGP...


OMSCS is less intense than SCPD. OMSCS has major deliverables approximately every two to three weeks, whereas SCPD has them every week at a similar depth. UTexas' MSDSO is even more relaxed. So if you want to save time, Stanford SCPD it is.


They are probably not silly mistakes. Label encoding can be very useful for tree-based models when the categories are ordinal, or when there is a high number of categories.
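
A minimal sketch of the ordinal case with scikit-learn (the category names are made up for illustration):

    import pandas as pd
    from sklearn.preprocessing import OrdinalEncoder

    df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

    # Fixing the category order explicitly keeps the encoding meaningful and
    # stable across runs (the instability the reply below complains about).
    enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
    df["size_encoded"] = enc.fit_transform(df[["size"]]).ravel()
    print(df)   # small=0.0, medium=1.0, large=2.0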


They are, most of the time. You get a prediction based on a meaningless float (unless the categories are ordinal, which isn't so common), and categories can change their assigned number at every run (this happens in lots of analyses) since they're not properly sorted. Crawl a few notebooks; I spotted that error quite often.


THWG


I work in ML - I might make 3 buckets for ML careers right now:

1. ML/DL Researcher

2. Data scientist - 20/80 engineering vs modelling

3. ML Eng - 50/50 (or 70/30) engineering vs modelling

People suggesting working in engineering to support ML are right that there's a lot of demand, but it's not what you're asking for.

Becoming an ML/DL researcher working on novel techniques or new models will be hard without academic research experience. Few companies are big enough to support true research, and the ones that are have a very high bar even for people with PhDs.

What I call "data scientists" apply math/ML to real problems. The people I see here have a quantitative background like physics/math/CS. Often they have more general quantitative skills that go beyond ML. People like this might work on things like fraud, where an eng pipeline exists and small improvements in the model are valuable.

There are more of these roles than "true research" ones, and they exist at small companies because the work is applied. You can get into this with demonstrated evidence in side projects plus a convincing background, but professional education might be the surest way.

Finally - there's a lot of demand for engineers who can do both modeling and the requisite engineering. A model is a small part of what goes into a production ML feature - you need a data pipeline, automated retraining/prediction, a place to deploy the model, monitoring on eng stats + data stats, and the usual application backend/frontend to do something with the results.

You might be able to get into this with some demonstrated experience in side projects assuming you're a SWE already, and depending on your standards for where you want to work.


> Becoming an ML/DL researcher working on novel techniques or new models will be hard without academic research experience

This is not correct for current DL research. I know many undergrad engineers who have written papers at top conferences. Current DL is mostly about implementing ideas, running experiments, having a good sense of data, etc., rather than theory. It's an open secret in DL that the theories are mostly there to please reviewers, are largely gibberish, and are often plain wrong, e.g. the batch norm paper, where what the authors theorised was later shown to be not just false but the complete opposite. Batch norm is still heavily used because it works.


I didn't mean to imply it needed to be _graduate_ research - however it would be news to me if they were publishing at top conferences independently of a lab or research branch at a company.

How do the people you know go about it?


I didn't say they were publishing independently, just that they didn't have any academic research experience. E.g. in FAANG it's not hard to get into teams that publish ML papers. Also, I know one guy who contacted a professor; they collaborated and published a few papers while he was doing a full-time engineering job.


I think you're talking past each other -- doing solid ML research vs being well paid as such a researcher.


I would add a fourth bucket -- ML Ops. Operating an ML-based system is different enough from other systems that I'd consider it its own specialty.


Could you please provide some examples of what an ML Ops person might do/manage in their day-to-day job?


- Tracking changes is different. Not only do you have to track code changes, but also training data and model changes. You need to build systems that allow for this change management.

- Monitoring -- you have to build specialized monitoring to check for model degradation. Is the model still outputting valid/correct predictions? If not, you need to roll back to an old model (see above).

Those are the two main ones.
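
A minimal sketch of the monitoring point: compare the live prediction distribution against a training-time reference and alert on drift (the threshold and distributions here are illustrative assumptions, not a standard recipe):

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def drift_score(reference, live, bins=20):
        # Jensen-Shannon distance between two score histograms (0 = identical).
        edges = np.histogram_bin_edges(reference, bins=bins)
        p, _ = np.histogram(reference, bins=edges)
        q, _ = np.histogram(live, bins=edges)
        return jensenshannon(p + 1e-12, q + 1e-12)

    rng = np.random.default_rng(0)
    ref = rng.beta(2, 5, size=10_000)    # model scores at validation time
    live = rng.beta(2, 2, size=10_000)   # today's scores have shifted

    if drift_score(ref, live) > 0.1:     # threshold is use-case specific
        print("Model degradation suspected: consider rolling back.")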


I would say this falls under #3 - ML engineer


Oh wow, those are actually fascinating problems! Thank you.


Would you (or somebody else) mind comparing/contrasting Data scientist/ML Eng a little more? I'm not sure I understand the difference (and perhaps like many roles/titles in our industry the line is blurry).

Never mind, I mentally flipped the numbers. I read 20/80 and 30/70, but it's actually 20/80 and 70/30. IOW, data scientists spend a lot more time modeling, and ML engs spend a lot more time engineering. Makes a lot of sense. I'll post this comment anyway in case it helps someone else.


I think of ML eng as more infrastructure and scalability. Possibly doing tasks like converting lab models into models that can be run at production scale. There is a blurry line between the two because it makes sense for some tasks to have shared ownership - just like you tend to have with people reaching across the stack to get something done in front-end vs. back-end web roles. As with anything, as you get more experience you get more comfortable jumping around and maintaining a larger set of concerns


Is it possible to do 3 but stick to coding and not touch any dev ops work?

My nightmare is finding a role like that and realizing I’m just a dev ops guy.


If by devops you mean managing architecture + deployments vs only writing code, I imagine that exists at places with mature infrastructure, but honestly MLOps is developing so quickly that small to medium ML teams will go through some infrastructure churn.

The best advice I have is apply for jobs with descriptions that sound like what you want to do and clarify with the hiring manager/recruiter that it actually matches what you're looking for


Can I suggest a longer, but (I think) better route?

Try the Data/ML Engineer route. Instead of going directly into ML, try to work as a “supporter” of those doing ML. There’s a HUGE gap there, especially if you’re a good programmer.

There are a lot of people in the “pure” ML space: people with a science background, with PhDs, etc.

But there aren’t enough people to support them: taking their models to production, building their pipelines, etc.

If you get into Data/ML engineering, you’ll be working with these people and learning from them.

It’s a longer route for sure, but I think it can yield the highest success rate.


> I think it can yield the highest success rate.

At what?

If you want to be an SRE for a data platform, sure, but this is pretty thankless work:

- cleaning up dodgy data

- cleaning up behind low-code data pipelines and other painful integration work with systems that suck and sometimes just don’t work (like PowerBI).

- cleaning up behind data scientists that create models in arcane and imaginative ways and expect you to “productionise” them.

- cleaning up behind brain-dead scheduling systems that fail unexpectedly.

- constant churn with partners and cloud products for whatever the latest hotness is.

If you want to be solving actual problems with ML, this is a dead end.

It’s IT support for the people doing real work.

…so, it depends on your goals. Getting a job? Sure! Everyone wants a workhorse they can dump all the annoying, problematic on-call tasks on.

Learning ML, contributing to research, building models?

This isn’t a path that leads there.


> It’s IT support for the people doing real work.

This is an appalling perspective. Good MLE skills seem a lot harder to find than good ML ones.


Someone who can engineer infrastructure and pipelines and firefight production issues is hard to find, but that’s not the point I was making.

My apologies; it is real work. The point I was making is that it’s not ML work, any more than writing a YAML file is ML work.

If you want to write YAML files, any number of possibilities exist.

If you want to work with machine learning, then don’t become a data engineer. The skills are mostly not related to ML, and are more closely aligned with SRE/devops.

It’s not infrastructure plus helping build models as you mature and advance: it’s almost literally just infrastructure and firefighting… in my limited 3 years of experience as such.


What?

A lot of the hard part isn't the model, especially in a world where BERT, XGBoost, Optuna, PyTorch, etc. have solved much of the classic problem and forced "real" DS to specialize on either the business consulting side (not math/engineering) or the theory side (barely implemented). The rebranding of "data analyst" (SQL, PowerBI, ...) as "data scientist" by even top tech companies underscores this. It's not yet where web dev has gotten, with global $20/hr Fiverr contractors, but it's already at, say, $40/hr for someone who can build real production models for the more boring scenarios.

The result is that the vast bulk of data scientists (PhD, self-trained, consulting, ...) we interview are weak engineers, so going from a make-believe notebook to a trickier production scenario requires the data engineer / MLOps person / etc. to solve a lot that a typical DS doesn't really understand in practice: scale, latency, distributed systems, testing, etc. Likewise, the part the DS solves has little to do with the latest neuroips paper, and more to do with lifecycle tasks like getting better data, which the other folks on the team will often be involved with as well.

So 2 natural high-paying paths here:

data engineer / MLOps -> MLEngineer -> DS

data engineer -> all-in-one data analyst/scientist -> ML/AI data scientist


I agree with this. From my experience, most of the data scientists I have worked with never exited the world of Jupyter notebooks. For them, code management, CI/CD, dev/stage/prod separation, etc. is a world of its own that they are not very comfortable with. Heck, they even used SageMaker to create a git repo for their Jupyter notebooks.

It doesn't mean that there aren't data scientists who have some engineering experience as well, but this seems to be rare. For that reason, getting those ML models that they painstakingly built to where they'll generate some real value is super hard. They just don't know where to start. Working across multiple teams and multiple functions is very challenging, and it often creates friction. Therefore, creating tools and systems that enable those data scientists to see the actual value of their labor is paramount.

That's why we're seeing a huge resurgence of so-called MLOps tools and platforms that aim to solve all or some of the problems of the entire stack. We are very, very early in this journey, but I believe the 2020s will be for ML and AI what the 2010s were for the cloud and data, i.e. new Snowflakes and Databricks, but for actual ML apps. It's exciting.


Definitely agree with your first two paragraphs, but am confused by the pay paths. Can you expand on what the paths mean?


It's useful to work backwards from the knowledge a DS needs to be worth their weight. Imagine a small team of a $400K/yr DS + a $400K/yr DE + ... and whatever hw/sw. So say a $2-3M/yr project driving $3M+ of new, growing revenue or $6-12M of annual savings. At bigger companies, even more magnitudes & pressure :)

The DS will likely:

- be close to the business case & business stakeholders to ask questions a normal lead can't

- know the relevant math + ML algorithms, and build up specializations pairing DS niches ("time series forecasting") with industry niches ("supply chains in manufacturing")

- have enough engineering & performance understanding to work with a DE on going from small data sets to big ones

- have an intuitive feel for all of the above - how data/usecases/etc. go right/wrong

That's a lot!!

One path is jumping in as a low-paid intern or new grad and doing your time. But a pivot is different, esp. to get paid along the way. Most CS grads had little math ("intros to stats, combinatorics, & algs; dropped linear algebra"), weak ML ("did algs; intro to ML only covered kmeans & bayes; tried running a BERT model on some data"), and little intuition for how ML typically goes wrong ("what's class imbalance?"). So if they do get hired directly as a mid-level DS, it's probably on a team of the blind-leading-the-blind. Oops.

BUT SQL/Spark/K8S/pandas/regex are real skills. Doing the data engineering, ML operations, etc. around making an ML pipeline more than a fanciful notebook that wouldn't last a minute in production is real work. That stuff does pay well, and by working with the ML folks, you'd naturally get pulled into the ML tasks as well. DSs write all sorts of bugs that surface as production evolves and that the full team works on together, and new features need a team to make them real. So taking a job that mixes engineering specialties with ML specialties is a smoother pivot path for the typical CS backgrounds I've seen. Over time, drift toward the more ML-y aspects of the projects until you can do the full hop. (Nit: that won't teach the math & deeper intuition, so I'd still do courses + projects on the side.)


In general, does a DE have a higher salary than a DS?

Am I understanding correctly that there is much more demand for DEs than for DSs?


I wish I had real numbers. So instinct from what I've seen:

- a data analyst role rebranded as a DS role will be lower paid than a DE role, maybe 50% diff

- an actual DS role is probably higher paid than a DE role, but really depends on the job+co

- a great DS role and a great DE role are both super well compensated, though again maybe DS higher than DE in most cases, just b/c of the ability to more directly drive $. Unless it's something like an infra company, the DS will be inherently closer to the business & outcomes. ("I did this clever thing that netted a 2% revenue spike that adds up to $40M/yr in new revenue, what did you do?")


NeurIPS paper, not neuroips paper


still not used to the new name ;-)


With the right mindset it can be insanely fun building infrastructure, automating things, and engineering solutions such that improvements ratchet forward and fires get put out before they have a chance to grow. While the ML people, bless them, are chasing a 0.001 improvement in the metric of the day, data engineers can be having huge impact and changing the game. Meanwhile, ML is becoming commoditized in its most common use cases.


I first had the DE title 7 years ago (going into it having never heard of DE), and have been doing MLE/platform work for the past 5. You’re projecting your limited experience onto a poorly defined role that varies wildly from company to company. My experience is much different from yours: little firefighting, lots of actual building. Yes there is infrastructure, but any good programmer these days should be able to stand up some basic infrastructure.

Yes, don’t get into it if you want to do ML research or apply ML, but if you are interested a bit in it and find building models the least creative, most boring shit ever like I do, and prefer traditional coding, it’s a nice spot to be in.


What is the average salary for DE currently in US?


Great comments. I agree with your take on what being an ML Eng actually means. Of course this will vary to a degree from team to team and company to company, but I think you still capture it well.

I absolutely think MLEng is important and much needed, but too often underappreciated. Being this half-breed, part engineer and part ML, often leaves you on a lonely island in many orgs. The ML managers don't really understand what you do, and neither do the engineering managers. It is kind of thankless unless your management really understands your role and appropriately advocates for you.

MLEng is often an engineer who wanted to get into the sexy ML space, and since it's in the title it feels cool. Then you realize you're more of an Ops engineer who deals with the inane code of many "true" DS/ML scientists. Thankless, indeed.


Especially in the edge / embedded space, MLEng will imply more than just doing ops.

Stuff to do could include:

- Getting a network architecture to run.

- Applying optimizations depending on the target arch (pruning, quantisation, custom CUDA kernels, etc.); see the sketch below.

- Integrating models (rule of thumb: a product is 95% ordinary code, 5% ML related).

- Constructing benchmarks and monitoring.
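
For example, post-training dynamic quantisation is roughly a one-liner in PyTorch (the toy model here is purely illustrative):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    model.eval()

    # Linear weights become int8; activations are quantised on the fly.
    # Same interface, smaller and usually faster model on CPU.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear},
                                                    dtype=torch.qint8)
    print(quantized(torch.randn(1, 128)).shape)  # torch.Size([1, 10])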


Sure - fair clarification. But conversely, you could make some awesome automation that you rule with a light touch as an engineer, or as a DS you could come up with the ML Amazon uses to recommend you endless TVs as soon as you've bought a TV :)


I think he meant "real work" from the perspective of the overeducated people who tend to end up in (and resent being in) corporate data science roles.

From the boss's perspective, the grungy IT support stuff is closer to real work (although the only thing that's actually respected in management, because it's what they do) than the shiny ML stuff that one star hire is allowed to do because it makes the company look cooler than it actually is.


It highly depends. I was hired into a small research group that didn't have a product in production. Got hired for programming, and was at the table discussing and contributing to research within a couple of months, without any background in ML.


These types of anecdotes make the actual practice of both ML and AI seem rather, well, less than scientific. There is supposed to be Ph.D. level math behind all of this, yet an amateur with admittedly no ML background is part of the team.

In Star Wars, it takes Luke Skywalker years to learn to use a light saber skillfully. Then in The Force Awakens, some ex-Stormtrooper with no training picks up the light saber and within 5 minutes is a pro. Kinda ruined the mystique of Star Wars, just like people jumping into ML with no training ruins the mystique of ML.


I am very much against the idea that somebody with a PhD is the only one who can do a certain kind of work. But I am of course biased, given that you call me out.

Creative and critical thinking are not exclusive to people with a PhD. The ability to understand one's strengths and weaknesses is not exclusive to a PhD.

I would never attempt to write or publish a paper without the help of somebody with stronger mathematical or statistical knowledge. On the other hand, they should not write source code for a paper without consulting somebody with a strong background in SW engineering. You complement each other. The power is in recognizing that.

You would be surprised how many software bugs I have found that invalidated entire (draft) papers. A PhD in ML doesn't save you from that.


Is mystique worth having? In fiction, sure, but I think not in R&D. I think a lot of ML and AI isn't especially scientific, but I also think there's a lot of low-hanging fruit in applying it to new areas, and both of those make it easier for an amateur to contribute.


Let me expound on this, as there's a lot of PhD hate in the comments parallel to mine.

What is unique to a PhD is that you took a very long time to master a small slice of the knowledge pie. The emphasis here is on long time: Most people simply aren't willing to go for years on a low salary and tedious job.

It doesn't mean PhDs are more (or less) creative, top coders, and whatnot; it means we took the time to read all the papers, to know all the previous solution attempts, and to know who all the big players in the field are ("all" w.r.t. our niche). It also means we can read papers much faster than other people, because that is basically what we do all day.

There you have it! Now don't send me ML job offers, cause I gotta read this next obscure paper to figure out if they are legit. :D Just kidding.


I think your metaphor kinda plays against your point.

Years and years ago in the movie universe, Luke had to painfully learn to use his lightsaber from a mentor who passed down techniques and philosophy to the student.

In the current day of the movies, lightsabers are understood to be powerful, yet temperamental and exotic, weapons. Mildly trained individuals can use them, even if it's to a limited extent (i.e. flick switch; shiny side is the business end; heat bad, ouch, no touch).

To belabor the metaphor, you've also had a tradition of people, from all walks of life, using vibroblades (basic to advanced standard statistical analysis and regression) in order to achieve some level of parity against users of lightsabers.


There is PhD-level math involved. And yet, ML (deep learning in particular) is much more of an empirical endeavor than many would like to admit. A deep understanding of the underlying mathematics does not necessarily give you a better model. Modern models are so complicated that no one can reason through them. Parameter spaces are non-convex and full of ugly pathologies that make neat and tidy analysis methods useless.

From one perspective, it is disheartening that a deep understanding of the underlying methods doesn't necessarily win the day. From another, it is quite remarkable that having good implementation skills and a methodical mindset can get you quite far.


Fuck mystique.


Much 'applied' ML is based on tweaking existing models, getting them production-ready, and integrating them into a product. That's inherently a bit of non-trivial engineering work but, as you pointed out, not scientific per se.


You're deluded if you think a PhD is what makes the difference.

Smart people will be able to contribute even if they don't have a PhD. Some PhDs are useless, and everyone wonders how the hell they got through.


In smaller teams/companies one gets to wear multiple hats. However, the term "data engineer" was specifically created by/for ML folks to offload the unpleasant, repetitive work that has to be done but that nobody looks forward to.


Sorry, no. The term "ML scientist" was specifically created by/for data folks to offload the unpleasant, repetitive work with math equations that has to be done but that nobody looks forward to.

If you've ever crafted a pipeline and tuned it to hum along, then watched it break with new/more/messier data, then figured out creative ways to fix it or replace parts of it with more robust parts, iterating on that and scaling it up, you would know some of the fantastic pleasure of whatever you call it, data engineering.


Sounds like what the media entertainment companies call a "pipeline engineer". Hmmm...


> However, the term "Data engineer" was specifically created by/for ML folks to get rid of unpleasant repetitive work that has to be done but nobody looks forward to it.

This may indeed be how the term "data engineer" is used sometimes, but I have my doubts that it was originally created with this meaning. Not really sure where/when the term "data engineer" was actually created, but ICDE started in 1984 [1] and the Data Engineering Bulletin was renamed in 1987 [2] (from "Database Engineering"). It seems likely that the term "data engineer" has also been used since at least then.

Of course ML did also already exist then, but it's certainly a while before the current "big data" / "deep learning" time. And regarding the topics considered "data engineering" at that time, this is from the foreword of the December 1987 issue of the Data Engineering bulletin:

> The reasons for the recent surge of interest in the area of Databases and Logic go beyond the theoretical foundations that were explored by early work [...] and include the following three motivations:

> a) The projected future demand for Knowledge Management Systems. These will have to combine inference mechanisms from Logic with the efficient and secure management of large sets of information from Database Systems.

Which sounds just as relevant today as it did back then. It also does sound like a rather challenging task, and not exactly like "unpleasant repetitive work". Or at least not any more repetitive than: change some model parameters / retrain model / evaluate results / repeat ;)

[1]: https://ieeexplore.ieee.org/xpl/conhome/1000178/all-proceedi...

[2]: http://sites.computer.org/debull/bull_issues.html


Data engineering jobs named as such started to pop up only in the past few years, coinciding with MapReduce/Spark availability. I wouldn't be surprised if the term was re-introduced by one of the companies developing those systems (like Databricks, Cloudera, etc.) to distinguish themselves, as a sort of marketing. In the past we had DBAs; now DBA + DevOps + unspecified everything has morphed into data engineering.

I used to be a member of SIGMOD and the "data engineering" you mentioned was just an academic term.


Data engineers exist at organisations without any ML work.


Yes, but they are basically what DBAs were before with the addition of ETL. OP is asking about data engineers in the context of ML.


Can confirm.

While data engineer is an excellent role for a fresh graduate, the data engineering profession shares many similarities with the SRE/IT profession.

The best data engineers are the folks who held the DBA job titles of yesteryear.

You will always be the supporting cast, rarely the star.


SRE = Site Reliability Engineer?


yes


This, 100%. That said, most data scientists don't do what you would consider real work (meaning, I assume, interesting work with significant mathematical/analytical meat). There just isn't a lot that's both interesting and useful to private-sector rent-seekers whose opinions of your work determine whether or not you advance.

Most of the people doing real ML in industry are prestige hires--they're hired because their names draw people in, but basically get to work on whatever they want--and you need a top-10 PhD at an absolute minimum to be eligible for those.

The ugly truth about industry is that 99.9997% of it is flow capture based on power relationships, found artifacts (i.e., corruption opportunities) within the state, and the implementation of very simple processes in a way that minimizes threats to executive reputations as a first priority, and to profit as an important second one. This doesn't exactly make a market for ML innovation, unless your boss for some weird reason still cares about being a co-author on your papers (which his bosses will pressure him not to let you publish, because, after all, this publishing is a distraction from your paid work).

On the other hand, if you want to be able to afford a house in the Bay Area, and to be tapped for (indeed, most likely forced into, both due to losing interest in and being unhireable for IC work) management in your mid-30s... then go for industry. The poison carrot will make you sick but it will kill you more slowly than poverty, so that ain't so bad, now is it?


Strongly disagree.

There's a vast amount of work that doesn't involve unethical recommendation systems.

Expand your horizon outside the Bay Area.

The plurality of work I see is straightforward computer vision/NLP applications.


I suspect the work you're talking about could be easily handled by an intern working with Core ML and a MacBook.

The landscape is varied. There are companies doing real, leading-edge stuff; there are companies where ML is sprinkled onto projects as a buzzword but no real interesting work happens; and there are companies that just need a practical small solution like the ones you mentioned, and could get by with Core ML, but don't, because they hire a PhD who isn't aware of Core ML.


What does productionizing Core ML look like if I wanted to stand a model up as an RPC service?


I really like your style. I think you should write a book (this is not sarcasm)


I’ve kinda developed the view that large organisations come to mirror the Russian Communist Party.

I’m interested in “flow capture based on power relationships”.

Do you have any recommended reading on this?


Every large organisation tends to be like a small government: inefficient, drowning in politics, and unable to change.

There are exceptions - where some principled dictator imposes a VC-style model in which teams basically become independent startups and die or succeed. A hundred fail; one becomes the next revenue maker for the company. That's how AWS was born.


There’s way less inefficiency and way more accountability in the public sector. Look at how efficient publicly funded schools are, for example, or publicly funded rail or healthcare. You could literally pick almost any industry.

Accountability comes from elections. If managers in companies had to be re-elected it would be interesting.


This. Corporations are great at imposing mean-spirited personal accountability (i.e., if you're perceived to have fucked up, you get fucked) but that doesn't actually solve problems or change anything. People get fired, careers end, new faces replace the old, nothing gets learned. Of course, once you get into middle management you're exempted from the stack-ranking bukkake, and executives write their own performance reviews and almost never face consequences for their actions.

Companies are fantastic at making it look like accountability exists, because people at the bottom get punished for even the smallest mistakes, while avoiding any consequences that would affect high-ranking members or force the organization to change how it does business.


That sounds a lot like the DARPA model too.


> I’ve kinda developed the view that large organisations come to mirror the Russian Communist Party.

Only the ones which have an unkillable cash cow. So I suspect Google or large banks are mostly like that, but places like SpaceX or even large consulting firms (Deloitte, IBM, etc., where managers essentially eat what they kill) cannot allow themselves to degenerate into a Chinese court.


Now this is interesting. I've always found it fascinating that when profit is on the table, democracy is nowhere to be found. I've looked, not too hard TBH, for essays and literature discussing the correlations between business management structures and governmental/national political hierarchies: not education-level material (propaganda), but critical analysis. I've been an employee of several of the top corporations on our planet, and the idea that corruption is not rampant is a farce. One simply lives within the environmental constraints and leaves when it gets to be too much. Does caring about corporate ethics (and the larger realm of ethics) make one incompatible with a modern corporate hierarchy?


Unfortunately, the only way to prevent hierarchy is to create a limited hierarchy (this is the purpose of constitutions) a priori; hierarchically naive organizations fail on this account. External parties will demand hierarchy simply because they want to know your organization (or nation) isn't wasting their time--no one wants to deliver a sales pitch to people who can't authorize purchases. If they're not careful, a group of people can end up in a state where the necessary-for-external-relations hierarchy becomes a total one. You see this with startup founders; the one who talks to the investors the most ends up in charge, and the ones who deal with employees or low-status counterparties lose power. This is why "flat" organizations can't really work; people who need things from the organization demand to know who to talk to in order to actually get things done, and eventually those "who to talk to" people end up with informal, then formal, power and it's very difficult to get them to give it back.

The large-scale failure of democracy that's happening all over the world is something different, though. Regulation is struggling to keep up with technology, and it doesn't help that nation-states have already been doing a piss-poor job of protecting people from their employers. If the US falls in the next 20 years, it won't be due to Covid or Trump or nation-level adversaries; it'll be due to the obscene power given to employers, who can literally ruin an employee's life--not just fire him, but anally ravage him in perpetuity with bad references--for any reason or none. Eventually, unless national governments start dropping serious lead pipe on employers' heads, people are going to tire of paying 30+ percent of their incomes to a government that lets bosses get away with this shit.


Interesting take. In British history, the first positive step towards freedom that I'd note was the creation of law courts. These gave serfs some power over their lords and provided some level of fairness, rather than everything being about favour.

The US does seem to lag the UK and Europe in terms of employment law in some cases (no formal employment contracts for most employees, can be fired without notice, little statutory holiday, maternity or paternity leave entitlement, etc. etc.)

It has been argued that the union movement — while susceptible to corruption — was a hugely positive force in economic and political terms for American workers. Unfortunately, Thatcher and Reagan saw this as such a threat that they attempted to destroy their own manufacturing bases in order to smash the unions.


> If the US falls in the next 20 years, it won't be due to Covid or Trump or nation-level adversaries; it'll be due to the obscene power given to employers, who can literally ruin an employee's life--not just fire him, but anally ravage him in perpetuity with bad references--for any reason or none.

How do you define "US falls"?

>Eventually, unless national governments start dropping serious lead pipe on employers' heads, people are going to tire of paying 30+ percent of their incomes to a government that lets bosses get away with this shit.

People endured much worse in medieval times, and endure much worse right now in China.


> People endured much worse in medieval times, and endure much worse right now in China.

Really? The USA has hollowed-out portions of the country equal to or worse than the worst 3rd-world countries.


I hope you're being hyperbolic; the worst 3rd-world countries have no governance (unless you count local warlords), 5-year-olds working in dangerous and toxic conditions, hunger, and slavery.


You don't realize what is going on in the United States. We have portions of the USA where the police don't even bother, and are run by local gangs. We also have children working, in dangerous and toxic conditions. We also have hunger, and yes we have slavery: prison labor. The USA is not what you think it is.


> We have portions of the USA where the police don’t even bother, and are run by local gangs.

The places where the police do “bother” are, ipso facto, also run by local gangs.


> We also have children working, in dangerous and toxic conditions.

Can you elaborate on that? I've never heard that particular thing about the US. For reference, in Congo there are 5-year-olds today carrying heavy buckets in makeshift cobalt mines, à la 19th-century England or France (plus the toxicity of cobalt; people who work in these mines get cancer if they don't die in an accident first). Even with whole families working in such conditions, the pay is not enough, and not stable enough, to sustain the family, and they are often working while hungry. Is there anything comparable going on in the US?


Gee, the news appears to be scrubbed from most of the 'net now, but I recently read about Mitsubishi using child labor in the US: https://flipboard.com/article/major-car-company-used-child-l... This is not as bad as your reference, but know that where our police do not go, anything is on the table. The US plays extreme.


I'm confused here: does migrant mean illegal, or is there some program, similar to farm programs, to bring people across the border to work?


It means both: to the employer they are good, low-expense labor and the business is wise to hire them; to the working class they are illegals taking jobs (their illegal status tends to be in the control of their employer, btw); to the political class they are a source of outrage funding; to the workers themselves, they are simply struggling to survive any way they can, caught by bad luck and an unforgiving world.


Interesting you say that because there’s some critical analysis I’ve come across in the past that states that taxation is a key driver towards democracy and that captured wealth (such as from oil) promotes oligarchy and dictatorship. There are outliers in either direction, naturally.


Data Engineer is the outsourced part of what no ML researcher wants to do: a thankless, high-pressure, dead-end job which in no way leads to actually doing ML later. It would pigeonhole the OP as unfit for real ML.

The best way is to take Stanford Deep Learning courses at SCPD, build a reputation, do real ML work (even if it's not a PhD, it's the same courses Stanford PhDs take).


I'm struggling to find a single resource that explains, ELI5-style and step by step, how a neural network does digit classification (learns 0-9) on a 3x5 grid, like a traffic-signal countdown timer.
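
In case it helps, here's a minimal sketch of exactly that in NumPy: a one-layer softmax classifier that learns ten 3x5-pixel digits (the bitmaps are my own rough approximation of a countdown-timer font):

    import numpy as np

    DIGITS = {  # each digit as 5 rows of 3 pixels
        0: "111 101 101 101 111", 1: "010 110 010 010 111",
        2: "111 001 111 100 111", 3: "111 001 111 001 111",
        4: "101 101 111 001 001", 5: "111 100 111 001 111",
        6: "111 100 111 101 111", 7: "111 001 001 001 001",
        8: "111 101 111 101 111", 9: "111 101 111 001 111",
    }
    X = np.array([[int(c) for c in s.replace(" ", "")] for s in DIGITS.values()],
                 dtype=float)                  # shape (10, 15)
    Y = np.eye(10)                             # one-hot labels

    W, b = np.zeros((15, 10)), np.zeros(10)
    for _ in range(2000):                      # plain full-batch gradient descent
        logits = X @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)      # softmax
        grad = (p - Y) / len(X)                # gradient of cross-entropy loss
        W -= X.T @ grad
        b -= grad.sum(axis=0)

    print((X @ W + b).argmax(axis=1))          # should print [0 1 2 ... 9]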


I agree with your first sentence. I'm not sure I would recommend SCPD.

If you want to do real ML work, you pretty much need the PhD. This is a hard thing to accept for people who have 140+ IQs but who, for whatever reason, do poorly with formal education, but it's true. Even if you get one real ML job without a doctoral degree, you won't get a second one.

Sure, other 140+ IQs can recognize very smart people with only (or not even) a bachelor's degree, but (a) your career in industry will be influenced by the opinions of not-smart but politically empowered people who rely on heuristics like educational prestige because they can't judge the genuine article, and (b) some of those 140+ are nevertheless scumbags and will use (a) against you.

If you want to be a serious player in an academic field like ML, you need to not only get the degree but start publishing and never stop. It doesn't matter all that much if your papers are any good; no one in industry will ever read them. But you need the image of a successful academic who's just slumming it and can go back any time.


I've worked in industry doing what I would consider "real ML" (i.e. shipping models that are core, revenue-generating product features into production) for a decade, at a range of companies from startups to Fortune 500s, and the part about needing a PhD is entirely nonsense (as is the majority of the content in this comment).

Maybe if you consider "real ML" to be exclusively working at DeepMind or on Meta's core research team, but there's a lot more to "real ML" than just those teams.

I'm curious how long you've been in industry and what types of orgs you've worked at to get this impression?

edit: In general the comments in this post are bizarrely out of touch with reality and have really shifted my perception of the avg HN commenter.


So basically the OP has no chance of ever getting into ML, since getting into a top-10 ML school for a PhD is a minor miracle, finishing it an even bigger one, and that's just the initial qualification step?


Academia remains an option even if you don't get into a top-10 ML school, if your research is good. Granted, it's tougher to get published and cited, because you're less likely to know people who can push your work, but if you do good work, you can still play.

Government may be an option, although it doesn't pay as well as industry and you can end up in a comfortable but stifling role.

In corporate, though? Yeah, you pretty much need to have the appearance of star power, which means degree prestige matters. Whether you're actually any good (and, trust me, there are plenty of mediocre people from top schools) doesn't really matter, because the decision-makers are too stupid to know the difference.

There are ways to play this, though, if you're aiming at industry. Harvard isn't a top-10 CS department, but the people in corporate aren't going to know that, and so "Harvard PhD" is going to make them fellate you just because it's Harvard. That may be an avenue. Or, better yet, get a PhD in something that sounds technical but is easier and less selective.

That said, if your goal is to play the corporate game and make a lot of money, you should probably forget about ML and focus on becoming a manager as quickly as possible. If your goal is to do intellectually stimulating work, you should probably not consider corporate, because your work is going to be evaluated by people who are literally 50 IQ points too dumb to do so, and while this noise factor is manipulable rather than truly random, the people who have the skills to perform said manipulation tend to go into management, not technology.


I don't think what you consider "real ML work" is what OP is asking for. While they wrote "research", the three points they listed at the end are not research, and I don't think they want to be an academic and/or write papers.


There is a lot of truth in this comment; it's not pleasant, but it's truth nonetheless.


Don't group data engineer and ML engineer together - they're very different positions. Data engineers typically don't do any ML (in fact, teams often create DE positions to differentiate between the people who do ML and the people who don't); for ML engineers it varies depending on the team/company - some work more like applied data scientists, others focus almost entirely on infrastructure and deployment.


The trouble is that these positions try to filter you out by asking for the same concepts in interviews. Nobody cares how good a programmer you are. Data engineer positions don't work well if you want to touch cutting-edge concepts like big-scale DNNs. Sure, you'd learn about deploying things, but if your day-to-day job is not about e.g. ONNX, PyTorch, JAX, and massive-scale training/inference pipelines, then you'd still be left out.

It's a chicken-and-egg problem. Probably the optimal way is just aiming for top ML research teams right from the start as a new grad. No other way is better.


As a Data/ML Engineer, I cannot quite recommend this route: this role is unlikely to get you to a true ML role.

It will give you a ton of exposure to ML techniques and infrastructure practices, but the true modeling work is still done by PhDs with the prerequisite background/knowledge. You will be taking black boxes -- pre-built models -- and doing the data cleaning, fine-tuning, and experimentation.

I know a handful of people who transitioned to full-on ML from this role, but those individuals were quite gifted and hardworking, and probably could have learned without this exposure. There's a significant gap in knowledge that can only be attained through academic experience.


Or through a free online course like deeplearning.ai or fast.ai, and a few personal projects.

So many PhDs I've worked with have known the literature but couldn't produce something valuable to save their lives. That might have been down to them being overbilled on projects and spread too thin, but I do not think a PhD is a requisite for building models. It certainly isn't for the rest of the engineering pipeline.


It depends on what company you're going for. If you are looking for a role in cutting-edge ML (FAANG/OpenAI/Deepmind/etc), the stuff you learn from these online courses does not provide the theoretical rigor required.

If you want a role in a small company building best-effort, out-of-the-box models, then the courses are plenty fine.


There are very few positions like this, as most AI/ML companies' models are in their infancy and will be ready "some day".

Even if you find such a role it will be chaotic and not meaningful.


What about the salary of pure ML vs. Data/ML Engineer?

Is there a difference? Who gets more, and by how much?


FastAI. Specifically "Deep Learning for Coders" which was recently updated. https://course.fast.ai/

Do what the instructor recommends: watch each lesson once in its entirety and then re-watch it while playing along. But don't just type their commands verbatim. Try and do something slightly different.


For the bottom-up side of this, https://deeplearning.ai. Where fast.ai gets you training models immediately, deeplearning.ai has you implementing neural nets from scratch in NumPy, then transitions to TensorFlow.
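
To give a flavour of the from-scratch part, here's a tiny two-layer net with hand-written backprop on XOR (layer sizes and learning rate are arbitrary choices, not from any course):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

    for _ in range(5000):
        h = np.tanh(X @ W1 + b1)                 # forward pass, hidden layer
        out = 1 / (1 + np.exp(-(h @ W2 + b2)))   # sigmoid output
        d_out = out - y                          # dLoss/dlogits for BCE loss
        d_h = (d_out @ W2.T) * (1 - h ** 2)      # backprop through tanh
        W2 -= 0.1 * (h.T @ d_out)
        b2 -= 0.1 * d_out.sum(axis=0)
        W1 -= 0.1 * (X.T @ d_h)
        b1 -= 0.1 * d_h.sum(axis=0)

    print(out.round().ravel())                   # should print [0. 1. 1. 0.]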


This is the best way to get into deep learning: do both fast.ai and deeplearning.ai. They take opposite approaches that complement each other perfectly. Fast.ai starts with the practical and doubles back to help you understand the technical: in lesson 1 of fast.ai you're doing practical work, and halfway through the course you start learning about taking derivatives of loss functions. In deeplearning.ai, on the other hand, lesson 1 has you learning about taking derivatives of loss functions, and you start doing practical things with them about halfway through the course. Both are absolutely fabulous for their own reasons, but together they really cover each other's strengths and weaknesses in an amazing way.


This seems like a really interesting way to do a crash course in do-it-yourself AI: approach it top-down and increase context as you go.


1. Clarify your goal.

Do you want to:

a) Become an academic in mathematics/statistics.

b) Become an academic in computer science with a focus on artificial intelligence.

c) Become a MLE in "regular" statistical applications. Aka bayesian classification, "core" statistical principles.

d) Become a specialized computer vision/natural language processing focused MLE.

e) Become a generalist software engineer who can whip out the above if needed.

In no way is e) the inferior option.

Generalists who can write code fast with 100% test coverage and pristine logging are by far the segment the industry has the shortest supply of.

There are TONS of math guys. Vanishingly few Principal Engineers who can write a design document and lead a project.

(Machine learning customers are OBSESSED with test coverage and verifiability. Believe it or not, multinational corporations generally don't want to unleash a {your_adjective_here}ist algorithm on the world.)

2. Study the above, properly.

To study the math, Elements of Statistical Learning/Algorithms by Goodfellow.

Start on page 1, do every second exercise. Publish a summary of every chapter you finish with your answers to GitHub.

3. Pursue your goal in a publicly verifiable manner.

See:

https://news.ycombinator.com/item?id=32071137


The Elements of Statistical Learning is by Hastie et al, not by Goodfellow. Goodfellow wrote Deep Learning. They are both available for free on their websites.



I am going to give you some meta commentary.

> ML/DL research

I think you should apply ML deeply to a domain you care about, and see if you can find a domain that works both for generating things and for understanding them. If you are heavy into the math and don't need a grounding basis, maybe you don't need a domain to apply the ML research to, but the best scientists had a problem they were trying to solve; they weren't just "doing research". Basically: research in a strong direction, for a strong purpose, solving a problem.

I'm guessing you asked a low-level mechanical question: how do I get from A to B? You might already have the domain.

So, to answer the actual question: I'd pick something like MNIST (the digit recognition problem) and master it by hand, from scratch, using as many techniques as I could find. That way I'd be applying each algorithm to a fixed problem, so the algorithm, and later any paper the algorithm gets embedded in, sticks in my mind.

Use only cleaned datasets; spend zero energy on cleaning at the beginning. Cleaning is a separate job, and two different things don't need to be learned here. In fact, stick with industry-benchmark data so you can compare your results to more papers.
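
A minimal sketch of that loop using scikit-learn's built-in digits set (a small stand-in for full MNIST) and a handful of classic techniques on the same fixed benchmark:

    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    models = {
        "logistic regression": LogisticRegression(max_iter=2000),
        "k-nearest neighbors": KNeighborsClassifier(),
        "support vector machine": SVC(),
        "random forest": RandomForestClassifier(random_state=0),
    }
    for name, model in models.items():   # same benchmark, many techniques
        print(f"{name:24s} {cross_val_score(model, X, y, cv=5).mean():.3f}")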


Andrew Ng's machine learning course on Coursera is a good introduction to the theory. https://www.coursera.org/learn/machine-learning


Not sure if it's the same material but there is a course by him on youtube as well: https://www.youtube.com/playlist?list=PLLssT5z_DsK-h9vYZkQkY...

I've watched every video in that playlist. He is a fantastic teacher.


I was about to share the same link. It gives a really good, solid foundation and understanding of what is behind ML.


If you are looking at machine learning outside of deep learning, there are just two books:

1. The Elements of Statistical Learning (a very frequentist treatment) by Hastie et al. [1]

2. Pattern Recognition and Machine Learning by Bishop (for a Bayesian treatment) [2]

Both are freely available online. Reading one will put you in the top 5% of practitioners; reading both, the top 1%.

[1] https://hastie.su.domains/Papers/ESLII.pdf

[2] https://www.microsoft.com/en-us/research/uploads/prod/2006/0...


Machine Learning: a Probabilistic Perspective by Murphy may be a better reference. Murphy has more up-to-date books.

> Both are freely available online. Reading one book will get you to top 5% practitioners and reading both will get you to top 1%

At which percentage do you start meeting math PhDs from top schools? You'll most probably never reach their level of understanding just by reading books or doing exercises. Having read Bishop, I don't have one tenth of the knowledge I'd need to do research at that level; you need more exposure than that.



That [1] is a link to the uncorrected 2009 copy of ESL. The main page provides links to the most up to date version.

https://hastie.su.domains/ElemStatLearn/


Have you taken undergraduate-level linear algebra, multivariable calculus and probability? Those are the prerequisites if you want to approach things a bit more rigorously. If you've covered them, get something specific to ML.

I like Hands-On ML... by Geron as a decent intro-to-ML book. FastAI seems a bit overrated to me - I didn't like that it uses its own helper library, or the teaching style, but it obviously works for other people.

Then there's more exhaustive books on theory - Elements of Statistical Learning, Pattern Recognition and Machine Learning, Bayesian Reasoning and Machine Learning, Murphy's books on probabilistic ML etc. But obviously the theory books have a lot of overlap with each other so there will be lots of material to skip after you've read one or two of them.


Fast.ai [1] would still be my course of choice, especially for a strong programmer.

After completing that, I think Kaggle competitions are a great way to master your skills.

[1] https://course.fast.ai


> Read research publications and try to implement them.

In my opinion, jump straight into this! Learn prerequisites as you need them.

I found Goodfellow's book [1] to be helpful to learn some basics.

But don't think you need to read the whole book before you start reading and implementing research papers.

If you try to build up all the fundamentals thoroughly, you run the risk of going down a very deep rabbit hole, e.g. learning real analysis so you can learn measure theory so you can learn measure-theoretic probability theory so you can learn stats properly, etc.

You can be a productive researcher and patch up the fundamentals over time.

[1] https://www.deeplearningbook.org/


Follow the HuggingFace Colab notebooks. They are well-written, and language-related AIs are a great way to get started because you'll naturally have a feeling for what they should produce.

Afterwards, do a statistics class. Most algorithms these days are trained with a softmax output and a cross-entropy loss between two discrete/continuous probability distributions. There's a lot of choice in which distribution to use to model what, and it will have strong effects on your gradients and, hence, your training trajectory.

Concepts like Shannon information and entropy are also very helpful for monitoring training progress. Typical loss curves decay roughly exponentially, so after a while it's difficult to see further progress. But if you are still reducing the bits of entropy in your classifier, learning is still going well. So you need to understand what to visualize and how to calculate it.
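
As a hedged illustration of what "bits of entropy" monitoring can look like (the logits and labels below are random stand-ins, assuming PyTorch):

    import math

    import torch
    import torch.nn.functional as F

    logits = torch.randn(32, 10)           # stand-in for a model's outputs
    targets = torch.randint(0, 10, (32,))  # stand-in for true labels

    nats = F.cross_entropy(logits, targets)  # PyTorch reports nats
    bits = nats.item() / math.log(2)         # convert to bits
    print(f"cross-entropy: {bits:.3f} bits/sample "
          f"(uniform guessing over 10 classes = log2(10) = 3.32)")

    # Entropy of the predicted distribution itself: how unsure the model is.
    probs = logits.softmax(dim=1)
    pred_entropy = -(probs * probs.clamp_min(1e-12).log2()).sum(dim=1).mean()
    print(f"mean predictive entropy: {pred_entropy.item():.3f} bits")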

As for implementing research publications, maybe start with easy mode and go to paperswithcode.com . There, you will find papers AND their source code, so that you can look at how others implemented their paper.

As for FastAI and Kaggle, my personal impression is that they're mostly for toy problems. No real AI researcher would be willing to disclose their full source code to an international megacorp like H&M for a measly $15k in prize money, yet similar terms appear to be the default on Kaggle:

https://www.kaggle.com/competitions/h-and-m-personalized-fas...

https://www.kaggle.com/competitions/dfl-bundesliga-data-shoo...

https://www.kaggle.com/competitions/feedback-prize-effective...

EDIT: Also, I strongly disagree with course.fast.ai on these points: "Myth (don’t need): Lots of math, Lots of data, Lots of expensive computers" To train a state of the art ASR AI, you need roughly 100x A100 for a month, 100,000+ hours of audio recordings, and math knowledge to find a maximum likelihood path through a logit matrix. Unless, of course, you're only working on toy problems.


fast.ai openly declares it doesn't have complete pipelines for audio, especially ASR.

They're also very open that areas like ASR and RL take a ton of compute to replicate SOTA. This shouldn't be used to substantiate any sort of criticism of fastai the library, since it is just wrappers on PyTorch. Maybe on the course.fast.ai page they could add a disclaimer that some areas like ASR are yet to achieve the quantum leap in performance that we've seen in other domains.

I disagree completely that fast.ai is for toy problems. It's a development tool that in my experience provides for more rapid iteration than starting in vanilla pytorch. Use it for toys, for challenges, for research, or in industry. You can always rewrite your solution in pytorch once stable if there is benefit in doing so.

I agree with others that fast.ai is a great place to start.


Regarding the Kaggle competitions, I also agree that you need access to powerful hardware. Some competitions have very big datasets that require you to have TBs available to download and uncompress the files. Also, having your own GPU will allow you to iterate faster. I tried a competition with just a Colab instance and felt kind of handicapped, unfortunately.


> to train a state of the art ASR AI, you need roughly 100x A100 for a month, 100,000+ hours of audio recordings, and math knowledge to find a maximum likelihood path through a logit matrix.

Do you think there is a distinction between the kinds of problems that take some kind of "raw" signal data (audio, images etc.) as input, where deep learning approaches appear to be fruitfully applied, and other kinds of problems that appear in many places in business and the public sector, where the input space is not raw signal data but tabular data?

I have heard some people argue that the latter kinds of tabular-data style problems can be effectively tackled with a variety of statistical methods, and that deep learning style approaches do not offer an advantage.

Maybe fast.ai is commenting on the latter class of problem.


I agree with you :) Most business-style "automate my Excel" problems can be solved pretty well with regular statistical models, meaning you don't need AI to solve them.

So my impression is that fast.ai teaches you how to use AI methods, but with example problems that didn't need AI to be solved well.


ASR is one of the hardest tasks for a novice to ML/DL to do.


In my opinion, it's still A LOT easier than Optical Flow, Depth Recognition, or Robot Navigation... So I'd say:

- Text Generation/Transformation: Easy

- Speech Recognition: Medium

- Optical Flow / Depth Recognition: Hard

- Actually controlling robots with AI: Nightmare


Great suggestions and perspective! Thanks!


You could do worse than go back and make sure your foundation maths is solid. Revise some Discrete Maths books and understand identification, classification, sets, equivalence... Make sure you have a solid grounding in concepts like dimensions, functions, differentiation, integration, extrapolation, interpolation; then toughen up your Linear Algebra, optimisation, solving, regression, before getting into approximation and gradient descent.

Sure, most of this sounds as dull as a broken clock, but in my observation it makes the difference between students who can just use machine learning tools by copying textbook cases and adopting a lot of fancy new terminology, and those that understand what they're doing.

That difference really kicks in once you get off the beaten track of popular use-cases, into applying ML to new, unproven applications. Then you need a deeper understanding of why some algorithms may be useful and others are inappropriate.


> You could do worse than go back and make sure your foundation maths is solid.

This. Though I have no textbook I'd recommend; all of the ones I used were a very hard slog to read, let alone grasp the maths in them.


There are already good recommendations here for getting into the "standard practice" of ML.

To really understand what is going on, though, the path I am having some early success with (as a long-time developer / data pipeline guy, but newly into standard Python / ML practice) is to run through Kochenderfer's "Algorithms for Optimization" from 2019 (MIT Press), including implementing the exercises, since optimization is the cornerstone of the majority of ML methods. Some of the most fun I've had in a long time.

Freely available here:

https://algorithmsbook.com/optimization/
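
For a taste of why optimization is called the cornerstone, here is a toy sketch (my own illustration, not from the book) of plain gradient descent on a simple quadratic:

    import numpy as np

    def grad(p):
        # Analytic gradient of f(x, y) = (x - 3)^2 + 10 * (y + 1)^2
        x, y = p
        return np.array([2 * (x - 3), 20 * (y + 1)])

    p = np.zeros(2)  # start at the origin
    lr = 0.05        # step size (learning rate)
    for _ in range(200):
        p -= lr * grad(p)

    print(p)  # converges to the minimum at (3, -1)

Much of what the book adds is smarter ways to pick the step and direction when the gradient is expensive, noisy, or unavailable.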

From there on, I'm less sure, but expect I might experiment with implementing my own deep learning methods just for fun, or similar.


As always, it depends.

I started out without *any* background knowledge a few years ago. Found the Data Scientist career track of Datacamp pretty helpful, since it goes beyond programming and includes the mathematical and statistical theories as well. (https://www.datacamp.com/tracks/data-scientist-with-python)

It's basic, but a solid foundation to build upon.

If you're already familiar with most of these topics, fast.ai is the way to go!


I can only speak from my own experience, but there's another way than going straight into ML/DL and cranking out new models: come from the application-domain side. [0]

For me that was robotics, with the motivation that traditional methods felt like they wouldn't scale outside static environments, so I started to look into ML/DL (Deep Reinforcement Learning, really), and from the looks of it I'm not alone. [0]

Now I do research in it, without a PhD and without taking any courses in it during my master's (except one DL course where we had to code everything, including gradient flow, from scratch; no framework).

Frankly, going pure DL at the research level today seems like a steep uphill climb; the top labs and research institutes (including industry) are the ones producing (and notably training) most of the SOTA models. Getting into those circles is your best bet, and a PhD at a top university under a top professor is the best way in, but competition to get into those is insane.

[0] https://www.natolambert.com/writing/path-into-ai


I gave a more poorly expressed version of what you said.

With one change: I think the author already has a domain or plan in mind. They really asked how to get to

> Read research publications and try to implement them.

They might want to be able to use the body of AI and working code to apply it to a different but specific problem. Be a power user of existing models.

But totally agree and well said!


The dilemma is that although there are recommendations like fast.ai to get your hands dirty quickly into ML/DL, none of the good AI researchers and practitioners got there via these quick tutorials. They got there via rigorous linear algebra, traditional ML, statistics and related computer science knowledge.

I would say try fast.ai for a quick taste of what ML/DL is like, and then go back to linear algebra, deep learning and stats courses from top schools while picking a personal project goal to achieve (e.g. reproducing the results of a popular CVPR/ICML paper, or building your own XXX). Once you go through a full lifecycle of building something from scratch, you will have a much better understanding of where you are and where you want to go from there.


I suggest first understanding machine learning in general before jumping into deep learning. The book ISLR2 is very accessible: it starts with linear regression and works through many other methods, including neural networks. There is an edX course based on the book.

https://www.statlearning.com/

https://www.edx.org/course/statistical-learning


It might not be exactly what you're looking for but I stumbled upon this video by Sebastian Lague the other day: https://youtu.be/hfMk-kjRv4c

As someone who's only ever dabbled with minimaxir's GPT-2 packages, I found this an extremely approachable exploration (and explanation) of how a neural network works. I can't recommend it enough.


+1 for fast.ai; it is and has been probably the best course for newcomers for years (I remember the 2017/18 courses being a big help in DL, even though I had about a year of mostly classical ML experience before that :P)


1. Watch the 3blue1brown neural network series for a gentle refresher on the underlying maths and the big picture of neural networks (and to be inspired): https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_6700...

2. Run through the catalogue of StatQuest videos for topics of interest in machine learning etc. This includes step-by-step maths and code explainers: https://www.youtube.com/c/joshstarmer/playlists

3. Watch more 3blue1brown videos if you need to step further back to refresh on calculus and linear algebra (that's most of the maths you'll need).

If you're hooked and can't get enough of the above content, then congratulations and welcome to the Matrix.


1. The theory by now fills bookshelves, so forget learning "the theory". Learn the foundation (to see where your niche -- e.g. optimization for fully connected NNs; last I heard it should be done with second-order methods if you have the computing power -- fits in) and then learn the niche's theory.

This will require at least upper undergraduate level math BTW.

2. You could get by knowing the theory in a handwavy way. Not ideal, but I've seen people do it. For implementation that is enough in many cases.

3. Again, "research" is too general. While you might understand some experimental ICML papers, it's very unlikely you will understand a single COLT paper if you don't know a lot of math.


Off-topic, but Norvig's talk (As We May Program) covers the intersection of hacking skills + statistical thinking + domain knowledge (https://vimeo.com/215418110)


Fastai. Updated series of lectures and notebooks for 2022. High level as well as building neural nets from scratch. Doing it at the moment and enjoying. Good as a starting point for more in depth studies.


part two of the course builds everything from scratch and covers almost all of what OP wants, so this is a very good choice. anyone interested in a study group?


i am currently doing the 2019 part 2 of the course... a study group would be amazing


please email me at its.shrey.arora at g m a i l


yes! Let's do it


please email me at its.shrey.arora at g m a i l


Is there any benefit in learning how to code the algorithms from scratch? In most cases we will just use standard libraries, unless the goal is to build up intuition?


I think it's still helpful, as it will indeed give you more understanding. You wouldn't want to write optimised algos etc. unless that floats your boat, but basic implementations are a good way to learn the fundamentals, imho. And pretty satisfying to boot, again, imho.
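
As a hedged example of how small a from-scratch implementation can be, k-nearest neighbors fits in about a dozen lines (the dataset and k here are arbitrary choices):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    def knn_predict(X_train, y_train, X_test, k=5):
        preds = []
        for x in X_test:
            dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
            nearest = y_train[np.argsort(dists)[:k]]     # labels of the k closest
            preds.append(np.bincount(nearest).argmax())  # majority vote
        return np.array(preds)

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    print("accuracy:", (knn_predict(X_tr, y_tr, X_te) == y_te).mean())

Writing it yourself makes the failure modes (feature scaling, ties, distance choice) obvious in a way that calling a library never does.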


> what are the best resources for a CS student/decent programmer to get into the field of ML and DL on their own

It would be helpful to know more about your background and motivations.

Are you currently enrolled in a Bachelor of Science (B.Sc.) full-time study program at a university, with the goal of becoming a research scientist (staff scientist, professor, or research fellow) in the area of machine learning?

If so, does your university offer a Master's program in Machine Learning, or are your grades such that you could apply for such a program elsewhere after completing your first degree? You could then enter a Ph.D. program in machine learning itself, or in computer science with an applied ML topic such as ML for NLP (Natural Language Processing), ML for IR (Information Retrieval = search engines), ML for robotics, etc. The choice of doctoral advisor and Ph.D. topic will steer you in a particular direction, in which you can then find employment to conduct research under the direction of others and, potentially, become a research group leader yourself after gaining the necessary experience. (Time: M.Sc.: 1-2 years; Ph.D.: 3-8 years; postdoctoral/pre-tenure time: e.g. 2-k years, depending on ability and luck/timing.) It's a lot of fun to get paid for doing science, so I chose that path (but with multiple deviations due to startups and industry jobs along the way).

The more people know, the easier it is to recommend you useful materials.


https://www.deeplearningbook.org/ and http://incompleteideas.net/book/the-book-2nd.html are excellent resources for supervised and reinforcement learning, respectively, and some knowledge of statistics and probability go a long way. But I think by far the most important thing is to just start training models, even very small ones, and developing an intuition for what works and what the failure modes are.

- Get really comfortable with matplotlib or your graphing library of choice. Plot your data in every way you can think of. Plot your models' outputs, find which samples they do best and worst on.

- Play around with different hyperparameters and data augmentation strategies and see how they affect training.

- Try implementing backprop by hand -- understanding the backward pass of the different layers is extremely helpful when debugging. I found Karpathy's CS231n lectures to be a great starting point for this. (See the sketch after this list.)

- Eventually, you'll want to start reading papers. The seminal papers (alexnet, resnet, attention is all you need, etc) are a good place to start. I found https://www.youtube.com/c/YannicKilcher (especially the early videos) to be a very useful companion resource for this.

- Once you've read some papers and feel comfortable with the format, you'll want to try implementing something. Important tricks are often hidden away in the appendices, read them carefully!

- And above all, remember that machine learning is a dark art -- when your dataloader has a bug in its shuffling logic, or when your tensor shapes get broadcast incorrectly, your code often won't throw an error, your model will just be slightly worse and you'll never notice. Because of this, 90% of being a good ML researcher/engineer is writing tests and knowing how to track down bugs. http://karpathy.github.io/2019/04/25/recipe/ perfectly summarizes my feelings on this.
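
On the "backprop by hand" bullet above, a minimal sketch (an illustration, not the commenter's code): one hidden layer, a manual backward pass, and a finite-difference check to catch mistakes:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(16, 3))   # toy inputs
    y = rng.normal(size=(16, 1))   # toy regression targets
    W1 = rng.normal(size=(3, 8))
    W2 = rng.normal(size=(8, 1))

    def forward(W1, W2):
        h = np.tanh(X @ W1)        # hidden activations
        pred = h @ W2
        return ((pred - y) ** 2).mean(), h, pred

    loss, h, pred = forward(W1, W2)

    # Backward pass, applying the chain rule layer by layer:
    dpred = 2 * (pred - y) / len(X)      # dL/dpred
    dW2 = h.T @ dpred                    # dL/dW2
    dh = dpred @ W2.T                    # dL/dh
    dW1 = X.T @ (dh * (1 - h ** 2))      # tanh'(z) = 1 - tanh(z)^2

    # Numerical check on one weight: nudge it, re-run, compare slopes.
    eps = 1e-6
    W1_nudged = W1.copy()
    W1_nudged[0, 0] += eps
    numeric = (forward(W1_nudged, W2)[0] - loss) / eps
    print(f"analytic {dW1[0, 0]:.6f} vs numeric {numeric:.6f}")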


I second Karpathy's version of cs231n (2016). He's an amazing lecturer.

A good alternative to Goodfellow is "Dive into Deep Learning" (https://d2l.ai), which is free and more up-to-date, interactive, and practical, IMO. Videos of a 2019 Berkeley course based on it are available too (https://courses.d2l.ai/berkeley-stat-157/).


Goodfellow's book is a bad recommendation for people who don't already know the material in it.


Stanford CS231n: Convolutional Neural Networks for Visual Recognition [1]. The assignments are excellent and will have you implement a deep-ish network practically from scratch, before diving into modern frameworks and applications.

This is not an instant-gratification with fancy results kind of course. But put in the work, and you will learn some very cool stuff.

[1]: http://cs231n.stanford.edu/index.html


This is the best intro route. I 2nd this


Don't know about 2022, but in 2020 the fast.ai video series was a good way to get started. It has been revised a few times since then, so chances are it is still good.

If you want something more theoretical there is a book by Hopcroft et al. that was released in draft form a number of years ago. It appears to be out for real now: Foundations of Data Science, by Avrim Blum, John Hopcroft, and Ravindran Kannan. Blurb and video lectures: https://www.microsoft.com/en-us/research/publication/foundat... I just found these so haven't looked at them yet. The book draft (2014, wow) is here: https://www.cs.cornell.edu/jeh/book11April2014.pdf I didn't stick with it long enough to make much progress, unfortunately.

Kaggle problems are a good set of practical projects even if you're not aiming to be competitive at them (which takes a lot of effort and resources). The fast.ai vids are OK as preparation for them.



> By getting into machine or deep learning I mean building upto a stage to do ML/DL research.

> The target ability:

> 1. To understand the theory behind the algorithms

> 2. To implement an algorithm on a dataset of choice. (Data cleaning and management should also be learned)

> 3. Read research publications and try to implement them.

There are many different ways that people do ML/DL research these days. Some people do more theory-work which will necessarily be more focused on mathematics, and others do more of an applied approach which will be more focused on coding and iterating.

For theory-driven work, I think Michael I. Jordan's list is still pretty solid:

> https://news.ycombinator.com/item?id=1055389

I would focus on the fundamentals first though:

1. Get a solid background in mathematics:

  - analysis (a suggestion is Baby Rudin)

  - probability (Grimmett and Stirzaker, maybe something with measure theory after)

  - statistics (Casella and Berger or Wasserman's book is a good start)

2. Get a solid foundation in statistical machine learning:

  - Introduction to Statistical Learning is a fantastic start

  - Then choose one or both of the following:

    - Elements of Statistical Learning for a frequentist approach

    - Pattern Recognition & Machine Learning for a Bayesian approach

3. Get a baseline understanding of deep learning:

  - the Deep Learning book by Goodfellow is decent

  - start reading papers here and trying to implement them

If you get through to this last step, you are probably solid enough to get a job building models. If that's the route you want, then begin iterating on learning about new approaches in papers (look for papers with code / data) and implementing them.

If you want to go the academic route, you have enough of a view of the field to begin specializing further. Choose a sub-domain and dig deep if you want to do more deep learning work. Maybe revisit Michael I Jordan's list if you're still confused about where to go. A lot of those books will feel a lot more familiar.

Best of luck!


A lot of people here recommend fast.ai which is solid, but not really that useful if you want to do research.

I would start with math foundations: basic linear algebra, stats, probability and some analysis. CS undergrad level is plenty of math for start.

Then I would try to understand backprop on an intimate level: learn how to calculate gradients, and maybe take a look at how autograd works as well.
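
A small sketch of that step, assuming PyTorch: compute a derivative by hand, then let autograd confirm it.

    import torch

    x = torch.tensor(2.0, requires_grad=True)
    f = x ** 2 * torch.sin(x)  # f(x) = x^2 * sin(x)
    f.backward()               # autograd fills in x.grad

    # By hand: f'(x) = 2x * sin(x) + x^2 * cos(x)
    with torch.no_grad():
        manual = 2 * x * torch.sin(x) + x ** 2 * torch.cos(x)

    print(x.grad.item(), manual.item())  # the two should match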

Then you should know a bit to pick your next steps by yourself.


To piggyback on the OP's question, I for one think the part in parentheses is actually the most important:

>(Data cleaning and management should also be learned)

There are many students and graduates who either didn't want to do research in the first place or didn't get that research grant or position, and are looking to get employed in the private sector with their degree. Many universities and colleges have also retooled some of their statistics degrees into dedicated "data science" curricula, producing graduates who either know the basics of ML/DL or have the prerequisite background to learn quickly.

However, in my experience (extrapolating from my own past job searches), while "understanding the theory behind the algorithms" still counts for something, it is much less than one would think. Familiarity with the software technologies and practical implementation counts for much more. This includes not only "data management", a phrase which makes it sound like the data simply exists somewhere and only needs to be managed (not unlike a Kaggle competition), but the whole data pipeline from generation/collection to analysis and communication of results, deploying the software that implements it all, and so on.

I suppose (having never been on that end of the interview table) that given any two candidates, it is very difficult to evaluate how deeply one understands the theory of some algorithm compared to the other if they both demonstrate basic understanding (and what is the practical use of such a difference in insight, anyway?). Likewise, I assume it is somewhat easier to gauge whether someone can start delivering results or contributing to ongoing work quickly if they have the relevant technical skills and/or domain knowledge.


Entering the world of machine learning is quite the experience. And as any explorer is aware, a compass can occasionally be useful for determining whether you are traveling in the proper direction.

Use this video as a compass, even though its title says "machine learning roadmap". Investigate it, follow your interest, pick up a new skill, and then put what you've learned to use when determining your next moves.

Video: https://www.youtube.com/watch?v=pHiMN_gy9mk. Interactive Machine Learning Roadmap: https://dbourke.link/mlmap


My best advice is to find people who do what you want to do and try to learn as much as possible from them. If you're interested in doing ML/DL research I think the best way to get into the field is to reach out to professors. I studied ML/DL (books, projects, classes, reimplementing papers) for several years in undergrad, but discussing and debating ideas is the one thing that took my understanding to a much deeper level. A good professor will also point out gaps in your knowledge that you might be missing.

A second bit of advice: Programming (and execution) skills are IMO heavily undervalued by people looking to get into ML. The faster you can write code, debug, and implement new things, the easier it is to produce good research.

Some books I liked: PR & ML (Bishop), the Deep Learning book (Goodfellow), AI: A Modern Approach (Norvig), Elements of Statistical Learning (Friedman)


> I mean building upto a stage to do ML/DL research. Applied research or core theory of ML/DL research.

The vast majority of people who do this have graduate degrees. I'm biased, but I think getting a graduate degree in the subject would be the default suggestion. Are you considering it?


There is the "high-bias, low-variance introduction to Machine Learning [for physicists]" -> https://news.ycombinator.com/item?id=17772211

Quotes from the comments:

> For all my hacker news peeps that wants to learn ML and/or DL, you need to drop everything right now, go print this on the office printer, and sit outside with coffee for the next two weeks and read through this entire thing. Turn off the computer and phone. Stop checking HN for two weeks. Trust me, nothing better than this will come around on HN anytime soon.

> The authors are wrong to label this book as useful only to people with a physics background, and in fact it will be useful for everyone who wants to learn modern ML.


Read "AI: A Modern Approach" and do the MIT and Stanford courses that are available on YouTube. Then you can go deeper into the branches presented in the book and courses. The problem I'm seeing now is that everybody seems to think AI/ML is NNs. Nothing could be further from the truth!


There was a thread a few days ago with some recent comments on this topic: "How do you break into a career in machine learning? (2020)" [0]

0: https://news.ycombinator.com/item?id=32342925



If you really want to do research then no question you need to go to grad school for at minimum an MS. If you are still a student that means getting in touch with ML professors at your school and trying to get published before you graduate. Top ML programs in the US are extremely competitive and you likely don't stand a shot of getting in without a few NeurIPS/ICML/CVPR papers.

If you just want to work as an ML Engineer then take as many courses as you can on the subject before you graduate and get internships/apply to jobs. Nothing special here.


I've thought about going in this direction multiple times, but I have this impression that it's pretty much unavoidable that you'd be working with annoying ETL processes and "cleaning data" (parsing, transforming from one set of columns to another, etc) over and over again. Is this true? I hate that stuff, but I'm particularly interested in graphical data structures and I even like probability and statistics and linear algebra, so I've been torn.


It's an unavoidable part of the job, but it's definitely not most of the work.

I feel it's kind of like writing tests in SWE - something you do because it's beneficial even if it's not enjoyable.


There are so many resources it's hard to pick… If you're already studying CS and like to learn by building, I'd recommend doing that for ML too!

Pick a real problem, try to build a ML solution for it and while doing so keep a list of things you'd like to dig deeper into. Then go back to that list and pick one item to study, and iterate.

Happy to have a chat and give you specific pointers if you'd like (email in profile); I got my master's in ML in 2016 and have applied it in industry since.


There is an early recording of Andrew Ng's Stanford course, from before he became the star of the field.

It is mostly math behind ML.

There are also his lecture notes from 2014 I suppose.

It is really all one needs to know. Machine learning or deep learning is just pattern recognition that uses NNs as the basis of representation for the "matcher".

The real problem of ML is to find someone who would pay you for this because we are in a bubble.

It is math and algorithms that matters and there are just a few of them. All the tooling could be mastered in a month.


Lots of good advice here, but I'd put Google's free course, "AI for all humans: A course to delight and inspire!" out there as a wonderful entree: https://cloud.google.com/blog/topics/developers-practitioner...


If you want a good theoretical foundation based on maths, Caltech's Learning From Data course is good: https://work.caltech.edu/telecourse

You need to be not afraid of doing proofs of theorems (most of them have to do with stats because machine learning is basically stats on steroids).


That's a multi-year process. Being proficient with TensorFlow and PyTorch is not sufficient to do useful research. I suggest you begin by implementing a neural network from scratch in your favorite programming language. Writing up the code for matrix multiplication, dot product, back propagation, etc. will teach you a lot.



I'd highly recommend reading this book, which is available free online: https://www.deeplearningbook.org/ . It covers the foundations up to 2016 very well, with useful references for understanding the underlying math/theory.


Learning how to do machine learning and learning how to clean data is a bit like asking for the best way to become an award-winning author with very neat penmanship.

One has nothing to do with the other.

That said, data engineering at scale pays a lot more than deep learning but is also a lot less fun. Figure out which you'd rather do.


I recently asked Reddit how to label images by downloading a network pre-trained on ImageNet, and got no answers.

I don't know how long it would take to train such a network on a cheap laptop.

There are tutorials, but I don't see any cookie cutter thing.

I thought there would be demos for this, since image labeling is an old problem.
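
For what it's worth, something close to cookie-cutter does exist in torchvision. A sketch, assuming torchvision >= 0.13 and a local image named cat.jpg:

    import torch
    from PIL import Image
    from torchvision.models import ResNet18_Weights, resnet18

    weights = ResNet18_Weights.DEFAULT     # pretrained on ImageNet
    model = resnet18(weights=weights).eval()
    preprocess = weights.transforms()      # matching resize/crop/normalize

    img = Image.open("cat.jpg").convert("RGB")
    with torch.no_grad():
        probs = model(preprocess(img).unsqueeze(0)).softmax(dim=1)

    top5 = probs.topk(5)
    for p, idx in zip(top5.values[0], top5.indices[0]):
        print(f"{weights.meta['categories'][int(idx)]}: {p.item():.3f}")

Inference like this runs fine on a cheap laptop; it's training from scratch that needs serious hardware.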


FastAI has ready-to-run code that does just this. They seem to have an ImageNet package https://github.com/fastai/imagenette


this link seems to be an ML competition, not trained datasets.


If you want to do research, I'd recommend an academic path. Get an MS from a reputable university. If you can't do that, try to follow their syllabus on your own (textbooks, articles).

MOOCs are a starting point; they often offer classes with few prerequisites, but that'll only get you so far.


Describe the new calculus that leads to the design of multihead attention units, identify 3-4 other useful novel units using your pattern, propose hardware that can perform those operations with greater speed or efficiency.


imo the best way to learn is to jump into a project. if you want some basic intuition on neural networks, 3blue1brown has some great videos. for everything else, just google things and you'll find lots of resources. medium and towardsdatascience articles have saved my life so many times while working on ML/DL uni projects. if you're looking to play around with current research, use https://paperswithcode.com.


If you want to quickly review fundamental stats, you can read the Think Stats book.

After that, I'd recommend: Statistical Models: Theory and Practice by Freedman.


A nice mix of application and background: https://pyimagesearch.com/


The Machine Learning course by Andrew Ng on edX.

A basic (and by no means complete!) way to find out whether it's fun for you.


On a related note, to get into ML/DL you'll have to read a lot of arXiv papers. I built Smort.io to easily annotate and collaborate on arXiv papers.

Just add smort.io/ before any arXiv URL to read it in Smort.

Demo: https://smort.io/demo/home


Try to join all these competitions; they will keep you busy.


It's probably a bit too late to get into ML. It's oversaturated with a lot of wannabe "machine learning enthusiasts". If you still want to get into the field, a master's/PhD is a much safer way to get proper ML jobs and then prosper in them.


Not sure why this is downvoted, perhaps because it's condescending. But I've observed the same thing. The data science job market is not that great right now; It's tough even if you have a master's degree.


Why would you possibly want to do such a thing towards yourself?


Fast.ai



