@neeleshsethi9914

Perfect timing, I was just given an assignment to fine-tune a model to classify research as cancer vs. non-cancer. Thank you very much.

@himanabdollahpouri6108

Thanks for the great video. It would also have been amazing to include a classical classification model like logistic regression, train it on the training set, and see how it performs on the test set. I'm very curious how much extra accuracy we get by fine-tuning an LLM vs. training a classifier ourselves.
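For reference, something like this minimal scikit-learn baseline is what I had in mind (just a sketch; it assumes the SMS data is in a pandas DataFrame `df` with "text" and 0/1 "label" columns, and those names are only for illustration):

```python
# Minimal classical baseline: TF-IDF features + logistic regression.
# Assumes a DataFrame `df` with "text" and 0/1 "label" columns (illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Bag-of-words features as a simple reference point
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test_vec)))
```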

@huzaifaimran9468

Amazing channel! Absolutely love it! Thank you for explaining the reasons behind the models' behavior! Really love the theory.

@고순식-j7d

Waited for this video - this will be my weekend task - many thanks!

@sumanchaudhary8757

Thank you very much, professor 🙏

@SkiFastNY

I just finished the fine-tuning chapter, excellent content. When padding short sentences, shouldn't the Dataset also return an attention_mask as part of the X output? This would avoid paying attention to the extra EOS padding tokens, for example: [buy, coin, EOS, EOS, EOS] with [1, 1, 1, 0, 0] as the attention mask. One EOS token should be attended to, but shouldn't the rest of the EOS tokens be excluded from consideration?
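Something along these lines is what I mean (just a sketch; the class and variable names are illustrative, not the actual code from the chapter):

```python
# Sketch of a Dataset that returns an attention mask alongside the padded ids.
# 50256 is GPT-2's <|endoftext|> id, used here as the padding token (assumption).
import torch
from torch.utils.data import Dataset

class SpamDataset(Dataset):
    def __init__(self, encoded_texts, labels, max_length, pad_token_id=50256):
        self.labels = labels
        self.input_ids = []
        self.attention_masks = []
        for tokens in encoded_texts:
            tokens = tokens[:max_length]
            n_pad = max_length - len(tokens)
            # 1 = attend to this position, 0 = ignore the padding tokens
            # (if one EOS should still be attended to, append it to `tokens`
            # before computing the padding)
            self.attention_masks.append([1] * len(tokens) + [0] * n_pad)
            self.input_ids.append(tokens + [pad_token_id] * n_pad)

    def __getitem__(self, idx):
        return (
            torch.tensor(self.input_ids[idx]),
            torch.tensor(self.attention_masks[idx]),
            torch.tensor(self.labels[idx]),
        )

    def __len__(self):
        return len(self.labels)
```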

@marvinlomo5845

Hi Sebastian, thanks for this video. A question I have is: can we combine LLMs with traditional machine learning on tabular data? Since LLMs are poor with tabular data, could we augment an LLM with a traditional ML model to improve its performance there? I guess the LLM may also enable dialogue with the ML model for explainability.

@0730pleomax

Kinda makes me wonder—should I go full fine-tuning on a small model, or just LoRA a large one? Any rule of thumb?

@hopelesssuprem1867

Well explained as always, Sebastian. I have a question: for study purposes I've also implemented GPT and LoRA from scratch. When I trained GPT on 5000 text characters (for simplicity) for 3 iterations it took 3 minutes, but when I fine-tuned this pretrained GPT with LoRA on the same data it also took 3 minutes. The most interesting thing is that the fine-tuning time is always the same as for pretraining, regardless of what rank I set or how much data I use.

My question is: is this behaviour normal or not? I understand it's hard to say anything definitive without looking at the code, but logically, reducing the number of trainable parameters should speed up training. On the other hand, even though the weights are frozen, the forward and backward passes still go through the whole weight matrix in order to update the LoRA A and B matrices. I'm a little confused. What do you think about it?
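For what it's worth, this is the kind of sanity check I'd run to confirm that only the LoRA A/B matrices are trainable (just a sketch; it assumes a PyTorch model whose base weights have requires_grad=False):

```python
# Count trainable vs. total parameters to confirm the LoRA setup.
def count_params(model):
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable: {trainable:,} / total: {total:,} "
          f"({100 * trainable / total:.2f}%)")
```

If the trainable fraction drops with the rank as expected but the step time stays flat, that would at least be consistent with the forward/backward passes through the full frozen weights dominating the runtime rather than the parameter updates.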

@moyoajayi2397

Nice work Sebastian! Around 38:41 when you discuss the "historic" max length for SMS messages, I don't think it would be exactly 120, right?

The 120 max length is the maximum number of tokens, and it's unlikely that it equals the maximum number of characters. But I think it would be close, just a little greater, like 140-160.
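A quick way to see the character vs. token difference (assuming the GPT-2 BPE tokenizer via tiktoken; the sample text is made up):

```python
# Compare character count vs. token count for a sample SMS-style text.
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
text = "You have won a free prize! Call now to claim your reward."
print(len(text), "characters vs.", len(tokenizer.encode(text)), "tokens")
# BPE merges several characters into one token, so the token count is
# usually well below the character count for English text.
```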

@haijieding3634

Very detailed walkthrough, thank you so much for sharing it. One question on encoding the label as 1 or 0: given that the LLM can generate words, is it better to keep spam/ham rather than turning them into 1 or 0, which is no longer semantically meaningful?
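Just to check my understanding: with a classification head, the 0/1 are only class indices for the cross-entropy loss, not tokens the model has to generate, e.g. (sketch with dummy logits, not the actual code from the video):

```python
import torch
import torch.nn.functional as F

label_to_index = {"ham": 0, "spam": 1}
labels = torch.tensor([label_to_index["spam"], label_to_index["ham"]])

# Pretend these came from a 2-unit classification head on the last token
logits = torch.randn(2, 2)

loss = F.cross_entropy(logits, labels)
print(loss.item())
```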

@sumedhnvda

thanks

@AlokSaboo

Can we extend this for multiple labels/topics?
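I assume it would mostly mean widening the output head and using integer class indices, something like this (sketch; the hidden size and class names are made up):

```python
# Sketch: replacing a 2-way head with an N-way classification head.
import torch

classes = ["sports", "politics", "tech", "health"]  # illustrative topics
emb_dim = 768  # hidden size of the base model (illustrative)

classification_head = torch.nn.Linear(emb_dim, len(classes))
# Training then uses cross-entropy over len(classes) logits, with targets
# given as integer class indices 0..len(classes)-1.
```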

@0730pleomax

Since when do we need an LLM to do classification instead of a simple BERT model? Lol. But fair enough—LLMs do generalize better across diverse inputs when properly tuned.