I have a paragraph on a certain topic.
"The legendary world of Pokémon first reached Australian and New
Zealand shores in 1998 with Pokémon Red Version and Pokémon Blue
Version for Game ...
I have generated BERT word embeddings for the sentences of a sentiment classification dataset and want to feed these word embeddings to the Keras embedding layer. Since Keras has a default ...
I need to implement a solution which can recognize the pronouns associated with a noun in a sentence. Say I have a paragraph about a person; I want to count how many times the person has been referenced (...
Below is the code I am executing. The error occurs on the third line (vectors.init_sims(True)):
fname = get_tmpfile(path_to_embedding_file)
vectors = KeyedVectors.load(fname, mmap='r')
My goal is to train a span prediction model
which can predict the position in the BERT output sequence.
My input's shape is (batch_size, max_sequence_len (512), embedding_size (768)).
The output's shape will ...
I'm working on interpreting code in the book Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, but came across an issue with the code in the NLP section. The author changed some of ...
I have a set of documents stored in a list and a query in a string format. I need to search and rank these documents with respect to the query.
I tried to use TF-IDF. However, it does help because it ...
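For reference, the TF-IDF ranking described above can be sketched with the standard library alone. This is a minimal sketch, not the asker's code: the function name rank_documents, the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; real pipelines do more).
    return text.lower().split()


def rank_documents(documents, query):
    """Return (index, score) pairs sorted by TF-IDF cosine similarity to the query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    def idf(term):
        # Smoothed IDF so terms unseen in the documents do not divide by zero.
        return math.log((1 + n_docs) / (1 + df.get(term, 0))) + 1.0

    def vectorize(tokens):
        counts = Counter(tokens)
        return {term: count * idf(term) for term, count in counts.items()}

    def cosine(u, v):
        dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    query_vec = vectorize(tokenize(query))
    scores = [(i, cosine(query_vec, vectorize(tokens)))
              for i, tokens in enumerate(tokenized)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data, only to show the call.
documents = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]
ranking = rank_documents(documents, "cat on a mat")
```

Note that this baseline only matches exact tokens ("cats" does not match "cat"), which is one common reason plain TF-IDF ranking falls short.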
I am trying to fine-tune my own model. I am writing code in the T5 trivia Colab example.
Here are my steps:
Query the data from BigQuery and put it into a pd.DataFrame
Turn the pd.DataFrame into tf.data....
I am trying to remove stopwords during an NLP pre-processing step. I use the remove_stopwords() function from gensim but would also like to add my own stopwords.
# under this method, these custom ...
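One possible way to combine a built-in stopword list with custom words is sketched below. It is only a sketch under assumptions: remove_stopwords_custom is a hypothetical helper (not gensim's own remove_stopwords), and the small fallback list exists only so the example still runs when gensim is not installed.

```python
try:
    # gensim exposes its built-in stopword list as a frozenset.
    from gensim.parsing.preprocessing import STOPWORDS
except ImportError:
    # Tiny stand-in list (assumption) so the sketch runs without gensim.
    STOPWORDS = frozenset({"a", "an", "the", "is", "of", "and"})


def remove_stopwords_custom(text, extra_stopwords=()):
    """Drop built-in and caller-supplied stopwords from whitespace-split text."""
    # Union of the built-in list and the lowercased custom words.
    stopwords = STOPWORDS.union(w.lower() for w in extra_stopwords)
    kept = [token for token in text.split() if token.lower() not in stopwords]
    return " ".join(kept)


# Hypothetical input: "hello" is treated as a custom stopword.
cleaned = remove_stopwords_custom("the cat purred hello", extra_stopwords={"hello"})
```

Building a new set with union leaves the library's own list untouched, so other code that relies on the default stopwords is not affected.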
I have two lists, a and b. They look like this:
a = [
I have a dataset that has 100,000 records.
The data in this dataset is in 2 columns.
When I apply BOW in my model I get a big list of features.
That is fine, I managed to work with them.
I'm new to AllenNLP, and this is my first time trying it out. I already installed all the required libraries !pip install allennlp !pip install --pre allennlp-models and my code should be fine too, ...
I'm trying out document de-duplication on an NY-Times corpus that I've prepared very recently. It contains data related to financial fraud.
First, I convert the article snippets to a list of ...
A valid address is one where parcels can be delivered (a postally correct address).
The CSV files have the following columns:
orderid: unique order id for each order
ordercustomeraddress: address of the ...
I was trying to run src = Seq2SeqTextList.from_df(df, cols='fr').split_by_rand_pct(0.2).label_from_df(cols='en', label_cls=TextList) from course 7 (fastai-nlp) in a terminal instead of a Jupyter ...
I have a set of 20 small documents which talk about a particular kind of issue (training data). Now I want to identify the docs, out of 10K documents, which talk about the same issue.
For the ...
I'm testing Conv, GRU, LSTM and simple Dense and I don't get 70 to 80%.
My network converges very fast and overfits in the first epochs. Could it be the data?
Layer (type) Output Shape ...
I want to add a CNN layer with max-pooling before a Bi-LSTM layer for a sentiment classification task but I am getting an error.
Here is the code I am using.
model = Sequential()
Kindly suggest techniques to generate sentences about a plot or a graph (for example, a histogram) in Python.
I am trying to make use of natural language generation here.
For example I have a bar chart ...
I am getting an error when importing Texthero.
#import the texthero library
import texthero as hero
import pandas as pd
Error: AttributeError: module 'nltk' has no attribute 'data'
I have installed ...
I am getting the following error when I run a keras-bert model.
ValueError: Error when checking model input: the list of Numpy arrays
that you are passing to your model is not the size the model ...
I want to do NER (named entity recognition) on a document. The document actually contains transaction instructions, and the entities I want to extract are either common ones (like BIC code or bank name) ...
I'm trying to write a Lucene filter which replaces terms like 'what's' with 'what is', 'can't' with 'cannot', etc.
In incrementToken() if the term is one of the strings I'm replacing, I calculate a ...
How can we calculate the per-word perplexity of the test data? I have written my own code for LDA using Gaussian Mixture model with Gibbs sampling and evaluate the test doc in the 12 topics (number of ...
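As a rough sketch, per-word perplexity is usually computed as the exponential of the negative average log-probability of the held-out tokens. The helper below assumes the model has already produced a probability p(w) for every test token (for a topic model, typically the topic-weighted sum of the per-topic word probabilities); how those probabilities are obtained depends on the model and is not shown. The name per_word_perplexity is hypothetical.

```python
import math


def per_word_perplexity(word_probabilities):
    """Compute exp(-(1/N) * sum(log p(w))) over N held-out word tokens."""
    if not word_probabilities:
        raise ValueError("need at least one word probability")
    log_likelihood = sum(math.log(p) for p in word_probabilities)
    return math.exp(-log_likelihood / len(word_probabilities))


# Hypothetical input: every test word is assigned probability 1/4, so the
# model is as uncertain as a uniform choice among four words.
perplexity = per_word_perplexity([0.25, 0.25, 0.25, 0.25])
```

Working in log space and averaging before exponentiating avoids the underflow that multiplying many small probabilities would cause.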
One can theoretically download a Stanza model via Python as follows (mirror):
stanza.download('en') # This downloads the English models for the neural pipeline
However, the ...
One can download a Stanza model via Python as follows (mirror):
stanza.download('en') # This downloads the English models for the neural pipeline
How can I download a stanza's ...
What is the best approach, if there is one, for vectorizing non-word text data? For example, word text data might be the following sentence:
the cat walked over to the door
This sentence can be ...
I'm trying to translate an input sequence approach into a batch of documents using the feature_column API of TensorFlow 2. I took a code snippet from here, because I found intuitive the tf.feature_column....
I need help. I have a document 'X' with the titles of 500 papers. I have a folder 'Y' where the references of these papers are saved as 500 unique papers, and the papers are converted to Word.
How can ...
I am new to NLU and I am doing a project on document embedding. I want to fine-tune the doc2vec model in gensim on my small dataset to see if it can help with document clustering. I read the tutorial ...
In Natural Language Processing, what does it mean to annotate a corpus?
Does it simply mean to add labels to text (i.e. "positive, negative & neutral" classes in a sentiment analysis ...
I am trying to split units of text by their dependency trees (according to spaCy). I have experimented with much of the docs provided by spaCy, but I cannot figure out how to accomplish this task. To ...
I am relatively new to ML. The following is my problem statement:
I have data on company employees. Data is made up of skills the employees possess.
For example, a description could look like this:
I got an error message when running the following command:
model_1 = rnn_model(input_dim=13,units=200,activation='relu')
I can't really figure out why I am getting this error:
ValueError: Layer softmax was ...
I am trying to POS-tag French text using the Hugging Face Transformers library. In English I was able to do so given a sentence such as:
The weather is really great. So let us go for a walk.
The result ...
I'm trying to fine-tune the BERT model using just my text corpus, and for this I would use Masked Language Modeling with the simpletransformers library, following the link here. Now the use-case or ...
I have some data from CDC in csv format which includes geo-coordinates in a single text column as follows. The name of the (Pandas) DataFrame is 'geotest':
# My original data from CDC:
Say we have ["Siri", "Restaurant", "Ramesh"]
and want to generate any sentence containing these words.
Example: Siri is with Ramesh at the restaurant
I have worked with spaCy and so far have found it very intuitive and robust for NLP.
I am trying to build a search over text sentences that is both word-based and content-type-based, but so ...
I am doing some research related to fake news on Twitter. I have isolated timelines of specific fake news stories in Python, and I wanted to find out if there was a way to determine if each the text ...
I am currently working on a project where I want to classify some text. For that, I first had to annotate text data. I did it using a web tool and now have the corresponding JSON file (containing the ...
I want to use the BERT Word Vector Embeddings in the Embeddings layer of LSTM instead of the usual default embedding layer. Is there any way I can do it?
I'm trying to implement a Siamese-like transformer architecture. Similar work has been done in the SentenceBERT paper. I'm facing an issue. To separate hypothesis and premise, I modify this line from ...
I am implementing NeuralCoref by Hugging Face for my specific problem. However, instead of only replacing pronouns, it also replaces nouns.
I have this text:
CanSat shall fit in a cylindrical envelope of ...
Given an article, I need to index its important keywords using the entity extraction concept. I am using the spaCy NER pipeline to train on my dataset and I created the training dataset in spaCy ...
I'm using the code below to evaluate the POS tagger that I trained in spaCy 2.3.0 (built from source; last commit: 9860b8399ed2a3d1d680e1c1cd31d85926422709):
def evaluate(nlp, examples):
from sklearn_crfsuite import scorers
I am trying to use sklearn's crfsuite, but it shows an error saying there is no module named 'sklearn_crfsuite'. I also checked the ...
I'm trying to debug a model that uses 1D convolutions to classify text that was labeled by humans as being "appropriate" vs "not appropriate" to be posted on some website. Looking ...
I've been trying to use the Hugging Face nlp library's GLUE metric to check whether a given sentence is a grammatical English sentence. But I'm getting an error and am stuck without being able to ...
First, apologies for being long-winded.
I'm not a mathematician, so I'm hoping there's a "dumbed down" solution to this. In short, I'm attempting to compare two bodies of text to generate ...