
tf-idf Returning DTM with more rows than number of documents? #10631


Closed
fsfeng opened this issue Feb 14, 2018 · 2 comments

Comments

@fsfeng

fsfeng commented Feb 14, 2018

I've encountered a strange issue: I'm feeding 2306041 documents into tf-idf, but when I apply np.shape to the output, the number of rows is 2306047 (six more than the input). What could be causing this?

@fsfeng
Author

fsfeng commented Feb 14, 2018

Not sure why, but the issue is resolved if I convert the DataFrame column to a list before passing it in.
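A minimal sketch of that workaround, assuming the tf-idf step is scikit-learn's `TfidfVectorizer` (the thread doesn't name the class) and using a hypothetical `df` with a `text` column. The column is passed as a plain Python list, and the result is checked to have one row per input document.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical example data; the real column held about 2.3 million documents.
df = pd.DataFrame({"text": ["the cat sat", "the dog barked", "a cat and a dog"]})

# Workaround from the thread: hand the vectorizer a plain list of strings
# rather than the DataFrame column itself.
docs = df["text"].tolist()

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse document-term matrix

# Sanity check: a document-term matrix should have exactly one row per document.
assert X.shape[0] == len(docs), (X.shape, len(docs))
print(X.shape)
```

Keeping the assertion in place on the full dataset would surface a row-count mismatch immediately instead of later in the pipeline.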

@fsfeng fsfeng closed this as completed Feb 14, 2018
@jnothman
Member

Could it be a sparse dataframe, and thus related to pandas-dev/pandas#14167?
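If the column does turn out to be sparse, one option is to densify it before extracting the documents. The sketch below assumes a modern pandas (1.0 or later), where sparse data is represented by `pd.SparseDtype` columns; the `SparseDataFrame` class that existed in 2018 has since been removed, so this is not a reproduction of the original setup. `to_document_list` is a hypothetical helper name.

```python
import pandas as pd

def to_document_list(column: pd.Series) -> list:
    """Hypothetical helper: return a column's values as a plain list.

    Sparse columns are densified first, since the maintainer suspected
    a sparse DataFrame (pandas-dev/pandas#14167).
    """
    if isinstance(column.dtype, pd.SparseDtype):
        # Materialize the sparse column as an ordinary dense Series.
        column = column.sparse.to_dense()
    return column.tolist()
```

Usage would be `docs = to_document_list(df["text"])`, followed by the same row-count check against the vectorizer's output.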
