R's data.table library has a function fread that can take a shell command as input to construct a dataframe.
This is highly useful for reading data formats that need a small amount of wrangling, without creating intermediate files. There are many examples of these formats in bioinformatics (SAM, BAM, GFF, etc.).
As far as I can tell, there is no (straightforward) equivalent in pandas.
Is there interest in a pull request for an additional function to add this functionality?
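For context, here is roughly what a one-off looks like today without a helper. This is a minimal sketch that assumes a POSIX shell; `printf` is only a stand-in data source for a real tool such as samtools.

```python
import io
import subprocess

import pandas as pd

# Run the command through the shell; check=True raises CalledProcessError
# on a non-zero exit status, and text=True decodes stdout to str.
result = subprocess.run(
    "printf 'chr1\\t100\\nchr2\\t200\\n'",
    shell=True, check=True, capture_output=True, text=True,
)

# read_csv needs a path or file-like object, so wrap the captured text.
df = pd.read_csv(io.StringIO(result.stdout), sep="\t", header=None)
```

It works, but the run/decode/wrap steps have to be repeated for every call, which is the boilerplate a helper would hide.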
Example solution
import io
import subprocess

import pandas


def read_shell(command, shell=False, **kwargs):
    """Run a shell command and read its output into a pandas DataFrame.

    Additional keyword arguments are passed through to pandas.read_csv.

    :param command: a shell command that returns tabular data
    :type command: str
    :param shell: passed to subprocess.Popen
    :type shell: bool
    :return: a pandas DataFrame
    :rtype: :class:`pandas.DataFrame`
    """
    proc = subprocess.Popen(command,
                            shell=shell,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    output, error = proc.communicate()
    if proc.returncode == 0:
        # Wrap the captured stdout in a buffer so read_csv can parse it
        with io.StringIO(output.decode()) as buffer:
            return pandas.read_csv(buffer, **kwargs)
    else:
        message = ("Shell command returned non-zero exit status: {0}\n\n"
                   "Command was:\n{1}\n\n"
                   "Standard error was:\n{2}")
        raise IOError(message.format(proc.returncode, command, error.decode()))
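One possible refinement, sketched under assumptions rather than tested against large BAM files: `pandas.read_csv` accepts any file-like object, so the process's stdout could be parsed as it is produced instead of being collected in memory by `communicate()` first. `read_shell_streaming` is a hypothetical name. Because the exit status is only known after reading, a parse failure (e.g. empty output from a failed command) is chained onto the non-zero-exit error rather than reported on its own.

```python
import subprocess
import tempfile

import pandas as pd


def read_shell_streaming(command, shell=False, **kwargs):
    """Hypothetical variant of read_shell that parses stdout as it streams.

    Keyword arguments are passed through to pandas.read_csv.
    """
    # stderr goes to a temporary file rather than a pipe: if a chatty command
    # filled an unread stderr pipe while we were reading stdout, both sides
    # would block forever.
    with tempfile.TemporaryFile() as err:
        with subprocess.Popen(command, shell=shell, stdout=subprocess.PIPE,
                              stderr=err, text=True) as proc:
            try:
                df = pd.read_csv(proc.stdout, **kwargs)
                parse_error = None
            except Exception as exc:  # e.g. EmptyDataError if nothing was printed
                df, parse_error = None, exc
        # Leaving the Popen block closed stdout and waited for the process,
        # so returncode is now set.
        if proc.returncode != 0:
            err.seek(0)
            message = ("Shell command returned non-zero exit status: {0}\n\n"
                       "Command was:\n{1}\n\n"
                       "Standard error was:\n{2}")
            stderr_text = err.read().decode(errors="replace")
            raise OSError(
                message.format(proc.returncode, command, stderr_text)
            ) from parse_error
        if parse_error is not None:
            raise parse_error
    return df
```

Sending stderr to a temporary file keeps the single-pass read safe without needing threads.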
Expected usage
command = "samtools view example.bam | head | cut -f 1,2,3,4,5,6,7 -d '\t'"
read_shell(command, shell=True, sep='\t', header=None)  # note: options are passed to pandas.read_csv
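Because the command above goes through the shell (`shell=True`), any file name interpolated into it has to be quoted carefully. A hedged alternative sketch: chain the stages with `subprocess.Popen`, following the "replacing shell pipeline" pattern from the Python `subprocess` docs, so no shell is involved. `read_pipeline` is a hypothetical name, not an existing pandas function.

```python
import io
import subprocess

import pandas as pd


def read_pipeline(commands, **kwargs):
    """Hypothetical helper: run a list of argument lists as a pipeline
    (no shell) and parse the last stage's stdout with pandas.read_csv."""
    procs = []
    upstream = None
    for args in commands:
        proc = subprocess.Popen(args, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # Close our copy so the upstream stage gets SIGPIPE if the
            # downstream stage exits early (e.g. `head`).
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    # Like a shell without `pipefail`, only the last stage's status is checked.
    if procs[-1].returncode != 0:
        raise subprocess.CalledProcessError(procs[-1].returncode, commands[-1])
    return pd.read_csv(io.StringIO(output.decode()), **kwargs)
```

Assuming samtools, head and cut are on PATH, the example above would become `read_pipeline([["samtools", "view", "example.bam"], ["head"], ["cut", "-f", "1-7"]], sep="\t", header=None)`; cut already splits on tabs by default, so no `-d` is needed.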
@timothymillar's code above solved my issue. At first I agreed it might be out of scope for pandas, but having used it with a variety of different CLIs dozens of times over the past few weeks, I'd vote to add it to pandas.